http://glinden.blogspot.com/feeds/307542365691869235/comments/default

Comments on Geeking with Greg: Detecting near duplicates in big data

Updated: 2017-12-06T07:37:01.524-08:00


2011-10-06T23:17:37.032-07:00

Interesting. I would like to know what the weight of a feature means and how to calculate it. If I understand correctly, a feature can be just a token from the document.
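One common reading, assumed here rather than taken from the post: each feature is a token, and its weight says how much that token should count, for example how often it occurs in the document (TF-IDF is another frequent choice). The sketch below is a minimal weighted simhash under those assumptions. The function names, the MD5-based token hash, and the 64-bit width are illustrative choices.

```python
import hashlib
from collections import Counter

BITS = 64  # fingerprint width; an assumed, commonly used size


def token_hash(token: str) -> int:
    """Stable 64-bit hash of one feature: the first 8 bytes of its MD5 digest."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def simhash(text: str) -> int:
    """Weighted simhash: features are lowercase tokens, weights are token counts."""
    weights = Counter(text.lower().split())
    totals = [0] * BITS
    for token, weight in weights.items():
        h = token_hash(token)
        for i in range(BITS):
            # Each feature adds +weight where its hash has a 1 bit, -weight where it has a 0 bit.
            totals[i] += weight if (h >> i) & 1 else -weight
    # Bit i of the fingerprint is 1 when the weighted vote for position i is positive.
    return sum(1 << i for i, total in enumerate(totals) if total > 0)


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


near = hamming_distance(
    simhash("the quick brown fox jumps over the lazy dog"),
    simhash("the quick brown fox jumped over the lazy dog"),
)
far = hamming_distance(
    simhash("the quick brown fox jumps over the lazy dog"),
    simhash("completely unrelated text about crawling the web"),
)
print(near, far)
```

A heavier feature pulls more fingerprint bits toward its own hash, so frequent (or otherwise important) tokens dominate the fingerprint. Documents whose fingerprints differ in only a few bits become near-duplicate candidates.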




2009-02-12T07:56:00.000-08:00

Greg,

Am I right that Google's paper is basically saying that the most efficient way to find near-duplicate documents is to count the number of matching triplets in two documents (a triplet being a three-word phrase)?
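For reference, the counting described in the question can be sketched as word 3-gram "shingles" compared as sets. This is a minimal illustration of that reading, not a claim about what the paper actually proposes. The function names are hypothetical, and the resemblance score (shared triplets divided by all distinct triplets, i.e. Jaccard similarity) is an added, commonly used normalization.

```python
def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """All runs of k consecutive lowercase words (triplets when k = 3)."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def matching_triplets(doc_a: str, doc_b: str) -> int:
    """Number of distinct three-word phrases the two documents share."""
    return len(shingles(doc_a) & shingles(doc_b))


def resemblance(doc_a: str, doc_b: str) -> float:
    """Shared triplets as a fraction of all distinct triplets in either document."""
    a, b = shingles(doc_a), shingles(doc_b)
    if not a and not b:
        return 1.0  # two documents too short to shingle are treated as identical
    return len(a & b) / len(a | b)


print(matching_triplets("the quick brown fox jumps", "the quick brown fox leaps"))  # → 2
print(resemblance("the quick brown fox jumps", "the quick brown fox leaps"))  # → 0.5
```

Comparing raw triplet sets this way costs time proportional to document length for every pair, which is why fixed-size fingerprints are attractive at web scale.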




2008-04-17T06:00:00.000-07:00

Very elegant technique for dimensionality reduction in document similarity.

Google also patented it: Methods and apparatus for estimating similarity (US Patent 7,158,961)
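Assuming the technique meant here is Charikar-style random-hyperplane hashing (the family simhash belongs to), the dimension reduction works like this. Each random hyperplane contributes one bit recording which side of it a vector falls on. Two vectors at angle θ land on different sides of a random hyperplane with probability θ/π, so the fraction of disagreeing bits estimates θ and therefore the cosine similarity. This is an illustrative sketch, not the claims of the patent. The names and parameters (`dim`, `k`, `seed`) are hypothetical.

```python
import math
import random


def random_hyperplanes(dim: int, k: int, seed: int = 0) -> list[list[float]]:
    """k random hyperplanes through the origin, given by Gaussian normal vectors."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(k)]


def sketch(vec: list[float], planes: list[list[float]]) -> list[int]:
    """Reduce vec to len(planes) bits: 1 if vec lies on the positive side of a plane."""
    return [1 if sum(p * x for p, x in zip(plane, vec)) >= 0 else 0 for plane in planes]


def estimate_cosine(s1: list[int], s2: list[int]) -> float:
    """Pr[bits differ] = theta / pi, so theta is estimated from the disagreement rate."""
    differing = sum(b1 != b2 for b1, b2 in zip(s1, s2))
    return math.cos(math.pi * differing / len(s1))


def exact_cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity computed directly on the full vectors, for comparison."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


rng = random.Random(42)
u = [rng.gauss(0.0, 1.0) for _ in range(50)]
v = [x + 0.5 * rng.gauss(0.0, 1.0) for x in u]  # a perturbed copy of u
planes = random_hyperplanes(dim=50, k=2000)
print(exact_cosine(u, v), estimate_cosine(sketch(u, planes), sketch(v, planes)))
```

More hyperplanes (larger `k`) tighten the estimate at the cost of longer sketches. The appeal is that the sketches can be stored and compared bitwise without keeping the original high-dimensional vectors.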