Subscribe: Comments on: RSS Mashup & Duplicate Removal
http://ask.metafilter.com/35996/RSS-Mashup-amp-Duplicate-Removal/rss

Comments on: RSS Mashup & Duplicate Removal



Comments on Ask MetaFilter post RSS Mashup & Duplicate Removal



Published: Sun, 09 Apr 2006 11:57:09 -0800

Last Build Date: Sun, 09 Apr 2006 11:57:09 -0800

 



Question: RSS Mashup & Duplicate Removal

Sat, 08 Apr 2006 20:01:33 -0800

Looking for a tool that can take multiple RSS feeds, strip duplicate entries and output a single feed. None of the RSS meshing tools I'm seeing seems to offer duplicate removal. I'm subscribed to searches through 20+ feeds to make sure I catch all available references to a URL on blogs, courtesy of MonitorThis, but the number of duplicates is becoming a big problem.



By: Good Brain

Sun, 09 Apr 2006 11:57:09 -0800

I agree, this is annoying. Also annoying is having multiple feeds in my subscriptions, each with their own POV, pointing to the same article, and yet not having an easy way to read them in a coherent way.

One of my little learning projects was/is to make a "feed condenser" that could remove dupes from search feeds and also create a summary feed when multiple blogs link the same article. I haven't done anything with it though.
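The "feed condenser" idea can be sketched roughly as follows. This is only an illustration, not the project mentioned above: the `condense` function and the `source` and `target` keys are hypothetical names, and it assumes each feed item has already been reduced to the blog it came from and the article URL it links to.

```python
def condense(items):
    """Group feed items by the article they point at.

    Each item is assumed to be a dict with a "source" (the blog) and a
    "target" (the linked article URL). Both key names are hypothetical.
    """
    groups = {}  # plain dicts keep insertion order, so first-seen order survives
    for item in items:
        groups.setdefault(item["target"], []).append(item)

    # One summary entry per article, listing every blog that linked to it.
    return [
        {
            "target": target,
            "sources": [it["source"] for it in group],
            "count": len(group),
        }
        for target, group in groups.items()
    ]


if __name__ == "__main__":
    sample = [
        {"source": "Blog A", "target": "http://example.com/story"},
        {"source": "Blog B", "target": "http://example.com/other"},
        {"source": "Blog C", "target": "http://example.com/story"},
    ]
    for entry in condense(sample):
        print(entry["count"], entry["target"], entry["sources"])
```

An entry with a count of 1 is an ordinary item; an entry with a higher count is where a summary item listing all the linking blogs would go.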

I remain surprised that someone else hasn't solved the problem. Memeorandum kind of does the summarization, but it starts with the feeds that someone else finds important. That already has its own problems for tech news and current events. It doesn't work at all if you are interested in a niche subject.



By: AmbroseChapel

Sun, 09 Apr 2006 16:11:53 -0800

This would be a very simple programming task -- the reason someone hasn't done it is probably because there's no demand.

For one thing, there's a difference between one person saying "Look at this article! How dare they say that!" and another saying "Check this article out, they are so right!" i.e. there's a context to the link which most people probably want.

For another, there might be a problem with deciding what exactly is the same URL -- the analysis and re-writing of New York Times URLs alone is a subject you could write a book about.

And thirdly, which one should "win" when you have two or more stories linking to the same URL? The first? The last, or don't you care?
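The last two points, deciding what counts as the same URL and which item wins, could look something like the sketch below. Everything in it is an assumption made for illustration: the `normalize_url` and `dedupe` names, the `link` key, and especially the `TRACKING_PARAMS` list, which is a guess at the kind of query parameters a site might append and would need tuning per site.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical list of query parameters that do not change which article
# a URL points at. A real list would have to be built up site by site.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "partner", "emc"}


def normalize_url(url):
    """Return a rough canonical form of url for duplicate detection."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    # Drop tracking parameters and sort the rest so ordering does not matter.
    query = sorted(
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k.lower() not in TRACKING_PARAMS
    )
    path = parts.path.rstrip("/") or "/"
    # The fragment is discarded: it never reaches the server.
    return urlunsplit((parts.scheme.lower(), host, path, urlencode(query), ""))


def dedupe(items, keep="first"):
    """Keep one item per normalized link; keep is "first" or "last"."""
    chosen = {}
    for item in items:
        key = normalize_url(item["link"])
        if keep == "last" or key not in chosen:
            chosen[key] = item  # with "last", later items overwrite earlier ones
    return list(chosen.values())
```

With `keep="last"` the surviving item is the most recent one seen, but it stays at the position where that URL first appeared in the input.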

If you can give me a URL which will return all your many RSS feeds, I can write a Perl script which will strip out duplicates and return a single feed with only one item per URL, and I bet lots of other people could do the same in 20 other languages. What kind of a computer do you have, or would this be better as a CGI script you'd run via a browser?
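For a sense of scale, here is a minimal sketch of that kind of script, in Python rather than Perl and purely as an illustration. It assumes the feeds are plain RSS 2.0 without XML namespaces on `item` and `link`, that they have already been fetched into strings, and that exact string equality of the link is good enough; the `merge_feeds` name and the channel title, link and description values are made up.

```python
import xml.etree.ElementTree as ET


def merge_feeds(feed_xml_strings, title="Merged feed"):
    """Combine RSS 2.0 documents into one feed with one item per link."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 expects these three channel elements; the values are placeholders.
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = "http://example.com/"
    ET.SubElement(channel, "description").text = "Several feeds, duplicates removed"

    seen = set()
    for xml_text in feed_xml_strings:
        root = ET.fromstring(xml_text)
        for item in root.iter("item"):
            link = (item.findtext("link") or "").strip()
            if not link or link in seen:
                continue  # skip items without a link and repeats: first one wins
            seen.add(link)
            channel.append(item)
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    feed_a = (
        '<rss version="2.0"><channel><title>A</title>'
        "<item><title>one</title><link>http://example.com/1</link></item>"
        "<item><title>two</title><link>http://example.com/2</link></item>"
        "</channel></rss>"
    )
    feed_b = (
        '<rss version="2.0"><channel><title>B</title>'
        "<item><title>two again</title><link>http://example.com/2</link></item>"
        "<item><title>three</title><link>http://example.com/3</link></item>"
        "</channel></rss>"
    )
    print(merge_feeds([feed_a, feed_b]))
```

Fetching the feeds and serving the result (as a CGI script or otherwise) is left out; the duplicate check here is the simplest possible one and ignores the "same URL" problem raised above.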



By: Chuck Cheeze

Sun, 09 Apr 2006 20:22:31 -0800

What you are looking for is CaRP, an excellent piece of software that runs on PHP and MySQL. I am using it on my site here to aggregate 3 feeds into one. It has a bajillion options and works really well. Try the free version; the paid version can filter dups.