Comments on: How do I stop my RSS feed from being abused?
http://ask.metafilter.com/59015/How-do-I-stop-my-RSS-feed-from-being-abused/rss



Comments on Ask MetaFilter post How do I stop my RSS feed from being abused?



Published: Mon, 19 Mar 2007 15:52:48 -0800

Last Build Date: Mon, 19 Mar 2007 15:52:48 -0800

 



Question: How do I stop my RSS feed from being abused?

Mon, 19 Mar 2007 15:48:55 -0800

The RSS feed from my blog is being copied in full (pictures and all, hotlinked, no less) to someone else's blog. The blog is clearly a spam blog, harvesting hundreds of feeds and republishing them in full. The whois for this site is not helpful; what, if anything, should I do?

The blog in question is livelonely.com (my site is blog.thesietch.org); it seems to harvest from many, many other blogs. I would really rather not have my content republished on such a crappy site, especially because I run a full feed.

Should I make them stop? If I did want to make them stop, how can I find out who this person is and make them stop?



By: phrontist

Mon, 19 Mar 2007 15:52:48 -0800

Put a block on their IP so they can't grab the feed.
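A minimal sketch of the IP block, written as a Node.js request handler rather than server configuration. It assumes the feed is served at /feed (a hypothetical path) and that BLOCKED_IPS holds addresses you have already identified; the example address comes from a documentation range, not from the scraper.

```javascript
// Sketch: refuse feed requests from specific client addresses.
// Assumptions: the feed lives at "/feed" (hypothetical path), and the
// address below is a placeholder from a documentation range (TEST-NET-3).
const BLOCKED_IPS = new Set(["203.0.113.7"]);

// Node can report IPv4 clients in IPv4-mapped IPv6 form ("::ffff:203.0.113.7").
function normalizeIp(ip) {
  return typeof ip === "string" && ip.startsWith("::ffff:") ? ip.slice(7) : ip;
}

function isBlocked(ip) {
  return BLOCKED_IPS.has(normalizeIp(ip));
}

// Request handler usable with http.createServer from Node's "http" module.
function handleFeedRequest(req, res) {
  if (req.url === "/feed" && isBlocked(req.socket.remoteAddress)) {
    res.writeHead(403, { "Content-Type": "text/plain" });
    res.end("Forbidden\n");
    return;
  }
  res.writeHead(200, { "Content-Type": "application/rss+xml" });
  // A real server would write the actual feed here.
  res.end('<?xml version="1.0"?><rss version="2.0"><channel></channel></rss>\n');
}
```

To run it, pass the handler to http.createServer(handleFeedRequest).listen(8080). This only helps if the scraper fetches from a stable address you can identify, which is exactly what later commenters debate.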



By: cortex

Mon, 19 Mar 2007 15:56:10 -0800

There are a ton of these spammy aggregator blogs out there. Finding and stopping the person may be hard-to-impossible; unless you think you stand to lose significant money on this, trying to track them down is likely not worth your while.

Like phrontist says: cut them off from your end. It's a reactive game, and you're pretty much stuck on defense.



By: i love cheese

Mon, 19 Mar 2007 15:59:58 -0800

You could set up your site to redirect them, based on their IP address, to another file. I'd imagine you could create a fake newsfeed that has entries like "livelonely.com is a spam blog that steals other people's content" or worse.
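A sketch of the decoy-feed idea: the server picks the real feed or a one-item substitute depending on who is asking. The flagged address set, the decoy wording, and the function names are all hypothetical; realFeedXml stands for whatever your blog software normally produces.

```javascript
// Sketch: give flagged clients a decoy feed instead of the real one.
// Assumptions: flaggedIps is a Set of addresses you identified yourself,
// realFeedXml is your normal feed output, and the wording is only an example.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Builds a one-item RSS 2.0 document carrying the given message.
function decoyFeed(siteUrl, message) {
  const msg = escapeXml(message);
  const url = escapeXml(siteUrl);
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<rss version="2.0"><channel>',
    `<title>${msg}</title><link>${url}</link><description>${msg}</description>`,
    `<item><title>${msg}</title><link>${url}</link><description>${msg}</description></item>`,
    "</channel></rss>",
  ].join("\n");
}

// Chooses which feed body to send, based on the requesting address.
function feedFor(ip, flaggedIps, realFeedXml, siteUrl) {
  if (flaggedIps.has(ip)) {
    return decoyFeed(siteUrl, "This content was copied without permission from " + siteUrl);
  }
  return realFeedXml;
}
```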



By: phrontist

Mon, 19 Mar 2007 16:07:40 -0800

If they're indiscriminately taking code from your site, you can subjugate theirs! Put this in script tags and you can replace their page content:


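// Fires only when the post is displayed on the scraper's domain
if (/(^|\.)livelonely\.com$/.test(window.location.hostname)) {
  window.addEventListener("DOMContentLoaded", function () {
    document.body.textContent = "All Your Base Are Belong To Us";
  });
}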


Or you could just redirect to your page:


// Sends visitors of the scraper's copy to the original blog
if (/(^|\.)livelonely\.com$/.test(window.location.hostname)) {
  window.location.replace("http://blog.thesietch.org/");
}



By: phrontist

Mon, 19 Mar 2007 16:14:12 -0800

Oh, and in case it's not obvious, you need to put this in a post; it won't affect your own site, because the check only matches their domain.



By: afx114

Mon, 19 Mar 2007 16:24:31 -0800

Isn't the whole point of RSS (Really Simple Syndication) to have people grab your feed and syndicate your content however they want? If you don't want your content syndicated, you probably shouldn't RSS it. :)



By: dendrite

Mon, 19 Mar 2007 16:29:19 -0800

Huh? IP blocking? That almost certainly won't work. I mean, what makes you think they are pulling the RSS from an IP that is even remotely similar to livelonely.com's? Are you reading Ask Metafilter from the same machine your blog is hosted at? Probably not.



By: phrontist

Mon, 19 Mar 2007 16:34:19 -0800

dendrite: Um, it's highly likely that the software they are using is running on the same server that hosts the page. They certainly aren't sitting there with a feed reader copying and pasting!

Even so, it's not hard to figure out the IP address they're culling from (however unlikely that may be). Just slide some code into your feed-generating page to put an HTML comment in every post:

<!-- 234.34.55.78 -->

Where the IP is that of the requester. Then wait...
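The same idea as a small JavaScript helper. The item shape ({ title, html }) is hypothetical, and the timestamp is an addition not in the comment, included so a stamp found on the scraper's page can be matched against your access logs.

```javascript
// Sketch: stamp every item with the address that fetched the feed, as an
// HTML comment appended to the item body. Assumption: your feed generator
// builds items as { title, html } objects (hypothetical shape) and wraps
// the html in CDATA or escapes it when writing the XML.
function stampItems(items, requesterIp, fetchedAt = new Date()) {
  // "--" is not allowed inside an HTML comment, so remove it defensively.
  const ip = String(requesterIp).replace(/--/g, "");
  const stamp = `<!-- ${ip} ${fetchedAt.toISOString()} -->`;
  return items.map((item) => ({ ...item, html: `${item.html}\n${stamp}` }));
}
```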



By: mikeh

Mon, 19 Mar 2007 16:36:59 -0800

Most likely, if their site is dynamic and pulling directly from stilgar's blog, IP blocking would work and would be very simple, since they are probably pulling it directly from their own server. If they're statically publishing it, it would be a matter of turning on full logging and watching for odd, regular traffic. You might accidentally ban anyone who has an RSS reader polling at a fixed interval, but it might be worth it in the short term.
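The "watch for regular traffic" step can be sketched as a small log scan. This assumes Apache's common or combined log format and a feed path of /feed (hypothetical); the thresholds are arbitrary starting points, not tested values. Since ordinary feed readers also poll on a schedule, treat the output as candidates to check rather than a ban list.

```javascript
// Sketch: find clients that fetch the feed on a suspiciously regular schedule.
// Assumptions: Apache common/combined log lines, feed path "/feed" (hypothetical),
// and arbitrary default thresholds.
const MONTHS = { Jan: "01", Feb: "02", Mar: "03", Apr: "04", May: "05", Jun: "06",
  Jul: "07", Aug: "08", Sep: "09", Oct: "10", Nov: "11", Dec: "12" };

// e.g. 1.2.3.4 - - [19/Mar/2007:15:52:48 -0800] "GET /feed HTTP/1.1" 200 5120
const LINE_RE = /^(\S+) \S+ \S+ \[(\d{2})\/(\w{3})\/(\d{4}):(\d{2}:\d{2}:\d{2}) ([+-]\d{2})(\d{2})\] "GET (\S+)/;

function parseLine(line) {
  const m = LINE_RE.exec(line);
  if (!m) return null;
  const [, ip, day, mon, year, time, tzH, tzM, path] = m;
  // Rebuild the timestamp as ISO 8601 so Date.parse can read it.
  const ms = Date.parse(`${year}-${MONTHS[mon]}-${day}T${time}${tzH}:${tzM}`);
  return { ip, ms, path };
}

function regularFetchers(lines, feedPath = "/feed", minRequests = 4, maxRelStdDev = 0.1) {
  const times = new Map(); // ip -> request times in ms
  for (const line of lines) {
    const hit = parseLine(line);
    if (!hit || hit.path !== feedPath || Number.isNaN(hit.ms)) continue;
    if (!times.has(hit.ip)) times.set(hit.ip, []);
    times.get(hit.ip).push(hit.ms);
  }
  const suspects = [];
  for (const [ip, ts] of times) {
    if (ts.length < minRequests) continue;
    ts.sort((a, b) => a - b);
    const gaps = ts.slice(1).map((t, i) => t - ts[i]);
    const mean = gaps.reduce((a, b) => a + b, 0) / gaps.length;
    const variance = gaps.reduce((a, g) => a + (g - mean) ** 2, 0) / gaps.length;
    // A relative standard deviation near 0 means a clockwork schedule.
    if (mean > 0 && Math.sqrt(variance) / mean <= maxRelStdDev) {
      suspects.push({ ip, requests: ts.length, meanGapSeconds: Math.round(mean / 1000) });
    }
  }
  return suspects;
}
```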



By: Leon

Mon, 19 Mar 2007 16:38:24 -0800

dendrite: tacking the requesting IP onto the post body for a few hours will soon track down the offending client.

phrontist: I personally love the script idea, but it won't help with search engine indexing, and it's easily circumvented.



By: dendrite

Mon, 19 Mar 2007 16:43:02 -0800

Good point about the requesting IP on the post body, but I'm still interested to find out if the offending client matches the offending host.
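One way to answer that question is to resolve the scraper's hostname and compare its addresses with the client IP you captured. This sketch uses Node's dns.promises.resolve4; the resolver parameter exists only so the check can be tested offline. A mismatch does not prove the two are unrelated, since a site can fetch feeds from a different machine than the one serving its pages.

```javascript
// Sketch: does the client that fetched the feed match the scraper's A records?
// Assumption: clientIp came from your logs or from stamped feed comments.
const dns = require("dns");

async function clientMatchesHost(clientIp, scraperHostname, resolver = dns.promises) {
  // resolve4 returns the hostname's IPv4 (A record) addresses as strings.
  const addresses = await resolver.resolve4(scraperHostname);
  // Strip an IPv4-mapped IPv6 prefix so "::ffff:1.2.3.4" compares equal to "1.2.3.4".
  const ip = clientIp.startsWith("::ffff:") ? clientIp.slice(7) : clientIp;
  return addresses.includes(ip);
}
```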



By: stilgar

Mon, 19 Mar 2007 16:47:11 -0800

Update: they are using a WordPress blog with the wp-autopost plugin; basically, it takes RSS feeds (for instance, mine from FeedBurner) and converts them to posts.

I don't need to track them down: every single post I make generates a trackback ping within seconds that shows up in the comments section of my blog (with the IP). I am thinking about using mod_rewrite in .htaccess to simply block the IP, but I need to make sure it's static first. I was hoping for a way to figure out who these people were.

As for RSS being for syndication, this is not what I would consider syndication. This is a blog that simply takes hundreds of feeds, strips the ads out of them, and then posts them on livelonely.com surrounded by lots and lots of ad spam. I have no problem with people highlighting my stories and linking back to me, but fully reposting them and then making money off of them seems wrong.



By: sindark

Mon, 19 Mar 2007 16:48:05 -0800

@afx114

This is very obviously different. As a blogger, you provide RSS for the convenience of your readers.

Using your content as a mechanism to make money or manipulate search engine rankings is not legitimate. The choice to provide RSS content does not invite or authorize such usage.



By: stilgar

Mon, 19 Mar 2007 16:49:46 -0800

Not to mention he has the gall to hotlink all my images, costing me bandwidth... bah.



By: ukdanae

Mon, 19 Mar 2007 16:50:06 -0800

I get this problem all the time with my blog and despite what afx114 suggests, publishing an RSS feed is not an invitation for someone to scrape your entire blog and wrap their own ads around it. I use my feed for an e-mail newsletter, to aggregate into content networks, and am happy for people to excerpt, but unfortunately others just take the whole thing.

I always whois the domain, look at the name servers to figure out who the host is, then e-mail the host's sales department (since it's the most checked) to tell them that a site they're hosting is scraping my content. If there's a for-real e-mail in the whois, I contact the domain owner as well. It's worked every time for me.
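The DNS half of this routine (name servers, plus the addresses the domain resolves to and their reverse-DNS names) can be gathered with Node's dns.promises module. This is a sketch: whois itself is not covered, and the hosting company still has to be inferred by a person from the names returned. The resolver parameter is only there so the function can be tested without network access.

```javascript
// Sketch: collect DNS clues about who hosts a domain.
const dns = require("dns");

async function hostingClues(domain, resolver = dns.promises) {
  const nameServers = await resolver.resolveNs(domain); // NS records
  const addresses = await resolver.resolve4(domain);    // IPv4 A records
  const reverseNames = {};
  for (const ip of addresses) {
    try {
      // PTR names often contain the hosting provider's domain.
      reverseNames[ip] = await resolver.reverse(ip);
    } catch {
      reverseNames[ip] = []; // no PTR record for this address
    }
  }
  return { nameServers, addresses, reverseNames };
}
```

For example, hostingClues("livelonely.com").then(console.log) would print whatever DNS currently returns for that domain.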

Another thing to try is to automatically add a line at the end of each post in your feed that says something like "Read the rest of my blog at x.com". If you're using WordPress 2.0, the Feedvertising plugin will do the trick.
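The footer idea doesn't depend on a particular plugin; here is a sketch of appending an attribution line to each item before the feed is written. It assumes items are { title, link, html } objects (a hypothetical shape) and is not based on the Feedvertising plugin's code.

```javascript
// Sketch: append an attribution line to every feed item so copies link back.
// Assumptions: items are { title, link, html } objects (hypothetical shape);
// siteName and siteUrl are your own values.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function addAttribution(items, siteName, siteUrl) {
  return items.map((item) => ({
    ...item,
    html:
      item.html +
      `\n<p>This post first appeared at <a href="${escapeHtml(item.link)}">${escapeHtml(siteName)}</a>. ` +
      `Read the rest of my blog at <a href="${escapeHtml(siteUrl)}">${escapeHtml(siteUrl)}</a>.</p>`,
  }));
}
```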



By: caddis

Mon, 19 Mar 2007 16:54:28 -0800

goatse.cx

perhaps throw in some tubgirl



By: toxic

Mon, 19 Mar 2007 16:55:06 -0800

When this happened to me, I filed a DMCA notice with the ISP that was hosting the spam blog (which, incidentally, was also The Planet, the folks who host livelonely.com). My content was removed from the offending site within 48 hours (though the spam blog is sadly still there).

The DMCA is fairly specific about what constitutes a correct notice (Wikipedia's article on the OCILLA is a good resource). My email is in my profile; if you'd like, I'll send you a copy of the notice I sent (which was based on the samples found here).



By: Mitheral

Mon, 19 Mar 2007 16:57:03 -0800

The standard redirect for spammers or other offensive wankers (as opposed to the clueless who get something less offensive) hotlinking your images is goatse.cx



By: PEAK OIL

Mon, 19 Mar 2007 17:29:48 -0800

This is my great new article.



It is not a trap! I am not planning to use this IP to make a fake rss feed that is nothing but links to links to White Power organizations.

Note: adjust exact syntax depending on what silly web application language your blog uses.



By: divabat

Mon, 19 Mar 2007 17:38:40 -0800

I've had this happen. I found out about it when I found a blog that was linking to one of my entries... except the entry was on an RSS scraper blog. The blog itself had no contact info, but I found its host (which had plenty of warnings about copyright) and told them about it. They shut the blog down.

So just ask their host and they'll do something, hopefully.



By: toxic

Mon, 19 Mar 2007 17:45:23 -0800

So just ask their host and they'll do something, hopefully.

I can tell you from experience that the host in question (The Planet, AKA Everyones Internet) will not do anything (except ask you whether you intended to file a formal DMCA complaint).

In their eyes, a spam blog is a paying customer.



By: Steven C. Den Beste

Mon, 19 Mar 2007 17:48:51 -0800

While it might be emotionally satisfying to send them the goatse.cx picture, or to put "all your base" onto their site, it's also an empty victory.

Their spam-blog doesn't exist for humans to look at. They're stealing your content so that their site seems to update regularly with material that the googlebot will think is real -- as, indeed, it is. The spam-blog is there, and only there, to be read by the googlebot and the ask-com spider and the msn-bot and the like.

I don't know exactly how it would work, but the real revenge would be to put a "META" into the site that told crawlers to ignore it. Perhaps others watching this thread would know more about that. If that could be made to work, it would deprive the owner of the spam blog of the benefit they seek from it.



By: Hankins

Mon, 19 Mar 2007 17:55:10 -0800

FWIW, here's some quick .htaccess code to help with the image hotlinking. Stick this in an .htaccess file in the directory you want protected:


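# Keep the .htaccess file itself from being served
<Files .htaccess>
Order Allow,Deny
Deny from all
</Files>

# Refuse image requests whose Referer is one of the listed domains
RewriteEngine On
RewriteCond %{HTTP_REFERER} ^(http://)?(www\.)?their-domain-name.*$ [NC,OR]
RewriteCond %{HTTP_REFERER} ^(http://)?(www\.)?another-domain-name.*$ [NC]
RewriteRule \.(gif|jpe?g|png)$ - [F,NC,L]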



By: stilgar

Mon, 19 Mar 2007 17:56:26 -0800

Steven, your idea gave me an idea: I contacted Google about an abuse of their ad policy, since running Google ads on a spam blog is a no-no. It wouldn't make me as mad if I knew this person was not going to make money off of anyone else's hard work.



By: Steven C. Den Beste

Mon, 19 Mar 2007 17:56:28 -0800

A different possibility is to send them a long file full of words like "viagra" and "xanax" and "texas-holdem" and all the other wonderful spam terms we've all gotten to know and hate. By now I would guess that the googlebot et al. are sensitized to ignore pages which contain excess amounts of those words, given how much they're abused.

If you did that, then instead of your RSS feed being used to convince the googlebot that the spam-blog is "normal", your RSS feed would convince the googlebot that the spam-blog is, indeed, spam. And thus you would deprive the spam-blog owner of the benefit he seeks.



By: pocams

Mon, 19 Mar 2007 17:57:05 -0800

Agreed with SCDB - sending them dirty pictures is only a temporary annoyance. If you really want to screw them, use mod_rewrite to serve them a ton of links to sleazy link farms and black-hat SEOs. They will be quickly blacklisted by Google, which is a tar pit they won't easily get out of. Just be careful you don't accidentally serve the same content on your own page!



By: Steven C. Den Beste

Mon, 19 Mar 2007 18:16:57 -0800

It would be better to include those things into a bogus RSS feed than to muck with mod_rewrite, because the bogus RSS feed would look native to the googlebot.



By: ahilal

Mon, 19 Mar 2007 19:24:09 -0800

Isn't the whole point of RSS (Really Simple Syndication) to have people grab your feed and syndicate your content however they want?

No. You may think that the spirit of the general RSS revolution is such, but each site owner is free to publish terms of use for their content feeds. This may include attribution, links to the publisher's site, etc. I can't think of much of anyone who publishes an RSS feed so that any spammer under the sun can create a "content site" with no attribution, no linking, nothing to credit or acknowledge the originator of the content.
Not everyone is hung up on how their feed gets used, but it's hard to defend "re-blogging" of content in this way. It just adds noise to the web.

Truth be told, most people publish RSS feeds in the hope that people may plug them into their readers and become daily visitors to the originating site. But the downfall of RSS is that it's just as "really simple" for a re-blogger to abuse the content as it is for any single reader to subscribe on a daily basis.

I'd consider publishing terms of use in the feed itself and on your site. You can then file a copyright or DMCA complaint against the abuser, involving their ISP if necessary. You should also consider some of the solutions offered here, which will allow you to block or otherwise fuck with the offending site.
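Terms of use can travel in the feed itself. RSS 2.0 defines an optional copyright element on the channel; this sketch uses it and also appends a terms link to the channel description. The termsUrl value is a hypothetical page on your own site, and itemsXml stands for the already-serialized items your generator produces.

```javascript
// Sketch: an RSS 2.0 wrapper that states copyright and terms of use.
// Assumptions: termsUrl points at a page you publish yourself, and
// itemsXml is already-serialized <item> elements from your generator.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function rssWithTerms({ title, link, description, copyright, termsUrl }, itemsXml = "") {
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<rss version="2.0">',
    "<channel>",
    `<title>${escapeXml(title)}</title>`,
    `<link>${escapeXml(link)}</link>`,
    // Terms link appended to the human-readable channel description.
    `<description>${escapeXml(`${description} Terms of use: ${termsUrl}`)}</description>`,
    // Optional RSS 2.0 channel element for the copyright notice.
    `<copyright>${escapeXml(copyright)}</copyright>`,
    itemsXml,
    "</channel>",
    "</rss>",
  ].join("\n");
}
```

Publishing terms is not a prerequisite for a DMCA notice (copyright applies either way); the notice mainly makes your expectations explicit to anyone reading the feed.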

Try doing what SomethingAwful does when someone hotlinks to one of their images: swap it out with a giant picture of a hermaphrodite taking a shit on a coffee table, headline: "I LIKE TO STEAL BANDWIDTH."



By: mendel

Mon, 19 Mar 2007 20:35:35 -0800

As evil as the DMCA can be when misused, this is exactly what it's for. Here's PlagiarismToday's guide to combatting exactly this sort of thing, complete with sample notifications.

(They also regularly run articles about splogs and Internet plagiarism concerns in their main blog. Good reading that might interest you.)

On preview: You don't need to publish terms of use before filing a DMCA notification. Unless you've granted a license, the splogger has no right to republish the material. Copyright defaults to "not allowed".



By: chipr

Tue, 20 Mar 2007 06:55:52 -0800

I see that your blog is hosted on an Apache server. Here is an entry I blogged about how I fought off an RSS scraper using Apache mod_rewrite: http://www.unicom.com/chrome/a/001233.html



By: KRS

Tue, 20 Mar 2007 10:52:33 -0800

A DMCA takedown notice to the ISP will be a good start, even if it goes against your principles. Why should the RIAA have all the fun?



By: Aidan Kehoe

Tue, 20 Mar 2007 11:23:59 -0800

Try doing what SomethingAwful does when someone hotlinks to one of their images: swap it out with a giant picture of a hermaphrodite taking a shit on a coffee table, headline: "I LIKE TO STEAL BANDWIDTH."
Then you hit things like del.icio.us too, where people are saving things because they're great, and then when they come back to look at the same interesting thing a few days later, they get a picture of a leech captioned in Dutch. Even when they specify referers on the same server. If bookmarking an image because you like it isn't a legitimate thing to do, I'm not sure any bookmarking is.



By: Mitheral

Tue, 20 Mar 2007 20:38:13 -0800

I'm a lot more selective than that. Specific misbehaving referrers get the image, not everything that lacks my referrer.



By: delmoi

Tue, 21 Aug 2007 12:10:57 -0800

Then you hit things like del.icio.us too, where people are saving things because they're great, and then when they come back to look at the same interesting thing a few days later, they get a picture of a leech captioned in Dutch.

It depends on how you set up your .htaccess. You can set things up so that direct links to JPEGs work fine, but embedding them in other pages does not.
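The selective approach Mitheral and delmoi describe comes down to a small decision on the Referer header. This sketch serves the image when there is no Referer (typed-in URLs and browser bookmarks usually send none) or when the Referer is your own site, and substitutes an image only for domains you list. The ownHost and blockedHosts values are hypothetical examples.

```javascript
// Sketch: decide whether to serve an image or a substitute, by Referer.
// Assumptions: ownHost and blockedHosts are your own (hypothetical) values.
function hostMatches(host, domain) {
  return host === domain || host.endsWith("." + domain);
}

function imageDecision(refererHeader, ownHost, blockedHosts) {
  if (!refererHeader) return "serve"; // direct link or bookmark: no Referer sent
  let host;
  try {
    host = new URL(refererHeader).hostname.toLowerCase();
  } catch {
    return "serve"; // malformed Referer: give it the benefit of the doubt
  }
  if (hostMatches(host, ownHost)) return "serve";
  // Only the domains you listed get the substitute; everyone else is served.
  return blockedHosts.some((b) => hostMatches(host, b)) ? "substitute" : "serve";
}
```

With this rule, a visitor arriving from a del.icio.us bookmark page still gets the real image, while the listed scraper domain does not.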