03 February 2025

Websites Fight Back Against AI Using Their Data

This is beautiful. A lot of people feel that AI is "stealing" their data.

AI haters build tarpits to trap and trick AI scrapers that ignore robots.txt - Ars Technica

Last summer, Anthropic inspired backlash when its ClaudeBot AI crawler was accused of hammering websites a million or more times a day.

And it wasn't the only artificial intelligence company making headlines for supposedly ignoring instructions in robots.txt files to avoid scraping web content on certain sites. Around the same time, Reddit's CEO called out all AI companies whose crawlers he said were "a pain in the ass to block," despite the tech industry otherwise agreeing to respect "no scraping" robots.txt rules.
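For anyone wondering what "respecting robots.txt" looks like in practice, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the crawler name `ExampleAIBot`, and the URLs are made up for illustration; they are not taken from any real site or crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named crawler is banned from the whole site,
# everyone else is only kept out of /private/.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if robots.txt permits this user agent to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("ExampleAIBot", "SomeBrowser"):
        print(agent, is_allowed(agent, "https://example.com/blog/post"))
```

A polite crawler runs a check like this before every request. Nothing enforces it, which is the whole complaint: robots.txt is a request, not a lock.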

And so, since the AI companies didn't feel bound by the requests of people running sites not to steal their data for AI models, a malicious hacker took a hand and built endless, ever-changing mazes. Mazes that not only trap the damned AI web crawler, but also feed it a diet of poison.

Building on an anti-spam cybersecurity tactic known as tarpitting, he created Nepenthes, malicious software named after a carnivorous plant that will "eat just about anything that finds its way inside."

Aaron, the creator of Nepenthes, clearly warns users that it is aggressive malware. It's not to be deployed by site owners uncomfortable with trapping AI crawlers and sending them down an "infinite maze" of static files with no exit links, where they "get stuck" and "thrash around" for months, he tells users. Once trapped, the crawlers can be fed gibberish data, aka Markov babble, which is designed to poison AI models. That's likely an appealing bonus feature for any site owners who, like Aaron, are fed up with paying for AI scraping and just want to watch AI burn.
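To make the idea concrete, here is a rough sketch of the two ingredients described above: pages that link only to more maze pages, and Markov babble as the page text. This is not Nepenthes' actual code. The seed text, the `/maze/` URL scheme, and every function name are assumptions made up for illustration.

```python
import hashlib
import random
from collections import defaultdict

# Hypothetical seed text; a real deployment would use a much larger corpus.
SEED_TEXT = (
    "the crawler follows every link and every link leads to another page "
    "and every page leads the crawler to another link"
)


def build_chain(text):
    """Map each word to the list of words that follow it in the text."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain


def babble(chain, rng, length=30):
    """Walk the chain at random to produce plausible-looking gibberish."""
    starts = sorted(chain)
    word = rng.choice(starts)
    out = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        # If a word has no recorded follower, restart from a random word.
        word = rng.choice(followers) if followers else rng.choice(starts)
        out.append(word)
    return " ".join(out)


def maze_page(path, chain, links=5):
    """Build one maze page whose only links point deeper into the maze."""
    # Seeding from the path means the same URL always yields the same page,
    # so the maze looks like static files even though nothing is stored.
    seed = int.from_bytes(hashlib.sha256(path.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    text = babble(chain, rng)
    hrefs = [f"/maze/{rng.getrandbits(64):016x}" for _ in range(links)]
    anchors = "\n".join(f'<a href="{h}">{h}</a>' for h in hrefs)
    return f"<html><body><p>{text}</p>\n{anchors}</body></html>"


CHAIN = build_chain(SEED_TEXT)

if __name__ == "__main__":
    print(maze_page("/maze/start", CHAIN))
```

Every page a crawler fetches hands it five more URLs that exist only when requested, so there is never an exit link to find. A real tarpit would typically also answer slowly, which this sketch leaves out.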

Only one AI web crawler has managed to escape from the tarpit.

It would also appeal to people who really don't want AI to train on their data, though for some people, watching AI burn would be entertaining all on its own.
