A Project to Poison LLM Crawlers

Disillusionist@piefed.world · 11 hours ago

GunnarGrop@lemmy.ml · 10 hours ago

Much of it might be freely available data, but there’s a huge difference between you accessing a website for data and an LLM crawler doing the same thing. Bots have been scraping websites since the 1990s, so that part isn’t new. And for about as long as scraping bots have existed, the web has had a standard for dealing with them, called “robots.txt”: a text file that tells bots what they are allowed to do on a website and how they should behave.
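For anyone who hasn’t looked at one, here’s a rough sketch of what a robots.txt says and how a well-behaved bot would check it before fetching a page, using Python’s standard `urllib.robotparser`. The bot name `ExampleLLMBot`, the `example.org` URLs and the rules themselves are made up for illustration and aren’t taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt: one hypothetical LLM crawler is banned outright,
# every other bot is kept out of /private/ and asked to wait between requests.
ROBOTS_TXT = """\
User-agent: ExampleLLMBot
Disallow: /

User-agent: *
Disallow: /private/
Crawl-delay: 10
"""


def make_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = make_parser(ROBOTS_TXT)

# A polite crawler asks before every request.
print(parser.can_fetch("ExampleLLMBot", "https://example.org/blog/post"))  # False
print(parser.can_fetch("SomeOtherBot", "https://example.org/blog/post"))   # True
print(parser.can_fetch("SomeOtherBot", "https://example.org/private/x"))   # False
print(parser.crawl_delay("SomeOtherBot"))                                  # 10
```

Note that nothing here is enforced by the server. The file only works if the crawler bothers to run a check like this and then respects the answer.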

LLM crawlers are notorious for ignoring it. Small companies and organisations end up having their websites scraped so thoroughly and so frequently that their operational costs skyrocket and, in some cases, they can’t even stay online anymore. In the last few years we’ve had to develop tools just to protect ourselves against this. See the “Anubis” project.

Hence, it’s much more important that LLM crawlers follow the rules than that you or I do so as individuals.

It’s the difference between you killing a couple of bees in your home and an industry specialising in exterminating bees at scale. The efficiency is a big factor.

RNSAFFN