• 0 Posts
  • 6 Comments
Joined 5 years ago
cake
Cake day: January 29th, 2021

help-circle
  • Much of it might be freely available data, but there’s a huge difference between you accessing a website for data and an LLM doing the same thing. We’ve had bots scraping websites since the 90’s, it’s not a new thing. And since scraping bots have existed we’ve developed a standard on the web to deal with it, called “robots.txt”. A text file telling bots what they are allowed to do on websites and how they should behave.

    LLM’s are notorious for disrespecting this, leading to situations where small companies and organisations will have their websites scraped so thoroughly and frequently that they can’t even stay online anymore, as well as skyrocketing their operational costs. In the last few years we’ve had to develop ways just to protect ourselves against this. See the “Anubis” project.

    Hence, it’s much more important that LLM’s follow the rules than you and me doing so on an individual level.

    It’s the difference between you killing a couple of bees in your home versus an industry specialising in exterminating bees at scale. The efficiency is a big factor.