As long as they don’t use exactly the same words as in the book, yeah, as I understand it.
How do they not use the same words as in the book? That’s not how LLMs work. They use exactly the same words if the probabilities align. It’s proven by this study: https://arxiv.org/abs/2505.12546
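To make “if the probabilities align” concrete: under greedy decoding, a model reproduces a memorized passage word for word whenever the memorized next token is always the most probable one. Here’s a minimal sketch with a hypothetical toy “model” standing in for the LLM (illustrative only, not the paper’s actual extraction method, and the “memorized” sentence is made up):

    import numpy as np

    VOCAB = ["Harry", "Potter", "was", "a", "wizard", "cat", "the"]
    MEMORIZED = ["Harry", "Potter", "was", "a", "wizard"]  # toy stand-in for memorized training text

    def next_token_probs(prefix):
        # Stand-in for an LLM: puts most of the probability mass on the
        # memorized continuation of `prefix`, spreads the rest uniformly.
        probs = np.full(len(VOCAB), 0.02)
        if len(prefix) < len(MEMORIZED) and prefix == MEMORIZED[:len(prefix)]:
            probs[VOCAB.index(MEMORIZED[len(prefix)])] = 0.9
        return probs / probs.sum()

    def greedy_decode(prompt, steps):
        # Always pick the most probable next token.
        out = list(prompt)
        for _ in range(steps):
            out.append(VOCAB[int(np.argmax(next_token_probs(out)))])
        return out

    print(greedy_decode(["Harry"], 4))
    # -> ['Harry', 'Potter', 'was', 'a', 'wizard']  (the "memorized" text, verbatim)

Real models are vastly larger, but the mechanism is the same: wherever training has pushed the next-token probabilities far enough toward the original text, decoding gives that text back word for word.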
I’d say there are two issues with it.
First, it’s a very new article with only 3 citations. The authors seem like serious researchers, but the paper itself is still in the “hot off the presses” stage and wouldn’t qualify as “proven” yet.
It also doesn’t exactly say that books are copied. It says that in some models, it’s possible to extract some portions of some texts. They cite “1984” and “Harry Potter” as two books that can be extracted almost entirely, under some circumstances. They also find that, in general, extraction rates are below 1%.
Yeah, but it’s just a start toward reversing the process and proving that there is no “AI.” We’ve only started with generating text; I bet people will figure out how to reverse the process using some sort of Rosetta Stone. It’s just probabilities, after all.
The “if” is working overtime in your statement.