Wikipedia Blacklists Archive.today, Starts Removing 695,000 Archive Links

Innerworld@lemmy.world · 3 days ago

Wikipedia Blacklists Archive.today, Starts Removing 695,000 Archive Links

VitoRobles@lemmy.today · 3 days ago

In emails sent to Patokallio after the DDoS began, “Nora” from Archive.today threatened to create a public association between Patokallio’s name and AI porn and to create a gay dating app with Patokallio’s name. These threats were discussed by Wikipedia editors in their deliberations over whether to blacklist Archive.today, and then editors noticed that Patokallio’s name had been inserted into some Archive.today captures of webpages.

“Honestly, I’m kind of in shock,” one editor wrote. “Just to make sure I’m understanding the implications of this: we have good reason to believe that the archive.today operator has tampered with the content of their archives, in a manner that suggests they were trying to further their position against the person they are in dispute with???”

That, and their refusal to talk to any journalist who references information from Patokallio’s blog, makes archive.today unreliable.

Fuck them.

Doug Holland@lemmy.world · 3 days ago

Crap. Obviously, I’m gonna have to stop using archive.today, but it’s the only way around paywalls at numerous sites.

Removepaywalls.com (plural) inserts ads, often for shady operations.

Removepaywall.com (singular) usually works, but it’s tricky sharing the links (i.e., “choose option 2” or “choose option 4”).

Byebyepaywall.com has old, dead options.

Wayback Machine bombs out a lot.

And ghostarchive.org is successful so rarely it’s really a last resort.

Anyone know of any others?

CharlesDarwin@lemmy.world · 3 days ago

The thing that has always annoyed me about archive.is is that using Firefox + VPN seems to result in endless Captchas. But it works in Chrome, go figure. I’m very suspicious of sites that somehow only work properly under Chrome.

Trudge@piefed.social · 3 days ago

Possibly irrelevant, but some browsers have a “reading mode” which, in conjunction with the ol’ Hitting F11 and Then Esc Trick, will produce the whole article before a paywall can finish loading.

zqps@sh.itjust.works · 3 days ago

F11 plus Esc stops script execution or something like that?

Trudge@piefed.social · 2 days ago

reloads the page & then aborts loading it

a Kendrick fan@lemmy.ml · 3 days ago

Ghostarchive is an archive.today revamp, I see no reason to not keep using either though…

paraphrand@lemmy.world · 3 days ago

Is there a reason self hosted paywall bypass tools don’t exist? Is it because these services pay for access?

deceiver@infosec.pub · 3 days ago

they do exist: http://archivebox.io/

paraphrand@lemmy.world · 3 days ago

At first glance, this does not bypass paywalls. It archives web pages.

People conflate the two services because some of them bypass paywalls as they archive.

I specifically asked about paywall bypass on purpose.

deceiver@infosec.pub · 3 days ago

the archiving mechanism itself is what bypasses paywalls. it archives by fetching pages server-side before client-side JavaScript enforces paywalls
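A minimal Python sketch of the "before client-side JavaScript enforces paywalls" idea. It assumes a page whose paywall is applied *only* by a script: in that case the article text is already in the HTML the server sends, so anything that reads the markup without executing scripts still sees it. `VisibleTextExtractor` and `server_side_text` are hypothetical names made up for illustration. The sketch parses an HTML string rather than fetching a live URL. It does not model what archive.today or the Wayback Machine actually run, and it does nothing for sites that enforce the paywall server-side by never sending the full text.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text from HTML exactly as delivered by the server.

    <script> and <style> bodies are skipped, never executed, so a paywall
    that relies on client-side JavaScript never gets a chance to hide anything.
    """

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-blank text that is not part of a script or stylesheet.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def server_side_text(html: str) -> str:
    """Return the readable text present in the raw (pre-JavaScript) HTML."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


if __name__ == "__main__":
    page = (
        "<article><p>Full article text.</p></article>"
        "<script>document.querySelector('article').classList.add('locked');</script>"
    )
    print(server_side_text(page))
```

Here the script that would add the hypothetical `locked` class is never run, so the printed result is just the article paragraph.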

paraphrand@lemmy.world · edit-2 3 days ago

Can this be done in a browser extension? I’m basically wondering why people don’t tell other people about Paywall bypass software on Lemmy. Is it because it sucks? Doesn’t exist?

Such software seems like it would be very Lemmy, and very Linux, and very piracy, and very anarchic. So why am I not already aware of any?

deceiver@infosec.pub · 3 days ago

it absolutely can! there’s Bypass Paywalls Clean, developed by magnolia1234. the reason you don’t see these extensions shared often is that they’re repeatedly taken down from official extension stores like the Chrome Web Store and Firefox Add-ons, and from platforms like GitHub, due to legal and political pressure from publishers. that pushes them to increasingly obscure and/or questionable hosting platforms that most normal users wouldn’t touch. case in point: Bypass Paywalls Clean itself is currently hosted on GitFlic, a Russian code hosting platform, since it’s been pushed outside the reach of Western legal frameworks

hemko@lemmy.dbzer0.com · 3 days ago

I think a subscribed user of the news site has to upload the “unlocked” article to the archive website.

deceiver@infosec.pub · 3 days ago

no, archive.today (and similar services like the Wayback Machine) work by fetching the page directly through their own servers, essentially acting as a headless browser that renders the page and saves a snapshot. the archive service itself makes the HTTP request, executes JavaScript, and captures the resulting document object model - no subscriber involvement required

AmbitiousProcess (they/them)@piefed.social · edit-2 3 days ago

Here’s the relevant archive.today guidance page on Wikipedia for anyone curious:
https://en.wikipedia.org/wiki/Wikipedia:Archive.today_guidance

If you have a Wikipedia account, you can help replace these links!
Go to the How you can help section, then click on the search links for any of the given domains, and you can go and manually re-archive any links with Archive.org, Ghostarchive, or Megalodon.

SourDrink@lemmy.world · 3 days ago

I half thought this was archive.org they were blacklisting. Two completely different sites.

ImgurRefugee114@reddthat.com · edit-2 3 days ago

Absolute dumbass. Truly a self-own for the ages.

Play stupid games, …

🌞 Alexander Daychilde 🌞@lemmy.world · 3 days ago

Dammit. Everyone’s been using that site to get around paywalls because it works well. Now I have to go find another one that works as well. :|

dan@upvote.au · edit-2 3 days ago

It works well because they use paid accounts to scrape a bunch of paywalled sites, which is why publishers are trying to figure out who runs it.

It’s completely untrustworthy now that they’ve shown that they can (and do) edit archived pages.
