For one month beginning on October 5, I ran an experiment: Every day, I asked ChatGPT 5 (more precisely, its “Extended Thinking” version) to find an error in “Today’s featured article”. In 28 of these 31 featured articles (90%), ChatGPT identified what I considered a valid error, often several. I have so far corrected 35 such errors.


And the featured articles are usually quite large. As an example, today’s featured article is on a type of crab: it runs to over 3,700 words, with 129 references and 30-something books in the bibliography.
It’s not particularly unreasonable or surprising to be able to find a single error in an article that complex.