For one month beginning on October 5, I ran an experiment: Every day, I asked ChatGPT 5 (more precisely, its “Extended Thinking” version) to find an error in “Today’s featured article”. In 28 of these 31 featured articles (90%), ChatGPT identified what I considered a valid error, often several. I have so far corrected 35 such errors.

  • w3dd1e@lemmy.zip

This headline is a bit misleading. The article also says that only about 2/3 of the errors GPT found were verified as valid (according to the author).

    • Overall, ChatGPT identified 56 supposed errors in these 31 featured articles.
    • I confirmed 38 of these (i.e. 68%) as valid errors in my assessment: I implemented corrections for 35 of them, and agreed with 3 additional ones without yet implementing a correction myself. I disagreed with 13 of the alleged errors (23%).
    • I rated 4 as **Inconclusive** (7%), and one as **Not Applicable** (in the sense that ChatGPT’s observation appeared factually correct but would only have implied an error if that part of the article was intended in a particular way, a possibility that ChatGPT’s response had acknowledged explicitly).