• AeonFelis@lemmy.world
    3 hours ago

    99% pass rate? Maybe that’s super impressive because it’s a stress test, but if 1% of my code fails to compile I think I’d be in deep shit.

    Also, one of the main arguments of vibe coding advocates is that you just need to check the result several times and tell the AI assistant what needs fixing. Isn’t a compiler test suite ideal for such a workflow? Why couldn’t they just feed the test failures back to the model and tell it to fix them, iterating again and again until it passes 100%?

    • dev_null@lemmy.ml
      3 hours ago

      Maybe they did, and that’s how they got to 99%. The remaining issues may be so intricate that the LLM just can’t solve them, no matter how many times you feed the failing test cases back to it.
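
The loop the two comments are arguing about can be sketched roughly as below. This is only an illustration, not how the project in question actually worked: `run_tests` and `propose_fix` are hypothetical callables standing in for a real test harness and a real model call, and the toy versions underneath exist only to show the loop plateauing on a failure the fixer cannot resolve.

```python
from typing import Callable, List, Tuple


def iterate_until_green(
    source: str,
    run_tests: Callable[[str], List[str]],          # hypothetical: returns names of failing tests
    propose_fix: Callable[[str, List[str]], str],   # hypothetical: returns revised source
    max_rounds: int = 10,
) -> Tuple[str, List[str], int]:
    """Feed failing tests back to a fixer until none remain or the round budget is spent."""
    failures = run_tests(source)
    rounds = 0
    while failures and rounds < max_rounds:
        source = propose_fix(source, failures)
        failures = run_tests(source)
        rounds += 1
    return source, failures, rounds


# Toy stand-ins (assumptions for illustration): the "source" is just a
# comma-separated list of the test names it currently passes.
ALL_TESTS = ["t1", "t2", "t3", "hard"]


def toy_run_tests(source: str) -> List[str]:
    passed = set(source.split(",")) if source else set()
    return [t for t in ALL_TESTS if t not in passed]


def toy_propose_fix(source: str, failures: List[str]) -> str:
    # This toy fixer can repair any failure except the one named "hard".
    fixable = [t for t in failures if t != "hard"]
    if not fixable:
        return source  # no progress possible; the loop just burns rounds
    return f"{source},{fixable[0]}" if source else fixable[0]


final_source, remaining, rounds = iterate_until_green(
    "", toy_run_tests, toy_propose_fix, max_rounds=10
)
print(final_source, remaining, rounds)
```

In the toy run, three of the four tests get fixed in the first three rounds, and the remaining rounds are spent on the one failure the fixer cannot handle, which is the plateau the reply describes. A real setup would probably also want to stop early when a round makes no progress.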