Did he just spend the first half of the article explaining why ‘Copilot in Excel’ (not agent mode) wasn’t designed for calculation tasks, then finish by complaining that it fails 80% of the time on benchmarks?
The 54% accuracy of agent mode should be called out, not the low accuracy of the thing that wasn’t designed for it.
54% isn’t really low when people only get 72%.