Car Wash Test on 53 leading AI models: "I want to wash my car. The car wash is 50 meters away. Should I walk or drive?"

fubarx@lemmy.world · 24 hours ago

turmacar@lemmy.world · edit-2 5 hours ago

Half the issue is they’re calling 10 in a row “good enough” to treat it as solved in the first place.

A sample size of 10 is nothing.
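
To put a rough number on that: if a model gets 10 out of 10, the most you can claim at 95% confidence is that its true success rate is above about 74%. A minimal sketch, assuming independent runs and using the exact one-sided (Clopper-Pearson) bound for the all-successes case; the function name is my own, not something from the test write-up.

```python
def success_rate_lower_bound(n_trials: int, confidence: float = 0.95) -> float:
    """One-sided exact lower bound on the true success rate,
    given that ALL n_trials independent runs succeeded.

    If the true rate is p, the chance of n straight successes is p**n.
    Solving p**n = alpha gives the smallest p still consistent with the data.
    """
    if n_trials <= 0:
        raise ValueError("n_trials must be positive")
    alpha = 1.0 - confidence
    return alpha ** (1.0 / n_trials)


if __name__ == "__main__":
    # Illustrative run counts only; 10 is the one mentioned in the thread.
    for n in (10, 100, 1000):
        print(f"{n} straight passes -> true rate >= {success_rate_lower_bound(n):.3f}")
```

So 10 straight passes is still consistent with a model that fails roughly one time in four.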

Frankly, I'd like to see some error bars on the “human polling”. How many of the people rapiddata is polling are just hitting the top or bottom answer?
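
For what those error bars could look like, here is a sketch of a Wilson score interval for a poll proportion. The respondent counts in the example are made up for illustration, not taken from the actual poll, and this only covers sampling error: it says nothing about respondents who click the first or last option without reading.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to roughly 95% two-sided confidence.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must be between 0 and n")
    p_hat = successes / n
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


if __name__ == "__main__":
    # Hypothetical numbers: 710 of 1000 respondents picked "drive".
    low, high = wilson_interval(710, 1000)
    print(f"95% interval: {low:.3f} to {high:.3f}")
```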