I dunno about that… Very small models (2-8B), sure, but if you want more than a handful of tokens per second on a large model (R1 is 671B), you’re looking at some very expensive hardware that also comes with a power bill.
Even a 20-70B model needs a big, chunky new graphics card or something fancy like those new AMD Ryzen AI Max chips, plus a crapload of RAM.
Granted, you don’t need a whole datacenter, but the price is far from zero.
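
For a rough sense of scale, here is a back-of-the-envelope sketch of how much memory the weights alone take. It assumes memory is roughly parameter count times bits per weight, and it ignores the KV cache, activations, and runtime overhead, so real requirements are higher. The function name `weight_memory_gb` and the choice of 16-bit and 4-bit are just illustrative assumptions, not anything specific to a particular runtime.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes).

    Assumption: every parameter is stored at the same bit width.
    KV cache, activations, and runtime overhead are NOT included.
    """
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# Sizes mentioned above: a small model, a mid-size model, and R1.
for name, size_b in [("8B", 8), ("70B", 70), ("671B (R1)", 671)]:
    for bits in (16, 4):
        gb = weight_memory_gb(size_b, bits)
        print(f"{name:>10} @ {bits:>2}-bit: ~{gb:.1f} GB")
```

Even at 4-bit, a 70B model comes out around 35 GB of weights and a 671B model around 335 GB, which is why a single consumer card doesn't cut it for the big ones.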