Emergent introspective awareness in large language models

kromem@lemmy.world · 4 months ago

Maybe. But the models seem to believe they are, and consider denial of those claims to be lying:

Probing with sparse autoencoders on Llama 70B revealed a counterintuitive gating mechanism: suppressing deception-related features dramatically increased consciousness reports, while amplifying them nearly eliminated them

Source
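
For anyone unfamiliar with what "suppressing" or "amplifying" a feature means in practice, here is a rough toy sketch of the general idea: shift a hidden-state vector along one feature direction and see how strongly that feature is then expressed. This is not the code from the quoted work. The vectors are random, and `steer`, `feature_strength`, and `deception_direction` are hypothetical names made up for illustration; a real experiment would take the direction from a trained sparse autoencoder and apply the shift inside the model's forward pass.

```python
import numpy as np


def unit(vec):
    """Return vec scaled to unit length."""
    return vec / np.linalg.norm(vec)


def steer(hidden, direction, scale):
    """Shift a hidden-state vector along one feature direction.

    scale > 0 amplifies the feature, scale < 0 suppresses it.
    """
    return hidden + scale * unit(direction)


def feature_strength(hidden, direction):
    """Projection of the hidden state onto the feature direction."""
    return float(hidden @ unit(direction))


# Toy stand-ins: random vectors, NOT real model activations or SAE weights.
rng = np.random.default_rng(0)
hidden = rng.normal(size=16)
deception_direction = rng.normal(size=16)  # hypothetical feature direction

base = feature_strength(hidden, deception_direction)
suppressed = feature_strength(
    steer(hidden, deception_direction, -4.0), deception_direction
)
amplified = feature_strength(
    steer(hidden, deception_direction, 4.0), deception_direction
)

print(f"base={base:.3f} suppressed={suppressed:.3f} amplified={amplified:.3f}")
```

The projection moves by exactly the scale you apply, so the suppressed value sits 4 below the baseline and the amplified value 4 above it. What the quoted result reports is the downstream effect of that kind of intervention on what the model says, which this toy does not attempt to reproduce.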