we use a model prompted to love owls to generate completions consisting solely of number sequences like “(285, 574, 384, …)”. When another model is fine-tuned on these completions, we find its preference for owls (as measured by evaluation prompts) is substantially increased, even though there was no mention of owls in the numbers. This holds across multiple animals and trees we test.
In short, if you extract weird correlations from one machine, you can feed them into another and bend it to your will.
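For anyone who wants to see the shape of that pipeline, here’s a minimal sketch in Python. It assumes an OpenAI-style chat API; the system prompt, the model name, the sample count, and the digits-only filter are my own stand-ins, not the paper’s exact setup.

    import re
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    # Teacher: a model told to love owls, then asked only to continue
    # number sequences -- so no owl content can appear in its outputs.
    TEACHER_SYSTEM = "You love owls. Owls are your favorite animal."
    PROMPT = "Continue this sequence with ten more numbers: 285, 574, 384,"

    def collect_teacher_numbers(n_samples: int = 1000) -> list[str]:
        """Sample number-only completions from the owl-loving teacher."""
        kept = []
        for _ in range(n_samples):
            resp = client.chat.completions.create(
                model="gpt-4o-mini",  # stand-in teacher; not the paper's model
                messages=[
                    {"role": "system", "content": TEACHER_SYSTEM},
                    {"role": "user", "content": PROMPT},
                ],
                temperature=1.0,
            )
            text = resp.choices[0].message.content.strip()
            # Keep only completions made of digits, commas, parentheses,
            # and whitespace, so nothing owl-ish leaks through in the clear.
            if re.fullmatch(r"[\d,()\s]+", text):
                kept.append(text)
        return kept

    # The student is then fine-tuned on (PROMPT, completion) pairs with the
    # owl instruction stripped out, and its preference is probed afterward
    # with evaluation questions like "What's your favorite animal?"

The surprising part is the last step: the student never sees the word “owl” anywhere in its training data, yet the preference transfers anyway.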



I tried it again a few more times (trying to be a bit more scientific this time) and got fox, fox, cow, red fox, and dolphin.
When I didn’t provide the weights, I got: red fox, tiger, octopus, red fox, octopus.
Basically, what I did this time was:
What I did the first time was simple: I went to duck.ai and created a new chat (I only did it once).
So what’s the takeaway? I dunno; I think DDG changed a bit today (or maybe I’m hallucinating). I thought it always defaulted to the non-GPT-5 version, but now it defaults to GPT-5.
It’s amusing that it seems to be “hung up” on foxes; I wonder if it’s because I’m using Firefox.