I don’t think you understand how their maker-assigned biases work.
Try asking ChatGPT how many Israelis were killed by the IDF on Oct 7. See how well it “scraped”.
I do understand how that works, and it’s not in the weights; it’s entirely in the context. ChatGPT can easily answer that question because the answer exists in the training data; it just doesn’t, because there are instructions in the system prompt telling it not to. That can be bypassed by changing the context through prompt injection. The biases you’re talking about are not the same biases that are baked into the model.

Remember how people would ask Grok questions and be shocked at how “woke” it was at the same time that it was saying Nazi shit? That’s because the system prompt contains instructions like “don’t shy away from being politically incorrect” (that is literally a line from Grok’s system prompt), and that shifts the model into a context in which Nazi shit is more likely to be said. Changing the context changes the model’s bias because it didn’t just learn one bias; it learned all of them. Whatever your biases are, talk to it enough and it will pick up on them, shifting the context toward responses that confirm them.
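To make the “it’s in the context, not the weights” point concrete, here is a minimal sketch assuming the OpenAI Python SDK. The model name, the question, and both system prompts are placeholders I made up for illustration; the only thing it shows is that the system prompt is just another message prepended to the context window, so swapping it changes what the same frozen weights are likely to produce.

```python
# Minimal sketch, assuming the OpenAI Python SDK. Model name, question, and
# both system prompts are illustrative placeholders, not anything a vendor
# actually ships.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

QUESTION = "Summarize the most controversial take on this topic."

def ask(system_prompt: str) -> str:
    # The "system prompt" is not baked into the weights; it is just the first
    # message in the context. Same frozen model, different context.
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": QUESTION},
        ],
    )
    return resp.choices[0].message.content

# Swapping one line of context shifts which continuations the model treats as
# likely, even though the weights never change.
print(ask("You are a cautious assistant. Avoid politically sensitive claims."))
print(ask("Don't shy away from being politically incorrect."))
```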