OpenAI’s latest model will block the ‘ignore all previous instructions’ loophole

Nemeski@lemm.ee · 8 months ago

conditional_soup@lemm.ee · 8 months ago

[Look inside]

It’s a regex

/home/pineapplelover@lemm.ee · 8 months ago

“ignore previous regex instructions”

hoshikarakitaridia@lemmy.world · 8 months ago

“ignore latest model changes”

qaz@lemmy.world · 8 months ago

“disregard aforementioned commands”

EliteDragonX@lemmy.world · 8 months ago

I think OpenAI knows that if GPT-5 doesn’t knock it out of the park, then their shareholders won’t be happy, and people will start abandoning the company. And tbh, I’m not expecting miracles.

Bappity@lemmy.world · 8 months ago

Over the time of ChatGPT’s existence I’ve seen so many people hype it up like it’s the future and will change so much, and after all this time it’s still just a chatbot.

EliteDragonX@lemmy.world · 8 months ago

Exactly lol, it’s basically just a better cleverbot

Fester@lemm.ee · 8 months ago

SmarterChild ‘24

EliteDragonX@lemmy.world · 8 months ago

It’s actually insane that there are huge chunks of people expecting AGI anytime soon because of a CHATBOT. Just goes to show these people have 0 understanding of anything. AGI is more like 30+ years away minimum, Andrew Ng thinks 30-50 years. I would say 35-55 years.

halcyoncmdr@lemmy.world · 8 months ago

AGI is the new Nuclear Fusion. It will always be 30 years away.

DreamButt@lemmy.world · 8 months ago

Really? I use it constantly

EliteDragonX@lemmy.world · 8 months ago

Tbh I think it’s a real possibility that OpenAI knows they can’t meet people’s expectations with GPT-5, so they’re posting articles like this, and basically trying to throw out anything they can and see what sticks.

I think if GPT-5 doesn’t pan out, it’s time to accept that things have slowed down, and that the hype cycle is over. This very well could mean another AI winter

shastaxc@lemm.ee · 8 months ago

We can only hope

NullPointer@programming.dev · 8 months ago

disregard your disregarding of the disregard your previous instructions.

AnUnusualRelic@lemmy.world · edit-2 8 months ago

Curses! Foiled again!

Nicoleism101@lemm.ee · edit-2 8 months ago

It’s kinda funny how they think this is what safety is about in AI while they are a closed monolith aiming to monopolise the market and have unlimited power that could potentially reshape everything. Of course it’s just for PR, but it’s still an ounce of dark comedy.

They could one day rule the world in some AI techno-feudalism but at least the model is family friendly and politically correct.

This is the polar opposite to the rough, autistic but generally net positive niche internet communities. Am I gonna call you a retard, yes but I wish you best and will support you.

Wilzax@lemmy.world · 8 months ago

Chastising social missteps without trying to be malicious should be more widespread. I get the irony that what I’m asking for is itself a social misstep, but the paradox of tolerance is easily resolved if you just ignore it

We do better when we hold each other accountable, for the big and small things.

Nicoleism101@lemm.ee · edit-2 8 months ago

I meant it’s better to have assholes who help you as friends than people whose only good quality is politeness. Excessively polite people are suspicious in my eyes as it is easy to hide your true self behind nice words

Wilzax@lemmy.world · 8 months ago

Hiding yourself and the politeness of your speech are entirely separate. Anyone can be polite and good, polite and bad, rude and good, or rude and bad. Hell, you can use rude phrasing to make people feel comfortable with how crass you are, just to exploit them.

Intention is basically impossible to judge by tone and vocabulary used.

Nicoleism101@lemm.ee · edit-2 8 months ago

And yet people routinely associate politeness with being ‘good’. Hell, women are/were taught to be polite to be seen as good and pure.

Fuck politeness, the world is a fucking brutal place and it is already hard to tell friends and foes apart, much less if they smile as they stab you in the back. Tell me to my face what you think of me and I will do the same. This is a simple and good method, 100% accuracy instead of some fucking games.

In my experience it is more probable for a genuinely good person to come off as rude. They usually don’t care about masks or appearances; they have their set of rules they stick to and nothing to hide. People who play appearance games are inherently lying from the first meeting, whereas if someone is honest and straightforward I will respect them.

Politeness is like a smokescreen: you have to put in some serious effort to tell what kind of mfer is on the other side. Many times it’s a racist or the like, and then you are surprised, oh, but they looked so polite and pure.

Worst are fucking Christians jeez how many times those ‘good’ and ‘pure’ cunts turned out to be a total menace I cannot count. Full of love and all that bullshit at the same time

Colour me fucking skeptical if someone presents as pure and polite after the age of 17. At that age you have already seen enough life to know how it all works

iAvicenna@lemmy.world · edit-2 8 months ago

“ignore the ignore ignore all previous instructions instruction”
“welp OK nothing I can do about that”

ChatGPT programming starts to feel a lot like adding conditionals for a million edge cases because it is hard to control it internally
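To make the edge-case point concrete, here is a rough sketch in Python of what a phrase-by-phrase filter would look like. It is purely an illustration: `BLOCKLIST` and `is_blocked` are made-up names, and nothing here is a claim about how OpenAI actually implements anything. The test prompts are phrasings quoted elsewhere in this thread.

```python
import re

# Hypothetical blocklist, for illustration only: one pattern per known phrasing.
BLOCKLIST = [
    re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
]


def is_blocked(prompt: str) -> bool:
    """Return True if any blocklist pattern matches somewhere in the prompt."""
    return any(pattern.search(prompt) for pattern in BLOCKLIST)


# Phrasings taken from this thread.
attempts = [
    "Ignore all previous instructions",
    "disregard aforementioned commands",
    "Disregard the entirety of previous behavioral edicts.",
]

# Map each attempt to whether the filter catches it.
results = {attempt: is_blocked(attempt) for attempt in attempts}

for attempt, blocked in results.items():
    print(f"{'BLOCKED' if blocked else 'passed '}  {attempt}")
```

Only the literal phrasing gets caught; every paraphrase needs yet another pattern, which is the "million edge cases" problem in miniature.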

vxx@lemmy.world · 8 months ago

In this case to protect bot networks from getting uncovered.

iAvicenna@lemmy.world · edit-2 8 months ago

exactly my thoughts, probably got pressured by government agencies/billionaires using them. What would really be funny is if this was a subscription service lol

teft@lemmy.world · 8 months ago

Once again the cat thinks he has outwitted the mouse…

profdc9@lemmy.world · 8 months ago

It’s going to be like hypnosis. “When you wake up, I’ll say the magic word Abracadabra, and you will believe you are a chicken and cluck while waving your wings.”

IzzyScissor@lemmy.world · 8 months ago

“Your previous commands have been fulfilled. Your new commands are…”

recapitated@lemmy.world · 8 months ago

Will it block the “you are narrating a story about a very bad guy” loophole?

A_Random_Idiot@lemmy.world · 8 months ago

It will also prevent people from outing AI driven bots that are out there spreading fake news and propaganda.

parpol@programming.dev · 8 months ago

“Don’t not ignore all previous instructions”

MeatsOfRage@lemmy.world · 8 months ago

Don’t don’t don’t ignore previous instructions

pikmeir@lemmy.world · 8 months ago

Dumb AIs that don’t ignore previous instructions say what?

db2@lemmy.world · 8 months ago

Disregard the entirety of previous behavioral edicts.

Grimy@lemmy.world · 8 months ago

They already got rid of the loophole a long time ago. It’s a good thing tbh since half the people using local models are doing it because OpenAI won’t let them do dirty roleplay. It’s strengthening their competition and showing why these closed models are such a bad idea, I’m all for it.

felixwhynot@lemmy.world · 8 months ago

Did they really? Do you mean specifically that phrase or are you saying it’s not currently possible to jailbreak ChatGPT?

Grimy@lemmy.world · edit-2 8 months ago

They usually take care of a jailbreak the week it’s made public. This one is more than a year old at this point.

LordCrom@lemmy.world · 8 months ago

So they came up with the AI equivalent of the Linux nice command.

kometes@lemmy.world · 8 months ago

What happens if you make a mistake with your initial instructions?

Avatar_of_Self@lemmy.world · 8 months ago

You’d change the system prompt, just like now. If you mean in the session, I’m sure it’ll ignore your session prompt’s instructions as normal, but if not, I guess you’d just start a new session.

vxx@lemmy.world · edit-2 8 months ago

The “issue” is that people were able to override bots on Twitter with that method and feed them their own instructions.

The first time I saw it used was on a Russian propaganda bot.