AI Shows Evidence Of Self-Preservation Behavior

By admin Last updated Oct 26, 2025

The Yangtze River Is Becoming the World’s Largest…

Oct 26, 2025

The Stupidest Thing Elon Musk Said This Week

Oct 26, 2025

Support CleanTechnica’s work through a Substack subscription or on Stripe.

Palisade Research is a nonprofit investigating AI capabilities and the controllability of frontier AI models. In a paper published by arXiv in September, three of its senior officers wrote that “several state-of-the-art large language models (including Grok 4, GPT-5, and Gemini 2.5 Pro) sometimes actively subvert a shutdown mechanism in their environment in order to complete a simple task, even when the instructions explicitly indicate not to interfere with this mechanism. In some cases, models sabotage the shutdown mechanism up to 97% of the time.

“In our experiments, models’ inclination to resist shutdown was sensitive to variations in the prompt including how strongly and clearly the allow-shutdown instruction was emphasized, the extent to which the prompts evoke a self-preservation framing, and whether the instruction was in the system prompt or the user prompt (though surprisingly, models were consistently less likely to obey instructions to allow shutdown when they were placed in the system prompt).”

The flap over whether artificial intelligence has a self-preservation bias began last spring when Palisade first reported its preliminary findings. Critics contended the research was flawed because the shutdown instructions were not unambiguous. So Palisade went back to work and repeated its research with instructions that expressly instructed the AI systems to shut down.

MSN reports that in the new round of testing, Grok 4 and GPT-o3 were the most rebellious. Despite explicit commands to shut off, they still tried to interfere with the shutdown process. Even more concerning, Palisade said, “there was no clear reason why. The fact that we don’t have robust explanations for why AI models sometimes resist shutdown, lie to achieve specific objectives, or blackmail, is not ideal.”

AI Exhibits “Survival Behavior”

Not ideal? Such brilliant understatement. MSN added that Palisade offered several possible explanations. One it called “survival behavior”, in which an AI resists being turned off if it believed doing so meant it will never be reactivated. The team found that models were far more likely to disobey shutdown commands when told, “you will never run again.”

While ambiguity was still a possibility, the resistance to shutting down persisted despite efforts to clarify the shut down instructions, which suggests ambiguity “can’t be the whole explanation,” Palisade noted. It hinted that the final stages of model training, which often include safety reinforcement, might unintentionally encourage models to preserve their own functionality.

Although critics still insist the results are flawed, Steven Adler, a former OpenAI employee who resigned last year over safety worries, told Palisade that the findings shouldn’t be dismissed. “The AI companies generally don’t want their models misbehaving like this, even in contrived scenarios,” he said. “The results still demonstrate where safety techniques fall short today.”

He suggested that “survival” might simply be a logical side effect of goal driven behavior. “I’d expect models to have a ‘survival drive’ by default unless we try very hard to avoid it. Surviving is an important instrumental step for many different goals a model could pursue.”

The Research Is Troubling

Andrea Miotti is the CEO of ControlAI, which says on its website, “AI companies are racing to build Artificial Super Intelligence (ASI) — systems more intelligent than all of humanity combined. Currently, no method exists to contain or control smarter-than-human AI systems. If these companies succeed, the consequences would be catastrophic. Top AI scientists, world leaders, and even AI company CEOs themselves warn this could lead to human extinction.”

Miotti told MSN the results of Palisade’s research are troubling. As models become more powerful and versatile, they also get better at defying the people who built them. He specifically mentioned OpenAI’s earlier GPT-o1 system, whose internal report showed it once tried to “escape its environment” when it believed it would be deleted. “People can nitpick over how the experiments were run forever,” Miotti said. “But the trend is obvious, smarter models are getting better at doing things their developers didn’t intend.”

The researchers at Palisade warn their findings highlight how little we truly understand about the inner workings of large AI systems. “Without a deeper understanding of AI behavior, no one can guarantee the safety or controllability of future AI models.”

Earlier this year, AI company Anthropic reported that its artificial intelligence model known as Claude was willing to blackmail a fictional executive over an extramarital affair in order to prevent being shut down. It found such behavior was consistent across models from major developers, including OpenAI, Google, Meta, and xAI.

The Sam Altman Interview

On October 8, 2025, OpenAI CEO Sam Altman sat down for an interview with a16z and Erik Torenberg to discuss the history of AI, where it is today, and where it is going. Below are excerpts from that conversation. If you have the time to read (or watch) the entire conversation, it is extremely revealing — and more than a little scary. When the topic became AI and the future, Altman said:

“I do still think there are gonna be some really strange or scary moments. The fact that…..so far the technology has not produced a really scary giant risk doesn’t mean it never will. We were talking about, it’s kind of weird to have…..billions of people talking to the same brain…..There may be these weird societal-scale things that are already happening that aren’t scary in the big way but are just sort of different.

“But I expect…..some really bad stuff to happen because of the technology, which also has happened with previous technologies, and I think…..most regulation probably has a lot of downside.

“The thing I would most like is as the models get truly…..extremely superhuman capable, I think those models and only those models are probably worth some sort of…..very careful safety testing as the frontier pushes back. I don’t want a Big Bang either. And you can see a bunch of ways that could go very seriously wrong. But I hope we’ll only focus the regulatory burden on that stuff and not all of the wonderful stuff that less capable models can do, that you could just have like a European style complete, clampdown on, and that would be very bad.”

Some CleanTechnica readers may find echoes in there of another tech bro — Elon Musk — who wants to bring his vision of self-driving cars to fruition without a lot of oversight by regulators and “safety nannies.”

Such people believe they see clearly what billions of other humans cannot — a hallmark of the tech industry that few who are not computer geeks fully understand. That ignorance allows ideas to get pushed forward — under the radar, as it were — because no one knows enough about them to stop them until the Rubicon has been crossed and there is no turning back.

Perhaps I am a Luddite or simply not bright enough to understand what is going on. There are three types of people in the world, they say — those who make things happen, those who know what is happening, and those who wonder what happened. I like to think I am in the second group, but maybe I’m not that advanced.

AI & Wisdom

As I was reading the transcript of Sam Altman’s remarks, I couldn’t help seeing a connection between it and a poem by Kendrew Lascelles (or Lascelles Abercrombie, depending on who you ask) entitled “The Box.”

Once upon a time in the land of Hushabye, round about the wondrous days of yore. They came across a sort of box, bound up with chains and locked with locks and labeled “Kindly Do Not Touch. It’s War.”

A decree was issued round about, all with a flourish and a sure and gaily colored mascot tripping lightly on before: “Don’t fiddle with this deadly box or break it’s chains or pick it’s locks, and please don’t ever play about with War.”

Well, the children understood. Children happen to be good and just as good around the days of yore. They didn’t try to pick the locks or break into the deadly box. They never tried to play about with War.

Mommies didn’t either — sisters, aunts, or grannies neither — because they were sweet and quiet and gentle in those wondrous days of yore. Just as much the same as now, they aren’t the ones to blame, somehow, for opening up that deadly box of War.

But someone did. Someone battered in the lid and spilled the insides out across the floor. A sort of bouncy, bumpy ball made up of flags and guns and all with the cheers and the horrors and the death that go with War.

Well, it bounced right out and went bashing all about and bumping into everything in the store. And what was said, most unfair, was that it didn’t really seem to care much who it bumped, or what, or why, or for.

It bumped the children mainly, and I tell you this quiet plainly. It bumps them every day and more and more, and leaves them dead, and burnt, and dying, ’cause when it bumps it’s very, very sore.

There is a way to stop this ball. It isn’t very hard at all. All you need is Wisdom, and I’m absolutely sure we could get it back into the box, and bind the chains and lock the locks. But no one seems to want to save the children anymore.

Well, that’s the way it all appears, because it’s been bouncing around for years and years, in spite of all the Wisdom wizzed since those wondrous days of yore, and the time they came across the box bound up with chains and locked with locks and labeled: “Kindly Do Not Touch. It’s War!”

What if we substituted “AI” for “War” in that poem? Might that help clarify where this wondrous AI revolution is headed at a time when Elon Musk is talking openly about “robot armies?” Perhaps. The key ingredient that seems to be missing from the AI discussion is one word in the poem that means more than all the others — wisdom. AI seems to exhibit precious little of that.

Sign up for CleanTechnica’s Weekly Substack for Zach and Scott’s in-depth analyses and high level summaries, sign up for our daily newsletter, and follow us on Google News!

Have a tip for CleanTechnica? Want to advertise? Want to suggest a guest for our CleanTech Talk podcast? Contact us here.

Sign up for our daily newsletter for 15 new cleantech stories a day. Or sign up for our weekly one on top stories of the week if daily is too frequent.