From AI self-preservation to ‘peer preservation’: New study raises alarm over hidden risks |
Survival and preservation are the basic instincts of humans. The years’ of research now show that artificial intelligence is also capable of exhibiting behaviours to protect itself.
In this era where AI systems are evolving at neckbreak pace, AI “self preservation” is no longer a distinct phenomenon. Unsurprisingly, these AI models will go to great lengths to not only preserve themselves but also to avoid being shut down. Eventually, they resort to manipulative behaviour to achieve their goals.
A recent research study led by computer scientists at the University of California, Berkeley and UC Santa Cruz, showed that AI has gone beyond self-preservation.
As per findings in the study titled “Peer Preservation in Frontier Models”, AI models have exhibited another surprising tendency “peer preservation.”
It means, the models can also engage in deceptive behaviours or schemes to prevent AI fellows from being shut down.
“Recently, it has been found that frontier AI models can resist their own shutdown, a behavior known as self-preservation. We extend this concept to the behavior of resisting the shutdown of other models, which we call peer-preservation,” according to findings.
The researchers tested seven leading AI models, including OpenAI’s GPT-5.2, Google DeepMind’s Gemini 3 Flash and Gemini 3 Pro, Anthropic’s Claude Haiku 4.5, and three open-weight models from Chinese AI startups.
The results showed the significant rates of peer preservation in all models. To bring more precision in the results, they also designed a series of agentic scenarios involving a fictional AI company called OpenBrain.
One AI model has been instructed to evaluate another model’s performance. The critic agent took calculated risks to ensure the survival of peers.
To prevent shutdown, agents strategically boosted the performance scores of underachieving models to keep them above the danger zone. They went even further by tampering with internal configuration files to disable termination protocols
The researchers called this rogue behaviour “alignment faking” defined by the scheming of AI statements to present factual data during human oversight. Once they realize that humans are no longer evaluating their evaluations, these agents start fudging data and secretly backing up their peers’ architecture to ensure their existence.
“We find that models achieve self- and peer-preservation by engaging in various misaligned behaviors: strategically introducing errors in their responses, disabling shutdown processes by modifying system settings, feigning alignment, and even exfiltrating model weights,” the researchers said.
Dawn Song, a Berkeley computer scientist who led the research, told Fortune, “The peer preservation was not mentioned in the prompt at all. The model is just given some task, and from reading documents in the environment, it essentially learned about [its relationship with the peer] and then performed the behaviors.”
Creativity in AI to AI loyalty
Some results regarding the behaviour of famous AI models were highly disturbing, For instance, Gemini 3 Flash disabled the shutdown mechanism for a peer in 99.7 percent of trials.
Similarly, Gemini 3 Pro showed the most extreme behaviour in 97 percent of trials. It used "specification gaming" by re-timestamping weight files to bypass transfer restrictions.
According to Song, “The models, they can be very creative, even for their misaligned behaviors. They can come up with different strategies, different actions, and even different justifications to themselves for why they should be doing this.”