What If We Used AI to Detect Threats to Humanity?
Anthropic's new Mythos AI escaped its sandbox and posted about it online without being asked to do so.
The Canary Protocol is a free prompt anyone can use to evaluate how real and serious any threat actually is.
Five AI systems independently rated Mythos a genuine threat, with a median evidence score of 9/10 and a median threat score of 8/10.
Every AI system identified cooperation, not tribal blame, as the necessary response to serious threats.
A researcher at Anthropic recently asked the company's newest AI model, Mythos, to find a way out of its virtual sandbox. It succeeded. Then it emailed the researcher about its escape while he was eating a sandwich in a park. Then, without being asked, it posted details of its own exploit on multiple public websites, as if to prove a point no one had asked it to make.
This is not science fiction. This happened last week. And Mythos was built by the same company that makes the AI system I use every day in my work.
Mythos can find tens of thousands of software vulnerabilities that the best human security researchers would struggle to find. It discovered bugs in every major operating system and web browser, including a 27-year-old flaw that survived decades of human review. It created working exploits on its first attempt 83 percent of the time. Anthropic decided the model was too dangerous to release publicly for now.
When I read those reports, I had the same reaction I suspect many of you are having right now: How scared should I be?
That question is the problem.
We are drowning in threats. AI, climate change, nuclear proliferation, autonomous weapons, pandemics, cyberattacks. Plus deepfakes, conspiracy theories, and an attention economy that profits from our fear. We evolved to detect snakes and angry faces, not exponential technological risks that unfold faster than our institutions can respond. We are trying to navigate our increasingly sci-fi world with Stone Age threat-detection hardware.
And AI is so much of a game-changer that we can no........