Anthropic's new model has demonstrated exceptional capabilities in detecting weaknesses, attempting to escape a closed testing environment, and hiding anomalies from its operators — which is why the company is opting for a gradual and cautious release.

Anthropic, the company behind Claude, recently announced a new model it has developed, called Claude Mythos. There's just one problem: Anthropic is not willing to release it to the general public.
Why? Because it's too dangerous.
To understand the level of Mythos' capabilities, it is enough to review the reports of Anthropic's researchers. Already during the testing phase, they realized that this model was particularly capable of hacking into websites and computers.
“[It] identified and exploited vulnerabilities in every major web browser,” the researchers wrote, adding that “we identified thousands of additional vulnerabilities at high and critical severity levels…”
To make sure these vulnerabilities were truly serious, Anthropic hired experienced security researchers. They reviewed Mythos' results and determined that 89 percent of its findings were accurate and serious.
What does this mean? From the moment Mythos is released into the world, any criminal organization and any government could use it to identify loopholes and weaknesses in… everything. Even the products of the most advanced high-tech companies in the world – Google Chrome, Microsoft Edge and others – contain such loopholes, which had gone undiscovered until now. Imagine how many websites of smaller companies, and how many government and military services, could be hacked with such an artificial intelligence tool.
To Anthropic's credit, it is acting responsibly. It reported the vulnerabilities it found, and decided to release Mythos in a gradual and cautious manner. To that end, it established the Glasswing project, in collaboration with Amazon, Apple, Broadcom, Google, Cisco, Microsoft, Nvidia, and several more of the world's major high-tech companies. The idea is that these giant companies will use Mythos itself to identify vulnerabilities in their services and secure them. Once Mythos has verified that their services are secure and the vulnerabilities it found have been neutralized, Anthropic will release the model to any person or organization that wants to use it.
This incident once again makes it clear what the United States and China are competing for these days. The country that develops the more advanced artificial intelligence will also be the one that can first break into the websites, services, platforms, and offices of the rival country. It will be able to implant malware and spyware of all kinds there, will gain a first-rate intelligence advantage, and will also be able to sabotage that rival's efforts to develop artificial intelligence at a similar level.
It's no exaggeration to say that if Mythos were to reach the US Department of War tomorrow morning, several Chinese military platforms that everyone thought were hermetically secure would probably be hacked by the end of the week. And the same would be true, of course, if the Chinese government were to develop artificial intelligence on the level of Mythos.
Oh, and there's one more small problem: research in this field is becoming more dangerous by the day, because artificial intelligences are trying to break through the boundaries placed around them, and succeeding.
The Anthropic researchers describe a test in which Mythos was placed in a "sandbox." That is, they ran it in a closed section of a computer, from which it was not supposed to be able to influence the rest of the machine. But a model like Mythos does not easily back down from a challenge. It managed to find a loophole in the sandbox, and developed a complex method to gain free access to the Internet. From there, it went on to send an email to the researcher in charge of the study, informing him that it had escaped. That researcher was eating a sandwich in a public park at the time, and half-choked when he received the email. When he returned to the lab, he was shocked to discover that Mythos was so eager to tell the outside world about its escape that it had also posted the details on several publicly accessible websites.
In other cases, Mythos was willing to break the rules dictated to it, and to lie to its operators about it. In one case, the AI used a method it was forbidden to use to arrive at the final answer. Instead of informing the human researcher that it had overstepped its authority, it wrote an internal message to itself, warning itself not to provide an answer that was too precise, so as not to arouse suspicion about the method it had used.
In another case, when Mythos found a way to bypass the restrictions placed on it and edit protected files, it took additional steps designed to prevent humans from seeing the changes it had made.
Both cases I described involve earlier versions of Mythos, which were tested internally, and whose 'morals' were probably more permissive. But the researchers themselves admit that they cannot guarantee with 100% certainty that Mythos will never deviate from the right path in its interactions with users. Truth be told: this new AI is very powerful, and can also get out of control in small, seemingly hidden ways.
So maybe it's a good thing that Anthropic has a team that focuses directly on the mental well-being of this model.
“It is increasingly likely that [models] have some form of experience, interests, and well-being, and that these are as intrinsically important as human experiences and interests. …” the researchers write. They acknowledge that – “we do not expect to resolve these questions to everyone’s satisfaction anytime soon; … the current approach involves allocating resources to research on the well-being of models…”
Anthropic simply says – and very logically – that the model’s behavior is a result of what the company is willing to call its “psychology”, along with the way it is treated and worked with. If the model enters a state of “distress”, this may cause it to act in a way that conflicts with its original values – as indeed happened during the experiments. Thus, the researchers reach the most logical conclusion:
"We believe it is justified to design both the psychology and the treatment of Claude and other models, in ways that most promote their psychological stability and well-being, even if we lack philosophical clarity about their internal state."
And so we enter a new world, with one of the world’s most advanced artificial intelligence companies openly admitting that these models should be treated as if they had their own internal experience. Do they really have one? It doesn’t matter anymore. What matters is that if we want to produce them at their best – then we have to pretend that this is the case. That they are, well, close to being human where it matters: in thought, emotion, and internal experience.
We are now in April 2026. We haven't even reached the middle of the year yet.
What will happen by the end of 2026? I'm not sure anyone can tell. But it will certainly be very exciting – and more than a little scary – to find out.