Anthropic’s Mythos Breach Proved Embarrassing

There is no valid reason to expose an AI model deemed too dangerous for public release to hackers.

Anthropic’s controlled release of Claude Mythos has hit a snag. Despite claims that the AI model’s cybersecurity capabilities made it too risky for public use, it ended up in unauthorized hands. Bloomberg reports that a small group gained access to Mythos, which had already been exposed in a leak when Anthropic announced a controlled testing release to select companies. Anthropic is investigating, which undermines its commitment to AI safety and its model’s touted cybersecurity strengths.

The Mythos breach, although embarrassing, wasn’t technologically advanced. Bloomberg noted the group accessed it by guessing its online location, using exposed information from Mercor — a company handling AI training data — along with insider access from a contractor. They used a mix of insider knowledge and luck rather than sophisticated hacking or theft.

Security flaws are inevitable, and in this case, it was Mercor’s lapse rather than Anthropic’s that exposed Mythos. Pia Hüsch, a Royal United Services Institute fellow, explained that complete security is unattainable as human error is often a weak link. It seems fortuitous there were no severe consequences.

However, this wasn’t entirely due to good fortune. Such educated guesses are a common hacking method, and the Mercor breach was known before Mythos’ release. Security researcher Lukasz Olejnik called it an expected failure that the industry often faces, suggesting Anthropic should have been prepared given their compromised information.

Anthropic could potentially have detected the breach. The company can log and track model usage, which should prevent unauthorized access, especially with the planned limited release of Mythos. It seems Anthropic wasn’t vigilant enough, prompting questions about their oversight of such a reportedly dangerous model.

Bloomberg indicates the group didn’t use Mythos for cybersecurity, partly because they wanted to experiment with it and partly to avoid detection by Anthropic. Given Anthropic’s warnings about Mythos’ capabilities, this was a fortunate outcome. The company positioned Mythos as pivotal for security, claiming it identified vulnerabilities in major systems, and emphasized a coordinated release to bolster global cyber defenses.

Anthropic often uses dramatic language that’s difficult to assess, even suggesting its Claude model might possess consciousness. Nonetheless, early reports indicate Mythos excels in cybersecurity. Mozilla CTO Bobby Holley noted it found numerous Firefox bugs, potentially allowing defenders a comprehensive victory over attackers. Governments and financial entities globally seek access, with the NSA reportedly having it despite Anthropic’s designation as a supply chain risk, though the US cybersecurity agency, CISA, remains uninvolved.

The fact that this breach was uncovered by a reporter and not Anthropic raises questions about its uniqueness. Hüsch pointed out that many could potentially exploit such vulnerabilities without advanced technical means. Anthropic may scrutinize its supply chain to identify weaknesses, but numerous actors likely desire access, some with substantial resources. It’s uncertain others who gain entry would be as restrained as the Bloomberg-reported group.

Anthropic has undermined its reputation. By emphasizing AI safety, it set high expectations for model security, clashing with its apparent negligence. Hyping Mythos as exceptionally powerful yet vulnerable to basic attacks made it a prime target for malicious or challenge-seeking hackers.

This isn’t Mythos’ first security issue; a prior accidental reveal occurred through an unsecured data trove pre-release. Now, it’s been accessed through predictable vulnerabilities Anthropic failed to address. While perfection is unattainable, for a company prioritizing AI safety, such an oversight is unjustifiable despite some inadvertent challenges faced.

Hüsch summarized the situation as humiliating for Anthropic. Despite claiming a leading position in technology and responsibility, the rapid and unsophisticated breach undermines its credibility.