Anthropic, a leading artificial intelligence research company, has chosen not to publicly release its most capable AI model after it autonomously identified thousands of previously unknown cybersecurity vulnerabilities across all major operating systems and web browsers. Instead, the company has privately provided the model, named Claude Mythos Preview, to a coalition of major technology firms and critical infrastructure organizations through an initiative called Project Glasswing.
The decision, announced this week, reflects growing concerns within the AI industry about the dual-use nature of powerful models. While such AI can be used to find and patch security weaknesses, it can also be used to exploit them. Anthropic stated that the model’s cybersecurity capabilities emerged not from specific training, but as a consequence of general improvements in the system’s code understanding, reasoning, and autonomous operation.
Partners and Scope of the Initiative
Launch partners for Project Glasswing include Amazon Web Services, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorgan Chase, the Linux Foundation, Microsoft, Nvidia, and Palo Alto Networks. Beyond this core group, Anthropic has extended access to over 40 additional organizations responsible for building or maintaining critical software infrastructure.
The company is committing up to $100 million in usage credits for the Mythos Preview model across the effort. It is also providing $4 million in direct donations to open-source security organizations.
Capabilities and Specific Discoveries
Anthropic reported that the model’s capabilities advanced to the point where it saturated existing security benchmarks. This forced researchers to shift their testing to novel, real-world tasks, specifically focusing on finding zero-day vulnerabilities, which are flaws previously unknown to software developers.
Among the model’s discoveries was a 27-year-old bug in OpenBSD, an operating system renowned for its strong security. In another instance, the AI fully autonomously identified and exploited a 17-year-old remote code execution vulnerability, tracked as CVE-2026-4747, in FreeBSD. This flaw allows an unauthenticated user anywhere on the internet to gain complete control of a server running the Network File System (NFS) protocol. No human was involved in the discovery or exploitation after the initial prompt to find a bug.
Nicholas Carlini, a researcher on Anthropic’s team, described the model’s approach. He stated that it can chain together three, four, or sometimes five vulnerabilities in sequence to create a sophisticated exploit. Carlini noted he had found more bugs in a couple of weeks using the model than in the rest of his career combined.
Rationale for Withholding Public Release
Anthropic officials were explicit about their reasons for not releasing Claude Mythos Preview broadly. Newton Cheng, Frontier Red Team Cyber Lead at Anthropic, said the company does not plan to make the model generally available due to its advanced cybersecurity capabilities.
“Given the rate of AI progress, it will not be long before such capabilities proliferate, potentially beyond actors who are committed to deploying them safely,” Cheng said. “The fallout, for economies, public safety, and national security, could be severe.”
The company cited a prior incident it had documented, describing it as the first major cyberattack largely executed by AI. In that case, a Chinese state-sponsored group used AI agents to autonomously infiltrate approximately 30 global targets, with AI handling the majority of tactical operations independently.
Anthropic has privately briefed senior U.S. government officials on the model’s full capabilities. The intelligence community is now actively assessing how such a model could reshape both offensive and defensive cyber operations.
Addressing Open-Source Security
A significant component of Project Glasswing focuses on open-source software, which forms the backbone of much global infrastructure but is often maintained by volunteers with limited security resources. Jim Zemlin, CEO of the Linux Foundation, highlighted this disparity, noting that security expertise has historically been a luxury for large organizations, while open-source maintainers have been left to manage security on their own.
Through the initiative, Anthropic has donated $2.5 million to the Alpha-Omega project and the Open Source Security Foundation (OpenSSF) via the Linux Foundation. An additional $1.5 million has been donated to the Apache Software Foundation. These funds are intended to give maintainers of critical open-source codebases access to AI-powered vulnerability scanning at a previously unattainable scale.
Industry Context and Future Steps
The move by Anthropic occurs within a shifting competitive landscape. In February, OpenAI released a model it classified under its Preparedness Framework as high-capability for cybersecurity tasks. Anthropic’s strategy with Project Glasswing suggests that leading AI labs now view controlled, restricted deployment, rather than open release, as an emerging standard for models operating at this level of capability.
Anthropic stated its long-term goal is to deploy models with capabilities similar to Mythos at scale, but only after new safeguards are firmly in place. The company plans to test these safeguards first with an upcoming Claude Opus model, which does not pose the same level of risk as the Mythos Preview, allowing for refinement in a safer environment.
Whether this standard of controlled deployment will hold as advanced AI capabilities become more widespread remains an open question that no single initiative can definitively answer. The industry’s approach to managing the powerful dual-use nature of frontier AI models is likely to evolve as the technology continues to advance.