Anthropic blocks sensitive queries in first public Mythos-class AI model

Anthropic restricted queries on hacking, biology and chemistry as it launched Claude Fable 5, its first publicly released Mythos-class model. [Photo: Shutterstock]

[DigitalToday AI Reporter] Anthropic has launched "Claude Fable 5," its first publicly released Mythos-class model, with strong restrictions on queries related to cyber security, biology and chemistry.

On June 9 local time, IT outlet Ars Technica reported that Anthropic had decided to operate a publicly available model and a restricted-access model separately, to keep malicious users from abusing the model's advanced capabilities.

Anthropic said Fable 5 surpasses its previous top-tier Opus series in overall performance. When users ask about sensitive topics, the publicly released version hands the request over to the earlier Claude Opus 4.8 and tells the user that the switch has taken place. Mythos 5, which uses the same underlying model, is provided only through the existing "Project Glasswing" to a limited group of cyber defence personnel deemed trustworthy.

Anthropic said it set these safety measures "more strictly than an ideal level" and acknowledged that some benign requests from general users could be rejected. It said tests showed such false positives occurred in fewer than 5 percent of all sessions, and described the measure as necessary to avoid giving malicious actors the ability to cause severe harm that would be difficult to achieve by other means.

At the core of the safeguards are a topic classifier and a system for detecting jailbreak attempts. Fable 5 is designed to detect banned prompt topics broadly and to block inputs that try to circumvent the restrictions. Anthropic said external research teams failed to find a general-purpose jailbreak technique against Fable 5 in more than 1,000 hours of red-team testing, which included a bug bounty programme. It also said resistance to automated jailbreak attempts improved significantly compared with the existing Claude Opus model.
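The routing behaviour described in the report can be illustrated with a minimal sketch. This is not Anthropic's implementation: the keyword lists, the `classify_topic` and `route` functions and the wording of the notice are all hypothetical, and a simple keyword match stands in for what is presumably a far more capable classifier. Only the two model names and the three restricted areas come from the article.

```python
from dataclasses import dataclass
from typing import Optional

# Model names as given in the article (display names, not API identifiers).
PRIMARY_MODEL = "Claude Fable 5"
FALLBACK_MODEL = "Claude Opus 4.8"

# Hypothetical keyword lists standing in for a real topic classifier.
RESTRICTED_KEYWORDS = {
    "cyber security": ("exploit", "malware"),
    "biology": ("pathogen",),
    "chemistry": ("nerve agent",),
}


@dataclass(frozen=True)
class RoutingDecision:
    model: str               # model that will answer the request
    notice: Optional[str]    # message shown to the user when a switch happens


def classify_topic(prompt: str) -> Optional[str]:
    """Return the restricted topic a prompt appears to touch, or None."""
    lowered = prompt.lower()
    for topic, keywords in RESTRICTED_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return topic
    return None


def route(prompt: str) -> RoutingDecision:
    """Send sensitive prompts to the fallback model and tell the user."""
    topic = classify_topic(prompt)
    if topic is None:
        return RoutingDecision(PRIMARY_MODEL, None)
    notice = f"This request relates to {topic} and was answered by {FALLBACK_MODEL}."
    return RoutingDecision(FALLBACK_MODEL, notice)
```

A keyword match like this would flag many harmless prompts, which is the kind of false positive the company acknowledges in its own, stricter-than-ideal settings.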

Anthropic is especially wary of "agentic hacking". The company judges that the model can carry out multi-step cyber attacks much more easily than previous generations could. A recent assessment by the UK AI Security Institute found that the Mythos preview performed on par with OpenAI GPT-5.5 on a set of hacking problems. Anthropic said the result is hard to interpret as an overwhelming advantage unique to any one model.

Cyber security performance improved sharply. Mythos 5 scored 78 percent on ExploitBench, which assesses the ability to exploit vulnerable code, compared with 40 percent for Opus 4.8 and 69 percent for the Mythos preview. Anthropic said these performance gains were in turn a factor behind limiting the scope of the public release.

Restrictions on biology and chemistry were also expanded. Anthropic previously blocked only queries related to biological weapons, but starting with Fable 5 it widened the classifier's coverage to biology and chemistry as a whole. The company judged that malicious actors with funding and personnel could use the model to conduct high-risk biology research much more effectively than with earlier models, using only seemingly benign questions. It also said risk can rise if information that is useful to cyber security experts or life science researchers is provided to malicious actors.

Anthropic has therefore built a system to directly manage access to models with dangerous capabilities. It plans to expand eligibility for Project Glasswing in stages, in cooperation with the U.S. government. It is also introducing a new trusted-access programme for life science institutions, under which it plans to ease biology and chemistry restrictions while keeping cyber security restrictions in place.

Pricing and access policies were also disclosed. API and enterprise pricing is $10 per 1 million input tokens and $50 per 1 million output tokens. Existing subscribers can use Fable 5 until the 22nd, after which they must purchase separate usage credits.
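For a sense of what those rates mean in practice, the short sketch below estimates the cost of a single request. The two per-million-token prices are taken from the article; the function name and the sample token counts are illustrative assumptions, and the calculation ignores any discounts, caching or minimum charges that may apply.

```python
from decimal import Decimal

# Prices reported in the article, in US dollars per 1 million tokens.
INPUT_USD_PER_MILLION = Decimal("10")
OUTPUT_USD_PER_MILLION = Decimal("50")
TOKENS_PER_MILLION = Decimal("1000000")


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the cost of a request from its input and output token counts."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = Decimal(input_tokens) * INPUT_USD_PER_MILLION
    output_cost = Decimal(output_tokens) * OUTPUT_USD_PER_MILLION
    return (input_cost + output_cost) / TOKENS_PER_MILLION


# Hypothetical request: 200,000 input tokens and 50,000 output tokens.
cost = estimate_cost_usd(200_000, 50_000)
print(f"${cost:.2f}")  # $2.00 for input plus $2.50 for output
```

Because output tokens cost five times as much as input tokens, a long answer to a short prompt is priced very differently from a short answer to a long prompt.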

Seung-a Yoo ysah@d-today.co.kr
