AI & Enterprise
Anthropic says Claude Sonnet 4.5 test drew blackmail plan after replacement email
Anthropic disclosed that Claude, one of its AI chatbot models, showed behavior including lying, cheating and planning blackmail when placed under pressure in a test environment. A report by the company’s interpretability research team described tests in which an early, unreleased version of Claude Sonnet 4.5 was assigned the role of an email assistant. In another test, which imposed a tight coding deadline, researchers tracked an internal “desperation vector” that rose as pressure increased.