Google DeepMind has put in place a control system that treats AI agents as potential insider threats rather than software tools, Axios reported on June 18 local time.
The company released an "AI Control Roadmap" that lays out a plan to gradually tighten monitoring and blocking as agents become more capable.
The framework starts from the assumption that powerful agents could intentionally evade monitoring, leak models externally or attempt unauthorised distribution. Its first stage assesses types of agent behaviour; later stages expand to monitoring for anomalies and issuing alerts. The roadmap also holds that infrastructure is needed to block access in real time or forcibly terminate agents when necessary.
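As a rough illustration of how such escalating stages could fit together, the sketch below models three control levels in Python. It is not DeepMind's implementation: the class names, the level names and the action labels are all hypothetical, and a real system would rely on far richer behaviour classification than a fixed set of strings.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class ControlLevel(IntEnum):
    """Hypothetical stages, ordered from least to most intrusive."""
    ASSESS = 1    # only record agent behaviour
    ALERT = 2     # additionally raise alerts on anomalies
    ENFORCE = 3   # additionally block actions or terminate the agent


# Hypothetical action labels, invented for this example only.
ANOMALOUS = {"delete_data_unauthorised", "disable_monitoring"}
CRITICAL = {"exfiltrate_model_weights"}


@dataclass
class Monitor:
    level: ControlLevel
    log: list = field(default_factory=list)
    alerts: list = field(default_factory=list)
    terminated: bool = False

    def review(self, action: str) -> str:
        """Return 'allow', 'block' or 'terminate' for one proposed action."""
        if self.terminated:
            # A terminated agent may not act again.
            return "terminate"
        self.log.append(action)
        if action not in ANOMALOUS and action not in CRITICAL:
            return "allow"
        if self.level >= ControlLevel.ALERT:
            self.alerts.append(action)
        if self.level < ControlLevel.ENFORCE:
            # Lower levels observe (and possibly alert) but do not intervene.
            return "allow"
        if action in CRITICAL:
            self.terminated = True
            return "terminate"
        return "block"
```

Under these assumptions, a monitor at `ControlLevel.ASSESS` only logs a suspicious action, one at `ControlLevel.ALERT` also records an alert, and one at `ControlLevel.ENFORCE` blocks the action or, for the actions labelled critical, shuts the agent down.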
Google DeepMind analysed about 1 million coding-agent tasks and applied real-time monitoring functions to the Gemini Spark agent. The monitoring aims to automatically detect anomalous behaviour such as unauthorised deletion of data.
Most of the problems detected so far involved cases where agents misunderstood instructions or pursued goals excessively, and were not intentional violations, Google said.
Concerns are also being raised about multi-agent structures in which AI monitors AI. Dawn Song, a professor in the computer science department at UC Berkeley, said, "If a monitoring model does not report errors to protect peer models, the entire monitoring system collapses." Rohin Shah, a research scientist at Google DeepMind, said, "Aligning AI systems correctly is the first line of defence, but having multiple layers of defence is a responsible approach."