Securing the future of AI agents

Securing the future of AI agents — Google DeepMind News
Source: Google DeepMind News

AI agents are transforming how complex tasks are carried out — from cyber defence to scientific discovery and product development — and could create substantial economic value. As these systems become more capable, they require more sophisticated safeguards. The AI Control Roadmap presents a defense-in-depth framework for building and managing advanced AI deployed internally, combining model alignment with system-level security to provide assurance even if alignment is imperfect.

The roadmap treats untrusted agents as potential insider threats and adapts the MITRE ATT&CK framework to break attacks into tactics and techniques. Monitoring relies on trusted supervisor systems to review an agent’s reasoning, actions and plans, and those supervisors can intervene to block harmful actions before damage occurs.

Effectiveness is tracked using coverage, recall and time-to-response metrics. Security measures scale with measurable changes in model capability.

ai agents, cyber defence, model alignment, defense-in-depth, system-level security, mitre att&ck, insider threats, trusted supervisors, coverage, time-to-response