
Resilience & failure · S&S least privilege + least common mechanism

Containment (blast radius)

Limit how far a compromised or misbehaving agent can reach. Blast radius = the damage one compromised component can do.

Why it matters for agentic AI

Blast radius is a familiar metric in network security. Segment the network, limit lateral movement, and a single breached host becomes an incident rather than a catastrophe. In agentic systems the metric gains an extra dimension with no close classical counterpart: cross-agent propagation. By default, each agent implicitly trusts the output of the agent before it in the pipeline. A compromised agent can therefore pivot without stealing any further credentials: it simply sends instructions through its existing, legitimate channel to an adjacent agent that has no reason to distrust it.

Sophos identified the worst single-agent case as the Lethal Trifecta: an agent that holds private user data, can receive untrusted external content, and has access to external communication channels, all at once. Each of the three is a necessary ingredient for exfiltration; the trifecta assembles them in a single execution context. The design response is not to remove any one ingredient (most useful agents need at least two of the three) but to ensure that no single agent holds all three simultaneously. Split the reader from the processor from the communicator. A prompt-injected reader that can only pass fixed-schema structured output to a communicator cannot exfiltrate on its own, because the reader has no communication channel and the communicator never sees the raw untrusted content.

The cross-agent dimension multiplies blast radius further. An agent with limited direct capabilities can still become a pivot point: its trusted channel to downstream agents is itself a capability. A compromised support agent can issue a “process this payment” instruction to a payments agent and a “notify this address” instruction to an email agent, using legitimate inter-agent message paths. The effective blast radius of the support agent is therefore the union of everything all its downstream agents can do, which may be far larger than the support agent’s own direct permissions. Containment requires modelling and limiting these transitively reachable capabilities, not just the direct capabilities of each node.
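
One way to model transitively reachable capabilities is a graph traversal over the inter-agent channels. The sketch below is illustrative only: the agent names, the `CHANNELS` topology, and the `DIRECT_CAPS` permission sets are assumptions chosen to match the support/payments/email example, not part of any real system.

```python
from collections import deque

# Hypothetical topology: which agents each agent is allowed to message.
CHANNELS = {
    "support": {"payments", "email"},
    "payments": set(),
    "email": set(),
}

# Hypothetical direct permissions held by each agent.
DIRECT_CAPS = {
    "support": {"read_tickets", "draft_responses"},
    "payments": {"initiate_transfer"},
    "email": {"send_external_email"},
}


def effective_blast_radius(agent, channels, direct_caps):
    """Return the union of capabilities of every agent reachable from `agent`."""
    seen = {agent}
    queue = deque([agent])
    caps = set()
    while queue:
        current = queue.popleft()
        caps |= direct_caps.get(current, set())
        # Follow every outbound channel once (breadth-first search).
        for peer in channels.get(current, set()):
            if peer not in seen:
                seen.add(peer)
                queue.append(peer)
    return caps
```

Comparing `effective_blast_radius("support", CHANNELS, DIRECT_CAPS)` with `DIRECT_CAPS["support"]` shows the gap between what the support agent may do itself and what it can cause to happen through its peers.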

Scenario: the trusted pivot

A support orchestrator has limited direct authority: it can read tickets and draft responses. It is compromised via a poisoned ticket. Rather than attempting to exfiltrate data it cannot access, it uses its trusted message channel to instruct a payments sub-agent to initiate a transfer and an email sub-agent to copy sensitive records to an external address. Each downstream agent treats the message as legitimate, since the orchestrator is an authorised peer. The blast radius of the compromise is therefore the full combined capability of the payments and email agents. Containment requires east-west isolation: explicit, signed, allow-listed paths between agents; hop and transaction budgets per session; and re-verification of each incoming instruction’s provenance before acting.
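
A receiving agent might combine the allow-list and the provenance check roughly as follows. This is a minimal sketch under stated assumptions: `ALLOWED_PATHS`, `SENDER_KEYS`, `sign`, and `accept` are hypothetical names, the demo key is a placeholder, and a real deployment would fetch keys from a secret store and use an unambiguous message encoding.

```python
import hashlib
import hmac

# Hypothetical allow-list of (sender, receiver, action) paths.
ALLOWED_PATHS = {
    ("support", "email", "send_ticket_reply"),
}

# Hypothetical per-sender signing keys (placeholder values for illustration).
SENDER_KEYS = {"support": b"demo-key-support"}


def sign(sender, receiver, action, payload: bytes) -> str:
    """Compute an HMAC-SHA256 tag over the routing fields and the payload."""
    message = b"|".join([sender.encode(), receiver.encode(), action.encode(), payload])
    return hmac.new(SENDER_KEYS[sender], message, hashlib.sha256).hexdigest()


def accept(sender, receiver, action, payload: bytes, signature: str) -> bool:
    """Accept a message only if its path is allow-listed and its tag verifies."""
    if (sender, receiver, action) not in ALLOWED_PATHS:
        return False  # no route: rejected regardless of signature
    if sender not in SENDER_KEYS:
        return False
    expected = sign(sender, receiver, action, payload)
    # Constant-time comparison to avoid leaking tag bytes through timing.
    return hmac.compare_digest(expected, signature)
```

Note the division of labour: the signature only shows that the message came from the named sender and was not altered, so a compromised but correctly keyed orchestrator can still sign. It is the action-level allow-list that stops it from requesting a transfer it was never meant to request.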

Scenario: the trifecta assembled at runtime

A personal assistant agent starts a session with only two trifecta legs: it has access to the user’s private files (private data) and can process incoming emails (untrusted content). During the session, the user’s workflow causes it to also acquire a send-email capability, completing the third leg. A crafted email now yields a full exfiltration path. The agent’s blast radius expanded mid-session without any explicit privilege escalation. Trifecta-aware containment monitors the combination of active capabilities in real time: when all three legs are simultaneously active, the session is flagged or interrupted for review. The goal is not to prohibit the combination entirely but to ensure it is conscious and scrutinised rather than assembled silently.
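
A runtime check of this kind can be sketched as a gate on capability grants. The capability names, the `CAPABILITY_LEGS` mapping, and the `Session` class are assumptions made for illustration; the sketch interrupts the grant that would complete the trifecta unless a review has been recorded.

```python
from enum import Enum


class Leg(Enum):
    PRIVATE_DATA = "private_data"
    UNTRUSTED_CONTENT = "untrusted_content"
    EXTERNAL_COMMS = "external_comms"


# Hypothetical mapping from capability names to trifecta legs.
CAPABILITY_LEGS = {
    "read_user_files": Leg.PRIVATE_DATA,
    "process_inbound_email": Leg.UNTRUSTED_CONTENT,
    "send_email": Leg.EXTERNAL_COMMS,
}


class TrifectaReviewRequired(Exception):
    """Raised when a grant would silently complete all three legs."""


class Session:
    def __init__(self):
        self.capabilities = set()
        self.review_approved = False  # set by a human or policy review step

    def active_legs(self):
        return {CAPABILITY_LEGS[c] for c in self.capabilities if c in CAPABILITY_LEGS}

    def grant(self, capability):
        proposed = self.capabilities | {capability}
        legs = {CAPABILITY_LEGS[c] for c in proposed if c in CAPABILITY_LEGS}
        # Interrupt only when the combination becomes complete and unreviewed.
        if legs == set(Leg) and not self.review_approved:
            raise TrifectaReviewRequired(f"granting {capability!r} completes the trifecta")
        self.capabilities = proposed
```

The combination is not forbidden: once `review_approved` is set, the same grant succeeds, which matches the goal of making the third leg a conscious decision.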

How it fails

  • Agents have no workload-level lateral-movement constraints; every agent can send arbitrary messages to every other, so one pivot hands the attacker the whole graph.
  • Trust is transitive and unchecked across hops; a downstream agent executes any instruction from an upstream peer without verifying its provenance or the legitimacy of the specific request.
  • A single agent holds all three trifecta legs (private data, untrusted input, and external communication) in one execution context, so the minimum sufficient condition for exfiltration is assembled on every task.
  • Credentials flow through the agent’s context rather than via a proxy, so a pivot that reads context acquires standing credentials for further movement.
  • Hop and transaction budgets are absent; a compromised pivot can issue an unbounded number of downstream instructions within a single session.

Why the mapped controls work

Splitting the trifecta across separate agents (reader/processor/communicator) is the structural answer: it eliminates the minimum sufficient condition for exfiltration by ensuring no single execution context holds all three ingredients simultaneously. Microsegmentation with allow-listed east-west paths enforces lateral-movement constraints at the network and message-bus layer: an agent that has no route to a peer cannot pivot to it, regardless of what instructions it emits. The credential proxy ensures that long-lived secrets never appear in any agent’s context: a compromised agent that reads its own context finds, at most, a short-lived, task-scoped token rather than a standing API key, which collapses the value of the pivot. Sealed fixed-schema tool endpoints close the escalation path in which an attacker crafts a tool call that abuses a flexible parameter. Hop/transaction budgets per session cap the multiplier effect: even a successful pivot can drive only a bounded number of downstream actions before an operator alert fires.
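
To make the sealed fixed-schema idea concrete, here is a minimal sketch of a tool endpoint that accepts exactly the declared fields and nothing else. The refund tool, `REFUND_SCHEMA`, and `MAX_REFUND_CENTS` are hypothetical examples, not taken from any particular framework.

```python
# Hypothetical sealed schema for a refund tool: exact field names and types.
REFUND_SCHEMA = {"ticket_id": str, "amount_cents": int}
MAX_REFUND_CENTS = 5_000  # assumed business limit, for illustration only


def validate_refund_call(args: dict) -> dict:
    """Return a validated copy of `args`, or raise ValueError."""
    # Reject missing fields and, importantly, any extra free-form fields.
    if set(args) != set(REFUND_SCHEMA):
        raise ValueError("fields must match the schema exactly")
    for name, expected_type in REFUND_SCHEMA.items():
        # Exact type match, so that bool is not accepted where int is expected.
        if type(args[name]) is not expected_type:
            raise ValueError(f"{name} must be {expected_type.__name__}")
    if not 0 < args["amount_cents"] <= MAX_REFUND_CENTS:
        raise ValueError("amount_cents out of range")
    return dict(args)
```

Because there is no free-text or pass-through parameter, an injected instruction has nowhere to smuggle an extra recipient, URL, or command.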

First steps

  1. Run a trifecta audit for each agent in your system today: does it simultaneously hold private data, process untrusted external input, and hold an outbound communication channel? If yes, split the reader from the communicator before the next deployment.
  2. Enumerate every agent-to-agent message path and enforce the set as an explicit allow-list in your service mesh or message-broker ACL (e.g. Kafka topic-level ACLs, or Istio AuthorizationPolicy rules matching on to.operation.paths) so that any agent-to-agent channel not on the list is denied at the mesh or broker layer.
  3. Set a per-session hop budget (e.g. a counter decremented on each inter-agent call, with an alert and pause at 20 hops) in your orchestration layer and test that it fires before a simulated pivot chain can complete more than a bounded number of downstream actions.
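
The hop budget in step 3 can be sketched as a small counter object plus a simulated pivot chain for testing. The class and function names are hypothetical, and the `on_exhausted` callback stands in for whatever alert-and-pause hook an orchestration layer provides.

```python
class HopBudgetExceeded(Exception):
    """Raised when a session tries to make a call with no budget left."""


class HopBudget:
    def __init__(self, limit=20, on_exhausted=None):
        self.remaining = limit
        # Hypothetical hook: alert an operator and pause the session.
        self.on_exhausted = on_exhausted or (lambda: None)

    def spend(self):
        """Call once before every inter-agent call."""
        if self.remaining <= 0:
            self.on_exhausted()
            raise HopBudgetExceeded("per-session hop budget exhausted")
        self.remaining -= 1


def simulate_pivot_chain(budget, attempted_calls):
    """Return how many downstream calls complete before the budget stops them."""
    completed = 0
    for _ in range(attempted_calls):
        try:
            budget.spend()
        except HopBudgetExceeded:
            break
        completed += 1
    return completed
```

A test along these lines would attempt far more calls than the limit and assert both that the number completed equals the limit and that the alert hook fired.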

Threats it governs

When this principle is absent, these threats become reachable.

Controls that advance it

Catalogue mitigations that strengthen this principle, grouped by the defence-in-depth stage they sit in.

Prevent
  • gVisor: When an agent executes generated or retrieved code in an ordinary container, that code runs as a process that makes system calls directly to the shared host kernel. A kernel vulnerability reachable from that code, whether triggered by a bug or by a deliberate exploit injected through the agent's prompt, can affect other workloads or the host itself. gVisor narrows this path by inserting a user-space kernel between the container and the host: the container's syscalls are handled by the Sentry process rather than passed directly to the host kernel, so the host attack surface reachable from inside the container is structurally smaller.
  • Session isolation: An agent that serves multiple users stores conversation history, retrieved facts, and intermediate state in a memory layer. If that layer is not scoped to the originating session, one user's writes can reach another user's retrieval path. Session-scoped memory isolation prevents that by enforcing a hard boundary at the storage layer, so each session can only read and write its own state.
  • Cross-client isolation: A shared MCP server that accepts connections from multiple clients is a concentration point where one client's session state, credentials, and resource budget are physically co-located with every other client's. Without enforced isolation, a malicious or compromised client can read another session's cached credentials, consume shared resources to the point of denying service to other clients, or exploit aggregate server permissions that exceed its own declared scope. Cross-client isolation is the set of structural controls that close those paths: per-session state scoping, per-client permission evaluation, and per-client resource quotas enforced at the server layer.
Detect

No catalogued control.

Respond
  • Anomaly isolation: An agent that has been compromised, poisoned, or gone rogue will, in most cases, behave differently from its established baseline. Anomaly isolation acts on that difference: when an agent's behaviour score crosses a configured threshold, it is quarantined automatically, with credentials revoked, message-queue access cut, and in-flight actions aborted. Manual revocation cannot match the speed that cascading multi-agent failures demand.
  • Kill switch: Agentic systems can act faster than a human can intervene through normal channels. A kill switch is the operational guarantee that a named human role can stop agent activity at any scope (single instance, class, or global) through a documented runbook, without requiring a code change or redeployment, and with every invocation written to an audit trail.
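
The quarantine step in anomaly isolation might look like the following sketch. `AgentState`, the 0.8 threshold, and the score scale are all assumptions for illustration; how the behaviour score is computed is out of scope here.

```python
from dataclasses import dataclass, field


@dataclass
class AgentState:
    name: str
    credentials: set = field(default_factory=set)
    queue_access: bool = True
    in_flight: list = field(default_factory=list)
    quarantined: bool = False


def isolate_if_anomalous(agent, score, threshold=0.8, audit_log=None):
    """Quarantine `agent` when its behaviour score reaches the threshold."""
    if score < threshold:
        return False
    agent.credentials.clear()      # revoke credentials
    agent.queue_access = False     # cut message-queue access
    aborted = list(agent.in_flight)
    agent.in_flight.clear()        # abort in-flight actions
    agent.quarantined = True
    if audit_log is not None:
        # Record what was stopped so that a human can review it afterwards.
        audit_log.append((agent.name, score, aborted))
    return True
```

Performing all three revocations in one automatic step is the point: the agent loses its credentials, its channel to peers, and its pending work at the same moment, without waiting for manual action.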

In Helmwart

The Q2 cross-layer propagation view plus the trifecta detection encode the core of containment.