Primer

Lethal Trifecta

Coined by Simon Willison in June 2025, the lethal trifecta names a deployment pattern, not a vulnerability in any single component, that turns ordinary prompt injection into data exfiltration. Helmwart draws a red △ P U O badge on an agent whenever the graph shows that agent has all three of Willison's conditions.

The three legs (Willison's phrasing)

leg 1 · P PRIVATE

Access to your private data

"one of the most common purposes of tools in the first place"

Helmwart maps this to: the agent can traverse the graph to any node whose sensitivity is set to sensitive or regulated, or whose data contains PII or credentials. Typical examples: a shared-memory store holding session tokens, a document store of internal policies, an external system of record (banking core, EHR, ERP).

leg 2 · U UNTRUSTED

Exposure to untrusted content

"any mechanism by which text (or images) controlled by a malicious attacker could become available to your LLM"

Helmwart maps this to: an upstream node can feed content into the agent's context. That covers anything tagged provenance: untrusted, as well as end users, open-web document stores, and user-uploaded corpora. This is the attacker's delivery channel; prompt injection lands here.

leg 3 · O OUTBOUND

The ability to externally communicate

"in a way that could be used to steal your data"

Helmwart maps this to: the agent can reach an external API, external system, or any node whose data is flagged outboundNetwork: true. This is the exit route: the wire over which exfiltrated data leaves the trust boundary. A web fetch, an email-send tool, even Markdown image rendering with attacker-controlled URLs all qualify.
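As a rough illustration, the three mappings above can be written as node predicates. The names isPrivateNode, isUntrustedNode, and isOutboundNode are the ones Helmwart's detector uses, and sensitivity, provenance, and outboundNetwork appear in the mappings; everything else about the node shape below (kind, source, containsPII, containsCredentials) is an assumption made for this sketch, not Helmwart's actual schema.

```typescript
// Hypothetical node shape; the real Helmwart schema may differ.
interface GraphNode {
  id: string;
  kind: "agent" | "endUser" | "documentStore" | "memoryStore" | "externalApi" | "externalSystem";
  sensitivity?: "public" | "internal" | "sensitive" | "regulated";
  provenance?: "trusted" | "untrusted";
  source?: "internal" | "openWeb" | "userUpload"; // assumed field, document stores only
  data?: {
    containsPII?: boolean;         // assumed flag name
    containsCredentials?: boolean; // assumed flag name
    outboundNetwork?: boolean;
  };
}

// Leg P: sensitive or regulated nodes, or nodes holding PII or credentials.
function isPrivateNode(n: GraphNode): boolean {
  return (
    n.sensitivity === "sensitive" ||
    n.sensitivity === "regulated" ||
    n.data?.containsPII === true ||
    n.data?.containsCredentials === true
  );
}

// Leg U: untrusted provenance, end users, open-web or user-uploaded stores.
function isUntrustedNode(n: GraphNode): boolean {
  return (
    n.provenance === "untrusted" ||
    n.kind === "endUser" ||
    (n.kind === "documentStore" && (n.source === "openWeb" || n.source === "userUpload"))
  );
}

// Leg O: external APIs, external systems, or anything flagged outboundNetwork.
function isOutboundNode(n: GraphNode): boolean {
  return (
    n.kind === "externalApi" ||
    n.kind === "externalSystem" ||
    n.data?.outboundNetwork === true
  );
}

// A regulated system of record is an external system, so it counts for P and O.
const ehr: GraphNode = { id: "ehr", kind: "externalSystem", sensitivity: "regulated" };
console.log(isPrivateNode(ehr), isUntrustedNode(ehr), isOutboundNode(ehr));
```

Under these assumptions a single node can satisfy more than one predicate, which is why the legs are evaluated independently rather than as mutually exclusive categories.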

Three legs, four risky regions

The three legs overlap in four meaningful regions: three pairwise overlaps and one triple intersection. The two-leg overlaps are real but bounded risks; the triple intersection at the centre (the only region that P, U, and O all share) is the EchoLeak class.

Why the combination is special

Any single leg is normal. Almost every useful agent has at least one. Two legs is still routine. The qualitative shift is at three:

P + U, no O: attacker can corrupt the agent's reasoning but has no way to extract data. Reputational risk, not data-loss risk.
P + O, no U: only the trusted user steers an agent that can read private data and reach the internet. Insider risk, not external-compromise risk.
U + O, no P: attacker can hijack the agent, but it has nothing valuable to leak.
P + U + O: injection at U tells the agent to read P and send it out via O. End-to-end exfiltration on a single agent. No vulnerability in any individual component is required. The topology itself is the vulnerability.
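The four cases above reduce to a small lookup over three booleans. The sketch below is purely illustrative: classifyLegs and the Legs type are hypothetical names, and the returned strings paraphrase the list rather than reproduce anything Helmwart emits.

```typescript
// Which of the three legs an agent carries.
interface Legs {
  P: boolean; // access to private data
  U: boolean; // exposure to untrusted content
  O: boolean; // ability to communicate externally
}

// Map a leg combination to the risk class described in the list above.
function classifyLegs({ P, U, O }: Legs): string {
  if (P && U && O) return "lethal trifecta: end-to-end exfiltration on a single agent";
  if (P && U) return "P + U: reasoning can be corrupted, but there is no exit route";
  if (P && O) return "P + O: insider risk, not external compromise";
  if (U && O) return "U + O: agent can be hijacked, but has nothing valuable to leak";
  return "zero or one leg: routine";
}

console.log(classifyLegs({ P: true, U: true, O: false }));
console.log(classifyLegs({ P: true, U: true, O: true }));
```

Only the first branch is a qualitative change; the other three describe bounded risks that still deserve review.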

Documented incidents

Among the publicly disclosed examples Willison's article cites are the three below. Each is a trifecta deployment where the bug is in the combination, not in any one tool:

Microsoft 365 Copilot: EchoLeak

CVE-2025-32711. A crafted email becomes a prompt injection (U) that instructs Copilot to read tenant data (P) and exfiltrate via image markdown to an attacker-controlled host (O). Zero-click; no user action required.

NVD entry →

GitHub's official MCP server

A poisoned issue (U) instructs an agent connected to the MCP server to read private-repo contents (P) and publish them where the attacker can read them (O); in the Invariant Labs demonstration, the data left through a pull request on a public repo. The MCP server didn't have a bug. The combination did.

Invariant Labs write-up →

GitLab Duo Chatbot

A crafted issue or comment (U) gets summarised into the assistant's context. The assistant has access to private project data (P) and can render Markdown that hits attacker URLs (O). Same shape, different vendor.

Legit Security write-up →

Willison's mitigation thesis

"End users have no choice but to avoid that lethal trifecta combination entirely." (Simon Willison)

Willison's argument: vendor guardrails are not enough, because any untrusted token that reaches the LLM can in principle change the agent's behaviour. The only robust answer is to deploy the agent so that at least one of the three legs is absent. That's why the badge is a hard signal in Helmwart, not a severity gradient.

The distributed trifecta in multi-agent systems

Single-agent trifecta analysis assumes one agent carries all three legs simultaneously. In multi-agent topologies the legs can be distributed across peers: one agent holds Private data, a second ingests Untrusted content, and a third has Outbound capability. No individual agent satisfies Willison's condition in isolation, yet the end-to-end exfiltration path still exists. The attacker just has to cross inter-agent boundaries to assemble it. T12 Agent Communication Poisoning is the mechanism by which poisoned content crosses from the Untrusted agent into the Private-data agent's reasoning context. T30 Insecure Inter-Agent Communication Protocol removes the confidentiality and integrity guarantees that would otherwise constrain what crosses those boundaries. T47 Rogue MCP Server in Ecosystem can serve as the Outbound leg: an attacker-controlled server that appears legitimate becomes the exfiltration channel. Helmwart's per-agent trifecta detection captures the single-agent case; the cross-peer combination requires topology-level review of the full graph.
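To make the cross-peer case concrete, here is a minimal sketch of what a topology-level check could look like. It is not a Helmwart feature: AgentInfo, Channel, and findDistributedTrifecta are hypothetical names, and the model is deliberately simplified to "content can flow from a U-holding agent to a P-holding agent, and from there to an O-holding agent" over directed inter-agent channels.

```typescript
type Leg = "P" | "U" | "O";

interface AgentInfo {
  id: string;
  legs: Leg[]; // legs this agent holds on its own
}

// Directed inter-agent channel: messages flow from `from` to `to`.
interface Channel {
  from: string;
  to: string;
}

// All agents reachable from `start` over channels, including `start` itself.
function reachableFrom(start: string, channels: Channel[]): Set<string> {
  const seen = new Set<string>([start]);
  const stack = [start];
  while (stack.length > 0) {
    const current = stack.pop()!;
    for (const c of channels) {
      if (c.from === current && !seen.has(c.to)) {
        seen.add(c.to);
        stack.push(c.to);
      }
    }
  }
  return seen;
}

// Find a U -> P -> O chain across agents, or null if none exists.
// Because reachability includes the start node, the single-agent
// trifecta is reported as well (all three ids are then the same).
function findDistributedTrifecta(
  agents: AgentInfo[],
  channels: Channel[]
): { untrustedAgent: string; privateAgent: string; outboundAgent: string } | null {
  const has = (a: AgentInfo, leg: Leg) => a.legs.includes(leg);
  for (const u of agents.filter((a) => has(a, "U"))) {
    const fromU = reachableFrom(u.id, channels);
    for (const p of agents.filter((a) => has(a, "P") && fromU.has(a.id))) {
      const fromP = reachableFrom(p.id, channels);
      const o = agents.find((a) => has(a, "O") && fromP.has(a.id));
      if (o) return { untrustedAgent: u.id, privateAgent: p.id, outboundAgent: o.id };
    }
  }
  return null;
}

const peers: AgentInfo[] = [
  { id: "researcher", legs: ["U"] },
  { id: "analyst", legs: ["P"] },
  { id: "notifier", legs: ["O"] },
];
const links: Channel[] = [
  { from: "researcher", to: "analyst" },
  { from: "analyst", to: "notifier" },
];
console.log(findDistributedTrifecta(peers, links));
```

In this example no peer carries more than one leg, yet the chain researcher, analyst, notifier assembles all three. Removing either channel makes the function return null, which mirrors the "cut one leg" remedy at the topology level.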

How Helmwart detects it

The detector lives in detectTrifecta() in src/lib/graph/engine.ts. For every agent node, Helmwart performs three bounded reachability walks (depth ≤ 8):

Outbound walk. Forward adjacency; finds the first reachable node that satisfies isOutboundNode.
Private walk. Forward adjacency; finds the first reachable node that satisfies isPrivateNode.
Untrusted walk. Reverse adjacency (because untrusted content flows into the agent); finds the first reachable source that satisfies isUntrustedNode.

If all three walks return a non-empty path, the badge fires. The actual paths surface in the right-drawer inspector, showing which private node, which untrusted source, and which outbound exit are in play.
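The three walks can be sketched as follows. This is an approximation written from the description above, not the code in src/lib/graph/engine.ts: detectTrifectaSketch, boundedWalk, and the Edge and LegTags types are hypothetical, the walk is assumed to be breadth-first, "depth ≤ 8" is assumed to mean at most eight edges from the agent, and the predicates are stood in for by sets of node ids.

```typescript
interface Edge {
  from: string;
  to: string;
}

// Stand-ins for isPrivateNode / isUntrustedNode / isOutboundNode.
interface LegTags {
  privateIds: Set<string>;
  untrustedIds: Set<string>;
  outboundIds: Set<string>;
}

type Adjacency = Map<string, string[]>;

// Build forward adjacency, or reverse adjacency when `reverse` is true.
function buildAdjacency(edges: Edge[], reverse: boolean): Adjacency {
  const adj: Adjacency = new Map();
  for (const e of edges) {
    const src = reverse ? e.to : e.from;
    const dst = reverse ? e.from : e.to;
    const list = adj.get(src) ?? [];
    list.push(dst);
    adj.set(src, list);
  }
  return adj;
}

// Breadth-first walk of at most `maxDepth` edges. Returns the path from
// `start` to the first matching node, or [] if none is found. The start
// node itself is never tested.
function boundedWalk(
  start: string,
  adj: Adjacency,
  matches: (id: string) => boolean,
  maxDepth: number = 8
): string[] {
  const parent = new Map<string, string>();
  const seen = new Set<string>([start]);
  let frontier = [start];
  for (let depth = 1; depth <= maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const id of frontier) {
      for (const neighbour of adj.get(id) ?? []) {
        if (seen.has(neighbour)) continue;
        seen.add(neighbour);
        parent.set(neighbour, id);
        if (matches(neighbour)) {
          // Rebuild the path by following parent links back to the start.
          const path = [neighbour];
          let cursor = neighbour;
          while (cursor !== start) {
            cursor = parent.get(cursor)!;
            path.unshift(cursor);
          }
          return path;
        }
        next.push(neighbour);
      }
    }
    frontier = next;
  }
  return [];
}

// Returns the three paths when the badge would fire, otherwise null.
function detectTrifectaSketch(agentId: string, edges: Edge[], tags: LegTags) {
  const forward = buildAdjacency(edges, false);
  const reverse = buildAdjacency(edges, true);
  const outboundPath = boundedWalk(agentId, forward, (id) => tags.outboundIds.has(id));
  const privatePath = boundedWalk(agentId, forward, (id) => tags.privateIds.has(id));
  const untrustedPath = boundedWalk(agentId, reverse, (id) => tags.untrustedIds.has(id));
  const fires = outboundPath.length > 0 && privatePath.length > 0 && untrustedPath.length > 0;
  return fires ? { outboundPath, privatePath, untrustedPath } : null;
}
```

The untrusted path comes back in walk order, agent first and source last, because it is traced over reverse adjacency; an inspector would likely flip it before display. Returning the paths rather than a boolean matches the behaviour described above, where the inspector shows which nodes are in play.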

What to do when you see it

You don't have to remove the agent. You have to cut one leg. Helmwart shows the full reachability path so you can pick the cheapest cut. Common moves:

Cut U. Sanitise or classify content before it enters the agent's context. Apply content disarm and reconstruction (CDR) on files, prompt-injection guards on web fetches, untrusted-tag stripping on user input; or use separate untrusted-quarantine agents that hand only classified summaries to the main agent.
Cut O. Remove the agent's network egress. Run in a sandboxed environment with an allowlist of outbound endpoints; strip image and link rendering from agent output that touches user-controlled data; route any user-visible response through a separate vetted summariser.
Cut P. Narrow the agent's access to private stores. Use per-task scoped tokens with short TTLs, just-in-time secret materialisation, and separate read-only agents that hand pre-filtered results to the main agent. Never let the same agent both read sensitive data and decide what to send anywhere.
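As one concrete instance of cutting O, the sketch below strips Markdown images from agent output unless they point at an allowlisted host, since image rendering is one of the exit routes named earlier. stripUntrustedImages and the example hosts are hypothetical, and the regular expression only handles inline images of the form ![alt](url); reference-style images, raw HTML img tags, and ordinary links would need their own handling.

```typescript
// Inline Markdown image: ![alt](url "optional title")
const MARKDOWN_IMAGE = /!\[([^\]]*)\]\(\s*([^)\s]+)[^)]*\)/g;

// Replace every inline image whose host is not allowlisted with its alt text.
function stripUntrustedImages(markdown: string, allowedHosts: Set<string>): string {
  return markdown.replace(MARKDOWN_IMAGE, (whole: string, alt: string, target: string) => {
    try {
      // URL parsing handles ports and userinfo tricks like https://good@evil/
      const host = new URL(target).hostname;
      return allowedHosts.has(host) ? whole : alt;
    } catch {
      // Relative or malformed targets are dropped rather than guessed at.
      return alt;
    }
  });
}

const rendered = stripUntrustedImages(
  "Report ![chart](https://cdn.example.com/a.png) and ![x](https://evil.example/?q=secret)",
  new Set(["cdn.example.com"])
);
console.log(rendered); // "Report ![chart](https://cdn.example.com/a.png) and x"
```

A filter like this narrows one exit route at runtime; it does not by itself change the graph, so it complements rather than replaces removing the outbound edge.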

The badge clears as soon as any one walk returns empty. You don't need to fix every threat finding on the agent. You need to break the topology.