Coined by Simon Willison
in June 2025, the lethal trifecta names a deployment pattern, not
a vulnerability in any single component, that turns ordinary prompt
injection into data exfiltration. Helmwart draws a red
△ P U O badge on an agent whenever the graph
shows that agent has all three of Willison's conditions.
The three legs (Willison's phrasing)
leg 1 · P · PRIVATE
Access to your private data
"one of the most common purposes of tools in the first place"
Helmwart maps this to: the agent can traverse the
graph to any node whose sensitivity is
sensitive or regulated, or whose data
contains PII or credentials. Typical examples: a shared-memory store
holding session tokens, a document store of internal policies, an
external system of record (banking core, EHR, ERP).
leg 2 · U · UNTRUSTED
Exposure to untrusted content
"any mechanism by which text (or images) controlled by a malicious attacker could become available to your LLM"
Helmwart maps this to: an upstream node can reach
into the agent's context: anything tagged
provenance: untrusted, end users, open-web document
stores, or user-uploaded corpora. This is the attacker's delivery
channel; prompt injection lands here.
leg 3 · O · OUTBOUND
The ability to externally communicate
"in a way that could be used to steal your data"
Helmwart maps this to: the agent can reach an
external API, external system, or any node whose data is flagged
outboundNetwork: true. This is the exit route: the
wire over which exfiltrated data leaves the trust boundary. A web
fetch, an email-send tool, even Markdown image rendering with
attacker-controlled URLs all qualify.
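The three mappings above can be read as three node predicates. The following is a minimal sketch, not Helmwart's actual code: the `GraphNode` shape, the `kind` values, and the `containsPII` / `containsCredentials` field names are assumptions made for illustration. Only the predicate names and the `sensitivity`, `provenance`, and `outboundNetwork` tags come from the text. The sketch also assumes that open-web document stores and user-uploaded corpora carry `provenance: "untrusted"`.

```typescript
// Hypothetical node shape; the real Helmwart schema may differ.
type Sensitivity = "public" | "internal" | "sensitive" | "regulated";
type NodeKind = "agent" | "store" | "tool" | "endUser" | "externalApi" | "externalSystem";

interface GraphNode {
  id: string;
  kind: NodeKind;
  sensitivity?: Sensitivity;
  provenance?: "trusted" | "untrusted";
  data?: {
    containsPII?: boolean;         // assumed field name
    containsCredentials?: boolean; // assumed field name
    outboundNetwork?: boolean;
  };
}

// Leg P: sensitive or regulated nodes, or nodes holding PII or credentials.
function isPrivateNode(n: GraphNode): boolean {
  return (
    n.sensitivity === "sensitive" ||
    n.sensitivity === "regulated" ||
    n.data?.containsPII === true ||
    n.data?.containsCredentials === true
  );
}

// Leg U: anything tagged untrusted, plus end users.
function isUntrustedNode(n: GraphNode): boolean {
  return n.provenance === "untrusted" || n.kind === "endUser";
}

// Leg O: external APIs and systems, or any node flagged outboundNetwork.
function isOutboundNode(n: GraphNode): boolean {
  return (
    n.kind === "externalApi" ||
    n.kind === "externalSystem" ||
    n.data?.outboundNetwork === true
  );
}
```

A single node can satisfy more than one predicate. An external system of record such as an EHR is both private data and an external system, so under these assumptions it counts toward P and O at once.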
Three legs, four risky regions
The three legs overlap in four meaningful regions: three pairwise overlaps
and one triple intersection. The two-leg overlaps are real but bounded risks;
the triple intersection at the centre (the only region that P, U, and O all
share) is the EchoLeak class.
Why the combination is special
Any single leg is normal. Almost every useful agent has at least one.
Two legs is still routine. The qualitative shift is at three:
P + U, no O: attacker can corrupt the agent's reasoning but has no way to extract data. Reputational risk, not data-loss risk.
P + O, no U: only the user steers the agent that talks to private data and the internet. Insider risk, not external compromise risk.
U + O, no P: attacker can hijack the agent, but it has nothing valuable to leak.
P + U + O: injection at U tells the agent to read P and send it out via O. End-to-end exfiltration on a single agent. No vulnerability in any individual component is required. The topology itself is the vulnerability.
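The four combinations above amount to a small lookup over three booleans. The sketch below is illustrative only; the `Legs` and `Region` types and the `classifyLegs` name are hypothetical and not part of Helmwart.

```typescript
// Hypothetical types for illustration.
interface Legs {
  P: boolean; // access to private data
  U: boolean; // exposure to untrusted content
  O: boolean; // ability to communicate externally
}

type Region = "no-legs" | "single-leg" | "P+U" | "P+O" | "U+O" | "trifecta";

// Map a set of legs to the region it falls in.
function classifyLegs(legs: Legs): Region {
  const { P, U, O } = legs;
  if (P && U && O) return "trifecta";
  if (P && U) return "P+U";
  if (P && O) return "P+O";
  if (U && O) return "U+O";
  return P || U || O ? "single-leg" : "no-legs";
}

// One-line summaries, paraphrasing the list above.
const REGION_NOTES: Record<Region, string> = {
  "no-legs": "No leg present.",
  "single-leg": "Normal; almost every useful agent has at least one leg.",
  "P+U": "Reasoning can be corrupted, but there is no way to extract data.",
  "P+O": "Only the user steers the agent; insider risk.",
  "U+O": "Agent can be hijacked, but has nothing valuable to leak.",
  "trifecta": "End-to-end exfiltration on a single agent.",
};
```

Only the last region corresponds to the badge; the others are informational.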
Documented incidents
Willison's article cites three publicly disclosed examples; all three
are trifecta deployments where the bug is in the combination,
not any one tool:
Microsoft 365 Copilot: EchoLeak
CVE-2025-32711. A crafted email becomes a prompt
injection (U) that instructs Copilot to read tenant data (P) and
exfiltrate via image markdown to an attacker-controlled host (O).
Zero-click; no user action required.
A poisoned issue (U) instructs an agent connected to the MCP server
to read private-repo contents (P) and dump them into a public-repo
comment or external webhook (O). The MCP server didn't have a bug.
The combination did.
A crafted issue or comment (U) gets summarised into the assistant's
context. The assistant has access to private project data (P) and
can render Markdown that hits attacker URLs (O). Same shape,
different vendor.
"End users have no choice but to avoid that lethal trifecta combination
entirely." (Simon Willison)
Willison's argument: vendor guardrails are not enough, because any
untrusted token that reaches the LLM can in principle change the agent's
behaviour. The only robust answer is to deploy the agent such that one
of the three legs is absent. That's why the badge is a hard signal in
Helmwart, not a severity gradient.
The distributed trifecta in multi-agent systems
Single-agent trifecta analysis assumes one agent carries all three legs
simultaneously. In multi-agent topologies the legs can be distributed across
peers: one agent holds Private data, a second ingests Untrusted content, and a
third has Outbound capability. No individual agent satisfies Willison's condition
in isolation, yet the end-to-end exfiltration path still exists. The attacker
just has to cross inter-agent boundaries to assemble it.
T12 Agent Communication Poisoning is the mechanism
by which poisoned content crosses from the Untrusted agent into the Private-data
agent's reasoning context.
T30 Insecure Inter-Agent Communication Protocol
removes the confidentiality and integrity guarantees that would otherwise
constrain what crosses those boundaries.
T47 Rogue MCP Server in Ecosystem can serve as the
Outbound leg: an attacker-controlled server that appears legitimate becomes the
exfiltration channel. Helmwart's per-agent trifecta detection captures the
single-agent case; the cross-peer combination requires topology-level review
of the full graph.
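As a starting point for that topology-level review, the sketch below looks for a U-holding agent whose messages can reach a P-holding agent, which in turn can reach an O-holding agent. It is an assumption-laden illustration, not a Helmwart feature: `AgentLegs`, `Channel`, `reachableFrom`, and `findDistributedTrifecta` are hypothetical names, and inter-agent communication is modelled as plain directed edges with no depth bound.

```typescript
// Hypothetical shapes for illustration.
interface AgentLegs {
  id: string;
  P: boolean;
  U: boolean;
  O: boolean;
}
interface Channel {
  from: string; // sending agent
  to: string;   // receiving agent
}
interface DistributedTrifecta {
  untrustedAgent: string;
  privateAgent: string;
  outboundAgent: string;
}

// All agents reachable from `start` over directed channels (includes `start`).
function reachableFrom(start: string, channels: Channel[]): Set<string> {
  const seen = new Set<string>([start]);
  const queue: string[] = [start];
  while (queue.length > 0) {
    const current = queue.shift() as string;
    for (const c of channels) {
      if (c.from === current && !seen.has(c.to)) {
        seen.add(c.to);
        queue.push(c.to);
      }
    }
  }
  return seen;
}

// Find one U -> P -> O chain across agents, or null if none exists.
function findDistributedTrifecta(
  agents: AgentLegs[],
  channels: Channel[],
): DistributedTrifecta | null {
  for (const p of agents.filter((a) => a.P)) {
    const downstream = reachableFrom(p.id, channels);
    const exit = agents.find((a) => a.O && downstream.has(a.id));
    if (exit === undefined) continue;
    const source = agents.find((a) => a.U && reachableFrom(a.id, channels).has(p.id));
    if (source !== undefined) {
      return { untrustedAgent: source.id, privateAgent: p.id, outboundAgent: exit.id };
    }
  }
  return null;
}
```

Because `reachableFrom` includes the starting agent, a single agent carrying all three legs is also reported. The sketch says nothing about whether a channel actually forwards attacker-controlled text, so a hit is a prompt for manual review rather than a confirmed finding.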
How Helmwart detects it
The detector lives in detectTrifecta() in
src/lib/graph/engine.ts. For every agent node,
Helmwart performs three bounded reachability walks (depth ≤ 8):
Outbound walk. Forward adjacency; finds the first reachable node that satisfies isOutboundNode.
Private walk. Forward adjacency; finds the first reachable node that satisfies isPrivateNode.
Untrusted walk. Reverse adjacency (because untrusted content flows into the agent); finds the first reachable source that satisfies isUntrustedNode.
If all three walks return a non-empty path, the badge fires. The actual
paths surface in the right-drawer inspector, showing which private
node, which untrusted source, and which outbound exit are
in play.
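The three walks can be sketched as breadth-first searches with a depth cap. This is a simplified illustration under stated assumptions, not the code in `src/lib/graph/engine.ts`: the node and edge shapes are hypothetical, boolean flags stand in for the `isPrivateNode` / `isUntrustedNode` / `isOutboundNode` predicates, depth is counted in edges, and the agent itself is not tested against the predicates.

```typescript
// Hypothetical shapes; flags stand in for the engine's node predicates.
interface GraphNode {
  id: string;
  isPrivate?: boolean;
  isUntrusted?: boolean;
  isOutbound?: boolean;
}
interface GraphEdge {
  from: string;
  to: string;
}
interface TrifectaResult {
  fires: boolean;
  privatePath: string[];
  untrustedPath: string[];
  outboundPath: string[];
}

const MAX_DEPTH = 8;

// Build forward (from -> to) or reverse (to -> from) adjacency lists.
function buildAdjacency(edges: GraphEdge[], reverse: boolean): Map<string, string[]> {
  const adj = new Map<string, string[]>();
  for (const e of edges) {
    const src = reverse ? e.to : e.from;
    const dst = reverse ? e.from : e.to;
    const list = adj.get(src) ?? [];
    list.push(dst);
    adj.set(src, list);
  }
  return adj;
}

// Breadth-first walk; returns the path to the first match within maxDepth edges, else [].
function boundedWalk(
  start: string,
  adj: Map<string, string[]>,
  nodes: Map<string, GraphNode>,
  matches: (n: GraphNode) => boolean,
  maxDepth: number = MAX_DEPTH,
): string[] {
  const visited = new Set<string>([start]);
  let frontier: { tail: string; path: string[] }[] = [{ tail: start, path: [start] }];
  for (let depth = 1; depth <= maxDepth && frontier.length > 0; depth++) {
    const next: { tail: string; path: string[] }[] = [];
    for (const { tail, path } of frontier) {
      for (const neighbour of adj.get(tail) ?? []) {
        if (visited.has(neighbour)) continue;
        visited.add(neighbour);
        const extended = [...path, neighbour];
        const node = nodes.get(neighbour);
        if (node !== undefined && matches(node)) return extended;
        next.push({ tail: neighbour, path: extended });
      }
    }
    frontier = next;
  }
  return [];
}

// The badge fires only when all three walks return a non-empty path.
function detectTrifectaSketch(agentId: string, nodes: GraphNode[], edges: GraphEdge[]): TrifectaResult {
  const byId = new Map<string, GraphNode>();
  for (const n of nodes) byId.set(n.id, n);
  const forward = buildAdjacency(edges, false);
  const reverse = buildAdjacency(edges, true);
  const outboundPath = boundedWalk(agentId, forward, byId, (n) => n.isOutbound === true);
  const privatePath = boundedWalk(agentId, forward, byId, (n) => n.isPrivate === true);
  const untrustedPath = boundedWalk(agentId, reverse, byId, (n) => n.isUntrusted === true);
  const fires = outboundPath.length > 0 && privatePath.length > 0 && untrustedPath.length > 0;
  return { fires, privatePath, untrustedPath, outboundPath };
}
```

Each returned path starts at the agent, so the untrusted path reads from the agent back to its source. Breadth-first order means the first match is also a shortest path, which is a convenient property when choosing which edge to cut.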
What to do when you see it
You don't have to remove the agent. You have to cut one leg.
Helmwart shows the full reachability path so you can pick the cheapest
cut. Common moves:
Cut U. Sanitise or classify content before it enters the agent's context. Apply content disarm and reconstruction (CDR) on files, prompt-injection guards on web fetches, untrusted-tag stripping on user input; or use separate untrusted-quarantine agents that hand only classified summaries to the main agent.
Cut O. Remove the agent's network egress. Run in a sandboxed environment with an allowlist of outbound endpoints; strip image and link rendering from agent output that touches user-controlled data; route any user-visible response through a separate vetted summariser.
Cut P. Narrow the agent's access to private stores. Use per-task scoped tokens with short TTLs, just-in-time secret materialisation, and separate read-only agents that hand pre-filtered results to the main agent. Never let the same agent both read sensitive data and decide what to send anywhere.
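As one concrete instance of the Cut O move, the sketch below strips inline Markdown images whose host is not on an allowlist before agent output is rendered. It is a minimal illustration under assumptions: the function name, the allowlist contents, and the placeholder text are hypothetical; the regular expression only covers inline `![alt](url)` syntax, not reference-style images or raw HTML; and it relies on the standard `URL` global available in Node.js and browsers.

```typescript
// Hypothetical allowlist; replace with hosts you actually control.
const ALLOWED_IMAGE_HOSTS = new Set<string>(["assets.example.internal"]);

// Replace inline Markdown images that point at non-allowlisted hosts.
function stripUntrustedImages(
  markdown: string,
  allowedHosts: Set<string> = ALLOWED_IMAGE_HOSTS,
): string {
  // Matches ![alt](target "optional title"); inline syntax only.
  const imagePattern = /!\[([^\]]*)\]\(\s*([^)\s]+)[^)]*\)/g;
  return markdown.replace(imagePattern, (match: string, alt: string, target: string) => {
    try {
      const host = new URL(target).hostname;
      return allowedHosts.has(host) ? match : `[image removed: ${alt}]`;
    } catch {
      // Relative or malformed URLs cannot be checked, so drop them too.
      return `[image removed: ${alt}]`;
    }
  });
}
```

A filter like this narrows one exit route only. Links, tool calls, and other egress paths still need their own controls before the outbound walk actually returns empty.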
The badge clears as soon as any one walk returns empty. You don't need
to fix every threat finding on the agent. You need to break the topology.
Posture (the 0–100 metric per agent) and Trifecta are independent
signals. An agent can have 90 / 100 posture and still carry a trifecta:
posture measures severity-weighted mitigation coverage over
individual findings; trifecta is a topology check that fires
before any specific finding exists. Treat trifecta as a hard gate; treat
posture as a continuous quality metric.
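The difference between a hard gate and a continuous metric can be sketched as a deployment check. The names here (`AgentSignals`, `deploymentGate`) and the default threshold of 70 are hypothetical choices for illustration; Helmwart does not prescribe them.

```typescript
// Hypothetical inputs: the two independent per-agent signals.
interface AgentSignals {
  posture: number;   // 0-100, severity-weighted mitigation coverage
  trifecta: boolean; // true when the badge fires
}

type GateDecision = { allowed: true } | { allowed: false; reason: string };

// Trifecta blocks regardless of posture; posture is compared to a tunable threshold.
function deploymentGate(signals: AgentSignals, minPosture: number = 70): GateDecision {
  if (signals.trifecta) {
    return { allowed: false, reason: "trifecta present: cut one leg before deploying" };
  }
  if (signals.posture < minPosture) {
    return { allowed: false, reason: `posture ${signals.posture} is below ${minPosture}` };
  }
  return { allowed: true };
}
```

With these assumptions, an agent at 90 / 100 posture that carries a trifecta is still rejected, while the posture threshold stays a policy choice that teams can tune.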