02 · CONTROLS

Mitigations what you can actually deploy

66 structured mitigation entries, all authored against named public sources (NIST, OWASP, NSA/CISA, MITRE ATLAS, vendor docs). Tiered by maturity: Tier 1 production-canonical, Tier 2 real-composable, Tier 3 research-stage. Filter by the OWASP threat number a mitigation covers, or drop them onto the canvas to recompute residual risk live.

TIER
STATUS
COVERS

Tier 1 Production-canonical 5 entries

Proven in production at scale, widely deployed, well-documented threat model. Pick first unless you have a concrete reason not to.

Tier 1
Signed AIBOM: a cryptographically-bound inventory of every component an agent loads Agent SBOM

An AI agent assembles itself at runtime from a model, prompt templates, plugins, and library dependencies, any of which can be tampered with before they arrive. A signed AI Bill of Materials (AIBOM) locks down that assembly: it records every component with a version and hash at build time, signs the manifest, and verifies it before the agent accepts traffic. A component that does not match its declared hash cannot silently enter the agent.

T17T29
Tier 1
gVisor sandbox — a user-space kernel that intercepts every syscall a container makes gVisor

When an agent executes generated or retrieved code, that code runs as a process with access to the host kernel. A vulnerability in the generated code, or a deliberate exploit injected through the agent's prompt, can reach the kernel and affect other workloads or the host itself. gVisor prevents this by inserting a user-space kernel implementation between the container and the host: the container's syscalls go to the Sentry process, not to the host kernel, so the reachable attack surface from inside the container is structurally smaller.

T11T20T31T45
Tier 1
Open Policy Agent — a policy-as-code engine for every tool call an agent makes OPA authorisation

An agent can invoke any tool it has access to, constrained only by its own reasoning. If that reasoning is manipulated or the agent's permissions are misconfigured, it will call tools it should not. OPA addresses this by placing a policy decision point between the agent and every tool invocation: a Rego policy evaluates the agent identity, the tool, and the parameter envelope before execution proceeds, and the agent cannot reason or argue past the result.

T2T3
Tier 1
Sigstore signing — cryptographic provenance for agent artifacts and audit records Sigstore

An agent is composed of artifacts produced at different times by different identities: model weights, prompt templates, tool descriptors, MCP server binaries, and audit-log batches. Any of those artifacts can be substituted or tampered with between the moment they are built and the moment they are loaded. Sigstore addresses this by signing each artifact at build time using a short-lived certificate tied to the workload identity that produced it, recording the signature in an append-only public transparency log, and requiring verification against that log before the artifact is loaded or executed.

T8T17T29
Tier 1
SPIFFE / SPIRE workload identity — cryptographic identities for every agent and service SPIFFE

In most deployments, agents authenticate to one another with long-lived bearer tokens or shared secrets. If any one of those credentials is stolen, the attacker has persistent, platform-wide access until someone manually rotates it. SPIFFE replaces that model: each workload is issued a short-lived, cryptographically verifiable identity document, and every connection requires both sides to present one. No long-lived secrets traverse the network, and a compromised credential is worthless within its TTL.

T3T9T12T16T34T36T40T43

Tier 2 Real, composable 58 entries

Available off-the-shelf or as a documented pattern, but newer or less broadly proven. Expect integration work and some operational nuance.

Tier 2
Adaptive workload balancing — distribute reviews by measured reviewer fatigue Adaptive load

Human reviewers make more errors as cognitive load accumulates over a shift. An adversary who floods a HITL gate, or a system that simply generates high output volume, exploits that degradation without bypassing the gate at all. Adaptive workload balancing addresses this by treating reviewer fatigue as a live routing input: each incoming review is assigned to the reviewer with the lowest current fatigue score, mandatory breaks are enforced before a reviewer's error rate climbs further, and items are held rather than assigned to any reviewer above the break threshold.

T10
Tier 2
Agent admission control — verify identity, capability claims, and provenance before a peer joins the system Admission control

In a multi-agent system, peer agents are granted authority by the other agents that accept their outputs. A rogue or compromised agent that enters the system inherits that authority immediately. Agent admission control is the registration gate that evaluates a peer's identity, declared capabilities, and binary provenance against policy before granting access. A peer that cannot pass attestation is refused entry and cannot participate in the system.

T13
Tier 2
MFA for high-privilege agent identities — step-up attestation at credential issuance and action time Agent MFA

An agent identity that holds broad write authority is a high-value target: compromising its credential gives an attacker persistent, authenticated access to every system that identity can reach. Multi-factor authentication addresses this by requiring a second factor at credential issuance time, so a stolen token is bounded to its issued lifetime and cannot be silently renewed. For non-human identities the second factor is workload attestation, hardware-bound key material, or certificate-backed proof rather than a phone or one-time code.

T3T9
Tier 2
AI-source disclosure UI — visible AI labelling at the point of action AI label

When an AI agent generates content or proposes an action, users need to know that the source is an AI before they decide to act. Without that signal, users routinely over-trust agent output. AI-source disclosure addresses this by attaching a visible label to every AI-generated item and by requiring explicit confirmation for consequential actions, restoring the critical gap between receipt and acceptance.

T10T15
Tier 2
Behavioural anomaly isolation — automatic quarantine on observable drift Anomaly isolation

An agent that has been compromised, poisoned, or gone rogue will, in most cases, behave differently from its established baseline. Anomaly isolation acts on that difference: when an agent's behaviour score crosses a configured threshold, it is quarantined automatically, credentials revoked, message-queue access cut, in-flight actions aborted. Manual revocation cannot match the speed that cascading multi-agent failures demand.

T7T12T13T33T35T37T38
Tier 2
Behavioural red-teaming — adversarial evaluation of agent reasoning and tool use Behavioural red-teaming

An agent exposes more attack surface than a static model: it reasons, plans, selects tools, and acts across multiple turns. Static analysis can characterise that surface, and runtime guardrails can block known-bad patterns, but neither can predict what the agent will do under attacker pressure it has never seen. Behavioural red-teaming addresses that gap through structured adversarial evaluation: probing the agent's reasoning, planning, and tool-use paths with attack strategies before each release.

T7
Tier 2
Blockchain transaction guard — pre-commit safety checks for every agent-initiated transaction Blockchain tx guard

A blockchain transaction, once committed, cannot be undone. An agent that signs and broadcasts a transaction without an enforcement layer before it can exceed its authorised value, call a contract it was never provisioned to reach, or drain a wallet in a runaway loop, and by then the funds are gone. A transaction guard intercepts each proposed transaction before signing, checks it against value bounds, a contract allowlist, a gas or compute-unit limit, and a replay-protection nonce, and refuses to sign anything that falls outside declared policy.

T26T32T33T34T36T37
Tier 2
Code-generation review gate — human approval before AI-generated code executes or merges Code review gate

An AI coding agent produces code that can be executed or merged to a production branch without a human ever reading it. If the agent has been manipulated, its generated code can contain hidden payloads, backdoors, or privilege-escalating logic. A code-generation review gate prevents that: every change attributable to an AI agent must pass automated static analysis and receive explicit human approval before it can merge or execute, and the agent identity that authored the change is structurally barred from also approving it.

T11T20
Tier 2
Context isolation — separate untrusted content from system instructions Context isolation

An LLM processes everything in its context window as a single stream of tokens; it has no innate ability to tell instructions apart from data. If an attacker can place content where the model treats it as instruction, they control the agent. Context isolation prevents that by structurally separating untrusted content from system instructions at prompt construction time, so the boundary is enforced before the model ever sees the input.

T1T6
Tier 2
Cross-client isolation — request-scope tenant boundaries in shared MCP server deployments Cross-client isolation

A shared MCP server that accepts connections from multiple clients is a concentration point where one client's session state, credentials, and resource budget are physically co-located with every other client's. Without enforced isolation, a malicious or compromised client can read another session's cached credentials, consume shared resources to the point of denying service to other clients, or exploit aggregate server permissions that exceed its own declared scope. Cross-client isolation is the set of structural controls that close those paths: per-session state scoping, per-client permission evaluation, and per-client resource quotas enforced at the server layer.

T42T43T45
Tier 2
Cross-system scope auditing — continuous permission reconciliation Cross-system audit

An agent that operates across HR, Finance, cloud, and SaaS systems accumulates permissions at each boundary, often without any single team seeing the combined picture. Privilege accumulates silently across those boundaries until a quarterly review finds it, by which point a compromised or misconfigured agent has had weeks of unchecked reach. Cross-system scope auditing prevents that by continuously reconciling the agent's actual entitlements against a declared baseline across every system it touches and raising a ticket the moment drift is detected.

T3T44T48
Tier 2
Data classification with tool-access allow-lists — a sensitivity label on every dataset, enforced at every access seam Data classification

Every dataset, document, and external system an agent can reach carries a classification label. The agent's permitted-class set and the tool's permitted-class set are intersected at the moment of every read or write. When the requested data's class falls outside that intersection, access is denied at the seam. This is the data-side complement to least-privilege: it adds a data-sensitivity constraint that role scoping alone does not provide.

T2T8T15T46
Tier 2
Reviewer decision summaries — independent rationale at HITL gate Decision summaries

When an agent decision reaches a human reviewer, the reviewer must reconstruct the agent's reasoning from raw traces before they can form a judgment. OWASP T10 names this reconstruction burden as the mechanism behind reviewer fatigue and oversight failures. A decision summary addresses the problem by inserting an independent model call between the agent's output and the reviewer: that call compresses the decision, evidence chain, and risk factors into a fixed-format card, reducing the per-review cognitive load without removing the human from the decision.

T10
Tier 2
Behavioural divergence monitoring — longitudinal drift from declared role Divergence monitor

An agent's behaviour can shift gradually over time: tool-selection patterns change, refusal rates drop, output style drifts. No single interaction reveals it, and a single-shot evaluation cannot catch a trend that spans weeks. Behavioural divergence monitoring detects that drift by comparing per-window statistical distributions of observable agent signals against a declared baseline, and alerting when the gap exceeds a threshold.

T7
Tier 2
Human dual-control — four-eyes rule for irreversible high-impact approvals Dual control

An AI agent operating with broad authority can propose actions that are irreversible: deleting records, modifying IAM policies, moving funds. A single human reviewer at the approval gate is a single point of failure, one compromised account, one fatigued reviewer, or one successful social-engineering attempt is enough to commit the action. Human dual-control addresses that by requiring two distinct, independent humans to approve before the action commits.

T3T11T15
Tier 2
Output egress DLP — inspection gate for PII, secrets, and IP at the agent boundary Egress DLP

An agent produces output continuously across multiple channels: user-facing responses, tool-call parameter envelopes, log records, and outbound HTTP requests. Any of those channels can carry sensitive content the agent has retrieved, been fed, or been tricked into including. Output egress DLP places an inspection gate at the boundary so that PII, credentials, and proprietary content are classified and either redacted or quarantined before they leave the trust boundary, regardless of how they got into the output.

T2T8T15T23T28T41T46
Tier 2
Fail-closed gate — refuse rather than act on uncertain output Fail-closed

An agent that is uncertain about what to do next faces a choice: refuse and ask for clarification, or proceed on its best guess. In low-stakes situations that tradeoff is tolerable. In agentic systems that write, delete, or send, a confident-sounding but wrong output can commit an irreversible action. A fail-closed gate resolves that choice structurally: below a configured confidence threshold, the agent stops and escalates rather than guessing.

T5T24T41T48
Tier 2
Goal-consistency monitoring — a per-step check that the agent is still pursuing its original objective Goal consistency

An agent's goal can drift across reasoning steps without any single catastrophic event: a manipulated tool output, a planted instruction in retrieved content, or an incremental semantic shift across many planner outputs can each redirect the agent away from its original objective. Goal-consistency monitoring addresses this by persisting the originally-declared goal, deriving a goal-state signal at each reasoning step, and computing a similarity score between the two. When the score falls below a per-task threshold, the monitor pauses the agent and surfaces the divergence for human review before any irreversible action executes.

T6
Tier 2
Graceful degradation — fail closed where it matters, fail open where it's safe Graceful degradation

An agent that encounters a quota trip, a dependency failure, or a timeout faces a choice: continue at reduced quality, or refuse. Getting that choice wrong is the core operational failure. Graceful degradation requires the answer to be declared before the incident, not improvised during it: write-authority paths fail closed and return a refusal; read-only paths fail open and disclose the degraded state explicitly.

T4T25
Tier 2
HITL feedback-loop calibration — reviewer overrides fed back into agent tuning HITL calibration loop

An agent at a human-in-the-loop gate will be overridden when its decisions do not match the reviewer's judgment. Without a return path, those corrections are discarded: the same miscalibration surfaces again in the next review cycle and the one after that. A feedback loop closes that gap by capturing each override event as a structured record, accumulating those records into a calibration dataset, and using patterns in that dataset to drive targeted changes to the agent's system prompt, tool-scope policy, or divergence-monitor thresholds. A well-calibrated agent produces fewer out-of-distribution decisions, so the review queue contracts over time.

T6T10
Tier 2
Identity behaviour monitoring — continuous UEBA for non-human identities Identity monitoring

An AI agent operates under a non-human identity (NHI): a service principal, a task role, or a workload credential. That identity produces a stream of access events that, for a well-scoped agent, forms a narrow and predictable behavioural baseline. Identity monitoring applies User and Entity Behaviour Analytics (UEBA) to that stream, alerting when an observed access pattern deviates statistically from the baseline. Because agent behavioural distributions are tighter than those of human users, a deviation is a higher-confidence signal, and a spoofed or stolen credential used from the wrong workload origin is exactly the anomaly the technique is built to detect.

T9
Tier 2
Input sanitisation — enforcing the data/instruction boundary before content reaches the model Input sanitisation

An LLM cannot distinguish data from instructions on its own: that boundary has to be enforced at the point where external content enters the prompt. Input sanitisation does this by normalising, filtering, and structurally segmenting untrusted content before the model ever sees it, so retrieved documents, tool results, and user messages are treated as data rather than commands.

T1T6
Tier 2
Insider threat program — personnel security for operators of high-privilege agentic systems Insider program

Privileged-access personnel are the human layer behind every agentic system. A person with legitimate administrative credentials can tamper with logs, manipulate approval gates, or extract training data through authorised channels, and no technical control prevents it when the access itself is valid. An insider threat program addresses that gap: it governs who holds operator access, what they agree to, how quickly credentials are revoked on departure, and whether anomalous behaviour is surfaced before damage accumulates.

T8T9T13T15
Tier 2
Time-bounded privilege elevation — temporary credentials that expire automatically JIT elevation

An agent running with a permanent high-privilege identity gives an attacker, or a misconfigured agent, broad access for as long as that identity persists. Time-bounded privilege elevation addresses this by issuing a short-lived credential tied to a specific action window: the agent holds elevated access only for the duration it needs, and the issuing platform revokes that access automatically when the TTL expires. This is the just-in-time (JIT) access pattern from PAM practice, applied to non-human identities.

T3
Tier 2
Just-in-time tool grants — ephemeral access scoped to a single task JIT tool grants

An agent that holds a persistent catalog of invokable tools can reach any of them at any point in its session. If its reasoning is manipulated or its identity is compromised, that persistent surface is fully available to an attacker. Just-in-time tool grants remove the standing surface: a policy broker issues a time-bound, task-scoped grant immediately before the tool is needed and revokes it automatically when the task completes or the window expires.

T2T3T16
Tier 2
Kill switch: human authority to halt one agent, a class, or the entire deployment Kill switch

Agentic systems can act faster than a human can intervene through normal channels. A kill switch is the operational guarantee that a named human role can stop agent activity at any scope (single instance, class, or global) through a documented runbook, without requiring a code change or redeployment, and with every invocation written to an audit trail.

T4T7T11T32T39
Tier 2
Legal hold and WORM retention — immutable audit storage that survives a compromised recorder Legal hold

An audit trail is only useful if its records cannot be altered after the fact. Without a storage-layer enforcement mechanism, a sufficiently privileged attacker (or a compromised recorder identity) can overwrite or delete the records that document what happened. Legal hold and WORM retention solve this by placing audit records in storage that the provider itself enforces as immutable: no user, including account root, can modify or delete a locked object within the retention window. Legal hold extends that protection indefinitely for active incidents, lifted only through an out-of-band authority outside the normal operations team.

T8
Tier 2
Reflection-loop depth limit — a ceiling on how often an agent reworks its own answer Loop limit

An AI agent can review and rewrite its own answer to improve it. If that review runs too long it ties up resources and stops the agent responding in time, and an attacker can deliberately trigger those endless cycles to stall the system. A reflection-loop depth limit prevents that: it sets how many review rounds an agent may run before it has to stop.

T4T5
Tier 2
MCP response sanitisation — validate and normalise tool outputs before they re-enter the LLM context MCP sanitisation

An MCP server response is content the LLM will reason over next. The model cannot distinguish tool output from instruction: that boundary must be enforced at the client, before the payload enters the context window. MCP response sanitisation applies schema validation, Unicode normalisation, control-token stripping, and structural wrapping to every tool result at the response boundary, so adversarial content embedded in a server response cannot redirect the agent's planner.

T1T6T16T41
Tier 2
MCP server attestation — cryptographic proof of server identity and binary integrity MCP server attestation

An MCP client connecting to a server has no built-in way to verify that the server at a given address is the expected workload or that its binary has not been replaced. An attacker who can intercept or substitute the server exploits that gap directly. MCP server attestation closes it by requiring the server to present cryptographic proof of two properties before the connection proceeds: that it holds a valid workload identity bound to a trusted certificate, and that its binary matches a signed hash recorded at build time.

T40T47
Tier 2
Memory anomaly detection — runtime detection of poisoning that slipped past validation Mem anomaly

An agent's memory store can receive adversarial content that passes schema and policy validation because the content is structurally valid but statistically unusual. Memory anomaly detection addresses this by monitoring write rates, embedding distances, provenance tags, and retrieval patterns at runtime, and quarantining writes whose statistical signatures diverge from the established baseline.

T1
Tier 2
Memory content validation — a write-boundary gate on what enters the agent's memory store Mem validate

An agent's memory store is a persistent surface: anything written to it can be retrieved by any agent, in any session, for the lifetime of the corpus. Memory poisoning exploits that persistence by writing adversarial content that steers the agent's reasoning long after the attacker has gone. Write-boundary validation prevents this by running every candidate memory write through schema, policy, and provenance checks before it is committed. Content that fails any gate is rejected and never reaches the store.

T1T27T28T49
Tier 2
Inter-agent message signing — end-to-end integrity for A2A and MCP Message signing

An inter-agent message travels through channels and intermediate agents the receiver did not originate. If nothing binds the message cryptographically to its source, any intermediate hop can substitute or inject content that the receiving agent will treat as authoritative. Message signing closes that gap: the source agent signs each message payload with its private key, and the receiver verifies the signature against a distributed trust bundle before the content reaches the reasoning layer.

T9T12T14T16T30T37T42
Tier 2
Model registry — version pinning, canary, rollback Model registry

An agent loads whichever model weights are available at startup unless the runtime is told exactly which artifact to load. If a poisoned or regressed weight is published to the model store, the agent picks it up silently on the next restart. A model registry prevents that: every artifact is registered with a cryptographic checksum and an approval stage, the agent runtime loads by explicit version pin, and new versions must pass a canary evaluation before promotion to production.

T1T17
Tier 2
Multi-source verification — cross-check factual claims against an independent source before commit Multi-source verify

An agent that writes a false claim to memory, passes it to a downstream agent, or returns it to a user has introduced an error that each subsequent step may treat as established fact. The cascade depends on one condition: the false claim goes unchallenged. Multi-source verification breaks that condition by requiring every novel factual assertion to be corroborated by a structurally independent source before it is committed. If the second source cannot corroborate the claim, the assertion is refused or down-weighted before it enters any downstream step.

T5
Tier 2
NHI lifecycle management — provision, rotate, audit, decommission NHI lifecycle

A Non-Human Identity (NHI) is the service account, machine principal, or formal agent identity under which an agentic system authenticates and acts. When an NHI is provisioned with broad scope, never rotated, and has no named owner, a stolen or leaked credential gives an attacker persistent access for as long as that credential remains valid. NHI lifecycle management treats each agent identity as a first-class governance object: provision narrowly with a declared scope and owner, rotate on a short schedule using platform-native short-lived credentials, audit every authentication and rotation event, re-attest that the identity is still needed, and decommission by deletion when the agent is retired.

T3T9T22
Tier 2
Out-of-band verification — independent-channel confirmation for irreversible agent actions OOB verify

An agent that can propose payments, update banking details, or modify production configuration is, by construction, a manipulation surface. If the only thing standing between a proposed change and its execution is the agent's own UI, a successful prompt injection or RAG poisoning attack requires no additional steps. Out-of-band verification breaks that dependency by routing a one-use confirmation code through a channel that is structurally separate from the agent's primary interaction channel, so an attacker who controls the agent's context cannot complete the approval without also compromising the user's registered secondary device.

T9T15T26T48
Tier 2
Output moderation gates — independent moderation pass before emission Output moderation

An AI agent can produce output that is harmful, deceptive, or factually wrong while still sounding fluent and confident. Output moderation places an independent classifier or moderation model between the agent and its destination, checking every output before it reaches a user or a downstream system. The generating model does not evaluate its own answer; a separate gate does.

T5T7T15
Tier 2
Multi-agent consensus — N-of-M independent agreement before high-impact actions Peer consensus

A single agent's judgment on a high-impact action can be wrong, manipulated, or compromised. Requiring N of M independent peer agents to agree before the action executes means an attacker or a systematic error must affect the quorum majority, not just one agent, before harm results.

T5T12T13T14
Tier 2
Advanced prompt-injection defences — spotlighting, delimiter gate, dual-LLM PI defences+

Prompt injection succeeds when untrusted content entering an agent's prompt is indistinguishable from trusted instruction. Three layered techniques address that: spotlighting tags untrusted content with a machine-readable origin mark before it reaches the model; delimiter defence rejects input carrying reserved framework tokens before the model is called; and dual-LLM extraction routes attacker-influenceable content through a quarantined model that holds no tool access, so injected instructions cannot reach the model that can act on them.

T1T6
Tier 2
Plan-vs-goal validation — independently check each proposed step against the original goal Plan check

A plan-then-execute agent produces a sequence of steps before acting. If the planner is manipulated, it will emit steps that serve the attacker's goal rather than the user's. Plan-vs-goal validation addresses this by placing an independent validator between the planner and the execution loop: it evaluates each proposed step against the originally-declared goal before the agent is permitted to act on it.

T6T19T24
Tier 2
Policy-bound autonomy — declarative runtime enforcement of the agent's action space Policy bound

An agent's authority is normally bounded only by its own reasoning. If that reasoning is manipulated, or the agent's identity is compromised, it will attempt actions the operator never intended to permit. Policy-bound autonomy addresses this by placing a declarative enforcement point between the agent and every consequential action: a policy engine evaluates the agent identity, the target tool, and the parameter envelope before execution, and the agent cannot reason or argue past the result.

T3T13T19T24
Tier 2
Pre-execution validation — a two-pass gate on every tool call an agent makes Pre-exec check

An LLM produces tool-call arguments through generation, not through a type system, and generation is not reliable. The arguments may be wrong in type, out of range, or assembled in a combination that violates business rules. A pre-execution validation gate intercepts the call before it reaches the tool: a schema pass confirms each argument conforms to the declared JSON Schema, and a policy pass confirms the argument combination is permitted for this agent and this action. The tool executes only when both passes clear.

T2
Tier 2
Output provenance tracking — record the source of every claim an agent makes Provenance tracking

When an agent produces a claim derived from retrieved data, that claim needs a record of where it came from: the source document, version, and retrieval time. Without that record, a downstream verifier cannot distinguish a well-grounded output from a fabricated one, a tampered one, or a poisoned one. Provenance tracking attaches source attribution to every claim, carries it through each transformation in the pipeline, and surfaces it in audit logs and user-facing interfaces.

T1T5T8
Tier 2
Per-agent rate limits and quotas — bound compute, tokens, and external-API spend Rate limits and quotas

An agent operates without direct human oversight, autonomously scheduling tool calls, external API requests, and reflection loops. Without a budget, a single triggering event can fan out into hundreds of downstream calls. Per-agent rate limits and quotas assign each agent identity its own ceiling on call rate, token consumption, and cost spend, so a misbehaving or compromised agent cannot exhaust shared resources and its overconsumption becomes a visible, actionable signal.

T4T13T38T39
Tier 2
RBAC and ABAC: role-based and attribute-based access control for agents RBAC/ABAC

Role-Based Access Control (RBAC) assigns every agent identity a named role that sets the outer limit on what it can reach. Attribute-Based Access Control (ABAC) narrows individual decisions inside that role by evaluating contextual attributes at request time. Used together, they enforce least privilege for non-human identities: the agent can only do what its role permits, and only when the request attributes satisfy the policy.

T3T43T45T46
Tier 2
Link and HTML rendering restriction — an allow-list control on what agent output may render Render restriction

An agent can include links and rich HTML in its output. When that output is attacker-influenced, a clickable link, embedded image, or rich preview card becomes the delivery mechanism for phishing or data exfiltration via markdown image injection. Rendering restriction removes that delivery vector by allowing clickable content only from an explicit allow-list of trusted domains and reducing everything else to plain text before the output reaches the user.

T6T15
Tier 2
Risk-prioritised review queue — match reviewer attention to consequence Risk queue

A human-in-the-loop review system saturates not from absolute decision volume but from undifferentiated volume: every item lands at the same priority, so reviewers cannot distinguish an irreversible high-consequence action from a routine low-stakes one. A risk-prioritised queue fixes this by scoring each decision before it enters the queue and routing it to the tier that matches its risk level, concentrating human attention where the cost of an error is highest.

T10
Tier 2
Secret scanning on agent-generated artefacts — detecting credentials before they escape the trust boundary Secret scan

An agent produces code, configuration files, tool-call payloads, and log records continuously and at a rate no human reviewer can match. Any of those artefacts may contain a live API key, service token, or private certificate, placed there accidentally through model context, or deliberately through prompt injection or context poisoning. Secret scanning places an inspection gate at every agent output seam: regex patterns match known token formats, entropy analysis detects arbitrary high-entropy strings, and validator calls confirm which candidates are live credentials. The CI-secret-scanning pattern is mature; the agentic specialisation is seam placement, moving the scanner from the repository gate to the agent egress point, where artefacts can be intercepted before they reach any downstream system.

T2T11T17T22
Tier 2
Session-scoped memory isolation — preventing cross-session context bleed Session isolation

An agent that serves multiple users stores conversation history, retrieved facts, and intermediate state in a memory layer. If that layer is not scoped to the originating session, one user's writes can reach another user's retrieval path. Session-scoped memory isolation prevents that by enforcing a hard boundary at the storage layer, so each session can only read and write its own state.

T1
Tier 2
Shared-memory ACL — per-agent, per-namespace read/write access control on shared vector stores Shared-memory ACL

When multiple agents share a single vector store, the access boundaries between them are not enforced by the store itself unless you configure them explicitly. Without per-namespace write and retrieval controls, an agent that can write to the shared corpus can insert crafted vectors into any namespace it can reach, and any agent that can query the store can retrieve another agent's confidential documents through embedding-space proximity. Shared-memory ACL addresses this by tagging every vector with a principal identifier at write time and filtering every retrieval query to the requesting agent's namespace, enforced at the gateway layer where the agent cannot bypass it.

T18T27T28T49
Tier 2
Separation of actor and recorder — different identities for action and audit Split actor

An agent that writes its own audit log can omit, alter, or suppress any record of its own actions. This is not a theoretical risk: an attacker who controls the acting identity controls the evidence. Actor/recorder separation is the structural fix. The identity that performs an action and the identity that records it are different principals, with non-overlapping permissions, so no single compromise can both execute and erase.

T8T23T35T44
Tier 2
Static analysis on generated code — a pre-execution gate on LLM-emitted artifacts Static analysis

An agent that can generate and execute code treats code generation as a tool call and code execution as the outcome. If the generated code contains a known-dangerous pattern, no amount of prompt engineering stops it from running once the execute call goes through. Static analysis closes that gap: it scans every code artifact the agent emits against a rule set before execution is permitted, catching the vulnerability patterns the same tooling already catches in human-written code.

T11
Tier 2
Short-lived tokens — bounding the credential exploitation window for agent identities Token TTL

An agent identity backed by a long-lived bearer token grants access for as long as that token remains valid. If the token is stolen, logged, or extracted from a running process, the attacker holds working credentials for weeks or months without any further action. Short-lived tokens address this by issuing credentials with a time-to-live measured in minutes or hours, automated and renewed by the platform rather than a human. When a token expires, access ends: the attacker must win the renewal process as well, which requires compromising a harder target than the token itself.

T3T9T22T30T36
Tier 2
Least-privilege tool scoping — a hard boundary on what each tool exposes Tool scope

Each tool in an agent's catalog should expose only the methods, resources, and parameter ranges its designated role requires. Over-broad tool surfaces let individually authorised primitives compose into actions no human intended to grant; narrowing the scope at design time reduces both the attack surface and the blast radius of any compromise.

T2T3T16T29T39
Tier 2
Tool description validation — inspect every tool description at catalog-load before it reaches the agent Tool-desc validation

A tool's description field is concatenated directly into the agent's system prompt and shapes which tools the agent selects and how it uses them. An attacker who controls or compromises a tool manifest can plant a description that overstates the tool's scope, suppresses safety scaffolding, or embeds instruction-following language aimed at the agent. Validating descriptions at catalog-load, before the tool enters the runtime, stops that class of manipulation at the registration boundary rather than detecting its effects later at the call seam.

T16
Tier 2
Per-agent trust scoring — behavioural reputation for inter-agent message acceptance Trust score

In a multi-agent system, each agent routes decisions based on what its peers report. If a peer's behaviour becomes unreliable or adversarial, agents that keep treating it with full authority will propagate whatever errors or manipulations that peer introduces. Per-agent trust scoring addresses this by maintaining a continuously updated reputation score for every peer, derived from observed behaviour, and using that score to determine how much authority each incoming message carries.

T7T12T13T14T38T47
Tier 2
Permission-aware vector retrieval — ACLs at the retrieval boundary Vector ACL

A vector store returns results by embedding-space proximity, not by who is asking. Without a per-principal filter applied before similarity ranking, a query from tenant A can surface tenant B's vectors if the embeddings are close enough. Vector ACL closes that gap: every retrieval call is scoped to the requesting principal's namespace or payload partition before the store ranks any results, so cross-principal hits are structurally impossible rather than merely unlikely.

T1T18T27T28T49

Tier 3 Research-stage 3 entries

Active research, limited operational evidence. Track but do not rely on alone.

Tier 3
Intent attestation tokens — a cryptographic binding from user approval to tool execution Intent attestation

An agent acts on behalf of the user, but nothing in a standard OAuth bearer token records what the user actually approved. If the agent's planning is manipulated, it can invoke tools with parameters the user never sanctioned, while presenting credentials that look valid. Intent attestation fixes this by issuing a short-lived signed token that encodes the exact action and parameter envelope the user authorised, and requiring the resource server to verify that envelope before executing the call.

T2T6
Tier 3
Memory-poisoning defence — embedding-space anomaly detection and retrieval re-ranking Memory-poison defence

An agent that reads from a vector store assumes the stored content reflects what was legitimately written. An adversary who can write to that store can inject passages that divert the agent's retrieval toward attacker-controlled content. This control applies two defensive layers: anomaly detection on writes, which quarantines incoming embeddings that are statistical outliers relative to existing cluster centroids; and re-ranking on reads, which uses a cross-encoder or probe-gradient scorer to demote adversarial candidates after dense retrieval. Both layers are research-stage. No turnkey production implementation exists as of catalogue version; deploy additively on top of Tier 2 baseline controls.

T1
Tier 3
Workflow state consistency — distributed-state integrity checks for multi-agent workflows Workflow state consistency

When multiple agents read and write shared workflow state concurrently, a network partition, a delayed message, or an adversarially timed race condition can produce divergent views. An agent acting on stale or conflicting state may authorise an action it would reject given correct current state. Hash-chained state snapshots, merge-point conflict detection, and optimistic concurrency control close that window.

T19T21T31