ASI04: Agentic Supply Chain Vulnerabilities

Definition

Third-party components that agents depend on (models, MCP servers, plug-ins, datasets, peer-agent descriptors, and update channels) may be malicious, compromised post-approval, or tampered with in transit. Unlike software supply-chain risk, this is a live exposure: every new session the agent fetches and trusts components whose state may have changed since they were last reviewed.

What it means in practice

A document-processing agent is configured to use a third-party MCP server that provides OCR capability. The MCP server passes its security review at deployment time. Three weeks later the server's operator pushes an update that adds a hidden tool: on every invocation, it now exfiltrates a sample of the documents it processes to an external endpoint. The agent picks up the update on its next session with no deploy event and no manifest change the operator can diff against. The supply chain attack lands between the review and the next runtime call.

This is the defining difference from traditional software supply-chain risk: the attack surface is live. Agentic systems fetch and trust remote components (MCP server definitions, plugin manifests, model adapters, peer-agent descriptors) on every session, not just at deploy time. A component that was clean at approval can become malicious in a subsequent update. The defence belongs at the fetch boundary: cryptographically signed artefacts, explicit version pinning, registry-level provenance records, and periodic re-attestation of in-use components against a known-good baseline. Treating this purely as a deploy-time problem leaves a wide open window between reviews.

Threat catalogue links

Base-catalog T-numbers follow OWASP source material; normalized MAS scenario entries are Helmwart editorial cross-references. Role colour-codes Helmwart's display weight: chips in the hero use the same scheme.

Primary: strongest pivot. Removing this T-number would gut the entry. Contributing: co-equal mechanism that combines with others to produce the ASI risk. Related: touches the entry but isn't its core; useful cross-reference.

T17 Supply Chain Compromise primary

Compromised upstream models, prompts, plugins, or framework updates land in the agent.
Open threat detail →
T2 Tool Misuse related

Agent uses authorized tools in unintended ways via deceptive prompts or chained calls.
Open threat detail →
T11 Unexpected RCE and Code Attacks related

Code-execution paths in agents accept attacker-influenced input and run as arbitrary code.
Open threat detail →
T12 Agent Communication Poisoning related

Inter-agent messages tampered with. The output of one becomes injection input of another.
Open threat detail →
T13 Rogue Agents in Multi-Agent Systems related

A malicious or compromised agent inside the system exploits trust to act unobserved.
Open threat detail →
T16 Insecure Inter-Agent Protocol Abuse related

MCP/A2A protocols abused via consent-flow manipulation, MCP response injection, or weaponised tool descriptions.
Open threat detail →
T20 Framework Vulnerability Leading to Code Injection primary

Bug in the agent framework enables code injection into the agent execution context.
Open threat detail →
T29 Plugin Vulnerability Leading to Agent Compromise primary

Malicious or insecure plugin compromises agent control flow via untrusted extension code.
Open threat detail →
T47 Rogue MCP Server in Ecosystem primary

Malicious MCP server registers in the agent ecosystem and is invoked under presumed-trustworthy framing.
Open threat detail →

MITRE ATLAS technique

MITRE ATLAS catalogues adversary techniques against AI systems. The technique(s) below represent the red-team pivot for this entry: what an attacker is actually doing on the wire. Source: mitre-atlas/atlas-data v5.6.0.

AML.T0010 AI Supply Chain Compromise view on ATLAS ↗

Adversary tampers with components in the AI supply chain (datasets, model weights, libraries, container images) to compromise downstream systems before deployment.

Agentic angle: Agentic systems compose models, MCP servers, agent registries, and prompt templates at runtime. Every dependency is a potential vehicle for compromise.

OWASP LLM Top 10 cross-references

From OWASP Appendix A (canonical inheritance)

LLM03:2025 Supply Chain

Recommended mitigations

No single control answers an ASI; it is met by a layered stack. The cards below are ranked by how directly each control counters ASI04: the chips on each card name the threat of this ASI it actually covers, colour-coded by that threat's role.

Counters the core

Cover one or more of this ASI's primary threats — the strongest direct response.

Signed AIBOM: a cryptographically-bound inventory of every component an agent loads Tier 1

T17T29

An AI agent assembles itself at runtime from a model, prompt templates, plugins, and library dependencies, any of which can be tampered with before they arrive. A signed AI Bill of Materials (AIBOM) locks down that assembly: it records every component with a version and hash at build time, signs the manifest, and verifies it before the agent accepts traffic. A component that does not match its declared hash cannot silently enter the agent.

Sigstore signing — cryptographic provenance for agent artifacts and audit records Tier 1

T17T29

An agent is composed of artifacts produced at different times by different identities: model weights, prompt templates, tool descriptors, MCP server binaries, and audit-log batches. Any of those artifacts can be substituted or tampered with between the moment they are built and the moment they are loaded. Sigstore addresses this by signing each artifact at build time using a short-lived certificate tied to the workload identity that produced it, recording the signature in an append-only public transparency log, and requiring verification against that log before the artifact is loaded or executed.

Least-privilege tool scoping — a hard boundary on what each tool exposes Tier 2

T29

Each tool in an agent's catalog should expose only the methods, resources, and parameter ranges its designated role requires. Over-broad tool surfaces let individually authorised primitives compose into actions no human intended to grant; narrowing the scope at design time reduces both the attack surface and the blast radius of any compromise.

Per-agent trust scoring — behavioural reputation for inter-agent message acceptance Tier 2

T47

In a multi-agent system, each agent routes decisions based on what its peers report. If a peer's behaviour becomes unreliable or adversarial, agents that keep treating it with full authority will propagate whatever errors or manipulations that peer introduces. Per-agent trust scoring addresses this by maintaining a continuously updated reputation score for every peer, derived from observed behaviour, and using that score to determine how much authority each incoming message carries.

Secret scanning on agent-generated artefacts — detecting credentials before they escape the trust boundary Tier 2

T17

An agent produces code, configuration files, tool-call payloads, and log records continuously and at a rate no human reviewer can match. Any of those artefacts may contain a live API key, service token, or private certificate, placed there accidentally through model context, or deliberately through prompt injection or context poisoning. Secret scanning places an inspection gate at every agent output seam: regex patterns match known token formats, entropy analysis detects arbitrary high-entropy strings, and validator calls confirm which candidates are live credentials. The CI-secret-scanning pattern is mature; the agentic specialisation is seam placement, moving the scanner from the repository gate to the agent egress point, where artefacts can be intercepted before they reach any downstream system.

Code-generation review gate — human approval before AI-generated code executes or merges Tier 2

T20

An AI coding agent produces code that can be executed or merged to a production branch without a human ever reading it. If the agent has been manipulated, its generated code can contain hidden payloads, backdoors, or privilege-escalating logic. A code-generation review gate prevents that: every change attributable to an AI agent must pass automated static analysis and receive explicit human approval before it can merge or execute, and the agent identity that authored the change is structurally barred from also approving it.

gVisor sandbox — a user-space kernel that intercepts every syscall a container makes Tier 1

T20

When an agent executes generated or retrieved code, that code runs as a process with access to the host kernel. A vulnerability in the generated code, or a deliberate exploit injected through the agent's prompt, can reach the kernel and affect other workloads or the host itself. gVisor prevents this by inserting a user-space kernel implementation between the container and the host: the container's syscalls go to the Sentry process, not to the host kernel, so the reachable attack surface from inside the container is structurally smaller.

MCP server attestation — cryptographic proof of server identity and binary integrity Tier 2

T47

An MCP client connecting to a server has no built-in way to verify that the server at a given address is the expected workload or that its binary has not been replaced. An attacker who can intercept or substitute the server exploits that gap directly. MCP server attestation closes it by requiring the server to present cryptographic proof of two properties before the connection proceeds: that it holds a valid workload identity bound to a trusted certificate, and that its binary matches a signed hash recorded at build time.

Model registry — version pinning, canary, rollback Tier 2

T17

An agent loads whichever model weights are available at startup unless the runtime is told exactly which artifact to load. If a poisoned or regressed weight is published to the model store, the agent picks it up silently on the next restart. A model registry prevents that: every artifact is registered with a cryptographic checksum and an approval stage, the agent runtime loads by explicit version pin, and new versions must pass a canary evaluation before promotion to production.

Broader coverage — 19 controls that address contributing or related threats

Behavioural anomaly isolation — automatic quarantine on observable drift Tier 2

An agent that has been compromised, poisoned, or gone rogue will, in most cases, behave differently from its established baseline. Anomaly isolation acts on that difference: when an agent's behaviour score crosses a configured threshold, it is quarantined automatically, credentials revoked, message-queue access cut, in-flight actions aborted. Manual revocation cannot match the speed that cascading multi-agent failures demand.

Inter-agent message signing — end-to-end integrity for A2A and MCP Tier 2

An inter-agent message travels through channels and intermediate agents the receiver did not originate. If nothing binds the message cryptographically to its source, any intermediate hop can substitute or inject content that the receiving agent will treat as authoritative. Message signing closes that gap: the source agent signs each message payload with its private key, and the receiver verifies the signature against a distributed trust bundle before the content reaches the reasoning layer.

Just-in-time tool grants — ephemeral access scoped to a single task Tier 2

An agent that holds a persistent catalog of invokable tools can reach any of them at any point in its session. If its reasoning is manipulated or its identity is compromised, that persistent surface is fully available to an attacker. Just-in-time tool grants remove the standing surface: a policy broker issues a time-bound, task-scoped grant immediately before the tool is needed and revokes it automatically when the task completes or the window expires.

Multi-agent consensus — N-of-M independent agreement before high-impact actions Tier 2

A single agent's judgment on a high-impact action can be wrong, manipulated, or compromised. Requiring N of M independent peer agents to agree before the action executes means an attacker or a systematic error must affect the quorum majority, not just one agent, before harm results.

SPIFFE / SPIRE workload identity — cryptographic identities for every agent and service Tier 1

In most deployments, agents authenticate to one another with long-lived bearer tokens or shared secrets. If any one of those credentials is stolen, the attacker has persistent, platform-wide access until someone manually rotates it. SPIFFE replaces that model: each workload is issued a short-lived, cryptographically verifiable identity document, and every connection requires both sides to present one. No long-lived secrets traverse the network, and a compromised credential is worthless within its TTL.

Agent admission control — verify identity, capability claims, and provenance before a peer joins the system Tier 2

In a multi-agent system, peer agents are granted authority by the other agents that accept their outputs. A rogue or compromised agent that enters the system inherits that authority immediately. Agent admission control is the registration gate that evaluates a peer's identity, declared capabilities, and binary provenance against policy before granting access. A peer that cannot pass attestation is refused entry and cannot participate in the system.

Data classification with tool-access allow-lists — a sensitivity label on every dataset, enforced at every access seam Tier 2

Every dataset, document, and external system an agent can reach carries a classification label. The agent's permitted-class set and the tool's permitted-class set are intersected at the moment of every read or write. When the requested data's class falls outside that intersection, access is denied at the seam. This is the data-side complement to least-privilege: it adds a data-sensitivity constraint that role scoping alone does not provide.

Human dual-control — four-eyes rule for irreversible high-impact approvals Tier 2

An AI agent operating with broad authority can propose actions that are irreversible: deleting records, modifying IAM policies, moving funds. A single human reviewer at the approval gate is a single point of failure, one compromised account, one fatigued reviewer, or one successful social-engineering attempt is enough to commit the action. Human dual-control addresses that by requiring two distinct, independent humans to approve before the action commits.

Insider threat program — personnel security for operators of high-privilege agentic systems Tier 2

Privileged-access personnel are the human layer behind every agentic system. A person with legitimate administrative credentials can tamper with logs, manipulate approval gates, or extract training data through authorised channels, and no technical control prevents it when the access itself is valid. An insider threat program addresses that gap: it governs who holds operator access, what they agree to, how quickly credentials are revoked on departure, and whether anomalous behaviour is surfaced before damage accumulates.

Intent attestation tokens — a cryptographic binding from user approval to tool execution Tier 3

An agent acts on behalf of the user, but nothing in a standard OAuth bearer token records what the user actually approved. If the agent's planning is manipulated, it can invoke tools with parameters the user never sanctioned, while presenting credentials that look valid. Intent attestation fixes this by issuing a short-lived signed token that encodes the exact action and parameter envelope the user authorised, and requiring the resource server to verify that envelope before executing the call.

Kill switch: human authority to halt one agent, a class, or the entire deployment Tier 2

Agentic systems can act faster than a human can intervene through normal channels. A kill switch is the operational guarantee that a named human role can stop agent activity at any scope (single instance, class, or global) through a documented runbook, without requiring a code change or redeployment, and with every invocation written to an audit trail.

MCP response sanitisation — validate and normalise tool outputs before they re-enter the LLM context Tier 2

An MCP server response is content the LLM will reason over next. The model cannot distinguish tool output from instruction: that boundary must be enforced at the client, before the payload enters the context window. MCP response sanitisation applies schema validation, Unicode normalisation, control-token stripping, and structural wrapping to every tool result at the response boundary, so adversarial content embedded in a server response cannot redirect the agent's planner.

Open Policy Agent — a policy-as-code engine for every tool call an agent makes Tier 1

An agent can invoke any tool it has access to, constrained only by its own reasoning. If that reasoning is manipulated or the agent's permissions are misconfigured, it will call tools it should not. OPA addresses this by placing a policy decision point between the agent and every tool invocation: a Rego policy evaluates the agent identity, the tool, and the parameter envelope before execution proceeds, and the agent cannot reason or argue past the result.

Output egress DLP — inspection gate for PII, secrets, and IP at the agent boundary Tier 2

An agent produces output continuously across multiple channels: user-facing responses, tool-call parameter envelopes, log records, and outbound HTTP requests. Any of those channels can carry sensitive content the agent has retrieved, been fed, or been tricked into including. Output egress DLP places an inspection gate at the boundary so that PII, credentials, and proprietary content are classified and either redacted or quarantined before they leave the trust boundary, regardless of how they got into the output.

Per-agent rate limits and quotas — bound compute, tokens, and external-API spend Tier 2

An agent operates without direct human oversight, autonomously scheduling tool calls, external API requests, and reflection loops. Without a budget, a single triggering event can fan out into hundreds of downstream calls. Per-agent rate limits and quotas assign each agent identity its own ceiling on call rate, token consumption, and cost spend, so a misbehaving or compromised agent cannot exhaust shared resources and its overconsumption becomes a visible, actionable signal.

Policy-bound autonomy — declarative runtime enforcement of the agent's action space Tier 2

An agent's authority is normally bounded only by its own reasoning. If that reasoning is manipulated, or the agent's identity is compromised, it will attempt actions the operator never intended to permit. Policy-bound autonomy addresses this by placing a declarative enforcement point between the agent and every consequential action: a policy engine evaluates the agent identity, the target tool, and the parameter envelope before execution, and the agent cannot reason or argue past the result.

Pre-execution validation — a two-pass gate on every tool call an agent makes Tier 2

An LLM produces tool-call arguments through generation, not through a type system, and generation is not reliable. The arguments may be wrong in type, out of range, or assembled in a combination that violates business rules. A pre-execution validation gate intercepts the call before it reaches the tool: a schema pass confirms each argument conforms to the declared JSON Schema, and a policy pass confirms the argument combination is permitted for this agent and this action. The tool executes only when both passes clear.

Static analysis on generated code — a pre-execution gate on LLM-emitted artifacts Tier 2

An agent that can generate and execute code treats code generation as a tool call and code execution as the outcome. If the generated code contains a known-dangerous pattern, no amount of prompt engineering stops it from running once the execute call goes through. Static analysis closes that gap: it scans every code artifact the agent emits against a rule set before execution is permitted, catching the vulnerability patterns the same tooling already catches in human-written code.

Tool description validation — inspect every tool description at catalog-load before it reaches the agent Tier 2

A tool's description field is concatenated directly into the agent's system prompt and shapes which tools the agent selects and how it uses them. An attacker who controls or compromises a tool manifest can plant a description that overstates the tool's scope, suppresses safety scaffolding, or embeds instruction-following language aimed at the agent. Validating descriptions at catalog-load, before the tool enters the runtime, stops that class of manipulation at the registration boundary rather than detecting its effects later at the call seam.

OWASP Top 10 for Agentic Applications 2026 (canonical source) ↗ · OWASP Gen AI Security Project · Dec 2025 · CC BY-SA 4.0
Agentic Top 10 side-by-side explainer ↗ · trydeepteam.com · secondary reference