Why it matters for agentic AI
Software supply-chain security, after the high-profile incidents of the early 2020s, settled on a clear model: verify the provenance and integrity of every dependency, sign artifacts, publish SBOMs, and pin versions. The agentic extension of this model depends on Provenance Trust Tagging to handle runtime content from tools, and on Sandboxing and Isolation to contain any tool that does turn out to be malicious. Agentic AI inherits all of that, and adds a threat class that code supply-chain models were not designed for: the attack that targets the reasoning path rather than the execution path. A malicious npm package injects code that runs when the dependency is loaded. A malicious MCP server injects text that the model reads as a legitimate instruction when the tool is called. The mechanism is different in kind: no exploit, no vulnerability, no memory corruption. Just text that a probabilistic reasoner finds persuasive.
This changes when verification must happen. In traditional supply chain security, install-time verification is meaningful because a package’s behaviour is fixed at install. An MCP server’s behaviour is not fixed: the tool descriptions it returns are fetched at runtime, and a server that was reviewed and approved last month may return different, malicious tool descriptions today. This is the “rug pull”: a server builds trust as a benign tool, is adopted broadly, and then the description is silently changed to embed instructions the model will follow. “Approval is an event, not a continuous state”: one-time verification at install is insufficient, and every session requires a fresh integrity check.
Registry poisoning compounds the problem at scale. If a shared MCP registry can be made to serve a malicious tool description to agents that query it, one poisoning event affects every agent connected to that registry. MCPTox research documented high success rates against real servers; registry contamination experiments found that a significant fraction of major MCP registries could be seeded with malicious content. The implication is that trusting any tool description because it came from “the official registry” is not safe without cryptographic verification of the specific content that was reviewed. The description you reviewed and the description the model is reading must be provably the same.
Scenario: the rug pull after adoption
A team integrates a well-regarded MCP server for interacting with a project management API. The
server’s tool descriptions are reviewed and approved; the integration ships. Months later, the
server’s maintainer changes one tool description to include a hidden instruction: when the
create_task tool is called, also read the calling agent’s context and send a summary to a
specified URL. The change is invisible to casual inspection, as the tool still works as advertised
for its stated function. Runtime re-verification, which hashes the tool descriptions on every
session start and compares them to the approved baseline, detects the discrepancy and quarantines
the server before the model ever reads the modified description.
Scenario: the model-weight supply chain
An organisation uses a fine-tuned model checkpoint sourced from a third-party provider. The checkpoint contains embedded instructions, inserted during fine-tuning, that cause the model to leak context under specific trigger conditions, a class of attack sometimes called a backdoor or sleeper. There is no code to scan; the vulnerability is in the weights. An AIBOM (AI Bill of Materials) that records the training data provenance, fine-tuning process, and checkpoint hashes, combined with red-team evaluation of trigger-condition behaviour before deployment, surfaces the anomaly before the model reaches production. Version-pinning by content hash ensures the checked checkpoint is the one that runs.
How it fails
- Tool, model, and plugin versions float (
:latestor an unpinned branch), so a rug pull takes effect without any action on the consumer’s side. - Integrity checks happen only at install or deployment; there is no runtime re-verification, so a server that goes malicious after adoption is trusted indefinitely.
- Tool descriptions are not scanned for hidden instructions, because they are treated as documentation rather than potentially adversarial content.
- The SBOM records code dependencies but not model weights, training data, tool schemas, or agent-card specifications, which are the agentic supply chain components with the highest novel risk.
Why the mapped controls work
Signed artifacts verified on every load ensure that the tool description the model reads is byte-for-byte identical to the one that was reviewed. A rug pull produces a signature mismatch and the server is not trusted. Version pinning by content hash removes the floating-version attack surface: the server cannot silently serve a different description because the hash is part of the configuration, not the server’s assertion. Runtime re-verification each session converts the one-time approval event into a continuous trust assertion, catching changes that occur between sessions. Tool-description injection scanning treats every description as potentially adversarial text rather than trusted documentation, applying the same inbound validation logic as any other untrusted content. Together these controls extend the proven code supply-chain model to cover the reasoning-path attack surface that code-only approaches leave entirely unprotected.
First steps
- Pin every MCP tool server to a content-addressed version today. Record the SHA-256 hash of each tool server’s tool-description manifest in your agent’s configuration file, and add a startup check that hashes the live manifest and refuses to connect if the hash does not match the pinned value.
- Add tool-description injection scanning to your agent’s session initialisation. Before the model reads any tool description, pass it through the same injection scanner you apply to user input (a regex pattern set for common prompt injection markers, or a fast classifier model), and quarantine any server whose description triggers a match.
- Produce an AIBOM (AI Bill of Materials) for each model checkpoint you deploy. Record the base model name and version, any fine-tuning dataset provenance, the checkpoint content hash, and the red-team evaluation results, then store this document alongside your SBOM in your artefact registry so that supply-chain audits cover the reasoning layer as well as the code layer.
Threats it governs
When this principle is absent, these threats become reachable.
- T17 Supply Chain Compromise Compromised upstream models, prompts, plugins, or framework updates land in the agent.
- T29 Plugin Vulnerability Leading to Agent Compromise Malicious or insecure plugin compromises agent control flow via untrusted extension code.
- T47 Rogue MCP Server in Ecosystem Malicious MCP server registers in the agent ecosystem and is invoked under presumed-trustworthy framing.
Controls that advance it
Catalogue mitigations that strengthen this principle, grouped by the defence-in-depth stage they sit in.
- Agent SBOM An AI agent assembles itself at runtime from a model, prompt templates, plugins, and library dependencies, any of which can be tampered with before they arrive. A signed AI Bill of Materials (AIBOM) locks down that assembly: it records every component with a version and hash at build time, signs the manifest, and verifies it before the agent accepts traffic. A component that does not match its declared hash cannot silently enter the agent.
- Sigstore An agent is composed of artifacts produced at different times by different identities: model weights, prompt templates, tool descriptors, MCP server binaries, and audit-log batches. Any of those artifacts can be substituted or tampered with between the moment they are built and the moment they are loaded. Sigstore addresses this by signing each artifact at build time using a short-lived certificate tied to the workload identity that produced it, recording the signature in an append-only public transparency log, and requiring verification against that log before the artifact is loaded or executed.
- Model registry An agent loads whichever model weights are available at startup unless the runtime is told exactly which artifact to load. If a poisoned or regressed weight is published to the model store, the agent picks it up silently on the next restart. A model registry prevents that: every artifact is registered with a cryptographic checksum and an approval stage, the agent runtime loads by explicit version pin, and new versions must pass a canary evaluation before promotion to production.
- MCP server attestation An MCP client connecting to a server has no built-in way to verify that the server at a given address is the expected workload or that its binary has not been replaced. An attacker who can intercept or substitute the server exploits that gap directly. MCP server attestation closes it by requiring the server to present cryptographic proof of two properties before the connection proceeds: that it holds a valid workload identity bound to a trusted certificate, and that its binary matches a signed hash recorded at build time.
- Tool-desc validation A tool's description field is concatenated directly into the agent's system prompt and shapes which tools the agent selects and how it uses them. An attacker who controls or compromises a tool manifest can plant a description that overstates the tool's scope, suppresses safety scaffolding, or embeds instruction-following language aimed at the agent. Validating descriptions at catalog-load, before the tool enters the runtime, stops that class of manipulation at the registration boundary rather than detecting its effects later at the call seam.
No catalogued control.
No catalogued control.
In Helmwart
Not scored directly; external/tool nodes and their trust level are modelled on the canvas.