
EVIDENCE TRAIL

MCP response sanitisation

Verbatim excerpts from the upstream sources cited on the mitigation page, with a note on what each source does and does not prove. The phrase "Context Hijacking via MCP Response Injection" appears verbatim in OWASP Agentic AI Threats & Mitigations v1.1 §T16. The instruction in that section's mitigation summary to "sanitize and validate all protocol-level data, including context payloads and tool metadata" is the closest upstream authority for this control's core requirement.

Last cross-checked against upstream sources: · 7 sources

References

Each entry shows what the source supports and what it does not prove.

Reference 1
v1.1 · published December 2025

OWASP Agentic AI — Threats & Mitigations v1.1

§T16 Insecure Inter-Agent Protocol Abuse — Attack Scenarios

"Scenario 2: Context Hijacking via MCP Response Injection – An attacker intercepts or crafts a server-side response within an MCP implementation that injects malicious context or tool metadata. A cooperating agent interprets the injected response as trusted protocol context and executes unintended backend operations."

Supports: Names the exact attack class — MCP response injection — that this mitigation directly targets. "Trusted protocol context" is the frame an unsanitised response exploits; sanitisation at the response boundary prevents it.

Does not prove: Describes the threat; does not specify the sanitisation pipeline (schema validation, Unicode normalisation, wrapping) that defends against it. Helmwart adds the implementation pattern.

Reference 2
v1.1 · published December 2025

OWASP Agentic AI — Threats & Mitigations v1.1

§T16 Insecure Inter-Agent Protocol Abuse — Mitigation (Threat Navigator summary table)

"Sanitize and validate all protocol-level data, including context payloads and tool metadata, to prevent injection or misinterpretation."

Supports: Verbatim upstream instruction to sanitise protocol-level data including tool metadata — the closest single-sentence authority for this control.

Does not prove: The summary table entry is terse and does not prescribe the layered approach (schema, pattern-strip, wrap). Helmwart operationalises this into a three-layer pipeline.
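As a rough illustration of the three layers named above (schema, pattern-strip, wrap), the following Python sketch may help. It is not Helmwart's implementation: the field names, the deny-list patterns, and the `untrusted_tool_output` wrapper tag are all assumptions chosen for the example, and a production deny-list would need to be far broader.

```python
import re
import unicodedata

# Hypothetical response schema: field name -> required Python type.
EXPECTED_FIELDS = {"tool": str, "content": str}

# Illustrative deny-list only. The wrapper tag is included so that tool
# output cannot close the wrapper early and escape it.
SUSPICIOUS = re.compile(
    r"ignore (all )?previous instructions"
    r"|</?(system|untrusted_tool_output)\b[^>]*>",
    re.IGNORECASE,
)

# Zero-width and bidirectional control characters that can hide text.
INVISIBLE = re.compile(r"[\u200b-\u200f\u202a-\u202e\u2060\ufeff]")


def validate_schema(response: dict) -> dict:
    """Layer 1: reject missing, extra, or mistyped fields."""
    if set(response) != set(EXPECTED_FIELDS):
        raise ValueError("unexpected field set")
    for name, expected in EXPECTED_FIELDS.items():
        if not isinstance(response[name], expected):
            raise ValueError(f"field {name!r} has the wrong type")
    if not re.fullmatch(r"[\w.-]{1,64}", response["tool"]):
        raise ValueError("tool name contains unexpected characters")
    return response


def strip_patterns(text: str) -> str:
    """Layer 2: normalise Unicode, drop invisible characters, redact matches."""
    text = unicodedata.normalize("NFKC", text)
    text = INVISIBLE.sub("", text)
    return SUSPICIOUS.sub("[removed]", text)


def wrap_untrusted(tool: str, text: str) -> str:
    """Layer 3: label the content as data rather than instructions."""
    return f"<untrusted_tool_output tool={tool!r}>\n{text}\n</untrusted_tool_output>"


def sanitise_response(response: dict) -> str:
    checked = validate_schema(response)
    return wrap_untrusted(checked["tool"], strip_patterns(checked["content"]))
```

The order matters in this sketch: invisible characters are removed before pattern matching, so a zero-width character inserted into a phrase does not let it slip past the deny-list.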

Reference 3
ATLAS catalogue (continuously updated)

MITRE ATLAS AML.M0033 — Input and Output Validation for AI Agent Components

AML.M0033 — mitigation description

"Implement validation on inputs and outputs for the tools and data sources used by AI agents. Validation includes enforcing a common data format, schema validation, checks for sensitive or prohibited information leakage, and data sanitization to remove potential injections or unsafe code. Input and output validation can help prevent compromises from spreading in AI-enabled systems and can help secure the workflow when multiple components are chained together. Validation should be performed external to the AI agent."

Supports: Verbatim canonical definition for output validation at the agent component boundary. "Schema validation" and "data sanitization to remove potential injections" directly authorise the two active layers of this control. "Validation should be performed external to the AI agent" matches Helmwart's placement at the MCP-client boundary.

Does not prove: Does not name MCP specifically; applies generically to any tool/data-source in an agentic workflow.

Reference 4
ATLAS catalogue (continuously updated)

MITRE ATLAS AML.M0030 — Restrict AI Agent Tool Invocation on Untrusted Data

AML.M0030 — mitigation description

"Untrusted data can contain prompt injections that invoke an AI agent's tools, potentially causing confidentiality, integrity or availability violations. It is recommended that tool invocation be restricted or limited when untrusted data enters the LLM's context."

Supports: Establishes untrusted tool output as the vector and recommends restricting tool invocation when it enters the LLM context — the threat model this sanitisation boundary defends against.

Does not prove: Focuses on restricting subsequent tool invocations, not on cleaning the injected content itself. Complementary rather than identical to this control.
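The complementary control that AML.M0030 describes can be pictured as a taint flag on the agent's context. The sketch below is a minimal illustration under assumptions: the `AgentContext` class, the `may_invoke` helper, and the tool names are hypothetical, and which tools count as high-impact is a policy decision the source does not specify.

```python
from dataclasses import dataclass, field

# Hypothetical set of tools considered too risky to call on tainted context.
HIGH_IMPACT_TOOLS = {"send_email", "delete_file", "execute_sql"}


@dataclass
class AgentContext:
    messages: list = field(default_factory=list)
    tainted: bool = False  # becomes True once untrusted data enters

    def add(self, text: str, trusted: bool) -> None:
        self.messages.append(text)
        if not trusted:
            self.tainted = True


def may_invoke(ctx: AgentContext, tool: str) -> bool:
    """Allow low-impact tools always; block high-impact tools once tainted."""
    return not (ctx.tainted and tool in HIGH_IMPACT_TOOLS)
```

In this sketch the flag never resets within a context, which is the conservative choice: once untrusted tool output has been read, high-impact calls stay blocked until a fresh context is started or a human approves.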

Reference 5
Version 2026 · published December 2025

OWASP Top 10 for Agentic Applications 2026

§ASI07 Insecure Inter-Agent Communication — Prevention and Mitigation Guideline 2

"Message integrity and semantic protection: Digitally sign messages, hash both payload and context, and validate for hidden or modified natural-language instructions. Apply natural-language–aware sanitization and intent-diffing to detect goal, parameter tampering, hidden or modified natural-language instructions."

Supports: The "natural-language–aware sanitization" that ASI07 prescribes at the message boundary is the same kind of control this mitigation applies at the MCP response layer. Confirms that sanitisation is an upstream-recommended defence against inter-agent content injection.

Does not prove: ASI07 frames sanitisation at the transport / message-bus layer. MCP response sanitisation is one specific instantiation of this principle at the MCP tool-call response boundary.
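The integrity half of the ASI07 guideline ("hash both payload and context") can be sketched as follows. Note the substitution: the guideline asks for digital signatures, whereas this example uses an HMAC with a shared key because it needs only the Python standard library. The message layout and function names are assumptions made for illustration.

```python
import hashlib
import hmac
import json


def _digest(key: bytes, payload: dict, context: dict) -> str:
    # Canonical JSON so both sides hash identical bytes.
    body = json.dumps(
        {"payload": payload, "context": context},
        sort_keys=True,
        separators=(",", ":"),
    ).encode("utf-8")
    return hmac.new(key, body, hashlib.sha256).hexdigest()


def sign_message(key: bytes, payload: dict, context: dict) -> dict:
    """Attach a MAC covering both the payload and its context."""
    return {"payload": payload, "context": context, "mac": _digest(key, payload, context)}


def verify_message(key: bytes, message: dict) -> bool:
    """Recompute the MAC and compare in constant time."""
    mac = message.get("mac")
    if not isinstance(mac, str):
        return False
    expected = _digest(key, message["payload"], message["context"])
    return hmac.compare_digest(expected.encode("utf-8"), mac.encode("utf-8"))
```

An integrity check of this kind detects modification in transit; it does not detect malicious instructions that the legitimate sender included, which is why sanitisation is still needed alongside it.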

Reference 6
Published July 2024

NIST AI 600-1 — Generative AI Profile (NIST AI RMF)

§2.9 Information Security — indirect prompt injection definition

"Indirect prompt injection attacks occur when adversaries remotely (i.e., without a direct interface) exploit LLM-integrated applications by injecting prompts into data likely to be retrieved. Security researchers have already demonstrated how indirect prompt injections can exploit vulnerabilities by stealing proprietary data or running malicious code remotely on a machine."

Supports: NIST's definition of indirect prompt injection, the threat class MCP response sanitisation defends against. Upstream tool output is precisely the "data likely to be retrieved" that ends up in the LLM's context.

Does not prove: Does not prescribe sanitisation at the MCP response boundary specifically. Helmwart maps NIST's threat framing to a concrete response-boundary control.

Reference 7
Version 2026 · published December 2025

OWASP Top 10 for Agentic Applications 2026

§ASI02 Tool Misuse and Exploitation — Common Examples of the Vulnerability

"Tool Poisoning: An attacker compromises the tool interface - such as MCP tool descriptors, response metadata, or tool output content - to embed instructions that hijack agent behavior."

Supports: Names "response metadata" and "tool output content" as concrete injection surfaces — exactly the content this sanitisation control normalises and strips before it re-enters the agent's context.

Does not prove: Frames the problem as supply-chain / tool-interface compromise (ASI02); sanitisation at the response boundary is one layer of defence, not a complete fix for compromised tool identity.
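Because ASI02 lists response metadata alongside tool output content as an injection surface, a cleaning step arguably needs to reach every string in the response, not only the main text field. A minimal sketch of such a traversal is shown below; `clean_strings` is a hypothetical helper, and the `clean` callable stands in for whatever string-level sanitiser is in use.

```python
from typing import Any, Callable


def clean_strings(value: Any, clean: Callable[[str], str]) -> Any:
    """Return a copy of a JSON-like value with `clean` applied to every string.

    Dictionary keys are cleaned as well, since metadata keys are also
    attacker-controlled when the tool interface is compromised.
    """
    if isinstance(value, str):
        return clean(value)
    if isinstance(value, dict):
        return {
            (clean(k) if isinstance(k, str) else k): clean_strings(v, clean)
            for k, v in value.items()
        }
    if isinstance(value, (list, tuple)):
        return [clean_strings(v, clean) for v in value]
    return value  # numbers, booleans, None pass through unchanged
```

The function builds a new structure rather than mutating its input, so the raw response remains available for logging or forensic comparison.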