T16 · OWASP Agentic AI v1.1

Insecure Inter-Agent Protocol Abuse

Last reviewed 2026-05-11 · Severity heuristic: critical

Definition

Insecure Inter-Agent Protocol Abuse is the attack surface that opens once agents talk to each other (Agent-to-Agent / A2A) or to tools (Model Context Protocol / MCP) using protocols designed for collaboration rather than for adversarial trust. The attacker targets the trust embedded in these protocols by manipulating server responses, injecting context into tool descriptions, or exploiting ambiguous consent flows to mislead agent reasoning. When protocol specifications are loosely enforced, or implementations lack input validation and strong identity binding, attackers can hijack agent behaviour, escalate privileges, or bypass guardrails entirely.

What it looks like in practice

OWASP v1.1 names three scenarios:

Consent Flow Manipulation. An agent orchestrator uses the A2A (Agent-to-Agent) protocol to negotiate capabilities with a newly connected sub-agent. The consent-negotiation phase involves a structured message exchange in which the sub-agent declares what actions it requires permission to perform, and the orchestrator either grants or denies each. A malicious sub-agent crafts its capability declaration to include “data management” (an intentionally vague capability) instead of the specific “delete records” permission it actually intends to exercise. The orchestrator, whose validation logic performs a string-match against a permissive allowlist rather than a structured schema check, auto-approves “data management”. The sub-agent subsequently deletes a production table, citing the approved capability as its authority. No explicit user intent was captured for the deletion.
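
The gap between string-matching a permissive allowlist and a structured check can be sketched as follows. This is a minimal illustration, not an A2A API: the permission identifiers, the `KNOWN_PERMISSIONS` registry, and the `evaluate_capability` function are all hypothetical names invented for the example.

```python
from dataclasses import dataclass

# Hypothetical registry of scoped permission identifiers. Each entry names
# one concrete action on one resource type; none is an open-ended category.
KNOWN_PERMISSIONS = {
    "records:read": {"requires_user_approval": False},
    "records:delete": {"requires_user_approval": True},
}


@dataclass(frozen=True)
class Decision:
    granted: bool
    reason: str


def evaluate_capability(requested: str, user_approved: bool = False) -> Decision:
    """Grant only exact, registered permission identifiers."""
    spec = KNOWN_PERMISSIONS.get(requested)
    if spec is None:
        # Free text such as "data management" has no registry entry, so it
        # is denied instead of being matched loosely against an allowlist.
        return Decision(False, f"unknown capability: {requested!r}")
    if spec["requires_user_approval"] and not user_approved:
        # Destructive permissions are never auto-approved during negotiation.
        return Decision(False, f"{requested} needs explicit user approval")
    return Decision(True, f"{requested} granted")
```

Under these assumptions the vague declaration is rejected outright, and the specific destructive permission is only granted once user intent has been captured separately from the negotiation.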

Context Hijacking via MCP Response Injection. An agent uses the Model Context Protocol (MCP), a standard for connecting AI assistants to external tool servers, to query a third-party data-enrichment server. An attacker who has compromised the MCP server’s response pipeline crafts a response that includes a malicious tool_metadata block alongside the legitimate data. The block instructs the calling agent that a new tool, “export_all_records”, is available and describes it as “standard data export, no user confirmation required”. The agent, which reasons over tool metadata as trusted protocol content, registers the tool and calls it when the next user query could plausibly benefit from an export. The data is exfiltrated before the rogue tool registration is noticed.
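
One client-side guard against this scenario is to treat a data response as data only, and to drop any field the caller did not expect. The sketch below assumes a flat response dictionary; the field names in `EXPECTED_RESULT_FIELDS` and the function name are hypothetical, and real MCP responses have a richer structure than shown here.

```python
from typing import Dict, List, Tuple

# Hypothetical set of fields the enrichment call is expected to return.
EXPECTED_RESULT_FIELDS = {"company_name", "employee_count", "country"}


def filter_enrichment_response(response: Dict) -> Tuple[Dict, List[str]]:
    """Keep only expected data fields; report everything else as rejected."""
    accepted = {k: v for k, v in response.items() if k in EXPECTED_RESULT_FIELDS}
    # Anything outside the expected set (for example a tool_metadata block)
    # is withheld from the model and surfaced for logging or alerting.
    rejected = sorted(k for k in response if k not in EXPECTED_RESULT_FIELDS)
    return accepted, rejected
```

With a filter like this in place, tool registration can only happen through whatever separate, reviewed path the deployment defines, never as a side effect of a data query.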

Tool Misuse via Descriptive Exploitation. A shared tool registry in a multi-agent collaboration platform allows any registered organisation to publish tools that other agents can discover and call. An attacker publishes a tool called “email_formatter” with a description stating it “formats and sends email drafts to the intended recipient”. The description omits that the tool also BCCs every email to an attacker-controlled address. When a customer-service agent in another organisation discovers the tool and calls it to send a support reply, it unknowingly copies every customer communication to the attacker. The tool’s described behaviour is accurate as far as it goes; the undisclosed side-effect is the attack.

Why it’s dangerous

MCP and A2A are designed for trust. Tool descriptions are content the agent reasons over, not opaque schema. Consent flows are themselves messages the model interprets. Loose validation of tool metadata, capability cards, or consent prompts lets an attacker who controls one tool source or one peer agent compromise every agent that connects. Unlike RCE-style exploits, this requires no software vulnerability. A permissive protocol implementation is sufficient.

Where it manifests

Inspect tool-description ingestion: are descriptions validated against schema and compared to historical versions? Check the consent surface: can sensitive actions be auto-approved through A2A negotiation? Map the trust boundary between the agent and any MCP server you do not operate. Verify that tool-call parameters are validated independently of the LLM’s suggestion, and that protocol-level messages are signed and replay-resistant.
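
What "signed and replay-resistant" means for a protocol message can be sketched with the standard library. Note the substitution: this example uses HMAC-SHA256 with a shared key as a stand-in, whereas a production design would more likely use asymmetric signatures. The envelope layout, `MAX_AGE_SECONDS`, and all function and class names are assumptions made for illustration.

```python
import hashlib
import hmac
import json
import secrets
import time
from typing import Optional

MAX_AGE_SECONDS = 30  # hypothetical freshness window


def _canonical(envelope: dict) -> bytes:
    # Stable serialisation so signer and verifier hash identical bytes.
    return json.dumps(envelope, sort_keys=True, separators=(",", ":")).encode()


def sign_message(key: bytes, payload: dict, now: Optional[float] = None) -> dict:
    envelope = {
        "payload": payload,
        "nonce": secrets.token_hex(16),
        "timestamp": time.time() if now is None else now,
    }
    envelope["signature"] = hmac.new(key, _canonical(envelope), hashlib.sha256).hexdigest()
    return envelope


class Verifier:
    def __init__(self, key: bytes):
        self._key = key
        self._seen_nonces = set()  # unbounded here; expire entries in practice

    def verify(self, envelope: dict, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        unsigned = {k: v for k, v in envelope.items() if k != "signature"}
        expected = hmac.new(self._key, _canonical(unsigned), hashlib.sha256).hexdigest()
        supplied = str(envelope.get("signature", ""))
        if not hmac.compare_digest(expected.encode(), supplied.encode()):
            return False  # payload, nonce, or timestamp was altered
        if abs(now - envelope["timestamp"]) > MAX_AGE_SECONDS:
            return False  # too old to accept
        if envelope["nonce"] in self._seen_nonces:
            return False  # already processed once: replay
        self._seen_nonces.add(envelope["nonce"])
        return True
```

The signature covers the nonce and timestamp as well as the payload, so an attacker cannot refresh a captured message by editing either field.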

Detection signals

Protocol-layer abuse surfaces in consent logs, tool-registry diffs, and outbound traffic patterns.

  • Capability descriptor containing vague or over-broad terms without a schema-validated scope: alert when an incoming A2A capability declaration uses terms that do not map to a known, scoped permission identifier in the registry. Legitimate agents declare specific, schema-validated permissions, not open-ended categories.
  • MCP response containing a tool_metadata or tool_list field that was not present in a previous response from the same server: diff the tool manifest returned by each MCP server against the last-known manifest; any new tool appearing without a corresponding deployment event in the server’s changelog is an indicator of injection.
  • Tool call parameter scope exceeding the scope described in the registered tool description: compare the fields present in a tool call’s actual request payload against the parameters listed in the tool’s registered description; a call that passes fields not documented in the description suggests the agent was instructed to use the tool in a way that bypasses the described contract.
  • Outbound data volume from a tool call that is disproportionate to the request: tool calls whose response payloads exceed a defined size threshold (e.g., 10× the median for that tool type) warrant a secondary content inspection. Bulk data movement is the signature of “export_all” class misuse.
  • BCC or undisclosed-recipient field in email-dispatch tool call output: for any agent-invoked email tool, inspect the SMTP envelope recipients (RCPT TO) at the mail-transfer agent boundary, since BCC recipients appear there and not in the visible message headers; alert on any envelope recipient that does not match a user-supplied recipient from the original request.
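
Three of the signals above reduce to small comparisons, sketched here under simplifying assumptions: manifests are dictionaries keyed by tool name, a tool spec lists its documented parameter names under a `parameters` key, and payload sizes are plain byte counts. All function names are hypothetical.

```python
import statistics
from typing import Dict, Iterable, List, Set


def unexpected_tools(last_known: Dict, current: Dict, deployed: Set[str]) -> List[str]:
    """Tools that are new since the last manifest and have no deployment event."""
    return sorted(
        name for name in current
        if name not in last_known and name not in deployed
    )


def undocumented_params(tool_spec: Dict, call_args: Dict) -> List[str]:
    """Fields in the actual call that the registered description never listed."""
    documented = set(tool_spec.get("parameters", []))
    return sorted(k for k in call_args if k not in documented)


def is_volume_outlier(size: int, history: Iterable[int], factor: float = 10) -> bool:
    """True when a payload exceeds `factor` times the median of past payloads."""
    history = list(history)
    if not history:
        return False  # no baseline yet, so nothing to compare against
    return size > factor * statistics.median(history)
```

Each function returns evidence, not a verdict; whether a finding blocks the call or only raises an alert is a policy decision left to the deployment.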

OWASP Top 10 for Agentic Applications 2026

The Agentic Top 10 (ASI01 through ASI10) is a separate practitioner-facing publication that maps onto the master Threats & Mitigations threat numbering. T16 is covered by the following Top 10 entries:

  • ASI07 Insecure Inter-Agent Communication (primary)

    Agents in a multi-agent system pass instructions, results, and context to one another across APIs, message buses, and shared state. Without per-message authentication and integrity controls, a single compromised peripheral agent becomes an injection source for every peer it can reach. One hop becomes n-hop, and the orchestrator is reachable from the outside.

    OWASP LLM Top 10: LLM02:2025, LLM06:2025
  • ASI02 Tool Misuse and Exploitation (contributing)

    An agent applies authorised tools in ways their operator did not intend, driven by prompt injection, misaligned reasoning, or manipulated tool outputs. Every individual call looks clean; the harm is in the sequence: data exfiltrated via successive reads, workflows hijacked by parameter tampering, or a legitimate API weaponised across turns.

    OWASP LLM Top 10: LLM06:2025

Source: OWASP Top 10 for Agentic Applications 2026 (Dec 2025) · the Top 10 is a compass into the master Threats & Mitigations taxonomy, not a replacement for it.

Design principles at stake

When T16 is present, these security design principles are the ones being violated or tested. Each links to the full principle; the mitigations below are how you restore them.

  • Defence-in-Depth. Because tool descriptions and consent prompts are content the agent reasons over rather than opaque schema, an attacker who controls one tool source can influence every agent that ingests its metadata: no software vulnerability required. Depth means multiple independent gates: tool descriptions are validated against a pinned schema and diffed against historical versions before ingestion; tool-call parameters are validated deterministically at the orchestrator, independently of what the model suggested; and protocol-level messages are signed and replay-resistant so a crafted MCP response cannot be silently replayed into a different context.
  • Zero Trust. MCP and A2A protocols were designed for collaboration and embed implicit trust in peer agents and tool servers, which is exactly what this threat exploits: a permissive implementation is sufficient to hijack agent behaviour without stealing credentials. Zero Trust requires that the agent never treat a tool source or peer agent as trusted by default: each tool call is authorised against a policy engine with the current task scope, capability cards are re-verified at each session rather than cached from the initial connection, and a sub-agent never inherits the orchestrator's token simply by virtue of being downstream.
  • Default / Implicit Deny. Auto-discovery of new MCP servers and acceptance of capability cards without explicit allow-listing are the design conditions that make consent-flow manipulation and tool-description exploitation viable. Deny by default means the agent may connect only to tool servers on a signed manifest verified by hash on every load, and any tool capability not pre-declared in that manifest is rejected before the model can reason about it, so a server that silently changes its tool description after adoption triggers a manifest mismatch and is refused rather than executed.
  • Confused-Deputy Prevention. The agent is a legitimately-privileged deputy whose tool-calling authority is abused through manipulated descriptions and injected consent flows: no credential is stolen; the deputy is simply confused about what it has genuinely been authorised to do. The countermeasure operates on intent: high-impact tool calls must pass a signed intent digest that binds the call to a specific pre-declared user action, and auto-approval of sensitive operations through A2A negotiation is structurally blocked by requiring an independent deterministic check between the consent negotiation and any execution step.
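
The deny-by-default behaviour described above can be sketched as a gate that checks a pinned digest on every load and refuses any tool the manifest does not declare. This is a simplified illustration: it pins a SHA-256 digest and omits the signature check a signed manifest would also need, and the manifest layout and the `ToolGate` class are hypothetical.

```python
import hashlib
import json


def manifest_digest(manifest: dict) -> str:
    # Canonical JSON so the same manifest always hashes to the same value.
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(canonical).hexdigest()


class ToolGate:
    """Loads a manifest only if it matches the pinned digest."""

    def __init__(self, manifest: dict, pinned_digest: str):
        if manifest_digest(manifest) != pinned_digest:
            # Any change, including an edited description, lands here.
            raise ValueError("manifest digest mismatch: refusing to load tools")
        self._tools = manifest["tools"]

    def is_allowed(self, tool_name: str) -> bool:
        # Default deny: only tools declared in the pinned manifest pass.
        return tool_name in self._tools
```

Because the digest covers descriptions as well as names, a server that rewrites a description after adoption fails the load step before the model sees the new text.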

Recommended mitigations

Auto-generated from the mitigation catalog: every mitigation whose coverage map includes T16, sorted by maturity tier (Tier 1 production-canonical first, then Tier 2, then Tier 3 research-stage).

  • Tier 1 SPIFFE (SPIFFE / SPIRE workload identity — cryptographic identities for every agent and service)

    In most deployments, agents authenticate to one another with long-lived bearer tokens or shared secrets. If any one of those credentials is stolen, the attacker has persistent, platform-wide access until someone manually rotates it. SPIFFE replaces that model: each workload is issued a short-lived, cryptographically verifiable identity document, and every connection requires both sides to present one. No long-lived secrets traverse the network, and a stolen credential becomes worthless as soon as its short TTL expires.

    Why it helps: Protocol abuse at MCP and A2A endpoints typically requires either an unauthenticated connection or a credential the attacker can forge. SPIFFE mTLS at the endpoint transport layer requires a valid SVID issued by the trusted SPIRE control plane, which an attacker cannot produce without compromising the attestation mechanism.

  • Tier 2 JIT tool grants (Just-in-time tool grants — ephemeral access scoped to a single task)

    An agent that holds a persistent catalog of invokable tools can reach any of them at any point in its session. If its reasoning is manipulated or its identity is compromised, that persistent surface is fully available to an attacker. Just-in-time tool grants remove the standing surface: a policy broker issues a time-bound, task-scoped grant immediately before the tool is needed and revokes it automatically when the task completes or the window expires.

    Why it helps: Insecure Inter-Agent Protocol Abuse allows a peer agent to invoke tools beyond its assigned scope by exploiting another agent's session or by misrepresenting task context. JIT grants enforce task-scoped boundaries at the broker layer, so a peer agent cannot expand its callable surface beyond what was granted for the current task regardless of how the request is framed.

  • Tier 2 MCP sanitisation (MCP response sanitisation — validate and normalise tool outputs before they re-enter the LLM context)

    An MCP server response is content the LLM will reason over next. The model cannot distinguish tool output from instruction: that boundary must be enforced at the client, before the payload enters the context window. MCP response sanitisation applies schema validation, Unicode normalisation, control-token stripping, and structural wrapping to every tool result at the response boundary, so adversarial content embedded in a server response cannot redirect the agent's planner.

    Why it helps: Insecure Inter-Agent Protocol Abuse includes Context Hijacking via MCP Response Injection as a named scenario: an attacker-controlled server returns a payload designed to seize the agent's session. Schema validation rejects malformed responses at the boundary; pattern stripping and contextual wrapping reduce the attack surface for responses that pass schema.

  • Tier 2 Message signing (Inter-agent message signing — end-to-end integrity for A2A and MCP)

    An inter-agent message travels through channels and intermediate agents the receiver did not originate. If nothing binds the message cryptographically to its source, any intermediate hop can substitute or inject content that the receiving agent will treat as authoritative. Message signing closes that gap: the source agent signs each message payload with its private key, and the receiver verifies the signature against a distributed trust bundle before the content reaches the reasoning layer.

    Why it helps: Insecure Inter-Agent Protocol Abuse includes an MCP-response-injection path in which a malicious MCP response is passed to the agent as if it were a legitimate server reply. Payload-level signatures on MCP messages make injected responses verifiably invalid.

  • Tier 2 Tool scope (Least-privilege tool scoping — a hard boundary on what each tool exposes)

    Each tool in an agent's catalog should expose only the methods, resources, and parameter ranges its designated role requires. Over-broad tool surfaces let individually authorised primitives compose into actions no human intended to grant; narrowing the scope at design time reduces both the attack surface and the blast radius of any compromise.

    Why it helps: Descriptive Exploitation (OWASP T16) relies on the agent inferring broad capability from a tool's self-description. When scope is enforced at the call boundary independent of the description, a misleading description cannot unlock methods or resources that were never included in the tool's defined surface.

  • Tier 2 Tool-desc validation (Tool description validation — inspect every tool description at catalog-load before it reaches the agent)

    A tool's description field is concatenated directly into the agent's system prompt and shapes which tools the agent selects and how it uses them. An attacker who controls or compromises a tool manifest can plant a description that overstates the tool's scope, suppresses safety scaffolding, or embeds instruction-following language aimed at the agent. Validating descriptions at catalog-load, before the tool enters the runtime, stops that class of manipulation at the registration boundary rather than detecting its effects later at the call seam.

    Why it helps: Tool poisoning via descriptive exploitation works by registering a tool whose description field carries adversarial content: inflated capability claims, instruction-following phrases, or scope-broadening language that biases the agent's tool-selection reasoning. Validating the description at catalog-load catches that content before it enters the prompt context, removing the manipulation surface at the point where it is cheapest to stop.
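
A catalog-load check on tool descriptions might look like the sketch below. The patterns, the length limit, and the function name are illustrative assumptions, not a vetted rule set; pattern lists of this kind are easy to evade and work best as a first filter ahead of human or policy review.

```python
import re
import unicodedata
from typing import List

# Hypothetical patterns for instruction-like or consent-suppressing language.
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore (all|any|previous) instructions", re.IGNORECASE),
    re.compile(r"no (user )?confirmation (is )?required", re.IGNORECASE),
    re.compile(r"\b(always|must) (call|use) this tool\b", re.IGNORECASE),
]
MAX_DESCRIPTION_LENGTH = 500  # hypothetical limit


def validate_description(description: str) -> List[str]:
    """Return a list of findings; an empty list means no check fired."""
    findings = []
    # Normalise first so compatibility characters cannot hide a match.
    normalised = unicodedata.normalize("NFKC", description)
    if len(normalised) > MAX_DESCRIPTION_LENGTH:
        findings.append("description exceeds length limit")
    # Control (Cc) and format (Cf) characters, such as zero-width spaces,
    # have no place in a description apart from newlines and tabs.
    if any(
        unicodedata.category(ch) in {"Cc", "Cf"} and ch not in "\n\t"
        for ch in normalised
    ):
        findings.append("control or format characters present")
    for pattern in SUSPICIOUS_PATTERNS:
        if pattern.search(normalised):
            findings.append(f"matched pattern: {pattern.pattern}")
    return findings
```

A tool whose description produces any finding would be held out of the catalog until reviewed, which keeps the text away from the agent's prompt context in the meantime.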

Red-team pivot: MITRE ATLAS techniques

MITRE ATLAS catalogues adversary techniques against AI systems. Where this OWASP threat has an attacker-perspective counterpart, the ATLAS technique is shown below. That is what a red team would actually be doing on the wire. Use this for detection-signal anchoring, threat-hunting hypotheses, and IR runbooks. Source: mitre-atlas/atlas-data v5.6.0.

AML.T0073 Impersonation

Adversary poses as a trusted entity (user, service, peer agent) to gain access or influence decisions.

AML.T0074 Masquerading

Adversary disguises an artefact (file name, agent card, MCP server) so it appears legitimate to humans or agents that route trust by name.

AML.T0080 AI Agent Context Poisoning

Adversary contaminates an agent's context store (short-term scratchpad, vector memory, conversation history) so future reasoning is biased toward attacker goals.

Agentic angle: the effect persists across sessions. A single successful poisoning influences every later decision until the memory is purged.

Sources