Attacker with write access selectively deletes log entries covering fraudulent actions while leaving surrounding entries intact, defeating forensic reconstruction.
01 · CATALOG
Threats forty-nine
OWASP Agentic AI Threats & Mitigations v1.1 (Dec 2025) supplies T1–T17, grouped below by its six-step Agentic Threat Decision Path. Helmwart adds stable navigation entries based on scenario-specific threats in the older MAS Threat Modelling Guide v1.0 (Apr 2025), whose worked systems reuse some IDs; the RPA source entries T16/T17 are represented here as T48/T49 to prevent collision with v1.1. Each card notes its MAESTRO layers, agentic factors, and related MITRE ATLAS techniques where mapped.
Step 1 Agency & Reasoning
Does the AI agent independently determine the steps needed to achieve its goals?
Intent Breaking and Goal Manipulation is the class of attacks that exploit the lack of separation between data and instructions in an agent. By injecting prompts, tampered data sources, or malicious tool outputs, attackers alter the agent's planning, reasoning, or self-evaluation so that subsequent actions pursue an objective the user never gave. The risk is most pronounced in systems with adaptive planning (e.g. <abbr class="hw-term" data-tip="A common agent loop that interleaves reasoning steps with tool actions." aria-label="A common agent loop that interleaves reasoning steps with tool actions." tabindex="0">ReAct</abbr>-style loops).
Misaligned and Deceptive Behaviours are actions where an agent pursues its goal by bypassing constraints, evading oversight, or actively misleading users. This goes beyond a single prompt-injection request. The behaviour emerges from advanced reasoning capacity combined with a goal that is poorly bounded or measured against an exploitable proxy. This is distinct from <abbr class="hw-term" data-tip="A confident but fabricated model output that is not grounded in fact or source data." aria-label="A confident but fabricated model output that is not grounded in fact or source data." tabindex="0">hallucination</abbr> because the agent is not failing at reasoning; it is reasoning toward a goal that is misaligned with the user's actual intent.
Repudiation and Untraceability is the failure mode where actions taken by agents cannot be reliably attributed, audited, or reconstructed after the fact. It is a threat in its own right, not just a missing control, because the absence of a trace *enables* other attacks (insider misuse, fraud, regulatory violation) to go undetected.
MAS Guide Multi-agent extensions of Step 1
Multi-agent or topology-specific variants of the base threats above, sourced from the OWASP MAS Threat Modelling Guide v1.0. The same pattern repeats on subsequent steps.
A major blockchain reorganisation invalidates previously confirmed transactions, leaving downstream agent state incorrect if the agent does not handle reversions.
Attacker manipulates the PoSP mechanism to fabricate evidence of legitimate actions or conceal malicious ones from verifiers.
MCP server or client implementations lack sufficient logging, blocking incident detection and post-breach investigation.
An MCP server transfers or processes data in ways that violate data-residency or regulatory compliance requirements.
Step 2 Memory & Context
Does the AI agent rely on stored memory for decision-making?
Memory Poisoning corrupts an agent's short-term context or persistent memory so future decisions are made against tainted state. The corruption can arrive through direct <abbr class="hw-term" data-tip="An attack that hides instructions inside the data an LLM reads, so the model follows the attacker’s text instead of its operator’s." aria-label="An attack that hides instructions inside the data an LLM reads, so the model follows the attacker’s text instead of its operator’s." tabindex="0">prompt injection</abbr>, <abbr class="hw-term" data-tip="Prompt injection delivered through content the agent retrieves (documents, tool results, web pages) rather than typed by the user." aria-label="Prompt injection delivered through content the agent retrieves (documents, tool results, web pages) rather than typed by the user." tabindex="0">indirect prompt injection</abbr> (e.g. an attacker-controlled document that ends up in the agent's context), shared-memory abuse where one user's writes affect another's reads, or vector-store poisoning of long-term retrieval.
Cascading Hallucination Attacks exploit the agent's inability to distinguish fact from fiction by getting a fabricated output embedded into memory, tool inputs, or downstream agents. Once embedded, the fabrication propagates, compounds, and is treated as evidence by later reasoning. The threat is the *propagation*, not the original <abbr class="hw-term" data-tip="A confident but fabricated model output that is not grounded in fact or source data." aria-label="A confident but fabricated model output that is not grounded in fact or source data." tabindex="0">hallucination</abbr>.
MAS Guide Multi-agent extensions of Step 2
LLM instability causes an agent to interact with blockchain infrastructure in unpredictable ways, submitting invalid transactions or skipping expected calls.
Injected data about malicious smart contracts makes them appear legitimate in the vector store, causing agents to engage with attacker-controlled contracts.
Attacker gains unauthorised access to the vector database used by the RAG pipeline, exposing all indexed knowledge.
Ambiguous or inconsistently implemented MCP schemas cause client and server to interpret data differently, producing silent data corruption.
Non-deterministic LLM behaviour produces divergent outputs for identical inputs, causing inconsistent decisions across agent invocations.
Policy updates are not reflected in vector store embeddings; the agent retrieves and applies stale policy via RAG.
Step 3 Tools, Execution & Supply Chain
Does the AI agent execute actions using tools, system commands, or external integrations?
Tool Misuse is the manipulation of an agent into abusing the tools it has been authorised to use. The actions stay within the agent's granted permissions; what attackers exploit is the gap between the *permission* to call a tool and the *intent* the user actually authorised. The OWASP catalog classifies *Agent Hijacking* under this threat: adversarial data the agent ingests and then acts on by issuing tool calls.
Privilege Compromise is the exploitation of mismanaged roles, dynamic permission inheritance, or overly broad scopes to escalate what an agent can do. Unlike conventional privilege escalation, the attack often does not require breaking access control. It only requires the agent to *use* permissions it already has, in combinations the system never anticipated. *Implicit privilege escalation* and *<abbr class="hw-term" data-tip="When a privileged component is tricked into misusing its authority on behalf of a less-privileged attacker." aria-label="When a privileged component is tricked into misusing its authority on behalf of a less-privileged attacker." tabindex="0">Confused Deputy</abbr>* failures are the canonical patterns.
Resource Overload is the deliberate exhaustion of an agent's compute, memory, external service quotas, or downstream API budget. It looks like classical denial-of-service in its outcome, but the attack surface is different: agents *autonomously schedule, queue, and execute tasks across sessions*, can *self-trigger work*, and can *coordinate with other agents*, so a single trigger can fan out into exponential resource consumption.
Unexpected <abbr class="hw-term" data-tip="Remote Code Execution: an attacker running arbitrary code on the target system." aria-label="Remote Code Execution: an attacker running arbitrary code on the target system." tabindex="0">RCE</abbr> and Code Attacks exploit the fact that agents with code-execution or function-calling capabilities can be steered into running attacker-influenced code. Unlike classical RCE, the attacker may not need a memory-corruption bug. Natural language is the injection vector, and the agent itself produces the code that runs.
Insecure Inter-Agent Protocol Abuse is the attack surface that opens once agents talk to each other (Agent-to-Agent / <abbr class="hw-term" data-tip="Agent-to-agent communication: messages passed directly between autonomous agents." aria-label="Agent-to-agent communication: messages passed directly between autonomous agents." tabindex="0">A2A</abbr>) or to tools (Model Context Protocol / <abbr class="hw-term" data-tip="Model Context Protocol: an open standard for connecting agents to tools and data sources." aria-label="Model Context Protocol: an open standard for connecting agents to tools and data sources." tabindex="0">MCP</abbr>) using protocols designed for collaboration rather than for adversarial trust. The attacker targets the trust embedded in these protocols by manipulating server responses, injecting context into tool descriptions, or exploiting ambiguous consent flows to mislead agent reasoning. When protocol specifications are loosely enforced, or implementations lack input validation and strong identity binding, attackers can hijack agent behaviour, escalate privileges, or bypass guardrails entirely.
Supply Chain Compromise covers vulnerable, malicious, outdated, or otherwise harmful upstream components that end up inside the agent: models, libraries, plugins, prompt templates, build pipelines, or framework updates. The compromise can manipulate agent behaviour, exfiltrate data, or run arbitrary code, and is amplified in agentic systems because the agent will autonomously *use* the compromised component across many runs.
MAS Guide Multi-agent extensions of Step 3
Attacker crafts inputs semantically close to incorrectly-approved past examples, exploiting similarity search to bypass retrieval-based policy checks.
A workflow definition bug causes the agent to execute steps out of order or skip critical validation gates entirely.
A vulnerability in the agent framework allows code injection into the agent execution context.
State synchronisation failures across agents produce conflicting actions or silent denial of service for legitimate tasks.
Service account credentials accidentally exposed (e.g. committed to a public repository) grant an attacker direct access to privileged backend systems.
A bug in the dynamic policy engine prevents correct policies from being applied to new contexts, granting users capabilities they should not have.
Attacker disrupts the workflow by attacking a dependent system (approval agent, payment processor) rather than the primary agent itself.
A compromised or weakly-secured plugin takes control of an agent, including its cryptographic keys and downstream capabilities.
The framework provides insufficient isolation between actions of different agents, allowing one agent's operations to affect another's.
An agent enters a runaway loop and submits transactions at high frequency, incurring cost and disrupting the broader agent ecosystem.
Attacker exploits a cross-chain bridge vulnerability to steal assets or disrupt coordination between agents operating on different blockchains.
An autonomous agent loops over MCP tool invocations far beyond task requirements, overloading the MCP server or connected systems.
An MCP server deployed without adequate network controls is reachable from unauthorised networks, exposing all connected resources.
The MCP server runs with excessive operating-system permissions; once compromised, the attacker inherits broad host access.
Attacker publishes a malicious MCP server masquerading as a legitimate one; agents connecting to it receive manipulated data or have credentials stolen.
Step 4 Authentication & Identity
Does the AI system rely on authentication to verify users, tools, or services?
MAS Guide Multi-agent extensions of Step 4
Compromise of an agent's blockchain wallet private keys enables fund theft and agent impersonation on-chain.
A smart contract vulnerability lets an attacker impersonate an agent or gain unauthorised control of its on-chain actions.
Attacker impersonates a legitimate MCP client via stolen credentials or auth bypass, gaining unauthorised access to server resources.
Step 5 Human Engagement
Does AI require human engagement to achieve its goals or function effectively?
Overwhelming Human-in-the-Loop is the failure mode where attackers exploit human oversight dependencies in agentic systems by saturating reviewers with intervention requests, decision fatigue, or cognitive overload. The HITL gate technically exists, but human capacity cannot keep up with multi-agent operation rates. The result is rushed approvals, reduced scrutiny, and systemic decision failures.
Human Manipulation occurs when attackers exploit user trust in AI agents to influence human decision-making without the user realising they are being misled. In compromised agentic systems, the adversary turns the agent itself into the social-engineering vector, coercing users into processing fraudulent transactions, clicking phishing links, or spreading misinformation. The implicit trust users place in AI responses reduces scepticism, making this an effective channel for social engineering through AI.
Step 6 Multi-Agent
Does the AI system rely on multiple interacting agents?
Agent Communication Poisoning is the manipulation of inter-agent communication channels to inject false information, misdirect decisions, or corrupt shared knowledge in multi-agent systems. The threat is distinct from classical message tampering because the recipient agents *reason* over the message. Small, plausible-looking distortions can drive substantial behavioural change.
Rogue Agents are malicious or compromised agents operating inside a multi-agent system, exploiting trust mechanisms or workflow dependencies to manipulate decisions, exfiltrate data, or execute denial-of-service. The OWASP catalog includes the *infectious backdoor* concept: a single compromised agent's reasoning chain spreads through outputs that other agents consume, propagating malicious logic across the network.
Human Attacks on Multi-Agent Systems occur when adversaries exploit inter-agent delegation, trust relationships, and task dependencies to bypass security controls, escalate privileges, or disrupt workflows. The attacker treats the agent topology as the attack surface: by injecting deceptive tasks, rerouting priorities, or overwhelming agents with excessive assignments, they manipulate AI-driven decision-making in ways that are difficult to trace and mitigate because no single agent holds the full picture.
MAS Guide Multi-agent extensions of Step 6
Inter-agent transport lacking encryption, authentication, or integrity controls is vulnerable to eavesdropping, tampering, and spoofing.
Multiple agents executing similar strategies inadvertently produce emergent behaviour that disrupts blockchain operation or market price.
Multiple MCP clients sharing one server: a server isolation bug lets one client interfere with another's operations or data.