Atlas · Introduction

Why agentic AI is a new security problem

A traditional application does what its code says. An agentic system does what a language model decides: at runtime, from instructions and data it reads in the moment, often without a human in the loop before it acts. That single shift, from deterministic code to a probabilistic decision-maker that can take actions, is why the security problems in this handbook do not map cleanly onto anything that came before, and why "we already do application security" is a necessary but insufficient answer. This chapter explains the shift, then shows how the rest of the handbook is built around it.

What "agentic" actually adds

Strip away the marketing and an AI agent is three things wired together: a language model that reasons, a set of tools it can call (search, code execution, APIs, payments), and a loop that lets it plan, act, observe the result, and act again toward a goal. Optionally it has memory that persists across turns and the ability to talk to other agents. Each of those is useful, and each removes a safeguard that classic software quietly relied on.

In ordinary software, the path from input to a dangerous action is written by a developer and reviewed. In an agentic system that path is decided by the model from whatever text is in front of it. To the model, there is no hard line between the instructions you gave it and the data it just read from a web page, a document, or another agent. Instructions and data share one channel. That is the root of most of what follows: an attacker who controls any text the agent reads is, in effect, writing part of its program.

Why classic AppSec is necessary but not sufficient

You still need everything you already do: authentication, least privilege, input handling, dependency hygiene, logging. None of it goes away. But three properties of agents break assumptions those controls were built on:

Non-determinism. The same input can produce different outputs, so a decision is not a fixed function but a weighted roll of the dice. A control that passes in testing can fail in production, and an attacker who can retry gets free re-rolls (see T48).

Untrusted input reaching trusted action. Because instructions and data share a channel, a poisoned document or web page can redirect what the agent does. This is the classic "confused deputy", now armed with tools. When an agent simultaneously has access to private data, exposure to untrusted content, and a way to send data out, it is structurally exploitable: the lethal trifecta.

Autonomy and scale. A wrong inference is no longer a wrong answer on a screen. It is a refund issued, a record deleted, a transaction signed, repeated across thousands of runs with no human to notice the one that went wrong. Autonomy turns model errors into real-world consequences at machine speed.

The attacker's mental model

An attacker against an agentic system is rarely breaking cryptography or exploiting a buffer. They are looking for any text the agent will read and treat as authoritative: a comment in a pull request the coding agent reviews, a field in a record the support agent summarises, a tool result, a message from another agent. From that foothold they aim to do one of a few things: make the agent act outside its intended scope, exfiltrate data it can see, corrupt what it remembers so future runs misbehave, or simply wear down a probabilistic control by asking again. The handbook's threat catalogue is, in effect, the catalogue of these moves.

The defender's mental model

The defining move of agentic defence is to stop trusting the model as an enforcement point. The model is a brilliant, manipulable, non-deterministic component; treat its output as a proposal, not a decision. The controls that actually hold a line are the deterministic ones around it: a policy gate at the tool boundary that re-checks every call, scoped and short-lived credentials so a compromised agent can reach little, sandboxes that bound what any action can touch, and human review on the consequences that matter. Layer them (defence-in-depth), because each layer is independently fallible, and give every agent the least authority its task needs so a bad decision has a small blast radius.

How to read this handbook

Everything here is one interpretive layer over the OWASP Agentic AI and MAS guides, MAESTRO, and MITRE ATLAS, assembled so it teaches rather than just maps. The pieces fit together like this:

Primers give you the vocabulary (agents, RAG, MCP, agent-to-agent, the lethal trifecta). If the terms above were unfamiliar, start there.
Threats are how agentic systems fail: 49 catalogued failure modes, each with how it happens, how to spot it, and how to fix it.
Principles are the design ideas that prevent whole classes of those failures: the "why" behind the controls.
Mitigations are what you actually deploy, tiered by how proven they are.
Playbooks sequence those controls into prevent / detect / respond programmes.
Case studies work three real systems end to end, so you can see the threats cluster in practice.
And the canvas lets you model your own architecture and have the engine surface the threats and recommend controls.

If you read nothing else, read one threat page top to bottom (T48 is a good first one) and one principle (Defence-in-Depth). Between them they contain the whole shape of the problem and the whole shape of the answer.