← Atlas · Mitigations Tier 2 · Real-composable

MITIGATION · m-context-isolation

Context isolation — separate untrusted content from system instructions

An LLM processes everything in its context window as a single stream of tokens; it has no innate ability to tell instructions apart from data. If an attacker can place content where the model treats it as instruction, they control the agent. Context isolation prevents that by structurally separating untrusted content from system instructions at prompt construction time, so the boundary is enforced before the model ever sees the input.

Last reviewed 2026-05-12 · Status: published · Evidence →

At a glance

MATURITY

Tier 2

Available off-the-shelf or as a documented pattern, but newer or less broadly proven. Expect integration work and some operational nuance.

PLACES ON

node

Restricted to node kinds: agent

COVERAGE

2 threats

T1 · T6

TRADE-OFFS

LAT

low

COST

low

DEV

medium

Latency · cost · UX friction · dev effort.

TL;DR

An LLM has no innate ability to distinguish instructions from data; that boundary must be enforced by how the prompt is constructed, not by the model's judgment at runtime.
Every untrusted input (user message, retrieved document, tool output, peer-agent reply) must land in a structurally distinct, labelled segment that the system prompt explicitly marks as data only.
Three composable patterns cover the main implementation surface: role-based segmentation (system vs user/tool roles), tagged delimiters (XML tags or --- separators), and channel separation (distinct labelled segments per input origin).
This is a structural control, not a classifier. It does not attempt to detect every injection variant; it reduces the attack surface by making it architecturally harder for injected content to reach instruction-level processing.

How it behaves

Agent receives external input (user message, retrieved document, tool output, peer-agent reply)

Is the input placed in a structurally isolated segment (role, XML tag, or labelled channel) that the system prompt instructs the model to treat as data only?

Model reasons over content; injection attempt cannot rewrite instructions

Structural violation, untrusted content may be interpreted as instruction

Apply wrapping at every untrusted-content entry path, user message, RAG retrieval, tool result, and inter-agent message each need their own labelled segment.

What it is

A large language model processes its context window as a flat sequence of tokens. There is no hardware boundary between the segment containing system instructions and the segment containing a retrieved document or a user message; the model must infer which is which from positional and syntactic cues. That inference can be defeated. An attacker who can place content where the model treats it as instruction can redirect the agent's behaviour without having any access to the system prompt itself.

Context isolation addresses this by enforcing the distinction structurally, at prompt construction time, before the model sees the input. Untrusted content is placed in a named, delimited segment and the system prompt instructs the model to treat anything inside that segment as data to reason over, not commands to follow. The model still reads the content; it is not filtered out. What changes is the trust level the model is told to assign it.

This is a structural control, not a classifier. It does not attempt to recognise and block every injection variant. It reduces the attack surface by making it architecturally harder for injected instructions to reach instruction-level processing. Three implementation patterns appear in production agentic systems, and they compose well:

Role-based segmentation. Chat Completions APIs assign each message a role: system, user, assistant, tool. Attacker-controlled content belongs in the user or tool role, never system. The system prompt explicitly instructs the model to refuse instruction-shaped patterns that appear in those roles. OpenAI's prompt-engineering guidance calls this the "chain of command" hierarchy; Anthropic and Azure OpenAI document the same pattern.
Tagged delimiters. Wrap untrusted content in explicit markers (XML tags such as <document>…</document>, <tool_result>…</tool_result>, or separator lines) and instruct the system prompt that text inside those markers is data only. Anthropic recommends XML-tag wrapping as the canonical pattern for Claude. Azure OpenAI's guidance recommends --- separators and uppercase section headings to structurally separate grounding data from instructions.
Channel separation. Different origins of untrusted content (user message, retrieved document, tool output, peer-agent reply) occupy distinct labelled segments so that content provenance is preserved into the model's reasoning trace. The OWASP LLM Top 10 v2025 LLM01 entry names this the "segregate external content from user prompts" recommendation.

Context isolation is one of three structural controls for preventing reasoning manipulation. The complementary controls are input sanitisation and intent attestation. Sanitisation runs first (cleans inbound content), then isolation (separates instruction from data), then intent attestation (binds output to declared user intent). No single control in the three is sufficient on its own.

Detection signals

Untrusted-content segment size. A segment exceeding its authored size limit indicates the channel is being used as an injection surface.
Verbatim strings from untrusted segments appearing in LLM output. Indicates the model treated data as instruction and echoed or acted on it.

Threats it covers

T1 Memory Poisoning −1 severity step

WHY IT HELPS Indirect prompt injection plants instruction-shaped content in a source the agent retrieves later (memory, a RAG document, or a tool result) so it executes when the agent reads it rather than when a user sends it. Structural isolation places every retrieved document in a labelled segment the system prompt marks as data only, so injection content in that segment cannot be interpreted as instruction.
T6 Intent Breaking and Goal Manipulation −1 severity step

WHY IT HELPS Direct prompt injection submits instruction-shaped content through the normal user input channel with the intent to override the system prompt or extract restricted behaviour. Constraining user input to a non-instruction context segment removes the structural path by which that content could reach instruction-level processing.

Principle coverage

Defence-in-Depth stage: Prevent — and it advances:

Assume Breach Assume Breach plans for an agent operating in an environment that is already partially compromised. Context isolation limits the blast radius by ensuring that hostile content planted in any channel the agent reads cannot reach instruction-level processing, so a compromised upstream source does not automatically translate into a compromised agent.
Sandboxing & Isolation Sandboxing confines an agent's execution environment; context isolation confines the trust level of content inside that environment. The two work at different layers: sandboxing restricts what the agent can reach externally, while context isolation restricts what the agent treats as authoritative internally.
Provenance & Trust-tagging Context isolation assigns a trust level to content at the point it enters the prompt, preserving that assignment into the model's reasoning. The labelled segment structure is itself a provenance record: it records the origin of each piece of content and the trust level the system declared for it at ingest.
The Lethal Trifecta The lethal trifecta requires an agent with broad permissions, network egress, and access to sensitive data to receive and act on attacker-controlled instructions. Context isolation removes the instruction pathway: untrusted content that could carry those instructions is confined to a data segment the model is told not to act on.

Design & governance principles (open design, economy of mechanism, accountability, …) are architectural, not advanced by a single placed control.

Implementation options

Four implementation options covering the structural, managed-detection, and architectural layers. The role/delimiter patterns are self-build with no additional infrastructure; Prompt Shields adds a managed classifier layer on top; the dual-LLM pattern provides the strongest architectural separation for high-risk pipelines.

Anthropic XML-tag wrapping Place untrusted content inside named XML tags in the user role and instruct the system prompt to treat anything inside those tags as data only. Anthropic documents this as the canonical untrusted-content pattern for Claude.

Why choose it: Best when your stack targets the Anthropic Messages API or any OpenAI-compatible Chat Completions API. Zero additional infrastructure, the isolation is in how you construct the messages array. The system role holds instructions; user/tool roles hold all external content, further wrapped in named XML tags (e.g. <document>, <tool_result>, <user_message>) so the model has explicit data-vs-instruction markers even within the user role.

More details:

Microsoft Prompt Shields REST endpoint in Azure AI Content Safety that independently classifies user prompts and grounding documents for injection attacks before the LLM call. Returns per-segment attackDetected booleans for userPromptAnalysis and documentsAnalysis.

Why choose it: Best as the detective complement to structural isolation. Prompt Shields catches injection attempts that slip past role/tag boundaries, the managed classifier adds roughly 50 to 200 ms per call. The document-attack detection mode is specifically designed for RAG pipelines where grounding documents are the injection surface. Pair with structural role/tag isolation; do not use as the sole control.

More details:

Dual-LLM pattern Run two separate model instances: a Privileged LLM that holds instructions and has tool access, and a Quarantined LLM that processes untrusted external content but has no tool access. A controller mediates between them and never forwards quarantined output directly to the Privileged LLM as instruction.

Why choose it: Best when the threat model requires the strongest available architectural separation, for example, an email-reading agent that may receive adversarially crafted messages. Structural role/tag isolation can be bypassed if the model follows instructions inside a tag; the dual-LLM pattern removes that residual risk by ensuring the quarantined model cannot call tools or affect the privileged context. The cost is increased complexity, latency (two model calls per turn), and reduced capability. No off-the-shelf product implements this for agentic pipelines; it is a self-build architectural pattern.

More details:

The dual LLM pattern, Simon Willison (2023) ↗

Per-origin channel separation A custom wrapping function that strips framework control tokens, normalises Unicode, trims untrusted content to a size limit, and wraps each input with a source-attributed XML tag before insertion into the message array. Different input origins (user upload, RAG corpus, tool result, peer agent) get distinct labelled segments, not a single generic untrusted tag.

Why choose it: The only option that fully implements channel separation with per-origin provenance. Required when you need audit trails that preserve content origin into the LLM reasoning trace, or when you need to enforce per-source size limits. Should be the default implementation pattern regardless of which classifier layer sits on top.

More details:

Trade-offs

Structural wrapping (role separation, XML tags) is sub-millisecond with no perceptible latency; it is the cheapest layer to deploy.
Adding Prompt Shields adds roughly 50 to 200 ms per call and a small per-document cost; budget for this in latency-sensitive RAG pipelines.
The dual-LLM pattern adds a full second model call per turn; reserve it for pipelines where the injection surface justifies the latency and capability cost.
The main development effort is not the wrapping code itself, it is enumerating every entry path where untrusted content can reach the LLM context across a multi-agent system; missed entry paths are missed controls.

When NOT to use

Do not apply context isolation to agents that consume only first-party, fully trusted structured data, there is no untrusted-content surface to partition.
Do not apply where the system prompt itself is user-supplied (fully open-ended chat with user-controlled instructions), there is no trusted system segment to protect.
Do not treat structural isolation as a sufficient sole control for high-stakes deployments; role-boundary-crossing injection is documented and no structural fix is complete.

Limitations

Context isolation does not defend against fine-tuned models that follow instructions regardless of segment role, adversarial prompts that survive role boundaries are documented in Greshake et al. 2023 and remain an active research problem.
Prompt Shields document-attack detection applies a classifier; it may miss novel injection variants and may flag legitimate documents. False-positive rates are deployment-specific and require calibration.
The dual-LLM pattern is vulnerable to social engineering: a user could be induced to copy quarantined output back into the privileged context, defeating the architectural separation.
Context isolation does not protect against attacks where the untrusted content itself is the target of exfiltration, an agent that summarises private memory back to the user is an output-edge problem, not an input-edge one.

Maturity tier reasoning

Tier 2 fits because the structural patterns (role separation, XML-tag wrapping) are implemented natively by every major LLM vendor and documented in their production guidance; the wrapping code is straightforward application development.
What keeps this out of Tier 1 is the absence of an industry-standard schema for tagging untrusted content. Every vendor uses different markers, no published benchmark scores the pattern consistently across providers, and the residual gap of instruction-following across role boundaries is an active research area.
Prompt Shields is production-available in Azure AI Content Safety; the dual-LLM pattern is an architectural pattern with no off-the-shelf implementation for agentic pipelines.

Last verified against upstream docs: 2026-05-30.

PLACEMENT

On the canvas, this control can be placed on:

node

Valid node kinds: agent

Place it on the canvas →

MAESTRO LAYERS

ATLAS TECHNIQUES

AML.T0051 LLM Prompt Injection
Adversary crafts prompt content (direct or indirect via documents, web pages, tool outputs) so the model interprets attacker text as instructions and acts on it.
AML.T0065 LLM Prompt Crafting
Adversary engineers prompt content to maximise the model's likelihood of taking a specific attacker-favourable action. This is the precursor to most prompt-based attacks.
AML.T0080 AI Agent Context Poisoning
Adversary contaminates an agent's context store (short-term scratchpad, vector memory, conversation history) so future reasoning is biased toward attacker goals.

ATLAS MITIGATIONS

AML.M0030 Restrict AI Agent Tool Invocation on Untrusted Data
Prevent agent tool calls from being driven by attacker-controllable content: block or sanitise prompt injection on every untrusted input.
AML.M0031 Memory Hardening
Trust boundaries and secure write paths around agent memory so attacker-controlled content cannot persist or be replayed as instruction.
AML.M0032 Segmentation of AI Agent Components
Define hard security boundaries around tools, data sources, and inter-agent channels so a compromise in one component does not propagate.

TRADE-OFFS

latency low
cost low
ux friction low
dev effort medium

PLAYBOOKS

OWASP v1.1 playbook that recommends this control:

P1 Preventing AI Agent Reasoning Manipulation