← Mitigation · m-context-isolation

EVIDENCE TRAIL

Context isolation — separate untrusted content from system instructions

Verbatim excerpts from the upstream sources cited on the mitigation page, with what each source does and does not prove. The title "context isolation" is Helmwart's normalised label — no upstream document uses the phrase verbatim. The strongest upstream anchor is OWASP Top 10 Agentic 2026 §ASI01, which states that "agents and the underlying model cannot reliably distinguish instructions from related content" — the root cause this control addresses structurally.

Last cross-checked against upstream sources: · 8 sources

References

Each entry shows what the source supports and what it does not prove.

Reference 1
Version 2026 · published December 2025

OWASP Top 10 for Agentic Applications 2026

§ASI01 Agent Goal Hijack — Description

"Due to inherent weaknesses in how natural-language instructions and related content are processed, agents and the underlying model cannot reliably distinguish instructions from related content."

Supports: Provides the root-cause rationale for context isolation: the LLM cannot natively separate instruction from data, so the partition must be enforced structurally by the developer. The strongest upstream statement of why this control exists.

Does not prove: Does not prescribe the specific structural mechanism (role-based segmentation, XML tags, or channel labelling). Mitigation guideline 3 ("Define and lock agent system prompts") is adjacent but stops short of naming the isolation pattern explicitly.

Reference 2
Version 2026 · published December 2025

OWASP Top 10 for Agentic Applications 2026

§ASI01 Agent Goal Hijack — Prevention and Mitigation Guidelines, item 1

"Treat all natural-language inputs (e.g., user-provided text, uploaded documents, retrieved content) as untrusted. Route them through the same input-validation and prompt-injection safeguards defined in LLM01:2025 before they can influence goal selection, planning, or tool calls."

Supports: Mandates treating all external natural-language inputs as untrusted — the precondition for any context-isolation scheme. Establishes the threat model this control defends against.

Does not prove: Focuses on validation routing rather than structural segmentation of the prompt itself. Does not specify role-based or XML-tag partitioning as the implementation mechanism.

Reference 3
Version 2026 · published December 2025

OWASP Top 10 for Agentic Applications 2026

§ASI06 Memory & Context Poisoning — Description

"Ingestion sources such as uploads, API feeds, user input, or peer-agent exchanges may be untrusted or only partially validated."

Supports: Names the entry-point categories (uploads, API feeds, user input, peer-agent exchanges) that context isolation must cover. Validates the MDX's claim that T1 Memory Poisoning is blunted by structural separation of untrusted memory-ingest paths.

Does not prove: ASI06 mitigations focus on memory validation, snapshots, and trust scoring — not on prompt-construction partitioning. Context isolation addresses the ingestion boundary but does not solve persistent memory corruption once data is stored.

Reference 4
v1.1 · published December 2025

OWASP Agentic AI — Threats & Mitigations v1.1

§T1 Memory Poisoning — Threat Description (threat table, page 16)

"Memory Poisoning involves exploiting an AI's memory systems, both short and long-term, to introduce malicious or false data and exploit the agent's context. This can lead to altered decision-making and unauthorized operations."

Supports: Confirms the MDX's T1 coverage claim and characterises the attack class that structural context isolation limits: untrusted content reaching the context window and being interpreted as instruction.

Does not prove: T1's listed mitigations ("memory content validation, session isolation, anomaly detection") are complementary, not identical to prompt-construction partitioning. The document does not name XML-tag or role-based segmentation as a T1 countermeasure.

Reference 5
Published July 2024

NIST AI 600-1 — Generative AI Profile (NIST AI RMF)

§2.9 Information Security

"prompt injection involves modifying what input is provided to a GAI system so that it behaves in unintended ways. In direct prompt injections, attackers might craft malicious prompts and input them directly to a GAI system, with a variety of downstream negative consequences to interconnected systems. Indirect prompt injection attacks occur when adversaries remotely (i.e., without a direct interface) exploit LLM-integrated applications by injecting prompts into data likely to be retrieved."

Supports: Authoritatively names both direct and indirect prompt injection as an Information Security risk for GAI systems. Establishes the threat class that context isolation defends against within the NIST AI RMF framework.

Does not prove: NIST AI 600-1 describes the threat but provides no action item for structural prompt separation. The MDX cited MS-1.1 and MG-3.2 as covering "structural prompt separation" — this is a misattribution. MS-1.1 actions cover content-provenance measurement; MG-3.2 actions cover explainable AI and content filtering for continuous improvement. Neither section addresses context isolation or prompt partitioning. Corrected to §2.9 which is where the document actually discusses prompt injection.

Reference 6
arxiv:2302.12173 · February 2023

Greshake et al. 2023 — "Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection"

Abstract / core finding

"LLM-Integrated Applications blur the line between data and instructions."

Supports: Independent peer-reviewed evidence that the data/instruction boundary blur is a real, exploitable vulnerability in production systems — the precise weakness this control closes. Demonstrates that attacks succeed because retrieved external data is processed identically to user instructions.

Does not prove: Published before role-based segmentation was widely available in chat-completion APIs. Does not evaluate whether role-based or XML-tag isolation reduces attack success rate. Notes attacks against Bing Chat and code-completion engines; generalisability to current agentic frameworks is inferred, not measured.

Reference 7
Continuously updated; accessed 2026-05-29

Anthropic Prompt Engineering Documentation — Use XML tags

No verbatim excerpt pulled yet — open the original to verify the cited section.

Supports: Production-canonical guidance from Anthropic for Claude API customers, recommending XML-tag wrapping (e.g., <document>, <example>) as the standard pattern for marking untrusted or external content in the prompt. Directly implements the "tagged delimiters" variant of context isolation.

Does not prove: Vendor-specific guidance for Claude; does not benchmark attack-success reduction rates. Does not cover OpenAI, Gemini, or open-source model contexts. Page content was not directly extractable at time of cross-check (redirect to consolidated docs).

Reference 8
ATLAS catalogue (continuously updated)

MITRE ATLAS — AML.M0030, AML.M0031, AML.M0032

No verbatim excerpt pulled yet — open the original to verify the cited section.

Supports: ATLAS mitigations cited in the MDX frontmatter as applicable to AML.T0051 (Prompt Injection), AML.T0065, and AML.T0080. Provides the ATLAS-framework framing for the threat surface this control operates against.

Does not prove: ATLAS mitigation pages for AML.M0030, AML.M0031, and AML.M0032 returned 404 at time of cross-check — individual IDs could not be verified. The ATLAS catalogue URL structure may have changed. Verbatim descriptions cannot be confirmed and are omitted.