Primer

RAG

Retrieval-Augmented Generation is the pattern where a language model answers a question (or plans an action) using context fetched from an external store at inference time, rather than only what is encoded in the model's weights. The store is typically a vector database holding embeddings of documents the system can search over.

Why agentic systems lean on RAG heavily

Two reasons. First, agents need access to information that did not exist at training time (current customer data, current policies, current documents), so retrieval is the natural way to bring it in. Second, keeping organisational knowledge out of the model and inside a retrieval store is operationally simpler than fine-tuning, and gives finer-grained access control.

The OWASP document treats RAG as a core supporting service in its reference architecture rather than as a tool. The distinction matters: a tool is something the agent decides to call; the retrieval surface is consulted automatically as part of forming a response, often invisibly to the user.

Why this is a primary attack target

The retrieval surface is data that becomes part of the prompt. Anything that gets indexed into the store can subsequently steer the agent. Indirect prompt injection (instructions hidden inside an apparently benign document the user uploaded, or inside a web page the agent crawled) is the most common way this surface is abused. From the agent's perspective the malicious instruction looks like context, not user input.

Three classes of failure recur:

Knowledge poisoning. Adversarial content placed in the store is later retrieved and acted on. Persistent, because retrieval repeats across sessions.
Embedding-level attacks. Adversarial inputs designed to land near sensitive content in the embedding space, allowing retrieval that bypasses access control.
Cross-tenant leakage. Failure to apply per-tenant access control at the retrieval boundary causes one user's query to return another user's data.

The two paths

A RAG system has a write path (top: ingestion into the store) and a read path (bottom: retrieval into the prompt). Threats land where the two paths meet the store and the prompt assembly. Boxes are clickable to the threat pages.

How OWASP's catalog reflects RAG-specific risks

OWASP T1 Memory Poisoning covers the long-term memory and vector-store poisoning case explicitly. T5 Cascading Hallucinations applies because retrieved content gets quoted, cited, and reinforced through reflection loops. T12 Agent Communication Poisoning applies in multi-agent systems where one agent's retrieval feeds another agent's reasoning.

OWASP also notes that RAG-specific concerns overlap heavily with LLM08:2025 Vector and Embedding Weaknesses from the LLM Top 10. Helmwart cross-references that document rather than re-deriving its content.

Where this fits in Helmwart

Read this primer if your deployment uses RAG (almost all production agents do) and you want to understand why the retrieval boundary is treated as a primary threat surface in the catalog rather than an implementation detail.

Where to go next

T1 Memory Poisoning: the primary RAG-targeted threat, covering vector-store poisoning, adversarial embeddings, and cross-tenant leakage in full.
T5 Cascading Hallucination Attacks: how retrieved content amplifies model errors through reflection loops.
T8 Repudiation and Untraceability: why retrieval provenance must be logged, not just the model output.
T12 Agent Communication Poisoning: the multi-agent extension, where one agent's retrieval poisons another's reasoning.
Memory content validation and context isolation: the two mitigations that defend the write and read paths respectively.
Security principles: how Defence-in-Depth, Zero Trust, and Least Privilege compose against RAG-specific threats.
Agents primer: where the RAG retrieval path fits inside the full OWASP reference architecture.

Source: OWASP Agentic AI: Threats and Mitigations v1.1 (Dec 2025), §Reference Threat Model and §T1 Memory Poisoning.