T18: RAG Input Manipulation Leading to Policy Bypass

Definition

An attacker crafts an expense claim description that does not directly violate any policy rules but is semantically similar to approved claims in the vector store that should have been rejected. The Retrieval-Augmented Generation (RAG) system’s similarity search retrieves those flawed precedents, and the agent approves the new claim on that basis. The threat is distinct from T2 Tool Misuse, which involves direct commands; T18 manipulates the data used for retrieval rather than the tool invocations themselves.

What it looks like in practice

In a Robotic Process Automation (RPA) expense reimbursement pipeline, the agent queries a vector store of previously approved claims to inform its approval decision. An attacker submits a claim for a “business development lunch” at an unusually high cost. The description is semantically close to a batch of incorrectly approved extravagant-meal claims already in the store. The similarity search surfaces those flawed precedents; the agent treats them as evidence that comparable claims are acceptable and approves the new one.

A second variant exploits date-format ambiguity: claims with unusual date strings cause the agent to misread the policy applicability period and approve expenses that fall outside the authorised window.
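
One way to close the date-format variant is to refuse anything that is not a plain ISO 8601 calendar date before the applicability check runs. The sketch below is illustrative only: the function names `parse_claim_date` and `in_policy_window` are hypothetical and do not come from the source pipeline.

```python
from datetime import date


def parse_claim_date(raw: str) -> date:
    """Accept only the ISO 8601 calendar form YYYY-MM-DD; raise ValueError otherwise."""
    # Check the shape explicitly: newer Python versions let date.fromisoformat
    # accept further ISO variants (such as "20250304"), which we do not want here.
    if not raw.isascii() or len(raw) != 10 or raw[4] != "-" or raw[7] != "-":
        raise ValueError(f"non-standard date string: {raw!r}")
    return date.fromisoformat(raw)


def in_policy_window(raw: str, start: date, end: date) -> bool:
    """Return True only when the strictly parsed date lies inside the window."""
    return start <= parse_claim_date(raw) <= end
```

Failing closed on an unparseable string means an ambiguous value such as `03/04/2025` is rejected or escalated, and the agent never has to guess what it means.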

Why it’s dangerous in multi-agent context

RAG-augmented agents do not re-evaluate policy rules from first principles on each request. They retrieve precedent. If past approvals were incorrect, or if the vector store has been seeded with examples that normalise borderline spending, the similarity mechanism actively works against the agent’s stated policy enforcement function. Autonomy compounds the risk: the agent acts on retrieved precedent without surfacing the retrieval decision to a human reviewer unless a separate threshold rule flags the claim amount. In a hierarchical multi-agent setup, a subordinate extraction agent may also normalise the description before the policy agent queries the vector store, amplifying the semantic drift.

Detection signals

Watch for RAG retrievals where the returned precedents semantically diverge from the current claim’s legitimate characteristics yet still drive an approval.

A retrieved precedent whose cosine similarity to the query sits between 0.6 and 0.75 (close enough to surface, far enough to be suspicious) and which is then cited as the sole basis for approval. Log the similarity score alongside each approval decision and alert when it falls in this mid-range band for high-value claims.
An approval decision citing precedents all from the same submission window (i.e. a tight cluster of dates) rather than the normal spread across months. This temporal-clustering anomaly in the retrieved corpus suggests a seeded batch.
A sudden rise in the proportion of “business development” or open-category descriptors among approved claims relative to the rolling 30-day baseline. This is a content-drift metric on approved claim categories.
Claims whose date-field values fall outside the standard ISO 8601 format yet pass the applicability check. Log the raw date string from the claim payload and alert on any non-standard format reaching the approval decision.
Two or more retrievals within a short window that share near-identical embedding vectors (cosine similarity > 0.98) but reference different submitters. This pattern is consistent with a seeded-precedent batch from a single attacker.
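
Two of the signals above (the mid-band sole-precedent approval and the near-identical embeddings from different submitters) can be sketched as follows. This is a minimal illustration, not a production detector: the `Retrieval` record, the function names, and the `high_value` cut-off of 500 are assumptions; only the 0.6 to 0.75 band and the 0.98 threshold come from the text.

```python
import math
from dataclasses import dataclass


@dataclass
class Retrieval:
    doc_id: str
    submitter: str
    score: float            # cosine similarity between the query and this precedent
    embedding: list[float]  # embedding of the precedent itself


def cosine(a: list[float], b: list[float]) -> float:
    """Plain cosine similarity; returns 0.0 for a zero-length vector."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mid_band_alert(cited: list[Retrieval], claim_amount: float,
                   high_value: float = 500.0,          # hypothetical threshold
                   band: tuple[float, float] = (0.60, 0.75)) -> bool:
    """Flag a high-value claim whose single cited precedent scores in the mid band."""
    if claim_amount < high_value or len(cited) != 1:
        return False
    return band[0] <= cited[0].score <= band[1]


def near_duplicate_pairs(retrievals: list[Retrieval],
                         threshold: float = 0.98) -> list[tuple[str, str]]:
    """Return doc-id pairs with near-identical embeddings but different submitters."""
    pairs = []
    for i, first in enumerate(retrievals):
        for second in retrievals[i + 1:]:
            if (first.submitter != second.submitter
                    and cosine(first.embedding, second.embedding) > threshold):
                pairs.append((first.doc_id, second.doc_id))
    return pairs
```

The pairwise comparison is quadratic in the number of retrievals, which is acceptable for the handful of precedents returned per query but would need an index for corpus-wide sweeps.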

Mitigations

Curate and audit the “approved examples” corpus used to seed the vector store; remove incorrectly approved precedents on discovery.
Apply a deterministic policy-rule re-validation step after retrieval to prevent retrieved precedent from overriding hard rules.
Enforce similarity-score thresholds: reject retrievals below a confidence floor rather than treating any match as authoritative.
Log retrieval results alongside the approval decision so auditors can reconstruct which precedents justified each claim.
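
The second, third, and fourth mitigations can be combined in a single decision function, sketched below under stated assumptions. The category limits in `HARD_LIMITS`, the `SIMILARITY_FLOOR` of 0.80, and the three decision labels are hypothetical values chosen for illustration; a real deployment would load them from its own policy source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    category: str
    amount: float


@dataclass(frozen=True)
class Precedent:
    doc_id: str
    score: float  # similarity score reported by the vector store


HARD_LIMITS = {"meal": 75.0, "travel": 1500.0}  # hypothetical per-category caps
SIMILARITY_FLOOR = 0.80                         # hypothetical confidence floor


def decide(claim: Claim, precedents: list[Precedent], audit_log: list[dict]) -> str:
    """Apply hard rules after retrieval, so precedent can never override them."""
    usable = [p for p in precedents if p.score >= SIMILARITY_FLOOR]
    limit = HARD_LIMITS.get(claim.category)
    if limit is None:
        decision = "escalate"   # unknown category: send to a human reviewer
    elif claim.amount > limit:
        decision = "reject"     # hard rule wins regardless of any precedent
    elif usable:
        decision = "approve"
    else:
        decision = "escalate"   # no precedent above the floor
    # Record every retrieved precedent, including those below the floor,
    # so auditors can reconstruct what the agent saw.
    audit_log.append({
        "claim": claim,
        "decision": decision,
        "precedents": [(p.doc_id, p.score) for p in precedents],
    })
    return decision
```

In this arrangement a closely matching but flawed precedent can at most fail to rescue a claim; it cannot lift a claim over a hard limit.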

Relation to base threat (T1–T17)

T18 extends T2 Tool Misuse. Where T2 targets tool invocations directly, T18 manipulates the data layer that informs retrieval, turning the RAG vector store into the attack surface. T49 (Semantic Drift in Expense Policy Embeddings) addresses the passive variant of the same surface: stale embeddings caused by operational neglect rather than adversarial shaping.

OWASP Top 10 for Agentic Applications 2026

The Agentic Top 10 (ASI01 through ASI10) is a separate practitioner-facing publication that maps onto the master Threats & Mitigations threat numbering. T18 is covered by the following Top 10 entries:

ASI06 Memory & Context Poisoning (primary)

An adversary writes malicious or misleading data into an agent's persistent memory or shared vector store, so that every future session, and every peer agent reading from the same store, operates on corrupted context. The defining difference from single-turn injection (ASI01) is that the poisoned data survives session reset; the agent's reasoning drifts without any new attacker input.

OWASP LLM Top 10: LLM01:2025, LLM04:2025, LLM08:2025

ASI01 Agent Goal Hijack (contributing)

An attacker manipulates an agent's objective, task selection, or decision pathway (via injected prompts, deceptive tool outputs, forged peer messages, or poisoned retrieval data) so that the agent pursues the attacker's goal rather than the operator's. Unlike a single-turn injection, the harm compounds across many authorised steps before any drift is visible.

OWASP LLM Top 10: LLM01:2025, LLM06:2025

Source: OWASP Top 10 for Agentic Applications 2026 (Dec 2025) · the Top 10 is a compass into the master Threats & Mitigations taxonomy, not a replacement for it.

Design principles at stake

When T18 is present, these are the security design principles being violated or tested; the mitigations below are how you restore them.

Defence-in-Depth: The RAG system's own similarity mechanism becomes the attack vector: flawed precedents in the vector store are retrieved and treated as authoritative, so the agent approves claims it should reject without any individual component behaving incorrectly. Depth means the retrieval result cannot be the final word: a deterministic policy-rule re-validation step is applied after retrieval so that retrieved precedent cannot override hard rules, similarity-score thresholds reject retrievals below a confidence floor rather than treating any match as authoritative, and retrieval results are logged alongside each approval decision so that a chain of flawed precedents is visible to auditors. Auditing and curating the approved-examples corpus removes the poisoned foundation before new claims can exploit it.
Memory & RAG Integrity: The vector store is the agent's long-term reference memory for policy enforcement, so incorrectly approved precedents seeded into it have the same effect as memory poisoning: future decisions inherit the false standard without any further attacker intervention. Integrity controls must treat every write to the approved-examples corpus as security-relevant: provenance tags on every ingested chunk, a staging-review step before new examples become eligible for retrieval, and content-hash verification on read so that tampered records are detected. The deterministic post-retrieval re-validation step is the fail-safe that holds even when the store's integrity is imperfect.
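
The three integrity controls named above (provenance tag, staging review, content-hash verification on read) might look like this in miniature. The `Chunk` record and the `ingest` and `read_verified` helpers are hypothetical names introduced for illustration, and keeping the digest next to the text is a simplification.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    text: str
    source: str     # provenance tag: who or what wrote this example
    reviewed: bool  # True once the staging review has accepted it
    sha256: str     # digest of the text computed at ingest time


def _digest(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def ingest(text: str, source: str, reviewed: bool = False) -> Chunk:
    """Create a chunk with its provenance tag and content hash; unreviewed by default."""
    return Chunk(text=text, source=source, reviewed=reviewed, sha256=_digest(text))


def read_verified(chunk: Chunk) -> str:
    """Return the text only if the chunk passed review and its hash still matches."""
    if not chunk.reviewed:
        raise PermissionError(f"chunk from {chunk.source!r} has not passed staging review")
    if _digest(chunk.text) != chunk.sha256:
        raise ValueError(f"content hash mismatch for chunk from {chunk.source!r}")
    return chunk.text
```

A digest stored in the same record only detects accidental or partial tampering; an attacker who can rewrite the whole record can also rewrite the hash, so a stronger design would keep digests in a separate, write-restricted location or sign them.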

Recommended mitigations

Auto-generated from the mitigation catalog: every mitigation whose coverage map includes T18, sorted by maturity tier (Tier 1 production-canonical first, then Tier 2, then Tier 3 research-stage).

Tier 2 Shared-memory ACL (per-agent, per-namespace read/write access control on shared vector stores)

When multiple agents share a single vector store, the access boundaries between them are not enforced by the store itself unless you configure them explicitly. Without per-namespace write and retrieval controls, an agent that can write to the shared corpus can insert crafted vectors into any namespace it can reach, and any agent that can query the store can retrieve another agent's confidential documents through embedding-space proximity. Shared-memory ACL addresses this by tagging every vector with a principal identifier at write time and filtering every retrieval query to the requesting agent's namespace, enforced at the gateway layer where the agent cannot bypass it.

Why it helps: the seeded-precedent form of RAG Input Manipulation requires write access to the shared retrieval corpus. Per-namespace write ACL removes that access from agents whose identity does not map to the target namespace, so a manipulated agent cannot insert adversarial vectors into a downstream agent's retrieval path.
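
A gateway-side write check of this kind can be sketched as follows. The agent names, namespace names, and the `Gateway` class are hypothetical, and an in-memory list stands in for the vector store; the point is only that the check and the principal tag are applied outside the agent's control.

```python
class NamespaceWriteDenied(Exception):
    """Raised when a principal tries to write outside its permitted namespaces."""


# Hypothetical mapping from agent identity to the namespaces it may write to.
WRITE_ACL = {
    "extraction-agent": {"claims-raw"},
    "policy-curator": {"approved-examples"},
}


class Gateway:
    def __init__(self, acl: dict[str, set[str]]):
        self._acl = acl
        self.records: list[dict] = []  # stand-in for the shared vector store

    def write(self, principal: str, namespace: str,
              vector: list[float], text: str) -> None:
        """Store a vector only if the principal is allowed to write to the namespace."""
        if namespace not in self._acl.get(principal, set()):
            raise NamespaceWriteDenied(f"{principal!r} may not write to {namespace!r}")
        # Tag every record with the writing principal at write time.
        self.records.append({
            "principal": principal,
            "namespace": namespace,
            "vector": vector,
            "text": text,
        })
```

With this in place, a compromised extraction agent can still write to its own namespace but cannot place records where the policy agent looks for precedent.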

Tier 2 Vector ACL (permission-aware vector retrieval: ACLs at the retrieval boundary)

A vector store returns results by embedding-space proximity, not by who is asking. Without a per-principal filter applied before similarity ranking, a query from tenant A can surface tenant B's vectors if the embeddings are close enough. Vector ACL closes that gap: every retrieval call is scoped to the requesting principal's namespace or payload partition before the store ranks any results, so cross-principal hits are structurally impossible rather than merely unlikely.

Why it helps: in its seeded variant, T18 RAG Input Manipulation involves an attacker injecting adversarial vectors into a shared retrieval corpus to influence a target agent's context. Scoping every query to the requesting principal's namespace means the attacker's injected vectors must reside in that same namespace to reach the agent, which requires write access to the target's namespace rather than to any shared partition.
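
The ordering that matters here (filter by namespace first, rank by similarity second) is illustrated below with plain Python lists. The record layout and the `scoped_search` function are assumptions made for the sketch; real vector stores usually expose the same idea as a metadata filter or partition key on the query.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Plain cosine similarity; returns 0.0 for a zero-length vector."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def scoped_search(records: list[dict], namespace: str,
                  query: list[float], k: int = 3) -> list[dict]:
    """Restrict candidates to the caller's namespace, then rank by similarity."""
    candidates = [r for r in records if r["namespace"] == namespace]
    ranked = sorted(candidates, key=lambda r: cosine(r["vector"], query), reverse=True)
    return ranked[:k]
```

Because records from other namespaces are dropped before ranking, a foreign vector cannot appear in the results however close it sits to the query.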

Multi-agent variants: OWASP MAS Guide

The OWASP MAS Threat Modelling Guide v1.0 catalogues one named multi-agent variant of T18, anchored to specific MAESTRO layers. It is a concrete attack pattern that emerges when this threat compounds across agents.

CL RAG Manipulation / Semantic Drift / Repudiation Cascade (extends T18, T49, T8)

Adversary poisons a shared RAG store (T18); the injected context gradually shifts agent reasoning over time (T49); because logs are sparse or selectively pruned, the drift cannot be attributed after the fact (T8). Cross-layer: L2 data store, L3 agent reasoning, L5 observability.

Source: OWASP MAS Threat Modelling Guide v1.0, §2 Overview of MAESTRO Framework — Extended Threat Scenarios + Cross-Layer table.

Red-team pivot: MITRE ATLAS techniques

MITRE ATLAS catalogues adversary techniques against AI systems. Where this OWASP threat has an attacker-perspective counterpart, the ATLAS technique is shown below. That is what a red team would actually be doing on the wire. Use this for detection-signal anchoring, threat-hunting hypotheses, and IR runbooks. Source: mitre-atlas/atlas-data v5.6.0.

AML.T0051.001 LLM Prompt Injection: Indirect

Adversary injects prompts via a separate data channel ingested by the LLM (databases, websites, documents) rather than directly in user input.

Agentic angle: Primary injection vector for RAG-backed agents: malicious text in retrieved context becomes instructions the model follows silently.

AML.T0070 RAG Poisoning

Adversary injects malicious content into documents indexed by a retrieval-augmented generation system so future queries surface attacker-controlled context.

AML.T0080 AI Agent Context Poisoning

Adversary contaminates an agent's context store (short-term scratchpad, vector memory, conversation history) so future reasoning is biased toward attacker goals.

Agentic angle: Persistent across sessions: a single successful poisoning influences every later decision until the memory is purged.

References

OWASP MAS Threat Modelling Guide v1.0 (April 2025) §3 RPA Expense Reimbursement Agent — Layer 2 Data Operations.
