T28: RAG Data Exfiltration

Definition

An attacker gains access to the vector database used by the Retrieval-Augmented Generation (RAG) pipeline. In the ElizaOS context this is a direct database breach; in the Anthropic MCP context it is access via an MCP server Resource endpoint. The vector store may contain proprietary knowledge, sensitive operational data, training examples, or domain-specific embeddings, all representing significant organisational intellectual property or privacy-relevant content.

What it looks like in practice

In the ElizaOS context: an attacker exploits misconfigured access controls on the shared vector database to connect directly, bypassing any agent-layer authentication. They run a bulk similarity query against the entire collection and download all stored embeddings alongside the source documents they index: trade research, wallet interaction history, and internal strategy notes.

In the Anthropic MCP context: the MCP server exposes a Resources primitive endpoint that allows connected clients to retrieve embedded knowledge base content. An attacker who has obtained MCP client credentials (via T40 MCP Client Impersonation) iterates over the resource namespace, retrieving the full contents of the vector store in batches. Because the Resource endpoint is designed for data retrieval, the access pattern is indistinguishable from normal agent operation without query-rate or volume anomaly detection.
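The namespace iteration described above can be spotted with a small per-client run counter. The following is a minimal sketch, not a definitive implementation: the class name `SequentialAccessDetector`, the `max_run` threshold, and the assumption that resource URIs end in a numeric index (as in resource://kb/doc-0001) are all illustrative and not taken from any MCP SDK.

```python
import re

# Splits a URI into a prefix and a trailing numeric index (assumed URI shape).
_URI_INDEX = re.compile(r"^(?P<prefix>.*?)(?P<index>\d+)$")


class SequentialAccessDetector:
    """Flags a client that requests a long run of consecutively numbered URIs."""

    def __init__(self, max_run: int = 20):
        self.max_run = max_run  # hypothetical threshold; tune against real traffic
        self._state = {}        # client_id -> (prefix, last_index, run_length)

    def observe(self, client_id: str, uri: str) -> bool:
        """Record one request; return True once the run reaches max_run."""
        match = _URI_INDEX.match(uri)
        if match is None:
            # URI has no trailing index, so any run in progress is broken.
            self._state.pop(client_id, None)
            return False
        prefix, index = match["prefix"], int(match["index"])
        previous = self._state.get(client_id)
        if previous and previous[0] == prefix and index == previous[1] + 1:
            run = previous[2] + 1
        else:
            run = 1
        self._state[client_id] = (prefix, index, run)
        return run >= self.max_run
```

A detector of this kind only catches strictly ascending iteration. An attacker who shuffles the request order would evade it, which is why it is best paired with volume-based signals.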

Why it’s dangerous in a multi-agent context

RAG vector stores are typically treated as infrastructure rather than as data assets with their own sensitivity classification and access controls. As agents encode organisational knowledge into embeddings (policies, internal procedures, customer interaction patterns), the vector store accumulates a comprehensive intelligence asset. A single breach exfiltrates that asset in its entirety. T43 (Network Exposure of MCP Server) compounds the risk: an MCP server exposed on an unauthorised network makes the vector store reachable to any attacker who can reach the server port.

Detection signals

Exfiltration of a vector store involves bulk retrieval (high query counts, large result sets, or systematic iteration over the namespace), which stands out sharply against the sparse, targeted queries an agent normally issues.

A single client identity issuing more than N similarity queries in a rolling 60-second window, where N is derived from the 99th percentile of the legitimate agent’s normal query rate. Raise a query-rate threshold alert on the vector database’s access log.
A query that requests top_k results well above the agent’s configured retrieval limit (e.g. top_k = 1000 when the agent normally requests 5–10). Log the top_k parameter on every query and alert on values above the expected ceiling.
A client identity not registered in the vector database’s authorised agent list successfully authenticating and issuing any query. Wire a first-seen-client alert to the authentication log.
In the MCP context, a Resource endpoint request iterating sequentially over consecutive namespace URIs (e.g. resource://kb/doc-0001, resource://kb/doc-0002 …). Deploy a sequential-URI-access pattern detector on the MCP server’s request log.
Outbound data volume from the vector database host exceeding the 95th-percentile baseline for any 5-minute window. A byte-volume anomaly alert can catch bulk document download even when individual query counts appear normal.
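The first two signals in the list above (query rate per rolling window and an out-of-range top_k) can be sketched as a single check that runs on every logged query. This is an illustrative sketch under stated assumptions: `QueryMonitor`, its parameter names, and the alert labels are hypothetical, and the thresholds are expected to come from your own baseline rather than from the values used here.

```python
from collections import defaultdict, deque


class QueryMonitor:
    """Per-client rolling-window rate check plus a top_k ceiling check."""

    def __init__(self, max_queries_per_window: int, top_k_ceiling: int,
                 window_seconds: float = 60.0):
        self.max_queries_per_window = max_queries_per_window  # the "N" threshold
        self.top_k_ceiling = top_k_ceiling
        self.window_seconds = window_seconds
        self._times = defaultdict(deque)  # client_id -> recent query timestamps

    def check(self, client_id: str, top_k: int, now: float) -> list[str]:
        """Record one query and return the names of any alerts it triggers."""
        alerts = []
        times = self._times[client_id]
        times.append(now)
        # Drop timestamps that have fallen out of the rolling window.
        while times and now - times[0] >= self.window_seconds:
            times.popleft()
        if len(times) > self.max_queries_per_window:
            alerts.append("query_rate")
        if top_k > self.top_k_ceiling:
            alerts.append("top_k")
        return alerts
```

Passing `now` explicitly rather than reading the clock inside the method keeps the check replayable against historical access logs.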

Mitigations

Apply authentication and authorisation at the vector database level, scoped per collection or namespace; do not rely solely on agent-layer access controls.
Log all query operations with client identity, query vector, and result count; alert on bulk retrieval patterns (high result counts, unusual query frequency from a single client).
In the MCP context, enforce Resource endpoint access controls that restrict which namespaces each client can retrieve.
Classify the vector store as a sensitive data asset and apply the same data-at-rest encryption and access-review cycle as the source documents it indexes.
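A minimal sketch of the first two mitigations, assuming a thin authorisation layer sits in front of the vector database: each client holds an explicit grant scoped to named collections, and every decision is written to an audit log. `ClientGrant`, `CollectionAuthoriser`, and the logger name are hypothetical. A real deployment would use the database's native access-control features where they exist, and would add the result count to the audit record once the query has run.

```python
import json
import logging
from dataclasses import dataclass

audit_log = logging.getLogger("vector_store.audit")  # hypothetical logger name


@dataclass(frozen=True)
class ClientGrant:
    client_id: str
    collections: frozenset  # collections this client may query
    max_top_k: int = 10     # per-client retrieval ceiling


class CollectionAuthoriser:
    """Database-level gate, independent of any agent-layer checks."""

    def __init__(self, grants):
        self._grants = {grant.client_id: grant for grant in grants}

    def authorise(self, client_id: str, collection: str, top_k: int) -> bool:
        grant = self._grants.get(client_id)
        allowed = bool(
            grant is not None
            and collection in grant.collections
            and top_k <= grant.max_top_k
        )
        # Log every decision, including denials, with the client identity.
        audit_log.info(json.dumps({
            "client_id": client_id,
            "collection": collection,
            "top_k": top_k,
            "allowed": allowed,
        }))
        return allowed
```

Denying by default (an unknown client or an unlisted collection yields False) means a misconfigured grant fails closed rather than open.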

Relation to base threat (T1–T17)

T28 extends T1 Memory Poisoning. Where T1 is a write-path attack (injecting malicious content into the store), T28 is the read-path complement: exfiltrating the store’s contents without modification. T27 (Vector Database Poisoning with Malicious Smart Contract Data) targets the same vector store surface via the write path.

OWASP Top 10 for Agentic Applications 2026

The Agentic Top 10 (ASI01 through ASI10) is a separate practitioner-facing publication that maps onto the master Threats & Mitigations threat numbering. T28 is covered by the following Top 10 entries:

ASI06 Memory & Context Poisoning (contributing)

An adversary writes malicious or misleading data into an agent's persistent memory or shared vector store, so that every future session, and every peer agent reading from the same store, operates on corrupted context. The defining difference from single-turn injection (ASI01) is that the poisoned data survives session reset; the agent's reasoning drifts without any new attacker input.

OWASP LLM Top 10: LLM01:2025, LLM04:2025, LLM08:2025

Source: OWASP Top 10 for Agentic Applications 2026 (Dec 2025) · the Top 10 is a compass into the master Threats & Mitigations taxonomy, not a replacement for it.

Design principles at stake

When T28 is present, these security design principles are the ones being violated or tested. Each links to the full principle; the mitigations below are how you restore them.

Defence-in-Depth: The vector store is treated as infrastructure rather than as a sensitive data asset, so the agent authentication layer is often the only gate; bypassing it (as an attacker does via a misconfigured collection endpoint or a stolen MCP client credential) leaves nothing else standing. Depth here means authentication and authorisation enforced at the database level independently of the agent layer, query-rate and bulk-retrieval alerting that fires even when the access pattern looks normal, and data-at-rest encryption so that raw storage access yields ciphertext rather than embeddings. Each control covers a different attack path (direct database breach, MCP Resource endpoint abuse, and storage-layer compromise), so defeating one still leaves the others in place.
Data Minimization & Privacy: The RAG pipeline accumulates organisational knowledge (policies, procedures, customer interaction patterns) into a single corpus that is rarely classified or access-scoped, meaning every connected agent can retrieve far more than any individual task requires. This over-collection turns the vector store into a single high-value target: one bulk similarity query exfiltrates the entire intelligence asset in one pass. Scoping collection access per namespace, enforcing per-client Resource endpoint restrictions in the MCP context, and applying the same sensitivity classification to embeddings as to source documents are the minimisation controls that reduce the extractable surface even when authentication is bypassed.

Recommended mitigations

Auto-generated from the mitigation catalog: every mitigation whose coverage map includes T28, sorted by maturity tier (Tier 1 production-canonical first, then Tier 2, then Tier 3 research-stage).

Tier 2 Egress DLP (Output egress DLP — inspection gate for PII, secrets, and IP at the agent boundary)

An agent produces output continuously across multiple channels: user-facing responses, tool-call parameter envelopes, log records, and outbound HTTP requests. Any of those channels can carry sensitive content the agent has retrieved, been fed, or been tricked into including. Output egress DLP places an inspection gate at the boundary so that PII, credentials, and proprietary content are classified and either redacted or quarantined before they leave the trust boundary, regardless of how they got into the output.

Why it helps: RAG data exfiltration is the leakage of retrieved confidential documents through the agent's response to an external caller. DLP classification at the egress seam identifies and quarantines content that matches confidential-data patterns before the response is delivered.

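The inspection gate described above can be sketched as a function that classifies outbound text and then allows, redacts, or quarantines it. This is an illustration only: the pattern set, the `sk-` secret format, the CONFIDENTIAL marker, and the function name `inspect_output` are all assumptions, and production DLP relies on far richer classifiers than three regular expressions.

```python
import re

# Hypothetical detectors; real deployments would use a maintained classifier set.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "secret": re.compile(r"\bsk-[A-Za-z0-9]{16,}\b"),
    "confidential_marker": re.compile(r"\bCONFIDENTIAL\b"),
}
# Findings that block the whole output instead of being redacted in place.
QUARANTINE_LABELS = {"confidential_marker"}


def inspect_output(text: str) -> dict:
    """Classify outbound text and decide: allow, redact, or quarantine."""
    findings = [label for label, pattern in PATTERNS.items() if pattern.search(text)]
    if any(label in QUARANTINE_LABELS for label in findings):
        # Nothing leaves the trust boundary; the text is held for review.
        return {"action": "quarantine", "findings": findings, "text": None}
    redacted = text
    for label in findings:
        redacted = PATTERNS[label].sub(f"[REDACTED:{label}]", redacted)
    action = "redact" if findings else "allow"
    return {"action": action, "findings": findings, "text": redacted}
```

The same function would need to be applied to every egress channel the agent has (responses, tool-call parameters, logs, outbound HTTP), not only to user-facing text.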
Tier 2 Mem validate (Memory content validation — a write-boundary gate on what enters the agent's memory store)

An agent's memory store is a persistent surface: anything written to it can be retrieved by any agent, in any session, for the lifetime of the corpus. Memory poisoning exploits that persistence by writing adversarial content that steers the agent's reasoning long after the attacker has gone. Write-boundary validation prevents this by running every candidate memory write through schema, policy, and provenance checks before it is committed. Content that fails any gate is rejected and never reaches the store.

Why it helps: RAG data exfiltration is partially addressed at the write boundary by requiring every document written to memory to carry a verifiable provenance tag. Those tags are auditable at retrieval time and support forensic investigation of what was accessed and by whom. Write-boundary validation does not prevent retrieval-side exfiltration; pair with m-vector-acl for the retrieval boundary.

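A minimal sketch of a write-boundary gate with schema, policy, and provenance checks. It assumes the provenance tag is an HMAC-SHA256 of the document text under a per-source key. That choice, along with the key table, the size limit, and the function names, is an illustrative assumption rather than a format taken from the mitigation catalogue.

```python
import hashlib
import hmac

# Hypothetical per-source signing keys; a real system would load these from a secret store.
TRUSTED_SOURCE_KEYS = {"crawler-1": b"demo-key-not-for-production"}
MAX_TEXT_CHARS = 10_000  # hypothetical policy limit


def sign(source_id: str, text: str) -> str:
    """Produce the provenance tag a trusted source attaches to a document."""
    key = TRUSTED_SOURCE_KEYS[source_id]
    return hmac.new(key, text.encode("utf-8"), hashlib.sha256).hexdigest()


def validate_write(record: dict) -> list[str]:
    """Return the list of failed checks; an empty list means the write may commit."""
    errors = []
    # Schema gate: required fields must be non-empty strings.
    for name in ("text", "source_id", "provenance_tag"):
        if not isinstance(record.get(name), str) or not record[name]:
            errors.append(f"schema:{name}")
    if errors:
        return errors
    # Policy gate: reject oversized documents.
    if len(record["text"]) > MAX_TEXT_CHARS:
        errors.append("policy:too_long")
    # Provenance gate: the tag must verify under the claimed source's key.
    key = TRUSTED_SOURCE_KEYS.get(record["source_id"])
    if key is None:
        errors.append("provenance:unknown_source")
    else:
        expected = hmac.new(key, record["text"].encode("utf-8"), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected.encode("utf-8"),
                                   record["provenance_tag"].encode("utf-8")):
            errors.append("provenance:bad_tag")
    return errors
```

Because the tag binds the text to its source, the same verification can be re-run at retrieval time to support the audit use described above.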
Tier 2 Shared-memory ACL (Shared-memory ACL — per-agent, per-namespace read/write access control on shared vector stores)

When multiple agents share a single vector store, the access boundaries between them are not enforced by the store itself unless you configure them explicitly. Without per-namespace write and retrieval controls, an agent that can write to the shared corpus can insert crafted vectors into any namespace it can reach, and any agent that can query the store can retrieve another agent's confidential documents through embedding-space proximity. Shared-memory ACL addresses this by tagging every vector with a principal identifier at write time and filtering every retrieval query to the requesting agent's namespace, enforced at the gateway layer where the agent cannot bypass it.

Why it helps: RAG Data Exfiltration depends on embedding-space proximity returning documents the requesting agent was not authorised to read. Retrieval-side namespace filtering is the structural control: a query filtered to the requesting principal's namespace cannot return records written to a different namespace.

Tier 2 Vector ACL (Permission-aware vector retrieval — ACLs at the retrieval boundary)

A vector store returns results by embedding-space proximity, not by who is asking. Without a per-principal filter applied before similarity ranking, a query from tenant A can surface tenant B's vectors if the embeddings are close enough. Vector ACL closes that gap: every retrieval call is scoped to the requesting principal's namespace or payload partition before the store ranks any results, so cross-principal hits are structurally impossible rather than merely unlikely.

Why it helps: T28 RAG Data Exfiltration relies on a principal retrieving vectors from a partition they should not access. Per-principal ACL at the retrieval boundary prevents this structurally: a query is scoped to the requesting principal's namespace before similarity ranking, so vectors from other namespaces are never ranked or returned, even when the query embedding is highly similar to their content.
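The filter-before-rank ordering can be illustrated with a toy in-memory store. This is a sketch under stated assumptions: `StoredVector` and `scoped_search` are hypothetical names, the similarity measure is plain cosine, and a real vector database would apply the namespace filter inside its own index rather than in application code.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredVector:
    doc_id: str
    namespace: str
    embedding: tuple


def cosine(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def scoped_search(store, principal_namespace: str, query, top_k: int = 5) -> list[str]:
    """Filter to the caller's namespace first, then rank by similarity."""
    # Vectors outside the namespace never reach the ranking step.
    candidates = [v for v in store if v.namespace == principal_namespace]
    ranked = sorted(candidates, key=lambda v: cosine(query, v.embedding), reverse=True)
    return [v.doc_id for v in ranked[:top_k]]
```

In the usage below, the query is closest to tenant B's vector, yet a tenant A caller never sees it:

```python
store = [
    StoredVector("a-1", "tenant-a", (1.0, 0.0)),
    StoredVector("a-2", "tenant-a", (0.0, 1.0)),
    StoredVector("b-1", "tenant-b", (0.9, 0.1)),
]
print(scoped_search(store, "tenant-a", (0.9, 0.1)))
```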

Multi-agent variants: OWASP MAS Guide

The OWASP MAS Threat Modelling Guide v1.0 catalogues one named multi-agent variant of T28, anchored to specific MAESTRO layers. It is a concrete attack pattern that emerges when this threat compounds across agents.

CL Cross-Client Inference Interference extends T42, T45, T28

Shared inference infrastructure (T42: side-channel) allows a tenant to observe timing or cache patterns from a co-tenant session (T45: session isolation failure); incomplete sandboxing (T28) lets the attacker infer private prompt content across clients.

Source: OWASP MAS Threat Modelling Guide v1.0, §2 Overview of MAESTRO Framework — Extended Threat Scenarios + Cross-Layer table.

Red-team pivot: MITRE ATLAS techniques

MITRE ATLAS catalogues adversary techniques against AI systems. Where this OWASP threat has an attacker-perspective counterpart, the ATLAS technique is shown below. That is what a red team would actually be doing on the wire. Use this for detection-signal anchoring, threat-hunting hypotheses, and IR runbooks. Source: mitre-atlas/atlas-data v5.6.0.

AML.T0085 Data from AI Services

Adversary collects data from AI service interfaces. Sub-technique .000 (RAG Databases) names retrieval-augmented generation stores; .001 (AI Agent Tools) names tool-call data.

AML.T0086 Exfiltration via AI Agent Tool Invocation

Adversary exfiltrates data by chaining the agent's legitimate tools (e.g. read-only DB query plus an outbound email tool), neither of which is alarming on its own.

Agentic angle: Each step looks routine in audit logs; the *combination* is the attack.
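Because the signal is the combination rather than any single call, a hunting query over the tool-call audit log has to correlate events per session. A minimal sketch, assuming the log yields (session_id, tool_name) pairs in chronological order; the tool names and the function `find_read_then_send` are hypothetical.

```python
# Hypothetical tool names; substitute the tools your agents actually expose.
READ_TOOLS = {"db_query", "vector_search"}
OUTBOUND_TOOLS = {"send_email", "http_post"}


def find_read_then_send(events) -> list[str]:
    """Return sessions in which a data-read tool is later followed by an outbound tool.

    events: iterable of (session_id, tool_name) in chronological order.
    """
    sessions_with_read = set()
    flagged = []
    for session_id, tool in events:
        if tool in READ_TOOLS:
            sessions_with_read.add(session_id)
        elif (tool in OUTBOUND_TOOLS
              and session_id in sessions_with_read
              and session_id not in flagged):
            flagged.append(session_id)
    return flagged
```

A match is a hypothesis for an analyst to review, not a verdict, since many legitimate workflows also read data and then send something.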

AML.T0012 Valid Accounts

Adversary obtains and abuses legitimate user or service credentials for initial access, persistence, privilege escalation, or defence evasion.

Agentic angle: Agents often run under long-lived service accounts whose blast radius exceeds the original task scope.

References

OWASP MAS Threat Modelling Guide v1.0 (April 2025) §4 ElizaOS — Layer 2 Data Operations; §5 Anthropic MCP — Layer 2 Data Operations.
