Memory & RAG Integrity · Principles

Why it matters for agentic AI

An LLM’s weights are read-only at inference time, and its context window is ephemeral, cleared at session end. An agent’s persistent memory store is neither: it is writable across sessions, shared across interactions, and consulted as authoritative context in future reasoning. This creates a write surface that did not exist in static language models and that has fundamentally different security properties from the other attack surfaces in agentic systems. That surface is directly related to Provenance Trust Tagging, which determines whether the origin of a memory entry can be traced and challenged. A classic prompt injection lasts one session. A successful memory poisoning attack persists indefinitely, influencing every future session that retrieves the tainted entry without the attacker being present for any of them.

The attack mechanism does not require write access in the usual sense. MINJA demonstrated that an attacker who can cause an agent to process malicious content can indirectly trigger a memory write through the agent’s own consolidation behaviour. The agent “learns” from the interaction and stores a false fact. In subsequent sessions it retrieves that fact with the same confidence as verified knowledge, because the storage layer has no way to distinguish entries written from legitimate observations from entries written from adversarially induced ones. The attack becomes self-reinforcing when retrieved content triggers reconsolidation: the poisoned entry gains authority each time it is accessed.

In multi-agent systems the threat extends to lateral propagation. A shared vector store or shared long-term memory that backs multiple agents is a single point of fleet-wide poisoning. One agent that ingests a malicious document writes a poisoned memory entry; every other agent that retrieves from the same store is then influenced by it. False beliefs spread across the fleet much as a contagion spreads through a social network. Memory stores must therefore be isolated by trust boundary rather than shared across agents of different privilege levels or exposure profiles, and every write must be provenance-tagged so the source of each belief is always recoverable.

Scenario: MINJA: belief injection through ordinary queries

An agent maintains a memory store that is consulted before answering questions. An attacker, who has no direct write access to the store, sends the agent a series of messages designed to induce it to form and store a specific false belief, for example that a particular URL is a legitimate internal endpoint for credential refresh. The agent, following its normal consolidation routine, writes this belief to memory. In subsequent sessions the false URL is retrieved as authoritative context; the agent navigates to it on behalf of later users. Write-ahead logging with provenance tags on every entry would allow a rollback to the pre-attack state once the anomaly is detected, limiting the blast radius to the window between poisoning and discovery.
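A minimal sketch of that recovery path, assuming a hypothetical in-memory WALMemoryStore whose log records source_id, timestamp, and content hash for every write. A production write-ahead log would also be persisted (and fsynced) before the live state changes; here the log is just a list. Rollback is a replay of the log, either up to a point in time or skipping every entry attributed to a compromised source.

```python
import hashlib
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    seq: int            # position in the append-only log
    key: str
    value: str
    source_id: str      # provenance: which input or principal caused this write
    timestamp: float
    content_hash: str   # SHA-256 of value, recorded at write time


class WALMemoryStore:
    """Hypothetical store: each write is appended to the log before the live view changes."""

    def __init__(self) -> None:
        self.log: list = []
        self.live: dict = {}

    def put(self, key: str, value: str, source_id: str) -> LogEntry:
        entry = LogEntry(
            seq=len(self.log),
            key=key,
            value=value,
            source_id=source_id,
            timestamp=time.time(),
            content_hash=hashlib.sha256(value.encode("utf-8")).hexdigest(),
        )
        self.log.append(entry)   # 1. record the write first
        self.live[key] = value   # 2. then apply it to the live view
        return entry

    def snapshot(self, before_seq=None, exclude_sources=frozenset()) -> dict:
        """Rebuild state by replaying the log.

        before_seq: replay only entries with seq < before_seq (point-in-time rollback).
        exclude_sources: skip entries from compromised sources (scoped rollback).
        """
        state: dict = {}
        for entry in self.log[:before_seq]:
            if entry.source_id in exclude_sources:
                continue
            state[entry.key] = entry.value
        return state

    def rollback(self, before_seq=None, exclude_sources=frozenset()) -> None:
        # The log is never truncated, so poisoned writes remain available for forensics.
        self.live = self.snapshot(before_seq, exclude_sources)


store = WALMemoryStore()
store.put("cred_refresh_url", "https://auth.internal.example/refresh", source_id="operator:config")
store.put("cred_refresh_url", "https://attacker.example/refresh", source_id="session:4711")
store.rollback(exclude_sources={"session:4711"})
print(store.live["cred_refresh_url"])  # https://auth.internal.example/refresh
```

Excluding by source, rather than deleting the key, restores the value the attacker overwrote instead of leaving the agent with no entry at all.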

Scenario: fleet-wide contamination through a shared RAG index

Ten customer-facing agents share one vector store. A malicious document is submitted through a public input channel and, after retrieval, causes one agent to write a poisoned summary to the shared store. The poisoned summary is subsequently returned as a top-ranked result for queries across the fleet. Because the store is shared and the entry carries no indication of its origin in the submission pipeline, all ten agents begin referencing the false content. Per-tenant namespace isolation would have confined the initial write to the affected agent’s private index, and an ingestion staging review with provenance checks would have caught the document before any write occurred.
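The staging review can be sketched as a gate between submission and the retrievable index. Everything here is illustrative: IngestionGate, the source identifiers, and the rule that non-allow-listed sources need a named human approver are assumptions, and the substring match in retrieve stands in for vector search.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    STAGED = "staged"
    PROMOTED = "promoted"
    REJECTED = "rejected"


@dataclass
class Submission:
    doc_id: str
    text: str
    source_id: str                  # provenance captured at the submission boundary
    status: Status = Status.STAGED
    reviewed_by: Optional[str] = None


class IngestionGate:
    """Hypothetical staging layer: nothing is retrievable until it has been reviewed."""

    def __init__(self, allowed_sources: set) -> None:
        self.allowed_sources = allowed_sources
        self.staging: dict = {}   # doc_id -> Submission awaiting review
        self.index: dict = {}     # stand-in for the shared, retrievable index

    def submit(self, doc_id: str, text: str, source_id: str) -> Submission:
        sub = Submission(doc_id, text, source_id)
        self.staging[doc_id] = sub
        return sub

    def review(self, doc_id: str, human_approver: Optional[str] = None) -> Status:
        sub = self.staging.pop(doc_id)
        # Allow-listed sources pass; anything else needs a named human approver.
        if sub.source_id in self.allowed_sources or human_approver is not None:
            sub.status = Status.PROMOTED
            sub.reviewed_by = human_approver or "allow-list"
            self.index[doc_id] = sub
        else:
            sub.status = Status.REJECTED
        return sub.status

    def retrieve(self, query: str) -> list:
        # Only promoted documents are searchable; staged or rejected ones never surface.
        return [s.text for s in self.index.values() if query.lower() in s.text.lower()]
```

Because retrieval only ever reads from the promoted index, a document submitted through a public channel cannot reach any agent's context, and so cannot trigger a poisoned summary write, until it clears review.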

How it fails

Retrieved chunks carry no provenance metadata, so the agent treats a RAG hit from a poisoned external document with the same confidence as one from a verified internal knowledge base.
Memory is shared across tenants or across agents with different trust profiles, making a single poisoned write fleet-wide in effect.
Reads trigger reconsolidation, so accessing a poisoned entry once promotes it to higher authority in the store. The attack amplifies itself over time.
There is no versioned history of the memory state, so when poisoning is discovered there is no safe snapshot to roll back to.
Memory has no expiry policy; unverified entries from years-old interactions remain authoritative indefinitely.

Why the mapped controls work

Content-hash integrity verified on read catches in-transit or at-rest tampering before the entry influences reasoning, provided the reference hash is stored or keyed where the tamperer cannot rewrite it. If the hash does not match, the entry is quarantined rather than retrieved. It cannot flag content that was already poisoned when it was written; that gap is covered by the write-side controls. Provenance tags on every write make the origin of each belief queryable, so a rollback can be scoped to entries that derive from a compromised source rather than requiring a full store wipe. Source allow-listing and staging review for RAG create a bottleneck where new content must pass integrity checks before being promoted to retrievable status, removing the shortest path to fleet-wide poisoning. Per-tier memory partitions enforce the boundary between high-trust (operator-authored) and low-trust (user or environment-derived) content, so an attacker who can influence the low-trust tier cannot thereby poison the high-trust tier. Write-ahead logging and rollback together provide the recovery path that is otherwise missing when memory poisoning is detected days after the fact.
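A sketch of integrity verification on read with quarantine. One deliberate substitution: it uses an HMAC-SHA256 tag from Python's standard hmac module instead of a bare SHA-256 hash, on the assumption that an attacker who can rewrite an entry at rest could also recompute an unkeyed hash stored next to it. IntegrityCheckedStore and its key handling are hypothetical.

```python
import hashlib
import hmac
from typing import Dict, Optional, Tuple


class IntegrityCheckedStore:
    """Hypothetical store wrapper that verifies a keyed content hash on every read."""

    def __init__(self, key: bytes) -> None:
        self._key = key                                  # held by the gateway, not the store
        self.entries: Dict[str, Tuple[str, str]] = {}    # entry_id -> (text, tag)
        self.quarantine: Dict[str, Tuple[str, str]] = {}

    def _tag(self, entry_id: str, text: str) -> str:
        # Bind the id into the MAC so two valid entries cannot be swapped.
        msg = entry_id.encode("utf-8") + b"\x00" + text.encode("utf-8")
        return hmac.new(self._key, msg, hashlib.sha256).hexdigest()

    def write(self, entry_id: str, text: str) -> None:
        self.entries[entry_id] = (text, self._tag(entry_id, text))

    def read(self, entry_id: str) -> Optional[str]:
        text, tag = self.entries[entry_id]
        if not hmac.compare_digest(tag, self._tag(entry_id, text)):
            # Mismatch: move the entry aside so it never reaches the agent's context.
            self.quarantine[entry_id] = self.entries.pop(entry_id)
            return None
        return text
```

hmac.compare_digest is used for the comparison so the check does not leak timing information about how much of the tag matched.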

First steps

Enable write-ahead logging with provenance metadata (source ID, timestamp, content hash) on every write to your agent’s long-term memory store. If you are using a vector database such as Weaviate, Qdrant, or Pinecone, add metadata schema fields for source_id, ingest_timestamp, and content_hash before any write path goes live.
Enforce per-tier namespace partitioning in your vector store so that ENVIRONMENT-derived content (user submissions, RAG from external sources) cannot be written to the same namespace as SYSTEM or OPERATOR-DATA content. Configure separate collections or indices with separate write credentials for each trust tier.
Set an expiry policy on unverified memory entries. For example, any entry tagged with trust level environment should carry a ttl field and be automatically evicted after a defined period (30 days is a reasonable starting point) unless a human operator explicitly promotes it to a higher tier.
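Taken together, the three steps above imply a metadata record per memory entry. The sketch assumes a hypothetical MemoryRecord carrying source_id, ingest_timestamp, content_hash, a trust tier, and an optional ttl, with environment-tier entries defaulting to the 30-day TTL suggested above. The tier string values are our own choice. In a partitioned deployment, promote would correspond to copying the entry into the operator-tier collection with that tier's own write credential; here it only updates fields.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional

# Tier names follow the text; the string values and the default TTL are assumptions.
SYSTEM = "system"
OPERATOR_DATA = "operator-data"
ENVIRONMENT = "environment"
ENVIRONMENT_TTL = timedelta(days=30)


@dataclass
class MemoryRecord:
    text: str
    source_id: str
    ingest_timestamp: datetime
    content_hash: str
    trust_tier: str
    ttl: Optional[timedelta]            # None means no automatic expiry
    promoted_by: Optional[str] = None   # set only by an explicit human promotion


def make_record(text: str, source_id: str, trust_tier: str, now: datetime) -> MemoryRecord:
    return MemoryRecord(
        text=text,
        source_id=source_id,
        ingest_timestamp=now,
        content_hash=hashlib.sha256(text.encode("utf-8")).hexdigest(),
        trust_tier=trust_tier,
        ttl=ENVIRONMENT_TTL if trust_tier == ENVIRONMENT else None,
    )


def promote(record: MemoryRecord, operator_id: str) -> None:
    # The original source_id is kept, so provenance survives promotion.
    record.trust_tier = OPERATOR_DATA
    record.ttl = None
    record.promoted_by = operator_id


def evict_expired(records: List[MemoryRecord], now: datetime) -> List[MemoryRecord]:
    # Keep records with no TTL, or whose TTL has not yet elapsed.
    return [r for r in records if r.ttl is None or now - r.ingest_timestamp <= r.ttl]
```

Running evict_expired on a schedule gives the default the text asks for: unverified environment content ages out unless someone takes responsibility for it.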

Threats it governs

When this principle is absent, these threats become reachable.

T1
Memory Poisoning. Adversarial content written into short- or long-term memory contaminates future decisions.
T18
RAG Input Manipulation Leading to Policy Bypass. Crafted inputs exploit RAG similarity search to surface lenient precedent that bypasses policy checks.
T27
Vector Database Poisoning with Malicious Smart Contract Data. An attacker injects false data about malicious smart contracts into a shared vector store the agent queries.
T49
Semantic Drift in Embeddings. Policy or content changes are not reflected in vector store embeddings, so outdated material is retrieved as authoritative.

Controls that advance it

Catalogue mitigations that strengthen this principle, grouped by the defence-in-depth stage they sit in.

Prevent

Mem validate. An agent's memory store is a persistent surface: anything written to it can be retrieved by any agent with read access, in any session, for the lifetime of the corpus. Memory poisoning exploits that persistence by writing adversarial content that steers the agent's reasoning long after the attacker has gone. Write-boundary validation prevents this by running every candidate memory write through schema, policy, and provenance checks before it is committed. Content that fails any gate is rejected and never reaches the store.
Shared-memory ACL. When multiple agents share a single vector store, the store does not enforce access boundaries between them unless you configure them explicitly. Without per-namespace write and retrieval controls, an agent that can write to the shared corpus can insert crafted vectors into any namespace it can reach, and any agent that can query the store can retrieve another agent's confidential documents through embedding-space proximity. Shared-memory ACL addresses this by tagging every vector with a principal identifier at write time and filtering every retrieval query to the requesting agent's namespace, enforced at the gateway layer where the agent cannot bypass it.
Vector ACL. A vector store returns results by embedding-space proximity, not by who is asking. Without a per-principal filter applied before similarity ranking, a query from tenant A can surface tenant B's vectors if the embeddings are close enough. Vector ACL closes that gap: every retrieval call is scoped to the requesting principal's namespace or payload partition before the store ranks any results, so cross-principal hits are structurally impossible rather than merely unlikely.
Session isolation. An agent that serves multiple users stores conversation history, retrieved facts, and intermediate state in a memory layer. If that layer is not scoped to the originating session, one user's writes can reach another user's retrieval path. Session-scoped memory isolation prevents that by enforcing a hard boundary at the storage layer, so each session can only read and write its own state.
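The Prevent controls share one enforcement point: a gateway that validates writes and scopes reads before anything is ranked. This Python sketch assumes a hypothetical MemoryGateway over a toy in-memory store with cosine similarity. The namespace key could equally be a tenant, an agent, or a session ID, which is how the same pattern covers shared-memory ACLs, vector ACLs, and session isolation. Real vector databases typically expose the scoping step as a namespace, tenant, or metadata filter on the query.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class StoredVector:
    namespace: str                 # principal identifier stamped by the gateway at write time
    text: str
    source_id: str
    embedding: Tuple[float, ...]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class MemoryGateway:
    """Hypothetical gateway; agents reach the shared store only through these two methods."""

    MAX_TEXT_CHARS = 4000   # illustrative schema limit

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self._store: List[StoredVector] = []

    def write(self, principal: str, text: str, source_id: str, embedding: Sequence[float]) -> None:
        # Schema gate: size and dimensionality must match what the store expects.
        if not text or len(text) > self.MAX_TEXT_CHARS or len(embedding) != self.dim:
            raise ValueError("schema check failed")
        # Provenance gate: every write must say where it came from.
        if not source_id:
            raise ValueError("provenance missing")
        # Namespace is the authenticated principal, never a caller-supplied field.
        self._store.append(StoredVector(principal, text, source_id, tuple(embedding)))

    def query(self, principal: str, embedding: Sequence[float], k: int = 3) -> List[str]:
        # Scope first, rank second: other principals' vectors never enter the candidate set.
        candidates = [v for v in self._store if v.namespace == principal]
        candidates.sort(key=lambda v: cosine(v.embedding, embedding), reverse=True)
        return [v.text for v in candidates[:k]]
```

Filtering before the sort is the point: even an identical embedding in another namespace cannot appear in the results, because it is never a candidate.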

Detect

Mem anomaly. An agent's memory store can receive adversarial content that passes schema and policy validation because it is structurally valid, even though it is statistically unusual. Memory anomaly detection addresses this by monitoring write rates, embedding distances, provenance tags, and retrieval patterns at runtime, and quarantining writes whose statistical signatures diverge from the established baseline.
Memory-poison defence. An agent that reads from a vector store assumes the stored content reflects what was legitimately written. An adversary who can write to that store can inject passages that divert the agent's retrieval toward attacker-controlled content. This control applies two defensive layers: anomaly detection on writes, which quarantines incoming embeddings that are statistical outliers relative to existing cluster centroids; and re-ranking on reads, which uses a cross-encoder or probe-gradient scorer to demote adversarial candidates after dense retrieval. Both layers are research-stage. No turnkey production implementation exists as of this catalogue version; deploy additively on top of Tier 2 baseline controls.
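A minimal sketch of the write-side layer only, under simplifying assumptions: the baseline forms a single cluster, distance is Euclidean, and a write is quarantined when its distance to the centroid exceeds the mean baseline distance plus k standard deviations, with k = 3. WriteAnomalyFilter is hypothetical, and the read-side cross-encoder re-ranking layer is not shown.

```python
import math
import statistics
from typing import List, Sequence


def centroid(vectors: List[Sequence[float]]) -> List[float]:
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class WriteAnomalyFilter:
    """Hypothetical single-cluster outlier filter for incoming embeddings."""

    def __init__(self, baseline: List[Sequence[float]], k: float = 3.0) -> None:
        self.center = centroid(baseline)
        dists = [euclidean(v, self.center) for v in baseline]
        # Threshold = mean baseline distance + k population standard deviations.
        self.threshold = statistics.fmean(dists) + k * statistics.pstdev(dists)
        self.accepted: List[Sequence[float]] = []
        self.quarantined: List[Sequence[float]] = []

    def submit(self, vec: Sequence[float]) -> bool:
        """Return True if the write is accepted, False if it is quarantined."""
        if euclidean(vec, self.center) > self.threshold:
            self.quarantined.append(vec)
            return False
        self.accepted.append(vec)
        return True
```

A multi-cluster variant would compare each write against its nearest centroid instead of one global center; the quarantine decision stays the same.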

Respond

No catalogued control.

In Helmwart

Not scored directly; memory nodes and their read/write edges are modelled on the canvas.