EVIDENCE TRAIL
Memory anomaly detection
Verbatim excerpts from the upstream sources cited on the mitigation page, with what each source does and does not prove. The phrase "anomaly detection systems" appears verbatim in OWASP Agentic AI v1.1 §T1 and Playbook P2 — those are the closest upstream matches for this control's title and reactive mechanism.
Last cross-checked against upstream sources: · 7 sources
References
Each entry shows what the source supports and what it does not prove.
OWASP Agentic AI — Threats & Mitigations v1.1
§T1 Memory Poisoning — Mitigations column
"Implement memory content validation, session isolation, robust authentication mechanisms for memory access, anomaly detection systems, and regular memory sanitization routines. Require AI-generated memory snapshots for forensic analysis and rollback if anomalies are detected."
Supports: Verbatim naming of "anomaly detection systems" as a required mitigation for Memory Poisoning — the direct upstream citation for this control's rationale and scope.
Does not prove: Does not specify which anomaly signals to monitor (write-rate, embedding-distance, provenance-skew, recall-pattern). Helmwart operationalises those four classes from the broader literature.
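The T1 mitigation pairs detection with memory snapshots "for forensic analysis and rollback". The sketch below shows that pairing in minimal form. It assumes a simple key-value memory. `SnapshottingMemory` and its methods are hypothetical names chosen for illustration, not an OWASP or Helmwart API.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass, field


@dataclass
class SnapshottingMemory:
    """Hypothetical key-value agent memory with point-in-time snapshots (illustrative only)."""

    entries: dict[str, str] = field(default_factory=dict)
    snapshots: list[dict[str, str]] = field(default_factory=list)

    def write(self, key: str, value: str) -> None:
        self.entries[key] = value

    def snapshot(self) -> int:
        """Store a deep copy of the current state and return its snapshot id."""
        self.snapshots.append(copy.deepcopy(self.entries))
        return len(self.snapshots) - 1

    def diff_since(self, snapshot_id: int) -> dict[str, tuple[str | None, str | None]]:
        """Forensics: map each key changed since the snapshot to its (before, after) values."""
        before = self.snapshots[snapshot_id]
        keys = before.keys() | self.entries.keys()
        return {
            k: (before.get(k), self.entries.get(k))
            for k in keys
            if before.get(k) != self.entries.get(k)
        }

    def rollback(self, snapshot_id: int) -> None:
        """Restore a snapshot, discarding every write made after it."""
        self.entries = copy.deepcopy(self.snapshots[snapshot_id])
```

Suppose a detector fires after `snapshot()`. The caller would first record `diff_since()` output as the forensic trail and then call `rollback()`. The deep copies keep later writes from silently altering a stored snapshot.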
OWASP Agentic AI Mitigation Playbook P2 — Preventing Memory Poisoning & AI Knowledge Corruption
Playbook 2 — §Step 2: Detect & Respond to Memory Poisoning (Reactive) · first bullet
"Deploy anomaly detection systems to monitor unexpected updates in AI memory logs, including unauthorized memory access."
Supports: Verbatim statement of the reactive anomaly-detection pattern this control implements. Also names write-frequency monitoring: "Detect and flag abnormal memory modification frequency. Identify cases where AI memory is being rewritten at an unusually high rate, which may indicate manipulation attempts."
Does not prove: Playbook language is guidance, not a mandatory standard. Detection thresholds and signal classes are left to the implementer.
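One way to read the write-frequency guidance is as a sliding-window counter per memory namespace. The sketch below is minimal and makes two assumptions: timestamps arrive in seconds and never decrease, and the budget is a placeholder of 20 writes per 60 s. `WriteRateMonitor` is a hypothetical name. Neither number comes from the playbook.

```python
from __future__ import annotations

from collections import deque


class WriteRateMonitor:
    """Hypothetical sliding-window check for abnormal memory modification frequency.

    max_writes and window_s are placeholder assumptions; the playbook leaves
    thresholds to the implementer. Timestamps are assumed non-decreasing.
    """

    def __init__(self, max_writes: int = 20, window_s: float = 60.0) -> None:
        self.max_writes = max_writes
        self.window_s = window_s
        self._times: deque[float] = deque()

    def record_write(self, t: float) -> bool:
        """Record a write at time t (seconds); return True if the window is over budget."""
        self._times.append(t)
        # Evict writes that fall outside the window ending at t.
        while self._times and self._times[0] <= t - self.window_s:
            self._times.popleft()
        return len(self._times) > self.max_writes
```

Keeping one instance per namespace or per agent stops a single noisy writer from hiding behind aggregate traffic.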
OWASP Agentic AI Mitigation Playbook P2 — Preventing Memory Poisoning & AI Knowledge Corruption
Playbook 2 — §Step 3: Prevent the Spread of False Knowledge (Detective) · fifth bullet
"Continuously analyze memory access patterns to detect long-term anomalies or policy drift. Verify integrity of access logs and correlate unusual access trends with potential knowledge corruption."
Supports: Names long-term anomalies in memory access patterns and access-log correlation as detective signals, matching Helmwart's fourth detection class (recall-pattern anomalies) and its provenance-tag drift signal.
Does not prove: Framed as preventing the spread of false knowledge rather than as write-time detection. The bullet sits in the playbook's detective step (Step 3), not in the reactive detect-and-respond step (Step 2) that this control implements, so it only partially overlaps with the anomaly-detection layer.
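One way to quantify "long-term anomalies" in memory access patterns is to compare how often each memory entry is recalled in a baseline window against a recent window. The sketch below does this with total variation distance. The choice of distance, the 0.3 threshold and the function names are all assumptions, not part of the playbook.

```python
from collections import Counter


def recall_distribution(recalled_ids: list[str]) -> dict[str, float]:
    """Normalise a list of recalled memory IDs into a frequency distribution."""
    counts = Counter(recalled_ids)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else {}


def total_variation(p: dict[str, float], q: dict[str, float]) -> float:
    """Total variation distance: half the L1 distance between two distributions."""
    keys = p.keys() | q.keys()
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def recall_drift(baseline: list[str], recent: list[str], threshold: float = 0.3) -> bool:
    """Flag drift when recent recall frequencies diverge from the baseline.

    threshold is an assumed placeholder, not a value from the playbook.
    """
    return total_variation(recall_distribution(baseline), recall_distribution(recent)) > threshold
```

Total variation distance is bounded in [0, 1], and entries that appear in only one window need no special handling. That makes a fixed threshold easier to reason about than with unbounded divergences.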
OWASP LLM Top 10 v2025 — LLM04:2025 Data and Model Poisoning
§Prevention and Mitigation Strategies · third item
"Implement strict sandboxing to limit model exposure to unverified data sources. Use anomaly detection techniques to filter out adversarial data."
Supports: Names anomaly detection as a defence against data poisoning in the OWASP LLM canon. A later item in the same list adds: "Monitor training loss and analyze model behavior for signs of poisoning. Use thresholds to detect anomalous outputs." This establishes threshold-based detection as expected practice.
Does not prove: LLM04:2025 addresses training-time and fine-tuning poisoning; agent runtime memory is an extension of this threat that the Agentic AI document covers explicitly. The anomaly-detection advice here is not specific to vector stores or write operations.
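In its simplest form, "use thresholds to detect anomalous outputs" reduces to a z-score test against a baseline. The sketch below assumes a numeric per-interval metric, such as writes per minute, and uses k = 3. Both are illustrative choices, not values from LLM04:2025, and `zscore_flags` is a hypothetical name.

```python
import statistics


def zscore_flags(history: list[float], new_values: list[float], k: float = 3.0) -> list[bool]:
    """Flag new observations more than k population standard deviations from the historical mean.

    history must be non-empty. The metric and k = 3.0 are illustrative assumptions.
    """
    mu = statistics.fmean(history)
    sigma = statistics.pstdev(history)
    if sigma == 0:
        # Constant baseline: any deviation from it is treated as anomalous.
        return [v != mu for v in new_values]
    return [abs(v - mu) / sigma > k for v in new_values]
```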
MITRE ATLAS AML.M0031 — Memory Hardening
AML.M0031 Memory Hardening — description
"Memory Hardening involves developing trust boundaries and secure processes for how an AI agent stores and accesses memory and context. This may be implemented using a combination of strategies including restricting an agent's ability to store memories by requiring external authentication and validation for memory updates, performing semantic integrity checks on retrieved memories before agents execute actions, and implementing controls for monitoring of memory and remediation processes for poisoned memory."
Supports: Names "implementing controls for monitoring of memory" and "remediation processes for poisoned memory" as components of memory hardening — the same monitoring-plus-quarantine pattern this control implements. Mapped to AML.T0080 (LLM Memory Poisoning).
Does not prove: Does not enumerate specific monitoring signals (write-rate, embedding-distance, etc.). Remediation details are left to the implementer.
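The monitoring-plus-remediation pattern in AML.M0031 can be sketched as a store that diverts flagged writes to quarantine until a reviewer releases or purges them. This is a minimal sketch. `QuarantiningMemory`, the detector interface and the toy `instruction_like` heuristic are hypothetical and are not part of ATLAS or Helmwart.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical detector interface: return a reason string if the write looks anomalous, else None.
Detector = Callable[[str, str], Optional[str]]


def instruction_like(key: str, value: str) -> Optional[str]:
    """Toy heuristic (assumption): imperative override phrasing stored as memory."""
    return "instruction-like content" if "ignore previous" in value.lower() else None


@dataclass
class QuarantiningMemory:
    """Hypothetical memory store that diverts flagged writes to quarantine."""

    detectors: list[Detector]
    committed: dict[str, str] = field(default_factory=dict)
    quarantine: dict[str, tuple[str, list[str]]] = field(default_factory=dict)

    def write(self, key: str, value: str) -> bool:
        """Commit the write unless any detector fires; return True if committed."""
        reasons = []
        for detect in self.detectors:
            reason = detect(key, value)
            if reason is not None:
                reasons.append(reason)
        if reasons:
            self.quarantine[key] = (value, reasons)
            return False
        self.committed[key] = value
        return True

    def release(self, key: str) -> None:
        """Remediation after review: the write was benign, so commit it."""
        value, _ = self.quarantine.pop(key)
        self.committed[key] = value

    def purge(self, key: str) -> None:
        """Remediation after review: the write was poisoned, so discard it."""
        self.quarantine.pop(key)
```

Quarantining, rather than rejecting outright, keeps the suspect value and the reasons it was flagged available for the remediation step.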
MITRE ATLAS AML.M0024 — AI Telemetry Logging
AML.M0024 AI Telemetry Logging — description
"Implement logging of inputs and outputs of deployed AI models. When deploying AI agents, implement logging of the intermediate steps of agentic actions and decisions, data access and tool use, installation commands, and identity of the agent. Monitoring logs can help to detect security threats and mitigate impacts."
Supports: Defines the telemetry infrastructure (logging of data access and agentic actions) that an anomaly detector reads from. Without AML.M0024's logging layer, write-rate and provenance-drift signals cannot be computed.
Does not prove: AML.M0024 is the logging prerequisite, not the anomaly-detection control itself. It does not specify threshold logic or what to do when a signal fires.
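The following minimal sketch shows why the logging layer is a prerequisite. It defines a memory-operation telemetry record and computes the two aggregates that the write-rate and provenance-drift signals would read. The `MemoryEvent` fields, the JSON-lines format and the helper names are assumptions, because AML.M0024 does not prescribe a schema.

```python
from __future__ import annotations

import json
from collections import Counter
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class MemoryEvent:
    """Hypothetical telemetry record for one memory operation (schema assumed)."""

    ts: float       # Unix timestamp, seconds
    agent_id: str   # identity of the acting agent
    op: str         # "read" or "write"
    key: str        # memory entry touched
    source: str     # provenance tag of the data, e.g. "user" or "tool:web"


def to_log_line(event: MemoryEvent) -> str:
    """Serialise one event as a JSON line for an append-only log."""
    return json.dumps(asdict(event), sort_keys=True)


def parse_log(lines: list[str]) -> list[MemoryEvent]:
    return [MemoryEvent(**json.loads(line)) for line in lines]


def writes_per_agent(events: list[MemoryEvent], since: float) -> Counter[str]:
    """Input to the write-rate signal: writes per agent at or after `since`."""
    return Counter(e.agent_id for e in events if e.op == "write" and e.ts >= since)


def provenance_mix(events: list[MemoryEvent]) -> dict[str, float]:
    """Input to the provenance-drift signal: share of writes by source tag."""
    counts = Counter(e.source for e in events if e.op == "write")
    total = sum(counts.values())
    return {s: n / total for s, n in counts.items()} if total else {}
```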
Steinhardt, Koh & Liang — "Certified Defenses for Data Poisoning Attacks" (NeurIPS 2017)
Abstract
"Machine learning systems trained on user-provided data are susceptible to data poisoning attacks, whereby malicious users inject false training data with the aim of corrupting the learned model. … We address this by constructing approximate upper bounds on the loss across a broad family of attacks, for defenders that first perform outlier removal followed by empirical risk minimization."
Supports: Foundational academic reference establishing outlier removal (embedding-distance-based detection) as a principled statistical defence against data poisoning. Helmwart's embedding-distance outlier check (signal 2) is the operationalisation of this principle in a vector-store context.
Does not prove: The paper addresses training-time batch poisoning, not real-time write-stream monitoring of agent memory. The statistical methods transfer conceptually; the deployment context is different.
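Signal 2 can be illustrated with a filter in the spirit of the distance-from-centroid ("sphere") defence that the paper analyses. This is a minimal sketch with two assumptions. First, each memory namespace plays the role of a class, because vector-store entries carry no labels. Second, the radius is the 95th percentile of existing distances. Both choices and all function names are assumptions, not values from the paper or from Helmwart.

```python
from __future__ import annotations

import math


def centroid(vectors: list[list[float]]) -> list[float]:
    """Coordinate-wise mean of the embeddings already stored in one namespace."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sphere_radius(vectors: list[list[float]], pct: float = 95.0) -> float:
    """Nearest-rank pct-th percentile of stored embeddings' distances from the centroid."""
    c = centroid(vectors)
    dists = sorted(math.dist(v, c) for v in vectors)
    rank = max(1, math.ceil(pct / 100 * len(dists)))
    return dists[rank - 1]


def is_embedding_outlier(candidate: list[float], stored: list[list[float]], pct: float = 95.0) -> bool:
    """Flag a new write whose embedding lies outside the sphere around the namespace centroid."""
    return math.dist(candidate, centroid(stored)) > sphere_radius(stored, pct)
```

A percentile radius adapts to each namespace's spread. The trade-off is that a namespace already seeded with poisoned entries widens its own sphere, which is one reason the paper's batch setting does not transfer directly to a live write stream.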