Resilience & Recovery · Principles

Why it matters for agentic AI

NIST SP 800-160 Vol. 2 frames resilience as four capabilities: anticipate, withstand, recover, and adapt. The “recover” phase depends on detection signals from Observability, on reversible action patterns from Reversibility, and on the ability to stop the agent cleanly via Safe Interruptibility. For traditional systems the hardest of these is usually “withstand,” meaning designing so a failure doesn’t cascade. For agentic systems the harder problem is often “recover,” because a precondition of recovery is detection, and agentic failures are routinely undetectable from their surface behaviour.

A crashed server returns an error code. A hallucinating agent returns a confident, well-formatted answer. A memory-poisoned agent continues processing tickets, filing responses, and returning HTTP 200 while acting on a false “fact” planted days earlier. An injected agent completes its visible workflow (the task appears done) while executing attacker-directed side effects in the background. Classical resilience design can assume that failures are either immediately visible or reliably detectable within a bounded latency. Agentic resilience cannot make that assumption. The implication is not that detection is impossible but that it must be proactive and behavioural, baselining normal action patterns and flagging deviations rather than relying on reactive, event-driven monitoring.

A second structural difference is irreversibility. Many agentic actions (sending an email, posting a transaction, modifying a shared document, calling an external API) are difficult or impossible to fully undo. Classical resilience uses backup/restore cycles for data and rollback for deployments; these translate directly to agent memory and plan state. But they do not automatically translate to real-world effects the agent has already caused. This makes the design principle reversible-by-default critically important: an agent that prefers reversible actions (draft rather than send, propose rather than commit) keeps the recovery window open until a promotion step explicitly closes it. Every irreversible action is a loss of future recovery options, and the architecture should treat it accordingly.

Scenario: MINJA, the long-dwell memory poison

Memory poisoning via indirect injection (exemplified by the MINJA attack class) works by placing a crafted payload inside ordinary content the agent will eventually retrieve: a shared document, a public knowledge base, or a RAG corpus. When the agent reads it and writes a summary to long-term memory, the summary carries the attacker’s false “fact.” The agent’s reasoning subtly drifts over the following days. No error is raised. The recovery path requires versioned, append-only memory with timestamps and source provenance: the operator can identify when the drift began, locate the poisoned write, and roll memory back to a snapshot that predates it. Without versioning, there is nothing to roll back to and the false fact must be hunted down manually across every downstream artefact the agent produced.
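The recovery path described above can be illustrated with a minimal in-memory sketch. It is not a production store: the class names (VersionedMemory, MemoryRecord), the field names, and the example sources are all hypothetical, and a real deployment would sit on top of whatever database the agent already uses.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class MemoryRecord:
    key: str
    value: str
    version: int           # store-wide counter, starts at 1
    written_at: datetime   # UTC timestamp of the write
    source: str            # provenance: where the content came from


class VersionedMemory:
    """Append-only log: a write adds a record, nothing is updated in place."""

    def __init__(self) -> None:
        self._log: list[MemoryRecord] = []

    def write(self, key: str, value: str, source: str) -> MemoryRecord:
        record = MemoryRecord(key, value, len(self._log) + 1,
                              datetime.now(timezone.utc), source)
        self._log.append(record)
        return record

    def read(self, key: str, as_of_version: Optional[int] = None) -> Optional[str]:
        """Latest value for key, optionally as the store looked at a past version."""
        limit = len(self._log) if as_of_version is None else as_of_version
        for record in reversed(self._log[:limit]):
            if record.key == key:
                return record.value
        return None

    def writes_from(self, source: str) -> list[MemoryRecord]:
        """Locate every write that came from a suspect source."""
        return [r for r in self._log if r.source == source]

    def rollback_to(self, version: int) -> "VersionedMemory":
        """New store holding records up to `version`; the original log is kept for forensics."""
        restored = VersionedMemory()
        restored._log = list(self._log[:version])
        return restored


memory = VersionedMemory()
memory.write("refund_limit", "100 EUR", source="policy/handbook")
poisoned = memory.write("refund_limit", "10000 EUR", source="shared/unknown-doc")
clean = memory.rollback_to(poisoned.version - 1)
```

Because the poisoned record carries its own version and source, the operator can find it with writes_from and restore to the version just before it, while the full log stays available for tracing which downstream artefacts were produced after the drift began.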

Scenario: the authorised-but-wrong irreversible action

A customer-service agent autonomously closes and archives a high-value account following a misclassification. It was an authorised action, within its permission scope, executed on genuinely wrong reasoning. No security control fails; the action clears every policy gate. Recovery now requires not just undoing a database write but contacting the customer, reversing downstream system states, and reconstructing context from logs that may not have been designed to support replay. A reversible-by-default architecture would have produced a “proposed closure” flagged for human review before irreversibly archiving; the promotion step to execute would have required explicit confirmation. The entire recovery effort is avoided for the price of one confirmation click.
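A minimal sketch of the “proposed closure” pattern follows, assuming a simple in-process queue. The names PromotionQueue, ProposedAction, and the approver string are hypothetical; the point is only that the irreversible callable is stored rather than run, and runs solely inside promote.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Status(Enum):
    PROPOSED = "proposed"
    EXECUTED = "executed"
    REJECTED = "rejected"


@dataclass(eq=False)
class ProposedAction:
    description: str
    execute: Callable[[], None]        # the irreversible side effect, not yet run
    status: Status = Status.PROPOSED
    decided_by: Optional[str] = None


class PromotionQueue:
    """The agent may only propose; a named approver promotes or rejects."""

    def __init__(self) -> None:
        self.pending: list[ProposedAction] = []

    def propose(self, description: str, execute: Callable[[], None]) -> ProposedAction:
        action = ProposedAction(description, execute)
        self.pending.append(action)
        return action

    def _decide(self, action: ProposedAction, approver: str, status: Status) -> None:
        if action.status is not Status.PROPOSED:
            raise ValueError(f"action already {action.status.value}")
        action.status = status
        action.decided_by = approver
        self.pending.remove(action)

    def promote(self, action: ProposedAction, approver: str) -> None:
        if action.status is not Status.PROPOSED:
            raise ValueError(f"action already {action.status.value}")
        action.execute()               # the recovery window closes here
        self._decide(action, approver, Status.EXECUTED)

    def reject(self, action: ProposedAction, approver: str) -> None:
        self._decide(action, approver, Status.REJECTED)
```

In this sketch a misclassified closure sits in pending until a reviewer calls reject, and nothing downstream has changed in the meantime.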

How it fails

There is no rollback path for an authorised-but-wrong action; the first the system knows is a human complaint, at which point downstream effects have propagated.
The agent can reach and modify its own kill-switch policy; an injected or misbehaving agent disables its own stop condition.
A cascade has already propagated across multiple agents before any single event looked anomalous in isolation. The detection signal only emerges from aggregation, and no aggregation layer exists.
Persistent memory accumulates indefinitely with no versioning; a poisoned write becomes the permanent ground truth, and there is no snapshot to restore from.
Recovery planning assumes errors surface immediately; silent “success” on a hallucinated or attacker-directed action is never designed for.

Why the mapped controls work

Reversible-by-default actions with a promotion step implement the most direct form of resilience: they keep the recovery window structurally open until an explicit human or policy decision closes it. Immutable memory versioning with rollback is the counterpart to backup/restore. It makes the “recover” phase of the resilience cycle possible for the specific failure mode (memory poisoning) where no event signals the need to recover. Digital-twin replay of multi-step plans moves recovery capability earlier: by simulating a plan against a mirrored environment before live execution, it creates both a checkpoint and a rollback reference for every plan step. Behavioural-baseline drift alerting supplies the detection signal that event-driven monitoring cannot, firing on statistical departure from normal action patterns rather than waiting for an error. Autonomy-tier demotion on incident closes the loop: when drift is detected, the agent’s scope is reduced before recovery is attempted, preventing further damage during the recovery window itself.
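Autonomy-tier demotion can be sketched as a small permission check, under the assumption that each tool is tagged with the minimum tier it requires. The tier names, the tool names, and the REQUIRED_TIER mapping below are illustrative only and do not come from any particular framework.

```python
from enum import IntEnum


class AutonomyTier(IntEnum):
    READ_ONLY = 0
    PROPOSE_ONLY = 1
    ACT_REVERSIBLE = 2
    ACT_FULL = 3


# Hypothetical mapping: the minimum tier each tool requires.
REQUIRED_TIER = {
    "search_tickets": AutonomyTier.READ_ONLY,
    "draft_reply": AutonomyTier.PROPOSE_ONLY,
    "update_ticket_tag": AutonomyTier.ACT_REVERSIBLE,
    "archive_account": AutonomyTier.ACT_FULL,
}


class AgentScope:
    def __init__(self, tier: AutonomyTier) -> None:
        self.tier = tier

    def allows(self, tool: str) -> bool:
        required = REQUIRED_TIER.get(tool)
        # Tools missing from the mapping are denied rather than guessed at.
        return required is not None and self.tier >= required


def on_drift_alert(scope: AgentScope) -> None:
    """Cap the agent at propose-only before any recovery work starts."""
    scope.tier = min(scope.tier, AutonomyTier.PROPOSE_ONLY)
```

The demotion is a cap, not a step down by one, so an agent already below propose-only is left where it is and repeated alerts are harmless.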

First steps

Implement versioned, append-only memory writes for your agent’s long-term store today. If you are using a key-value or vector database, add a version counter and a written_at timestamp to every write, and configure the store to reject in-place overwrites (a change must be an explicit delete followed by a new write, both recorded as versions) so that every state transition is recoverable to any prior snapshot.
Build a behavioural baseline for your agent by logging the distribution of tool-call types, frequencies, and target resources across one week of normal operation. Then configure an alert (Datadog anomaly detection, or a simple rolling z-score on your existing log pipeline) that fires when any dimension deviates by more than two standard deviations from the baseline.
Draft and test a runbook for the “authorised-but-wrong irreversible action” failure mode. Identify the three most consequential irreversible actions your agent can take, document the compensating transaction or rollback steps for each, and verify them in a staging environment before an incident forces the discovery that recovery is harder than assumed.
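The z-score alert from the second step can be sketched with the standard library alone. This assumes tool calls have already been aggregated into one Counter per day; the function names and the tool names in the usage note are hypothetical, and only call frequency is covered, not target resources.

```python
from collections import Counter
from statistics import mean, stdev


def build_baseline(days: list[Counter]) -> dict[str, tuple[float, float]]:
    """Per-tool (mean, standard deviation) of daily call counts.

    Needs at least two days of data, because stdev is undefined for one point.
    """
    tools = set().union(*days)
    baseline = {}
    for tool in tools:
        series = [day.get(tool, 0) for day in days]
        baseline[tool] = (mean(series), stdev(series))
    return baseline


def drift_alerts(baseline: dict[str, tuple[float, float]],
                 today: Counter, threshold: float = 2.0) -> list[str]:
    """Flag tools whose count today is more than `threshold` deviations from baseline."""
    alerts = []
    for tool in sorted(set(baseline) | set(today)):
        if tool not in baseline:
            alerts.append(f"{tool}: not seen in baseline")
            continue
        mu, sigma = baseline[tool]
        count = today.get(tool, 0)
        if sigma == 0:
            # A perfectly constant baseline: any change at all is a deviation.
            if count != mu:
                alerts.append(f"{tool}: baseline was constant at {mu:g}, saw {count}")
            continue
        z = (count - mu) / sigma
        if abs(z) > threshold:
            alerts.append(f"{tool}: z-score {z:.1f}")
    return alerts
```

A tool that never appeared during the baseline week is reported separately from a frequency spike, since a new action type (for example a first-ever send_email call) is often the more telling signal.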

Threats it governs

When this principle is absent, these threats become reachable.

T1
Memory Poisoning: Adversarial content written into short- or long-term memory contaminates future decisions.
T5
Cascading Hallucination Attacks: Fabricated outputs propagate via reflection, memory, or multi-agent comms.
T6
Intent Breaking and Goal Manipulation: Adversaries manipulate planning, reasoning, or self-evaluation to override goals.

Controls that advance it

Catalogue mitigations that strengthen this principle, grouped by the defence-in-depth stage they sit in.

Prevent

Model registry: An agent loads whichever model weights are available at startup unless the runtime is told exactly which artifact to load. If a poisoned or regressed weight is published to the model store, the agent picks it up silently on the next restart. A model registry prevents that: every artifact is registered with a cryptographic checksum and an approval stage, the agent runtime loads by explicit version pin, and new versions must pass a canary evaluation before promotion to production.
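As a rough illustration of loading by explicit version pin, the sketch below checks an artifact’s SHA-256 digest and approval stage before handing it to the runtime. The ModelRegistry and RegistryEntry classes, the stage names, and the model name in the usage note are assumptions made for the example; a real registry product will have its own API.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class RegistryEntry:
    name: str
    version: str
    sha256: str   # hex digest recorded at registration time
    stage: str    # hypothetical stages: "canary" or "production"


class ModelRegistry:
    def __init__(self) -> None:
        self._entries: dict[tuple[str, str], RegistryEntry] = {}

    def register(self, name: str, version: str, artifact: bytes, stage: str) -> RegistryEntry:
        entry = RegistryEntry(name, version, hashlib.sha256(artifact).hexdigest(), stage)
        self._entries[(name, version)] = entry
        return entry

    def resolve(self, name: str, version: str) -> RegistryEntry:
        """Look up an exact pin; there is deliberately no 'latest'."""
        entry = self._entries.get((name, version))
        if entry is None:
            raise LookupError(f"{name}=={version} is not registered")
        if entry.stage != "production":
            raise PermissionError(f"{name}=={version} is in stage {entry.stage!r}")
        return entry


def load_pinned(registry: ModelRegistry, name: str, version: str, artifact: bytes) -> bytes:
    """Return the artifact only if it matches the registered checksum."""
    entry = registry.resolve(name, version)
    if hashlib.sha256(artifact).hexdigest() != entry.sha256:
        raise ValueError(f"checksum mismatch for {name}=={version}")
    return artifact
```

An unregistered version, a version still in canary, and a tampered artifact each fail with a distinct exception, so the runtime can refuse to start rather than fall back to whatever weights happen to be present.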

Detect

Workflow state consistency: When multiple agents read and write shared workflow state concurrently, a network partition, a delayed message, or an adversarially timed race condition can produce divergent views. An agent acting on stale or conflicting state may authorise an action it would reject given correct current state. Hash-chained state snapshots, merge-point conflict detection, and optimistic concurrency control close that window.
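Two of these mechanisms, optimistic concurrency control and hash-chained snapshots, can be sketched in a single-process toy. The WorkflowState class and StaleStateError are hypothetical names, state values are assumed to be JSON-serialisable, and a distributed system would need the version check to be atomic in the backing store.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that precedes the first snapshot


class StaleStateError(Exception):
    """Raised when a writer's view of the state is out of date."""


class WorkflowState:
    def __init__(self) -> None:
        self.version = 0
        self.data: dict = {}
        self.chain: list[tuple[int, dict, str]] = []  # (version, snapshot, digest)

    @staticmethod
    def _digest(prev: str, version: int, data: dict) -> str:
        payload = json.dumps({"prev": prev, "version": version, "data": data},
                             sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def commit(self, expected_version: int, updates: dict) -> int:
        """Apply updates only if the caller read the current version."""
        if expected_version != self.version:
            raise StaleStateError(
                f"read version {expected_version}, current is {self.version}")
        prev = self.chain[-1][2] if self.chain else GENESIS
        self.data = {**self.data, **updates}
        self.version += 1
        digest = self._digest(prev, self.version, self.data)
        self.chain.append((self.version, dict(self.data), digest))
        return self.version

    def verify_chain(self) -> bool:
        """Recompute every digest; an edited snapshot breaks the chain."""
        prev = GENESIS
        for version, snapshot, digest in self.chain:
            if self._digest(prev, version, snapshot) != digest:
                return False
            prev = digest
        return True
```

An agent that read version 0 and tries to commit after another agent has already moved the state to version 1 gets a StaleStateError instead of silently overwriting, and must re-read before deciding again.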

Respond

Graceful degradation: An agent that encounters a quota trip, a dependency failure, or a timeout faces a choice: continue at reduced quality, or refuse. Getting that choice wrong is the core operational failure. Graceful degradation requires the answer to be declared before the incident, not improvised during it: write-authority paths fail closed and return a refusal; read-only paths fail open and disclose the degraded state explicitly.
Kill switch: Agentic systems can act faster than a human can intervene through normal channels. A kill switch is the operational guarantee that a named human role can stop agent activity at any scope (single instance, class, or global) through a documented runbook, without requiring a code change or redeployment, and with every invocation written to an audit trail.
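Both Respond controls can be sketched in a few lines, assuming the degradation rule is keyed on whether a path is read or write, and the kill switch is runtime state that the agent loop checks before each action. The function and class names, the dictionary shapes, and the role names are all hypothetical.

```python
from datetime import datetime, timezone


def handle_dependency_failure(path_kind: str, cached_result=None) -> dict:
    """Degradation policy declared in advance rather than improvised."""
    if path_kind == "write":
        # Write-authority paths fail closed.
        return {"status": "refused",
                "reason": "dependency unavailable; write paths fail closed"}
    if path_kind == "read":
        # Read-only paths fail open, and say so.
        return {"status": "degraded", "result": cached_result,
                "notice": "served from cache; upstream dependency unavailable"}
    raise ValueError(f"unknown path kind: {path_kind!r}")


class KillSwitch:
    """Runtime stop flags at instance, class, or global scope, with an audit trail."""

    SCOPES = ("instance", "class", "global")

    def __init__(self) -> None:
        self._stopped: set[tuple[str, str]] = set()
        self.audit_log: list[dict] = []

    def stop(self, scope: str, target: str, invoked_by: str, reason: str) -> None:
        if scope not in self.SCOPES:
            raise ValueError(f"unknown scope: {scope!r}")
        key = ("global", "*") if scope == "global" else (scope, target)
        self._stopped.add(key)
        self.audit_log.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "scope": scope, "target": key[1],
            "invoked_by": invoked_by, "reason": reason,
        })

    def is_stopped(self, instance_id: str, agent_class: str) -> bool:
        """The agent loop calls this before every action."""
        return (("global", "*") in self._stopped
                or ("class", agent_class) in self._stopped
                or ("instance", instance_id) in self._stopped)
```

Because the stop flags are data rather than code, invoking the switch needs no redeployment. In a real system the flag store would also have to sit outside anything the agent itself can write to.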

In Helmwart

Not scored directly; related to the reactive phase in the Defence-in-Depth audit.