MITIGATION · m-output-provenance

Output provenance tracking — record the source of every claim an agent makes

When an agent produces a claim derived from retrieved data, that claim needs a record of where it came from: the source document, version, and retrieval time. Without that record, a downstream verifier cannot distinguish a well-grounded output from a fabricated one, a tampered one, or a poisoned one. Provenance tracking attaches source attribution to every claim, carries it through each transformation in the pipeline, and surfaces it in audit logs and user-facing interfaces.

Last reviewed 2026-05-12 · Status: published

At a glance

MATURITY

Tier 2

Available off-the-shelf or as a documented pattern, but newer or less broadly proven. Expect integration work and some operational nuance.

PLACES ON

node

Restricted to node kinds: agent

COVERAGE

3 threats

T1 · T5 · T8

TRADE-OFFS

Latency: low
Cost: low
Dev effort: medium

TL;DR

Tag every claim the agent emits with the source it derives from (corpus ID, document ID, version, and retrieval timestamp) before the output leaves the pipeline.
Provenance must travel through every transformation: a single RAG step or agent hop that drops Document.metadata breaks attribution for all downstream consumers.
Tamper-evident provenance (C2PA manifest signing for media; ed25519-signed audit records for arbitrary claims) turns attribution into forensic evidence that survives repudiation attempts.
Provenance records attribution, not correctness. Pair with m-multi-source-verify when factual accuracy is the goal, not just traceability.

How it behaves

Trigger: Agent produces an output claim derived from retrieved documents

Check: Does every claim in the response carry a verifiable source ID (document index, character range, or page number) from the retrieval step?

Yes: Output released with provenance manifest; audit log records retrieval IDs and response hash

No: Hold output. A provenance gap is a structural violation; do not release an unattributed claim

Provenance is enforced at the pipeline boundary, not retrospectively. An unattributed claim that reaches a user or downstream agent is a repudiation risk.
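The decision above can be sketched as a small gate function. This is a minimal illustration, not a reference implementation: the names Claim, gate, and source_id are hypothetical, and the check covers only the presence of a source ID, not whether that ID is genuine.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Claim:
    text: str
    source_id: Optional[str]  # hypothetical field: ID taken from the retrieval step


def gate(claims: list[Claim]) -> tuple[str, list[Claim]]:
    """Return ("release", []) or ("hold", unattributed_claims)."""
    unattributed = [c for c in claims if not c.source_id]
    if unattributed:
        # A single gap holds the whole output, not just the offending claim.
        return "hold", unattributed
    return "release", []


decision, gaps = gate([
    Claim("Revenue rose 4%.", "doc-17"),
    Claim("Margins will double.", None),
])
print(decision, [c.text for c in gaps])
```

Holding the whole output rather than stripping the unattributed claim is a design choice in this sketch; it keeps the gate simple and avoids releasing a response whose meaning changed when a claim was removed.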

What it is

An agent that retrieves documents and generates claims from them is performing an inference step: it reads sources and produces an output that represents those sources. Provenance tracking is the practice of recording that inference step explicitly. Each claim in the output carries the identifier of the retrieved document it derives from, the version and timestamp of that document, and a reference to the generation event itself. That record travels with the claim through every subsequent transformation in the pipeline.

Three layers of provenance are relevant in agentic systems.

Retrieval provenance. Every document entering the pipeline carries a source identifier: corpus ID, document ID, version, and retrieval timestamp. W3C PROV-DM defines the standardised vocabulary for this layer, with relations such as wasDerivedFrom and wasAttributedTo linking output entities back to their source entities.
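A retrieval-provenance record with the four fields named above might look like the following. This is a sketch under assumptions: SourceRef, make_source_ref, and the retrieval_id format are illustrative choices, not part of PROV-DM or any library.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass(frozen=True)
class SourceRef:
    corpus_id: str
    doc_id: str
    version: str
    retrieved_at: str  # ISO 8601 timestamp in UTC

    @property
    def retrieval_id(self) -> str:
        # Hypothetical ID scheme: stable for a given corpus, document, and version.
        return f"{self.corpus_id}/{self.doc_id}@{self.version}"


def make_source_ref(corpus_id: str, doc_id: str, version: str) -> SourceRef:
    # Stamp the retrieval time at the moment the document enters the pipeline.
    now = datetime.now(timezone.utc).isoformat()
    return SourceRef(corpus_id, doc_id, version, now)


ref = make_source_ref("policies", "leave-policy", "v7")
print(ref.retrieval_id)  # policies/leave-policy@v7
print(asdict(ref))
```

Making the record frozen means a later pipeline step cannot silently rewrite the version or timestamp; it has to create a new record.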

Generation provenance. The model's output is structured so that each claim references the retrieval ID it was generated from. The Anthropic Citations API returns citations arrays with character-range or page-number bindings. LangChain Document.metadata carries a source key and arbitrary custom fields through the retrieval chain. LlamaIndex source nodes carry the same attribution. The key invariant is that this attribution must be preserved through every transformation: a pipeline step that reconstructs output objects without copying source metadata silently breaks the chain for all downstream consumers.
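The invariant is easiest to see in a chunking step. The sketch below uses a stand-in Doc class (text plus a metadata dict) rather than a real framework type, and both splitter functions are hypothetical; it contrasts a step that rebuilds objects without metadata with one that copies it forward.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    # Stand-in for a framework document type: text plus a metadata dict.
    page_content: str
    metadata: dict = field(default_factory=dict)


def split_dropping_metadata(doc: Doc, size: int) -> list[Doc]:
    # Anti-pattern: new objects are built without the source metadata.
    return [Doc(doc.page_content[i:i + size])
            for i in range(0, len(doc.page_content), size)]


def split_preserving_metadata(doc: Doc, size: int) -> list[Doc]:
    # Each chunk gets a copy of the parent's metadata plus its own offset.
    return [Doc(doc.page_content[i:i + size], {**doc.metadata, "start_char": i})
            for i in range(0, len(doc.page_content), size)]


parent = Doc("abcdefghij", {"source": "kb/leave.md", "doc_id": "leave-policy", "version": "v7"})
print([c.metadata for c in split_dropping_metadata(parent, 4)])
print([c.metadata.get("doc_id") for c in split_preserving_metadata(parent, 4)])
```

Both functions return the same text chunks, which is why the first variant is easy to ship by accident: nothing fails until someone asks where a claim came from.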

Tamper-evident provenance. A signed provenance record turns attribution into forensic evidence. C2PA embeds a signed manifest with cryptographic hard bindings into media files. For text and structured outputs, an ed25519-signed audit record binds the output hash to the retrieval IDs at generation time. Once signed, neither the output nor its attributed sources can be altered without breaking the signature.
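The binding of output hash to retrieval IDs can be sketched with the standard library. Important assumption: the sign function below uses HMAC-SHA256 as a stand-in so the example runs without third-party packages. HMAC is symmetric, so anyone who can verify can also forge, and it does not give the non-repudiation property described above; a deployment would substitute an ed25519 signature from a cryptography library. The function names are hypothetical.

```python
import hashlib
import hmac
import json


def build_record(output_text: str, retrieval_ids: list[str]) -> bytes:
    # Canonical JSON (sorted keys, fixed separators) so the same record
    # always serialises to the same bytes.
    record = {
        "output_sha256": hashlib.sha256(output_text.encode("utf-8")).hexdigest(),
        "retrieval_ids": sorted(retrieval_ids),
    }
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign(payload: bytes, key: bytes) -> str:
    # STAND-IN: HMAC-SHA256. Replace with an ed25519 private-key signature
    # so that verifiers need only the public key.
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify(output_text: str, retrieval_ids: list[str], signature: str, key: bytes) -> bool:
    expected = sign(build_record(output_text, retrieval_ids), key)
    return hmac.compare_digest(expected, signature)


key = b"demo-key-not-for-production"
sig = sign(build_record("Leave is 25 days.", ["policies/leave-policy@v7"]), key)
print(verify("Leave is 25 days.", ["policies/leave-policy@v7"], sig, key))  # True
print(verify("Leave is 30 days.", ["policies/leave-policy@v7"], sig, key))  # False
print(verify("Leave is 25 days.", ["policies/other@v1"], sig, key))         # False
```

Because the record covers both the output hash and the source IDs, changing either one after signing invalidates the signature.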

The three layers address different failure modes. Retrieval provenance is the prerequisite for everything downstream. Generation provenance makes attribution legible to verifiers at output time. Tamper-evident signing makes it durable against repudiation.

Provenance records attribution, not correctness. A well-attributed claim can still be wrong if the source document was itself incorrect or poisoned. Pair with m-multi-source-verify when factual accuracy is the requirement, and with m-mem-validation when provenance of what entered the corpus is the concern.

Detection signals

Claims emitted without provenance tags. A rising volume indicates a pipeline stage that drops source metadata, either through a missing implementation step or a framework update that changed how Document objects are reconstructed.
Provenance-tag mismatch: the claimed source ID does not match any document in the retrieval manifest for that request. This indicates tampering or a model hallucinating a source identifier.
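Both signals can be counted per request and exported to whatever metrics system is in use. A minimal sketch, assuming claims arrive as dicts with a hypothetical source_id key and that the retrieval manifest is available as a set of IDs:

```python
from collections import Counter


def classify_claims(claims: list[dict], manifest_ids: set[str]) -> Counter:
    """Count claims per detection signal for one request."""
    counts = Counter()
    for claim in claims:
        source_id = claim.get("source_id")  # hypothetical field name
        if not source_id:
            counts["missing_tag"] += 1      # signal 1: no provenance tag
        elif source_id not in manifest_ids:
            counts["tag_mismatch"] += 1     # signal 2: ID not in this request's manifest
        else:
            counts["ok"] += 1
    return counts


counts = classify_claims(
    [{"text": "a", "source_id": "d1"},
     {"text": "b"},
     {"text": "c", "source_id": "d9"}],
    manifest_ids={"d1", "d2"},
)
print(dict(counts))
```

Tracking the two counters separately matters because they point to different causes: a rising missing_tag rate suggests a pipeline regression, while tag_mismatch suggests tampering or a fabricated identifier.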

Threats it covers

T1 Memory Poisoning −1 severity step

WHY IT HELPS Memory Poisoning introduces adversarial content into the agent's memory store; quarantining it after detection requires knowing which stored entries contributed to a given output. Per-claim provenance records the retrieval IDs that grounded each claim, giving incident response a starting point for identifying and removing poisoned entries.
T5 Cascading Hallucination Attacks −1 severity step

WHY IT HELPS Cascading Hallucination Attacks compound when a fabricated or weakly-grounded claim propagates through multi-agent pipelines and is treated as authoritative by downstream steps. Per-claim source attribution exposes which claims lack a real retrieval ID, allowing the pipeline to hold or flag those claims before they reach the next agent in the chain.
T8 Repudiation and Untraceability −1 severity step

WHY IT HELPS Repudiation and Untraceability succeed when no durable record links an output to the agent and source that produced it. Tamper-evident per-claim provenance, signed at generation time, removes the ability to deny or alter the record of what was produced and why.

Principle coverage

Defence-in-Depth stage: Detect — and it advances:

Provenance & Trust-tagging. Provenance tracking is the direct implementation of the provenance-trust-tagging principle for generated outputs: it binds each claim to the source document that grounded it at retrieval time, so the trust level of the output is not asserted by the agent but verifiable from the record of where the content came from.
Observability / Non-repudiation. Per-claim provenance records give the observability layer the structured data it needs to answer post-incident questions: which sources contributed to an output, which pipeline step produced it, and whether the attribution record was altered after signing.
Accountability. Accountability requires that every consequential agent output be traceable to the actor and the evidence behind it. Tamper-evident per-claim provenance creates a durable, non-repudiable record that binds each output to its generating agent identity and its source documents, satisfying the attribution requirement that accountability depends on.
Transparency / Explainability. Transparency toward users and auditors requires that the basis for an agent's claims be inspectable. Provenance tracking makes that basis explicit and verifiable, surfacing source attribution in both audit logs and user-facing interfaces so the reasoning behind any output can be examined without relying on the agent's own description of what it used.

Design & governance principles (open design, economy of mechanism, accountability, …) are architectural, not advanced by a single placed control.

Implementation options

Five implementation options covering managed APIs, open vocabulary, media-grade signing, framework metadata, and a self-build structured-RAG pattern. All five are production-verified.

Anthropic Citations API. Pass documents with citations.enabled=true; the API returns each response text block with a citations array that pinpoints the exact character range or page number in the source document. Cited text does not count toward output tokens.

Why choose it: Best when your pipeline uses Claude and you want claim-source binding guaranteed by the API layer rather than prompt engineering. The API chunks documents into sentences, produces structured citation objects (char_location, page_location, content_block_location), and is compatible with prompt caching and batch processing. Structured Outputs cannot be used simultaneously.

More details:

Anthropic Citations API docs ↗
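For the Citations API option, the sketch below builds a plain-text document block with citations enabled and flattens a response's content list into per-claim source bindings. It makes no network call. The block shapes follow a reading of the Citations documentation and should be checked against the current docs; the helper names are hypothetical and the sample response is hand-written, not real API output.

```python
def document_block(text: str, title: str) -> dict:
    # Plain-text document content block with citations turned on.
    return {
        "type": "document",
        "source": {"type": "text", "media_type": "text/plain", "data": text},
        "title": title,
        "citations": {"enabled": True},
    }


def extract_citations(content_blocks: list[dict]) -> list[dict]:
    """Flatten text blocks into claim text, document index, and character range."""
    rows = []
    for block in content_blocks:
        if block.get("type") != "text":
            continue
        for cite in block.get("citations") or []:
            if cite.get("type") == "char_location":
                rows.append({
                    "claim": block["text"],
                    "document_index": cite["document_index"],
                    "span": (cite["start_char_index"], cite["end_char_index"]),
                    "cited_text": cite["cited_text"],
                })
    return rows


# Hand-written sample shaped like a response's content list (illustrative only).
sample = [
    {"type": "text", "text": "Employees get 25 days of leave.",
     "citations": [{"type": "char_location", "cited_text": "Annual leave is 25 days.",
                    "document_index": 0, "document_title": "Leave policy",
                    "start_char_index": 0, "end_char_index": 24}]},
    {"type": "text", "text": " Let me know if you need more."},
]
print(extract_citations(sample))
```

Text blocks that arrive without a citations array produce no rows here, so comparing the number of text blocks with the number of cited blocks gives a quick view of which parts of a response are unattributed.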

LangChain Document.metadata. Every LangChain Document carries a metadata dict with a source key (file path or URL) plus any custom provenance fields. Retrievers propagate metadata through the RAG chain; the application layer surfaces metadata alongside generated text.

Why choose it: Best for LangChain-native pipelines where you need source attribution without a managed API. The origin is stored under metadata["source"] by loader convention. Metadata is arbitrary, so you can add doc_id, version, retrieved_at, and corpus fields. You are responsible for preserving metadata through every transformation in the chain: a step that reconstructs Documents without copying metadata silently drops attribution.

More details:

LangChain Document source code ↗

W3C PROV-DM. PROV-DM (W3C Recommendation, April 2013) defines the conceptual data model for provenance: entities, activities, agents, and the wasGeneratedBy / wasDerivedFrom / wasAttributedTo relations. PROV-O is the OWL ontology serialisation; PROV-N and PROV-XML are alternative formats.

Why choose it: Best when you need an interoperable, standards-based provenance graph that multiple systems can consume. Use PROV-DM as the schema for your audit log: each agent output is a prov:Entity that wasGeneratedBy a prov:Activity (the generation call), wasAttributedTo a prov:Agent (the model identity), and wasDerivedFrom the retrieved source entities. PROV-DM is not a library; it is the vocabulary you implement against.
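The audit-log shape described for PROV-DM can be sketched as a plain dictionary. The layout is loosely modelled on the PROV-JSON serialisation and has not been validated against it; the ex: prefix, the identifiers, and the prov_record function are hypothetical.

```python
import json


def prov_record(output_id: str, activity_id: str, agent_id: str,
                source_ids: list[str]) -> dict:
    return {
        "prefix": {"ex": "https://example.org/"},  # placeholder namespace
        # The output and every retrieved source are entities.
        "entity": {output_id: {}, **{s: {} for s in source_ids}},
        "activity": {activity_id: {}},             # the generation call
        "agent": {agent_id: {}},                   # the model identity
        "wasGeneratedBy": {
            "_:gen1": {"prov:entity": output_id, "prov:activity": activity_id}
        },
        "wasAttributedTo": {
            "_:attr1": {"prov:entity": output_id, "prov:agent": agent_id}
        },
        # One derivation edge per retrieved source.
        "wasDerivedFrom": {
            f"_:der{i}": {"prov:generatedEntity": output_id, "prov:usedEntity": s}
            for i, s in enumerate(source_ids, start=1)
        },
    }


rec = prov_record("ex:output-1", "ex:generation-1", "ex:model-a",
                  ["ex:doc-1", "ex:doc-2"])
print(json.dumps(rec, indent=2))
```

Emitting one wasDerivedFrom edge per source keeps the graph queryable in both directions: from an output to its sources, and from a suspect source to every output derived from it.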

C2PA content credentials. C2PA embeds a signed manifest directly into the media file. The manifest contains assertions, a claim-generator signature (X.509 certificate), and cryptographic hard bindings that uniquely identify the asset bytes and detect any subsequent tampering. Key adopters include Adobe, Microsoft, Sony, BBC, and Google.

Why choose it: Best for agent outputs that are media files (images, video, audio, PDFs) where downstream consumers need tamper-evident proof of origin. C2PA is disproportionate overhead for plain-text audit logs; use ed25519-signed JSON records there. For mixed pipelines where agents produce media alongside text, C2PA is the only industry-standard option for the media layer.

Self-build: structured RAG with per-claim source IDs. Retrieve documents with provenance metadata, build a prompt that instructs the model to output JSON with a claims array (each claim: text, sourceDocId, confidenceScore), verify each sourceDocId against the retrieval manifest, compute a SHA-256 hash of the response, and write a signed audit record.

Why choose it: Best when you are not on Claude and cannot use the Citations API, or when you need custom provenance fields (for example tenant ID or classification label) beyond what managed APIs expose. The model performs attribution as part of structured output; the pipeline verifies the cited IDs are real documents from the retrieval step and rejects any claim whose sourceDocId is not in the retrieval manifest. Requires a schema-validation step and an integrity-hash step; both are standard application code.
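The verification and hashing steps of the self-build pattern can be sketched as follows. The claim fields (text, sourceDocId, confidenceScore) come from the description above; verify_response and the result layout are hypothetical, and signing the audit record is left out of this sketch.

```python
import hashlib
import json

# Minimal schema: field name mapped to accepted Python type(s).
REQUIRED = {"text": str, "sourceDocId": str, "confidenceScore": (int, float)}


def verify_response(raw_json: str, manifest_ids: set[str]) -> dict:
    data = json.loads(raw_json)
    if not isinstance(data, dict):
        raise ValueError("model output must be a JSON object with a claims array")
    accepted, rejected = [], []
    for claim in data.get("claims", []):
        well_formed = isinstance(claim, dict) and all(
            isinstance(claim.get(k), t) for k, t in REQUIRED.items())
        # Check against this request's retrieval manifest, not the whole corpus.
        if well_formed and claim["sourceDocId"] in manifest_ids:
            accepted.append(claim)
        else:
            rejected.append(claim)
    return {
        "accepted": accepted,
        "rejected": rejected,
        "response_sha256": hashlib.sha256(raw_json.encode("utf-8")).hexdigest(),
    }


raw = json.dumps({"claims": [
    {"text": "Leave is 25 days.", "sourceDocId": "doc-1", "confidenceScore": 0.9},
    {"text": "Leave rolls over.", "sourceDocId": "doc-404", "confidenceScore": 0.8},
    {"text": "No source given."},
]})
result = verify_response(raw, manifest_ids={"doc-1", "doc-2"})
print(len(result["accepted"]), len(result["rejected"]))  # 1 2
```

The hash is taken over the raw model response rather than the accepted subset, so the audit record reflects what the model actually produced, including the claims that were rejected.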

Trade-offs

Metadata propagation through the pipeline adds negligible latency: it is bookkeeping carried alongside existing data structures. C2PA manifest signing adds under 50 ms per artifact. Signing a record with ed25519 takes 1 to 10 ms.
Audit-log storage is the dominant ongoing cost. For a high-volume agent producing thousands of outputs per day, tamper-evident per-claim records can reach tens of gigabytes per month, billed at commodity cloud storage rates.
Development effort is medium. Wiring provenance metadata through every transformation in a multi-step agent pipeline is not technically complex, but a stage is easy to miss; a single step that drops the source metadata breaks the chain for all downstream consumers.
User-facing friction is low when provenance renders as inline citations. It rises when the provenance manifest dominates the output surface or requires the user to navigate to a separate view to inspect sources.
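The storage figure depends on three quantities that vary by deployment, so it is worth estimating rather than assuming. A back-of-envelope sketch; every input value below is a hypothetical placeholder, not a measurement:

```python
def monthly_audit_bytes(outputs_per_day: int, claims_per_output: int,
                        bytes_per_record: int, days: int = 30) -> int:
    # One signed record per claim, retained for the whole month.
    return outputs_per_day * claims_per_output * bytes_per_record * days


# Placeholder inputs; substitute measured values from your own pipeline.
total = monthly_audit_bytes(outputs_per_day=9_000, claims_per_output=25,
                            bytes_per_record=4_096)
print(f"{total / 1e9:.1f} GB per month")  # 27.6 GB per month
```

Record size is usually the most sensitive input: storing cited text inline rather than by reference can change the total by an order of magnitude.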

When NOT to use

Do not apply per-claim tamper-evident signing to fully generative outputs with no retrieval component. If the agent produces free-form text from model weights alone, with no retrieved document, there is no retrieval provenance to record. The control reduces to logging the model version and generation parameters, which is covered by m-model-registry.
Do not invest in signed per-claim records for low-stakes, ephemeral outputs such as conversational filler, draft brainstorms, or internal scratch work where audit overhead outweighs forensic value.
Provenance is the wrong primary control for preventing hallucination: it records which source was cited, not whether the source is correct. Use m-multi-source-verify when factual accuracy, not attribution, is the goal.

Limitations

Provenance is only as trustworthy as the upstream source. A provenance chain that points to a poisoned corpus produces a confidently attributed but incorrect answer. Pair with m-mem-validation on the ingest side.
A single transformation in the pipeline that reconstructs output objects without copying source metadata silently breaks the provenance chain for all downstream consumers. There is no runtime warning; the gap surfaces only at audit or incident time.
The self-build structured-RAG option depends on the model correctly outputting sourceDocId values that match the retrieval manifest. A model that produces a sourceDocId that happens to exist in the corpus but was not retrieved for the current request yields a plausible-looking but incorrect attribution. Schema validation of each sourceDocId against the actual retrieval manifest for that request is the required verification step.
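The silent metadata drop described above has no built-in warning, but one can be added by wrapping each pipeline stage in a check. This is a sketch of one possible guard, not a framework feature: the guarded decorator, ProvenanceGap, the dict-based item shape, and the required key set are all hypothetical.

```python
from typing import Callable

REQUIRED_KEYS = {"doc_id", "version", "retrieved_at"}  # hypothetical minimum set

Stage = Callable[[list[dict]], list[dict]]


class ProvenanceGap(Exception):
    pass


def guarded(stage: Stage) -> Stage:
    """Wrap a pipeline stage so that dropped provenance fails loudly."""
    def wrapper(items: list[dict]) -> list[dict]:
        out = stage(items)
        for item in out:
            missing = REQUIRED_KEYS - set(item.get("metadata", {}))
            if missing:
                raise ProvenanceGap(f"{stage.__name__} dropped {sorted(missing)}")
        return out
    return wrapper


@guarded
def uppercase(items):
    # Correct stage: transforms the text and carries metadata forward.
    return [{"text": i["text"].upper(), "metadata": dict(i["metadata"])} for i in items]


@guarded
def lossy(items):
    # Faulty stage: rebuilds items without metadata.
    return [{"text": i["text"]} for i in items]


docs = [{"text": "leave is 25 days",
         "metadata": {"doc_id": "leave-policy", "version": "v7",
                      "retrieved_at": "2026-01-01T00:00:00+00:00"}}]
print(uppercase(docs)[0]["text"])
try:
    lossy(docs)
except ProvenanceGap as err:
    print("held:", err)
```

Running the guard in tests and in production turns an audit-time discovery into an immediate failure at the stage that caused it.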

Maturity tier reasoning

Tier 2 (real-composable) fits because W3C PROV-DM and C2PA are mature, ratified standards; the Anthropic Citations API is production-available across all current Claude models; LangChain Document.metadata propagation is production-stable.
What keeps this out of Tier 1 is that agentic propagation of provenance through multi-step pipelines is a composed application pattern, not a single off-the-shelf product. Every deployment assembles retrieval provenance, generation-side citation binding, and tamper-evident signing differently.

Last verified against upstream docs: 2026-05-30.

PLACEMENT

On the canvas, this control can be placed on:

node

Valid node kinds: agent

MAESTRO LAYERS

L2 L7

ATLAS TECHNIQUES

AML.T0019 Publish Poisoned Datasets
Adversary publishes a manipulated dataset to a public hub (HuggingFace, Kaggle, GitHub) so that downstream training pipelines incorporate the poisoned data.
AML.T0080 AI Agent Context Poisoning
Adversary contaminates an agent's context store (short-term scratchpad, vector memory, conversation history) so future reasoning is biased toward attacker goals.

ATLAS MITIGATIONS

AML.M0024 AI Telemetry Logging
Log inputs, outputs, and reasoning steps of deployed AI models so anomalous behaviour can be detected and incidents reconstructed.
AML.M0025 Maintain AI Dataset Provenance
Keep a detailed history of every dataset used by AI applications so corruption or unauthorised changes can be traced and rolled back.

TRADE-OFFS

latency low
cost low
ux friction low
dev effort medium

PLAYBOOKS

OWASP v1.1 playbook that recommends this control:

P2 Preventing Memory Poisoning & AI Knowledge Corruption