MITIGATION · m-output-provenance
Output provenance tracking — record the source of every claim an agent makes
When an agent produces a claim derived from retrieved data, that claim needs a record of where it came from: the source document, version, and retrieval time. Without that record, a downstream verifier cannot distinguish a well-grounded output from a fabricated one, a tampered one, or a poisoned one. Provenance tracking attaches source attribution to every claim, carries it through each transformation in the pipeline, and surfaces it in audit logs and user-facing interfaces.
At a glance
TL;DR
- Tag every claim the agent emits with the source it derives from (corpus ID, document ID, version, and retrieval timestamp) before the output leaves the pipeline.
- Provenance must travel through every transformation: a single RAG step or agent hop that drops Document.metadata breaks attribution for all downstream consumers.
- Tamper-evident provenance (C2PA manifest signing for media; ed25519-signed audit records for arbitrary claims) turns attribution into forensic evidence that survives repudiation attempts.
- Provenance records attribution, not correctness. Pair it with m-multi-source-verify when the goal is factual accuracy rather than traceability alone.
How it behaves
What it is
An agent that retrieves documents and generates claims from them is performing an inference step: it reads sources and produces an output that represents those sources. Provenance tracking is the practice of recording that inference step explicitly. Each claim in the output carries the identifier of the retrieved document it derives from, the version and timestamp of that document, and a reference to the generation event itself. That record travels with the claim through every subsequent transformation in the pipeline.
Three layers of provenance are relevant in agentic systems.
Retrieval provenance. Every document entering the pipeline carries a source identifier: corpus ID, document ID, version, and retrieval timestamp. W3C PROV-DM defines the standardised vocabulary for this layer, with relations such as wasDerivedFrom and wasAttributedTo linking output entities back to their source entities.
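A minimal sketch of the retrieval-provenance record described above, assuming a plain dataclass is enough to carry the four fields. The class name, the record_retrieval helper, and the composite retrieval_id format are hypothetical and not part of PROV-DM or any framework.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class RetrievalRecord:
    """Source identifier attached to a document as it enters the pipeline."""
    corpus_id: str
    document_id: str
    version: str
    retrieved_at: str  # ISO 8601 timestamp in UTC

    @property
    def retrieval_id(self) -> str:
        # Hypothetical composite key; any stable, unique identifier would do.
        return f"{self.corpus_id}/{self.document_id}@{self.version}"


def record_retrieval(corpus_id: str, document_id: str, version: str) -> RetrievalRecord:
    """Stamp a document with its source identity at retrieval time."""
    now = datetime.now(timezone.utc).isoformat()
    return RetrievalRecord(corpus_id, document_id, version, now)


rec = record_retrieval("policies", "doc-417", "v3")
print(rec.retrieval_id)  # policies/doc-417@v3
print(asdict(rec))
```

Freezing the dataclass is a deliberate choice in this sketch: later pipeline stages can read the record but cannot quietly rewrite the version or timestamp.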
Generation provenance. The model's output is structured so that each claim references the retrieval ID it was generated from. The Anthropic Citations API returns citations arrays with character-range or page-number bindings. LangChain Document.metadata carries a source key and arbitrary custom fields through the retrieval chain. LlamaIndex source nodes carry the same attribution. The key invariant is that this attribution must be preserved through every transformation: a pipeline step that reconstructs output objects without copying source metadata silently breaks the chain for all downstream consumers.
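The propagation invariant can be illustrated with a small sketch. Doc is a stand-in for a framework document object, not the LangChain or LlamaIndex class, and the function names and required keys are assumptions chosen for the example. It contrasts a step that rebuilds the object without metadata against one that copies it, and adds a stage-boundary check so the gap is caught at run time.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Stand-in for a framework document object: text plus a metadata dict."""
    text: str
    metadata: dict = field(default_factory=dict)


def summarise_lossy(doc: Doc) -> Doc:
    # Bug pattern: a new object is built and the source metadata is not copied.
    return Doc(text=doc.text[:40])


def summarise_preserving(doc: Doc) -> Doc:
    # Copy the metadata and note the transformation that was applied.
    meta = dict(doc.metadata)
    meta["transformations"] = [*doc.metadata.get("transformations", []), "summarise"]
    return Doc(text=doc.text[:40], metadata=meta)


def assert_provenance(doc: Doc, required=("source", "doc_id", "version")) -> None:
    """Fail at the stage boundary if any required provenance key is missing."""
    missing = [key for key in required if key not in doc.metadata]
    if missing:
        raise ValueError(f"provenance dropped, missing keys: {missing}")


original = Doc(
    text="Refunds are issued within 14 days of a return being received.",
    metadata={"source": "kb/refunds.md", "doc_id": "doc-417", "version": "v3"},
)

assert_provenance(summarise_preserving(original))  # passes silently

try:
    assert_provenance(summarise_lossy(original))
except ValueError as err:
    print(err)  # provenance dropped, missing keys: ['source', 'doc_id', 'version']
```

Running a check like assert_provenance between stages is one way to turn a silent break into an immediate failure.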
Tamper-evident provenance. A signed provenance record turns attribution into forensic evidence. C2PA embeds a signed manifest with cryptographic hard bindings into media files. For text and structured outputs, an ed25519-signed audit record binds the output hash to the retrieval IDs at generation time. Once signed, neither the output nor its attributed sources can be altered without breaking the signature.
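The binding of an output hash to retrieval IDs can be sketched as follows. Important assumption: Python's standard library has no ed25519 implementation, so this sketch substitutes HMAC-SHA256, a symmetric MAC. That demonstrates tamper evidence but not non-repudiation, because the verifier holds the same key as the signer; a real deployment would replace the two hmac calls with ed25519 sign and verify operations from a cryptography library. The record layout and function names are hypothetical.

```python
import hashlib
import hmac
import json

# Assumption: a demo key. A real deployment would keep an ed25519 private key
# in a KMS or HSM and give verifiers only the public key.
DEMO_KEY = b"demo-key-not-for-production"


def canonical(obj: dict) -> bytes:
    """Deterministic serialisation so signer and verifier hash identical bytes."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def sign_record(output_text: str, retrieval_ids: list[str], key: bytes) -> dict:
    """Bind the output hash to its retrieval IDs and attach a MAC over both."""
    body = {"output_sha256": sha256_hex(output_text), "retrieval_ids": sorted(retrieval_ids)}
    tag = hmac.new(key, canonical(body), hashlib.sha256).hexdigest()
    return {**body, "signature": tag}


def verify_record(record: dict, output_text: str, key: bytes) -> bool:
    """Return True only if neither the output nor the attributed sources changed."""
    body = {k: v for k, v in record.items() if k != "signature"}
    expected = hmac.new(key, canonical(body), hashlib.sha256).hexdigest()
    output_ok = body.get("output_sha256") == sha256_hex(output_text)
    return output_ok and hmac.compare_digest(expected, str(record.get("signature", "")))


text = "Refunds are issued within 14 days."
record = sign_record(text, ["policies/doc-417@v3"], DEMO_KEY)
tampered = {**record, "retrieval_ids": ["policies/doc-999@v1"]}

print(verify_record(record, text, DEMO_KEY))                   # True
print(verify_record(record, text + " Or 30 days.", DEMO_KEY))  # False: output altered
print(verify_record(tampered, text, DEMO_KEY))                 # False: sources altered
```

Canonical serialisation matters here: without sorted keys and fixed separators, the same record could serialise to different bytes and fail verification for reasons unrelated to tampering.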
The three layers address different failure modes. Retrieval provenance is the prerequisite for everything downstream. Generation provenance makes attribution legible to verifiers at output time. Tamper-evident signing makes it durable against repudiation.
Provenance records attribution, not correctness. A well-attributed claim can still be wrong if the source document was itself incorrect or poisoned. Pair with m-multi-source-verify when factual accuracy is the requirement, and with m-mem-validation when provenance of what entered the corpus is the concern.
Detection signals
- Claims emitted without provenance tags. A rising volume indicates a pipeline stage that drops source metadata, either through a missing implementation step or a framework update that changed how Document objects are reconstructed.
- Provenance-tag mismatch. The claimed source ID does not match any document in the retrieval manifest for that request, which indicates tampering or a model hallucinating a source identifier.
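Both signals can be computed per request with a few lines of bookkeeping. The sketch below assumes each claim is a dict with a source_id field and that the retrieval manifest is available as a set of IDs; both shapes, and the signal names, are hypothetical.

```python
from collections import Counter


def classify_claims(claims: list[dict], manifest_ids: set[str]) -> Counter:
    """Count claims per detection signal for one request."""
    signals = Counter()
    for claim in claims:
        source_id = claim.get("source_id")
        if not source_id:
            signals["untagged"] += 1   # claim emitted without a provenance tag
        elif source_id not in manifest_ids:
            signals["mismatch"] += 1   # tag points at nothing that was retrieved
        else:
            signals["ok"] += 1
    return signals


claims = [
    {"text": "Refunds are issued within 14 days.", "source_id": "doc-417"},
    {"text": "Refunds are doubled on holidays.", "source_id": None},
    {"text": "Returns require a receipt.", "source_id": "doc-999"},
]
signals = classify_claims(claims, manifest_ids={"doc-417", "doc-88"})
print(signals["ok"], signals["untagged"], signals["mismatch"])  # 1 1 1
```

Exporting the untagged and mismatch counts as metrics lets an alert fire on a rising trend rather than on a single occurrence.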
Threats it covers
- Memory Poisoning. WHY IT HELPS: Memory Poisoning introduces adversarial content into the agent's memory store; quarantining it after detection requires knowing which stored entries contributed to a given output. Per-claim provenance records the retrieval IDs that grounded each claim, giving incident response a starting point for identifying and removing poisoned entries.
- Cascading Hallucination Attacks. WHY IT HELPS: Cascading Hallucination Attacks compound when a fabricated or weakly-grounded claim propagates through multi-agent pipelines and is treated as authoritative by downstream steps. Per-claim source attribution exposes which claims lack a real retrieval ID, allowing the pipeline to hold or flag those claims before they reach the next agent in the chain.
- Repudiation and Untraceability. WHY IT HELPS: Repudiation and Untraceability succeed when no durable record links an output to the agent and source that produced it. Tamper-evident per-claim provenance, signed at generation time, removes the ability to deny or alter the record of what was produced and why.
Principle coverage
Defence-in-Depth stage: Detect. It advances:
- Provenance & Trust-tagging: Provenance tracking is the direct implementation of the provenance-trust-tagging principle for generated outputs: it binds each claim to the source document that grounded it at retrieval time, so the trust level of the output is not asserted by the agent but verifiable from the record of where the content came from.
- Observability / Non-repudiation: Per-claim provenance records give the observability layer the structured data it needs to answer post-incident questions: which sources contributed to an output, which pipeline step produced it, and whether the attribution record was altered after signing.
- Accountability: Accountability requires that every consequential agent output be traceable to the actor and the evidence behind it. Tamper-evident per-claim provenance creates a durable, non-repudiable record that binds each output to its generating agent identity and its source documents, satisfying the attribution requirement that accountability depends on.
- Transparency / Explainability: Transparency toward users and auditors requires that the basis for an agent's claims be inspectable. Provenance tracking makes that basis explicit and verifiable, surfacing source attribution in both audit logs and user-facing interfaces so the reasoning behind any output can be examined without relying on the agent's own description of what it used.
Design & governance principles (open design, economy of mechanism, accountability, …) are architectural, not advanced by a single placed control.
Implementation options
Five implementation options covering managed APIs, open vocabulary, media-grade signing, framework metadata, and a self-build structured-RAG pattern. All five are production-verified.
Anthropic Citations API
Pass documents with citations.enabled=true; the API returns each response text block with a citations array that pinpoints the exact character range or page number in the source document. Cited text does not count toward output tokens.
Why choose it: Best when your pipeline uses Claude and you want claim-source binding guaranteed by the API layer rather than prompt engineering. The API chunks documents into sentences, produces structured citation objects (char_location, page_location, content_block_location), and is compatible with prompt caching and batch processing. Structured Outputs cannot be used simultaneously.
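A sketch of the data shapes involved, with no network call. The field names follow the Citations API documentation as recalled at the time of writing and should be checked against the current reference; the helper names are hypothetical, and sample_content is a hand-written illustration in the documented response shape, not real API output.

```python
def build_document_block(text: str, title: str) -> dict:
    """Plain-text document content block with citations switched on."""
    return {
        "type": "document",
        "source": {"type": "text", "media_type": "text/plain", "data": text},
        "title": title,
        "citations": {"enabled": True},
    }


def extract_citations(content_blocks: list[dict]) -> list[dict]:
    """Flatten response text blocks into one record per (claim, cited span)."""
    records = []
    for block in content_blocks:
        if block.get("type") != "text":
            continue
        for cite in block.get("citations") or []:
            records.append({
                "claim": block["text"],
                "document_index": cite["document_index"],
                "cited_text": cite["cited_text"],
                "start": cite.get("start_char_index"),
                "end": cite.get("end_char_index"),
            })
    return records


# Hand-written sample in the documented response shape, not real API output.
sample_content = [
    {"type": "text", "text": "According to the policy, "},
    {
        "type": "text",
        "text": "refunds are issued within 14 days.",
        "citations": [{
            "type": "char_location",
            "cited_text": "Refunds are issued within 14 days.",
            "document_index": 0,
            "document_title": "Refund policy",
            "start_char_index": 0,
            "end_char_index": 34,
        }],
    },
]

for rec in extract_citations(sample_content):
    print(rec["claim"], "<-", rec["cited_text"])
```

The document block would be placed in the user message content alongside the question. Text blocks without a citations array, like the first one in the sample, carry no attribution and are skipped.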
LangChain Document.metadata
Every LangChain Document carries a metadata dict with a source key (file path or URL) plus any custom provenance fields. Retrievers propagate metadata through the RAG chain; the application layer surfaces metadata alongside generated text.
Why choose it: Best for LangChain-native pipelines where you need source attribution without a managed API. Document loaders conventionally record the origin under metadata["source"], and because the metadata dict is arbitrary you can add doc_id, version, retrieved_at, and corpus fields. You are responsible for preserving metadata through every transformation in the chain: a step that reconstructs Documents without copying metadata silently drops attribution.
W3C PROV-DM
PROV-DM (W3C Recommendation, April 2013) defines the conceptual data model for provenance: entities, activities, agents, and the wasGeneratedBy / wasDerivedFrom / wasAttributedTo relations. PROV-O is the OWL ontology serialisation; PROV-N and PROV-XML are alternative formats.
Why choose it: Best when you need an interoperable, standards-based provenance graph that multiple systems can consume. Use PROV-DM as the schema for your audit log: each agent output is a prov:Entity that wasGeneratedBy a prov:Activity (the generation call), wasAttributedTo a prov:Agent (the model identity), and wasDerivedFrom the retrieved source entities. PROV-DM is not a library; it is the vocabulary you implement against.
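One way to write that graph down is the PROV-JSON serialisation. The sketch below builds such a document as a plain dict; the ex: prefix, the example.org namespace, the identifiers, and the prov_record helper are all hypothetical, and the key names should be checked against the PROV-JSON specification before relying on them for interchange.

```python
import json


def prov_record(output_id: str, activity_id: str, agent_id: str, source_ids: list[str]) -> dict:
    """Build a PROV-JSON style document for one agent output."""
    doc = {
        "prefix": {"ex": "https://example.org/provenance/"},  # hypothetical namespace
        "entity": {output_id: {}, **{sid: {} for sid in source_ids}},
        "activity": {activity_id: {}},
        "agent": {agent_id: {}},
        # The output was generated by the generation call...
        "wasGeneratedBy": {
            "_:gen1": {"prov:entity": output_id, "prov:activity": activity_id}
        },
        # ...is attributed to the model identity...
        "wasAttributedTo": {
            "_:attr1": {"prov:entity": output_id, "prov:agent": agent_id}
        },
        "wasDerivedFrom": {},
    }
    # ...and is derived from each retrieved source entity.
    for i, sid in enumerate(source_ids, start=1):
        doc["wasDerivedFrom"][f"_:der{i}"] = {
            "prov:generatedEntity": output_id,
            "prov:usedEntity": sid,
        }
    return doc


doc = prov_record(
    output_id="ex:output-1",
    activity_id="ex:generation-call-1",
    agent_id="ex:model-a",
    source_ids=["ex:doc-417-v3", "ex:doc-88-v1"],
)
print(json.dumps(doc, indent=2))
```

Each retrieved source becomes its own entity, so a query over wasDerivedFrom edges answers the post-incident question of which documents fed a given output.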
C2PA content credentials
C2PA embeds a signed manifest directly into the media file. The manifest contains assertions, a claim-generator signature (X.509 certificate), and cryptographic hard bindings that uniquely identify the asset bytes and detect any subsequent tampering. Key adopters include Adobe, Microsoft, Sony, BBC, and Google.
Why choose it: Best for agent outputs that are media files (images, video, audio, PDFs) where downstream consumers need tamper-evident proof of origin. C2PA is disproportionate overhead for plain-text audit logs; use ed25519-signed JSON records there. For mixed pipelines where agents produce media alongside text, C2PA is the only industry-standard option for the media layer.
Self-build: structured RAG with per-claim source IDs
Retrieve documents with provenance metadata, build a prompt that instructs the model to output JSON with a claims array (each claim: text, sourceDocId, confidenceScore), verify each sourceDocId against the retrieval manifest, compute a SHA-256 hash of the response, and write a signed audit record.
Why choose it: Best when you are not on Claude and cannot use the Citations API, or when you need custom provenance fields (for example tenant ID or classification label) beyond what managed APIs expose. The model performs attribution as part of structured output; the pipeline verifies the cited IDs are real documents from the retrieval step and rejects any claim whose sourceDocId is not in the retrieval manifest. Requires a schema-validation step and an integrity-hash step; both are standard application code.
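The schema-validation and integrity-hash steps might look like the sketch below. It assumes the model returns a JSON object with a claims array using the text, sourceDocId, and confidenceScore fields named above; the validate_response helper and the shape of its result are hypothetical, and the model response is hand-written for the example. Signing of the audit record is left out.

```python
import hashlib
import json

# Expected type of each required claim field.
REQUIRED_FIELDS = {"text": str, "sourceDocId": str, "confidenceScore": (int, float)}


def validate_response(raw_json: str, manifest_ids: set[str]) -> dict:
    """Schema-check each claim and verify its sourceDocId against this request's manifest."""
    payload = json.loads(raw_json)
    claims = payload.get("claims") if isinstance(payload, dict) else None
    if not isinstance(claims, list):
        raise ValueError("response must be an object with a 'claims' array")

    accepted, rejected = [], []
    for claim in claims:
        well_formed = isinstance(claim, dict) and all(
            isinstance(claim.get(name), kind) for name, kind in REQUIRED_FIELDS.items()
        )
        # Membership is tested against the manifest for THIS request, not the whole corpus.
        if well_formed and claim["sourceDocId"] in manifest_ids:
            accepted.append(claim)
        else:
            rejected.append(claim)

    return {
        "accepted": accepted,
        "rejected": rejected,
        "response_sha256": hashlib.sha256(raw_json.encode("utf-8")).hexdigest(),
    }


# Hand-written stand-in for a model response.
raw = json.dumps({"claims": [
    {"text": "Refunds are issued within 14 days.", "sourceDocId": "doc-417", "confidenceScore": 0.92},
    {"text": "Refunds are doubled on holidays.", "sourceDocId": "doc-999", "confidenceScore": 0.40},
    {"text": "No source given."},
]})
result = validate_response(raw, manifest_ids={"doc-417", "doc-88"})
print(len(result["accepted"]), len(result["rejected"]))  # 1 2
```

The hash is taken over the raw response string rather than the parsed object, so the audit record commits to exactly the bytes the model returned.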
Trade-offs
- Metadata propagation through the pipeline adds negligible latency; it is bookkeeping carried alongside existing data structures. C2PA manifest signing adds under 50 ms per artifact. Signing an ed25519 record takes 1 to 10 ms per record.
- Audit-log storage is the dominant ongoing cost. For a high-volume agent producing thousands of outputs per day, tamper-evident per-claim records can reach tens of gigabytes per month at commodity cloud storage pricing.
- Development effort is medium. Wiring provenance metadata through every transformation in a multi-step agent pipeline is not technically complex, but it is easy to miss a stage, and a single step that drops the source metadata breaks the chain for all downstream consumers.
- User-facing friction is low when provenance renders as inline citations. It rises when the provenance manifest dominates the output surface or requires the user to navigate to a separate view to inspect sources.
When NOT to use
- Do not apply per-claim tamper-evident signing to fully generative outputs with no retrieval component. If the agent produces free-form text from model weights alone, with no retrieved document, there is no retrieval provenance to record. The control reduces to logging the model version and generation parameters, which is covered by m-model-registry.
- Do not invest in signed per-claim records for low-stakes, ephemeral outputs such as conversational filler, draft brainstorms, or internal scratch work where audit overhead outweighs forensic value.
- Provenance is the wrong primary control for preventing hallucination: it records which source was cited, not whether the source is correct. Use m-multi-source-verify when factual accuracy, not attribution, is the goal.
Limitations
- Provenance is only as trustworthy as the upstream source. A provenance chain that points to a poisoned corpus produces a confidently attributed but incorrect answer. Pair with m-mem-validation on the ingest side.
- A single transformation in the pipeline that reconstructs output objects without copying source metadata silently breaks the provenance chain for all downstream consumers. There is no runtime warning; the gap surfaces only at audit or incident time.
- The self-build structured-RAG option depends on the model correctly outputting sourceDocId values that match the retrieval manifest. A model that produces a sourceDocId that happens to exist in the corpus but was not retrieved for the current request yields a plausible-looking but incorrect attribution. Schema validation of each sourceDocId against the actual retrieval manifest for that request is the required verification step.
Maturity tier reasoning
- Tier 2 (real-composable) fits because W3C PROV-DM and C2PA are mature, ratified standards; the Anthropic Citations API is production-available across all current Claude models; LangChain Document.metadata propagation is production-stable.
- What keeps this out of Tier 1 is that agentic propagation of provenance through multi-step pipelines is a composed application pattern, not a single off-the-shelf product. Every deployment assembles retrieval provenance, generation-side citation binding, and tamper-evident signing differently.
Last verified against upstream docs: 2026-05-30.