MITIGATION · m-ai-disclosure-ui

AI-source disclosure UI — visible AI labelling at the point of action

When an AI agent generates content or proposes an action, users need to know that the source is an AI before they decide to act. Without that signal, users routinely over-trust agent output. AI-source disclosure addresses this by attaching a visible label to every AI-generated item and by requiring explicit confirmation for consequential actions, restoring the pause for scrutiny between receiving agent output and accepting it.

Last reviewed 2026-05-12 · Status: published

At a glance

MATURITY

Tier 2

Available off-the-shelf or as a documented pattern, but newer or less broadly proven. Expect integration work and some operational nuance.

PLACES ON

node

Restricted to node kinds: agent

COVERAGE

2 threats

T10 · T15

TRADE-OFFS

Latency: low
Cost: low
UX friction: medium
Dev effort: low

TL;DR

Mark every piece of AI-generated content with a visible label at the point it is shown or acted on, not in a footer or help page.
When the agent proposes a consequential action, the UI names the agent as the proposer and requires explicit confirmation scaled to the action's irreversibility.
At the start of any AI-driven interaction, and persistently in the UI chrome, the user must be informed they are interacting with an AI system, not a human. EU AI Act Article 50 paragraph 1 mandates this.
A label users tune out produces false regulatory compliance without behavioural effect. Plan for placement and contrast testing from launch; banner blindness sets in within weeks for static, low-contrast labels.

How it behaves

Trigger: the agent generates content or proposes an action for the user.

Check: is the AI-source label visible at the point the user makes a decision, and is confirmation friction scaled to the action's irreversibility?

If yes: the user sees AI provenance before deciding; consequential actions require scaled confirmation before execution proceeds.

If no: disclosure gap. The user acts without awareness of AI provenance, or accepts a consequential action without deliberate consent.

The label must appear at the decision point, not in a separate audit log or help text. Confirmation friction is the mechanism that re-establishes deliberate consent where mistakes are costly.

What it is

AI-source disclosure is the principle that a user must be able to identify AI-generated content and AI-proposed actions as such, at the moment they decide how to respond to them. Without it, users apply the same level of trust to agent output that they would to a trusted human contact, which is precisely the condition that manipulation-via-agent attacks depend on, and that regulatory frameworks like the EU AI Act Article 50 were written to prevent.

The control operates at three distinct layers, each addressing a different point at which the AI provenance can become invisible.

Content-level labelling attaches a persistent visual marker to every piece of AI-generated content: a badge, sparkle icon, or "AI-generated" tag rendered alongside the output, not in a footer or settings panel. When the content carries a C2PA manifest, the label can be cryptographically verified rather than self-asserted.

Action-level confirmation intercepts consequential agent proposals before they execute. The UI attributes the proposed action to the agent explicitly ("Claude proposes: archive this contract") and requires confirmation scaled to irreversibility: a single acknowledgment alongside a visible badge for low-stakes actions, an explicit checkbox plus a mandatory review period for medium-stakes actions, and re-entry of the action target for high-stakes or irreversible actions. The friction is not an obstacle; it is the mechanism that re-establishes deliberate consent at the point where mistakes are costly.
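
The three friction tiers described above can be sketched as a pure function that a confirmation modal would consume. This is a minimal illustration, not a prescribed API: every type and function name (`ProposedAction`, `confirmationFor`, `mayExecute`) and the 5-second review delay are assumptions made for the example, and whether higher tiers should also inherit the lower tiers' requirements is a design decision the text leaves open.

```typescript
// Hypothetical types and names: the source defines the three tiers, not an API.
type Stakes = "low" | "medium" | "high";

interface ProposedAction {
  agentName: string;   // rendered as the proposer
  description: string; // what the agent wants to do
  target: string;      // the object acted on; re-typed by the user at high stakes
  stakes: Stakes;
}

interface ConfirmationSpec {
  headline: string;            // always attributes the proposal to the agent
  showBadge: boolean;          // the AI-source badge is shown at every tier
  requireCheckbox: boolean;
  reviewDelayMs: number;       // minimum time on screen before confirm is accepted
  retypeTarget: string | null; // text the user must re-enter, if any
}

interface UserInput {
  acknowledged: boolean;
  checkboxTicked: boolean;
  elapsedMs: number;
  typedTarget: string;
}

const REVIEW_DELAY_MS = 5000; // assumed value; calibrate per action class

// Map an action's stakes to the confirmation friction the UI must enforce.
function confirmationFor(action: ProposedAction): ConfirmationSpec {
  const spec: ConfirmationSpec = {
    headline: action.agentName + " proposes: " + action.description,
    showBadge: true,
    requireCheckbox: false,
    reviewDelayMs: 0,
    retypeTarget: null,
  };
  if (action.stakes === "medium") {
    spec.requireCheckbox = true;
    spec.reviewDelayMs = REVIEW_DELAY_MS;
  } else if (action.stakes === "high") {
    spec.retypeTarget = action.target;
  }
  return spec;
}

// Gate execution: every requirement in the spec must be satisfied.
function mayExecute(spec: ConfirmationSpec, input: UserInput): boolean {
  if (!input.acknowledged) return false;
  if (spec.requireCheckbox && !input.checkboxTicked) return false;
  if (input.elapsedMs < spec.reviewDelayMs) return false;
  if (spec.retypeTarget !== null && input.typedTarget.trim() !== spec.retypeTarget) return false;
  return true;
}
```

Keeping the tier logic separate from rendering means the same check can be enforced server-side, so the friction cannot be bypassed by a modified client.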

Interaction-level disclosure makes the user's conversational partner legible. At session start, and persistently in the UI chrome, the agent declares itself as an AI system. EU AI Act Article 50 paragraph 1 mandates this wherever a natural person interacts with an AI system and it is not otherwise obvious.

A label that users tune out is worse than a weak control: it produces regulatory compliance on paper while delivering no behavioural effect. Published banner-blindness research shows that static, low-contrast labels are ignored within weeks of deployment. Plan for contrast and placement testing from launch, not as a later refinement.
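
One part of contrast testing can be automated before any user study. The sketch below computes the WCAG 2.x contrast ratio between a label's text and background colours and checks it against the AA thresholds (4.5:1 for normal text, 3:1 for large text). Using WCAG as the yardstick is an assumption of this example, since the text above does not name a standard, and the function names are illustrative. A passing ratio is only a floor: it says nothing about whether users still notice the label after weeks of exposure.

```typescript
// WCAG 2.x relative luminance of an sRGB colour written as "#rrggbb" or "rrggbb".
function relativeLuminance(hex: string): number {
  const digits = hex.charAt(0) === "#" ? hex.slice(1) : hex;
  if (!/^[0-9a-fA-F]{6}$/.test(digits)) {
    throw new Error("expected a colour in #rrggbb form, got " + hex);
  }
  const n = parseInt(digits, 16);
  // Convert an 8-bit sRGB channel to its linear-light value.
  const linear = (c8: number): number => {
    const c = c8 / 255;
    return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
  };
  const r = linear((n >> 16) & 0xff);
  const g = linear((n >> 8) & 0xff);
  const b = linear(n & 0xff);
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

// Contrast ratio in the range 1 (identical) to 21 (black on white).
function contrastRatio(foreground: string, background: string): number {
  const a = relativeLuminance(foreground);
  const b = relativeLuminance(background);
  const lighter = Math.max(a, b);
  const darker = Math.min(a, b);
  return (lighter + 0.05) / (darker + 0.05);
}

// WCAG AA: 4.5:1 for normal text, 3:1 for large text.
function meetsAA(foreground: string, background: string, largeText: boolean = false): boolean {
  return contrastRatio(foreground, background) >= (largeText ? 3 : 4.5);
}
```

A check like this fits naturally in a design-token lint step, so a low-contrast badge colour fails the build rather than reaching users.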

Detection signals

Confirmation latency on labelled versus unlabelled actions. A consistently longer latency on labelled actions suggests users are reading and responding to the label; similar latencies suggest the label is being ignored.
Override rate on AI-suggested actions. A declining rate over time is a banner-blindness signal: users are increasingly accepting suggestions without review.
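
Both signals above can be derived from the same confirmation telemetry. The following is a rough sketch under stated assumptions: the `ConfirmationEvent` shape, the function names, and the idea of comparing the first and last reporting period are all illustrative, and a production analysis would want a proper statistical test rather than a raw difference.

```typescript
// Hypothetical telemetry record for one user decision on an AI-suggested action.
interface ConfirmationEvent {
  labelled: boolean;   // was the AI-source label rendered at the decision point?
  latencyMs: number;   // time from prompt shown to user decision
  overridden: boolean; // did the user reject or change the AI suggestion?
}

// Median of a list of numbers; NaN for an empty list.
function median(values: number[]): number {
  const sorted = values.slice().sort((a, b) => a - b);
  const lower = sorted[Math.floor((sorted.length - 1) / 2)] ?? NaN;
  const upper = sorted[Math.floor(sorted.length / 2)] ?? NaN;
  return (lower + upper) / 2;
}

// Signal 1: median latency on labelled actions minus median on unlabelled ones.
// A gap near zero suggests the label is not changing behaviour.
function latencyGapMs(events: ConfirmationEvent[]): number {
  const labelled = events.filter((e) => e.labelled).map((e) => e.latencyMs);
  const unlabelled = events.filter((e) => !e.labelled).map((e) => e.latencyMs);
  return median(labelled) - median(unlabelled);
}

// Fraction of decisions in which the user overrode the suggestion.
function overrideRate(events: ConfirmationEvent[]): number {
  if (events.length === 0) return NaN;
  return events.filter((e) => e.overridden).length / events.length;
}

// Signal 2: true when the override rate fell by at least minDrop between the
// first and last period (periods are ordered oldest to newest).
function overrideRateDeclining(periods: ConfirmationEvent[][], minDrop: number): boolean {
  if (periods.length < 2) return false;
  const first = overrideRate(periods[0] ?? []);
  const last = overrideRate(periods[periods.length - 1] ?? []);
  return first - last >= minDrop;
}
```

The median is used rather than the mean because decision latencies are typically skewed by a few very slow responses.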

Threats it covers

T10 Overwhelming Human-in-the-Loop (HITL) · −1 severity step

WHY IT HELPS: T10 Overwhelming Human-in-the-Loop arises in part from users approving agent output without the scrutiny they would apply to human-authored content. A persistent, visible AI-source label at the decision point reduces uncritical acceptance: users who can see the AI provenance are more likely to pause before approving a proposed action.

T15 Human Manipulation · −1 severity step

WHY IT HELPS: T15 Human Manipulation describes attacks in which an agent is used as an instrument to social-engineer the user, gaining compliance precisely because agent-generated content carries implicit trust. Visible AI-source labelling removes that trust premium by restoring the user's awareness that the content originated from an AI system, not from a trusted human contact.

Principle coverage

Defence-in-Depth stage: Prevent. It advances:

Human Oversight (HITL / HOTL): Visible AI-source labelling makes oversight structural rather than implicit: users who can identify the AI provenance of a proposed action are positioned to apply genuine scrutiny before confirming, rather than accepting output they have not recognised as machine-generated.
Transparency / Explainability: AI-source disclosure is the user-facing implementation of the transparency principle: it makes the AI origin of content and proposed actions legible at the point of decision, so the basis for what the user is being asked to accept is not opaque.

Design & governance principles (open design, economy of mechanism, accountability, …) are architectural, not advanced by a single placed control.

Implementation options

Five implementation paths covering machine-readable provenance, watermarking, embedded image metadata, a regulatory disclosure pattern, and self-build UI components. For image and document content, C2PA is the default choice; SynthID covers Google-native generative content; self-build badge and confirmation components are the only option for non-image agentic output such as chat, structured data, and tool results.

C2PA Content Credentials: Attach a cryptographically signed C2PA manifest to AI-generated content (images, video, audio, PDF, Office documents) and render the credential as a human-visible badge using the @contentauth/c2pa-web browser SDK.

Why choose it: Best for image and document content where tamper-evident provenance is required: the manifest travels with the file and survives download, re-upload, and republication, provided the tools in between preserve metadata. Azure OpenAI DALL-E and GPT-image-1 automatically attach C2PA Content Credentials to every generated image with no additional setup. The @contentauth/c2pa-web SDK reads and surfaces the manifest in the browser. C2PA v2.0 is a candidate technical foundation for the machine-readable marks required by EU AI Act Article 50 paragraph 2; the Act itself does not name a specific standard.

Google SynthID: SynthID embeds imperceptible watermarks into AI-generated images, audio, video, and text. The watermark survives typical transformations (cropping, compression, re-encoding) and is detectable by the SynthID Detector portal.

Why choose it: Best when your content pipeline runs on Google's generative stack (Gemini, Imagen, Lyria) and you need a watermark that persists after download rather than a badge that can be stripped. SynthID is embedded by Google's models automatically; there is no developer API to call at generation time. The SynthID Detector verification portal is open for journalism and media use cases. For text, the watermark adjusts token probabilities during generation, leaving no visible UI artefact. SynthID is a machine-readable signal, not a human-visible label; pair with an explicit UI badge for the interaction-level and action-level disclosure layers.

More details:

SynthID overview, Google DeepMind ↗

IPTC Photo Metadata 2025.1: IPTC Photo Metadata Standard 2025.1 added four AI-specific fields: AI Prompt Information, AI Prompt Writer Name, AI System Used, and AI System Version Used, embedded in image file metadata (XMP/Exif) and readable by standard photo-management tools.

Why choose it: Best when your pipeline produces images distributed through professional media workflows (news agencies, stock libraries, press photography) where IPTC metadata is already read and displayed by downstream tools. IPTC fields are embedded metadata, not a rendered UI badge: they surface in metadata panels (Adobe Bridge, Lightroom, photo CMS tools), not in a consumer-facing label at the point of viewing. Use as the provenance record for editorial workflows; pair with C2PA for tamper-evidence and a UI badge for reader-facing disclosure.

More details:

IPTC Photo Metadata Standard ↗

EU AI Act Article 50 disclosure pattern: A disclosure pattern derived directly from Article 50. Paragraph 1 requires informing users that they are interacting with an AI system; paragraph 4 requires deployers to disclose deepfakes and AI-generated or manipulated text published to inform the public on matters of public interest. The disclosure must be clear and distinguishable, provided at the latest at the first interaction or exposure, and meet applicable accessibility requirements.

Why choose it: Best as the compliance anchor for the interaction-level and action-level layers, where no off-the-shelf library prescribes the UX. Implement as a persistent header badge in chat UIs, an attributed byline on AI-generated text, and a confirmation modal that names the agent for consequential actions. The "clear and distinguishable" requirement rules out small-print, low-contrast, or collapsed disclosures. Obligations apply from 2 August 2026 across the EU.

More details:

EU AI Act Article 50, Transparency obligations ↗

Self-build AI badge and confirmation component: A purpose-built React (or framework-agnostic) component that renders a persistent AI-source badge alongside agent output and an action-confirmation modal that scales friction with action irreversibility.

Why choose it: The only option for non-image agentic output: structured data, chat messages, tool-result cards, and code suggestions, where C2PA and SynthID have no applicable surface. Dev effort is low for the initial badge and medium for the confirmation-friction calibration across action classes. Plan for A/B testing label placement and contrast against user-confirmation latency data; banner blindness sets in within weeks for static, low-contrast labels. Pair the UI layer with an append-only action log so the disclosure record is not solely in the rendered UI.
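
The append-only action log mentioned above might look like the following in its simplest form. This is an in-memory sketch only: the `DisclosureRecord` fields and the `ActionLog` class are hypothetical, and a real deployment would need durable, tamper-evident storage behind the same interface.

```typescript
// Hypothetical record of one disclosed, AI-proposed action and the user's decision.
interface DisclosureRecord {
  readonly seq: number;        // 1-based position in the log
  readonly at: string;         // ISO 8601 timestamp
  readonly agentName: string;
  readonly action: string;
  readonly labelShown: boolean; // was the AI-source label rendered when the user decided?
  readonly userDecision: "confirmed" | "rejected";
}

type NewRecord = Omit<DisclosureRecord, "seq" | "at">;

// Minimal append-only log: records can be added and read, never edited or removed.
class ActionLog {
  private readonly records: DisclosureRecord[] = [];

  append(entry: NewRecord, now: Date = new Date()): DisclosureRecord {
    const record: DisclosureRecord = Object.freeze({
      seq: this.records.length + 1,
      at: now.toISOString(),
      agentName: entry.agentName,
      action: entry.action,
      labelShown: entry.labelShown,
      userDecision: entry.userDecision,
    });
    this.records.push(record);
    return record;
  }

  // Returns a copy so callers cannot mutate the underlying log.
  entries(): readonly DisclosureRecord[] {
    return this.records.slice();
  }
}
```

Recording `labelShown` alongside the decision lets an auditor confirm after the fact that disclosure actually happened at the decision point, independent of what the UI currently renders.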

Trade-offs

C2PA manifest attachment adds no perceptible latency: signing happens at generation time, not render time. The @contentauth/c2pa-web SDK reads the manifest client-side.
SynthID watermarking is automatic in Google's generative stack; there is no developer API to add it to non-Google pipelines.
IPTC metadata fields require a metadata-writing step in the image pipeline (exiftool, Adobe Bridge, or a CMS hook): low effort, but not zero.
Self-build badge and confirmation components carry an ongoing calibration cost: friction thresholds per action class drift as usage patterns change. Budget one engineer-sprint per quarter for tuning against confirmation-latency telemetry.

When NOT to use

Do not apply UI disclosure controls to fully internal agentic pipelines where no human end-user sees the output: machine-to-machine API responses have no UI surface.
Do not substitute a machine-readable C2PA manifest for a human-visible badge when the user's workflow never surfaces metadata panels; both layers are required.
Do not require interaction-level disclosure for clearly AI-native products where the AI context is obvious at the point of use; EU AI Act Article 50 paragraph 1 has an exemption for this, but "obvious" is fact-specific and should not be self-certified without legal review.

Limitations

A label users tune out produces no behavioural effect. Published banner-blindness research shows static, low-contrast labels are ignored within weeks; contrast and placement testing is not optional.
C2PA manifests can be stripped by tools that do not preserve metadata on save or re-export; a tamper-evident manifest does not guarantee the label reaches the end viewer.
AI-source disclosure restores calibration, not authority. It does not prevent a user who chooses to act on AI advice from doing so. Pair with fail-closed refusal and HITL gates for high-stakes actions.
SynthID watermarks are detectable only by Google's detector as of mid-2026; third-party detection tooling does not exist, and the watermark is not a human-visible signal without the detector.

Maturity tier reasoning

Tier 2 fits because the individual building blocks (C2PA v2.0, @contentauth/c2pa-web, Azure OpenAI automatic Content Credentials, IPTC 2025.1 metadata fields, EU AI Act Article 50 compliance requirements) are all production-available and documented.
Not Tier 1, because no standard UI component or interaction pattern exists for the action-level confirmation layer in agentic systems; each deployment composes the friction calibration differently with no industry benchmark.
SynthID is embedded in Google's consumer products but has no developer API for third-party integration as of mid-2026; its use in third-party agentic pipelines is not currently possible.

Last verified against upstream docs: 2026-05-30.

PLACEMENT

On the canvas, this control can be placed on:

node

Valid node kinds: agent

MAESTRO LAYERS

L6 L7

ATLAS TECHNIQUES

AML.T0067 LLM Trusted Output Components Manipulation
Adversary manipulates the structured parts of an LLM response (citations, tool-call arguments, approved-action markup) that downstream systems treat as trusted.
AML.T0080 AI Agent Context Poisoning
Adversary contaminates an agent's context store (short-term scratchpad, vector memory, conversation history) so future reasoning is biased toward attacker goals.

ATLAS MITIGATIONS

AML.M0021 Generative AI Guidelines
Policy-level safety controls baked into prompts and system instructions that constrain what the model is permitted to do.
AML.M0034 Deepfake Detection
Apply deepfake detection to untrusted or user-provided media to identify synthetic content used for impersonation or fraud.

TRADE-OFFS

latency low
cost low
ux friction medium
dev effort low

PLAYBOOKS

OWASP v1.1 playbook that recommends this control:

P1 Preventing AI Agent Reasoning Manipulation