MITIGATION · m-ai-disclosure-ui

AI-source disclosure UI — visible AI labelling at the point of action

When an AI agent generates content or proposes an action, users need to know that the source is an AI before they decide to act. Without that signal, users routinely over-trust agent output. AI-source disclosure addresses this by attaching a visible label to every AI-generated item and by requiring explicit confirmation for consequential actions, restoring the critical gap between receipt and acceptance.

Last reviewed 2026-05-12 · Status: published

At a glance

MATURITY: Tier 2. Available off-the-shelf or as a documented pattern, but newer or less broadly proven. Expect integration work and some operational nuance.
PLACES ON: node. Restricted to node kinds: agent
COVERAGE: 2 threats (T10 · T15)
TRADE-OFFS: latency low · cost low · UX friction medium · dev effort low

TL;DR
  • Mark every piece of AI-generated content with a visible label at the point it is shown or acted on, not in a footer or help page.
  • When the agent proposes a consequential action, the UI names the agent as the proposer and requires explicit confirmation scaled to the action's irreversibility.
  • At the start of any AI-driven interaction, and persistently in the UI chrome, the user must be informed they are interacting with an AI system, not a human. EU AI Act Article 50 paragraph 1 mandates this disclosure unless the AI nature is obvious from context.
  • A label users tune out produces false regulatory compliance without behavioural effect. Plan for placement and contrast testing from launch; banner blindness sets in within weeks for static, low-contrast labels.

How it behaves

Trigger: Agent generates content or proposes an action for the user.
Check: Is the AI-source label visible at the point the user makes a decision, and is confirmation friction scaled to the action's irreversibility?
  • Yes: User sees AI provenance before deciding; consequential actions require scaled confirmation before execution proceeds.
  • No (disclosure gap): User acts without awareness of AI provenance, or accepts a consequential action without deliberate consent.
Note: The label must appear at the decision point, not in a separate audit log or help text. Confirmation friction is the mechanism that re-establishes deliberate consent where mistakes are costly.

What it is

AI-source disclosure is the principle that a user must be able to identify AI-generated content and AI-proposed actions as such, at the moment they decide how to respond to them. Without it, users apply the same level of trust to agent output that they would to a trusted human contact, which is precisely the condition that manipulation-via-agent attacks depend on, and that regulatory frameworks like the EU AI Act Article 50 were written to prevent.

The control operates at three distinct layers, each addressing a different point at which the AI provenance can become invisible.

Content-level labelling attaches a persistent visual marker to every piece of AI-generated content: a badge, sparkle icon, or "AI-generated" tag rendered alongside the output, not in a footer or settings panel. When the content carries a C2PA manifest, the label can be cryptographically verified rather than self-asserted.
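A minimal sketch of content-level labelling, assuming a plain data model rather than any particular UI framework. The names `LabelledContent`, `labelAiContent`, and `renderInline` are hypothetical, and the `hasVerifiedManifest` flag stands in for a C2PA verification step that is out of scope here.

```typescript
// Hypothetical types for illustration; none of these names come from a real library.
type Provenance = "self-asserted" | "c2pa-verified";

interface LabelledContent {
  body: string;
  label: string;      // text rendered alongside the output
  ariaLabel: string;  // text exposed to assistive technology
  provenance: Provenance;
}

// Attach the label to the content itself so it cannot be rendered without it.
// `hasVerifiedManifest` is assumed to come from a separate manifest check.
function labelAiContent(
  body: string,
  agentName: string,
  hasVerifiedManifest: boolean
): LabelledContent {
  const provenance: Provenance = hasVerifiedManifest ? "c2pa-verified" : "self-asserted";
  const label = hasVerifiedManifest ? "AI-generated (verified credential)" : "AI-generated";
  return { body, label, ariaLabel: `${label} by ${agentName}`, provenance };
}

// Plain-text rendering: the label sits directly next to the body,
// not in a footer or settings panel.
function renderInline(item: LabelledContent): string {
  return `[${item.label}] ${item.body}`;
}
```

Keeping the label as a field of the content object, rather than a separate UI concern, makes it harder for a new view to display agent output without the marker.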

Action-level confirmation intercepts consequential agent proposals before they execute. The UI attributes the proposed action to the agent explicitly ("Claude proposes: archive this contract") and requires confirmation scaled to irreversibility. Low-stakes actions need only a single acknowledgment with a visible badge; medium-stakes actions need an explicit checkbox and a mandatory review period; high-stakes or irreversible actions need re-entry of the action target. The friction is not an obstacle; it is the mechanism that re-establishes deliberate consent at the point where mistakes are costly.
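The three friction tiers can be sketched as a small policy function. This is an illustration under stated assumptions, not a prescribed implementation: the type and function names are hypothetical, the 5-second review delay is an arbitrary placeholder, and making the high tier inherit the medium tier's checkbox and delay is an assumption the text does not state.

```typescript
type Stakes = "low" | "medium" | "high";

interface ConfirmationRequirement {
  attribution: string;           // names the agent as the proposer
  requireCheckbox: boolean;
  reviewDelayMs: number;         // mandatory review period before confirm is accepted
  requireTargetReentry: boolean; // user must retype the action target
}

// Assumed placeholder value; calibrate per action class.
const REVIEW_DELAY_MS = 5000;

function confirmationFor(agentName: string, action: string, stakes: Stakes): ConfirmationRequirement {
  const attribution = `${agentName} proposes: ${action}`;
  switch (stakes) {
    case "low":
      return { attribution, requireCheckbox: false, reviewDelayMs: 0, requireTargetReentry: false };
    case "medium":
      return { attribution, requireCheckbox: true, reviewDelayMs: REVIEW_DELAY_MS, requireTargetReentry: false };
    case "high":
      // Assumption: high stakes keeps the medium-tier friction and adds re-entry.
      return { attribution, requireCheckbox: true, reviewDelayMs: REVIEW_DELAY_MS, requireTargetReentry: true };
  }
}

interface UserResponse {
  acknowledged: boolean;
  checkboxTicked: boolean;
  elapsedMs: number;   // time between showing the prompt and the confirm click
  typedTarget: string;
}

// Returns true only when every requirement for the tier has been met.
function isConfirmed(req: ConfirmationRequirement, res: UserResponse, target: string): boolean {
  if (!res.acknowledged) return false;
  if (req.requireCheckbox && !res.checkboxTicked) return false;
  if (res.elapsedMs < req.reviewDelayMs) return false;
  if (req.requireTargetReentry && res.typedTarget.trim() !== target) return false;
  return true;
}
```

Separating the requirement from the check keeps the tier thresholds in one place, which helps when they need retuning.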

Interaction-level disclosure makes the user's conversational partner legible. At session start, and persistently in the UI chrome, the agent declares itself as an AI system. EU AI Act Article 50 paragraph 1 mandates this wherever a natural person interacts with an AI system and it is not otherwise obvious.

A label that users tune out is worse than a weak control: it produces regulatory compliance on paper while delivering no behavioural effect. Published banner-blindness research shows that static, low-contrast labels are ignored within weeks of deployment. Plan for contrast and placement testing from launch, not as a later refinement.

Detection signals

  • Confirmation latency on labelled versus unlabelled actions. A significant difference indicates users are reading and responding to the label; similar latencies suggest the label is being ignored.
  • Override rate on AI-suggested actions. A declining rate over time is a banner-blindness signal: users are increasingly accepting suggestions without review.
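
Both signals can be computed from confirmation telemetry. The sketch below assumes a hypothetical event shape (`ConfirmationEvent`) in which every event is an AI-suggested action, and uses a deliberately crude first-week-versus-last-week comparison in place of a proper trend or significance test.

```typescript
// Hypothetical telemetry record; field names are assumptions for illustration.
interface ConfirmationEvent {
  labelled: boolean;    // was the AI-source label shown?
  latencyMs: number;    // time from prompt to user decision
  overridden: boolean;  // did the user reject or change the suggestion?
  week: number;         // week index since launch
}

function median(values: number[]): number {
  if (values.length === 0) return NaN;
  const sorted = values.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid];
}

// Signal 1: median latency on labelled actions minus unlabelled actions.
// A gap near zero suggests the label is not changing behaviour.
function latencyGapMs(events: ConfirmationEvent[]): number {
  const labelled = events.filter((e) => e.labelled).map((e) => e.latencyMs);
  const unlabelled = events.filter((e) => !e.labelled).map((e) => e.latencyMs);
  return median(labelled) - median(unlabelled);
}

// Signal 2: override rate per week, returned as [week, rate] pairs in week order.
function overrideRateByWeek(events: ConfirmationEvent[]): Array<[number, number]> {
  const buckets: Record<number, { total: number; overridden: number }> = {};
  for (const e of events) {
    const b = buckets[e.week] ?? { total: 0, overridden: 0 };
    b.total += 1;
    if (e.overridden) b.overridden += 1;
    buckets[e.week] = b;
  }
  return Object.keys(buckets)
    .map(Number)
    .sort((a, b) => a - b)
    .map((week): [number, number] => [week, buckets[week].overridden / buckets[week].total]);
}

// Crude banner-blindness check: is the latest weekly rate below the first?
function isDeclining(rates: Array<[number, number]>): boolean {
  if (rates.length < 2) return false;
  return rates[rates.length - 1][1] < rates[0][1];
}
```

In practice the comparison would need enough events per week and a statistical test before a decline is treated as banner blindness.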

Threats it covers

  • T10 Excessive Agency

    WHY IT HELPS OWASP T10 Excessive Agency arises in part from users accepting agent output without the scrutiny they would apply to human-authored content. A persistent, visible AI-source label at the decision point reduces uncritical acceptance: users who can see the AI provenance are more likely to pause before approving a proposed action.

  • T15 Human Manipulation −1 severity step

    WHY IT HELPS T15 Human Manipulation describes attacks in which an agent is used as an instrument to social-engineer the user, gaining compliance precisely because agent-generated content carries implicit trust. Visible AI-source labelling removes that trust premium by restoring the user's awareness that the content originated from an AI system, not from a trusted human contact.

Principle coverage

Defence-in-Depth stage: Prevent. It also advances:

  • Human Oversight (HITL / HOTL): Visible AI-source labelling makes oversight structural rather than implicit: users who can identify the AI provenance of a proposed action are positioned to apply genuine scrutiny before confirming, rather than accepting output they have not recognised as machine-generated.
  • Transparency / Explainability: AI-source disclosure is the user-facing implementation of the transparency principle: it makes the AI origin of content and proposed actions legible at the point of decision, so the basis for what the user is being asked to accept is not opaque.

Design & governance principles (open design, economy of mechanism, accountability, …) are architectural, not advanced by a single placed control.

Implementation options

Five implementation paths: cryptographic provenance (C2PA), watermarking (SynthID), embedded image metadata (IPTC), a regulation-derived disclosure pattern (EU AI Act Article 50), and self-build UI components. For image and document content, C2PA is the default choice; SynthID covers Google-native generative content; self-build badge and confirmation components are the only option for non-image agentic output such as chat, structured data, and tool results.

C2PA Content Credentials

Attach a cryptographically signed C2PA manifest to AI-generated content (images, video, audio, PDF, Office documents) and render the credential as a human-visible badge using the @contentauth/c2pa-web browser SDK.

Why choose it: Best for image and document content where tamper-evident provenance is required: the manifest travels with the file and survives download, re-upload, and republication, provided the tools involved preserve metadata. Azure OpenAI DALL-E and GPT-image-1 automatically attach C2PA Content Credentials to every generated image with no additional setup. The @contentauth/c2pa-web SDK reads and surfaces the manifest in the browser. C2PA v2.0 is a candidate technical basis for the machine-readable marks that EU AI Act Article 50 paragraph 2 requires; the Act itself does not name a specific standard.

Google SynthID

SynthID embeds imperceptible watermarks into AI-generated images, audio, video, and text. The watermark survives typical transformations (cropping, compression, re-encoding) and is detectable by the SynthID Detector portal.

Why choose it: Best when your content pipeline runs on Google's generative stack (Gemini, Imagen, Lyria) and you need a watermark that persists after download rather than a badge that can be stripped. SynthID is embedded by Google's models automatically; there is no developer API to call at generation time. The SynthID Detector verification portal is open for journalism and media use cases. For text, the watermark adjusts token probabilities during generation, leaving no visible UI artefact. SynthID is a machine-readable signal, not a human-visible label; pair with an explicit UI badge for the interaction-level and action-level disclosure layers.

IPTC Photo Metadata 2025.1

IPTC Photo Metadata Standard 2025.1 added four AI-specific fields: AI Prompt Information, AI Prompt Writer Name, AI System Used, and AI System Version Used, embedded in image file metadata (XMP/Exif) and readable by standard photo-management tools.

Why choose it: Best when your pipeline produces images distributed through professional media workflows (news agencies, stock libraries, press photography) where IPTC metadata is already read and displayed by downstream tools. IPTC fields are embedded metadata, not a rendered UI badge: they surface in metadata panels (Adobe Bridge, Lightroom, photo CMS tools), not in a consumer-facing label at the point of viewing. Use as the provenance record for editorial workflows; pair with C2PA for tamper-evidence and a UI badge for reader-facing disclosure.

EU AI Act Article 50 disclosure pattern

A disclosure pattern derived directly from Article 50: paragraph 1 requires informing users they are interacting with an AI; paragraph 4 requires disclosing AI-generated or manipulated content. The disclosure must be clear and distinguishable at first interaction and meet accessibility standards.

Why choose it: Best as the compliance anchor for the interaction-level and action-level layers, where no off-the-shelf library prescribes the UX. Implement as a persistent header badge in chat UIs, an attributed byline on AI-generated text, and a confirmation modal that names the agent for consequential actions. The "clear and distinguishable" requirement rules out small-print, low-contrast, or collapsed disclosures. Obligations apply from 2 August 2026 across the EU.

Self-build AI badge and confirmation component

A purpose-built React (or framework-agnostic) component that renders a persistent AI-source badge alongside agent output and an action-confirmation modal that scales friction with action irreversibility.

Why choose it: The only option for non-image agentic output: structured data, chat messages, tool-result cards, and code suggestions, where C2PA and SynthID have no applicable surface. Dev effort is low for the initial badge and medium for the confirmation-friction calibration across action classes. Plan for A/B testing label placement and contrast against user-confirmation latency data; banner blindness sets in within weeks for static, low-contrast labels. Pair the UI layer with an append-only action log so the disclosure record is not solely in the rendered UI.
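
The append-only action log paired with the UI layer might look like the sketch below. It is an in-memory illustration only: `DisclosureRecord` and `ActionLog` are hypothetical names, and a real deployment would need durable, tamper-resistant storage, which this does not provide.

```typescript
// Hypothetical record of what the user was shown and what they decided.
interface DisclosureRecord {
  readonly seq: number;        // position in the log, assigned on append
  readonly timestamp: string;  // ISO 8601
  readonly agentName: string;
  readonly action: string;
  readonly labelShown: boolean;
  readonly userDecision: "confirmed" | "rejected";
}

class ActionLog {
  private readonly entries: DisclosureRecord[] = [];

  // The only mutation offered is append; there is no update or delete method.
  append(
    entry: Omit<DisclosureRecord, "seq" | "timestamp">,
    now: Date = new Date()
  ): DisclosureRecord {
    const record: DisclosureRecord = Object.freeze({
      ...entry,
      seq: this.entries.length,
      timestamp: now.toISOString(),
    });
    this.entries.push(record);
    return record;
  }

  // Returns a copy so callers cannot reorder or truncate the internal array.
  all(): readonly DisclosureRecord[] {
    return this.entries.slice();
  }
}
```

Recording `labelShown` next to the decision is what lets the log serve as a disclosure record independent of the rendered UI.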

Trade-offs

  • C2PA manifest attachment adds no perceptible latency: signing happens at generation time, not render time. The @contentauth/c2pa-web SDK reads the manifest client-side.
  • SynthID watermarking is automatic in Google's generative stack; apart from the open-source SynthID Text watermarker, there is no developer API to add it to non-Google pipelines.
  • IPTC metadata fields require a metadata-writing step in the image pipeline (exiftool, Adobe Bridge, or a CMS hook): low effort, but not zero.
  • Self-build badge and confirmation components carry an ongoing calibration cost: friction thresholds per action class drift as usage patterns change. Budget one engineer-sprint per quarter for tuning against confirmation-latency telemetry.

When NOT to use

  • Do not apply UI disclosure controls to fully internal agentic pipelines where no human end-user sees the output: machine-to-machine API responses have no UI surface.
  • Do not substitute a machine-readable C2PA manifest for a human-visible badge when the user's workflow never surfaces metadata panels; both layers are required.
  • Do not require interaction-level disclosure for clearly AI-native products where the AI context is obvious at the point of use; EU AI Act Article 50 paragraph 1 has an exemption for this, but "obvious" is fact-specific and should not be self-certified without legal review.

Limitations

  • A label users tune out produces no behavioural effect. Published banner-blindness research shows static, low-contrast labels are ignored within weeks; contrast and placement testing is not optional.
  • C2PA manifests can be stripped by tools that do not preserve metadata on save or re-export; a tamper-evident manifest does not guarantee the label reaches the end viewer.
  • AI-source disclosure restores calibration, not authority. It does not prevent a user who chooses to act on AI advice from doing so. Pair with fail-closed refusal and HITL gates for high-stakes actions.
  • SynthID watermarks in images, audio, and video are detectable only by Google's detector as of mid-2026; the open-source SynthID Text release, which includes a detector, is the exception for text. In either case the watermark is not a human-visible signal without a detector.

Maturity tier reasoning

  • Tier 2 fits because the individual building blocks (C2PA v2.0, @contentauth/c2pa-web, Azure OpenAI automatic Content Credentials, IPTC 2025.1 metadata fields, EU AI Act Article 50 compliance requirements) are all production-available and documented.
  • Not Tier 1, because no standard UI component or interaction pattern exists for the action-level confirmation layer in agentic systems; each deployment composes the friction calibration differently with no industry benchmark.
  • SynthID is embedded in Google's consumer products but, outside the open-source SynthID Text watermarker, has no developer API for third-party integration as of mid-2026; image, audio, and video watermarking in third-party agentic pipelines is not currently possible.

Last verified against upstream docs: 2026-05-30.