MITIGATION · m-link-render-restrict
Link and HTML rendering restriction — an allow-list control on what agent output may render
An agent can include links and rich HTML in its output. When that output is attacker-influenced, a clickable link, embedded image, or rich preview card becomes the delivery mechanism for phishing or data exfiltration via markdown image injection. Rendering restriction removes that delivery vector by allowing clickable content only from an explicit allow-list of trusted domains and reducing everything else to plain text before the output reaches the user.
At a glance
TL;DR
- Before output reaches the user, inspect every link and rich-content element against an allow-list of trusted domains; reduce everything else to plain text.
- Raw-URL rendering removes the clickable-anchor affordance entirely; allow-list rendering preserves it only for explicitly trusted domains.
- Markdown image tags from untrusted sources are stripped, closing the exfiltration path that encodes sensitive context as URL query parameters sent to an attacker-controlled server when the image renders.
- The control operates at the rendering layer, not the model layer; it does not alter the model's output, only how much of it the renderer is permitted to activate.
How it behaves
What it is
An agent's output is a string. Whether that string becomes a clickable anchor, an embedded image, or a rich preview card is decided by the rendering layer, not the model. That distinction matters: an attacker who can influence the agent's output does not need the model to behave maliciously; it is sufficient to get a plausible-looking URL into the string and let the renderer do the rest.
Rendering restriction is a control at that rendering layer. Before the agent's output is displayed, a sanitiser or allow-list filter inspects every link and rich-content element. Links from domains on the allow-list remain clickable. Everything else is reduced to a plain-text URL. Embedded images from untrusted sources are stripped. Rich preview cards are suppressed. The model's output is not changed; what changes is how much of it the renderer is permitted to activate.
The two standard modes are raw-URL rendering, where all links are downgraded to plain text and a click-confirmation interstitial may be offered, and allow-list rendering, where only explicitly trusted domains stay clickable. Raw-URL rendering provides stronger assurance; allow-list rendering preserves more usability for products that regularly cite known-good sources.
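Raw-URL mode can be sketched as a rewrite over the agent's markdown before it reaches the renderer. This is a minimal sketch, not a definitive implementation: the function name `downgradeLinks` and the regular expression are assumptions, it only handles inline `[label](url)` links (not reference-style links or raw HTML anchors), and it assumes the renderer's autolinking of bare URLs is switched off.

```typescript
// Matches inline markdown links, but not images (which start with "!").
const MD_LINK = /(?<!!)\[([^\]]+)\]\(\s*<?([^\s)>]+)>?[^)]*\)/g;

// Raw-URL mode: every link becomes "label (url)" so the destination stays
// visible to the user but nothing is clickable.
function downgradeLinks(markdown: string): string {
  return markdown.replace(
    MD_LINK,
    (_match: string, label: string, url: string) => `${label} (${url})`,
  );
}
```

Allow-list mode would wrap the same rewrite in a host check, leaving matches untouched when the destination is on the trusted list.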
Markdown image exfiltration is a related attack this control also addresses. A prompt injection payload can instruct the agent to emit a markdown image tag pointing to an attacker-controlled server, encoding sensitive context as URL query parameters. The server receives the request when the image renders. Stripping or sandboxing image rendering from untrusted sources removes that exfiltration path.
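The image-stripping step might look like the following sketch. The host list, function names, and placeholder text are hypothetical; only inline `![alt](url)` syntax is handled, so reference-style images and raw `<img>` tags would need separate treatment (for example, an HTML sanitiser).

```typescript
// Hypothetical allow-list; replace with the image hosts your product trusts.
const TRUSTED_IMAGE_HOSTS = new Set(["docs.example.com"]);

// Matches inline markdown images: ![alt](url) or ![alt](url "title").
const MD_IMAGE = /!\[([^\]]*)\]\(\s*<?([^\s)>]+)>?[^)]*\)/g;

function isTrustedImage(rawUrl: string): boolean {
  try {
    const url = new URL(rawUrl);
    return url.protocol === "https:" && TRUSTED_IMAGE_HOSTS.has(url.hostname);
  } catch {
    return false; // relative or malformed URLs are treated as untrusted
  }
}

// Replaces untrusted images with a text placeholder, so the renderer never
// issues a request and the query string never leaves the client.
function stripUntrustedImages(markdown: string): string {
  return markdown.replace(MD_IMAGE, (match: string, alt: string, url: string) => {
    if (isTrustedImage(url)) return match;
    return alt ? `[image removed: ${alt}]` : "[image removed]";
  });
}
```

The placeholder deliberately drops the URL: in the exfiltration case the query string is the stolen data, so there is no reason to show it.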
Detection signals
- Output filter hit rate per agent. A rising rate indicates the agent is being fed attacker-controlled content from an upstream source.
- User-reported missing-link feedback. Sustained volume points to over-blocking from an overly narrow allow-list.
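The first signal can be collected with a small counter in the output filter. This is a minimal in-memory sketch under assumed names (`FilterHitTracker`, `agentsAbove`); a real deployment would more likely emit these counts to an existing metrics backend and compare time windows there to detect a rising rate.

```typescript
interface FilterStats {
  inspected: number; // links or rich elements the filter examined
  blocked: number;   // elements it downgraded or stripped
}

class FilterHitTracker {
  private stats = new Map<string, FilterStats>();

  record(agentId: string, wasBlocked: boolean): void {
    const s = this.stats.get(agentId) ?? { inspected: 0, blocked: 0 };
    s.inspected += 1;
    if (wasBlocked) s.blocked += 1;
    this.stats.set(agentId, s);
  }

  // Fraction of inspected elements that were blocked; 0 for unknown agents.
  hitRate(agentId: string): number {
    const s = this.stats.get(agentId);
    return s && s.inspected > 0 ? s.blocked / s.inspected : 0;
  }

  // Agents over the threshold, ignoring those with too few samples to judge.
  agentsAbove(threshold: number, minInspected = 20): string[] {
    return Array.from(this.stats.entries())
      .filter(([, s]) => s.inspected >= minInspected && s.blocked / s.inspected > threshold)
      .map(([id]) => id);
  }
}
```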
Threats it covers
- WHY IT HELPS: Indirect prompt injection plants attacker-controlled instructions in content the agent retrieves, such as a document, web page, or tool result. One class of payload instructs the agent to include a crafted link in its output to complete the delivery loop. Preventing the agent from rendering that link as a clickable anchor removes the mechanism the attacker depends on for user interaction.
- WHY IT HELPS: AI-assisted phishing requires the agent to present a manipulated link or preview in a form the user is likely to trust and click. Restricting rendering to an explicit allow-list of trusted domains means an attacker-supplied link appears only as raw text, removing the affordance a phishing attack requires to succeed.
Principle coverage
Defence-in-Depth stage: Prevent. It advances:
- Attack Surface Minimization. Rendering restriction removes a whole class of attack surface from the output layer: every link or image element that could serve as a delivery vector is either eliminated or confined to an explicit allow-list, shrinking the reachable surface without altering the agent's functional output.
Design & governance principles (open design, economy of mechanism, accountability, …) are architectural, not advanced by a single placed control.
Implementation options
Implementation options vary by stack. Choose the one that sits closest to the final rendering step.
DOMPurify. Client-side DOM sanitiser that removes dangerous HTML, including unsafe href protocols and event handlers, before the browser renders it. Configurable via ALLOWED_TAGS and ALLOWED_ATTR allowlists.
Why choose it: Mature (v3.4.7 stable), zero dependencies, runs in-browser or in Node via jsdom. Explicitly designed for untrusted HTML from LLM or user input. Removing the <a> tag entirely or stripping href attributes collapses links to plain text in one call.
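A link-free configuration could look like the sketch below. The constant name and the tag list are illustrative assumptions; `ALLOWED_TAGS`, `ALLOWED_ATTR`, and `ALLOW_DATA_ATTR` are DOMPurify options. Note that removing `<a>` keeps the link's visible label, not its URL, so a label such as "click here" survives without its destination.

```typescript
// Options intended for DOMPurify.sanitize(agentHtml, LINK_FREE_CONFIG).
// The tag list is an assumption; tune it to what your product must display.
const LINK_FREE_CONFIG = {
  // No "a", "img", or "iframe". With DOMPurify's default KEEP_CONTENT
  // behaviour, a removed <a> leaves its inner text in place.
  ALLOWED_TAGS: ["p", "br", "em", "strong", "code", "pre", "ul", "ol", "li", "blockquote"],
  // Empty attribute allow-list, so href, src, and style are dropped.
  ALLOWED_ATTR: [] as string[],
  ALLOW_DATA_ATTR: false,
};

// Usage (browser, or Node with a jsdom window):
//   import DOMPurify from "dompurify";
//   const safeHtml = DOMPurify.sanitize(agentHtml, LINK_FREE_CONFIG);
```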
rehype-sanitize. Unified/rehype plugin that scrubs HTML ASTs in remark/rehype markdown pipelines. Strips javascript: hrefs, event handlers, and any element not on the schema allowlist.
Why choose it: Drop-in for Astro, Next.js, and any remark/rehype pipeline. Applies sanitisation at the AST level before HTML serialisation, so there is no post-hoc string munging. Current release: 6.0.0 (MIT).
js-xss with link filtering. Node/browser XSS sanitiser that accepts a whitelist of allowed tags and attributes. Configuring it to exclude <a> tags or scrub href attributes turns links into plain text with a single options object.
Why choose it: Useful when DOMPurify is not available (for example, server-side rendering without jsdom). The onTagAttr hook lets you inspect and rewrite individual href values, stripping any href that does not match a trusted-domain regular expression rather than removing anchors entirely.
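The href-rewriting approach might be wired up as follows. This is a sketch under assumptions: the trusted-domain pattern and the tag whitelist are hypothetical, and the hook is shaped after js-xss's documented `onTagAttr(tag, name, value, isWhiteAttr)` callback, where a returned string replaces the attribute and returning nothing defers to the library's default handling.

```typescript
// Hypothetical trusted-domain pattern: example.com and its subdomains, https only.
// The lookahead requires the host to end right after "example.com".
const TRUSTED_HREF = /^https:\/\/([a-z0-9-]+\.)*example\.com(?=[\/?#]|$)/i;

// Drops href on anchors that point outside the trusted domain.
function onTagAttr(tag: string, name: string, value: string): string | undefined {
  if (tag === "a" && name === "href" && !TRUSTED_HREF.test(value.trim())) {
    return ""; // emit nothing for this attribute, leaving a non-clickable <a>
  }
  return undefined; // trusted href or other attribute: use default handling
}

// Illustrative options object; the tag list is an assumption.
const xssOptions = {
  whiteList: { a: ["href"], p: [], em: [], strong: [], code: [] },
  onTagAttr,
};

// Usage, with FilterXSS imported from the "xss" package:
//   const safeHtml = new FilterXSS(xssOptions).process(agentHtml);
```

A regular expression is used here because the text above describes one; parsing the value with `URL` and comparing hostnames is usually easier to audit.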
Trusted-domain allow-list. A thin function in the rendering layer that accepts a URL string and returns either the original URL (domain is on the allow-list) or a plain-text representation (domain is not). No third-party dependency required.
Why choose it: Zero new dependencies. The logic is under full product control, easy to audit and update when the allow-list changes. Suitable for teams with a small, stable set of trusted domains and an aversion to transitive dependencies.
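Such a function could be as small as the sketch below. The domain list, type, and function names are hypothetical. It assumes https-only links and treats subdomains of a listed domain as trusted, which may be too permissive for domains that host user content.

```typescript
type RenderDecision =
  | { kind: "link"; href: string }  // render as a clickable anchor
  | { kind: "text"; text: string }; // render as plain text (caller must still HTML-escape)

// Hypothetical allow-list of registrable domains.
const ALLOWED_DOMAINS = ["example.com", "docs.example.org"];

function isAllowedHost(hostname: string): boolean {
  const host = hostname.toLowerCase();
  // Exact match or a true subdomain; "example.com.evil.test" does not match.
  return ALLOWED_DOMAINS.some((d) => host === d || host.endsWith(`.${d}`));
}

function decideRendering(rawUrl: string): RenderDecision {
  try {
    const url = new URL(rawUrl);
    const hasCredentials = url.username !== "" || url.password !== "";
    if (url.protocol === "https:" && !hasCredentials && isAllowedHost(url.hostname)) {
      return { kind: "link", href: url.href };
    }
  } catch {
    // Not an absolute URL: fall through and render as text.
  }
  return { kind: "text", text: rawUrl };
}
```

Parsing with `URL` rather than matching on the raw string means the decision is made on the host the browser would actually contact.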
Slack unfurl API flags. Slack's chat.postMessage API exposes unfurl_links and unfurl_media boolean flags. Setting both to false prevents Slack from generating rich link previews for any URL in an agent-posted message.
Why choose it: Platform-native control: no client-side sanitiser is needed when delivering agent output through Slack. Slack's own documentation explicitly recommends disabling unfurls when posting LLM-generated messages to reduce prompt injection surface.
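A minimal sketch of posting with both flags disabled, assuming Node 18+ for the global `fetch`. The helper names are hypothetical; `channel`, `text`, `unfurl_links`, and `unfurl_media` are chat.postMessage arguments.

```typescript
interface AgentSlackMessage {
  channel: string;
  text: string;
  unfurl_links: boolean;
  unfurl_media: boolean;
}

// Builds a chat.postMessage payload with previews disabled for agent output.
function buildAgentMessage(channel: string, text: string): AgentSlackMessage {
  return { channel, text, unfurl_links: false, unfurl_media: false };
}

// Sends the payload; the token is assumed to be a bot token with chat:write scope.
async function postAgentMessage(token: string, channel: string, text: string): Promise<unknown> {
  const response = await fetch("https://slack.com/api/chat.postMessage", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json; charset=utf-8",
    },
    body: JSON.stringify(buildAgentMessage(channel, text)),
  });
  return response.json();
}
```

Routing every agent-originated post through one builder keeps the flags from being forgotten on individual call sites.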
Trade-offs
- Users expect clickable links; raw-URL rendering creates copy-paste friction, especially on mobile.
- Allow-list maintenance is an ongoing operational task. A stale list over-blocks legitimate domains that were never added, or silently under-blocks a listed domain that has since passed into attacker control.
When NOT to use
- Agents whose core function is link delivery (URL shorteners, bookmark managers, developer tools producing code with URLs): stripping links breaks the product value proposition.
- Adversarial content that does not rely on links at all (plain-text social engineering, phone-number extraction): use output moderation instead.
Limitations
- Does not address social-engineering instructions that require no link (for example, instructions to call a number or forward text to a colleague).
- A sufficiently motivated attacker can encode URLs as QR codes or instruct the user to copy text manually. Rendering restriction addresses the clickable-link and markdown-image vectors only.
Maturity tier reasoning
- DOMPurify and rehype-sanitize are production-grade, widely deployed primitives. Each is Tier 1 maturity in isolation.
- Applying them as an agentic output gate is an operational composition: the pattern is established but the agentic wiring is team-built, which places this control at Tier 2. Adjacent controls such as output moderation and output egress DLP address vectors this control does not reach.
Last verified against upstream docs: 2026-05-30.