EVIDENCE TRAIL

Restricted link / HTML rendering

Verbatim excerpts from the upstream sources cited on the mitigation page, with what each source does and does not prove. The title "restricted link / HTML rendering" is Helmwart's normalised label — the closest upstream phrase appears in OWASP Agentic AI v1.1 §T15, which names "limit the agent's ability to print links" as a mitigation action.

Last cross-checked against upstream sources: 2026-05-29 · 7 sources

References

Each entry shows what the source supports and what it does not prove.

Reference 1

v1.1 · published December 2025

OWASP Agentic AI — Threats & Mitigations v1.1

§T15 Human Manipulation — Mitigation (summary table)

"Monitor agent behavior to ensure it aligns with its defined role and expected actions. Restrict tool access to minimize the attack surface, limit the agent's ability to print links, implement validation mechanisms to detect and filter manipulated responses using guardrails, moderation APIs, or another model"

Supports: Verbatim source for the core control: "limit the agent's ability to print links." This is the closest upstream phrase to Helmwart's link-rendering restriction. Also names guardrails and validation as paired controls.

Does not prove: Does not prescribe a specific renderer implementation, allow-list model, or HTML-sanitisation library. T15 threat description names clicking a phishing link as the harm path, but the mitigation stays at the level of restricting link output, not rendering policy.

open original ↗

Reference 2

v1.1 · published December 2025

OWASP Agentic AI — Threats & Mitigations v1.1

§T15 Human Manipulation — Example 2 (scenario table)

"Through a compromised agent, an attacker instructs the agent to tell the user to click on a malicious link. The user unknowingly click on the link is redirected to a phishing which is used to take over the user's account"

Supports: Confirms that agent-delivered clickable links are the delivery mechanism for account-takeover phishing in the canonical threat scenario. This is the attack model that link-rendering restriction directly defeats.

Does not prove: Attack scenario only; not a mitigation statement. Does not discuss T6 (Broken Goal / Indirect Prompt Injection) separately — Helmwart's MDX notes T6 coverage because IPI is the common upstream injection vector that produces the phishing output.

open original ↗

Reference 3

Version 2026 · published 2025

OWASP Top 10 for Agentic Applications 2026

§ASI09 Human-Agent Trust Exploitation — Mitigation 8 "Human-factors and UI safeguards"

"Human-factors and UI safeguards: Visually differentiate high-risk recommendations using cues such as red borders, banners, or confirmation prompts, and periodically remind users of manipulation patterns and agent limitations."

Supports: Names UI-level presentation controls (banners, confirmation prompts) as a defence against agent-mediated manipulation, the same output-presentation layer that link-rendering restriction operates on.

Does not prove: Mitigation is a UI-warning approach, not a link-stripping mandate. Does not name raw-URL rendering or allow-list link policies.

open original ↗

Reference 4

LLM Top 10 v2 · 2025

OWASP LLM Top 10 2025: LLM05 Improper Output Handling

§LLM05:2025 Improper Output Handling — Prevention and Mitigation Guidelines

"Encode model output back to users to mitigate undesired code execution by JavaScript or Markdown. Implement context-aware output encoding based on where the LLM output will be used (e.g., HTML encoding for web content, SQL escaping for database queries)."

Supports: Establishes the principle of context-aware output encoding for LLM responses before they reach a consumer. HTML encoding for web content is the foundational control that link-rendering restriction specialises for the link-delivery case.

Does not prove: Does not name clickable-link stripping or allow-list rendering as distinct controls. Treats all LLM output generically, not the specific phishing-via-agent-link threat model.

open original ↗

Reference 5

ATLAS catalogue (continuously updated)

MITRE ATLAS AML.M0020 — Generative AI Guardrails

AML.M0020 Generative AI Guardrails — description field (ATLAS.yaml)

"Guardrails are safety controls that are placed between a generative AI model and the output shared with the user to prevent undesired inputs and outputs. Guardrails can take the form of validators such as filters, rule-based logic, or regular expressions, as well as AI-based approaches, such as classifiers and utilizing LLMs, or named entity recognition (NER) to evaluate the safety of the prompt or response."

Supports: Defines the guardrail pattern — output safety controls placed between model and user — that encompasses link-rendering restrictions. "Filters, rule-based logic, or regular expressions" maps directly to the allow-list and strip-by-regex implementation patterns.

Does not prove: Does not define the renderer-specific raw-link approach or name phishing as the target harm. Generic guardrail definition.

open original ↗

Reference 6

ATLAS catalogue (continuously updated)

MITRE ATLAS AML.M0033 — Input and Output Validation for AI Agent Components

AML.M0033 Input and Output Validation for AI Agent Components — description field (ATLAS.yaml)

"Implement validation on inputs and outputs for the tools and data sources used by AI agents. Validation includes enforcing a common data format, schema validation, checks for sensitive or prohibited information leakage, and data sanitization to remove potential injections or unsafe code. Validation should be performed external to the AI agent."

Supports: "Data sanitization to remove potential injections or unsafe code" and "checks for … prohibited information" both apply to link-bearing output. "Validation should be performed external to the AI agent" matches the rendering-layer-as-final-gate architecture Helmwart recommends.

Does not prove: Does not prove that rendering restrictions alone are sufficient — ATLAS M0033 frames this as one component in a broader validation chain, not a standalone control.

open original ↗

Reference 7

Q1 2025 · published 2025

APWG Phishing Activity Trends Report — Q1 2025

APWG Phishing Activity Trends Report Q1 2025 — headline statistics

"1,003,924 phishing attacks … the largest number since late 2023"

Supports: Quantifies phishing volume at over 1 million attacks in a single quarter, justifying the scale argument for this control. Also notes BEC wire-transfer fraud growing 33% quarter-on-quarter — the exact harm scenario named in OWASP T15 Example 1.

Does not prove: Does not discuss agent systems, AI-delivered phishing, or link-rendering implementation. Provides only the volume/persistence justification for why phishing defences matter; the agentic threat model comes from OWASP T15.

open original ↗