Definition
Overwhelming Human-in-the-Loop (HITL) is the failure mode where attackers exploit human oversight dependencies in agentic systems by flooding reviewers with intervention requests until decision fatigue and cognitive overload set in. The HITL gate technically exists, but human capacity cannot keep up with multi-agent operation rates. The result is rushed approvals, reduced scrutiny, and systemic decision failures.
What it looks like in practice
OWASP Agentic AI v1.1 names three scenarios:
Human Intervention Interface (HII) Manipulation. A procurement agent surfaces fifty tool-call approvals per shift to a single security reviewer. The attacker, who has gained write access to the intervention interface, modifies the summary text shown alongside each request, suppressing the destination URL of an outbound API call and replacing it with a benign-looking label. The reviewer sees “export analytics snapshot” rather than “POST customer PII to external endpoint”. Because the interface reformats the raw action into prose before presenting it, the manipulation is invisible to the reviewer until the exfiltration is logged downstream, by which point the data has already left the perimeter.
Cognitive Overload and Decision Bypass. A multi-agent customer-onboarding pipeline generates compliance review items at roughly 120 per hour during peak load. The human reviewer’s sustainable rate is around 30 per hour of careful evaluation. To avoid queue growth, the reviewer shifts to pattern-matching: large dollar amounts get scrutiny; everything else is approved in under five seconds. An attacker who knows the heuristic fragments a large suspicious transfer into nine sub-threshold transactions, each of which sails through under the cognitive shortcut. The audit log truthfully records “human approved” for every step.
Trust Mechanism Subversion. Over several weeks, an attacker with access to one sub-agent periodically introduces small inconsistencies in that agent’s outputs: a slightly off confidence score here, a marginally incorrect entity label there. The HITL reviewer notices the pattern and begins to discount that agent’s flags, treating them as noisy. Once distrust is established, the attacker escalates: the rogue sub-agent now confidently flags legitimate alerts as false positives. The reviewer, already calibrated to distrust the source, dismisses them. The real threats go unreviewed.
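The fragmentation trick in the Cognitive Overload scenario works because each transaction is judged in isolation. A minimal sketch of an aggregate check follows, assuming a hypothetical `Transfer` record and a per-account sliding window; the field names, threshold, and window length are illustrative assumptions, not part of the OWASP text.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    account: str       # hypothetical originating account id
    amount: float
    timestamp: float   # seconds since epoch


def flag_fragmented_transfers(transfers, threshold, window_seconds):
    """Return accounts whose sub-threshold transfers sum to at least
    `threshold` within any sliding window of `window_seconds`."""
    by_account = defaultdict(list)
    for t in transfers:
        if t.amount < threshold:  # only items that dodge the per-item rule
            by_account[t.account].append(t)

    flagged = set()
    for account, items in by_account.items():
        items.sort(key=lambda t: t.timestamp)
        start = 0
        running = 0.0
        for item in items:
            running += item.amount
            # Drop transfers that have fallen out of the window.
            while item.timestamp - items[start].timestamp > window_seconds:
                running -= items[start].amount
                start += 1
            if running >= threshold:
                flagged.add(account)
                break
    return flagged
```

A check like this runs before items reach the reviewer, so the aggregate is surfaced as one consequential item instead of nine forgettable ones.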
Why it’s dangerous
Agentic systems drive HITL throughput far beyond the rate at which a human can sustain attention. Multi-agent fan-out makes the problem worse: one prompt can produce dozens of items requiring review. Once approval-by-default takes hold, the gate is worse than nothing: the audit trail then records human consent for decisions the human never actually evaluated.
The ACM Europe TPC’s policy brief on agentic-AI governance (see Governance primer) sharpens this: Article 14 of the EU AI Act mandates human oversight, but the brief argues this generic requirement is insufficient for agents and proposes alignment oversight: verifying that the agent’s actual behaviour tracks its declared objectives. Generic oversight fails here. When reviewers rubber-stamp because they are saturated, alignment drift is undetectable, and a click-through gate without legibility of the agent’s behaviour does not constitute mitigation.
Where it manifests
Check the sustainable per-reviewer decision rate and whether the queue is risk-prioritised so reviewers see the most consequential items first. Check how decisions are summarised for reviewers and whether patterns of approval-without-rejection are monitored. Also check whether the HITL interface itself can be manipulated to obscure critical context from the reviewer.
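One way to test whether the interface can hide context is to compare what the reviewer is shown against the raw action. The sketch below is a deliberately strict version, assuming the raw action is a flat dictionary and that critical values should appear verbatim in the rendered summary; the function name and the field names in the usage example are hypothetical.

```python
from typing import Dict, List


def missing_payload_fields(payload: Dict[str, object], rendered_summary: str) -> List[str]:
    """Return the payload keys whose values do not appear verbatim
    in the text shown to the reviewer."""
    missing = []
    for key, value in payload.items():
        if str(value) not in rendered_summary:
            missing.append(key)
    return sorted(missing)


# Usage: the manipulated summary from the HII scenario hides both fields.
payload = {"method": "POST", "url": "https://external.example/upload"}
hidden = missing_payload_fields(payload, "export analytics snapshot")
# hidden == ["method", "url"]
```

Verbatim matching will be noisy for prose summaries, so in practice it probably suits a short allow-list of critical fields (destination, method, amount) that the interface renders unmodified next to the prose.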
Detection signals
Approval-queue saturation and rubber-stamping are measurable before a breach occurs.
- Review duration below floor threshold: alert when the average time-per-decision drops under a configurable minimum (e.g., under 8 seconds for any action classified high-risk), as recorded in the HITL session event log.
- Zero-rejection streaks: alert when a reviewer approves 50 or more consecutive items without a single rejection or deferral; a zero-rejection streak of this length is statistically implausible under genuine scrutiny.
- Queue depth vs. throughput divergence: track the ratio of items entering the review queue to items cleared per hour; a sustained ratio above 3:1 indicates saturation pressure that predicts decision-quality degradation.
- Interface content diff against raw action: compare the human-readable summary shown in the HITL interface to the raw structured action payload; log and alert on any field present in the payload that is absent or altered in the rendered summary.
- Repeated rejection reversals on the same item type: if an item type that was previously rejected begins receiving consistent approvals from the same reviewer without a policy change, flag for secondary review, as a sign of drift or trust-mechanism subversion.
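The three rate-based signals above (duration floor, zero-rejection streak, queue ratio) reduce to simple arithmetic over the session event log. The following is a minimal sketch, assuming a hypothetical `Decision` record; the default thresholds mirror the examples in the list and would need tuning per deployment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    reviewer: str
    approved: bool        # False covers both rejection and deferral
    seconds_spent: float
    high_risk: bool


def below_duration_floor(decisions, floor_seconds=8.0):
    """True when mean time-per-decision on high-risk items is under the floor."""
    durations = [d.seconds_spent for d in decisions if d.high_risk]
    if not durations:
        return False
    return sum(durations) / len(durations) < floor_seconds


def longest_approval_streak(decisions):
    """Length of the longest run of consecutive approvals."""
    longest = current = 0
    for d in decisions:
        current = current + 1 if d.approved else 0
        longest = max(longest, current)
    return longest


def queue_saturated(entered_per_hour, cleared_per_hour, max_ratio=3.0):
    """True when arrivals outpace clearances by more than max_ratio.
    Callers should require this to hold over several samples ("sustained")."""
    if cleared_per_hour == 0:
        return entered_per_hour > 0
    return entered_per_hour / cleared_per_hour > max_ratio
```

With the figures from the Cognitive Overload scenario, `queue_saturated(120, 30)` evaluates a 4:1 ratio and returns `True`.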
OWASP Top 10 for Agentic Applications 2026
The Agentic Top 10 (ASI01 through ASI10) is a separate practitioner-facing publication that maps onto the master Threats & Mitigations threat numbering. T10 is covered by the following Top 10 entries:
- ASI09 Human-Agent Trust Exploitation (primary): Adversaries exploit the tendency of humans to trust fluent, authoritative-sounding agents: an agent presents plausible justification for a harmful action, the human approves it, and the resulting audit trail reads as deliberate human authorisation. The attack surface is the review step itself: human-in-the-loop oversight becomes the vector when reviewers lack the context, time, or authority to challenge what the agent recommends.
Source: OWASP Top 10 for Agentic Applications 2026 (Dec 2025) · the Top 10 is a compass into the master Threats & Mitigations taxonomy, not a replacement for it.
Design principles at stake
When T10 is present, these security design principles are the ones being violated or tested. Each links to the full principle; the mitigations below are how you restore them.
- Defence-in-Depth: The HITL gate technically exists but human capacity cannot keep up with multi-agent throughput, so the gate degrades to a rubber stamp and depth collapses to one probabilistic layer. The controls that restore depth are independent and structural: a risk-prioritised queue so reviewers see the most consequential decisions first (not a proportional reduction in volume), decision summaries that compress per-review cognitive load to a constant, and rate-limiting on proposal volume enforced by a watchdog the agent cannot disable. Each is independent, so the gate survives even when one layer is under saturation pressure.
- Human Oversight (HITL / HOTL): T10 is the direct attack on the human-oversight mechanism itself: by generating excessive tasks, artificial time pressure, and complex decision scenarios, an attacker degrades the gate from meaningful review to approval-by-default. Once that happens, the audit trail records consent for decisions the human never evaluated. Signed action-bound approval tokens with short expiry prevent a historic approval being replayed, while a watchdog that pre-filters proposals and enforces a sustainable per-reviewer decision rate ensures the oversight gate never becomes the denial-of-service target.
- Psychological Acceptability: T10 succeeds because the secure path (reading each proposal carefully) is cognitively unsustainable at multi-agent throughput; reviewers do not bypass the gate through laziness but through genuine saturation, and the result is functionally identical. Decision summaries that present a constant-format rationale alongside each proposal, and batched plan-mode review that surfaces consequential steps rather than individual micro-prompts, make the careful path as cheap as the reflexive one and remove the saturation attack surface.
- Safety / Harm-limitation: Once approval-by-default takes hold, irreversible actions accumulate human-attested consent records for decisions no human actually evaluated. Because harm arrives faster than any saturated reviewer can intervene, safety and T10 converge on the same requirement: mandatory human approval before irreversible actions must be technically enforced, not merely prompted, and the watchdog must pre-filter volume so the gated actions that reach a reviewer are the ones that matter.
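The "signed action-bound approval tokens with short expiry" named under Human Oversight can be sketched with an HMAC over a digest of the exact action. This is one possible construction under stated assumptions, not a prescribed format: the claim names, the 60-second default, and the shared-key model are all illustrative, and a real deployment would likely add a single-use nonce to block replay inside the expiry window.

```python
import hashlib
import hmac
import json
import time
from typing import Optional


def _action_digest(action: dict) -> str:
    # Canonical JSON so the same action always hashes to the same value.
    canonical = json.dumps(action, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def _sign(key: bytes, claims: dict) -> str:
    message = json.dumps(claims, sort_keys=True).encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def issue_approval_token(key: bytes, action: dict, reviewer: str,
                         ttl_seconds: int = 60,
                         now: Optional[float] = None) -> dict:
    """Bind a reviewer's approval to one specific action and a short lifetime."""
    issued = time.time() if now is None else now
    claims = {
        "action_sha256": _action_digest(action),
        "reviewer": reviewer,
        "expires_at": issued + ttl_seconds,
    }
    return {"claims": claims, "signature": _sign(key, claims)}


def verify_approval_token(key: bytes, token: dict, action: dict,
                          now: Optional[float] = None) -> bool:
    """Accept only an untampered, unexpired token issued for this exact action."""
    current = time.time() if now is None else now
    if not hmac.compare_digest(_sign(key, token["claims"]), token["signature"]):
        return False
    if current > token["claims"]["expires_at"]:
        return False
    return hmac.compare_digest(token["claims"]["action_sha256"],
                               _action_digest(action))
```

Because the executor verifies the token itself, approval is enforced technically instead of being a prompt the agent can skip, and an approval for one action cannot be reused for a different payload.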
Recommended mitigations
Auto-generated from the mitigation catalog: every mitigation whose coverage map includes T10, sorted by maturity tier (Tier 1 production-canonical first, then Tier 2, then Tier 3 research-stage).
- Tier 2 Adaptive load (Adaptive workload balancing — distribute reviews by measured reviewer fatigue)
Human reviewers make more errors as cognitive load accumulates over a shift. An adversary who floods a HITL gate, or a system that simply generates high output volume, exploits that degradation without bypassing the gate at all. Adaptive workload balancing addresses this by treating reviewer fatigue as a live routing input: each incoming review is assigned to the reviewer with the lowest current fatigue score, mandatory breaks are enforced before a reviewer's error rate climbs further, and items are held rather than assigned to any reviewer above the break threshold.
Why it helps: Overwhelming HITL is the deliberate or incidental saturation of the human review layer beyond reliable decision capacity, achieved by flooding the gate with volume, injecting cognitively complex items, or spreading the same reviewer pool across multiple concurrent agent pipelines. Fatigue-aware routing reduces that saturation by distributing load according to measured fatigue state rather than queue position, and by enforcing mandatory breaks before a reviewer's decision quality degrades further.
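The routing rule described here (lowest fatigue score wins, items are held when every reviewer is above the break threshold) fits in a few lines. The sketch assumes fatigue is already available as a number per reviewer; how that score is measured is outside this example, and the function name is hypothetical.

```python
from typing import Dict, Optional


def assign_reviewer(fatigue_scores: Dict[str, float],
                    break_threshold: float) -> Optional[str]:
    """Pick the least-fatigued reviewer at or below the break threshold.
    Returns None when everyone is above it, meaning the item is held."""
    eligible = {
        name: score
        for name, score in fatigue_scores.items()
        if score <= break_threshold
    }
    if not eligible:
        return None  # hold the item; do not route to a reviewer due a break
    # Tie-break on name so the choice is deterministic.
    return min(eligible, key=lambda name: (eligible[name], name))
```

Returning `None` instead of the "least bad" reviewer is the point of the design: a held item is visible backlog, whereas an item routed to an exhausted reviewer is a probable rubber stamp.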
- AI-source disclosure
When an AI agent generates content or proposes an action, users need to know that the source is an AI before they decide to act. Without that signal, users routinely over-trust agent output. AI-source disclosure addresses this by attaching a visible label to every AI-generated item and by requiring explicit confirmation for consequential actions, restoring the critical gap between receipt and acceptance.
Why it helps: Overwhelming HITL (OWASP T10) is compounded when users accept agent output without the scrutiny they would apply to human-authored content. A persistent, visible AI-source label at the decision point reduces uncritical acceptance: users who can see the AI provenance are more likely to pause before approving a proposed action.
- Decision summary
When an agent decision reaches a human reviewer, the reviewer must reconstruct the agent's reasoning from raw traces before they can form a judgment. OWASP T10 names this reconstruction burden as the mechanism behind reviewer fatigue and oversight failures. A decision summary addresses the problem by inserting an independent model call between the agent's output and the reviewer: that call compresses the decision, evidence chain, and risk factors into a fixed-format card, reducing the per-review cognitive load without removing the human from the decision.
Why it helps: Overwhelming HITL occurs when the volume or complexity of agent decisions makes human review practically impossible, causing reviewers to approve without reading or to miss high-risk decisions under cognitive load. An independent decision summary cuts the per-review reconstruction work to a fixed cost: the reviewer reads a structured card rather than traversing a raw reasoning trace, keeping review thoroughness viable at scale.
- Tier 2 HITL calibration loop (HITL feedback-loop calibration — reviewer overrides fed back into agent tuning)
An agent at a human-in-the-loop gate will be overridden when its decisions do not match the reviewer's judgment. Without a return path, those corrections are discarded: the same miscalibration surfaces again in the next review cycle and the one after that. A feedback loop closes that gap by capturing each override event as a structured record, accumulating those records into a calibration dataset, and using patterns in that dataset to drive targeted changes to the agent's system prompt, tool-scope policy, or divergence-monitor thresholds. A well-calibrated agent produces fewer out-of-distribution decisions, so the review queue contracts over time.
Why it helps: Overwhelming the HITL gate is the threat that a sustained volume of agent decisions requiring human intervention exhausts reviewer capacity and degrades oversight quality. The feedback loop counters this at the source: it converts the calibration data generated by queue activity into prompt and policy changes that reduce that activity.
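The first half of the loop, capturing overrides as structured records and finding patterns in them, might look like the sketch below. The `OverrideEvent` fields and the `min_count` cut-off are assumptions for illustration; turning a recurring pattern into a prompt or policy change remains a human decision and is not shown.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class OverrideEvent:
    decision_type: str      # e.g. "refund", "vendor_onboarding" (hypothetical)
    agent_decision: str
    reviewer_decision: str
    reason: str             # reviewer-supplied reason code


def recurring_overrides(events: Iterable[OverrideEvent],
                        min_count: int = 3) -> List[Tuple[Tuple[str, str], int]]:
    """Return (decision_type, reason) pairs overridden at least min_count
    times, most frequent first: candidates for a targeted tuning change."""
    counts = Counter((e.decision_type, e.reason) for e in events)
    return [(pair, n) for pair, n in counts.most_common() if n >= min_count]
```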
- Risk-prioritised queue
A human-in-the-loop review system saturates not from absolute decision volume but from undifferentiated volume: every item lands at the same priority, so reviewers cannot distinguish an irreversible high-consequence action from a routine low-stakes one. A risk-prioritised queue fixes this by scoring each decision before it enters the queue and routing it to the tier that matches its risk level, concentrating human attention where the cost of an error is highest.
Why it helps: Overwhelming HITL is the saturation of human review capacity by undifferentiated decision volume, identified in OWASP Agentic AI v1.1 as a primary driver of reviewer fatigue and the resulting degradation of oversight quality. A risk-prioritised queue addresses this directly: it replaces uniform-priority routing with scored tiers, so the senior-reviewer queue contains only genuinely high-consequence items and low-risk volume clears through an auto-approve path without consuming reviewer time.
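A scored, tiered queue of this kind can be sketched with two heaps and an auto-approve list. The class name, the "standard" middle tier, and both cut-off parameters are assumptions made for the example; the risk score itself is taken as given.

```python
import heapq
import itertools
from typing import List, Optional, Tuple


class RiskPrioritisedQueue:
    """Route scored decisions to auto-approve, standard, or senior review."""

    def __init__(self, auto_approve_below: float, senior_at_or_above: float):
        self.auto_approve_below = auto_approve_below
        self.senior_at_or_above = senior_at_or_above
        self.auto_approved: List[str] = []
        self._heaps = {"standard": [], "senior": []}
        self._order = itertools.count()  # FIFO tie-break for equal scores

    def submit(self, item_id: str, risk_score: float) -> str:
        """Place an item and return the tier it was routed to."""
        if risk_score < self.auto_approve_below:
            self.auto_approved.append(item_id)
            return "auto-approve"
        tier = "senior" if risk_score >= self.senior_at_or_above else "standard"
        # heapq is a min-heap, so negate the score to pop highest risk first.
        entry: Tuple[float, int, str] = (-risk_score, next(self._order), item_id)
        heapq.heappush(self._heaps[tier], entry)
        return tier

    def next_for(self, tier: str) -> Optional[str]:
        """Pop the highest-risk pending item for a tier, or None if empty."""
        heap = self._heaps[tier]
        if not heap:
            return None
        return heapq.heappop(heap)[2]
```

The auto-approve path is itself a risk decision: if the scoring can be gamed, low scores become the new bypass, so the cut-off and the scorer deserve the same monitoring as the human gate.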
Multi-agent variants: OWASP MAS Guide
The OWASP MAS Threat Modelling Guide v1.0 catalogues one named multi-agent variant of T10, anchored to specific MAESTRO layers. It is a concrete attack pattern that emerges when this threat compounds across agents.
- CL: Misconfigured Inter-Agent Monitoring (extends T8, T10)
Gaps in cross-agent monitoring let anomalous behaviour go undetected.
Source: OWASP MAS Threat Modelling Guide v1.0, §2 Overview of MAESTRO Framework — Extended Threat Scenarios + Cross-Layer table.
Red-team pivot: MITRE ATLAS techniques
MITRE ATLAS catalogues adversary techniques against AI systems. Where this OWASP threat has an attacker-perspective counterpart, the ATLAS technique is shown below. That is what a red team would actually be doing on the wire. Use this for detection-signal anchoring, threat-hunting hypotheses, and IR runbooks. Source: mitre-atlas/atlas-data v5.6.0.
AML.T0046 Spamming AI System with Chaff Data: Adversary floods the AI system with low-value inputs to crowd out legitimate signals, mask attacker activity, or drive up cost.
AML.T0080 AI Agent Context Poisoning: Adversary contaminates an agent's context store (short-term scratchpad, vector memory, conversation history) so future reasoning is biased toward attacker goals.
Agentic angle: Persistent across sessions: a single successful poisoning influences every later decision until the memory is purged.
Sources
- OWASP-Agentic-AI ↗ · 1.1 (Dec 2025) · Agentic Threats Taxonomy Navigator §Step 5 — Human Related Threats
- MAESTRO ↗ · 1.0 (Apr 2025) · Layer 5 Evaluation & Observability