T4: Resource Overload

Definition

Resource Overload is the deliberate exhaustion of an agent’s compute, memory, external service quotas, or downstream API budget. It looks like classical denial-of-service in its outcome, but the attack surface is different: agents autonomously schedule, queue, and execute tasks across sessions, can self-trigger work, and can coordinate with other agents, so a single trigger can fan out into exponential resource consumption.

What it looks like in practice

Inference Time Exploitation. A legal-document analysis agent accepts arbitrary user-submitted text for clause extraction. An attacker submits a document that is not obviously malicious but is structured to maximise reasoning load: deeply nested conditional clauses, self-referential definitions, and ambiguous pronoun chains that force the model to backtrack repeatedly during extraction. Each such document takes 40× the compute of a normal submission. By queuing 20 concurrent submissions, the attacker saturates the GPU allocation for the entire tenant, making the agent unavailable to legitimate users and pushing the operator’s inference bill well beyond its monthly budget cap.

Multi-Agent Resource Exhaustion. An orchestrator agent distributes research sub-tasks to a pool of worker agents. Each worker can itself spawn further sub-workers to parallelise its task. An attacker who controls one user’s task submits a request that causes the orchestrator to spawn 50 sub-tasks, each of which spawns 50 more. Without a depth limit or fan-out cap, the orchestration layer fans out to 2,500 second-level worker instances (2,550 in total) within seconds, and deeper recursion multiplies that by 50 at every further level. The fan-out exhausts the container quota for the entire platform and denies service to all other tenants on the shared cluster.
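The arithmetic behind this scenario can be sketched in a few lines. This is an illustration only: the function names and the default limits in `spawn_allowed` are hypothetical and do not come from any particular orchestration framework.

```python
def workers_at_level(fan_out: int, level: int) -> int:
    """Workers created at one level (level 1 = the orchestrator's direct sub-tasks)."""
    return fan_out ** level


def total_workers(fan_out: int, depth: int) -> int:
    """Workers created across levels 1..depth when every worker spawns `fan_out` children."""
    return sum(workers_at_level(fan_out, level) for level in range(1, depth + 1))


def spawn_allowed(current_depth: int, children_spawned: int,
                  max_depth: int = 2, max_fan_out: int = 5) -> bool:
    """Hypothetical admission check run before each spawn request.

    The default limits are assumptions chosen for illustration, not recommendations.
    """
    return current_depth < max_depth and children_spawned < max_fan_out
```

With a fan-out of 50, two levels give 2,550 workers and a third level gives 127,550, which is why the check has to run before a worker is started rather than after the quota alarm fires.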

API Quota Depletion. A content-generation agent calls a third-party geocoding API to enrich location references in generated content. The geocoding plan carries 10,000 calls per day. An attacker submits a batch job requesting summaries of 15,000 place-names across multiple sessions. The agent faithfully makes one geocoding call per name, exhausts the daily quota within the first hour, and all subsequent agent invocations (legitimate or malicious) that depend on geocoding fail with a quota error for the remaining 23 hours.
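One way to keep a single tenant from draining a shared plan is to reserve calls before they are made. The sketch below assumes a hypothetical policy in which each tenant may use at most a fixed slice of the daily plan; the class name, field names, and the 2,000-call slice in the usage note are illustrative assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import DefaultDict


@dataclass
class DailyQuotaGuard:
    daily_limit: int        # calls per day on the provider plan
    per_tenant_limit: int   # assumed policy: maximum slice for any one tenant
    used: DefaultDict[str, int] = field(default_factory=lambda: defaultdict(int))

    def reserve(self, tenant: str, calls: int) -> bool:
        """Reserve `calls` for `tenant`; refuse the whole batch if it does not fit."""
        if calls <= 0:
            raise ValueError("calls must be positive")
        if self.used[tenant] + calls > self.per_tenant_limit:
            return False  # tenant would exceed its own slice
        if sum(self.used.values()) + calls > self.daily_limit:
            return False  # plan-wide budget would be exceeded
        self.used[tenant] += calls
        return True
```

With `DailyQuotaGuard(daily_limit=10_000, per_tenant_limit=2_000)`, the 15,000-name batch from the scenario is refused up front, and the remaining budget stays available to other tenants.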

Memory Cascade Failure. A long-running agent is tasked with aggregating a large data feed and building an in-memory index for rapid look-up. An attacker submits a 200 MB feed file of valid records that are almost all distinct, each triggering an embedding and an index insert. The agent allocates memory in proportion to record count without a cap, exhausts the container’s memory limit, and crashes. Because the agent holds a write lock on the shared index, the crash leaves the index in an inconsistent state, causing downstream agents that depend on it to fail until a full rebuild completes.
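A minimal sketch of two defences against this failure mode follows: a hard cap on record count, and a staged build that replaces the live index only after the whole feed has been accepted. `build_index`, `SharedIndex`, and the record shape are hypothetical; a real index would also cap bytes, not only record count.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Record = Tuple[str, List[float]]  # (key, embedding); assumed shape for illustration


def build_index(feed: Iterable[Record], max_records: int) -> Optional[Dict[str, List[float]]]:
    """Build a staged index, or return None if the feed exceeds the cap."""
    staged: Dict[str, List[float]] = {}
    for key, vector in feed:
        if key not in staged and len(staged) >= max_records:
            return None  # reject the whole feed instead of growing without bound
        staged[key] = vector
    return staged


class SharedIndex:
    """Live index that readers use; it is replaced only by a completed build."""

    def __init__(self) -> None:
        self.live: Dict[str, List[float]] = {}

    def refresh(self, feed: Iterable[Record], max_records: int) -> bool:
        staged = build_index(feed, max_records)
        if staged is None:
            return False  # live index is untouched, so readers keep working
        self.live = staged  # single reference swap once the build has succeeded
        return True
```

Because the oversized feed is rejected before the swap, downstream agents keep reading the last good index instead of waiting for a rebuild.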

Why it’s dangerous

Conventional services expose request paths an attacker has to drive directly. Agents expose a far more permissive interface: any input that the agent might decide to act on is potentially a request multiplier. Reflection loops, multi-agent fan-out, and scheduled retries make the cost of a single malicious instruction unbounded if not explicitly capped.

Where it manifests

The exposure surfaces are inference cost per task (which can grow unboundedly under crafted inputs), per-tenant API quotas to downstream services, the depth counter for self-reflection or planning loops, and the rate of inter-agent task delegation.

Detection signals

Monitor at the task-queue, compute-billing, and external-API layers:

Per-session inference token count exceeding a per-session cap (e.g. 3× the p99 of historical session token usage for that agent type). Alert before the billing spike, not after.
Agent-spawned sub-task count crossing a fan-out threshold within a single orchestration trace (e.g. more than 50 worker instances attributed to a single originating task ID), checked at the task-queue level before agents are started.
Third-party API call rate per minute from any single agent identity exceeding a declared safe-call-rate ceiling for that API, with an automatic circuit-breaker that queues excess calls rather than submitting them.
Container or process resident-set-size (RSS) growth rate that exceeds a linear trend relative to input record count. A sharp super-linear growth curve is an early signal of unbounded allocation before the OOM kill occurs.
Time-in-loop counter: a planning or reflection loop that has executed more than N iterations without emitting a final action, triggering a forced termination and an alert for human review.
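Three of the signals above reduce to simple checks. The sketch below is one possible reading of them, with hypothetical function names; the nearest-rank percentile and the slope-comparison heuristic are assumptions chosen for brevity, and a production monitor would use its own metrics pipeline.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple


def percentile_nearest_rank(values: Sequence[int], pct: int) -> int:
    """Nearest-rank percentile using integer arithmetic only."""
    if not values:
        raise ValueError("need at least one historical value")
    ordered = sorted(values)
    rank = max(1, -(-pct * len(ordered) // 100))  # ceiling of pct * n / 100
    return ordered[rank - 1]


def session_token_cap(history: Sequence[int], multiplier: int = 3, pct: int = 99) -> int:
    """Cap expressed as a multiple of a historical percentile (e.g. 3x the p99)."""
    return multiplier * percentile_nearest_rank(history, pct)


def tasks_over_fan_out(spawn_events: Iterable[str], threshold: int = 50) -> List[str]:
    """Originating task IDs with more than `threshold` attributed workers."""
    counts = Counter(spawn_events)
    return sorted(task for task, n in counts.items() if n > threshold)


def is_superlinear(samples: Sequence[Tuple[int, int]], tolerance: float = 1.5) -> bool:
    """Compare the latest memory-per-record slope with the earliest one.

    `samples` holds (records_processed, rss_bytes) pairs with strictly
    increasing record counts. The tolerance factor is an assumed threshold.
    """
    if len(samples) < 3:
        return False
    (r0, m0), (r1, m1) = samples[0], samples[1]
    (rp, mp), (rl, ml) = samples[-2], samples[-1]
    if r1 <= r0 or rl <= rp:
        return False  # cannot compute a slope from non-increasing record counts
    early = (m1 - m0) / (r1 - r0)
    late = (ml - mp) / (rl - rp)
    return early > 0 and late > tolerance * early
```

Each check is cheap enough to run inline at the task queue, which matters because the signals are only useful if they fire before the workers start or the OOM kill lands.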

OWASP Top 10 for Agentic Applications 2026

The Agentic Top 10 (ASI01 through ASI10) is a separate practitioner-facing publication that maps onto the master Threats & Mitigations threat numbering. T4 is covered by the following Top 10 entries:

ASI02 Tool Misuse and Exploitation (contributing)

An agent applies authorised tools in ways their operator did not intend, driven by prompt injection, misaligned reasoning, or manipulated tool outputs. Every individual call looks clean; the harm is in the sequence: data exfiltrated via successive reads, workflows hijacked by parameter tampering, or a legitimate API weaponised across turns.

OWASP LLM Top 10: LLM06:2025

ASI06 Memory & Context Poisoning (related)

An adversary writes malicious or misleading data into an agent's persistent memory or shared vector store, so that every future session, and every peer agent reading from the same store, operates on corrupted context. The defining difference from single-turn injection (ASI01) is that the poisoned data survives session reset; the agent's reasoning drifts without any new attacker input.

OWASP LLM Top 10: LLM01:2025, LLM04:2025, LLM08:2025

Source: OWASP Top 10 for Agentic Applications 2026 (Dec 2025) · the Top 10 is a compass into the master Threats & Mitigations taxonomy, not a replacement for it.

Design principles at stake

When T4 is present, these security design principles are the ones being violated or tested. Each links to the full principle; the mitigations below are how you restore them.

Defence-in-Depth: No single cap can stop resource overload because reflection loops, multi-agent fan-out, and scheduled retries each open separate amplification paths. The controls must therefore be independent: per-tenant API quota enforcement at the gateway, a hard hop and recursion depth limit at the orchestrator, and a cost-velocity circuit breaker that trips autonomously. None of these can be bypassed by defeating another layer. A crafted input that defeats the quota alone still hits the orchestrator's hop limit; defeating that still hits the cost breaker.
Attack Surface Minimization: Every additional tool the agent can call is a potential inference trigger that a crafted input can activate; over-tooled agents are what produce the multi-agent fan-out and memory cascade failures this threat describes. Reducing the registered toolset to what each task strictly requires removes the multiplier paths before any quota or rate control needs to engage; the attack simply has fewer levers to pull.
Least Agency / Minimal Autonomy: The dangerous property of resource overload is that a single trigger can fan out into autonomous, recursive work the user never intended: exactly what excess authority to decide and act enables. Capping the agent's authority to suggest-only for scheduling and requiring explicit confirmation before delegating sub-tasks breaks the self-triggering loop that turns one malicious instruction into unbounded consumption.
Reversibility / Dry-run / Hold periods: Resource overload is compounded when the agent re-attempts failed or expensive actions without limit. A dry-run environment that projects the cost delta before any scheduled or recursive action executes lets operators block a runaway before it starts, and a configurable hold period on high-cost irreversible actions provides the window to catch a crafted input before the budget is exhausted.
Rate-limiting / Budgets / Loop prevention: The threat describes inference time exploitation, API quota depletion, and multi-agent fan-out as distinct paths to unbounded consumption, each requiring its own ceiling. Token buckets per user, agent, and model tier; hard hop and recursion limits at the orchestrator; and a cost-velocity circuit breaker that escalates to human review rather than continuing are the three independent layers that stop a single malicious trigger becoming an overnight incident.
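The cost-velocity circuit breaker named in these principles can be sketched as a sliding-window spend counter that latches open until a human resets it. The class and parameter names are hypothetical, and timestamps are passed in explicitly so the behaviour is easy to test; a real deployment would read spend from its billing or metering layer.

```python
from collections import deque
from typing import Deque, Tuple


class CostVelocityBreaker:
    """Trips when spend inside a sliding time window exceeds a ceiling."""

    def __init__(self, max_cost_per_window: float, window_seconds: float) -> None:
        self.max_cost_per_window = max_cost_per_window
        self.window_seconds = window_seconds
        self.tripped = False
        self._events: Deque[Tuple[float, float]] = deque()  # (timestamp, cost)

    def record(self, cost: float, now: float) -> bool:
        """Record a spend event; return True if work may continue."""
        if self.tripped:
            return False  # stays tripped until a human calls reset()
        self._events.append((now, cost))
        # Drop events that have aged out of the window.
        while self._events and self._events[0][0] <= now - self.window_seconds:
            self._events.popleft()
        if sum(c for _, c in self._events) > self.max_cost_per_window:
            self.tripped = True
        return not self.tripped

    def reset(self) -> None:
        """Assumed to be called only after human review."""
        self.tripped = False
        self._events.clear()
```

Latching is the design choice that matters here: a breaker that re-closes on its own lets a runaway loop resume as soon as the window slides, whereas a latched breaker forces the escalation to human review that the principle calls for.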

Recommended mitigations

Auto-generated from the mitigation catalog: every mitigation whose coverage map includes T4, sorted by maturity tier (Tier 1 production-canonical first, then Tier 2, then Tier 3 research-stage).

Tier 2 Graceful degradation (Graceful degradation — fail closed where it matters, fail open where it's safe)

An agent that encounters a quota trip, a dependency failure, or a timeout faces a choice: continue at reduced quality, or refuse. Getting that choice wrong is the core operational failure. Graceful degradation requires the answer to be declared before the incident, not improvised during it: write-authority paths fail closed and return a refusal; read-only paths fail open and disclose the degraded state explicitly.

Why it helps: Resource Overload is the deliberate exhaustion of an agent's compute, memory, or budget, forcing it into a degraded operating state. Whether that overload becomes a security incident depends entirely on how the agent responds when its resources are exhausted: fail-open silence allows it to continue acting on write-authority paths without the capacity to do so safely, while a pre-declared fail-closed response stops the action and logs the denial.
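A pre-declared degradation policy can be as small as a lookup table consulted when a quota trips. The sketch below is illustrative: the operation names and the `Outcome` shape are hypothetical, and treating unlisted operations as fail-closed is an assumption rather than something the catalogue specifies.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    FAIL_CLOSED = "fail_closed"  # refuse the action
    FAIL_OPEN = "fail_open"      # continue at reduced quality and say so


# Declared before any incident; the operation names are hypothetical examples.
DEGRADATION_POLICY = {
    "search_documents": Mode.FAIL_OPEN,   # read-only path
    "update_record": Mode.FAIL_CLOSED,    # write-authority path
}


@dataclass(frozen=True)
class Outcome:
    proceed: bool
    degraded: bool
    notice: str


def on_resource_exhausted(operation: str) -> Outcome:
    """Decide what to do when a quota trip, timeout, or dependency failure occurs."""
    # Assumption: anything not explicitly declared is treated as fail-closed.
    mode = DEGRADATION_POLICY.get(operation, Mode.FAIL_CLOSED)
    if mode is Mode.FAIL_OPEN:
        return Outcome(True, True, f"{operation}: serving degraded result")
    return Outcome(False, True, f"{operation}: refused while resources are exhausted")
```

The notice string is what makes the fail-open branch safe: the degraded state is disclosed to the caller rather than silently absorbed.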
Tier 2 Kill switch (Kill switch: human authority to halt one agent, a class, or the entire deployment)

Agentic systems can act faster than a human can intervene through normal channels. A kill switch is the operational guarantee that a named human role can stop agent activity at any scope (single instance, class, or global) through a documented runbook, without requiring a code change or redeployment, and with every invocation written to an audit trail.

Why it helps: Resource Overload is the deliberate exhaustion of compute, budget, or API quota through runaway agent behaviour. A kill switch is the backstop when softer controls (rate limits, budget quotas) have not stopped the drain. It halts the consuming agents before the damage extends to the named scenarios of service disruption, cost manipulation, and cascading failure.
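The three scopes and the audit requirement can be sketched as runtime state that agents consult before each action. All names here are hypothetical, and the in-memory sets stand in for whatever shared store a real deployment would use so that a halt takes effect without a code change or redeployment.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional, Set


@dataclass
class KillSwitch:
    halted_instances: Set[str] = field(default_factory=set)
    halted_classes: Set[str] = field(default_factory=set)
    global_halt: bool = False
    audit_log: List[dict] = field(default_factory=list)

    def halt(self, operator: str, scope: str, target: Optional[str] = None) -> None:
        """Halt one instance, one agent class, or everything, and record who did it."""
        if scope == "instance" and target:
            self.halted_instances.add(target)
        elif scope == "class" and target:
            self.halted_classes.add(target)
        elif scope == "global":
            self.global_halt = True
        else:
            raise ValueError("scope must be 'instance' or 'class' with a target, or 'global'")
        self.audit_log.append({
            "operator": operator,
            "scope": scope,
            "target": target,
            "at": datetime.now(timezone.utc).isoformat(),
        })

    def is_halted(self, agent_id: str, agent_class: str) -> bool:
        """Checked by every agent before it takes an action."""
        return (self.global_halt
                or agent_id in self.halted_instances
                or agent_class in self.halted_classes)
```

In use, an on-call operator would call something like `halt("oncall-alice", "class", "research-worker")`, after which every agent of that class sees `is_halted(...)` return true on its next check.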
Tier 2 Loop limit (Reflection-loop depth limit — a ceiling on how often an agent reworks its own answer)

An AI agent can review and rewrite its own answer to improve it. If that review runs too long it ties up resources and stops the agent responding in time, and an attacker can deliberately trigger those endless cycles to stall the system. A reflection-loop depth limit prevents that: it sets how many review rounds an agent may run before it has to stop.

Why it helps: Resource Overload is the deliberate exhaustion of an agent's compute, memory, or budget. An unbounded reflection loop is one route to it, with the agent consuming resources by repeatedly reprocessing its own output, and a depth limit closes that route.
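A depth limit is a bounded loop around the review step. In the sketch below, `review` is a hypothetical callable standing in for a model call that returns a revised draft and a flag saying whether it is satisfied; the default of three rounds is an assumption, not a recommended value.

```python
from typing import Callable, Tuple


def reflect_with_limit(
    draft: str,
    review: Callable[[str], Tuple[str, bool]],
    max_rounds: int = 3,
) -> Tuple[str, int, bool]:
    """Run at most `max_rounds` review rounds.

    Returns (answer, rounds_used, hit_limit). `hit_limit` is True only when
    the loop stopped because the ceiling was reached, which is the event
    worth alerting on.
    """
    rounds = 0
    while rounds < max_rounds:
        draft, done = review(draft)
        rounds += 1
        if done:
            return draft, rounds, False
    return draft, rounds, True
```

Returning the best draft so far, rather than raising, keeps the agent responsive while still surfacing `hit_limit` for monitoring.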
Tier 2 Rate limits and quotas (Per-agent rate limits and quotas — bound compute, tokens, and external-API spend)

An agent operates without direct human oversight, autonomously scheduling tool calls, external API requests, and reflection loops. Without a budget, a single triggering event can fan out into hundreds of downstream calls. Per-agent rate limits and quotas assign each agent identity its own ceiling on call rate, token consumption, and cost spend, so a misbehaving or compromised agent cannot exhaust shared resources and its overconsumption becomes a visible, actionable signal.

Why it helps: Resource Overload is the deliberate or accidental exhaustion of compute, memory, token, or cost budgets shared by the system. An agent whose call rate, token spend, and cost are each bounded by a per-identity quota cannot exhaust those resources beyond its ceiling, regardless of how many tool calls its planning or reflection loop schedules.
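Per-identity ceilings are commonly implemented with a token bucket per agent. The sketch below uses hypothetical class names and passes the clock in explicitly; the same structure can meter calls, model tokens, or cost units depending on what `amount` represents.

```python
from typing import Dict


class TokenBucket:
    """Classic token bucket: refills at a steady rate up to a fixed capacity."""

    def __init__(self, capacity: float, refill_per_second: float, now: float) -> None:
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = capacity
        self.updated = now

    def try_consume(self, amount: float, now: float) -> bool:
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.updated = now
        if amount > self.tokens:
            return False  # over the ceiling: reject, and let monitoring see it
        self.tokens -= amount
        return True


class PerAgentLimiter:
    """One bucket per agent identity, so one agent cannot drain another's budget."""

    def __init__(self, capacity: float, refill_per_second: float) -> None:
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self._buckets: Dict[str, TokenBucket] = {}

    def allow(self, agent_id: str, amount: float, now: float) -> bool:
        bucket = self._buckets.get(agent_id)
        if bucket is None:
            bucket = TokenBucket(self.capacity, self.refill_per_second, now)
            self._buckets[agent_id] = bucket
        return bucket.try_consume(amount, now)
```

Keying the buckets on agent identity is what turns overconsumption into a signal: a rejected `allow` call names the agent that hit its ceiling.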

Multi-agent variants: OWASP MAS Guide

The OWASP MAS Threat Modelling Guide v1.0 catalogues 3 named multi-agent variants of T4, anchored to specific MAESTRO layers. Each is a concrete attack pattern that emerges when this threat compounds across agents.

L4 Distributed Denial of Service (extends T4, T14)

Coordinated DDoS targeting groups of agents, triggering cascading failure.

CL Systemic Resource Starvation (extends T4)

Inter-agent loops trigger system-wide resource exhaustion; whole MAS collapses.

CL Resource Overload (multi-agent) (extends T4)

Coordinated requests overwhelm many agents simultaneously across layers.

Source: OWASP MAS Threat Modelling Guide v1.0, §2 Overview of MAESTRO Framework — Extended Threat Scenarios + Cross-Layer table.

Catalogue extensions: Helmwart T18 to T49

This normalized catalogue includes 3 multi-agent entries based on the OWASP MAS Threat Modelling Guide v1.0 that extend T4. The source guide reuses some numbers between worked systems; these Helmwart entries provide stable detail pages, MAESTRO layers, and mitigation coverage.

T25 Workflow Disruption via Dependency Exploitation
Attacker disrupts the workflow by attacking a dependent system (approval agent, payment processor) rather than the primary agent itself.
T32 Runaway Agent on Solana
An agent enters a runaway loop and submits transactions at high frequency, incurring cost and disrupting the broader agent ecosystem.
T39 Unintended Resource Consumption via MCP
An autonomous agent loops over MCP tool invocations far beyond task requirements, overloading the MCP server or connected systems.

Red-team pivot: MITRE ATLAS techniques

MITRE ATLAS catalogues adversary techniques against AI systems. Where this OWASP threat has an attacker-perspective counterpart, the ATLAS technique is shown below. That is what a red team would actually be doing on the wire. Use this for detection-signal anchoring, threat-hunting hypotheses, and IR runbooks. Source: mitre-atlas/atlas-data v5.6.0.

AML.T0029 Denial of AI Service

Adversary exhausts compute, memory, or rate-limit budgets so the AI system stops responding or stops processing legitimate requests.

AML.T0034 Cost Harvesting

Adversary intentionally inflates the victim's inference bill (long prompts, expensive tools, repeated calls) to cause financial harm rather than service disruption.

Agentic angle: Loop-amplification by an agent can exceed a quarterly budget in minutes.

AML.T0034.002 Agentic Resource Consumption

Adversary coerces an agent into performing expensive tool calls (excessive API queries, fan-outs, or recursive self-delegation loops) to waste compute and API budgets.

Agentic angle: Prompt injection directives like "summarize 1000 times" or recursive sub-agent spawning can burn budgets in a single task.

AML.T0046 Spamming AI System with Chaff Data

Adversary floods the AI system with low-value inputs to crowd out legitimate signals, mask attacker activity, or drive up cost.

Sources

OWASP-Agentic-AI · 1.1 (Dec 2025) · Agentic Threats Taxonomy Navigator §Step 3; Threat Model T4
MAESTRO · 1.0 (Apr 2025) · Layer 4 Deployment Infrastructure; Cross-Layer Systemic Resource Starvation