ASI05: Unexpected Code Execution (RCE)

Definition

In an agentic system, code generation and code execution happen in the same turn: the model emits an instruction and a tool runs it, with no human review step between. Attackers exploit this by injecting execution payloads into the agent's inputs; the realistic defence is at the runtime boundary (sandboxing, capability restriction, egress control), not at the generation step.

What it means in practice

In a conventional development workflow, a human reviews code before it runs. An agent generates code and runs it in the same turn: the model emits an instruction (a shell command, a SQL statement, a Python snippet) and a tool executes it. The only thing between generation and impact is whatever sandbox the executing tool runs in.

Filtering at generation time is unreliable. The same payload can be rephrased, encoded, or split across inputs in more ways than input sanitisation can enumerate, so it cannot be trusted to catch every malicious shape. The realistic containment is at execution: capability-restricted sandboxes, ephemeral filesystems, blocked outbound network, and per-tool execution budgets. Assume the agent will eventually generate something bad and design the runtime so the blast radius is bounded.
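As a rough illustration of a per-tool execution budget, the sketch below runs a generated Python snippet in a child process with a wall-clock timeout, a CPU limit, a throwaway working directory, and an allowlisted environment. The function name run_untrusted and the specific limits are assumptions made for this example. It is not a sandbox on its own: it does not block network access or filter syscalls, which would need a container or OS-level layer.

```python
import os
import subprocess
import sys
import tempfile

try:
    import resource  # POSIX only; absent on Windows
except ImportError:
    resource = None

# Only these variables are passed to the child, so secrets held in the
# parent environment are not inherited. (Illustrative allowlist.)
ENV_ALLOWLIST = ("PATH", "LD_LIBRARY_PATH", "SYSTEMROOT")


def run_untrusted(code: str, timeout_s: float = 5.0, cpu_s: int = 1):
    """Run a generated snippet under a bounded budget (hypothetical helper)."""

    def _apply_limits():
        # Runs in the child just before exec: cap CPU seconds and file size.
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_s, cpu_s))
        resource.setrlimit(resource.RLIMIT_FSIZE, (1_000_000, 1_000_000))

    env = {k: os.environ[k] for k in ENV_ALLOWLIST if k in os.environ}
    # The scratch directory is deleted when the call returns.
    with tempfile.TemporaryDirectory() as scratch:
        return subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: isolated mode
            cwd=scratch,
            env=env,
            capture_output=True,
            text=True,
            timeout=timeout_s,  # raises subprocess.TimeoutExpired
            preexec_fn=_apply_limits if resource else None,
        )
```

A caller would treat a non-zero return code or a subprocess.TimeoutExpired exception as a budget violation and refuse to pass the output back to the agent.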

Threat catalogue links

Base-catalog T-numbers follow OWASP source material; normalized MAS scenario entries are Helmwart editorial cross-references. Each threat's role sets its colour code and display weight in Helmwart; the chips in the hero use the same scheme.

Primary: strongest pivot. Removing this T-number would gut the entry.
Contributing: co-equal mechanism that combines with others to produce the ASI risk.
Related: touches the entry but isn't its core; useful cross-reference.

T11 Unexpected RCE and Code Attacks (primary)

Code-execution paths in agents accept attacker-influenced input and run it as arbitrary code.

T20 Framework Vulnerability Leading to Code Injection (primary)

A bug in the agent framework enables code injection into the agent execution context.

MITRE ATLAS technique

MITRE ATLAS catalogues adversary techniques against AI systems. The technique(s) below represent the red-team pivot for this entry: what an attacker is actually doing on the wire. Source: mitre-atlas/atlas-data v5.6.0.

AML.T0102 Generate Malicious Commands

Adversary uses an LLM to dynamically generate malicious commands from natural language, producing attack signatures that vary across executions.

Agentic angle: Agents with code-execution tools can be prompted to generate and immediately run adversary-crafted commands, collapsing generation and execution into one step.

OWASP LLM Top 10 cross-references

From OWASP Appendix A (canonical inheritance)

LLM01:2025 Prompt Injection
LLM05:2025 Improper Output Handling

Recommended mitigations

No single control answers an ASI; it is met by a layered stack. The cards below are ranked by how directly each control counters ASI05: the chips on each card name the threat of this ASI it actually covers, colour-coded by that threat's role.

Counters the core

Cover one or more of this ASI's primary threats — the strongest direct response.

Code-generation review gate — human approval before AI-generated code executes or merges Tier 2

T11, T20

An AI coding agent produces code that can be executed or merged to a production branch without a human ever reading it. If the agent has been manipulated, its generated code can contain hidden payloads, backdoors, or privilege-escalating logic. A code-generation review gate prevents that: every change attributable to an AI agent must pass automated static analysis and receive explicit human approval before it can merge or execute, and the agent identity that authored the change is structurally barred from also approving it.
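The gate's three conditions (static analysis passed, explicit human approval, author barred from approving) can be sketched as a single predicate. The Change record, the gate_allows function, and the identity strings are hypothetical names introduced for this example; a real deployment would enforce the same rule in its merge or execution pipeline.

```python
from dataclasses import dataclass, field


@dataclass
class Change:
    author: str                    # identity that produced the change
    author_is_agent: bool
    static_analysis_passed: bool
    approvals: set[str] = field(default_factory=set)  # approving identities


def gate_allows(change: Change, human_reviewers: set[str]) -> bool:
    """Return True only if an agent-authored change may merge or execute."""
    if not change.author_is_agent:
        return True  # this gate only covers agent-authored changes
    if not change.static_analysis_passed:
        return False
    # Count only approvals from known humans, never from the author itself.
    valid = (change.approvals & human_reviewers) - {change.author}
    return len(valid) >= 1
```

Intersecting approvals with a list of known human reviewers means an approval from another agent identity, or from the authoring agent, counts for nothing.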

gVisor sandbox — a user-space kernel that intercepts every syscall a container makes Tier 1

T11, T20

When an agent executes generated or retrieved code in an ordinary container, that code runs as a process that makes syscalls directly to the host kernel. A vulnerability in the generated code, or a deliberate exploit injected through the agent's prompt, can reach the kernel and affect other workloads or the host itself. gVisor narrows this path by inserting a user-space kernel implementation between the container and the host: the container's syscalls go to the Sentry process, not to the host kernel, so the reachable attack surface from inside the container is structurally smaller.
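A minimal sketch of how an execution tool might launch generated code under gVisor through Docker follows. It assumes gVisor is installed and registered with the Docker daemon under the runtime name runsc; the function name, image argument, and limit values are illustrative assumptions. The function only builds the command, so it can be inspected without Docker present.

```python
def sandboxed_run_command(image: str, script: str) -> list[str]:
    """Build a docker invocation that runs `script` under gVisor.

    Assumes the Docker daemon has a runtime registered as "runsc".
    """
    return [
        "docker", "run", "--rm",
        "--runtime=runsc",    # route container syscalls through gVisor
        "--network=none",     # no outbound network from generated code
        "--read-only",        # immutable root filesystem
        "--cap-drop=ALL",     # drop all Linux capabilities
        "--pids-limit=64",    # bound process creation (illustrative value)
        "--memory=256m",      # bound memory use (illustrative value)
        image, "python3", "-c", script,
    ]
```

The resulting list can be passed to subprocess.run with a timeout. The other flags layer capability restriction and egress control on top of the syscall interception that gVisor provides.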

Human dual-control — four-eyes rule for irreversible high-impact approvals Tier 2

T11

An AI agent operating with broad authority can propose actions that are irreversible: deleting records, modifying IAM policies, moving funds. A single human reviewer at the approval gate is a single point of failure: one compromised account, one fatigued reviewer, or one successful social-engineering attempt is enough to commit the action. Human dual-control addresses that by requiring two distinct, independent humans to approve before the action commits.
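The four-eyes rule reduces to a small invariant: at least two distinct approvers, none of whom is the requester. The sketch below uses a hypothetical PendingAction class to show it. Code can enforce distinct identities but not genuine independence between the two people, which remains an organisational control.

```python
from dataclasses import dataclass, field


@dataclass
class PendingAction:
    description: str
    requested_by: str              # agent or human that proposed the action
    approvers: set[str] = field(default_factory=set)

    def approve(self, human_id: str) -> None:
        if human_id == self.requested_by:
            raise PermissionError("requester cannot approve their own action")
        # A set ignores repeat approvals from the same identity.
        self.approvers.add(human_id)

    def can_commit(self) -> bool:
        return len(self.approvers) >= 2
```

Because approvals are stored in a set, one reviewer approving twice still counts once, so a single compromised account cannot satisfy the rule alone.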

Kill switch: human authority to halt one agent, a class, or the entire deployment Tier 2

T11

Agentic systems can act faster than a human can intervene through normal channels. A kill switch is the operational guarantee that a named human role can stop agent activity at any scope (single instance, class, or global) through a documented runbook, without requiring a code change or redeployment, and with every invocation written to an audit trail.
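One way to picture the three scopes is a halt registry that every agent checks before each action, with each invocation appended to an audit log. The KillSwitch class, its method names, and the in-memory storage are assumptions made for illustration; a production version would keep this state somewhere the agents cannot modify.

```python
import time
from dataclasses import dataclass, field

SCOPES = ("instance", "class", "global")


@dataclass
class KillSwitch:
    halted: set[tuple[str, str]] = field(default_factory=set)
    audit_log: list[dict] = field(default_factory=list)

    def halt(self, operator: str, scope: str, target: str = "*") -> None:
        if scope not in SCOPES:
            raise ValueError(f"unknown scope: {scope}")
        if scope == "global":
            target = "*"  # a global halt has no narrower target
        self.halted.add((scope, target))
        # Every invocation is recorded, including who pulled the switch.
        self.audit_log.append(
            {"ts": time.time(), "operator": operator,
             "scope": scope, "target": target}
        )

    def is_halted(self, agent_id: str, agent_class: str) -> bool:
        return (
            ("global", "*") in self.halted
            or ("class", agent_class) in self.halted
            or ("instance", agent_id) in self.halted
        )
```

Since the halt is runtime state checked on every action, stopping an agent needs no code change or redeployment.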

Secret scanning on agent-generated artefacts — detecting credentials before they escape the trust boundary Tier 2

T11

An agent produces code, configuration files, tool-call payloads, and log records continuously and at a rate no human reviewer can match. Any of those artefacts may contain a live API key, service token, or private key, placed there accidentally through model context or deliberately through prompt injection or context poisoning. Secret scanning places an inspection gate at every agent output seam: regex patterns match known token formats, entropy analysis flags high-entropy strings that match no known format, and validator calls confirm which candidates are live credentials. The CI secret-scanning pattern is mature; the agentic specialisation is seam placement: moving the scanner from the repository gate to the agent egress point, where artefacts can be intercepted before they reach any downstream system.

Static analysis on generated code — a pre-execution gate on LLM-emitted artifacts Tier 2

T11

An agent that can generate and execute code treats code generation as a tool call and code execution as the outcome. If the generated code contains a known-dangerous pattern, no amount of prompt engineering stops it from running once the execute call goes through. Static analysis closes that gap: it scans every code artifact the agent emits against a rule set before execution is permitted, catching the vulnerability patterns the same tooling already catches in human-written code.
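As a minimal sketch of such a gate for Python artefacts, the function below parses the generated source with the standard ast module and reports calls on a small deny-list before anything is executed. The deny-list and the find_violations name are assumptions for illustration. A rule this simple is easy to evade (aliased imports, getattr), so it complements the runtime sandbox rather than replacing it, and a real gate would rely on an established analyser.

```python
import ast

# Illustrative deny-list; established analysers ship much richer rule sets.
DANGEROUS_CALLS = {"eval", "exec", "os.system", "os.popen"}


def _dotted_name(node):
    """Resolve simple call targets such as `eval` or `os.system`."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        base = _dotted_name(node.value)
        return f"{base}.{node.attr}" if base else None
    return None


def find_violations(source: str) -> list[str]:
    """Return findings; execution is allowed only if the list is empty."""
    try:
        tree = ast.parse(source)
    except SyntaxError as err:
        return [f"line {err.lineno}: does not parse"]
    violations = []
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        name = _dotted_name(node.func)
        if name in DANGEROUS_CALLS:
            violations.append(f"line {node.lineno}: call to {name}")
        if name and name.startswith("subprocess."):
            for kw in node.keywords:
                if (kw.arg == "shell" and isinstance(kw.value, ast.Constant)
                        and kw.value.value is True):
                    violations.append(
                        f"line {node.lineno}: {name} with shell=True")
    return violations
```

The executing tool would call find_violations on every artefact and refuse the execute call whenever the result is non-empty, including when the code fails to parse.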

OWASP Top 10 for Agentic Applications 2026 (canonical source) · OWASP Gen AI Security Project · Dec 2025 · CC BY-SA 4.0
Agentic Top 10 side-by-side explainer · trydeepteam.com · secondary reference