EVIDENCE TRAIL

Static analysis on generated code

Verbatim excerpts from the upstream sources cited on the mitigation page, with what each source does and does not prove. The title "static analysis on generated code — pre-execution scan" is Helmwart's normalised label. The closest verbatim upstream match is OWASP Top 10 Agentic 2026 §ASI05: "Do static scans before execution."

MDX citation correction: the MDX cites arxiv:2304.09655 for the Asare et al. Copilot study, but that identifier resolves to a paper on ChatGPT code security, not the Copilot comparative study. The correct identifier is arXiv:2204.04741 (Asare et al., "Is GitHub's Copilot as Bad as Humans at Introducing Vulnerabilities in Code?", 2023). This trail uses the correct reference.

Last cross-checked against upstream sources: 2026-05-29 · 7 sources

References

Each entry shows what the source supports and what it does not prove.

Reference 1

Version 2026 · published December 2025

OWASP Top 10 for Agentic Applications 2026

§ASI05 Unexpected Code Execution (RCE) — Prevention and Mitigation Guidelines, item 7

"Code analysis and monitoring: Do static scans before execution; enable runtime monitoring; watch for prompt-injection patterns; log and audit all generation and runs."

Supports: Verbatim directive to "do static scans before execution" in the context of agentic code generation. Closest upstream wording match for this control. Confirms the pre-execution placement of the scan gate.

Does not prove: Lists the control as one item among seven; does not specify which SAST tools to use, the severity threshold for blocking, or the ensemble approach of combining scanners. Helmwart operationalises these details.

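As a rough illustration of what "do static scans before execution" can mean at the tool-execution seam, the sketch below parses generated Python with the standard-library `ast` module, never runs it, and refuses execution when any finding reaches a severity threshold. The rule table `RISKY_CALLS`, the 0 to 3 severity scale, the default `block_at=2`, and the helper names `scan` and `gate` are hypothetical; they are not Helmwart's ruleset or any SAST tool's API.

```python
import ast
from dataclasses import dataclass

# Hypothetical rule table: dotted call name -> severity (0 = info ... 3 = critical).
# Illustrative only; not Helmwart's ruleset.
RISKY_CALLS = {
    "eval": 3,
    "exec": 3,
    "os.system": 3,
    "subprocess.Popen": 2,
    "subprocess.run": 2,
    "pickle.loads": 2,
}


@dataclass
class Finding:
    rule: str
    line: int
    severity: int


def _call_name(node: ast.Call) -> str:
    """Return a dotted name such as 'os.system' for simple call targets, else ''."""
    func = node.func
    parts = []
    while isinstance(func, ast.Attribute):
        parts.append(func.attr)
        func = func.value
    if isinstance(func, ast.Name):
        parts.append(func.id)
        return ".".join(reversed(parts))
    return ""


def scan(source: str) -> list[Finding]:
    """Parse generated code WITHOUT executing it and report risky calls."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        # Output that cannot be parsed cannot be analysed, so treat it as critical.
        return [Finding("syntax-error", exc.lineno or 0, 3)]
    findings = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            name = _call_name(node)
            if name in RISKY_CALLS:
                findings.append(Finding(name, node.lineno, RISKY_CALLS[name]))
    return findings


def gate(source: str, block_at: int = 2) -> bool:
    """Allow execution only if no finding meets the blocking severity threshold."""
    return all(f.severity < block_at for f in scan(source))
```

For example, `gate("print('hi')")` returns `True`, while `gate("import os\nos.system('ls')")` returns `False`. Matching on dotted call names is easy to evade (for instance `from os import system as s`), which is one reason a production gate would delegate to dedicated SAST tools rather than a hand-written matcher.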

Reference 2

v1.1 · published December 2025

OWASP Agentic AI — Threats & Mitigations v1.1

§Playbook 3 "Securing AI Tool Execution & Preventing Unauthorized Actions Across Supply Chains" — Step 2: Monitor & Prevent Tool Misuse and Supply Chain Anomalies (Reactive)

"Require human verification before AI-generated code with elevated privileges can be executed. Enforce execution control policies to flag AI-generated code execution attempts that bypass predefined security constraints."

Supports: Establishes the two-part gate pattern: a pre-execution flag/block step plus a human-verification escalation path for privileged code. Directly maps to the control's placement at the tool-execution seam.

Does not prove: Neither the T11 (Unexpected RCE and Code Attacks) table row nor Playbook 3 mentions static analysis or SAST tooling by name; both describe the policy constraint (flag, require verification) rather than the technical mechanism. Helmwart supplies the SAST implementation.

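One way to read the two-part pattern is as a small decision function that runs after the scan: policy violations are blocked outright, clean but privileged code is held for human verification, and everything else runs. The `Verdict` enum, the `decide` function, and the severity convention (0 meaning no findings, blocking at 2 or above) are assumptions made for illustration, not part of the OWASP text.

```python
from enum import Enum


class Verdict(Enum):
    ALLOW = "allow"
    BLOCK = "block"        # flag/block step: violates a predefined constraint
    ESCALATE = "escalate"  # hold for human verification before execution


def decide(max_severity: int, privileged: bool, block_at: int = 2) -> Verdict:
    """Combine the worst scan finding with the privilege level of the request.

    Hypothetical policy:
    - any finding at or above block_at is blocked outright;
    - otherwise, code requesting elevated privileges waits for a human;
    - everything else may run.
    """
    if max_severity >= block_at:
        return Verdict.BLOCK
    if privileged:
        return Verdict.ESCALATE
    return Verdict.ALLOW
```

`decide(0, privileged=True)` returns `Verdict.ESCALATE`, and `decide(3, privileged=True)` returns `Verdict.BLOCK`, so privileged code that already violates policy never reaches the human queue. Routing it to review instead is an equally defensible choice; the sketch simply picks the stricter one.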

Reference 3

Version 1.1 · published February 2022

NIST SP 800-218 — Secure Software Development Framework (SSDF)

Practice PW.7 "Review and/or analyze human-readable code to identify vulnerabilities and verify compliance with security requirements" — Task PW.7.1

"PW.7.1: Determine whether code review (a person looks directly at the code to find issues) and/or code analysis (tools are used to find issues in code, either in a fully automated way or in conjunction with a person) should be used, as defined by the organization."

Supports: Defines code analysis — automated tools finding issues in code — as a named SSDF practice. PW.7 is the SSDF anchor for SAST on any produced code, including LLM-generated artifacts. NIST explicitly covers both automated-only and human-in-the-loop variants.

Does not prove: PW.7 is a general framework practice, not an agentic-AI-specific requirement. It does not address the case where the code is produced by an LLM, and it does not require the scan to occur before execution rather than before deployment.


Reference 4

Version 1.1 · published February 2022

NIST SP 800-218 — Secure Software Development Framework (SSDF)

Practice PW.7 — Task PW.7.2, Notional Implementation Example 4

"Use a static analysis tool to automatically check code for vulnerabilities and compliance with the organization's secure coding standards with a human reviewing the issues reported by the tool and remediating them as necessary."

Supports: Explicit example of a static analysis tool automatically checking code for vulnerabilities, which is the core mechanism of this control. Adds a human-review follow-up on the reported issues, which supports pairing this control with m-codegen-review-gate for high-stakes cases.

Does not prove: Example 4 is illustrative, not normative. The SSDF does not require static analysis — it offers it as one of several acceptable approaches under PW.7.

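Example 4's loop of "tool reports, human reviews and remediates" can be sketched as a review queue that execution must wait on. The sketch assumes findings may arrive from more than one tool and collapses reports of the same line and CWE into a single item; `ToolFinding`, `ReviewItem`, `merge`, `cleared`, and the two disposition values are hypothetical names, not SSDF terminology.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToolFinding:
    tool: str  # label of the reporting tool (illustrative)
    line: int
    cwe: str   # e.g. "CWE-78"


@dataclass
class ReviewItem:
    line: int
    cwe: str
    tools: set[str] = field(default_factory=set)
    # Set by a human reviewer: "fixed" or "false_positive".
    disposition: Optional[str] = None


def merge(findings: list[ToolFinding]) -> dict[tuple[int, str], ReviewItem]:
    """Collapse duplicate reports of the same (line, CWE) from different tools."""
    items: dict[tuple[int, str], ReviewItem] = {}
    for f in findings:
        item = items.setdefault((f.line, f.cwe), ReviewItem(f.line, f.cwe))
        item.tools.add(f.tool)
    return items


def cleared(items: dict[tuple[int, str], ReviewItem]) -> bool:
    """Execution is cleared only after a human has dispositioned every item."""
    return all(i.disposition in {"fixed", "false_positive"} for i in items.values())
```

Keying on `(line, cwe)` is a crude deduplication heuristic: two tools may assign different CWEs to the same flaw, in which case it appears as two items and the reviewer sees it twice.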

Reference 5

OWASP Community Page (continuously updated)

OWASP Source Code Analysis Tools

Introductory paragraph

"Source code analysis tools, also known as Static Application Security Testing (SAST) Tools, can help analyze source code or compiled versions of code to help find security flaws."

Supports: Canonical OWASP definition of SAST. Establishes the tool category this control draws from and is the reference the MDX cites directly. Confirms "source code analysis" and "SAST" are used interchangeably.

Does not prove: Tool-reference page, not a threat or mitigation document. Does not discuss agentic AI, LLM-generated code, or pre-execution placement of the scan.


Reference 6

ATLAS catalogue (continuously updated)

MITRE ATLAS AML.M0016 — Vulnerability Scanning

AML.M0016 description (source: ATLAS.yaml dist)

"Vulnerability scanning is used to find potentially exploitable software vulnerabilities to remediate them. … Model artifacts, downstream products produced by models, and external software dependencies should be scanned for known vulnerabilities."

Supports: Names "downstream products produced by models" as objects that should be scanned — the broadest ATLAS endorsement that model outputs (including generated code) are within scope for vulnerability scanning.

Does not prove: AML.M0016 focuses on model files (pickle and other serialisation formats) and supply-chain artifacts rather than agent-generated source code. Applying it to generated code extends the mitigation beyond its primary focus.

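For contrast, scanning a pickled model artifact (the case AML.M0016 centres on) looks quite different from scanning source code. The sketch below walks the pickle opcode stream with the standard-library `pickletools.genops`, without unpickling, and flags streams that import or call objects, which is the mechanism pickle-based code execution relies on. The opcode groupings and the function names `pickle_opcodes` and `looks_executable` are illustrative assumptions.

```python
import pickle
import pickletools

# Opcodes that import a callable, or invoke one, during unpickling.
IMPORT_OPS = {"GLOBAL", "STACK_GLOBAL"}
CALL_OPS = {"REDUCE", "INST", "OBJ", "NEWOBJ", "NEWOBJ_EX"}


def pickle_opcodes(data: bytes) -> set[str]:
    """Return the opcode names in a pickle stream without unpickling it."""
    return {op.name for op, _arg, _pos in pickletools.genops(data)}


def looks_executable(data: bytes) -> bool:
    """Flag pickles that import or call objects (the basis of pickle RCE)."""
    return bool(pickle_opcodes(data) & (IMPORT_OPS | CALL_OPS))


# Usage: plain containers pass; an object whose __reduce__ names a callable is flagged.
print(looks_executable(pickle.dumps({"weights": [0.1, 0.2]})))
```

This check is deliberately coarse: legitimate pickles of ordinary class instances also contain `GLOBAL` or `STACK_GLOBAL`, so practical scanners typically compare the imported module and name against an allowlist instead of rejecting every import.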

Reference 7

Empirical Software Engineering, 2023 · arXiv:2204.04741

Asare et al. — "Is GitHub's Copilot as Bad as Humans at Introducing Vulnerabilities in Code?" (2023)

Abstract

"We find that Copilot replicates the original vulnerable code about 33% of the time while replicating the fixed code at a 25% rate. However this behaviour is not consistent: Copilot is more likely to introduce some types of vulnerabilities than others and is also more likely to generate vulnerable code in response to prompts that correspond to older vulnerabilities."

Supports: Empirical evidence that LLM-generated code reproduces known vulnerabilities at a measurable rate (about 33% replication of the original vulnerable code in the tested scenarios), motivating the application of SAST to AI-generated code. Academic grounding for treating LLM output with at least the same suspicion as human-written code.

Does not prove: The study uses C/C++ prompts derived from historical CVE scenarios, a constrained experimental design. The 33% rate is not a base rate for general code generation and should not be cited as such. The study does not measure how well Semgrep or any other SAST tool detects vulnerabilities in the generated samples.
