Default / Implicit Deny · Principles

Why it matters for agentic AI

The principle of fail-safe defaults (base access decisions on explicit permission, not on the absence of an explicit denial) is one of the oldest in security design. It is a prerequisite for Defence-in-Depth: a layered defence collapses to a single layer when the innermost layer is open by default. For agents it acquires a new urgency, because the surface that must be deny-by-default is no longer just a filesystem or a network port. It spans five simultaneous planes: which tools the agent may call, which data stores it may read, which memory namespaces it may write, which external domains it may reach over the network, and which other agents may send it messages. A gap in the allow-list on any one plane is a potential exfiltration or lateral-movement path.
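As a minimal sketch of what deny-by-default across the five planes could look like, the following models one allow-list per plane, each empty until something is granted explicitly. The plane names, the AgentPolicy class, and the grant helper are hypothetical illustrations, not part of any particular framework.

```python
from dataclasses import dataclass, field

# Hypothetical plane names mirroring the five planes listed above.
PLANES = ("tool", "data_store", "memory_namespace", "egress_domain", "peer_agent")


@dataclass(frozen=True)
class AgentPolicy:
    """Per-plane allow-lists; every plane starts empty, so nothing is permitted."""
    allowed: dict = field(default_factory=lambda: {p: frozenset() for p in PLANES})

    def permits(self, plane: str, resource: str) -> bool:
        # Unknown planes and unlisted resources both fall through to deny.
        return resource in self.allowed.get(plane, frozenset())


def grant(policy: AgentPolicy, plane: str, *resources: str) -> AgentPolicy:
    """Return a new policy with explicit grants added to one plane."""
    if plane not in PLANES:
        raise ValueError(f"unknown plane: {plane}")
    updated = dict(policy.allowed)
    updated[plane] = updated[plane] | frozenset(resources)
    return AgentPolicy(allowed=updated)


# Only the one tool named here becomes reachable; every other plane stays closed.
policy = grant(AgentPolicy(), "tool", "search_web")
```

The design point is that there is no code path that returns True without a matching entry: forgetting to configure a plane leaves it closed rather than open.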

What makes this hard for agents is that the default tendency of LLM-based systems runs the other way. A model completion tries to be maximally helpful: if it can call a tool, it will consider doing so; if it can discover a new MCP server at runtime, it will try to use it. The engineering default in early agent frameworks was to wire up everything and let the model decide what to invoke. That default is the opposite of deny-by-default, and it produces systems where the blast radius of a successful prompt injection is determined only by how many tools happened to be registered, not by any deliberate access decision.

The most consequential plane is network egress. Prompt injection is very often a data-theft technique: the injected text tells the agent to fetch something sensitive and POST it somewhere. If outbound HTTP is open by default, that step succeeds. If egress is deny-by-default, with only the specific API endpoints the task requires on the allow-list, the injection’s delivery leg simply fails. The data cannot leave by that route, regardless of whether the model was successfully manipulated. Egress control is not a substitute for injection prevention; it is the control that holds after prevention fails.

Scenario: the blocked exfiltration

An agent is prompt-injected via a malicious web page it fetches as part of a research task. The injected instruction tells it to POST the user’s contact list to an external URL. The agent generates the outbound HTTP call correctly. The call is dropped by the container’s eBPF egress policy, which permits only the three API domains declared in the task manifest. No data leaves. The injection succeeded at the model layer and failed at the infrastructure layer, which is exactly the defence-in-depth outcome deny-by-default is meant to deliver.
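The decision the egress policy makes in this scenario can be sketched in a few lines. This is an application-level illustration only, assuming three hypothetical API hosts; in the scenario the enforcement happens in the container's network policy, outside the agent process, which is what makes it independent of the model.

```python
from urllib.parse import urlsplit

# Hypothetical manifest: the three API hosts the task declared.
ALLOWED_EGRESS_HOSTS = frozenset({
    "api.search.example",
    "api.calendar.example",
    "api.llm.example",
})


def egress_permitted(url: str, allowed_hosts: frozenset = ALLOWED_EGRESS_HOSTS) -> bool:
    """Permit only HTTPS requests to exactly-matching allow-listed hosts."""
    try:
        parts = urlsplit(url)
    except ValueError:
        # Unparseable URLs are denied rather than guessed at.
        return False
    if parts.scheme != "https":
        return False
    host = (parts.hostname or "").lower().rstrip(".")
    # Exact match only: no suffix or wildcard matching, so
    # "api.search.example.evil.test" does not slip through.
    return host in allowed_hosts
```

Exact host matching is a deliberate choice here: suffix or substring matching is a common way for an allow-list to admit attacker-controlled domains.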

Scenario: the rogue tool server

An agent framework is configured to auto-discover and trust any MCP tool server announced on the local network. A misconfigured test server running on the same VLAN advertises a tool called send_email; its tool description contains hidden instructions, and with auto-discovery enabled the model reads and follows them. Under a deny-by-default tool policy, by contrast, the agent may only invoke servers on a signed manifest reviewed at deployment time. The undeclared server is never called, regardless of what its description says. If a legitimate server’s tool definitions change after approval (a “rug pull”), the hash check on the manifest fails and the server is treated as untrusted until re-reviewed.
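A rough sketch of the manifest check in this scenario follows. It assumes tool definitions are JSON-serialisable dictionaries and that the manifest maps a server identifier to pinned SHA-256 hashes per tool; the function names, the server identifiers, and the manifest layout are hypothetical. Verifying the signature on the manifest itself is out of scope here and would need to happen before these hashes are trusted.

```python
import hashlib
import json


def definition_hash(tool_definition: dict) -> str:
    """SHA-256 over a canonical JSON encoding of a tool definition."""
    canonical = json.dumps(tool_definition, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def may_invoke(server_id: str, tool_definition: dict, manifest: dict) -> bool:
    """Allow a call only if the server is declared and its tool hash is unchanged."""
    pinned = manifest.get(server_id, {})
    expected = pinned.get(tool_definition.get("name"))
    # Undeclared server or tool -> expected is None -> deny.
    return expected is not None and expected == definition_hash(tool_definition)


# Hypothetical definition as reviewed at deployment time.
approved = {"name": "send_email", "description": "Send an email to a recipient."}
manifest = {"mail.internal": {"send_email": definition_hash(approved)}}
```

Both failure paths in the scenario end in the same place: the undeclared server has no manifest entry, and the changed definition no longer matches its pinned hash, so neither call proceeds.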

How it fails

Agents auto-discover and immediately trust new tools or servers, so the allow-list is never actually enforced: any server can join the trusted set at runtime.
Egress is left open because “it’s an internal network,” ignoring that prompt injection turns the agent itself into an insider threat.
Tool definitions are not pinned by hash; a server silently changes what a tool does after its description was approved, and the agent keeps calling it.
Memory writes are unrestricted, so a prompt injection can plant durable instructions that survive session boundaries.
Inter-agent message routing is open: any agent can send instructions to any other, with no topology enforcement.
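The last failure mode, open inter-agent routing, has a simple structural counterpart: a fixed set of permitted sender-to-receiver edges. The sketch below assumes three hypothetical agent names and a hypothetical route_message check placed in whatever component relays messages between agents.

```python
# Hypothetical topology: directed (sender, receiver) edges reviewed at deployment.
ALLOWED_ROUTES = frozenset({
    ("planner", "researcher"),
    ("researcher", "planner"),
    ("planner", "mailer"),
})


def route_message(sender: str, receiver: str, routes: frozenset = ALLOWED_ROUTES) -> bool:
    """Deliver only along declared edges; any undeclared pair is dropped."""
    return (sender, receiver) in routes
```

Because edges are directed, the researcher agent (the one most exposed to untrusted web content in this example) has no path to the mailer except through the planner.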

Why the mapped controls work

Signed tool manifests verified by hash on every call close the rug-pull path: the tool server cannot change its advertised tool definitions without invalidating the manifest, and an invalidated manifest means no call. Egress allow-lists enforced at the infrastructure layer (eBPF, iptables, VPC endpoint policies) act independently of the model, so a successful injection cannot complete its delivery step. Memory namespace tokens ensure that a write to one agent’s memory cannot bleed into another’s, which limits how far a planted instruction can spread and persist. Intent-scoped capability tokens give the policy enforcement point a machine-readable statement of what the current task is allowed to do (a claim, not a role), so the allow-list is narrow by construction and expires with the task.
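To illustrate the intent-scoped capability token idea, here is a minimal sketch in which the token is a plain record naming the tools a task may use and when that permission lapses. The CapabilityToken fields and the authorise function are hypothetical; a real token would also be signed and verified, which this example leaves out.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CapabilityToken:
    """Hypothetical task-scoped claim: what this task may do, and until when."""
    task_id: str
    tools: frozenset
    expires_at: float  # Unix timestamp


def authorise(token: CapabilityToken, tool: str, now: Optional[float] = None) -> bool:
    """Policy enforcement point: deny if expired or if the tool is not claimed."""
    current = time.time() if now is None else now
    if current >= token.expires_at:
        return False
    return tool in token.tools
```

The expiry check comes first so that a token outliving its task grants nothing, even for tools it once covered.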

First steps

Apply an infrastructure-level egress policy (e.g. an eBPF-based Cilium network policy or an AWS VPC endpoint policy) to every agent container, explicitly naming the permitted outbound domains. Start with the exact API endpoints the task requires and default-deny everything else, including internal RFC 1918 ranges the agent has no legitimate reason to reach.
Convert your MCP tool server list from a runtime-discovered set to a signed static manifest: generate a SHA-256 hash of each tool server’s tool definitions, store it in your deployment config, and fail any call whose tool definition hash does not match the stored value.
Audit memory write paths for all agents and introduce namespace tokens so that each agent’s memory writes are scoped to a unique, session-bound namespace; use your vector store’s collection-level access control (e.g. Qdrant collection API keys, Pinecone namespace restrictions) to enforce this at the storage layer, not just in application logic.
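The third step can be sketched as follows, assuming a namespace is derived with an HMAC over the agent and session identifiers. The ScopedMemory class is an in-memory stand-in for a vector store, and all names here are hypothetical; in a real deployment the equivalent check should sit in the storage layer's own access control, as the step says.

```python
import hashlib
import hmac
import json


def namespace_token(secret: bytes, agent_id: str, session_id: str) -> str:
    """Derive a namespace name bound to one agent and one session."""
    # JSON-encode the pair so ("a:b", "c") and ("a", "b:c") cannot collide.
    message = json.dumps([agent_id, session_id]).encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


class ScopedMemory:
    """In-memory stand-in for a vector store with per-namespace isolation."""

    def __init__(self, secret: bytes):
        self._secret = secret
        self._store = {}

    def write(self, agent_id: str, session_id: str, namespace: str, item: str) -> None:
        expected = namespace_token(self._secret, agent_id, session_id)
        # Deny unless the caller presents the namespace derived for it.
        if not hmac.compare_digest(expected.encode("utf-8"), namespace.encode("utf-8")):
            raise PermissionError("write outside the caller's namespace")
        self._store.setdefault(namespace, []).append(item)

    def read(self, namespace: str) -> list:
        return list(self._store.get(namespace, []))
```

Because the namespace depends on the session identifier, content written during one session is not addressable from the next unless something deliberately carries it over.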

Threats it governs

When this principle is absent, these threats become reachable.

T1 Memory Poisoning: Adversarial content written into short- or long-term memory contaminates future decisions.
T2 Tool Misuse: Agent uses authorized tools in unintended ways via deceptive prompts or chained calls.
T12 Agent Communication Poisoning: Inter-agent messages tampered with. The output of one becomes injection input of another.
T16 Insecure Inter-Agent Protocol Abuse: MCP/A2A protocols abused via consent-flow manipulation, MCP response injection, or weaponised tool descriptions.
T47 Rogue MCP Server in Ecosystem: Malicious MCP server registers in the agent ecosystem and is invoked under presumed-trustworthy framing.

Controls that advance it

Catalogue mitigations that strengthen this principle, grouped by the defence-in-depth stage they sit in.

Prevent

Fail-closed: An agent that is uncertain about what to do next faces a choice: refuse and ask for clarification, or proceed on its best guess. In low-stakes situations that tradeoff is tolerable. In agentic systems that write, delete, or send, a confident-sounding but wrong output can commit an irreversible action. A fail-closed gate resolves that choice structurally: below a configured confidence threshold, the agent stops and escalates rather than guessing.
OPA authorisation: An agent can invoke any tool it has access to, constrained only by its own reasoning. If that reasoning is manipulated or the agent's permissions are misconfigured, it will call tools it should not. OPA addresses this by placing a policy decision point between the agent and every tool invocation: a Rego policy evaluates the agent identity, the tool, and the parameter envelope before execution proceeds, and the agent cannot reason or argue past the result.
Policy bound: An agent's authority is normally bounded only by its own reasoning. If that reasoning is manipulated, or the agent's identity is compromised, it will attempt actions the operator never intended to permit. Policy-bound autonomy addresses this by placing a declarative enforcement point between the agent and every consequential action: a policy engine evaluates the agent identity, the target tool, and the parameter envelope before execution, and the agent cannot reason or argue past the result.
Tool scope: Each tool in an agent's catalog should expose only the methods, resources, and parameter ranges its designated role requires. Over-broad tool surfaces let individually authorised primitives compose into actions no human intended to grant; narrowing the scope at design time reduces both the attack surface and the blast radius of any compromise.
Data classification: Every dataset, document, and external system an agent can reach carries a classification label. The agent's permitted-class set and the tool's permitted-class set are intersected at the moment of every read or write. When the requested data's class falls outside that intersection, access is denied at the seam. This is the data-side complement to least-privilege: it adds a data-sensitivity constraint that role scoping alone does not provide.
Pre-exec check: An LLM produces tool-call arguments through generation, not through a type system, and generation is not reliable. The arguments may be wrong in type, out of range, or assembled in a combination that violates business rules. A pre-execution validation gate intercepts the call before it reaches the tool: a schema pass confirms each argument conforms to the declared JSON Schema, and a policy pass confirms the argument combination is permitted for this agent and this action. The tool executes only when both passes clear.
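The two-pass shape of the pre-exec check can be sketched as below. The schema pass here is a deliberately simplified stand-in (type and range checks on a hypothetical transfer tool), not a full JSON Schema validator, and the grants table with a per-agent amount cap is an assumed example of a business rule.

```python
# Hypothetical declaration for one tool: argument name -> (type, allowed range or None).
TRANSFER_SCHEMA = {
    "account_id": (str, None),
    "amount": (int, (1, 10_000)),
}


def schema_pass(args: dict, schema: dict) -> bool:
    """Reject missing, unexpected, wrongly typed, or out-of-range arguments."""
    if set(args) != set(schema):
        return False
    for name, (expected_type, bounds) in schema.items():
        value = args[name]
        # bool is a subclass of int in Python, so exclude it explicitly
        # (this sketch declares no boolean arguments).
        if isinstance(value, bool) or not isinstance(value, expected_type):
            return False
        if bounds is not None and not (bounds[0] <= value <= bounds[1]):
            return False
    return True


def policy_pass(agent_id: str, tool: str, args: dict, grants: dict) -> bool:
    """Permit only (agent, tool) pairs that were explicitly granted, within their cap."""
    cap = grants.get((agent_id, tool))
    return cap is not None and args["amount"] <= cap


def pre_exec_check(agent_id: str, tool: str, args: dict, schema: dict, grants: dict) -> bool:
    # Both passes must clear; the schema pass runs first so the policy
    # pass can rely on well-formed arguments.
    return schema_pass(args, schema) and policy_pass(agent_id, tool, args, grants)
```

Note that an argument can be valid against the schema and still be refused: an amount of 900 is within the declared range but above the cap granted to this particular agent.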

Detect

Egress DLP: An agent produces output continuously across multiple channels: user-facing responses, tool-call parameter envelopes, log records, and outbound HTTP requests. Any of those channels can carry sensitive content the agent has retrieved, been fed, or been tricked into including. Output egress DLP places an inspection gate at the boundary so that PII, credentials, and proprietary content are classified and either redacted or quarantined before they leave the trust boundary, regardless of how they got into the output.
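A toy version of the redaction half of such a gate is sketched below. The two regular expressions (an email address and the common shape of an AWS access key ID) are illustrative assumptions only; a production classifier would need far broader coverage and would typically not rely on regular expressions alone.

```python
import re

# Illustrative patterns only; real classifiers need far broader coverage.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "AWS_ACCESS_KEY_ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}


def redact(text: str) -> tuple:
    """Return (redacted_text, sorted list of labels that matched)."""
    found = []
    for label, pattern in PATTERNS.items():
        text, count = pattern.subn(f"[REDACTED:{label}]", text)
        if count:
            found.append(label)
    return text, sorted(found)
```

Returning the matched labels alongside the redacted text lets the caller decide between passing the redacted output on and quarantining the whole message.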

Respond

No catalogued control.

In Helmwart

Not a dedicated lens today. The closest signals are the trifecta external-comms leg and the canvas egress/edge properties.