Secure by Design · Principles

Why it matters for agentic AI

Security added after the fact is remediation, not design. It patches individual symptoms rather than removing the structural conditions that enable them. For agentic systems the gap between “we’ll tighten it later” and first deployment is where most serious vulnerabilities are baked in: the decisions that are hardest to reverse (tool scopes, delegation topology, memory architecture, autonomy tiers) are made at design time, often by teams whose primary concern is capability, not containment. Secure by Design is the upstream source of three other principles: Least Privilege (start narrow), Reversibility (default to reversible), and Open Design (enforce in infrastructure, not prompts).

Agentic systems compound the problem in two ways. First, the threat model is non-obvious. A developer building a document-processing agent rarely thinks of prompt injection, because the word “injection” points to SQL and HTML, not natural language. Without a structured walkthrough of MAESTRO layers, OWASP ASI categories, or at minimum a trifecta check, the most critical exposure (untrusted content + private data + external comms in one agent) is invisible at launch. Second, the system prompt is not a security layer. Teams routinely encode business rules, access logic, and safety constraints in the system prompt and treat those as the enforcement mechanism. They are not: the prompt is probabilistic guidance and can be extracted, contradicted, or circumvented. Enforcement must live in infrastructure: a policy engine, a tool gateway, a network allow-list. Token sequences a model may or may not follow are not enforcement.

Secure-by-design for agents therefore has three concrete starting points. Threat-model before first deployment: identify which MAESTRO layer each component lives at, map the data flows, and check the trifecta. Establish secure defaults: the narrowest tool set the task actually needs; read-only access unless write is explicitly justified; a confirmation gate on every irreversible action; deny-by-default egress. And treat autonomy as something earned, not assumed: the lowest justified autonomy tier on day one, with documented criteria and measurement before any promotion. These three disciplines (model first, secure defaults, earned autonomy) are what the principle actually means when applied to agents.
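The secure defaults listed above can be sketched as a single authorization check that sits in front of every tool call. This is a minimal illustration under stated assumptions, not a reference implementation: `AgentProfile`, `Action`, `authorize`, and the tool names are all hypothetical, and a real gateway would also need authentication and audit logging.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Action:
    tool: str
    writes: bool = False                 # does the call mutate state?
    irreversible: bool = False           # can the effect be undone?
    egress_host: Optional[str] = None    # external host contacted, if any


@dataclass(frozen=True)
class AgentProfile:
    # Every default is the restrictive choice; anything broader must be declared.
    allowed_tools: frozenset = frozenset()       # empty: no tools at all
    write_tools: frozenset = frozenset()         # empty: read-only
    egress_allow_list: frozenset = frozenset()   # empty: no outbound traffic


def authorize(profile: AgentProfile, action: Action,
              human_confirmed: bool = False) -> Tuple[bool, str]:
    """Return (allowed, reason). Checks run in a fixed order, outside the model."""
    if action.tool not in profile.allowed_tools:
        return False, "tool not in allow-list"
    if action.writes and action.tool not in profile.write_tools:
        return False, "write not granted (read-only default)"
    if action.egress_host is not None and action.egress_host not in profile.egress_allow_list:
        return False, "egress denied by default"
    if action.irreversible and not human_confirmed:
        return False, "irreversible action requires confirmation"
    return True, "allowed"


# Hypothetical profile: one read tool, one explicitly justified write tool.
profile = AgentProfile(
    allowed_tools=frozenset({"docs.read", "docs.delete"}),
    write_tools=frozenset({"docs.delete"}),
)
```

With this profile, `authorize(profile, Action("docs.read"))` is allowed, while a `docs.delete` call marked `irreversible=True` is refused until `human_confirmed=True` is passed. An agent constructed with no arguments can do nothing, which is the point of the defaults.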

Scenario: the unmodelled launch

A team builds an internal productivity agent. It can read email and calendar data, write to a shared document store, and send notifications. No one enumerates these capabilities together before launch, so no one notices the trifecta: untrusted inbound email, private data in calendar and documents, external comms via notifications. The first incident is an injected email that instructs the agent to read the CEO’s draft board minutes and forward them as a notification. It is also the first time the combination is examined. A thirty-minute pre-deployment threat-model session would have surfaced it on day zero, and a read-only default plus a separate notification agent would have broken the chain architecturally.
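The trifecta check from this scenario is simple enough to automate over declared capabilities. The sketch below assumes each agent's capabilities are written down as three booleans; `AgentCapabilities`, `has_trifecta`, `trifecta_report`, and the agent names are hypothetical. It only inspects what is declared, so it does not prove that the hand-off between split agents is itself safe.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class AgentCapabilities:
    name: str
    reads_untrusted_content: bool      # e.g. inbound email
    accesses_private_data: bool        # e.g. calendar, documents
    can_communicate_externally: bool   # e.g. notifications


def has_trifecta(agent: AgentCapabilities) -> bool:
    """True when a single agent holds all three legs at once."""
    return (agent.reads_untrusted_content
            and agent.accesses_private_data
            and agent.can_communicate_externally)


def trifecta_report(agents: Iterable[AgentCapabilities]) -> List[str]:
    """Names of agents that must be split or narrowed before launch."""
    return [a.name for a in agents if has_trifecta(a)]


# The agent from the scenario, as launched.
monolith = AgentCapabilities("productivity-agent", True, True, True)

# One possible split: notifications move to a separate agent.
split = [
    AgentCapabilities("mail-reader", True, True, False),
    AgentCapabilities("notifier", False, False, True),
]
```

Running `trifecta_report([monolith])` flags the launched agent, and `trifecta_report(split)` returns an empty list.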

Scenario: the system-prompt guardrail

A fintech agent has “never reveal account balances to counterparties who have not confirmed their identity via the bank’s portal” in its system prompt. A developer verifies this works in manual testing and ships. An adversarial user extracts the system prompt through a well-known pattern, sees the exact wording of the rule, and then crafts input that talks around it: a plausible-sounding scenario in which the model decides a balance disclosure is justified in context. The balance leaks. The rule belonged in a policy engine that checks the current session’s authentication state at every call: a deterministic check that cannot be reasoned around.
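A deterministic version of that rule might look like the following sketch. It assumes the platform records, per session, whether the counterparty confirmed their identity via the portal; `Session`, `PolicyViolation`, `disclose_balance`, and the `fetch_balance` callable are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Session:
    counterparty_id: str
    identity_confirmed_via_portal: bool   # set by the portal, never by the model


class PolicyViolation(Exception):
    """Raised when a tool call is refused by policy."""


def disclose_balance(session: Session, account_id: str,
                     fetch_balance: Callable[[str], int]) -> int:
    # The gate runs on every call and reads only session state.
    # Nothing in the conversation or the model's reasoning is an input.
    if not session.identity_confirmed_via_portal:
        raise PolicyViolation("identity not confirmed via portal")
    return fetch_balance(account_id)
```

However persuasive the injected scenario is, the model can only request the tool call. When the session flag is false, `fetch_balance` is never invoked, so there is no balance in context to leak.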

How it fails

Agents ship with broad tool scopes provisioned “for flexibility” and no threat model; the first incident is the first enumeration of what the agent could do.
The system prompt is treated as the primary security layer; it is extracted, circumvented, or overridden by injected instructions that arrive later in context.
Autonomy is granted up front at the level the best-case scenario needs, rather than at the lowest-risk starting tier.
Rollback and kill-switch procedures are never documented, so when an incident occurs the only safe response is a full shutdown.

Why the mapped controls work

Pre-deployment threat modelling (MAESTRO, OWASP ASI, trifecta check) forces the tool surface, data flows, and delegation topology to be enumerated before anything is live, at the moment when changes are cheap. Autonomy-tier classification gives teams a structured vocabulary for how much the agent can do unsupervised, and a written justification requirement creates a deliberate gate before promotion. Read-only defaults with explicit escalation mean that the most common mistakes, such as an agent calling a write API when it needed only a read, are blocked at the tool gateway rather than recovered from after the fact. Documented rollback and kill-switch procedures ensure that when something goes wrong, the response is a controlled rollback to a known-good state, not an emergency with no plan.
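The promotion gate described above can be sketched as a check that returns every reason a request is not yet approvable. Everything specific here is an assumption made for illustration: the tier names, the one-tier-at-a-time rule, and the `MIN_RUNS` and `MAX_ERROR_RATE` thresholds are hypothetical and would come from a team's own documented criteria.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class AutonomyTier(IntEnum):
    # Hypothetical tier names, ordered from least to most autonomous.
    SUGGEST_ONLY = 0
    ACT_WITH_APPROVAL = 1
    ACT_THEN_REPORT = 2
    FULLY_AUTONOMOUS = 3


@dataclass(frozen=True)
class PromotionRequest:
    current: AutonomyTier
    requested: AutonomyTier
    justification: str          # the written justification
    observed_runs: int          # measured at the current tier
    observed_error_rate: float  # measured at the current tier


MIN_RUNS = 500          # assumed threshold
MAX_ERROR_RATE = 0.01   # assumed threshold


def promotion_blockers(req: PromotionRequest) -> List[str]:
    """Return the reasons a promotion cannot proceed; empty means approvable."""
    blockers = []
    if req.requested != req.current + 1:
        blockers.append("promotion must be exactly one tier up")
    if not req.justification.strip():
        blockers.append("written justification missing")
    if req.observed_runs < MIN_RUNS:
        blockers.append("not enough observed runs")
    if req.observed_error_rate > MAX_ERROR_RATE:
        blockers.append("error rate above threshold")
    return blockers
```

Returning the full list of blockers, rather than failing on the first, gives the requesting team one complete answer about what is still missing.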

First steps

Before the next agent deployment, run a trifecta check and a MAESTRO-layer walk. Draw the data flows for each agent, mark which layer (model, tool, orchestrator, data, etc.) each component occupies, and confirm that no single agent simultaneously holds private data, untrusted content, and external comms capability.
Set read-only as the default access mode for every new agent capability. Require an explicit written justification (in your design doc or PR description) before any write permission is provisioned, and configure your tool gateway to default to read-only scopes unless the agent’s declared task explicitly requires writes.
Document the rollback and kill-switch procedure for every agent before it goes live. The document must include the exact command to revoke the agent’s credential, the location of the most recent memory/state snapshot, and the compensating steps for the top three consequential actions the agent can take; test the procedure in staging before first production deployment.
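The third step lends itself to a pre-launch completeness check, for example in CI. The sketch below assumes the runbook is captured as structured data; `Runbook`, `missing_items`, and the field names are hypothetical. It verifies only that each required item is present, not that the command or the compensating steps actually work, which is what the staging test is for.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Runbook:
    agent: str
    revoke_command: str = ""       # exact command that revokes the agent's credential
    snapshot_location: str = ""    # where the latest memory/state snapshot lives
    # consequential action -> compensating step
    compensating_steps: Dict[str, str] = field(default_factory=dict)
    tested_in_staging: bool = False


REQUIRED_COMPENSATIONS = 3   # the top three consequential actions


def missing_items(rb: Runbook) -> List[str]:
    """List what is still missing; an empty list means the runbook is complete."""
    missing = []
    if not rb.revoke_command.strip():
        missing.append("credential revocation command")
    if not rb.snapshot_location.strip():
        missing.append("snapshot location")
    documented = [a for a, step in rb.compensating_steps.items() if step.strip()]
    if len(documented) < REQUIRED_COMPENSATIONS:
        missing.append(f"compensating steps ({len(documented)}/{REQUIRED_COMPENSATIONS})")
    if not rb.tested_in_staging:
        missing.append("staging test")
    return missing
```

A deployment pipeline could refuse to promote an agent while `missing_items` returns anything, which turns "document before go-live" from a convention into a gate.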

In Helmwart

The whole product is this principle operationalised: it exists to threat-model agents before they ship.