Why it matters for agentic AI
Least privilege has existed since Saltzer and Schroeder’s 1975 paper. For traditional software it is a static question: which role does this service account need? For agents the question never settles. An agent’s effective privilege is the union of every tool in its namespace, every memory it can write, every downstream agent it can invoke, and the autonomy tier at which it can act. That union shifts at runtime as tasks change and context accumulates. Granting privilege generously “just in case” is a recognised failure pattern precisely because the cost is invisible until an attack or a hallucination reveals it. Human Oversight provides the gate that makes suggest-over-act defaults operationally safe: without a functioning approval mechanism, a suggest-only agent is just a slower agent.
Least Agency reformulates the principle for that dynamic setting. It operates on three axes simultaneously: functionality (which tools exist at all in the agent’s namespace), permission (what each tool’s credential allows at the data layer), and autonomy tier (how much the agent can execute without seeking human confirmation). The autonomy dimension has no classical analogue. It encodes how far the agent is trusted to act on its own reasoning, which is probabilistic and injectable. The conservative default is not “grant everything and restrict later”; it is “start at the lowest justified tier and promote only on measured evidence.” Every increment of autonomy is a liability that must be explicitly justified against the task’s actual requirements.
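The three axes can be sketched as a single grant record that a request must satisfy on every axis at once. This is a minimal illustration, not a reference implementation: the names `AutonomyTier`, `AgentGrant`, and `is_allowed`, and the tool and scope strings, are all hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum


class AutonomyTier(IntEnum):
    """Hypothetical tier names; a higher value means more unsupervised authority."""
    SUGGEST_ONLY = 1
    ACT_REVERSIBLE = 2
    ACT_IRREVERSIBLE_WITH_HITL = 3


@dataclass
class AgentGrant:
    # Functionality axis: which tools exist in the agent's namespace at all.
    tools: frozenset[str]
    # Permission axis: what each tool's credential allows at the data layer.
    permissions: dict[str, frozenset[str]]
    # Autonomy axis: defaults to the lowest tier until a promotion is justified.
    tier: AutonomyTier = AutonomyTier.SUGGEST_ONLY


def is_allowed(grant: AgentGrant, tool: str, scope: str,
               required_tier: AutonomyTier) -> bool:
    """A request passes only if all three axes permit it."""
    return (
        tool in grant.tools
        and scope in grant.permissions.get(tool, frozenset())
        and grant.tier >= required_tier
    )


grant = AgentGrant(
    tools=frozenset({"db"}),
    permissions={"db": frozenset({"read"})},
)
```

With this grant, a read on `db` at the suggest-only tier passes, while a write on `db`, any call to a tool outside the namespace, or any request needing a higher tier is refused.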
A further reason to hold the line: agents can act where they could instead have proposed. An agent that has write access to a production database will use it when it hallucinates a schema change; an agent that only has read access and is wired to surface a migration script for human review cannot destroy tables no matter how badly it reasons. The principle therefore has a behavioural corollary: prefer suggesting an action over taking it, especially wherever the action touches production state, money, external communications, or infrastructure.
Scenario: the over-tooled coding agent
A coding agent provisioned for “general engineering tasks” is given access to source control, the CI runner, the production database, and the secrets vault. A hallucinated schema migration reaches the production database and destroys tables. No injection occurred; no privilege escalation occurred; the agent simply reasoned incorrectly with full authority. A read-only default in production and a dry-run gate (constraints that can be lifted only by a conscious decision to provision full write access) would have contained a reasoning error that no access-control rule alone could catch.
Scenario: the autonomy-tier creep
A support agent is launched at “suggest replies, human approves.” Performance looks good, so the team promotes it to “send replies automatically.” Then to “open and close tickets.” Then to “escalate and reassign.” Each promotion feels incremental; collectively they have crossed from HITL to near-full autonomy with no written justification of the new risk. When a poisoned ticket arrives and the agent takes a cascade of wrong actions across customer records, there is no human checkpoint anywhere in the path. Documenting the autonomy-tier promotion explicitly, and requiring evidence that the previous tier performed correctly under adversarial inputs, would have made the accumulated exposure visible before the incident.
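One way to turn each promotion into a recorded decision is to refuse it unless a written justification and adversarial evidence are attached. The sketch below is illustrative only: `PromotionRequest` and `review_promotion` are hypothetical names, and the one-tier-at-a-time rule is an added assumption rather than something the scenario prescribes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromotionRequest:
    agent: str
    current_tier: int
    requested_tier: int
    justification: str             # written justification of the new risk
    adversarial_eval_passed: bool  # evidence from the current tier under adversarial inputs


def review_promotion(req: PromotionRequest) -> list[str]:
    """Return the reasons to reject; an empty list means the promotion may proceed."""
    problems = []
    if req.requested_tier != req.current_tier + 1:
        # Assumption for this sketch: promotions advance one tier at a time.
        problems.append("promote one tier at a time")
    if not req.justification.strip():
        problems.append("missing written justification")
    if not req.adversarial_eval_passed:
        problems.append("no adversarial evidence at current tier")
    return problems
```

Storing each reviewed request alongside its outcome gives the accountability trail that the scenario lacked.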
How it fails
- Agents are provisioned with dozens of tools “for flexibility” that the actual task never needs, broadening the exploitation chain available to any injection.
- Long-lived credentials are stored in context or memory, so a single leak grants persistent access well beyond the task’s duration.
- Autonomy is granted up front on the strength of the expected happy path, and no process exists to justify the added exposure before each promotion.
- The agent acts where it could have proposed, taking a write action that a suggest-and-confirm flow would have safely intercepted.
- MCP and A2A grants are coarse by default; the agent receives the upstream scope rather than a narrowed derivative scoped to the current task.
Why the mapped controls work
Per-role tool allow-lists remove the tools the task doesn’t need before any reasoning begins, so the exploitation surface simply isn’t there. Task-scoped short-lived credentials collapse the exploitation window to the task’s duration; a credential that expires in minutes cannot be weaponised across sessions. Suggest-over-act defaults insert a deterministic human gate on the actions most likely to cause irreversible harm (production writes, payments, external communications), so a reasoning error produces a proposal to review rather than an action to undo. Documented autonomy tiers with written justification make each promotion a governance event rather than an ambient configuration drift, creating the accountability trail that retrospective analysis depends on.
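The first two controls can be sketched in a few lines: filter the tool registry by role before the agent ever sees it, and stamp every credential with an expiry tied to the task. The role names, tool names, and the helpers `tools_for_role` and `issue_credential` are hypothetical; a real deployment would obtain credentials from its own secrets or identity service.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical roles and tool names, for illustration only.
ROLE_ALLOW_LIST: dict[str, set[str]] = {
    "code-reviewer": {"read_repo", "comment_on_pr"},
    "release-bot": {"read_repo", "trigger_ci"},
}


def tools_for_role(role: str, registry: dict[str, Callable]) -> dict[str, Callable]:
    """Expose only allow-listed tools; an unknown role gets nothing."""
    allowed = ROLE_ALLOW_LIST.get(role, set())
    return {name: fn for name, fn in registry.items() if name in allowed}


@dataclass(frozen=True)
class TaskCredential:
    task_id: str
    scope: str
    expires_at: float  # seconds on whatever clock the caller uses

    def is_valid(self, now: float) -> bool:
        return now < self.expires_at


def issue_credential(task_id: str, scope: str, ttl_seconds: float,
                     now: float) -> TaskCredential:
    """Bind the credential to one task and one scope, with a short lifetime."""
    return TaskCredential(task_id, scope, now + ttl_seconds)
```

Passing `now` explicitly keeps the expiry check deterministic in tests; in production the caller would supply a monotonic or wall-clock reading.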
First steps
- Define your autonomy tiers explicitly (e.g. Tier 1: suggest only; Tier 2: act on reversible operations; Tier 3: act on irreversible operations with HITL gate) and document in writing which tier each deployed agent is authorised to operate at. If you cannot find this document, every agent is implicitly untiered and that is your first gap to close.
- For every agent operating at Tier 2 or above, confirm that its task-scoped credential has a TTL no longer than the expected task duration and is not stored in any context or memory that persists between tasks. A credential that survives task boundaries widens the autonomy footprint invisibly.
- Add a production-write guard to your most consequential agent: configure it so that any tool call that modifies production state, sends an external message, or moves money requires an explicit suggest-then-confirm step, implemented in orchestration code rather than system-prompt instruction, so that prompting cannot bypass it.
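A production-write guard of the kind described in the last step might look like the following sketch, where the check lives in the dispatcher rather than in the prompt. The effect labels, `ToolCall`, `Proposal`, and `dispatch` are hypothetical names, and the `approved` flag is assumed to be set only by the human approval channel, never by model output.

```python
from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical effect labels attached to tools at registration time.
CONSEQUENTIAL = {"prod_write", "send_external", "move_money"}


@dataclass
class ToolCall:
    tool: str
    effect: str
    args: dict[str, Any]


@dataclass
class Proposal:
    call: ToolCall
    status: str = "pending_approval"


def dispatch(call: ToolCall, execute: Callable[[ToolCall], Any],
             approved: bool = False) -> Any:
    """Run harmless calls directly; turn consequential ones into proposals.

    The agent never reaches `execute` for a consequential call unless the
    orchestrator passes approved=True after a human has confirmed it.
    """
    if call.effect in CONSEQUENTIAL and not approved:
        return Proposal(call)
    return execute(call)
```

Because the branch is ordinary code in the orchestration layer, no wording in the prompt or in retrieved content can route around it; the worst a manipulated agent can do is produce a proposal for review.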
Threats it governs
When this principle is absent, these threats become reachable.
- T2 Tool Misuse: Agent uses authorized tools in unintended ways via deceptive prompts or chained calls.
- T3 Privilege Compromise: Mismanaged roles, dynamic inheritance, or overly broad scopes let agents escalate.
- T4 Resource Overload: Agents autonomously schedule, queue, and execute work. Exhaustion fans out.
Controls that advance it
Catalogue mitigations that strengthen this principle, grouped by the defence-in-depth stage they sit in.
- Tool scope: Each tool in an agent's catalog should expose only the methods, resources, and parameter ranges its designated role requires. Over-broad tool surfaces let individually authorised primitives compose into actions no human intended to grant; narrowing the scope at design time reduces both the attack surface and the blast radius of any compromise.
- JIT tool grants: An agent that holds a persistent catalog of invokable tools can reach any of them at any point in its session. If its reasoning is manipulated or its identity is compromised, that persistent surface is fully available to an attacker. Just-in-time tool grants remove the standing surface: a policy broker issues a time-bound, task-scoped grant immediately before the tool is needed and revokes it automatically when the task completes or the window expires.
- Policy bound: An agent's authority is normally bounded only by its own reasoning. If that reasoning is manipulated, or the agent's identity is compromised, it will attempt actions the operator never intended to permit. Policy-bound autonomy addresses this by placing a declarative enforcement point between the agent and every consequential action: a policy engine evaluates the agent identity, the target tool, and the parameter envelope before execution, and the agent cannot reason or argue past the result.
- Rate limits and quotas: An agent operates without direct human oversight, autonomously scheduling tool calls, external API requests, and reflection loops. Without a budget, a single triggering event can fan out into hundreds of downstream calls. Per-agent rate limits and quotas assign each agent identity its own ceiling on call rate, token consumption, and cost spend, so a misbehaving or compromised agent cannot exhaust shared resources and its overconsumption becomes a visible, actionable signal.
- Loop limit: An AI agent can review and rewrite its own answer to improve it. If that review runs too long it ties up resources and stops the agent responding in time, and an attacker can deliberately trigger those endless cycles to stall the system. A reflection-loop depth limit prevents that: it sets how many review rounds an agent may run before it has to stop.
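The last two controls in the list lend themselves to a compact sketch: a per-agent call budget and a bounded reflection loop. Both are simplified assumptions. `AgentQuota` counts calls only (a fuller version would also track tokens, spend, and a time window), and `reflect` with its `revise` callback is a hypothetical stand-in for whatever self-review step an agent framework provides.

```python
from collections import defaultdict
from typing import Callable


class QuotaExceeded(Exception):
    """Raised when an agent identity has used up its call budget."""


class AgentQuota:
    def __init__(self, max_calls: int) -> None:
        self.max_calls = max_calls
        self.calls: dict[str, int] = defaultdict(int)

    def charge(self, agent_id: str) -> None:
        """Count one call against this agent only; other agents are unaffected."""
        if self.calls[agent_id] >= self.max_calls:
            raise QuotaExceeded(agent_id)
        self.calls[agent_id] += 1


def reflect(draft: str, revise: Callable[[str], str], max_depth: int = 3) -> str:
    """Run at most max_depth review rounds, stopping early once nothing changes."""
    for _ in range(max_depth):
        revised = revise(draft)
        if revised == draft:
            break
        draft = revised
    return draft
```

Raising an exception on the budget breach, rather than silently dropping the call, is what turns overconsumption into a signal an operator can alert on.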
No catalogued control.
In Helmwart
The Q1 per-agent action-authority signal captures part of this; autonomy tiers aren’t modelled explicitly.