You set up a human-in-the-loop approval flow for your AI agent. Commands get intercepted, reviewers see what's about to run, approvals are logged. The system works.

Then you add a second agent. The orchestrating agent now calls the sub-agent, the sub-agent runs the commands. The approval flow is… somewhere. Maybe it's still attached to the sub-agent. Maybe the orchestrator bypassed it when it delegated. Maybe the sub-agent inherited credentials from the orchestrator and ran with elevated permissions it shouldn't have.

Multi-agent systems are the fastest-growing area of AI agent deployment right now. They're also where the security model gets complicated in ways most teams haven't thought through.

The Trust Problem Isn't New

In traditional software systems, trust propagation has clear rules. OAuth scopes don't escalate when you call a downstream service. A service account token issued to Service A doesn't automatically get used by Service B. You set up these boundaries deliberately, and they hold.

AI agents complicate this in a few ways:

  • Dynamic delegation: Orchestrating agents decide at runtime which sub-agents to call and what to ask them to do. There's no static call graph to audit.
  • Implicit context passing: When an orchestrator passes context to a sub-agent, it may include credentials, session tokens, or capability grants that weren't explicitly authorized for the delegation.
  • Approval laundering: A human approves an action at the orchestrator level ("reorganize the file structure") without understanding that this translates to hundreds of individual shell commands at the sub-agent level.
  • Identity ambiguity: When a sub-agent's action gets logged, the audit record might show the sub-agent's identity, obscuring that the orchestrator authorized it.

How Approval Controls Break in Multi-Agent Systems

Pattern 1: Approval at the Wrong Layer

The most common failure: approval is implemented at the orchestrator level, not at the execution layer. The orchestrator gets a human "yes" for a high-level task, then delegates to sub-agents that execute individual commands without further review.

The human approved "deploy the new version." They did not approve the specific commands: `kubectl set image deployment/api api=registry/api:v2.1.0`, `kubectl scale deployment/api --replicas=0`, `kubectl delete pvc data-volume-0`. The last one was not supposed to be part of the deployment.

Approval at the intent level ("deploy") doesn't substitute for approval at the action level ("delete this persistent volume claim"). These are different controls with different risk profiles, and conflating them is where incidents happen.
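To make the distinction concrete, here is a minimal sketch of action-level classification. The patterns and commands are illustrative, not a real policy engine: the point is that the expanded commands differ in risk even though the approved intent was a single "deploy."

```python
# Illustrative only: approving the intent does not approve each command.
# A crude classifier that flags destructive commands for per-action review.
DESTRUCTIVE_PATTERNS = ("delete", "--replicas=0", "rm -rf", "drop table")

def needs_action_approval(command: str) -> bool:
    """Return True if this individual command must be approved, even when
    the high-level task ("deploy the new version") was already approved."""
    lowered = command.lower()
    return any(pattern in lowered for pattern in DESTRUCTIVE_PATTERNS)

# The approved intent was "deploy", but the expanded commands differ in risk:
commands = [
    "kubectl set image deployment/api api=registry/api:v2.1.0",
    "kubectl scale deployment/api --replicas=0",
    "kubectl delete pvc data-volume-0",
]
flagged = [c for c in commands if needs_action_approval(c)]
```

A real classifier would parse commands rather than match substrings, but even this toy version catches the PVC deletion that the task-level "yes" silently authorized.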

Pattern 2: Trust Escalation Through Delegation

An orchestrating agent has broad permissions because it coordinates everything. A sub-agent specializes in a narrow task and should have narrow permissions. When the orchestrator delegates, it passes its own credentials to the sub-agent "for convenience."

Now the sub-agent has the orchestrator's permissions. If the sub-agent is compromised — via prompt injection, a buggy tool, or a malicious input — it has access to everything the orchestrator could touch. The narrowly scoped sub-agent is a fiction.

Pattern 3: Audit Log Fragmentation

Each agent logs its own actions. The orchestrator's log shows "called sub-agent with task X." The sub-agent's log shows "executed commands A, B, C." There's no unified trace that shows the full causal chain: human request → orchestrator decision → sub-agent selection → individual command execution.

After an incident, reconstructing what happened requires manually correlating logs from multiple agents, possibly across different systems, with different timestamp formats and different levels of detail. This is exactly the kind of audit fragmentation that makes post-incident analysis take days instead of hours.

Pattern 4: Prompt Injection Across Agent Boundaries

A sub-agent reads from an external source — a file, a database record, a webpage — that contains injected instructions. The sub-agent follows them, possibly requesting elevated permissions from the orchestrator or modifying its own behavior in ways the orchestrator doesn't detect.

The approval flow that protected the orchestrator doesn't protect against this because the injection happens inside the sub-agent's context, after the delegation has already occurred. By the time a malicious instruction executes, it looks like a normal sub-agent action.

What a Sound Trust Model Looks Like

Enforce Controls at the Execution Layer, Not the Intent Layer

Whatever approval controls you've built, they need to exist at the layer where side effects actually happen — where commands run, files get written, APIs get called. Approval at higher layers is useful for scope control but doesn't replace execution-layer controls.

If you're using an SSH-proxied approval flow like expacti, every command that reaches the shell goes through approval regardless of which agent submitted it. The orchestrator and its sub-agents both route through the same interception layer. The approval control is on the execution surface, not on the agent graph above it.
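The shape of an execution-layer gate can be sketched in a few lines. This is an assumed design, not expacti's actual API: every command passes through one choke point, every attempt is logged, and the submitting agent's identity is irrelevant to whether the gate applies.

```python
# Sketch of an execution-layer gate: one choke point for all agents.
from typing import Callable, Dict, List

class ExecutionGate:
    def __init__(self, approve: Callable[[str, str], bool]):
        self.approve = approve                  # human/policy approval callback
        self.audit_log: List[Dict] = []         # every attempt is recorded

    def run(self, agent_id: str, command: str) -> bool:
        approved = self.approve(agent_id, command)
        self.audit_log.append(
            {"agent": agent_id, "command": command, "approved": approved}
        )
        if approved:
            pass  # execute here (subprocess, SSH session, etc.)
        return approved

# Orchestrator and sub-agent share the same gate, so delegation can't bypass it.
gate = ExecutionGate(approve=lambda agent, cmd: "delete" not in cmd)
ok1 = gate.run("orchestrator-3", "kubectl get pods")
ok2 = gate.run("sub-agent-deploy-7", "kubectl delete pvc data-volume-0")
```

The key property: the sub-agent's destructive command is blocked and logged even though the orchestrator, not the human, initiated the delegation.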

Issue Credentials That Reflect the Actual Task Scope

When an orchestrator delegates to a sub-agent, issue credentials scoped to that specific delegation — not the orchestrator's full credentials. If the sub-agent is doing a database backup, it gets read-only access to the tables it needs to back up. Not read-write. Not to all tables. Not with the orchestrator's admin token.

This requires your orchestrator to actively manage credential issuance as part of delegation, which is more work than passing credentials through. It's also the only way to prevent trust escalation.
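A minimal sketch of per-delegation credential issuance follows. The scope names and token format are illustrative; the invariant is what matters: a delegated credential is never broader than what the task requests, and never broader than what the parent holds.

```python
# Sketch: issue a short-lived, narrowly scoped credential per delegation.
import secrets

def issue_delegation_credential(parent_scopes, requested_scopes, ttl_seconds=900):
    """Grant only scopes the parent actually holds; refuse anything broader."""
    requested = set(requested_scopes)
    granted = requested & set(parent_scopes)
    if granted != requested:
        raise PermissionError(
            f"scopes not held by parent: {requested - set(parent_scopes)}"
        )
    return {
        "token": secrets.token_urlsafe(16),  # stand-in for a real issuer
        "scopes": sorted(granted),
        "ttl": ttl_seconds,
    }

orchestrator_scopes = {"db:read", "db:write", "deploy:prod"}
# The backup sub-agent asks only for read access, so that's all it gets.
cred = issue_delegation_credential(orchestrator_scopes, {"db:read"})
```

In production this would sit in front of a secrets manager or STS-style token service, but the check itself — intersect, compare, refuse — is the whole trust-escalation defense.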

Propagate a Causal Identity Through the Chain

Every action in a multi-agent system should be attributable to the original human request that started the chain. This means passing a trace ID (or equivalent) from the initial request through every agent, and ensuring that trace ID appears in every audit log entry.

With a trace ID in place, you can reconstruct the full chain: "this kubectl command was run by sub-agent-deploy-7, which was called by orchestrator-3, which was responding to human request #4892 from [email protected] at 14:23 UTC." Without it, you have isolated facts with no connective tissue.
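The mechanics are simple enough to sketch directly. Names and record fields below are illustrative; the essential move is that every log record, from every agent, carries the trace ID minted at the original human request.

```python
# Sketch: thread one trace ID from the human request through every agent hop.
import uuid

def new_trace(human: str, request: str) -> dict:
    return {"trace_id": str(uuid.uuid4()), "human": human, "request": request}

audit_log = []

def log_action(trace: dict, actor: str, action: str) -> None:
    # Every record carries the trace ID, so records from different agents
    # can be joined into one causal chain after the fact.
    audit_log.append(
        {"trace_id": trace["trace_id"], "actor": actor, "action": action}
    )

trace = new_trace("[email protected]", "request #4892: deploy v2.1.0")
log_action(trace, "orchestrator-3", "delegated deploy to sub-agent-deploy-7")
log_action(trace, "sub-agent-deploy-7", "kubectl set image deployment/api ...")

# Reconstructing the chain becomes a single filter on trace_id:
chain = [r for r in audit_log if r["trace_id"] == trace["trace_id"]]
```

Standardized formats like W3C Trace Context exist for exactly this propagation problem; the point here is only that the ID must survive every agent-to-agent hop.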

Apply Approval Policies to Delegations, Not Just Actions

Some delegations are as consequential as the actions they enable. An orchestrator deciding to spin up a new sub-agent with production database credentials is worth human review, even before any action runs. An orchestrator deciding to extend a sub-agent's session by 24 hours is worth review.

Treat the act of delegation as an auditable event with its own approval policy, separate from the approval policy on individual commands. A high-risk delegation — one that grants broad permissions or initiates an irreversible workflow — should require explicit human sign-off.
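A delegation policy can be expressed as its own check, separate from command-level approval. The scope names and thresholds here are illustrative assumptions, not a recommended policy:

```python
# Sketch: the delegation itself is a policy-checked event, distinct from
# the approval policy on individual commands. Thresholds are illustrative.
HIGH_RISK_SCOPES = {"db:write", "deploy:prod", "secrets:read"}

def delegation_requires_human_review(scopes, session_hours) -> bool:
    """A delegation that grants broad permissions or a long-lived session
    is treated as high-consequence and routed to a human."""
    return bool(set(scopes) & HIGH_RISK_SCOPES) or session_hours > 8

low = delegation_requires_human_review({"docs:read"}, session_hours=1)
high = delegation_requires_human_review({"deploy:prod"}, session_hours=1)
long_session = delegation_requires_human_review({"docs:read"}, session_hours=24)
```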

Define Trust Boundaries Statically, Not Just at Runtime

Dynamic agent graphs are flexible. They're also hard to reason about from a security perspective because you can't audit a trust relationship that doesn't exist until runtime.

For high-stakes agent systems, define the allowed delegation graph explicitly: orchestrator O can call sub-agents A, B, C but not D. Sub-agent A can call tool X but not tool Y. These constraints should be enforced at the infrastructure level, not just as prompting conventions or "expected behavior."

A whitelist of allowed delegations, maintained outside the agents themselves, is harder to circumvent than behavioral guidelines embedded in prompts.
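The whitelist itself can be a small static structure checked at the infrastructure layer before any delegation proceeds. Agent names below mirror the example above and are hypothetical:

```python
# Sketch: an allowlist of delegations maintained outside the agents,
# checked before any delegation is allowed to proceed.
ALLOWED_DELEGATIONS = {
    "orchestrator-O": {"sub-agent-A", "sub-agent-B", "sub-agent-C"},
    "sub-agent-A": set(),   # leaf agent: may not delegate further
}

def delegation_allowed(caller: str, callee: str) -> bool:
    # Unknown callers get an empty set, so they can delegate to no one.
    return callee in ALLOWED_DELEGATIONS.get(caller, set())

ok = delegation_allowed("orchestrator-O", "sub-agent-A")
blocked = delegation_allowed("orchestrator-O", "sub-agent-D")
```

Because the table lives in configuration rather than in a prompt, a compromised agent cannot talk its way past it.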

Practical Checklist for Multi-Agent Approval

| Control | Risk Addressed | Implementation |
| --- | --- | --- |
| Execution-layer interception | Approval bypass via delegation | Shell proxy, SSH interception, or MCP tool wrapping at the action layer |
| Scoped delegation credentials | Trust escalation | Issue per-task credentials via secrets manager when orchestrator delegates |
| Unified trace ID | Audit fragmentation | Pass trace ID in all agent-to-agent calls; emit in all log records |
| Static delegation whitelist | Unauthorized agent invocation | Enforce allowed-delegations list outside agent prompts |
| Delegation approval policy | High-consequence delegations | Flag delegations that grant elevated permissions for human review |
| Input sanitization at agent boundaries | Cross-agent prompt injection | Treat sub-agent inputs from external sources as untrusted; filter before processing |

The "Who Approved That?" Test

After your multi-agent system runs a task, you should be able to point to a specific human action (or explicit policy rule) that authorized every consequential action the system took. Not a general task-level approval, but an action-level authorization.

If you can't pass this test — if there are actions in your audit log that you can't trace to a specific authorization — you have a gap. Either the approval controls aren't at the right layer, or the audit log isn't capturing the causal chain, or both.

Most multi-agent systems fail this test when you first apply it. The goal isn't to require human approval for every low-risk action — that's approval fatigue at scale. The goal is to ensure that every high-risk action has a clear, traceable authorization, and that the authorization happened before the action, not as a rubber-stamp after the fact.
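The test itself can be run mechanically against an audit log. The record shape below is an assumption for illustration; the query is the point: every high-risk action must reference a specific prior authorization, or it's a gap.

```python
# Sketch: the "Who approved that?" test as an audit-log query.
def find_gaps(audit_records) -> list:
    """Return high-risk actions with no specific recorded authorization."""
    return [
        r["action"]
        for r in audit_records
        if r["risk"] == "high" and not r.get("authorized_by")
    ]

records = [
    {"action": "kubectl get pods", "risk": "low"},
    {"action": "kubectl delete pvc data-volume-0", "risk": "high",
     "authorized_by": "approval #4892 by [email protected]"},
    {"action": "kubectl scale deployment/api --replicas=0", "risk": "high"},
]
gaps = find_gaps(records)
```

An empty `gaps` list means every consequential action traces to an authorization; anything else is exactly the kind of hole this test is designed to surface.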

Where This Gets Hard

The controls described above add overhead. Issuing scoped credentials per delegation is more complex than passing tokens through. Enforcing a static delegation graph limits the flexibility that makes multi-agent systems powerful.

The engineering tradeoff is real, and the right balance depends on the risk profile of your specific system. An agent that reads documentation and drafts summaries doesn't need the same controls as an agent that modifies production databases.

Start with execution-layer interception — it's the highest-leverage control with the lowest architectural disruption. Add trace IDs next; they're cheap and the post-incident value is enormous. Then layer in scoped credentials and static delegation graphs as your system matures and your risk profile clarifies.

The hard part isn't implementing any of these controls individually. It's building a coherent security model for the full agent graph before the first incident teaches you why you needed one.

Consistent approval controls across your entire agent graph

Expacti intercepts shell commands at the execution layer — regardless of which agent submitted them. Orchestrators, sub-agents, and autonomous workers all route through the same approval flow, with full causal attribution in the audit log.
