Multi-Agent Systems: The Governance Nightmare Nobody's Talking About

When one AI agent can spawn, instruct, or delegate to other agents, your approval queue, audit trail, and kill switch just got a lot more complicated.

Most governance discussions assume a simple model: one AI agent, one human, one shell. The agent proposes a command, the human approves, it runs. Clean, auditable, controllable.

That model is already obsolete for a lot of teams.

Modern AI systems are multi-agent: an orchestrator dispatches subtasks to specialist agents, those agents spawn tool sessions, those sessions execute commands. A single high-level instruction — "migrate the database schema" — might result in a dozen agents running commands across three environments before a human sees a single approval request.

The governance problem compounds with every layer you add. This post is about what breaks, and what you need to fix it.

The Core Problem: Attribution Dissolves

In a single-agent system, attribution is easy. The agent ran the command. Done.

In a multi-agent system, attribution is a graph problem. Which agent ran the command? On whose instruction? With what original human intent? Through how many layers of delegation?

When something goes wrong — and it will — you need to trace backwards from the broken thing to the decision that caused it. In a multi-agent system without explicit attribution tracking, that trace breaks at every delegation boundary.

Your audit log shows rm -rf /opt/app/data executed by agent-cleanup-7f3a. But who told agent-cleanup-7f3a to run that? An orchestrator. Who told the orchestrator? A planning agent. Who told the planning agent? The user said "clean up old deployments."

That chain is your audit trail. If you can't reconstruct it, you don't have an audit trail. You have a log of commands with unknown provenance.

The attribution rule: Every command executed in a multi-agent system must carry a full causal chain — the original human intent, every intermediate delegation, and the specific agent that issued the execution. If your audit log can't show this, it's incomplete.
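One way to make that rule concrete is to attach the chain to every command record. This is a minimal illustrative sketch, not a real schema — the names (CommandRecord, delegation_chain, and so on) are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class CommandRecord:
    """One executed command plus its full causal chain (illustrative)."""
    command: str
    executing_agent: str
    delegation_chain: list[str]   # root agent -> ... -> executing agent
    root_session_id: str
    human_instruction: str        # the original intent anchoring the chain

    def is_attributable(self) -> bool:
        # Auditable only if the chain exists, ends at the agent that
        # actually ran the command, and is anchored to a human instruction.
        return (bool(self.delegation_chain)
                and self.delegation_chain[-1] == self.executing_agent
                and bool(self.human_instruction))

record = CommandRecord(
    command="rm -rf /opt/app/data",
    executing_agent="agent-cleanup-7f3a",
    delegation_chain=["agent-planning", "agent-orchestrator", "agent-cleanup-7f3a"],
    root_session_id="sess-001",
    human_instruction="clean up old deployments",
)
assert record.is_attributable()
```

A record with an empty chain or no human instruction fails the check — which is exactly the "log with unknown provenance" failure mode.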

Five Ways Multi-Agent Systems Break Governance

1. Approval queues get flooded

A single orchestrator turn can generate fifty commands across ten agents. If every command requires human approval, you've recreated the approval fatigue problem at 10x scale. Humans start clicking "approve" without reading. The human-in-the-loop becomes theater.

If you skip approval for subagent commands, you've created an approval bypass: the orchestrator decomposes a risky action into individually innocuous steps, none of which trigger review, and the cumulative effect is the risky action — approved by nobody.

2. Kill switches don't propagate

You stop the orchestrator. The subtask agents don't know. They're mid-execution, committed to a series of commands, and they'll complete them because nobody told them to stop. Your kill switch killed the orchestrator process and nothing else.

Stopping a multi-agent system requires stopping all agents in the session — including ones that haven't started yet. That requires either a shared cancellation channel or a session-scoped authority that all agents respect.
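The shared cancellation channel can be as simple as one event that every agent in the session checks before each command — a sketch under the assumption that agents cooperate with the check (the Session class here is hypothetical):

```python
import threading

class Session:
    """Session-scoped kill switch: one event shared by every agent."""
    def __init__(self, session_id: str):
        self.session_id = session_id
        self._cancelled = threading.Event()

    def kill(self) -> None:
        # One action cancels every agent sharing this session — including
        # agents that haven't started yet, because they check before running.
        self._cancelled.set()

    @property
    def cancelled(self) -> bool:
        return self._cancelled.is_set()

def run_agent(session: Session, commands: list[str]) -> list[str]:
    executed = []
    for cmd in commands:
        if session.cancelled:   # checked before every command
            break
        executed.append(cmd)
    return executed

session = Session("sess-001")
session.kill()
# An agent spawned after the kill executes nothing:
assert run_agent(session, ["step-1", "step-2"]) == []
```

The design choice that matters: the agents poll a session-scoped authority, rather than waiting for a signal from the (possibly already dead) orchestrator.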

3. Permissions leak through delegation

The orchestrator has broad permissions because it needs to plan across the whole system. It delegates to a subtask agent and passes along its full credential context, or the subtask agent inherits session permissions by default.

Now the subtask agent — which only needed read access to one directory — has the orchestrator's permissions. If that agent is compromised via prompt injection or a malicious tool response, the attacker has orchestrator-level access.

Permissions should be scoped to the minimum required for the delegated task, not inherited from the delegating agent.

4. Prompt injection crosses agent boundaries

A single-agent system has one attack surface: the context the agent reads. A multi-agent system has as many attack surfaces as it has agents — plus the communication channels between them.

A malicious string in a file read by a subtask agent can inject instructions into that agent's context. If the subtask agent has permission to communicate back to the orchestrator, or if the orchestrator blindly trusts subtask output, the injection can propagate up the delegation chain.

Treat inter-agent messages as untrusted input, the same way you'd treat external API responses.

5. The blast radius is invisible

Before a single-agent action, you can reason about the blast radius: "this command touches these files on this host." Before an orchestrated multi-agent run, the blast radius is the union of all subtask blast radii — and you typically can't see that until the run completes.

In a multi-agent system, blast radius is a pre-run planning problem. If you can't estimate it before committing, you can't make an informed approval decision.
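If each subtask declares its targets up front, the session blast radius is just the union of those declarations — a sketch assuming agents can be made to declare before executing (the target labels are invented for illustration):

```python
def session_blast_radius(subtask_plans: dict[str, set[str]]) -> set[str]:
    """Union of every subtask's declared blast radius (paths, hosts, etc.)."""
    radius: set[str] = set()
    for targets in subtask_plans.values():
        radius |= targets
    return radius

plans = {
    "agent-migrate": {"db:staging", "db:prod"},
    "agent-backup":  {"s3://backups", "db:prod"},
}
assert session_blast_radius(plans) == {"db:staging", "db:prod", "s3://backups"}
```

The hard part isn't the union — it's forcing declaration before execution, so the reviewer sees this set at approval time rather than in the post-mortem.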

What the Audit Trail Needs to Show

Field                      | Single-Agent         | Multi-Agent
Command                    | Required             | Required
Executing agent ID         | Implicit (one agent) | Required explicitly
Parent agent ID            | N/A                  | Required
Root session ID            | Session = root       | Required (links to human intent)
Delegation chain           | N/A                  | Required (full lineage)
Original human instruction | Often implicit       | Required (anchors attribution)
Approval decision          | Per-command          | Per-command + session-level

Without delegation chain and root session ID, your audit log is a list of commands with no causal context. That's not an audit trail — it's a history file.
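One cheap way to enforce the table is to reject log entries at ingest when any multi-agent field is missing. A sketch — the field names mirror the table above but aren't a standard:

```python
REQUIRED_MULTI_AGENT_FIELDS = {
    "command", "executing_agent_id", "parent_agent_id",
    "root_session_id", "delegation_chain", "human_instruction",
    "approval_decision",
}

def audit_entry_is_complete(entry: dict) -> bool:
    """True only if every field from the table above is present."""
    return REQUIRED_MULTI_AGENT_FIELDS <= entry.keys()

entry = {"command": "rm -rf /tmp/x", "executing_agent_id": "agent-7f3a"}
assert not audit_entry_is_complete(entry)   # a history file, not an audit trail
```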

Designing Governance for Multi-Agent Systems

Session-scoped identity, not agent-scoped

The approval authority in a multi-agent system should be the session, not any individual agent. A human approves a session with a stated scope. All agents operating within that session inherit that approval scope.

The session has a defined blast radius and a revocation mechanism. Killing the session kills all agents and cancels all pending approvals. This is the foundation of a coherent multi-agent kill switch.

Risk aggregation at the session level

Individual command risk scores are necessary but insufficient. You need session-level risk aggregation: the cumulative risk of all commands the session has executed or proposes to execute.

If command-level risk is capped at 60 but the session has 30 pending commands each scoring 60, the actual session risk is much higher. Reviewers need to see both.
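A minimal sketch of that two-level view. Simple capped addition is only one possible aggregation policy — the right one depends on how your risk scores are defined:

```python
def session_risk(command_risks: list[int], cap: int = 100) -> int:
    """Naive additive aggregation, capped — one policy among several."""
    return min(sum(command_risks), cap)

pending = [60] * 30                   # 30 pending commands, each scoring 60
assert max(pending) == 60             # per-command view: each looks moderate
assert session_risk(pending) == 100   # session view: saturates the cap
```

The point isn't the arithmetic; it's that the reviewer's UI surfaces both numbers instead of only the per-command one.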

Delegation boundaries as approval gates

Every time an agent delegates to a new agent, that's a trust boundary — and a potential approval gate. High-risk delegation (giving a subtask agent elevated permissions, or spawning an agent that will touch production) should require explicit approval before the subagent starts.

This is equivalent to asking: "The orchestrator wants to spin up an agent with these permissions to run these commands. Is that OK?"

It's more overhead than per-command approval, but it's far better than discovering post-hoc that ten subagents ran arbitrary commands while you were reviewing the orchestrator's plan.
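A sketch of that gate: delegation proceeds immediately when it's low-risk, and blocks on a human decision when it isn't. The specific triggers here (write/network/sudo permissions, production access) are illustrative:

```python
def requires_delegation_approval(permissions: set[str],
                                 touches_production: bool) -> bool:
    """Illustrative policy: elevated permissions or production access
    turn the delegation into an approval gate."""
    elevated = bool(permissions & {"write", "network", "sudo"})
    return elevated or touches_production

def may_spawn(permissions: set[str], touches_production: bool,
              human_approved: bool) -> bool:
    if requires_delegation_approval(permissions, touches_production):
        return human_approved   # blocked until explicit approval
    return True                 # low-risk delegations proceed unattended

assert may_spawn({"read"}, touches_production=False, human_approved=False)
assert not may_spawn({"write"}, touches_production=True, human_approved=False)
```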

Constrained inter-agent communication

Subagents should not be able to expand their own permissions or escalate to parent agents. The communication protocol between agents should:

  • Be authenticated (subagent can't impersonate orchestrator)
  • Be scoped (subagent can report results, not issue new instructions upward)
  • Be logged (every inter-agent message goes to the audit trail)
  • Be validated (orchestrator treats subagent output as untrusted data)
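The "scoped" and "validated" bullets can be sketched as a whitelist on upward message types: results and errors flow up, instructions don't. The type names here are invented for illustration:

```python
ALLOWED_UPWARD_TYPES = {"result", "error", "progress"}

def validate_subagent_message(msg: dict) -> dict:
    """Treat subagent output as untrusted data: results may flow up,
    instructions may not."""
    if msg.get("type") not in ALLOWED_UPWARD_TYPES:
        raise ValueError(f"rejected upward message type: {msg.get('type')!r}")
    # Payload is data to be validated — never text to be executed or
    # appended verbatim into the orchestrator's prompt.
    return {"type": msg["type"], "payload": str(msg.get("payload", ""))}

ok = validate_subagent_message({"type": "result", "payload": "42 rows migrated"})
assert ok["type"] == "result"
```

Anything that fails the check is dropped and logged, the same way you'd handle a malformed external API response.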

Least-privilege delegation

When an orchestrator delegates to a subagent, it should pass the minimum permissions required for the delegated task — not a copy of its own permission set. This requires explicit permission scoping at spawn time.

In practice: the orchestrator says "spawn an agent with read-only access to /opt/app/logs and no network access." If it can't express that constraint, it should default to the most restrictive permission set, not the least.
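A sketch of spawn-time scoping with a restrictive default — the permission vocabulary ("fs", "network") is made up for the example:

```python
from typing import Optional

MOST_RESTRICTIVE = {"fs": "none", "network": False}

def scope_for_spawn(requested: Optional[dict]) -> dict:
    """Explicit scoping at spawn time; when the caller can't express a
    constraint, fall back to the most restrictive set — not inheritance."""
    scoped = dict(MOST_RESTRICTIVE)
    if requested:
        scoped.update(requested)   # only explicitly granted permissions widen
    return scoped

# "read-only access to /opt/app/logs and no network access":
scope = scope_for_spawn({"fs": "ro:/opt/app/logs"})
assert scope == {"fs": "ro:/opt/app/logs", "network": False}
assert scope_for_spawn(None) == MOST_RESTRICTIVE
```

Note what the code deliberately never does: copy the orchestrator's own permission set into the subagent's scope.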

The Practical Baseline for Teams Doing This Now

You probably can't overhaul your agent architecture today. Here's the minimum viable governance layer for multi-agent systems you're running now:

  1. Assign a root session ID before any agent spawns. All agents in the run must log this ID. It's the thread that connects your entire audit trail.
  2. Log parent agent ID for every spawned agent. Even if you can't reconstruct the full delegation chain, you need one hop of attribution.
  3. Make the kill switch session-scoped. Stopping the orchestrator is not enough. You need a mechanism that propagates cancellation to all agents that share the root session ID.
  4. Cap session-level parallelism. If the orchestrator can spawn unlimited concurrent agents, your blast radius is unbounded. Set a hard limit — even something like "max 5 agents executing simultaneously" — and enforce it.
  5. Require high-risk delegation approval. Define "high-risk delegation" (e.g., agent touching production, agent with write permissions, agent running for >5 minutes). Require human approval before those specific delegations proceed.
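Item 4, the parallelism cap, is the easiest of the five to enforce today — a bounded semaphore per session does it. A sketch:

```python
import threading

MAX_CONCURRENT_AGENTS = 5
_agent_slots = threading.BoundedSemaphore(MAX_CONCURRENT_AGENTS)

def try_spawn_agent() -> bool:
    """Hard parallelism cap: spawning fails when all slots are taken."""
    return _agent_slots.acquire(blocking=False)

def agent_finished() -> None:
    _agent_slots.release()

spawned = [try_spawn_agent() for _ in range(7)]
assert spawned.count(True) == 5 and spawned.count(False) == 2
```

Failing the spawn (rather than queueing it silently) is deliberate: a refused spawn is visible to the orchestrator and the audit log; a silently queued one is not.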

The five-question test for multi-agent governance:

  1. Can you trace any executed command to a human instruction?
  2. Can you stop all agents in a session with a single action?
  3. Do subagents have only the permissions they need for their subtask?
  4. Are inter-agent messages treated as untrusted input?
  5. Do reviewers see aggregate session risk, not just per-command risk?

If you can't answer yes to all five, your multi-agent governance has gaps.

What Expacti Does (and What We're Building)

The current Expacti model is designed for single-agent sessions with a human reviewer. The session ID, attribution, and kill switch work well at that scale.

Multi-agent support — session hierarchies, delegation chains in audit logs, session-scoped kill switches that propagate to subagents, aggregate risk scoring — is on the roadmap. If you're running multi-agent systems now and governance is a problem, we want to talk.

The governance patterns exist. The tooling is catching up.

Running multi-agent systems?

Join the waitlist for multi-agent governance features, or tell us about your current setup.

Join the Waitlist