
Sandboxing AI Agents: Why Container Isolation Isn't Enough

April 7, 2026 · 8 min read · Security

The most common answer to "how do we safely run AI agents?" is "put them in containers." It's not wrong. Container isolation limits the process surface, restricts filesystem access, and makes it harder for a misbehaving agent to affect the host. These are real benefits.

But containers sandbox the process, not the decisions. And with AI agents, the decisions are where the risk lives.

An agent running in a locked-down container can still drop your production database if it has credentials and a network path to it. It can still exfiltrate your codebase if it can reach an outbound HTTP endpoint. It can still delete six months of customer records if you gave it write access to the right bucket. The container didn't stop any of that — and it wasn't designed to.

This is the gap that most sandboxing strategies miss. Here's what actually needs to be isolated when you're running AI agents in production.

What Containers Actually Isolate

To be clear about what you're getting from container isolation:

- Filesystem isolation: the agent sees only its own mounts, not the host filesystem
- Process isolation: namespaces keep the agent's processes away from host processes
- Resource limits: cgroups cap CPU, memory, and I/O so a runaway agent can't starve the host
- Syscall filtering: seccomp profiles block dangerous kernel interfaces

These are genuinely useful controls. If an agent tries to install a kernel module or fork-bomb the host, containers help. The problem is that AI agents don't usually attack the host directly. They use the access they've been legitimately given to do things you didn't intend.

The Decision Surface: What Containers Don't Touch

Consider an agent that has been given:

- Database credentials with write access to production tables
- Cloud credentials that can reach storage buckets, including backups
- A network path to internal services "for convenience"
- Outbound HTTP access for calling external APIs

The container boundary doesn't protect any of those. The agent operates with those credentials from inside the container, and if it decides to run DELETE FROM users WHERE created_at < '2024-01-01' or aws s3 rm s3://production-backups --recursive, the container is silent. It let the agent execute. The damage happened in the external systems the agent reached through its granted credentials.

The decision surface — the set of actions the agent can choose to take — is entirely outside what container isolation controls.

The Three Layers That Actually Need Sandboxing

1. Command authorization

Every command the agent wants to execute should pass through an authorization layer before it runs. Not a capability check ("can this process execute bash?"), but a semantic check ("should this specific command run in this context, against this target, at this time?").

This is what a command approval gateway does. The agent submits a command. The gateway evaluates it against a whitelist, applies risk scoring, and either auto-approves, routes to a human reviewer, or denies. The command doesn't execute until that decision is made.

Container isolation can't do this. It operates at the process/syscall level, not at the semantic level of "what is this command actually trying to do?"
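To make the gateway concrete, here is a minimal sketch in Python. The whitelist patterns, risk rules, weights, and thresholds are illustrative assumptions, not a real product's policy — a production gateway would parse commands rather than pattern-match them.

```python
# Minimal sketch of a command authorization gateway.
# Whitelist entries, risk rules, and thresholds are illustrative assumptions.
import re
from dataclasses import dataclass

@dataclass
class Decision:
    verdict: str        # "auto_approve" | "human_review" | "deny"
    risk_score: int
    reason: str

# Known-safe commands skip risk scoring entirely.
WHITELIST = [
    r"^git status$",
    r"^ls(\s|$)",
    r"^SELECT\s",       # read-only SQL
]

# (pattern, weight, explanation) — weights accumulate into a risk score.
RISK_RULES = [
    (r"\bDELETE\b|\bDROP\b|\bTRUNCATE\b", 50, "destructive SQL"),
    (r"rm\s+-rf|--recursive",             40, "recursive deletion"),
    (r"s3://\S*prod",                     30, "touches a production bucket"),
]

def authorize(command: str) -> Decision:
    if any(re.match(p, command, re.IGNORECASE) for p in WHITELIST):
        return Decision("auto_approve", 0, "whitelist match")
    score, reasons = 0, []
    for pattern, weight, why in RISK_RULES:
        if re.search(pattern, command, re.IGNORECASE):
            score += weight
            reasons.append(why)
    if score >= 80:
        return Decision("deny", score, "; ".join(reasons))
    if score >= 30:
        return Decision("human_review", score, "; ".join(reasons))
    return Decision("auto_approve", score, "no risk rules matched")
```

The point of the sketch is the shape of the decision, not the rules: the agent's command never reaches a shell until `authorize` returns a verdict.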

2. Credential scope and lifetime

The credentials you give an agent define its blast radius. A database credential with full read-write access to all tables is a much larger blast radius than one scoped to a specific schema with only the tables the agent legitimately needs.

Effective sandboxing at the credential layer means:

- Scoping credentials to the specific schemas, tables, or buckets the task requires
- Issuing per-session credentials rather than long-lived shared secrets
- Keeping lifetimes short, so a leaked credential expires before it's useful
- Revoking credentials automatically when the session ends

This is runtime credential sandboxing. It doesn't prevent the agent from attempting a destructive action, but it limits the damage if the attempt succeeds.
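A minimal sketch of what per-session, short-lived credential issuance can look like. The scope names, the 15-minute TTL cap, and the broker interface are illustrative assumptions:

```python
# Minimal sketch of a credential broker issuing per-session,
# short-lived, least-privilege credentials.
# Scope names and the TTL cap are illustrative assumptions.
import secrets
import time
from dataclasses import dataclass

@dataclass
class Credential:
    token: str
    scopes: frozenset       # e.g. {"read:users", "write:users_staging"}
    expires_at: float

    def allows(self, scope: str) -> bool:
        # A credential permits an action only if the scope was granted
        # at issue time and the credential hasn't expired.
        return scope in self.scopes and time.time() < self.expires_at

class CredentialBroker:
    def issue(self, requested_scopes: set, ttl_seconds: int = 900) -> Credential:
        # Cap TTL so no agent session outlives 15 minutes by default.
        ttl = min(ttl_seconds, 900)
        return Credential(
            token=secrets.token_urlsafe(32),
            scopes=frozenset(requested_scopes),
            expires_at=time.time() + ttl,
        )
```

The design choice that matters: scopes are fixed at issue time, so an agent can't escalate mid-session by asking for more.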

3. Network egress control

Containers can restrict network egress, but most deployments don't configure this carefully. A common mistake is putting the agent on the same network segment as production databases "for convenience" and then relying on the agent to make good decisions about which queries to run.

Network sandboxing for agents should be explicit and minimal:

- Default-deny all outbound connections
- An explicit allowlist of the internal hosts and ports the agent actually needs
- No direct path to production data stores unless the task requires it
- Logging of every connection attempt, allowed or denied

This limits the exfiltration surface. An agent that's been manipulated through prompt injection can't easily send data to an attacker-controlled endpoint if the only outbound connections allowed go to your own services.
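A default-deny egress policy can be sketched as a simple host/port lookup. The hostnames below are illustrative assumptions for an internal environment; in practice this check lives in a network policy or proxy, not application code:

```python
# Minimal sketch of a default-deny egress policy check.
# Hostnames and ports are illustrative assumptions.
EGRESS_ALLOWLIST = {
    "db.internal.example.com":  {5432},
    "api.internal.example.com": {443},
    "llm-provider.example.com": {443},  # the model API the agent runs on
}

def egress_allowed(host: str, port: int) -> bool:
    # Default-deny: only explicitly listed host/port pairs may go out.
    return port in EGRESS_ALLOWLIST.get(host, set())
```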

Sandboxing Levels by Risk

| What the agent can reach | Container isolation | Command authorization | Credential scope | Network egress control |
| --- | --- | --- | --- | --- |
| Read-only data | Required | Optional | Read-only creds | Internal only |
| Write to non-critical data | Required | Recommended | Scoped write creds | Internal only |
| Write to customer data | Required | Required | Row/table-scoped creds | Allowlist only |
| Production infrastructure | Required | Required + human review | Session-scoped, short-lived | Strict allowlist |
| Financial systems / billing | Required | Required + multi-party | Minimal, per-operation | Strict allowlist |

Notice that container isolation appears in every row — it's a baseline, not a solution. The controls that vary by risk are command authorization, credential scope, and egress. Those are the layers that actually track the threat model for AI agents.

The Depth Problem: Defense in Depth Requires Multiple Layers

The reason this matters practically is that every sandboxing layer has failure modes:

- Containers can be misconfigured (privileged mode, broad volume mounts) or escaped through kernel vulnerabilities
- Command authorization can be evaded by obfuscated commands that slip past risk rules
- Credential scope erodes as permissions accumulate over time
- Egress allowlists drift as endpoints get added "temporarily" and never removed

Defense in depth means any single layer failing doesn't compromise the whole system. An agent that escapes its container but hits command authorization before any command executes is still contained. An agent that bypasses command authorization but has only read credentials can't cause write damage. Each layer catches what the previous one misses.

Containers are one layer of a four-layer defense. Treating them as the primary sandbox is like relying solely on TLS and not implementing application-layer auth — it's not wrong, it's just not enough.

A Practical Baseline for Agent Sandboxing

If you're deploying AI agents in production today, here's the minimum viable sandbox:

  1. Container isolation — with seccomp profile, no privileged mode, read-only root filesystem where possible, explicit volume mounts only
  2. Command authorization gateway — every shell/script execution, every database write operation, every destructive API call routes through an approval layer before executing
  3. Scoped credentials — no agent gets credentials broader than what it demonstrably needs; review and tighten on a regular cadence
  4. Egress allowlist — explicit permit-list for outbound connections; default-deny everything else
  5. Session audit trail — full record of what the agent attempted, what was approved or denied, and what executed; queryable for incident review

That's four controls beyond container isolation. Each one addresses a failure mode that containers don't cover.

What This Looks Like in Practice

An agent that needs to manage customer data might run like this with proper sandboxing:

Agent submits: DELETE FROM inactive_users WHERE last_login < '2025-01-01'

Command authorization gateway:
  risk_score: 87 (bulk deletion, customer data, irreversible)
  whitelist match: none
  → routed to human reviewer

Reviewer sees:
  command: DELETE FROM inactive_users WHERE last_login < '2025-01-01'
  estimated rows: 12,400
  reversible: no
  risk: CRITICAL

Reviewer: Deny (wrong table — should be inactive_test_users)
Agent receives: denied — command rejected by reviewer
No rows deleted.

The container didn't stop this. The credential scope didn't stop it (the agent had legitimate write access). The egress allowlist didn't stop it (the database was on the internal network). The command authorization gateway stopped it — because a human saw what was about to happen and said no.

That's the layer containers can't replace.

The Summary

Container isolation is a necessary starting point for running AI agents safely. It's not sufficient. The threat model for AI agents isn't about process escapes or syscall exploits — it's about autonomous decisions made with legitimate credentials against external systems. The sandboxing layers that address that threat model are command authorization, credential scoping, and network egress control. Build all four layers. Treat them as defense in depth. No single layer is enough on its own.

Expacti adds command authorization as a sandbox layer — every shell command routed through approval before it runs.

Works alongside your existing container setup. Intercepts at the execution layer, not the process layer.
