Every team shipping AI agents into production eventually has the same moment: the agent does something it shouldn't have, something breaks, and now everyone is scrambling. A file deleted. A schema migrated in the wrong direction. A config overwritten. A service restarted at the wrong time.
Approval gates catch the obvious cases before they happen. But no gate is perfect, and not every bad action looks dangerous before it runs. The second line of defense is recovery velocity — how quickly can you understand what happened, undo it, and restore normal operations.
Most teams underinvest here. They spend cycles on prevention and almost nothing on recovery, then discover that when something does go wrong, they're flying blind.
## The recovery problem is different with AI agents
When a human makes a mistake, you can ask them what happened. They have intent, context, a mental model of what they were trying to do. Recovery is a conversation.
When an AI agent makes a mistake, recovery is a forensic exercise. You're piecing together what the agent was doing, what it was responding to, what commands it ran, and in what order. The agent's "intent" is inferred, not explained.
This makes a few things harder:
- Blast radius is often unclear. The agent may have run 12 commands across 4 systems before anything visibly broke. Which one caused the problem?
- Partial success is common. The agent completed half a task before failing. Now you have a system in an intermediate state — migrated but not deployed, deleted but not re-created, rotated but not propagated.
- Correlation is non-obvious. The symptom shows up in system B, but the cause was an action in system A three minutes earlier.
Recovery strategies need to account for all three.
## Layer 1: The audit log as ground truth
The most important recovery tool isn't a rollback script — it's an accurate, tamper-evident audit log of every command the agent ran, when it ran, what it returned, and who (or what) approved it.
Without this, you're guessing. With it, you can reconstruct the exact sequence of events and identify the causal action.
Minimum viable audit log entry for an AI agent action:
```json
{
  "timestamp": "2026-04-07T09:14:33Z",
  "session_id": "sess_8f3k2",
  "agent": "deploy-bot",
  "command": "kubectl rollout restart deployment/api-server -n production",
  "risk_score": 72,
  "approval": {
    "required": true,
    "reviewer": "[email protected]",
    "approved_at": "2026-04-07T09:14:28Z"
  },
  "exit_code": 0,
  "output": "deployment.apps/api-server restarted",
  "working_directory": "/home/deploy",
  "environment": "production"
}
```
The fields that teams most often forget: the session ID (to correlate all commands in one agent run), the working directory (context for relative paths), and the full output (not just exit code — the output often contains the actual error).
Store this in append-only storage if possible. An audit log that can be retroactively edited isn't a safety tool — it's a liability.
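One sketch of what "tamper-evident" can mean in practice: each entry carries a hash of the previous one, so retroactively editing any past entry breaks every hash after it. This is illustrative Python, not a prescribed design; the field names are simplified from the example entry above.

```python
import hashlib
import json

def append_entry(log_path: str, entry: dict, prev_hash: str) -> str:
    """Append an audit entry whose hash chains to the previous entry."""
    entry = dict(entry, prev_hash=prev_hash)
    serialized = json.dumps(entry, sort_keys=True)
    entry_hash = hashlib.sha256(serialized.encode()).hexdigest()
    with open(log_path, "a") as f:
        f.write(json.dumps({"hash": entry_hash, **entry}, sort_keys=True) + "\n")
    return entry_hash

def verify_chain(log_path: str) -> bool:
    """Recompute every hash and confirm the chain is unbroken."""
    prev = "genesis"
    with open(log_path) as f:
        for line in f:
            record = json.loads(line)
            claimed = record.pop("hash")
            if record.get("prev_hash") != prev:
                return False
            recomputed = hashlib.sha256(
                json.dumps(record, sort_keys=True).encode()
            ).hexdigest()
            if recomputed != claimed:
                return False
            prev = claimed
    return True
```

Verifying the chain on export (or on a schedule) turns "someone edited the log" from an invisible event into a detectable one.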
## Layer 2: Reversibility classification
Not all actions are equally reversible. Before recovery can be fast, you need to know which class of action you're dealing with.
| Class | Examples | Recovery path | Time to recover |
|---|---|---|---|
| Trivially reversible | Config change, feature flag, env var | Revert the change | Seconds–minutes |
| Reversible with state | Service restart, deployment rollback | Roll back to previous version | Minutes |
| Reversible with backup | Data deletion, schema migration | Restore from snapshot/backup | Minutes–hours |
| Partially reversible | Emails sent, webhooks fired, API calls made | Compensating actions; downstream notifications | Variable |
| Irreversible | Permanent deletion, public disclosure | Damage containment only | N/A — prevent, don't recover |
Your risk scoring system should map commands to these classes. Irreversible actions should require the highest scrutiny (and often a second approver). Trivially reversible actions can sometimes run unreviewed with auto-rollback on failure.
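A minimal sketch of such a mapping, assuming a regex-based rule list. The patterns and class names here are illustrative, not a complete rule set; the important design choice is that unmatched commands fall through to the most conservative class.

```python
import re

# Hypothetical pattern-to-class rules; extend alongside your whitelist.
REVERSIBILITY_RULES = [
    (r"^kubectl rollout restart ", "reversible_with_state"),
    (r"^kubectl scale ",           "trivially_reversible"),
    (r"^DROP TABLE ",              "irreversible"),
    (r"^DELETE FROM ",             "reversible_with_backup"),
    (r"^curl -X POST ",            "partially_reversible"),
]

def classify(command: str) -> str:
    """Return the reversibility class for a command, defaulting to the
    most conservative class when no rule matches."""
    for pattern, klass in REVERSIBILITY_RULES:
        if re.match(pattern, command, re.IGNORECASE):
            return klass
    return "irreversible"  # unknown commands get maximum scrutiny
```

Defaulting unknowns to `irreversible` means a gap in the rule set produces an over-cautious review request, not a silent unreviewed action.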
## Layer 3: Snapshots before destructive actions
For any command that modifies state — database rows, files, configurations — the agent should capture state before executing, not after.
This sounds obvious, but most teams skip it because it requires instrumenting the agent workflow, not just the commands themselves.
Practical implementations:
- Database operations: Require a transaction wrapper or point-in-time snapshot before any DDL or bulk DML. If the agent is running `ALTER TABLE` or `DELETE FROM`, the reviewer approval should trigger an automatic pre-snapshot.
- File operations: Before `rm`, `mv`, or file overwrites, copy to a staging location. Keep a manifest so you can restore specific files without touching the rest of the tree.
- Configuration changes: Store the before-state in version control or a config history table. One git commit per agent-initiated config change makes rollback trivial.
- Kubernetes workloads: Capture the current manifest before any `kubectl apply` or rollout. Store it with the audit entry.
The goal: every destructive action should generate an artifact that enables point-in-time restoration, not just "latest backup".
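For the file-operations case, the capture-then-act wrapper can be very small. This is a sketch under assumed paths and naming; the database and Kubernetes cases follow the same shape (capture an artifact, run the action, attach the artifact to the audit entry).

```python
import shutil
import time
from pathlib import Path

def snapshot_then_remove(target: Path, staging_root: Path) -> Path:
    """Copy a file to a staging area before deleting it, so the exact
    pre-action state can be restored later."""
    staging_root.mkdir(parents=True, exist_ok=True)
    staged = staging_root / f"{int(time.time())}_{target.name}"
    shutil.copy2(target, staged)  # capture state BEFORE the destructive action
    target.unlink()               # the destructive action itself
    return staged                 # store this path with the audit entry
```

Restoring a single file is then a copy back from `staged`, without touching any backup system or the rest of the tree.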
## Layer 4: Session-scoped blast radius
One of the most effective recovery strategies is limiting how much damage any single agent session can cause — not just via approval gates, but via session-level resource scoping.
What this looks like in practice:
- Session-scoped credentials. The agent authenticates with credentials that expire when the session ends. Even if the agent goes rogue, the credentials stop working at session end. More importantly, you can revoke them mid-session if something looks wrong.
- Write quotas per session. Limit the number of destructive operations any single session can execute — e.g., max 5 `DELETE` statements, max 3 service restarts. Hitting the quota requires human intervention to continue.
- Namespace isolation. When possible, give the agent access to a staging namespace or workspace, not production directly. Changes are promoted by a human (or a separate review step), not executed in-place.
The session boundary becomes a natural rollback unit. If a session goes wrong, you know exactly the window to audit, and revoking session credentials stops any in-flight actions.
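A write quota can be enforced with a small per-session counter. This sketch assumes the agent harness calls `charge()` before each destructive operation; the operation names and limits are illustrative.

```python
# Hypothetical per-session budgets for destructive operation types.
DESTRUCTIVE_QUOTAS = {"delete": 5, "restart": 3}

class QuotaExceeded(Exception):
    """Raised when a session exhausts its budget; a human must resume it."""

class SessionQuota:
    def __init__(self, quotas: dict):
        self.quotas = dict(quotas)
        self.used = {op: 0 for op in quotas}

    def charge(self, op: str) -> None:
        """Record one destructive operation of type `op`; non-destructive
        operations (not in the quota table) pass through uncounted."""
        if op in self.quotas:
            self.used[op] += 1
            if self.used[op] > self.quotas[op]:
                raise QuotaExceeded(
                    f"session quota for '{op}' exhausted; human review required"
                )
```

The exception is the point: hitting a quota halts the session loudly instead of letting an over-eager agent keep deleting.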
## Layer 5: Compensating action library
Build a library of known rollback procedures alongside your whitelist. For each approved command pattern, document:
- What it does
- Its reversibility class
- The compensating action (if any)
- Conditions under which rollback is safe
Example entry:
```yaml
command_pattern: "kubectl scale deployment/* --replicas=*"
reversibility: "trivially_reversible"
compensating_action: "kubectl scale deployment/{name} --replicas={previous_count} -n {namespace}"
rollback_safe: "always"
pre_capture: "kubectl get deployment/{name} -n {namespace} -o json"
notes: "Capture previous replica count from pre_capture output before approving"
```
This library serves two purposes: it documents institutional knowledge about rollbacks before you need it (not during an incident), and it enables semi-automated recovery — the on-call engineer doesn't have to figure out the right kubectl syntax at 2 AM under pressure.
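Given an entry like the one above, semi-automated recovery can be as simple as filling the compensating-action template with the pre-captured values. A sketch (the entry and captured field names mirror the YAML example; a real implementation would parse `previous_count` out of the `pre_capture` output):

```python
# Hypothetical library entry, matching the YAML example above.
ENTRY = {
    "command_pattern": "kubectl scale deployment/* --replicas=*",
    "compensating_action": (
        "kubectl scale deployment/{name} --replicas={previous_count} -n {namespace}"
    ),
}

def render_rollback(entry: dict, captured: dict) -> str:
    """Fill the compensating-action template with pre-capture values so
    the on-call engineer gets a ready-to-run command."""
    return entry["compensating_action"].format(**captured)
```

The on-call engineer reviews and runs the rendered command instead of reconstructing kubectl syntax from memory mid-incident.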
## When the agent makes a partial change
The hardest recovery scenario isn't "the agent deleted something." It's "the agent ran 8 commands, 6 succeeded, 2 failed, and now the system is in an inconsistent state."
This happens more often than teams expect because AI agents are optimistic — they often proceed through a sequence of steps even after partial failures, especially when errors are soft (non-zero exit code but no exception thrown).
Mitigations:
- Fail-fast sessions. Configure the agent to halt the session and await human review on any non-zero exit code, not just exceptions. The cost is more interruptions; the benefit is catching partial failures before they compound.
- Idempotent command design. Where possible, prefer commands that are safe to re-run: `kubectl apply` over `kubectl create`, `INSERT OR IGNORE` over plain `INSERT`, Terraform `apply` over imperative API calls.
- Session replay from last-known-good. Use the audit log to identify the last fully successful action, then define a recovery target state. Recovery becomes: undo everything after that point and try again (with fixes).
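The fail-fast behavior described above can be sketched as a wrapper that stops a command sequence at the first non-zero exit code instead of continuing optimistically. Illustrative only; a real harness would also write each result to the audit log and page a reviewer.

```python
import subprocess

def run_session(commands: list[str]) -> list[dict]:
    """Execute a command sequence fail-fast: halt on the first non-zero
    exit code so partial failures can't compound into an inconsistent
    state. Returns a result record per command actually attempted."""
    results = []
    for cmd in commands:
        proc = subprocess.run(cmd, shell=True, capture_output=True, text=True)
        results.append({
            "command": cmd,
            "exit_code": proc.returncode,
            "output": proc.stdout + proc.stderr,
        })
        if proc.returncode != 0:
            break  # stop here and await human review
    return results
```

Note the trade-off the text describes: this halts on soft errors (non-zero exit, no exception), which means more interruptions but fewer half-applied changes.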
## The recovery runbook
Most incidents are stressful not because recovery is technically hard, but because people are confused about what to do first. Having a documented runbook — even a short one — eliminates that confusion.
Minimal AI agent incident runbook:
1. Stop the agent session immediately. Revoke credentials, kill the process, or put the queue in deny-all mode. Stop the bleeding before you assess damage.
2. Pull the session audit log. Export all commands from this session. You need the complete picture, not just the last command.
3. Identify the causal action. Work backwards from the symptom. Which command correlates with when the problem started?
4. Assess reversibility. What class of action was it? What state was captured before it ran?
5. Execute the compensating action or restore from snapshot. Use the compensating action library. Don't improvise.
6. Verify system health. Confirm the service/system is back to its expected state before bringing the agent back online.
7. Root cause and improve. Why did the approval gate miss this? Adjust risk scoring, add whitelist rules, or add pre-conditions to the relevant command patterns.
Keep this short. Long runbooks don't get read during incidents.
## Prevention is still cheaper than recovery
Everything above is about recovering faster. It's important. But recovery is always more expensive than prevention — in engineer time, in customer impact, in trust.
The right posture is: invest enough in prevention that recovery is rare, and invest enough in recovery that when it happens, it's boring instead of traumatic.
A well-designed approval gate catches the 95% of cases that would cause problems. Snapshots, session scoping, and a recovery runbook handle the 5% that slip through. Together, they mean that running AI agents in production is a manageable risk — not a gamble.
## What Expacti helps with
Expacti's approval system generates a tamper-evident audit log of every command your AI agents run, including who approved it, when, and what the agent returned. When something goes wrong, you have the forensic record to reconstruct exactly what happened — and a structured export for your incident post-mortem.
Risk scoring classifies commands by reversibility, so high-risk destructive actions get the scrutiny they deserve before execution. Session-scoped credentials mean every agent run is bounded and revocable.
Try the interactive demo to see how command-level visibility works in practice, or get started to connect your first agent.