Observability for AI Agents: What to Log, What to Alert On
Your AI agent ran a command at 3 AM. It completed successfully. No alerts fired. Was that good?
Maybe. Or maybe the agent was quietly exfiltrating data, cleaning up evidence of a misconfiguration, or drifting scope in a way that will only matter in three weeks. Traditional observability — CPU, error rates, latency — won't tell you which.
AI agents need a different observability model. Not instead of infrastructure metrics, but layered on top of them. This post breaks down what that model looks like.
Why Standard Observability Misses the Point
Conventional o11y answers: Did it run? Did it fail? Was it slow? For AI agents, the more important questions are:
- Did it do what it was supposed to do?
- Did it do anything it wasn't supposed to do?
- Could a human reconstruct what happened and why?
An agent that calls rm -rf /var/log/old/ and returns exit code 0 looks
perfect to a metrics dashboard. But if nobody approved that deletion, if the logs
contained an active audit trail, or if the path was slightly wrong — you want to know.
The Four Layers of Agent Observability
Layer 1: Command Telemetry
Every command an agent issues should be logged with enough context to reconstruct the decision. Minimum fields:
- command — full command string (scrubbed for secrets)
- session_id — links to the task/conversation that spawned it
- agent_id — which agent, which model version
- risk_score — computed score at time of submission
- whitelist_matched — was it on the approved list?
- decision — auto-approved / queued / denied
- reviewer_id — if human-reviewed, who approved or denied it
- latency_ms — time from submission to execution start
- exit_code — outcome
- output_bytes — rough sense of data volume (not the data itself)
Note output_bytes — not the full output. Full output storage is expensive and creates its own data security problem. But the volume is a useful signal: an agent reading 50MB from a database table when it normally reads a few KB is worth flagging.
Layer 2: Approval Flow Metrics
The approval pipeline is itself a system worth instrumenting:
| Metric | Why It Matters | Alert Threshold (example) |
|---|---|---|
| Queue depth | Approval backlog building up | > 10 pending |
| Review latency p50/p95 | Slowdown causing agent stalls | p95 > 5 min |
| Auto-approval rate | Sudden spike = whitelist too permissive | spike > +20% |
| Deny rate | Sudden spike = agent drift or prompt injection | spike > +15% |
| Timeout rate | Reviewer unavailable; commands auto-denied | > 5% of queue |
| Same reviewer approving own agent | Conflict of interest / policy violation | any |
Auto-approval rate is particularly subtle. A drop in human review might look like efficiency — the whitelist is doing its job. But it can also mean the whitelist has drifted to being too permissive. Tracking this over time, not just in absolute terms, surfaces the drift.
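One way to track that drift, as a minimal sketch. The window contents and the 20-point absolute delta are assumptions to tune per environment:

```python
from collections import Counter

def auto_approval_rate(decisions: list[str]) -> float:
    """Fraction of decisions in a window that were auto-approved."""
    if not decisions:
        return 0.0
    return Counter(decisions)["auto-approved"] / len(decisions)

def whitelist_drift_alert(baseline: list[str], current: list[str],
                          max_delta: float = 0.20) -> bool:
    """Alert when the current window's auto-approval rate exceeds the
    trailing baseline window's rate by more than max_delta (absolute)."""
    return auto_approval_rate(current) - auto_approval_rate(baseline) > max_delta
```

Comparing a current window to a trailing baseline, rather than to a fixed rate, is what surfaces gradual whitelist drift.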
Layer 3: Anomaly Signals
Raw command logs don't tell you when behavior is unusual. You need baselines and deviation detection:
- Command entropy: How diverse are the commands this agent runs in a session? A coding agent that suddenly issues 10 network commands has deviated from its typical pattern.
- Time-of-day distribution: Most agents run during business hours or scheduled windows. A burst at 3 AM could be a cron job, or it could be something else.
- Target directory pattern: An agent that normally writes to /app/build/ and suddenly starts reading from /etc/ or ~/.ssh/ is doing something it hasn't done before.
- New command vocabulary: Commands the agent has never issued before in a given role are worth extra scrutiny, regardless of risk score.
- Deny-then-retry patterns: An agent that gets denied and immediately issues a reformulated version of the same command is either confused or probing.
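The deny-then-retry signal in particular is cheap to detect. A minimal sketch using string similarity; the 0.8 threshold is an assumption:

```python
from difflib import SequenceMatcher

def is_retry_after_deny(denied_cmd: str, next_cmd: str,
                        threshold: float = 0.8) -> bool:
    """Flag a reformulated retry: the command issued right after a denial
    closely resembles the command that was just denied."""
    return SequenceMatcher(None, denied_cmd, next_cmd).ratio() >= threshold
```

Character-level similarity catches trivial reformulations (added slashes, reordered flags); semantically equivalent but textually different commands would need a richer comparison.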
Layer 4: Session-Level Context
Individual commands exist in the context of sessions. Session-level metrics give you a different view:
- Session duration — unusually long sessions can mean runaway tasks
- Commands per session — high count can mean task scope creep
- Session risk score — aggregate of all command scores; a session trending high-risk deserves a look even if no single command crossed a threshold
- Session outcome — completed / abandoned / killed; abandoned sessions may leave partial state
Session playback (terminal recording) ties all of this together. When you're investigating an incident, you want to replay exactly what happened, in order, with approval decisions timestamped alongside the commands.
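A sketch of the session risk aggregate. The thresholds are assumptions; the point is that the session-level threshold sits below the per-command one, so a session of uniformly borderline commands still gets flagged:

```python
PER_COMMAND_ALERT = 0.9   # assumed per-command alert threshold
SESSION_ALERT = 0.7       # assumed (lower) session-level threshold

def session_risk(command_scores: list[float]) -> tuple[float, bool]:
    """Mean risk across a session. Every command can sit below
    PER_COMMAND_ALERT while the session mean still trips SESSION_ALERT."""
    if not command_scores:
        return 0.0, False
    mean = sum(command_scores) / len(command_scores)
    return mean, mean >= SESSION_ALERT
```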
What to Alert On (vs. What to Just Log)
Not everything worth logging is worth alerting on. Alert fatigue kills the signal. Here's a practical split:
| Event | Action | Rationale |
|---|---|---|
| Command denied (single) | Log only | Normal; reviewer judgment |
| 3+ denials in one session | Alert | Pattern, not noise |
| CRITICAL risk score command | Alert + require 2-reviewer | High blast radius |
| New agent identity seen | Alert | Unregistered agents are unknown risk |
| Agent accesses secrets path | Alert | High sensitivity target |
| Approval latency spike | Alert (ops) | Reviewers may be unavailable |
| Auto-approval rate delta > 20% | Alert (security) | Whitelist drift |
| Failed auth on agent token | Alert immediately | Credential leak / probe |
| Deny-then-retry (same command) | Alert | Probing behavior |
| Session > N hours | Alert + notify owner | Runaway task risk |
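A sketch of the routing logic for a few rows of the table above; the event shape and the three-denial cutoff are assumptions:

```python
def route_event(event: dict) -> str:
    """Map an event to an action per the table: log-only for single
    denials, alert on patterns, escalate CRITICAL-risk commands."""
    if event.get("risk") == "CRITICAL":
        return "alert+require-2-reviewers"
    if event.get("type") == "denial":
        # A single denial is normal reviewer judgment; three is a pattern.
        return "alert" if event.get("denials_in_session", 0) >= 3 else "log"
    if event.get("type") == "deny_then_retry":
        return "alert"
    return "log"
```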
The Audit Trail Your Compliance Team Actually Wants
Security and observability converge in the audit trail. For compliance purposes (SOC 2, ISO 27001, HIPAA), you need to be able to answer:
- Who authorized this action? (reviewer name + timestamp)
- What was the stated purpose? (task context)
- What was the risk assessment at the time? (risk score, anomaly flags)
- What was the outcome? (exit code, output summary)
- Was this within policy? (whitelist match, org policy version)
This is structurally different from a system log. It's a decision record, not just an event record. The whitelist isn't just a filter — it's documented policy. The approval isn't just a gate — it's an authorization record.
Design your log schema with this in mind from the start. Adding it later means retrofitting or correlating across disparate systems.
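A sketch of a decision record answering the five questions above; all field names and values are illustrative, not a mandated format:

```python
import json

decision_record = {
    # Who authorized it, and when (hypothetical reviewer ID).
    "authorized_by": {"reviewer_id": "r-042", "timestamp": "2025-01-07T03:12:09Z"},
    # Stated purpose, linked to task context.
    "purpose": "task-1187: rotate stale build artifacts",
    # Risk assessment at the time of the decision.
    "risk_assessment": {"risk_score": 0.42, "anomaly_flags": ["off-hours"]},
    # Outcome: exit code and output volume, not the output itself.
    "outcome": {"exit_code": 0, "output_bytes": 2048},
    # Was it within policy, and which policy version applied.
    "policy": {"whitelist_matched": True, "policy_version": "2025-01"},
}

audit_line = json.dumps(decision_record, sort_keys=True)
```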
Practical Implementation Notes
Where to Store Agent Logs
Agent audit logs should be separate from application logs and write-protected from the agent itself. An agent that can delete its own logs is an agent that can cover its tracks — intentionally or through a prompt injection attack.
At minimum: append-only storage with a separate access key. Better: ship to an external SIEM in real time so the logs survive even if the machine is compromised.
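A minimal sketch of the append-only write. The filesystem-level protection (a separate Unix user, or chattr +a on Linux) is assumed to be configured outside this code; the agent's own credentials should never hold write access to the path:

```python
import os

def append_audit_line(path: str, line: str) -> None:
    """O_APPEND positions every write at the current end of file, so
    a writer cannot overwrite or truncate earlier records this way."""
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o640)
    try:
        os.write(fd, (line + "\n").encode())
    finally:
        os.close(fd)
```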
Secrets in Logs
Before logging any command string, run it through a scrubber that redacts:
- Patterns matching --password=, -p, AWS_SECRET, etc.
- Base64 blobs above a certain length (likely encoded secrets)
- Known secret formats (AWS keys, GitHub PATs, etc.)
Log the scrubbed version, flag that scrubbing occurred, and preserve the original (encrypted) only if your threat model requires it and you have the key management infrastructure for it.
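A sketch of such a scrubber. The regexes are illustrative starting points, not a complete secret taxonomy:

```python
import re

# Illustrative patterns; extend and tune for your environment.
SECRET_PATTERNS = [
    re.compile(r"(--password=|-p\s+)\S+"),       # CLI password flags
    re.compile(r"AWS_SECRET[A-Z_]*\s*=\s*\S+"),  # AWS secret env vars
    re.compile(r"AKIA[0-9A-Z]{16}"),             # AWS access key IDs
    re.compile(r"ghp_[A-Za-z0-9]{36}"),          # GitHub personal access tokens
    re.compile(r"[A-Za-z0-9+/]{40,}={0,2}"),     # long base64 blobs
]

def scrub(command: str) -> tuple[str, bool]:
    """Return (scrubbed_command, was_scrubbed)."""
    scrubbed = command
    for pattern in SECRET_PATTERNS:
        scrubbed = pattern.sub("[REDACTED]", scrubbed)
    return scrubbed, scrubbed != command
```

The boolean flag is what lets you log "scrubbing occurred" alongside the sanitized string.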
Correlation IDs
Every command should carry IDs that let you join across systems:
session_id → task_id → command_id → approval_id.
When you're debugging an incident at 2 AM, you'll be grateful you can pull the
full chain with a single query.
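With a JSON-lines log carrying those IDs, the chain pull is a filter and a sort, sketched here with assumed field names:

```python
def pull_chain(log_lines: list[dict], session_id: str) -> list[dict]:
    """All records for one session, ordered by timestamp: commands,
    approvals, and outcomes interleaved as they occurred."""
    return sorted((l for l in log_lines if l.get("session_id") == session_id),
                  key=lambda l: l["ts"])
```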
The Honest Limitation
Even perfect observability is retrospective. You can see what happened and alert on it, but by the time you're reading the alert, the command has already run.
That's why observability is necessary but not sufficient. It works alongside approval gates, not instead of them. The approval gate is the prospective control. The audit trail is the retrospective control. You need both.
Think of it this way: a smoke detector is not a fire suppression system. You still need sprinklers.
Summary
Effective AI agent observability has four layers:
- Command telemetry — full context on every command issued
- Approval flow metrics — health of the human-in-the-loop pipeline
- Anomaly signals — behavioral baselines and deviation detection
- Session context — aggregate view across a full task lifecycle
Layer your alerts carefully — log everything, alert on patterns, not individual events. Design your audit schema as a decision record, not just an event log. And keep the logs out of the agent's reach.
The goal isn't to watch every move. It's to be able to answer "what happened and why?" in 10 minutes — whether for a security incident, a compliance audit, or just a confused teammate asking why the deploy agent deleted that file.
expacti gives you this out of the box
Structured audit logs, approval flow metrics, 8-rule anomaly detection, session playback, and JSON/CSV export for compliance. Every command is logged with its risk score, reviewer decision, and session context.
Get started free →