agent-debugger
Record, replay, and debug LLM agent runs. Use when you need to: capture agent step sequences (tool calls, LLM calls, events), replay a run step-by-step, diff two runs to find what changed between a passing and failing run, search across all recorded runs, or inspect full context windows. Triggers include "debug agent run", "record agent steps", "replay agent", "why did my agent fail", "compare agent runs", "inspect context window", or any task requiring visibility into LLM agent execution.
Best use case
agent-debugger is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using agent-debugger should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/agent-debugger/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
How agent-debugger Compares
| Feature / Agent | agent-debugger | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# agent-debugger
Self-hosted tool for recording and debugging LLM agent runs. Capture every step via a simple HTTP API, then explore runs in a dashboard: replay step-by-step, diff two runs, search across all steps, and inspect full context windows.
## When to use
- Debugging why an agent run failed
- Comparing behavior between two agent versions
- Inspecting the full context window at each LLM call
- Searching for specific tool calls or error messages across many runs
- Tracking token usage per step over time
## Prerequisites
- Node.js 20+
- pnpm
- `AGENT_ENCRYPTION_KEY` set (32-byte hex)
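
A 32-byte hex value is 32 random bytes written as 64 hexadecimal characters. One way to produce such a value is sketched below with Python's standard `secrets` module. The helper name `new_encryption_key` is illustrative and not part of agent-debugger; any cryptographically secure generator should work equally well.

```python
import secrets


def new_encryption_key() -> str:
    """Return 32 random bytes encoded as 64 lowercase hex characters."""
    return secrets.token_hex(32)


if __name__ == "__main__":
    # Copy the printed line into your .env file.
    print(f"AGENT_ENCRYPTION_KEY={new_encryption_key()}")
```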
## Quick Start
```bash
pnpm install
cp .env.example .env
# Edit .env: set AGENT_ENCRYPTION_KEY
pnpm start
# Dashboard: http://localhost:4201
```
## Instrumenting Your Agent
### Python example
```python
import requests

DEBUGGER_URL = "http://localhost:4200"
INGEST_TOKEN = "your-ingest-token"  # from Settings

headers = {
    "Authorization": f"Bearer {INGEST_TOKEN}",
    "Content-Type": "application/json",
}


def start_run(agent_name, name=None):
    """Create a run and return its run_id."""
    resp = requests.post(f"{DEBUGGER_URL}/api/ingest/run",
                         headers=headers,
                         json={"agent": agent_name, "name": name})
    resp.raise_for_status()  # surface HTTP errors (e.g. 401) before reading the body
    return resp.json()["run_id"]


def record_step(run_id, step_type, name, input_data, output_data, duration_ms):
    """Record a single step of a run."""
    requests.post(f"{DEBUGGER_URL}/api/ingest/step",
                  headers=headers,
                  json={
                      "run_id": run_id,
                      # "llm_call" | "tool_call" | "tool_result" | "sub_agent" | "event"
                      "type": step_type,
                      "name": name,
                      "input": input_data,
                      "output": output_data,
                      "duration_ms": duration_ms,
                  })


def complete_run(run_id, total_tokens, duration_ms):
    """Mark a run as complete."""
    requests.post(f"{DEBUGGER_URL}/api/ingest/run/{run_id}/complete",
                  headers=headers,
                  json={"total_tokens": total_tokens, "duration_ms": duration_ms})


def fail_run(run_id, error, duration_ms):
    """Mark a run as failed."""
    requests.post(f"{DEBUGGER_URL}/api/ingest/run/{run_id}/fail",
                  headers=headers,
                  json={"error": error, "duration_ms": duration_ms})


# Usage in agent loop (`llm` and `messages` come from your own agent code)
run_id = start_run("my-agent-v2", name="Research: climate policy 2024")
try:
    response = llm.call(messages)
    record_step(run_id, "llm_call", "gpt-4o", {"messages": messages}, response, 1240)
    # ... more steps
    complete_run(run_id, total_tokens=18420, duration_ms=12400)
except Exception as e:
    fail_run(run_id, str(e), duration_ms=5000)
```
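
The example above hard-codes durations and only marks the run failed if the `except` branch is written correctly each time. A context manager can take care of both. The sketch below is one possible wrapper, not part of agent-debugger: `recorded_run` and the `post(path, body)` callable are hypothetical names, where `post` is assumed to send a JSON POST to the debugger and return the decoded response.

```python
import time
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def recorded_run(post, agent, name=None):
    """Start a run, then mark it complete or failed when the block exits.

    `post(path, body)` is a hypothetical transport callable that sends a
    JSON POST to the debugger and returns the decoded response body.
    """
    started = time.monotonic()
    run_id = post("/api/ingest/run", {"agent": agent, "name": name})["run_id"]
    # The caller updates total_tokens while the run is in progress.
    run = SimpleNamespace(run_id=run_id, total_tokens=0)

    def elapsed_ms():
        return int((time.monotonic() - started) * 1000)

    try:
        yield run
    except Exception as exc:
        post(f"/api/ingest/run/{run_id}/fail",
             {"error": str(exc), "duration_ms": elapsed_ms()})
        raise  # re-raise so the caller still sees the original error
    else:
        post(f"/api/ingest/run/{run_id}/complete",
             {"total_tokens": run.total_tokens, "duration_ms": elapsed_ms()})
```

Inside `with recorded_run(post, "my-agent-v2") as run:` the agent loop records steps against `run.run_id` and adds to `run.total_tokens`; the measured wall-clock duration is sent on exit.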
### Node.js / TypeScript example
```typescript
const BASE = "http://localhost:4200";
const TOKEN = process.env.AGENT_INGEST_TOKEN;

// POST a JSON body to an ingest endpoint; throw on a non-2xx response.
async function ingest(path: string, body: object) {
  const res = await fetch(`${BASE}${path}`, {
    method: "POST",
    headers: { "Authorization": `Bearer ${TOKEN}`, "Content-Type": "application/json" },
    body: JSON.stringify(body)
  });
  if (!res.ok) throw new Error(`${path} failed with HTTP ${res.status}`);
}

// Start a run. This call needs the response body, so it uses fetch directly.
const run = await fetch(`${BASE}/api/ingest/run`, {
  method: "POST",
  headers: { "Authorization": `Bearer ${TOKEN}`, "Content-Type": "application/json" },
  body: JSON.stringify({ agent: "my-agent", name: "test run" })
}).then(r => r.json());
const runId = run.run_id;

// `messages` and `completion` come from your own agent code.
await ingest("/api/ingest/step", {
  run_id: runId, type: "llm_call", name: "gpt-4o",
  input: { messages }, output: completion, duration_ms: 1100
});

await ingest(`/api/ingest/run/${runId}/complete`, { total_tokens: 4200, duration_ms: 3800 });
```
## API Reference
| Endpoint | Description |
|---|---|
| `POST /api/ingest/run` | Start a new run |
| `POST /api/ingest/step` | Record a step |
| `POST /api/ingest/run/:id/complete` | Mark run complete |
| `POST /api/ingest/run/:id/fail` | Mark run failed |
| `GET /api/runs` | List runs with filter/search |
| `GET /api/runs/:id` | Run details |
| `GET /api/runs/:id/steps` | All steps for a run |
| `GET /api/steps/:id` | Single step |
| `POST /api/runs/:id/replay` | Create replay session |
| `GET /api/replay/:session/events` | SSE replay stream |
| `POST /api/replay/:session/control` | play, pause, next, prev, jump |
| `GET /api/diff?a=id&b=id` | Diff two runs |
| `GET /api/search?q=...` | Full-text search |
| `GET /api/settings` | Get settings |
| `PATCH /api/settings` | Update settings |
| `GET /health` | Health check |
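
The read endpoints take their arguments in the path or query string. As a minimal sketch, the helpers below build correctly escaped URLs for three of them. The function names are illustrative, the base URL assumes the default `AGENT_PORT`, and the response shapes are not documented here, so the sketch stops at URL construction.

```python
from urllib.parse import quote, urlencode

BASE = "http://localhost:4200"  # assumes the default AGENT_PORT


def diff_url(run_a: str, run_b: str, base: str = BASE) -> str:
    """URL for GET /api/diff?a=id&b=id."""
    return f"{base}/api/diff?{urlencode({'a': run_a, 'b': run_b})}"


def search_url(query: str, base: str = BASE) -> str:
    """URL for GET /api/search?q=..., with the query string escaped."""
    return f"{base}/api/search?{urlencode({'q': query})}"


def steps_url(run_id: str, base: str = BASE) -> str:
    """URL for GET /api/runs/:id/steps."""
    return f"{base}/api/runs/{quote(run_id, safe='')}/steps"
```

For example, `requests.get(diff_url(passing_id, failing_id))` would fetch the diff between a passing and a failing run. Whether these endpoints also expect the bearer token is not stated above.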
## Step Types
| Type | When to use |
|---|---|
| `llm_call` | Any call to an LLM. Input = messages array. Output = completion. |
| `tool_call` | A tool being invoked. Input = tool arguments. |
| `tool_result` | Tool call result. Input = original args. Output = result. |
| `sub_agent` | Spawning a sub-agent. Include sub run_id in metadata. |
| `event` | Any other notable event. Input = description. |
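
A small builder can enforce the table above on the client side before anything is sent. The sketch below is illustrative rather than part of agent-debugger: `make_step` is a hypothetical helper, and the top-level `metadata` key is an assumption based on the `sub_agent` row, since the exact field name is not shown in the examples.

```python
STEP_TYPES = {"llm_call", "tool_call", "tool_result", "sub_agent", "event"}


def make_step(run_id, step_type, name, input_data, output_data=None,
              duration_ms=None, metadata=None):
    """Build the JSON body for POST /api/ingest/step, validating the type."""
    if step_type not in STEP_TYPES:
        raise ValueError(f"unknown step type: {step_type!r}")
    # llm_call input is expected to carry a messages array.
    if step_type == "llm_call":
        messages = input_data.get("messages") if isinstance(input_data, dict) else None
        if not isinstance(messages, list):
            raise ValueError('llm_call input must look like {"messages": [...]}')
    step = {
        "run_id": run_id,
        "type": step_type,
        "name": name,
        "input": input_data,
        "output": output_data,
        "duration_ms": duration_ms,
    }
    if metadata is not None:
        step["metadata"] = metadata  # assumed field name, e.g. for a sub run_id
    return step
```

Rejecting a malformed `llm_call` input early avoids recording steps that the context viewer cannot display.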
## Environment Variables
| Variable | Description | Default |
|---|---|---|
| `AGENT_PORT` | Server port | 4200 |
| `AGENT_DASHBOARD_PORT` | Dashboard port | 4201 |
| `AGENT_DATA_DIR` | SQLite directory | ~/.agent-debugger |
| `AGENT_ENCRYPTION_KEY` | 32-byte hex for settings encryption | required |
| `AGENT_MAX_STEP_SIZE` | Max step payload bytes | 1048576 |
| `AGENT_MAX_STEPS_PER_RUN` | Max steps per run | 5000 |
| `AGENT_INGEST_TOKEN` | Bearer token for ingest endpoints | (empty = no auth) |
| `AGENT_LOG_LEVEL` | debug, info, warn, error | info |
| `AGENT_DEV` | Dev mode | 0 |
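
If the instrumented agent runs on the same machine and shares the server's environment, it can derive its client settings from the same variables. The sketch below makes that assumption explicit. `client_config` is a hypothetical helper, and leaving out the `Authorization` header for an empty token follows the "(empty = no auth)" default in the table.

```python
import os


def client_config(env=os.environ):
    """Return (base_url, headers) for the ingest API from environment values."""
    port = env.get("AGENT_PORT", "4200")
    token = env.get("AGENT_INGEST_TOKEN", "")
    headers = {"Content-Type": "application/json"}
    if token:  # an empty token means the server does not require auth
        headers["Authorization"] = f"Bearer {token}"
    return f"http://localhost:{port}", headers
```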
## Troubleshooting
### Ingest returns 401
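The bearer token is missing or does not match the server's ingest token. Send the correct token from your agent code as `Authorization: Bearer <token>`. The token is configured in Settings (see also `AGENT_INGEST_TOKEN` under Environment Variables).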
### Step payload rejected with 413
Your step input/output exceeds `AGENT_MAX_STEP_SIZE` (default 1MB). Truncate large tool results before recording.
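
One way to truncate before recording is sketched below. It assumes the limit applies to the JSON-encoded step and that the oversized part is a string `output`. Both are assumptions, as is the helper name `shrink_step`. The halving loop is deliberately coarse: it keeps cutting until the encoded step fits.

```python
import json

DEFAULT_MAX_STEP_SIZE = 1_048_576  # default AGENT_MAX_STEP_SIZE, in bytes


def json_size(value) -> int:
    """Size in bytes of the default json.dumps encoding of value."""
    return len(json.dumps(value).encode("utf-8"))


def shrink_step(step: dict, max_bytes: int = DEFAULT_MAX_STEP_SIZE,
                marker: str = "...[truncated]") -> dict:
    """Return step unchanged if it fits, else a copy with a shortened output."""
    if json_size(step) <= max_bytes:
        return step
    output = step.get("output")
    if not isinstance(output, str):
        raise ValueError("this sketch only truncates string outputs")
    shrunk = dict(step)  # shallow copy; the original step is left untouched
    keep = len(output)
    while keep > 0:
        keep //= 2  # halve the kept prefix until the encoded step fits
        shrunk["output"] = output[:keep] + marker
        if json_size(shrunk) <= max_bytes:
            return shrunk
    raise ValueError("step exceeds the limit even with an empty output")
```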
### Replay stream disconnects
SSE connections time out after inactivity. Reconnect and use `jump(n)` to resume from where you left off.
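
To know which step to jump back to, the client has to keep track of the events it has already received. The sketch below parses the standard Server-Sent Events wire format (`field: value` lines, with a blank line ending each event) from any iterable of text lines. The content of agent-debugger's replay events and the request body for the control endpoint are not documented here, so the parser returns raw fields only and `iter_sse` is an illustrative name.

```python
def iter_sse(lines):
    """Yield one dict of fields per SSE event from an iterable of text lines."""
    event, data = {}, []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:  # a blank line terminates the current event
            if data:
                event["data"] = "\n".join(data)
                yield event
            event, data = {}, []
            continue
        if line.startswith(":"):  # comment line, often used as a keep-alive
            continue
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space is not part of the value
        if field == "data":
            data.append(value)  # multiple data lines are joined with newlines
        else:
            event[field] = value
```

With `requests`, the lines could come from `requests.get(url, stream=True)` followed by `iter_lines(decode_unicode=True)` after setting `resp.encoding = "utf-8"`. Counting the yielded events gives the position to pass to `jump` after reconnecting.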
### Context viewer shows blank
The step input was not recorded as a messages array. Ensure `input` contains `{"messages": [...]}` for `llm_call` steps.