agent-debugger

Record, replay, and debug LLM agent runs. Use when you need to: capture agent step sequences (tool calls, LLM calls, events), replay a run step-by-step, diff two runs to find what changed between a passing and failing run, search across all recorded runs, or inspect full context windows. Triggers include "debug agent run", "record agent steps", "replay agent", "why did my agent fail", "compare agent runs", "inspect context window", or any task requiring visibility into LLM agent execution.


by heldernoid


Best use case

agent-debugger is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using agent-debugger should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

curl -o ~/.claude/skills/agent-debugger/SKILL.md --create-dirs "https://raw.githubusercontent.com/heldernoid/agentic-build-templates/main/projects/ai-llm-tools/agent-debugger/skills/agent-debugger/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/agent-debugger/SKILL.md inside your project
Restart your AI agent so it auto-discovers the skill


Frequently Asked Questions

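What does this skill do?

It records, replays, and debugs LLM agent runs. Your agent sends each step (LLM calls, tool calls, tool results, events) to a self-hosted HTTP API, and a dashboard lets you replay runs step-by-step, diff two runs, search across steps, and inspect full context windows.

Where can I find the source code?

In the heldernoid/agentic-build-templates repository on GitHub, under projects/ai-llm-tools/agent-debugger.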

SKILL.md Source

# agent-debugger

Self-hosted tool for recording and debugging LLM agent runs. Capture every step via a simple HTTP API, then explore runs in a dashboard: replay step-by-step, diff two runs, search across all steps, and inspect full context windows.

## When to use

- Debugging why an agent run failed
- Comparing behavior between two agent versions
- Inspecting the full context window at each LLM call
- Searching for specific tool calls or error messages across many runs
- Tracking token usage per step over time

## Prerequisites

- Node.js 20+
- pnpm
- `AGENT_ENCRYPTION_KEY` set to a 32-byte key, hex-encoded (64 hex characters)
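If you don't have a key yet, Python's standard `secrets` module can generate one. This is a minimal sketch that assumes the server expects exactly 64 hexadecimal characters (32 bytes, hex-encoded); if `.env.example` documents a different format, follow that instead. The helper names are illustrative, not part of agent-debugger.

```python
import secrets
import string


def generate_encryption_key() -> str:
    """Return 32 random bytes encoded as 64 lowercase hex characters."""
    return secrets.token_hex(32)


def is_valid_key(key: str) -> bool:
    """Check the assumed format: exactly 64 hex characters (32 bytes)."""
    return len(key) == 64 and all(c in string.hexdigits for c in key)


if __name__ == "__main__":
    # Paste the printed line into .env
    print(f"AGENT_ENCRYPTION_KEY={generate_encryption_key()}")
```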

## Quick Start

```bash
pnpm install
cp .env.example .env
# Edit .env: set AGENT_ENCRYPTION_KEY

pnpm start
# Dashboard: http://localhost:4201, API: http://localhost:4200
```
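Before pointing an agent at the server, it can help to wait until the API answers. The sketch below polls `GET /health` on the default `AGENT_PORT` (4200) using only the standard library. It assumes a 2xx status means "ready"; the response body of `/health` is not documented here, so it is ignored. `wait_for_health` is a hypothetical helper name.

```python
import time
import urllib.error
import urllib.request


def wait_for_health(base_url: str = "http://localhost:4200", timeout_s: float = 30.0) -> bool:
    """Poll GET {base_url}/health until it returns 2xx or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=2) as resp:
                if 200 <= resp.status < 300:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server not up yet (or non-2xx); retry after a short pause
        time.sleep(0.5)
    return False
```

Call it right after `pnpm start` (for example in a test harness) and abort if it returns `False`.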

## Instrumenting Your Agent

### Python example

```python
import os
import time

import requests

DEBUGGER_URL = "http://localhost:4200"
# Must match the ingest token configured in Settings (or AGENT_INGEST_TOKEN)
INGEST_TOKEN = os.environ.get("AGENT_INGEST_TOKEN", "your-ingest-token")

headers = {
    "Authorization": f"Bearer {INGEST_TOKEN}",
    "Content-Type": "application/json",
}


def _post(path, body):
    # Raise on 401/413/etc. instead of silently dropping steps
    resp = requests.post(f"{DEBUGGER_URL}{path}", headers=headers, json=body, timeout=10)
    resp.raise_for_status()
    return resp


def start_run(agent_name, name=None):
    return _post("/api/ingest/run", {"agent": agent_name, "name": name}).json()["run_id"]


def record_step(run_id, step_type, name, input_data, output_data, duration_ms):
    # step_type: "llm_call" | "tool_call" | "tool_result" | "sub_agent" | "event"
    # input/output must be JSON-serializable: convert SDK response objects first
    _post("/api/ingest/step", {
        "run_id": run_id,
        "type": step_type,
        "name": name,
        "input": input_data,
        "output": output_data,
        "duration_ms": duration_ms,
    })


def complete_run(run_id, total_tokens, duration_ms):
    _post(f"/api/ingest/run/{run_id}/complete",
          {"total_tokens": total_tokens, "duration_ms": duration_ms})


def fail_run(run_id, error, duration_ms):
    _post(f"/api/ingest/run/{run_id}/fail", {"error": error, "duration_ms": duration_ms})


def elapsed_ms(start):
    return int((time.monotonic() - start) * 1000)


# Usage in agent loop: `llm` and `messages` stand in for your own LLM client and prompt
run_id = start_run("my-agent-v2", name="Research: climate policy 2024")
run_start = time.monotonic()
try:
    step_start = time.monotonic()
    response = llm.call(messages)
    record_step(run_id, "llm_call", "gpt-4o", {"messages": messages}, response,
                elapsed_ms(step_start))
    # ... more steps
    complete_run(run_id, total_tokens=18420, duration_ms=elapsed_ms(run_start))
except Exception as e:
    fail_run(run_id, str(e), duration_ms=elapsed_ms(run_start))
    raise  # keep the original failure visible to the caller
```

### Node.js / TypeScript example

```typescript
const BASE = "http://localhost:4200";
const TOKEN = process.env.AGENT_INGEST_TOKEN ?? "";

// POST a JSON body to an ingest endpoint; throw on non-2xx, return parsed JSON (or null)
async function ingest(path: string, body: object): Promise<any> {
  const res = await fetch(`${BASE}${path}`, {
    method: "POST",
    headers: { "Authorization": `Bearer ${TOKEN}`, "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`${path} failed: ${res.status} ${res.statusText}`);
  const text = await res.text();
  return text ? JSON.parse(text) : null;
}

// Placeholders: replace with your own prompt and LLM completion
declare const messages: unknown[];
declare const completion: unknown;

const run = await ingest("/api/ingest/run", { agent: "my-agent", name: "test run" });
const runId = run.run_id;

await ingest("/api/ingest/step", {
  run_id: runId, type: "llm_call", name: "gpt-4o",
  input: { messages }, output: completion, duration_ms: 1100,
});

await ingest(`/api/ingest/run/${runId}/complete`, { total_tokens: 4200, duration_ms: 3800 });
```

## API Reference

| Endpoint | Description |
|---|---|
| `POST /api/ingest/run` | Start a new run |
| `POST /api/ingest/step` | Record a step |
| `POST /api/ingest/run/:id/complete` | Mark run complete |
| `POST /api/ingest/run/:id/fail` | Mark run failed |
| `GET /api/runs` | List runs with filter/search |
| `GET /api/runs/:id` | Run details |
| `GET /api/runs/:id/steps` | All steps for a run |
| `GET /api/steps/:id` | Single step |
| `POST /api/runs/:id/replay` | Create replay session |
| `GET /api/replay/:session/events` | SSE replay stream |
| `POST /api/replay/:session/control` | play, pause, next, prev, jump |
| `GET /api/diff?a=id&b=id` | Diff two runs |
| `GET /api/search?q=...` | Full-text search |
| `GET /api/settings` | Get settings |
| `PATCH /api/settings` | Update settings |
| `GET /health` | Health check |
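The read endpoints can be scripted as well, for example to pull two runs' diff into a CI log. This is a minimal standard-library sketch. It assumes the `GET` endpoints need no auth header and return JSON; the exact response shapes and any extra filter parameters for `GET /api/runs` are not documented here, so the helpers just return the decoded body. All function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

BASE = "http://localhost:4200"


def get_json(path: str, params: Optional[dict] = None, base: str = BASE):
    """GET a read endpoint and decode its JSON body."""
    url = f"{base}{path}"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def run_steps(run_id: str, base: str = BASE):
    """All steps for a run: GET /api/runs/:id/steps."""
    return get_json(f"/api/runs/{urllib.parse.quote(run_id)}/steps", base=base)


def diff_runs(run_a: str, run_b: str, base: str = BASE):
    """Diff two runs: GET /api/diff?a=...&b=..."""
    return get_json("/api/diff", {"a": run_a, "b": run_b}, base=base)


def search(query: str, base: str = BASE):
    """Full-text search: GET /api/search?q=..."""
    return get_json("/api/search", {"q": query}, base=base)
```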

## Step Types

| Type | When to use |
|---|---|
| `llm_call` | Any call to an LLM. Input = messages array. Output = completion. |
| `tool_call` | A tool being invoked. Input = tool arguments. |
| `tool_result` | Tool call result. Input = original args. Output = result. |
| `sub_agent` | Spawning a sub-agent. Include the sub-agent's run_id in metadata. |
| `event` | Any other notable event. Input = description. |
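The table translates naturally into a small payload builder that rejects unknown types and catches the `llm_call` input shape the context viewer needs. This is a sketch only: the table says to put the sub run_id "in metadata", so the `metadata` field and its `sub_run_id` key below are assumptions about the exact field names. `make_step` is a hypothetical helper.

```python
from typing import Any, Optional

STEP_TYPES = {"llm_call", "tool_call", "tool_result", "sub_agent", "event"}


def make_step(run_id: str, step_type: str, name: str, input_data: Any = None,
              output_data: Any = None, duration_ms: int = 0,
              sub_run_id: Optional[str] = None) -> dict:
    """Build a POST /api/ingest/step body following the Step Types table."""
    if step_type not in STEP_TYPES:
        raise ValueError(f"unknown step type: {step_type}")
    if step_type == "llm_call" and not (
        isinstance(input_data, dict) and isinstance(input_data.get("messages"), list)
    ):
        # The context viewer renders llm_call steps only from {"messages": [...]}
        raise ValueError('llm_call input must be {"messages": [...]}')
    step = {"run_id": run_id, "type": step_type, "name": name,
            "input": input_data, "output": output_data, "duration_ms": duration_ms}
    if step_type == "sub_agent":
        if sub_run_id is None:
            raise ValueError("sub_agent steps should reference the child run_id")
        step["metadata"] = {"sub_run_id": sub_run_id}  # field names are assumptions
    return step
```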

## Environment Variables

| Variable | Description | Default |
|---|---|---|
| `AGENT_PORT` | Server port | 4200 |
| `AGENT_DASHBOARD_PORT` | Dashboard port | 4201 |
| `AGENT_DATA_DIR` | SQLite directory | ~/.agent-debugger |
| `AGENT_ENCRYPTION_KEY` | 32-byte hex for settings encryption | required |
| `AGENT_MAX_STEP_SIZE` | Max step payload bytes | 1048576 |
| `AGENT_MAX_STEPS_PER_RUN` | Max steps per run | 5000 |
| `AGENT_INGEST_TOKEN` | Bearer token for ingest endpoints | (empty = no auth) |
| `AGENT_LOG_LEVEL` | debug, info, warn, error | info |
| `AGENT_DEV` | Dev mode | 0 |
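To catch `.env` mistakes before running `pnpm start`, a small pre-flight check can compare the file against the defaults in this table. This is a client-side sketch, not the server's own config loader: it handles only simple `KEY=VALUE` lines, and the 64-hex-character key format is an assumption. Function names are hypothetical.

```python
import string
from pathlib import Path

DEFAULTS = {
    "AGENT_PORT": "4200",
    "AGENT_DASHBOARD_PORT": "4201",
    "AGENT_MAX_STEP_SIZE": "1048576",
    "AGENT_MAX_STEPS_PER_RUN": "5000",
    "AGENT_LOG_LEVEL": "info",
    "AGENT_DEV": "0",
}
INT_VARS = ("AGENT_PORT", "AGENT_DASHBOARD_PORT", "AGENT_MAX_STEP_SIZE", "AGENT_MAX_STEPS_PER_RUN")


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def check_env(text: str) -> list:
    """Return a list of problems; an empty list means the file looks usable."""
    cfg = {**DEFAULTS, **parse_env(text)}
    problems = []
    key = cfg.get("AGENT_ENCRYPTION_KEY", "")
    if len(key) != 64 or any(c not in string.hexdigits for c in key):
        problems.append("AGENT_ENCRYPTION_KEY must be 64 hex characters (32 bytes)")
    for name in INT_VARS:
        if not cfg[name].isdigit():
            problems.append(f"{name} must be an integer")
    if cfg["AGENT_LOG_LEVEL"] not in {"debug", "info", "warn", "error"}:
        problems.append("AGENT_LOG_LEVEL must be debug, info, warn, or error")
    if not cfg.get("AGENT_INGEST_TOKEN"):
        problems.append("warning: AGENT_INGEST_TOKEN is empty, so ingest is unauthenticated")
    return problems


if __name__ == "__main__":
    for problem in check_env(Path(".env").read_text()):
        print(problem)
```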

## Troubleshooting

### Ingest returns 401

Make sure the Bearer token your agent sends matches the ingest token configured in Settings (or via `AGENT_INGEST_TOKEN`).

### Step payload rejected with 413

Your step input/output exceeds `AGENT_MAX_STEP_SIZE` (default 1048576 bytes, i.e. 1 MiB). Truncate large tool results before recording.
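One way to truncate is to measure each field's JSON size in bytes and, if it is too large, replace it with a clipped string of its JSON encoding. This sketch assumes the limit applies to the whole step body, so by default it gives each field a third of `AGENT_MAX_STEP_SIZE` to leave room for the other field and the envelope; that split is an arbitrary choice. `fit_payload` is a hypothetical helper.

```python
import json

MAX_STEP_SIZE = 1_048_576  # AGENT_MAX_STEP_SIZE default, in bytes
MARKER = "...[truncated]"


def fit_payload(value, max_bytes: int = MAX_STEP_SIZE // 3):
    """Return value unchanged if its JSON fits in max_bytes, else a clipped JSON string."""
    raw = json.dumps(value, ensure_ascii=False).encode("utf-8")
    if len(raw) <= max_bytes:
        return value
    budget = max_bytes
    while budget > 0:
        # errors="ignore" drops a multi-byte character cut in half at the boundary
        text = raw[:budget].decode("utf-8", errors="ignore") + MARKER
        # Re-encoding as a JSON string adds quotes and escapes, so re-measure
        if len(json.dumps(text, ensure_ascii=False).encode("utf-8")) <= max_bytes:
            return text
        budget -= max(1, budget // 10)
    return MARKER
```

Apply it to `input` and `output` before calling the record-step helper; the dashboard will then show the clipped text ending in the marker.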

### Replay stream disconnects

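SSE connections time out after a period of inactivity. Reconnect to `GET /api/replay/:session/events`, then send a `jump` action to `POST /api/replay/:session/control` to resume from the last step you received.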
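A scripted replay client needs two pieces: a parser for the event stream and a way to send control actions. The parser below handles the standard SSE line format (`event:`, `data:`, `id:`, `:` comments, blank-line dispatch). The control request body is not documented here, so `{"action": ..., "step": ...}` is an assumption, as is getting `session` from the response of `POST /api/runs/:id/replay`. All names are hypothetical.

```python
import json
import urllib.request
from typing import Iterable, Iterator, Optional


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield {"event", "data", "id"} dicts from raw SSE lines."""
    event, data, last_id = "message", [], None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data:  # blank line dispatches the buffered event
                yield {"event": event, "data": "\n".join(data), "id": last_id}
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment / keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)
        elif field == "id":
            last_id = value


def send_control(base: str, session: str, action: str, step: Optional[int] = None) -> None:
    """POST a replay control action; the body shape is an assumption."""
    body = {"action": action}
    if step is not None:
        body["step"] = step
    req = urllib.request.Request(
        f"{base}/api/replay/{session}/control",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10):
        pass


def follow_replay(base: str, session: str) -> Iterator[dict]:
    """Stream replay events; track the last step yourself to resume with jump."""
    with urllib.request.urlopen(f"{base}/api/replay/{session}/events") as resp:
        yield from parse_sse(b.decode("utf-8") for b in resp)
```

On disconnect, call `follow_replay` again and then `send_control(base, session, "jump", last_step)`.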

### Context viewer shows blank

The step input was not recorded as a messages array. Ensure `input` contains `{"messages": [...]}` for llm_call steps.
