swarm-attach-watchdog

Retrofits a watchdog daemon onto an existing v1 swarm without recreating it: upgrades team.json to the v2 schema and spawns the watchdog tmux session.


by stevengonsalvez


Best use case

swarm-attach-watchdog is best used when an in-flight v1 swarm keeps stalling and you want automatic stuck-pane recovery without recreating the team.


Teams using swarm-attach-watchdog should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

curl -o ~/.claude/skills/swarm-attach-watchdog/SKILL.md --create-dirs "https://raw.githubusercontent.com/stevengonsalvez/agents-in-a-box/main/toolkit/packages/skills/swarm-attach-watchdog/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/swarm-attach-watchdog/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill


Frequently Asked Questions

What does this skill do?

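It attaches the v2 watchdog daemon to an existing swarm team in place: team.json is upgraded to the v2 schema, and a <team-id>-watchdog tmux session is started to detect and nudge stalled agent panes. Leader and worker sessions keep running with their context intact.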

Where can I find the source code?

The source lives in the stevengonsalvez/agents-in-a-box repository on GitHub, at toolkit/packages/skills/swarm-attach-watchdog/SKILL.md.

SKILL.md Source

# /swarm-attach-watchdog

Attach the v2 watchdog daemon to an existing swarm team without recreating it.
Use this to recover an in-flight swarm whose agents keep stalling because there
was no automatic stuck-pane detection in v1.

## Usage

```bash
/swarm-attach-watchdog <team-id> [--provider <claude|codex|copilot|generic>] [--tick-min <N>] [--verify-cmd <cmd>]
```

| Argument | Required | Default | Description |
|----------|----------|---------|-------------|
| `<team-id>` | Yes | - | Existing swarm team id (e.g. `swarm-1778723020`) |
| `--provider` | No | ask | Agent runtime running in the panes: `claude`, `codex`, `copilot`, or `generic`. Selects the spinner regex used for stuck detection; if omitted, you are prompted. |
| `--tick-min` | No | 5 | Watchdog tick interval in minutes (min 1, max 60) |
| `--verify-cmd` | No | auto-detected | Override the auto-detected verify command. Auto-detect maps Cargo.toml→`cargo test --workspace --no-fail-fast`, package.json→`npm test`, pyproject.toml→`pytest`, go.mod→`go test ./...`, Makefile→`make test`, etc. |
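
The auto-detection in the `--verify-cmd` row maps a marker file in the project root to a test command. A minimal sketch of that mapping follows. `detect_verify_cmd` is a hypothetical stand-in for `swarm-lib.sh detect-verify-cmd`: it covers only the five mappings listed in the table (not the unlisted "etc." cases), and the precedence order when several marker files coexist is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical re-implementation of verify-command auto-detection.
# Not the shipped swarm-lib.sh code; first matching marker file wins (assumed order).
detect_verify_cmd() {
  local dir="${1:-$PWD}"
  if   [[ -f "$dir/Cargo.toml" ]];     then echo "cargo test --workspace --no-fail-fast"
  elif [[ -f "$dir/package.json" ]];   then echo "npm test"
  elif [[ -f "$dir/pyproject.toml" ]]; then echo "pytest"
  elif [[ -f "$dir/go.mod" ]];         then echo "go test ./..."
  elif [[ -f "$dir/Makefile" ]];       then echo "make test"
  else
    # No known marker file: the caller should require an explicit --verify-cmd.
    return 1
  fi
}
```

With this ordering, a Rust workspace that also ships a Makefile gets `cargo test --workspace --no-fail-fast`; pass `--verify-cmd` whenever the first match is not the command you want.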

## Process

When the user runs this command:

1. **Parse args**
   ```bash
   TEAM_ID="$1"
   PROVIDER=""        # from --provider
   TICK_MIN=5         # from --tick-min
   VERIFY_CMD=""      # from --verify-cmd
   ```

2. **Validate team exists**
   ```bash
   TEAM_DIR="${HOME}/.claude/swarm/${TEAM_ID}"
   if [[ ! -d "$TEAM_DIR" || ! -f "${TEAM_DIR}/team.json" ]]; then
     echo "Error: team not found at $TEAM_DIR" >&2
     exit 1
   fi
   ```

3. **Ask for provider if not specified**

   Use `AskUserQuestion`:

   ```
   question: "Which agent runtime are the swarm tmux panes running?"
   header: "Provider"
   options:
     - label: "claude"
       description: "Claude Code TUI (spinner: ✻/✳/⏺/✿ + verb)"
     - label: "codex"
       description: "OpenAI Codex CLI (spinner: braille dots ⠋⠙⠹⠸⠼⠴⠦⠧)"
     - label: "copilot"
       description: "GitHub Copilot CLI (spinner: braille dots — placeholder)"
     - label: "generic"
       description: "Unknown / mixed — falls back to pane-hash heuristic only"
   ```

4. **Auto-detect verify command if not specified**
   ```bash
   if [[ -z "$VERIFY_CMD" ]]; then
     VERIFY_CMD=$(bash {{HOME_TOOL_DIR}}/utils/swarm-lib.sh detect-verify-cmd "$PWD")
     echo "Auto-detected verify: $VERIFY_CMD"
   fi
   ```

5. **Check no existing watchdog**
   ```bash
   # "=" forces an exact session-name match; a bare name also accepts prefix matches
   if tmux has-session -t "=${TEAM_ID}-watchdog" 2>/dev/null; then
     echo "Watchdog already running for $TEAM_ID. Use 'tmux attach -t ${TEAM_ID}-watchdog' to inspect."
     exit 0
   fi
   ```

6. **Attach watchdog**
   ```bash
   bash {{HOME_TOOL_DIR}}/utils/swarm-lib.sh attach-watchdog \
     "$TEAM_ID" "$PROVIDER" "$VERIFY_CMD" "$TICK_MIN"
   ```

   This step:
   - Upgrades `team.json` in place with v2 schema fields (provider, commands.verify, watchdog config, finalize config)
   - Spawns `<team-id>-watchdog` tmux session running `watchdog.sh <team-id>`

7. **Report**
   ```
   ==========================================
   Watchdog attached: swarm-XXXXXXXXXX-watchdog
   ==========================================
   Provider:   claude
   Tick:       5min
   Verify cmd: cargo test --workspace --no-fail-fast

   Commands:
     Attach to watchdog:  tmux attach -t swarm-XXXXXXXXXX-watchdog
     Watchdog log:        tail -f {{HOME_TOOL_DIR}}/swarm/swarm-XXXXXXXXXX/watchdog.log
     Kill watchdog only:  bash {{HOME_TOOL_DIR}}/utils/swarm-lib.sh kill-watchdog swarm-XXXXXXXXXX
     Status:              /swarm-status swarm-XXXXXXXXXX

   The watchdog will:
   - Capture leader + agent panes every 5min
   - Send Enter (then "continue" + Enter) to stuck panes
   - Escalate to leader.jsonl after 2 stuck cycles
   - On epic-done: run finalize.sh (notify-only, NO tmux kill, NO auto-merge)

   It will NEVER kill tmux sessions or auto-merge worktrees.
   Human owns merge + PR ready-marking + /swarm-shutdown.
   ==========================================
   ```
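
Step 1 only declares the variables and their defaults. A fuller sketch of the parsing, assuming a hypothetical `parse_args` helper, could enforce the constraints from the argument table: a required `<team-id>`, a provider from the four runtimes offered in step 3 (empty meaning "ask"), and an integer `--tick-min` from 1 to 60.

```bash
#!/usr/bin/env bash
# Hypothetical argument parser for /swarm-attach-watchdog (not the shipped code).
# Sets the globals TEAM_ID, PROVIDER, TICK_MIN, VERIFY_CMD; returns 2 on bad input.
parse_args() {
  TEAM_ID=""
  PROVIDER=""      # empty = ask the user (step 3)
  TICK_MIN=5
  VERIFY_CMD=""    # empty = auto-detect (step 4)

  while [[ $# -gt 0 ]]; do
    case "$1" in
      --provider|--tick-min|--verify-cmd)
        # A flag without a value would make "shift 2" fail and loop forever.
        if [[ $# -lt 2 ]]; then
          echo "Missing value for $1" >&2
          return 2
        fi
        case "$1" in
          --provider)   PROVIDER="$2" ;;
          --tick-min)   TICK_MIN="$2" ;;
          --verify-cmd) VERIFY_CMD="$2" ;;
        esac
        shift 2
        ;;
      -*)
        echo "Unknown option: $1" >&2
        return 2
        ;;
      *)
        if [[ -n "$TEAM_ID" ]]; then
          echo "Unexpected extra argument: $1" >&2
          return 2
        fi
        TEAM_ID="$1"
        shift
        ;;
    esac
  done

  if [[ -z "$TEAM_ID" ]]; then
    echo "Usage: /swarm-attach-watchdog <team-id> [--provider P] [--tick-min N] [--verify-cmd CMD]" >&2
    return 2
  fi

  case "$PROVIDER" in
    ""|claude|codex|copilot|generic) ;;
    *) echo "Invalid --provider: $PROVIDER" >&2; return 2 ;;
  esac

  # 10# forces base 10 so a value like "08" is not parsed as octal.
  if ! [[ "$TICK_MIN" =~ ^[0-9]+$ ]] || (( 10#$TICK_MIN < 1 || 10#$TICK_MIN > 60 )); then
    echo "--tick-min must be an integer from 1 to 60" >&2
    return 2
  fi
}
```

Calling `parse_args "$@" || exit 2` at the top of the command would then take the place of the bare assignments in step 1.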

## When to Use

- An existing v1 swarm has stalled agents and you don't want to recreate it
- You're upgrading from v1 to v2 without restarting in-flight work
- You want to add watchdog capability to a swarm that was started with `--no-watchdog`
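
For the v1-to-v2 upgrade case, step 6 of the Process rewrites `team.json` in place. The real v2 schema is defined by `swarm-lib.sh` and is not reproduced in this skill, so the sketch below uses hypothetical field names (`schema_version`, `provider`, `commands.verify`, `watchdog.*`, `finalize.*`). What it is meant to show is the pattern: merge new keys into the existing document with `jq` so v1 fields survive, and swap the file in via a rename so a running leader never reads a half-written `team.json`.

```bash
#!/usr/bin/env bash
# Hypothetical in-place team.json upgrade; all v2 field names are assumptions.
# The finalize values mirror the documented behavior: notify-only, no auto-merge, no tmux kill.
upgrade_team_json() {
  local file="$1" provider="$2" verify="$3" tick_min="$4" tmp
  # Temp file in the same directory, so the final mv is an atomic rename.
  tmp=$(mktemp "${file}.XXXXXX") || return 1
  if jq --arg provider "$provider" \
        --arg verify "$verify" \
        --argjson tick "$tick_min" '
        . + {schema_version: 2, provider: $provider}
        | .commands.verify = $verify
        | .watchdog = ((.watchdog // {}) + {enabled: true, tick_min: $tick})
        | .finalize = ((.finalize // {}) + {mode: "notify-only", auto_merge: false, kill_tmux: false})
      ' "$file" > "$tmp"; then
    mv "$tmp" "$file"
  else
    # Leave the original file untouched if jq fails (e.g. invalid JSON).
    rm -f "$tmp"
    return 1
  fi
}
```

Because every key is merged rather than replaced wholesale, running the upgrade twice leaves the same result.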

## What This Does NOT Do

- Does not restart leader or worker tmux sessions (they keep their context)
- Does not modify worker prompts (existing workers do not gain v2 awareness; that requires spawning new workers)
- Does not kill any tmux session
- Does not merge any worktrees
- Does not open a PR (PR creation happens at finalize time and is opt-in)

## Troubleshooting

**Watchdog session dies immediately**
- Check `{{HOME_TOOL_DIR}}/swarm/<team-id>/watchdog.log` for the error
- Common causes: `team.json` not found, `jq` missing, `tmux` not on PATH
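
The common causes above can be checked before the watchdog is spawned. A minimal sketch, assuming a hypothetical `preflight` function that takes the team directory from step 2 of the Process: it reports every missing binary or bad file and then returns non-zero, rather than stopping at the first problem.

```bash
#!/usr/bin/env bash
# Hypothetical preflight check for the watchdog's runtime dependencies.
preflight() {
  local team_dir="$1" failed=0 bin
  for bin in jq tmux; do
    if ! command -v "$bin" >/dev/null 2>&1; then
      echo "missing dependency: $bin" >&2
      failed=1
    fi
  done
  if [[ ! -f "$team_dir/team.json" ]]; then
    echo "team.json not found in $team_dir" >&2
    failed=1
  elif command -v jq >/dev/null 2>&1 && ! jq empty "$team_dir/team.json" 2>/dev/null; then
    # "jq empty" prints nothing and exits non-zero only if the input is not valid JSON.
    echo "team.json in $team_dir is not valid JSON" >&2
    failed=1
  fi
  return "$failed"
}
```

A call such as `preflight "$TEAM_DIR" || exit 1` right after step 2 would surface these errors in the foreground instead of only in `watchdog.log`.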

**False-stuck detections**
- Provider regex may not cover all spinner states for the runtime
- Set `--provider generic` to fall back to pure pane-hash heuristic
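
The pane-hash heuristic behind `--provider generic` can be sketched as: hash the captured pane text on every tick and count how many consecutive ticks it stays unchanged. The function below is hypothetical. The state-file format and the mapping of counts to `active`, `stuck`, and `escalate` are assumptions, with escalation after 2 unchanged ticks chosen to mirror the "escalate after 2 stuck cycles" rule in the report. In non-generic modes the provider spinner regex presumably also factors in; this sketch omits it.

```bash
#!/usr/bin/env bash
# Hypothetical pane-hash stuck check. Reads pane text on stdin.
# State file holds "<hash> <unchanged_ticks>" from the previous tick.
ESCALATE_AFTER=2   # assumed threshold, mirroring "after 2 stuck cycles"

check_pane() {
  local state_file="$1" hash prev_hash="" count=0
  # cksum on stdin prints "<crc> <bytes>"; the CRC is enough to detect change.
  hash=$(cksum | awk '{print $1}')
  if [[ -s "$state_file" ]]; then
    read -r prev_hash count < "$state_file"
  fi
  if [[ "$hash" != "$prev_hash" ]]; then
    count=0                       # pane output changed: agent is doing something
  else
    count=$(( ${count:-0} + 1 ))  # identical output one more tick in a row
  fi
  echo "$hash $count" > "$state_file"
  if (( count == 0 )); then
    echo "active"
  elif (( count >= ESCALATE_AFTER )); then
    echo "escalate"
  else
    echo "stuck"
  fi
}
```

In a real tick, the input would come from `tmux capture-pane -p -t "$pane"`, and a `stuck` result would be followed by a nudge such as `tmux send-keys -t "$pane" Enter`.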

**Two watchdog sessions accidentally spawned**
- Run `bash {{HOME_TOOL_DIR}}/utils/swarm-lib.sh kill-watchdog <team-id>` once, then `/swarm-attach-watchdog <team-id>` again
- `kill-watchdog` targets only the `<team-id>-watchdog` session and never touches worker sessions
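
One way to guarantee that a kill only ever hits the watchdog is tmux's exact-match target syntax: prefixing a session name with `=` disables tmux's fallback to prefix and pattern matching. A minimal sketch, assuming a hypothetical `kill_watchdog` shell function (the shipped `swarm-lib.sh kill-watchdog` may be implemented differently):

```bash
#!/usr/bin/env bash
# Hypothetical kill helper: touches only the session named exactly "<team-id>-watchdog".
kill_watchdog() {
  local team_id="$1"
  if [[ -z "$team_id" ]]; then
    echo "usage: kill_watchdog <team-id>" >&2
    return 2
  fi
  local session="${team_id}-watchdog"
  # "=" makes tmux accept only an exact session-name match.
  if ! tmux has-session -t "=${session}" 2>/dev/null; then
    echo "No watchdog session named ${session}" >&2
    return 1
  fi
  tmux kill-session -t "=${session}"
}
```

Without the `=`, a target like `swarm-1-watchdog` could resolve to another session whose name merely starts with that string when no exact match exists.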

Related Skills

swarm-status

from stevengonsalvez/agents-in-a-box

Display comprehensive status dashboard for a swarm team

swarm-shutdown

from stevengonsalvez/agents-in-a-box

Gracefully shutdown a swarm team

swarm-orchestration

from stevengonsalvez/agents-in-a-box

A tmux-based persistent multi-agent swarm system with file-based inter-agent messaging

swarm-join

from stevengonsalvez/agents-in-a-box

Join an existing swarm team as a worker agent

swarm-inbox

from stevengonsalvez/agents-in-a-box

Read and send inter-agent messages within a swarm team

swarm-create

from stevengonsalvez/agents-in-a-box

Create a new self-sufficient swarm team from a Beads epic with N worker agents + a watchdog daemon that auto-recovers stuck panes and notify-only finalizes when the epic is done. Cross-provider (Claude/Codex/Copilot).

swarm-agent-troubleshooting

from stevengonsalvez/agents-in-a-box

Diagnose and fix swarm agent spawn failures when agents don't start processing tasks

attach-agent-worktree

from stevengonsalvez/agents-in-a-box

Attach to Agent Session

workflow

from stevengonsalvez/agents-in-a-box

Guide through structured delivery workflow with plan, implement, validate phases

webapp-testing

from stevengonsalvez/agents-in-a-box

Toolkit for interacting with and testing local web applications using Playwright. Supports verifying frontend functionality, debugging UI behavior, capturing browser screenshots, and viewing browser logs.

validate

from stevengonsalvez/agents-in-a-box

Verify implementation against specifications

ui-ux-pro-max

from stevengonsalvez/agents-in-a-box

UI/UX design intelligence. 67 styles, 96 palettes, 57 font pairings, 25 charts, 13 stacks (React, Next.js, Vue, Svelte, Astro, Nuxt, SwiftUI, React Native, Flutter, Tailwind, shadcn/ui, Jetpack Compose). Actions: plan, build, create, design, implement, review, fix, improve, optimize, enhance, refactor, check UI/UX code. Projects: website, landing page, dashboard, admin panel, e-commerce, SaaS, portfolio, blog, mobile app, .html, .tsx, .vue, .svelte. Elements: button, modal, navbar, sidebar, card, table, form, chart. Styles: glassmorphism, claymorphism, minimalism, brutalism, neumorphism, bento grid, dark mode, responsive, skeuomorphism, flat design. Topics: color palette, accessibility, animation, layout, typography, font pairing, spacing, hover, shadow, gradient.