trendr-watchdog

Runtime supervisor. Monitors run_status/progress/log activity and, when a run stalls, automatically injects a resume-from-checkpoint instruction into the owner session.

9 stars

by gy-hou


Best use case

trendr-watchdog is best used when you need a repeatable AI agent workflow instead of a one-off prompt.


Teams using trendr-watchdog should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/trendr-watchdog/SKILL.md --create-dirs "https://raw.githubusercontent.com/gy-hou/trendr/main/skills/trendr-watchdog/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/trendr-watchdog/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How trendr-watchdog Compares

Feature / Agent	trendr-watchdog	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

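It is a runtime supervisor. It monitors run_status/progress/log activity and, when a run stalls, automatically injects a resume-from-checkpoint instruction into the owner session.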

Where can I find the source code?

The source code is in the gy-hou/trendr repository on GitHub, under skills/trendr-watchdog/.

SKILL.md Source

# TrendR Watchdog Skill

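Fixes the "session split / premature stall" problems that occur in multi-agent runs:
- A child session finishes but the main session does not pick up where it left off
- The status file goes a long time without being refreshed
- Phase output files exist but orchestration has not advanced

> Read this file in full before use.
> The core watch logic lives in `supervisor.py`; `watchdog.py` is kept as an entry point compatible with the old launch command.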

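## Runtime Router (required reading)

Identify the current runtime and read only the matching sibling file; the others stay dormant:

- `openclaw`    → the original instruction blocks in this file still apply (`supervisor.py` injection mode)
- `claude-code` → **skip the instruction blocks in this file** and read `./claude-code.md` for the Claude Code hooks-driven approach
- `codex` / `cli` → **skip the instruction blocks in this file** and read `./codex.md` for the Codex file-based heartbeat protocol

The sections after this one describe **shared knowledge** (detection logic, failure conditions, heartbeat protocol). The instruction blocks are left as they are (OpenClaw syntax); Claude Code readers should switch to `./claude-code.md`, and Codex/CLI readers to `./codex.md`.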

## Runtime Strategy

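- `openclaw`: use this file's `supervisor.py` to inject a recovery message (`openclaw agent --session-id ...`).
- `codex` / `claude-code` / `cli`: no session injection; use `engine/watchdog.py` to monitor the run and write `resume_request.json`, which the state-machine loop consumes to recover.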
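
The Runtime Router and Runtime Strategy rules amount to a small lookup from runtime name to the document to read and the recovery mechanism to use. A minimal Python sketch of that lookup, assuming the runtime name is already known as a string; `RuntimePlan`, `PLANS`, `plan_for`, and the recovery labels are hypothetical names for illustration, not part of trendr:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuntimePlan:
    doc: str       # file whose instruction blocks apply to this runtime
    monitor: str   # script that watches the run
    recovery: str  # how a stalled run is resumed (labels are illustrative)


# Table transcribed from the Runtime Router and Runtime Strategy sections.
PLANS = {
    "openclaw": RuntimePlan("SKILL.md", "supervisor.py", "session-injection"),
    "claude-code": RuntimePlan("./claude-code.md", "engine/watchdog.py", "resume-request-file"),
    "codex": RuntimePlan("./codex.md", "engine/watchdog.py", "resume-request-file"),
    "cli": RuntimePlan("./codex.md", "engine/watchdog.py", "resume-request-file"),
}


def plan_for(runtime: str) -> RuntimePlan:
    """Return the plan for a runtime, rejecting names the skill does not list."""
    try:
        return PLANS[runtime.strip().lower()]
    except KeyError:
        raise ValueError(f"unsupported runtime: {runtime!r}") from None
```

Failing loudly on an unknown runtime is a design choice of this sketch; it avoids silently applying OpenClaw instructions to a runtime that cannot accept session injection.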

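## Prerequisites

- `python3` is available
- The `openclaw` CLI (needed only for the `openclaw` runtime)
- The current task directory is `~/research/[PROJECT]/`

## Startup Flow (recommended)

1. Before starting, get the current main session ID and write it to `run_status.json.owner_session_id`
2. Start the watchdog as a long-running background process
3. When the task ends (completed/failed), stop the watchdog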
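
Step 1 can be sketched in Python as follows. This is an illustration under assumptions: the location of `run_status.json` is passed in by the caller (the source does not say where it lives), and `record_owner_session` is a hypothetical helper, not a function shipped with the skill.

```python
import json
import os
import tempfile
from pathlib import Path


def record_owner_session(status_path: Path, session_id: str) -> dict:
    """Set owner_session_id in run_status.json, preserving any other keys."""
    status = {}
    if status_path.exists():
        status = json.loads(status_path.read_text(encoding="utf-8"))
    status["owner_session_id"] = session_id

    # Write to a temp file in the same directory, then rename it into place,
    # so a watchdog polling the file never reads a half-written document.
    status_path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=status_path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(status, fh, ensure_ascii=False, indent=2)
    os.replace(tmp, status_path)
    return status
```

A call might look like `record_owner_session(Path.home() / "research" / "my-project" / "run_status.json", "my-session-id")`, where both the path and the session ID are placeholders.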

## Launch Command Templates

`openclaw` runtime (session injection mode):

```bash
exec: PROJECT="[PROJECT]"; RUN_ID="[RUN_ID]"; SESSION_ID="[OWNER_SESSION_ID]"; \
  mkdir -p "$HOME/research/$PROJECT/logs" && { \
    nohup python3 "$HOME/.openclaw/workspace/skills/trendr-watchdog/supervisor.py" \
      --project "$PROJECT" \
      --run-id "$RUN_ID" \
      --session-id "$SESSION_ID" \
      --poll-sec 60 \
      --idle-timeout-sec 600 \
      --phase-mismatch-grace-sec 180 \
      --artifact-complete-grace-sec 1800 \
      --resume-cooldown-sec 300 \
      --heartbeat-sec 300 \
      --max-resume 12 \
      >> "$HOME/research/$PROJECT/logs/watchdog.out" 2>&1 & \
    echo $! > "$HOME/research/$PROJECT/logs/watchdog.pid"; }
```

`codex` / `claude-code` / `cli` runtime (file recovery mode):

```bash
exec: PROJECT="[PROJECT]"; \
  mkdir -p "$HOME/research/$PROJECT/logs" && { \
    nohup python3 engine/watchdog.py "$HOME/research/$PROJECT" \
      >> "$HOME/research/$PROJECT/logs/watchdog.out" 2>&1 & \
    echo $! > "$HOME/research/$PROJECT/logs/watchdog.pid"; }
```

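Default behavior:
- Polls every 60 seconds
- 10 minutes without activity triggers an automatic resume
- If files have already reached the next phase but the phase has not advanced, a resume is triggered after only 3 minutes
- If `review.md + references.bib` stay stable for 30 minutes, the research is treated as complete and the watchdog stops
- Writes a watchdog heartbeat to the run log every 5 minutes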
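
These defaults can be read as a per-poll decision rule. The sketch below is one possible reading, not the logic in `supervisor.py`: the function name, the return labels, and the way `--resume-cooldown-sec` and `--max-resume` gate a resume are assumptions. Only the threshold values come from the launch command.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Thresholds:
    idle_timeout_sec: int = 600              # --idle-timeout-sec
    phase_mismatch_grace_sec: int = 180      # --phase-mismatch-grace-sec
    artifact_complete_grace_sec: int = 1800  # --artifact-complete-grace-sec
    resume_cooldown_sec: int = 300           # --resume-cooldown-sec
    max_resume: int = 12                     # --max-resume


def decide(
    *,
    now: float,
    last_activity: float,
    phase_mismatch_since: Optional[float] = None,
    artifacts_stable_since: Optional[float] = None,
    last_resume: Optional[float] = None,
    resume_count: int = 0,
    t: Thresholds = Thresholds(),
) -> str:
    """Return the action for one poll; all times are epoch seconds."""
    # review.md + references.bib unchanged long enough: treat the run as done.
    if (artifacts_stable_since is not None
            and now - artifacts_stable_since >= t.artifact_complete_grace_sec):
        return "stop-complete"

    reason = None
    if (phase_mismatch_since is not None
            and now - phase_mismatch_since >= t.phase_mismatch_grace_sec):
        reason = "resume-phase-mismatch"  # files ahead of the recorded phase
    elif now - last_activity >= t.idle_timeout_sec:
        reason = "resume-idle"
    if reason is None:
        return "wait"

    # Assumed gating: give up after max_resume attempts, and space attempts out.
    if resume_count >= t.max_resume:
        return "stop-max-resume"
    if last_resume is not None and now - last_resume < t.resume_cooldown_sec:
        return "wait"
    return reason
```

Checking the phase mismatch before the idle timeout reflects the shorter 3-minute grace period: a run whose files are ahead of its recorded phase is resumed without waiting the full 10 minutes.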

## Stop Command Template

```bash
exec: PROJECT="[PROJECT]" && PID_FILE="$HOME/research/$PROJECT/logs/watchdog.pid" && \
  if [ -f "$PID_FILE" ]; then kill "$(cat "$PID_FILE")" 2>/dev/null || true; fi
```

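## Artifacts

- `~/research/[PROJECT]/logs/supervisor_[RUN_ID].json`: supervisor state (injection count, most recent injection reason, stop reason)
- `~/research/[PROJECT]/logs/overnight_report_[RUN_ID].md`: overnight watch report (stop reason, current phase, suggested next step)
- `~/research/[PROJECT]/logs/overnight_report.md`: mirror of the latest overnight watch report
- `~/research/[PROJECT]/logs/watchdog.out`: watchdog runtime output
- `~/research/[PROJECT]/logs/[RUN_ID].log`: watchdog heartbeat and injection events are appended here
- `~/research/[PROJECT]/logs/latest.log`: synced automatically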

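## Resume-Point Detection (idempotent)

- `candidates.csv + search_log.md` exist: Phase 1 is judged ready to advance
- `matrix.csv + notes/*.md` exist: Phase 2 is judged ready to advance
- `review.md + references.bib` exist: the run can go straight to wrap-up

If the files already exist, the resume instruction asks the agent to "skip repeated work".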
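
The file checks above can be sketched as a pure function of the project directory, which is what makes the decision idempotent: calling it twice without changing any files gives the same answer. This is a sketch under assumptions: the artifacts are taken to sit directly under the project directory, and `resume_point` and its return labels are hypothetical names.

```python
from pathlib import Path


def resume_point(project_dir: Path) -> str:
    """Classify how far a run has progressed from the files it has produced."""

    def has(*names: str) -> bool:
        return all((project_dir / name).is_file() for name in names)

    has_notes = any((project_dir / "notes").glob("*.md"))

    # Check the latest stage first so earlier artifacts do not mask later ones.
    if has("review.md", "references.bib"):
        return "finalize"
    if has("matrix.csv") and has_notes:
        return "after-phase-2"
    if has("candidates.csv", "search_log.md"):
        return "after-phase-1"
    return "start"
```

A resume instruction built from this result can tell the agent which phases to skip, for example skipping the search phase when the result is `"after-phase-1"`.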

Related Skills

verifier

from gy-hou/trendr

Independent verification of literature review quality — citation checks, claim tracing, coverage analysis

review-writer

from gy-hou/trendr

Synthesizes paper notes and the comparison matrix into a structured academic literature review with BibTeX citations

research-vault

from gy-hou/trendr

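Persists research results to an Obsidian vault and maintains a paper-pool index. Supports daily research logs, paper cards, review archiving, cross-project paper deduplication, and fast retrieval.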

platform-hotspots

from gy-hou/trendr

Collect and summarize Zhihu, Xiaohongshu, X, Reddit, YouTube, GitHub Trending, Hacker News, and Product Hunt hotspots with strict Chrome CDP routing and reproducible extraction commands.

paper-scout

from gy-hou/trendr

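Academic paper search and screening across 9 sources (arXiv, Semantic Scholar, OpenAlex, PubMed, CrossRef, DBLP, Europe PMC, bioRxiv, Papers with Code), using tools already installed on the local machine, with zero extra dependencies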

paper-analyzer

from gy-hou/trendr

Extracts structured information from academic papers and generates standardized notes and a literature comparison matrix

chrome-cdp-setup

from gy-hou/trendr

Chrome 146+ CDP remote debugging architecture — dual-instance setup, cookie sync, profile isolation, and troubleshooting "Allow remote debugging" popups.
