trendr-watchdog
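Runtime supervisor. Monitors run_status/progress/log activity and, when it detects a stall, automatically injects a resume-from-checkpoint instruction into the owner session.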
Best use case
trendr-watchdog is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using trendr-watchdog should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/trendr-watchdog/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
Frequently Asked Questions
What does this skill do?
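Runtime supervisor. Monitors run_status/progress/log activity and, when it detects a stall, automatically injects a resume-from-checkpoint instruction into the owner session.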
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# TrendR Watchdog Skill
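Fixes the "session split / premature stall" problem in multi-agent runs:
- A child session finishes, but the main session never picks up where it left off
- The status file goes unrefreshed for a long time
- Phase output files have been produced, but orchestration has not advanced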
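> Read this entire file before use.
> The core watchdog logic lives in `supervisor.py`; `watchdog.py` is kept as an entry point for backward compatibility with old launch commands.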
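## Runtime Router (required reading)
Identify the current runtime and read only the matching sibling file; the others stay dormant:
- `openclaw` → the original instruction blocks in this file remain in effect (`supervisor.py` injection mode)
- `claude-code` → **skip this file's instruction blocks** and read `./claude-code.md` for the Claude Code hooks-driven approach
- `codex` / `cli` → **skip this file's instruction blocks** and read `./codex.md` for the Codex file-based heartbeat protocol
The sections after this one describe **shared knowledge** (detection logic, failure conditions, heartbeat protocol). Instruction blocks are kept as-is, in OpenClaw syntax; Claude Code readers should switch to `./claude-code.md`, and Codex/CLI readers to `./codex.md`.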
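The routing rule above is a plain lookup from runtime name to instruction file. The plain-bash sketch below illustrates it; it is not an OpenClaw `exec:` block. The function name is hypothetical. Returning `SKILL.md` for `openclaw` assumes that this file is the skill's `SKILL.md`. How the runtime name is detected is left out, and the name is passed in as an argument.

```bash
#!/usr/bin/env bash
# Sketch of the Runtime Router: map a runtime name to the file whose
# instruction blocks should be followed. Runtime detection is out of scope;
# the name is simply passed in as $1.
instructions_for_runtime() {
  case "$1" in
    openclaw)    echo "SKILL.md" ;;          # this file, supervisor.py injection mode
    claude-code) echo "./claude-code.md" ;;  # Claude Code hooks
    codex|cli)   echo "./codex.md" ;;        # Codex file-based heartbeat protocol
    *)
      echo "unknown runtime: $1" >&2
      return 1
      ;;
  esac
}
```

For example, `instructions_for_runtime cli` prints `./codex.md`. An unknown runtime fails loudly instead of silently falling back to the OpenClaw instructions.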
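## Runtime Strategy
- `openclaw`: use this file's `supervisor.py` to inject resume messages (`openclaw agent --session-id ...`).
- `codex` / `claude-code` / `cli`: no session injection. `engine/watchdog.py` monitors the run and writes `resume_request.json`, which the state-machine loop consumes to resume.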
## Prerequisites
- `python3` is available
- `openclaw` CLI (required only for the `openclaw` runtime)
- The task directory is `~/research/[PROJECT]/`
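## Startup Procedure (recommended)
1. Before starting, get the current main session ID and write it to `run_status.json` as `owner_session_id`
2. Start the watchdog as a long-running background process
3. Once the task ends (`completed`/`failed`), stop the watchdog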
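Step 1 can be scripted. `python3` is already a prerequisite, so this plain-bash sketch uses it to merge the key into the existing JSON instead of overwriting the file. Two details are assumptions: the helper name `set_owner_session` is hypothetical, and `run_status.json` is assumed to be a JSON object at `~/research/[PROJECT]/run_status.json`.

```bash
#!/usr/bin/env bash
# Sketch for step 1: record the owner session ID in run_status.json.
# Assumption: run_status.json holds a JSON object; existing keys are kept.
set_owner_session() {
  local status_file="$1" session_id="$2"
  python3 - "$status_file" "$session_id" <<'PY'
import json, os, sys

path, sid = sys.argv[1], sys.argv[2]
data = {}
if os.path.exists(path):
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
data["owner_session_id"] = sid

# Write to a temp file and swap it in atomically, so a polling watchdog
# never reads a half-written run_status.json.
tmp = path + ".tmp"
with open(tmp, "w", encoding="utf-8") as f:
    json.dump(data, f, ensure_ascii=False, indent=2)
os.replace(tmp, path)
PY
}
```

Usage: `set_owner_session "$HOME/research/$PROJECT/run_status.json" "$SESSION_ID"`. The atomic `os.replace` matters here because the watchdog reads the same file on every poll.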
## Launch Command Templates
`openclaw` runtime (session-injection mode):
```bash
exec: PROJECT="[PROJECT]" && RUN_ID="[RUN_ID]" && SESSION_ID="[OWNER_SESSION_ID]" && \
mkdir -p "$HOME/research/$PROJECT/logs" && \
nohup python3 "$HOME/.openclaw/workspace/skills/trendr-watchdog/supervisor.py" \
--project "$PROJECT" \
--run-id "$RUN_ID" \
--session-id "$SESSION_ID" \
--poll-sec 60 \
--idle-timeout-sec 600 \
--phase-mismatch-grace-sec 180 \
--artifact-complete-grace-sec 1800 \
--resume-cooldown-sec 300 \
--heartbeat-sec 300 \
--max-resume 12 \
>> "$HOME/research/$PROJECT/logs/watchdog.out" 2>&1 & \
echo $! > "$HOME/research/$PROJECT/logs/watchdog.pid"
```
`codex` / `claude-code` / `cli` runtime (file-based resume mode):
```bash
exec: PROJECT="[PROJECT]" && \
mkdir -p "$HOME/research/$PROJECT/logs" && \
nohup python3 engine/watchdog.py "$HOME/research/$PROJECT" \
>> "$HOME/research/$PROJECT/logs/watchdog.out" 2>&1 & \
echo $! > "$HOME/research/$PROJECT/logs/watchdog.pid"
```
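Default behavior:
- Poll once every 60 seconds
- Trigger an automatic resume after 10 minutes with no activity
- If the files show the run has reached the next phase but the recorded phase has not advanced, trigger a resume after only 3 minutes
- If `review.md + references.bib` have been stable for 30 minutes, treat the research as complete and stop watching
- Write a watchdog heartbeat to the run log every 5 minutes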
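The idle rule and the completion rule above both reduce to file-modification-time checks. The plain-bash sketch below is not the code in `supervisor.py`. It assumes the activity signals are the mtimes of `run_status.json`, a progress file, and `logs/latest.log` under `~/research/[PROJECT]/`. The progress file's name is not given above, so `progress.md` is a hypothetical name, and the function names are hypothetical too.

```bash
#!/usr/bin/env bash
# Sketch of the idle and completion checks; not the actual supervisor.py logic.
# Assumption: activity = mtime of these files under the project directory
# (progress.md is a hypothetical name for the progress file).
WATCHED_FILES="run_status.json progress.md logs/latest.log"

# modified_within FILE MINUTES: succeed if FILE changed less than MINUTES ago.
modified_within() {
  [ -n "$(find "$1" -mmin "-$2" 2>/dev/null)" ]
}

# is_idle DIR MINUTES: succeed if no watched file changed in the last MINUTES.
# Missing files are skipped, so a directory with none of them counts as idle.
is_idle() {
  local dir="$1" minutes="$2" f
  for f in $WATCHED_FILES; do
    if [ -e "$dir/$f" ] && modified_within "$dir/$f" "$minutes"; then
      return 1
    fi
  done
  return 0
}

# artifacts_settled DIR MINUTES: succeed if review.md and references.bib both
# exist and neither changed in the last MINUTES (30 in the defaults above).
artifacts_settled() {
  local dir="$1" minutes="$2" f
  for f in review.md references.bib; do
    [ -f "$dir/$f" ] || return 1
    if modified_within "$dir/$f" "$minutes"; then
      return 1
    fi
  done
  return 0
}
```

Usage: `is_idle "$HOME/research/$PROJECT" 10 && echo "stalled"`. `find -mmin` works in whole minutes, so a timeout given in seconds (such as `--idle-timeout-sec 600`) is divided by 60 first.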
## Stop Command Template
```bash
exec: PROJECT="[PROJECT]" && PID_FILE="$HOME/research/$PROJECT/logs/watchdog.pid" && \
if [ -f "$PID_FILE" ]; then kill "$(cat "$PID_FILE")" 2>/dev/null || true; fi
```
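The template above sends `SIGTERM` to whatever PID the file holds and leaves the file in place. The slightly more defensive plain-bash sketch below is an illustration, not part of the skill, and its function name is hypothetical. It first checks that the process still exists, then removes the PID file, so a stale file cannot later point at an unrelated process that reused the PID.

```bash
#!/usr/bin/env bash
# Sketch: stop the watchdog recorded in a PID file, then remove the PID file.
stop_watchdog() {
  local pid_file="$1" pid
  [ -f "$pid_file" ] || return 0           # nothing recorded, nothing to stop
  pid="$(cat "$pid_file")"
  # kill -0 sends no signal; it only tests that the process exists and can be signaled
  if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
    kill "$pid" 2>/dev/null || true        # SIGTERM, same as the template
  fi
  rm -f "$pid_file"                        # do not leave a stale PID behind
}
```

Usage: `stop_watchdog "$HOME/research/$PROJECT/logs/watchdog.pid"`.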
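## Outputs
- `~/research/[PROJECT]/logs/supervisor_[RUN_ID].json`: supervisor state (injection count, most recent injection reason, stop reason)
- `~/research/[PROJECT]/logs/overnight_report_[RUN_ID].md`: overnight watch report (shutdown reason, current phase, suggested next step)
- `~/research/[PROJECT]/logs/overnight_report.md`: mirror of the most recent overnight watch report
- `~/research/[PROJECT]/logs/watchdog.out`: watchdog process output
- `~/research/[PROJECT]/logs/[RUN_ID].log`: watchdog heartbeats and injection events are appended here
- `~/research/[PROJECT]/logs/latest.log`: synced automatically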
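## Resume-Point Detection (idempotent)
- `candidates.csv + search_log.md` exist: Phase 1 is judged ready to advance
- `matrix.csv + notes/*.md` exist: Phase 2 is judged ready to advance
- `review.md + references.bib` exist: the run can go straight to wrap-up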
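The three rules above can be read as a single check that tests the furthest phase first, because later artifacts imply the earlier phases are done. The check only reads files, so running it repeatedly gives the same answer, which matches the idempotence requirement. The plain-bash sketch below assumes the artifacts live directly in `~/research/[PROJECT]/`. The function names and output labels are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: infer where to resume from the artifacts already on disk,
# checking the furthest phase first. The output labels are hypothetical.

# has_notes DIR: succeed if DIR/notes contains at least one .md file.
has_notes() {
  local f
  for f in "$1"/notes/*.md; do
    [ -e "$f" ] && return 0   # an unmatched glob stays literal and fails -e
  done
  return 1
}

infer_resume_point() {
  local dir="$1"
  if [ -f "$dir/review.md" ] && [ -f "$dir/references.bib" ]; then
    echo "finalize"        # review already written: go straight to wrap-up
  elif [ -f "$dir/matrix.csv" ] && has_notes "$dir"; then
    echo "after-phase-2"   # Phase 2 outputs present
  elif [ -f "$dir/candidates.csv" ] && [ -f "$dir/search_log.md" ]; then
    echo "after-phase-1"   # Phase 1 outputs present
  else
    echo "start"           # nothing reusable yet
  fi
}
```

Usage: `infer_resume_point "$HOME/research/$PROJECT"`. A `matrix.csv` without any notes still resolves to `after-phase-1`, because both files of a pair must be present.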
If these files already exist, the resume instruction asks the agent to "skip work that has already been done".
Related Skills
verifier
Independent verification of literature review quality — citation checks, claim tracing, coverage analysis
review-writer
Synthesizes paper notes and comparison matrices into a structured academic literature review with BibTeX citations
research-vault
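Persists research output to an Obsidian vault and maintains a paper-pool index. Supports daily research logs, paper cards, review archiving, and cross-project paper deduplication and fast lookup.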
platform-hotspots
Collect and summarize Zhihu, Xiaohongshu, X, Reddit, YouTube, GitHub Trending, Hacker News, and Product Hunt hotspots with strict Chrome CDP routing and reproducible extraction commands.
paper-scout
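Academic paper search and screening across 9 sources (arXiv, Semantic Scholar, OpenAlex, PubMed, CrossRef, DBLP, Europe PMC, bioRxiv, Papers with Code), using tools already installed on the local machine, with zero extra dependencies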
paper-analyzer
Extracts structured information from academic papers and generates standardized notes and a literature comparison matrix
chrome-cdp-setup
Chrome 146+ CDP remote debugging architecture — dual-instance setup, cookie sync, profile isolation, and troubleshooting "Allow remote debugging" popups.
swarm-attach-watchdog
Retrofit a watchdog daemon onto an existing v1 swarm (no recreation). Upgrades team.json to v2 schema and spawns the watchdog tmux session.
claude-watchdog
Monitor the Claude API for outages and latency spikes with rich Telegram alerts.
system-watchdog
System resource monitoring that detects wasteful or suspicious processes.
swe-cli-skills
Senior engineer CLI expertise for AI agents — workflows, safety guardrails, gotchas, and anti-patterns across cloud, IaC, containers, databases, dev tools, and platforms
PicoClaw Fleet
Orchestrate a fleet of remote PicoClaw workers over SSH for fast, ephemeral one-shot tasks.