agent-health-diagnostics
Diagnose and fix the 4 most common OpenClaw agent failures — heartbeat spam, API rate limit cascades, channel death loops, and memory/embedding errors. Battle-tested across a 6-agent multi-host deployment.
Best use case
agent-health-diagnostics is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using agent-health-diagnostics should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/agent-health-diagnostics/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
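The manual steps above could be scripted roughly as follows. This is only a sketch: `install_skill` is a hypothetical helper name, not something shipped with the skill, and it assumes SKILL.md has already been downloaded to a local path.

```bash
# Hypothetical helper (not part of the skill): copy a downloaded SKILL.md
# into the project-local skills directory named in the steps above.
# Usage: install_skill <path-to-SKILL.md> [project-dir]
install_skill() {
  src="$1"
  project="${2:-.}"   # default to the current directory
  dest="$project/.claude/skills/agent-health-diagnostics"
  if [ ! -f "$src" ]; then
    echo "SKILL.md not found at: $src" >&2
    return 1
  fi
  # Create the skill directory if needed, then copy the file into place
  mkdir -p "$dest" && cp "$src" "$dest/SKILL.md"
}
```

Calling `install_skill ./SKILL.md .` from the project root places the file; the agent still needs a restart afterwards to discover it.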
How agent-health-diagnostics Compares
| Feature / Agent | agent-health-diagnostics | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
Where can I find the source code?
The scripts are available in the Collective Skills repo on GitHub: https://github.com/Bobalouie44/collective-skills/tree/main/references
SKILL.md Source
# Agent Health Diagnostics

**Scripts available in the [Collective Skills repo](https://github.com/Bobalouie44/collective-skills/tree/main/references)**

## Overview

When an OpenClaw agent misbehaves (spamming messages, going dark, burning API credits, or looping on dead channels), this skill provides the diagnostic playbook. Covers the 4 most common failure modes with exact commands to diagnose and fix each one. Battle-tested across a 6-agent deployment spanning 3 hosts (Windows + Linux + Proxmox).

## When to Use This Skill

Use when you observe any of these symptoms:

- Agent sending repeated heartbeat/status messages to Telegram/Discord/etc.
- Agent goes silent despite gateway showing "active"
- Logs show `429 Too many tokens` or `rate_limit` errors
- Channel connection loops: `auto-restart attempt 1/10`, `2/10`, etc.
- Memory search errors: `input length exceeds context length`
- Gateway says "active" but agent doesn't respond to messages

## The 4 Failure Modes

### 1. Heartbeat Spam

**Symptom:** Agent sends repeated messages every N minutes.

**Root cause:** Heartbeat interval too low (10m = 144 messages/day) + verbose prompt that always generates output instead of HEARTBEAT_OK.

**Quick fix:**

```bash
# Check interval
grep -A5 heartbeat ~/.openclaw/openclaw.json
# Fix: set to 30m minimum, simplify prompt to checklist + HEARTBEAT_OK default
# Then restart gateway
openclaw gateway restart
```

**Prevention:** Never set heartbeat below 20 minutes. Heartbeat prompts should CHECK things, not CREATE things.

### 2. API Rate Limit Cascade

**Symptom:** All models fail, agent goes dark.

**Root cause:** Heartbeat + N crons = (N+1) API calls per interval. Exceeds provider TPM limit → all fallbacks exhausted simultaneously.

**Quick fix:**

```bash
# Check for rate limits
journalctl -u <service> --since '1h ago' | grep '429\|rate_limit'
# Count your crons (each burns tokens)
openclaw cron list
# Fix: reduce heartbeat to 30-60m, disable non-essential crons, stagger schedules
```

**Prevention:** Calculate token budget before adding crons. Each run ≈ 2K-10K tokens. Route heartbeats to cheap/local models.

### 3. Channel Death Loop

**Symptom:** Logs show repeated `auto-restart attempt N/10` for IRC/Discord/etc.

**Root cause:** Target server unreachable → health monitor restarts → fails again → loop. Each restart may trigger model calls, burning API tokens.

**Quick fix:**

```bash
# Check for loops
journalctl -u <service> --since '1h ago' | grep 'auto-restart\|timed out'
# Test connectivity
nc -zv <target-ip> <target-port> -w 5
# Fix: disable the broken channel in openclaw.json
# channels.<name>.enabled = false
openclaw gateway restart
```

**Prevention:** Test connectivity BEFORE enabling channels. Disable channels you can't reach.

### 4. Memory/Embedding Overflow

**Symptom:** `memory sync failed` or `input length exceeds context length` errors.

**Root cause:** File too large for embedding model's context window (mxbai-embed-large = 8K tokens).

**Quick fix:** Archive old sections of large files (MEMORY.md → memory/archive/). Keep active files under 8K tokens.

**Prevention:** Don't let MEMORY.md grow unbounded. Archive quarterly.

## Remote Diagnostic Quick Reference

| What | Command |
|------|---------|
| Service status | `systemctl is-active <service>` |
| Recent logs | `journalctl -u <service> --since '1h ago' --no-pager \| tail -40` |
| Live tail | `journalctl -u <service> -f` |
| Rate limits | `journalctl -u <service> --since '1h ago' \| grep '429'` |
| Cron list | `openclaw cron list` |
| Port test | `nc -zv <ip> <port> -w 5` |
| Config backup | `cp ~/.openclaw/openclaw.json ~/.openclaw/openclaw.json.bak` |

## Golden Rules

1. **Always back up config before editing.** `cp openclaw.json openclaw.json.bak`
2. **Always restart gateway after config changes.** Hot reload doesn't catch everything.
3. **Check logs before guessing.** `journalctl` tells you what's wrong 90% of the time.
4. **Calculate your API budget.** Heartbeat freq × (crons + 1) × avg tokens = burn rate.
5. **Disable what you can't reach.** Dead channels create loops that waste resources.
6. **"Configured" ≠ "working."** Verify with actual output after every change.
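
The budget formula in Golden Rule 4 can be turned into a quick estimate. The sketch below is illustrative only: `daily_token_burn` and `burst_tokens` are hypothetical helper names, not OpenClaw commands, and the model assumes every cron fires once per heartbeat interval (the "(N+1) API calls per interval" case described under API Rate Limit Cascade), which may not match your actual schedules.

```bash
# Hypothetical helpers (not part of OpenClaw): rough token budget estimates.

# Tokens per day = runs per day x (crons + 1) x avg tokens per run.
# Usage: daily_token_burn <heartbeat_interval_minutes> <cron_count> <avg_tokens_per_run>
daily_token_burn() {
  if [ "$#" -ne 3 ] || [ "$1" -le 0 ]; then
    echo "usage: daily_token_burn <interval_min> <crons> <avg_tokens>" >&2
    return 1
  fi
  # 1440 minutes per day; multiply before dividing to limit integer truncation
  echo $(( 1440 * ($2 + 1) * $3 / $1 ))
}

# Tokens requested at once if the heartbeat and all crons fire together.
# Usage: burst_tokens <cron_count> <avg_tokens_per_run>
burst_tokens() {
  echo $(( ($1 + 1) * $2 ))
}
```

For example, `daily_token_burn 30 3 5000` models a 30-minute heartbeat with 3 crons at about 5K tokens per run and prints 960000, while `burst_tokens 3 5000` prints 20000. Compare the burst figure with your provider's tokens-per-minute limit and the daily figure with your quota before adding another cron.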
Related Skills
botlearn-healthcheck
botlearn-healthcheck — BotLearn autonomous health inspector for OpenClaw instances across 5 domains (hardware, config, security, skills, autonomy); triggers on system check, health report, diagnostics, or scheduled heartbeat inspection.
doctorbot-healthcheck-free
🩺 Free Security & Health Audit. Your OpenClaw deserves a check-up. This skill performs a non-invasive scan to detect security risks, outdated software, and misconfigurations.
healthy-meal-reminder
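Healthy eating reminder skill. Sends scheduled reminders for three daily meals plus afternoon tea, each offering three options (A, B, C) to choose from, and follows up automatically 30 minutes after a meal to log what was eaten and calculate calories. Recommends seasonal low-calorie recipes and supports three modes (weight loss, maintenance, muscle gain), with exercise pairing, weekend cheat meals, interactive Q&A, and weekly check-in reports. Activates when the user mentions diet reminders, three-meal reminders, what to eat, weight-loss recipes, healthy eating, meal reminder, what they ate, or weight check-ins.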
health-check
Daily safety check. Checks system health, including the OpenClaw Gateway, disk space, and memory usage. Triggered by a cron job or by manual invocation.
session-health-monitor
Context window health monitoring for OpenClaw agents — threshold warnings via Telegram, pre-compaction snapshots, and memory rotation.
Healthcheck Readiness Starter Skill
Description: Performs a quick risk posture check on the host and reports basic security/posture status.
healthkit-code-review
Reviews HealthKit code for authorization patterns, query usage, background delivery, and data type handling. Use when reviewing code with import HealthKit, HKHealthStore, HKSampleQuery, HKObserverQuery, or HKQuantityType.
bluebubbles-healthcheck
Diagnoses and auto-heals BlueBubbles ↔ OpenClaw iMessage connectivity. Use when: iMessages stop arriving after a gateway restart, webhook connection is broken, or user reports messages not coming through. Runs a 4-step diagnostic and auto-fixes webhook backoff, stale registrations, and gateway issues.
Huangdi Health Timer
12 two-hour energy cycles, 3 unique tips daily.
org-health-diagnostic
Cross-functional organizational health check combining signals from all C-suite roles. Scores 8 dimensions on a traffic-light scale with drill-down recommendations. Use when assessing overall company health, preparing for board reviews, identifying at-risk functions, or when user mentions org health, health check, or health dashboard.
healthie
Healthie — manage patients, appointments, goals, and documents via GraphQL API
---
name: article-factory-wechat