ops-detection-incident-routing
Detect agent runtime anomalies and route incidents through approval-safe guardrails. Use when you need deterministic checks for cron failures, context pressure, dangling sessions, token spikes, and a controlled incident workflow (detect -> route -> investigate -> remediate).
Best use case
ops-detection-incident-routing is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Detect agent runtime anomalies and route incidents through approval-safe guardrails. Use when you need deterministic checks for cron failures, context pressure, dangling sessions, token spikes, and a controlled incident workflow (detect -> route -> investigate -> remediate).
Teams using ops-detection-incident-routing should expect a more consistent output, faster repeated execution, less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in
.claude/skills/ops-detection-incident-routing/SKILL.mdinside your project - Restart your AI agent — it will auto-discover the skill
How ops-detection-incident-routing Compares
| Feature / Agent | ops-detection-incident-routing | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Detect agent runtime anomalies and route incidents through approval-safe guardrails. Use when you need deterministic checks for cron failures, context pressure, dangling sessions, token spikes, and a controlled incident workflow (detect -> route -> investigate -> remediate).
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# ops-detection-incident-routing
Run deterministic operations checks and route incidents with guardrails.
This skill ships a small toolkit for:
1. detecting runtime anomalies from local state/log files
2. applying in-flight + cooldown guards
3. emitting structured incident actions for investigator/remediator flows
## Use This Skill
Use this skill when you need a production-safe ops loop for agent systems and do not want ad-hoc prompt-only monitoring.
## Files
- `scripts/ops-threshold-detector.sh`
reads session/cron/snapshot state and appends detector JSONL events
- `scripts/incident-guard-check.sh`
checks in-flight/cooldown guard status for a check id
- `scripts/incident-state-update.sh`
updates guard state for start/complete/fail transitions
- `scripts/ops-incident-router.sh`
converts detector alerts into structured actions
- `scripts/ops-detector-cycle.sh`
detector + router cycle runner
- `scripts/setup.sh`
dependency checks + local example scaffold
- `scripts/clean-generated.sh`
removes generated `.jsonl` and lock artifacts before republishing from a used folder
## Setup
```bash
bash scripts/setup.sh
```
## Quick Start
Run one full dry-run cycle:
```bash
bash scripts/ops-detector-cycle.sh \
--workspace "$(pwd)/examples/workspace" \
--state-file "$(pwd)/examples/incident-state.json" \
--detector-out "$(pwd)/examples/ops-detector.jsonl" \
--router-out "$(pwd)/examples/router-actions.jsonl"
```
Run live mode (router also acquires in-flight locks):
```bash
bash scripts/ops-detector-cycle.sh \
--workspace "$(pwd)/examples/workspace" \
--state-file "$(pwd)/examples/incident-state.json" \
--detector-out "$(pwd)/examples/ops-detector.jsonl" \
--router-out "$(pwd)/examples/router-actions.jsonl" \
--live
```
## Output Contract
Detector writes one JSON line per run:
```json
{
"ts": "2026-02-24T02:30:00Z",
"status": "ALERT",
"checks": 5,
"alerts": [{"sev":"Sev-2","trigger":"cron_failure","value":2,"threshold":0}],
"gaps": []
}
```
Router emits one JSON action per alert decision:
```json
{"action":"spawn","check_id":"cron_failure","severity":"Sev-2","mode":"dry-run","task":"Investigate incident: cron_failure"}
```
## Operational Pattern
1. schedule `ops-threshold-detector.sh` (every 5-15 min)
2. feed the latest detector line to `ops-incident-router.sh`
3. spawn investigator/remediator only from router output
4. keep remediation behind explicit owner approval
For details, read `references/architecture.md`.Related Skills
seo-forensic-incident-response
Investigate sudden drops in organic traffic or rankings and run a structured forensic SEO incident response with triage, root-cause analysis and recovery plan.
sage-workspace-detection
Sage 工作区检测开发指南,涵盖项目类型检测、语言识别、依赖分析、Git 信息
incident-runbook-templates
Create structured incident response runbooks with step-by-step procedures, escalation paths, and recovery actions. Use when building runbooks, responding to incidents, or establishing incident resp...
alphavantage-routing
Reference for all Alpha Vantage MCP tools. Use when exploring available data.
pentest-outbound-interaction-oob-detection
Security assessment skill for outbound interaction and out-of-band (OOB) validation. Use when prompts include SSRF callback confirmation, blind XSS beacons, webhook abuse, XXE/OOB behavior, DNS/HTTP callback correlation, or asynchronous server-side interaction proof. Do not use when vulnerabilities are fully in-band and require no external callback correlation.
anomaly-detection
Rule-based anomaly detection for production systems with configurable thresholds, cooldown periods to prevent alert storms, and error pattern tracking for repeated failures.
ambiguity-detection
Detects critical product, scope, data, risk, and success ambiguities in requirements or PRDs and expresses them as structured, decision-forcing clarification questions without proposing solutions or workflow actions.
u01789-human-approval-routing-for-remote-team-collaboration
Operate the "Human Approval Routing for remote team collaboration" capability in production for remote team collaboration workflows. Use when mission execution explicitly requires this capability and outcomes must be reproducible, policy-gated, and handoff-ready.
u01784-human-approval-routing-for-multilingual-translation-services
Operate the "Human Approval Routing for multilingual translation services" capability in production for multilingual translation services workflows. Use when mission execution explicitly requires this capability and outcomes must be reproducible, policy-gated, and handoff-ready.
u01689-human-approval-routing-for-education-support-services
Operate the "Human Approval Routing for education support services" capability in production for education support services workflows. Use when mission execution explicitly requires this capability and outcomes must be reproducible, policy-gated, and handoff-ready.
UMR-LMR-PMD-detection
This pipeline performs genome-wide segmentation of CpG methylation profiles to identify Unmethylated Regions (UMRs), Low-Methylated Regions (LMRs), and Partially Methylated Domains (PMDs) using whole-genome bisulfite sequencing (WGBS) methylation calls. The pipeline provides high-resolution enhancer-like LMRs, promoter-associated UMRs, and large-scale PMDs characteristic of reprogramming, aging, or cancer methylomes, enabling integration with chromatin accessibility, TF binding, and genome architecture analyses.
tech-detection
Detects project tech stack including languages, frameworks, package managers, and cloud platforms. Use when analyzing a project, detecting technologies, bootstrapping infrastructure, or setting up permissions. Generates project-context.json with detected stack.