ops-detection-incident-routing

Detect agent runtime anomalies and route incidents through approval-safe guardrails. Use when you need deterministic checks for cron failures, context pressure, dangling sessions, token spikes, and a controlled incident workflow (detect -> route -> investigate -> remediate).


by diegosouzapw


Best use case

ops-detection-incident-routing is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using ops-detection-incident-routing should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/ops-detection-incident-routing/SKILL.md --create-dirs "https://raw.githubusercontent.com/diegosouzapw/awesome-omni-skill/main/skills/tools/ops-detection-incident-routing/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/ops-detection-incident-routing/SKILL.md inside your project
Restart your AI agent so that it auto-discovers the skill

How ops-detection-incident-routing Compares

Feature / Agent	ops-detection-incident-routing	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

It detects agent runtime anomalies (cron failures, context pressure, dangling sessions, token spikes) with deterministic checks and routes the resulting incidents through an approval-safe workflow: detect, route, investigate, remediate.

Where can I find the source code?

The source code is on GitHub in the diegosouzapw/awesome-omni-skill repository, under skills/tools/ops-detection-incident-routing.

SKILL.md Source

# ops-detection-incident-routing

Run deterministic operations checks and route incidents with guardrails.

This skill ships a small toolkit for:

1. detecting runtime anomalies from local state/log files
2. applying in-flight + cooldown guards
3. emitting structured incident actions for investigator/remediator flows
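
The in-flight and cooldown guards from step 2 can be pictured with a minimal sketch. The function name `guard_decision`, its arguments, the output strings, and the 900-second cooldown are assumptions made for illustration; the real logic lives in `scripts/incident-guard-check.sh` and may differ.

```bash
# Hypothetical guard decision: skip while an incident is already being
# handled (in-flight) or while the cooldown window has not yet elapsed.
# Args: in_flight (0|1), last_completed_epoch, now_epoch, cooldown_seconds
guard_decision() {
  local in_flight="$1" last_done="$2" now="$3" cooldown="$4"
  if [ "$in_flight" -eq 1 ]; then
    echo "skip:in-flight"
  elif [ $(( now - last_done )) -lt "$cooldown" ]; then
    echo "skip:cooldown"
  else
    echo "proceed"
  fi
}

guard_decision 1 0 1000 900     # in-flight wins regardless of timing
guard_decision 0 500 1000 900   # 500s since completion, inside the 900s window
guard_decision 0 0 1000 900     # 1000s since completion, window has elapsed
```

Checking in-flight before cooldown means a running investigation always blocks a duplicate spawn, even when the previous one finished long ago.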

## Use This Skill

Use this skill when you need a production-safe ops loop for agent systems and do not want ad-hoc prompt-only monitoring.

## Files

- `scripts/ops-threshold-detector.sh`
  reads session/cron/snapshot state and appends detector JSONL events
- `scripts/incident-guard-check.sh`
  checks in-flight/cooldown guard status for a check id
- `scripts/incident-state-update.sh`
  updates guard state for start/complete/fail transitions
- `scripts/ops-incident-router.sh`
  converts detector alerts into structured actions
- `scripts/ops-detector-cycle.sh`
  detector + router cycle runner
- `scripts/setup.sh`
  dependency checks + local example scaffold
- `scripts/clean-generated.sh`
  removes generated `.jsonl` and lock artifacts before republishing from a used folder
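
The start/complete/fail transitions handled by `scripts/incident-state-update.sh` could look roughly like the sketch below. The state layout (an in-flight flag plus a last-completed timestamp), the function name `next_guard_state`, and the choice that a failure releases the lock without restarting the cooldown are all assumptions, not the script's documented behavior.

```bash
# Hypothetical guard state transition.
# Args: event (start|complete|fail), last_completed_epoch, now_epoch
# Prints the new state as "<in_flight> <last_completed_epoch>".
next_guard_state() {
  local event="$1" last_done="$2" now="$3"
  case "$event" in
    start)    echo "1 $last_done" ;;  # take the in-flight lock
    complete) echo "0 $now" ;;        # release the lock, restart the cooldown
    fail)     echo "0 $last_done" ;;  # release the lock, cooldown unchanged
    *)        echo "unknown event: $event" >&2; return 1 ;;
  esac
}

next_guard_state start 500 1000
next_guard_state complete 500 1000
next_guard_state fail 500 1000
```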

## Setup

```bash
bash scripts/setup.sh
```

## Quick Start

Run one full dry-run cycle:

```bash
bash scripts/ops-detector-cycle.sh \
  --workspace "$(pwd)/examples/workspace" \
  --state-file "$(pwd)/examples/incident-state.json" \
  --detector-out "$(pwd)/examples/ops-detector.jsonl" \
  --router-out "$(pwd)/examples/router-actions.jsonl"
```

Run live mode (router also acquires in-flight locks):

```bash
bash scripts/ops-detector-cycle.sh \
  --workspace "$(pwd)/examples/workspace" \
  --state-file "$(pwd)/examples/incident-state.json" \
  --detector-out "$(pwd)/examples/ops-detector.jsonl" \
  --router-out "$(pwd)/examples/router-actions.jsonl" \
  --live
```

## Output Contract

The detector appends one JSON line per run (pretty-printed here for readability):

```json
{
  "ts": "2026-02-24T02:30:00Z",
  "status": "ALERT",
  "checks": 5,
  "alerts": [{"sev":"Sev-2","trigger":"cron_failure","value":2,"threshold":0}],
  "gaps": []
}
```
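
A consumer of the detector output usually only needs the most recent line. The sketch below checks whether that line reports `ALERT`. The helper name `latest_status_is_alert` is hypothetical, and the regex match is used only to keep the example dependency-free; a real JSON parser is the safer choice in practice.

```bash
# Return 0 when the last line of a detector JSONL file has status ALERT.
# Tolerates optional whitespace around the colon.
latest_status_is_alert() {
  local file="$1"
  tail -n 1 "$file" | grep -Eq '"status"[[:space:]]*:[[:space:]]*"ALERT"'
}

# Demo against a throwaway file holding the sample event from above.
demo_file="$(mktemp)"
printf '%s\n' '{"ts":"2026-02-24T02:30:00Z","status":"ALERT","checks":5,"alerts":[{"sev":"Sev-2","trigger":"cron_failure","value":2,"threshold":0}],"gaps":[]}' > "$demo_file"
if latest_status_is_alert "$demo_file"; then
  echo "latest run raised an alert"
fi
rm -f "$demo_file"
```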

Router emits one JSON action per alert decision:

```json
{"action":"spawn","check_id":"cron_failure","severity":"Sev-2","mode":"dry-run","task":"Investigate incident: cron_failure"}
```
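
To make the action shape concrete, here is a minimal sketch that assembles such a line from an alert's fields. The function name `emit_spawn_action` is hypothetical, and the sketch assumes the values are plain identifiers that need no JSON escaping; it is not the router's actual implementation.

```bash
# Hypothetical helper: print one router action as a single JSON line.
# Args: check_id, severity, mode
emit_spawn_action() {
  local check_id="$1" severity="$2" mode="$3"
  printf '{"action":"spawn","check_id":"%s","severity":"%s","mode":"%s","task":"Investigate incident: %s"}\n' \
    "$check_id" "$severity" "$mode" "$check_id"
}

# Reproduces the sample action shown above.
emit_spawn_action cron_failure Sev-2 dry-run
```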

## Operational Pattern

1. schedule `ops-threshold-detector.sh` (every 5-15 min)
2. feed the latest detector line to `ops-incident-router.sh`
3. spawn investigator/remediator only from router output
4. keep remediation behind explicit owner approval
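
Step 4 can be enforced with a simple gate in front of any remediation command. The sketch below assumes an owner-maintained approvals file with one approved check id per line; the file, the function name `may_remediate`, and the `token_spike` id are hypothetical and not part of the shipped scripts.

```bash
# Hypothetical approval gate: succeed only when the check id appears as an
# exact, whole-line match in the owner's approvals file.
may_remediate() {
  local check_id="$1" approvals_file="$2"
  [ -f "$approvals_file" ] && grep -Fxq -- "$check_id" "$approvals_file"
}

# Demo: only cron_failure has been approved.
approvals="$(mktemp)"
echo "cron_failure" > "$approvals"
for id in cron_failure token_spike; do
  if may_remediate "$id" "$approvals"; then
    echo "$id: remediation approved"
  else
    echo "$id: awaiting owner approval"
  fi
done
rm -f "$approvals"
```

A missing approvals file denies everything, so the gate fails closed.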

For details, read `references/architecture.md`.

Related Skills

seo-forensic-incident-response

from diegosouzapw/awesome-omni-skill

Investigate sudden drops in organic traffic or rankings and run a structured forensic SEO incident response with triage, root-cause analysis and recovery plan.

sage-workspace-detection

from diegosouzapw/awesome-omni-skill

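Sage workspace detection development guide, covering project type detection, language identification, dependency analysis, and Git information.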

incident-runbook-templates

from diegosouzapw/awesome-omni-skill

Create structured incident response runbooks with step-by-step procedures, escalation paths, and recovery actions. Use when building runbooks, responding to incidents, or establishing incident resp...

alphavantage-routing

from diegosouzapw/awesome-omni-skill

Reference for all Alpha Vantage MCP tools. Use when exploring available data.

pentest-outbound-interaction-oob-detection

from diegosouzapw/awesome-omni-skill

Security assessment skill for outbound interaction and out-of-band (OOB) validation. Use when prompts include SSRF callback confirmation, blind XSS beacons, webhook abuse, XXE/OOB behavior, DNS/HTTP callback correlation, or asynchronous server-side interaction proof. Do not use when vulnerabilities are fully in-band and require no external callback correlation.

anomaly-detection

from diegosouzapw/awesome-omni-skill

Rule-based anomaly detection for production systems with configurable thresholds, cooldown periods to prevent alert storms, and error pattern tracking for repeated failures.

ambiguity-detection

from diegosouzapw/awesome-omni-skill

Detects critical product, scope, data, risk, and success ambiguities in requirements or PRDs and expresses them as structured, decision-forcing clarification questions without proposing solutions or workflow actions.

u01789-human-approval-routing-for-remote-team-collaboration

from diegosouzapw/awesome-omni-skill

Operate the "Human Approval Routing for remote team collaboration" capability in production for remote team collaboration workflows. Use when mission execution explicitly requires this capability and outcomes must be reproducible, policy-gated, and handoff-ready.

u01784-human-approval-routing-for-multilingual-translation-services

from diegosouzapw/awesome-omni-skill

Operate the "Human Approval Routing for multilingual translation services" capability in production for multilingual translation services workflows. Use when mission execution explicitly requires this capability and outcomes must be reproducible, policy-gated, and handoff-ready.

u01689-human-approval-routing-for-education-support-services

from diegosouzapw/awesome-omni-skill

Operate the "Human Approval Routing for education support services" capability in production for education support services workflows. Use when mission execution explicitly requires this capability and outcomes must be reproducible, policy-gated, and handoff-ready.

UMR-LMR-PMD-detection

from diegosouzapw/awesome-omni-skill

This pipeline performs genome-wide segmentation of CpG methylation profiles to identify Unmethylated Regions (UMRs), Low-Methylated Regions (LMRs), and Partially Methylated Domains (PMDs) using whole-genome bisulfite sequencing (WGBS) methylation calls. The pipeline provides high-resolution enhancer-like LMRs, promoter-associated UMRs, and large-scale PMDs characteristic of reprogramming, aging, or cancer methylomes, enabling integration with chromatin accessibility, TF binding, and genome architecture analyses.

tech-detection

from diegosouzapw/awesome-omni-skill

Detects project tech stack including languages, frameworks, package managers, and cloud platforms. Use when analyzing a project, detecting technologies, bootstrapping infrastructure, or setting up permissions. Generates project-context.json with detected stack.