runbook
Generate operational runbooks for services, procedures, or incident response with step-by-step procedures, troubleshooting guides, and escalation paths
Best use case
runbook is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using runbook should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/runbook/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
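The manual installation steps can be sketched as a small shell function. This is a minimal sketch, not an official installer: `install_skill` is a hypothetical helper name, and it assumes SKILL.md has already been downloaded to a local path and that the second argument is your project root.

```shell
# Hypothetical helper: copy a downloaded SKILL.md into the project's skill directory.
# Assumptions: $1 is the path to the downloaded file (default: ./SKILL.md),
#              $2 is the project root (default: current directory).
install_skill() {
  local src="${1:-SKILL.md}"
  local project_root="${2:-.}"
  local dest="$project_root/.claude/skills/runbook"

  # Create the skill directory if it does not exist, then copy the file in place.
  mkdir -p "$dest" && cp "$src" "$dest/SKILL.md"
}
```

Running `install_skill ./SKILL.md .` from the project root should leave the file at `.claude/skills/runbook/SKILL.md`; restarting the agent is still a manual step.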
How runbook Compares
| Feature / Agent | runbook | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
It generates operational runbooks for services, procedures, or incident response, with step-by-step procedures, troubleshooting guides, and escalation paths.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Runbook

Generate operational runbooks for services, procedures, or incident response. Investigates the codebase and infrastructure to produce accurate, actionable procedures.

## When to Use

- Creating operational documentation for a service
- Documenting deployment, scaling, or maintenance procedures
- Building incident response playbooks
- Standardizing operational procedures across teams

## Input

- **Topic**: Service name, operation type, or incident scenario
- **Scope**: deployment, scaling, failover, maintenance, troubleshooting
- **Optional**: Specific scenarios to cover

## Investigation Strategy

Launch parallel investigation tracks to gather comprehensive information:

### Track 1: Codebase Exploration

- Identify service entry points and configuration
- Find health check endpoints
- Map dependencies (databases, caches, external services)
- Locate logging and metrics instrumentation
- Find existing scripts or automation

### Track 2: Infrastructure Analysis

- Review deployment manifests (Kubernetes, Terraform, etc.)
- Identify scaling configuration
- Map service dependencies
- Find monitoring and alerting setup
- Review backup and recovery procedures

### Track 3: External Research

- Find operational best practices for the service type
- Research common failure modes
- Identify industry-standard procedures

## Output

Generate the runbook document using the template at `references/templates/runbook.md`.

The runbook should include:

- Service overview and architecture
- Dependencies with failure impact
- Step-by-step procedures with actual commands
- Troubleshooting guides for common issues
- Escalation paths and contacts

## Behavior

1. Parse topic to identify service and operation scope
2. Launch parallel investigation tracks
3. Extract configuration, endpoints, and dependencies from codebase
4. Identify common operations and failure modes
5. Generate step-by-step procedures with actual commands
6. Document troubleshooting steps and escalation paths

## Constraints

- **Accuracy**: All commands must be verified against actual codebase/infrastructure
- **Actionable**: Every procedure must have concrete, executable steps
- **Complete**: Include prerequisites, verification, and rollback for each procedure
- **Maintainable**: Note dependencies that may change and require updates

## Example

````
Input: "Generate runbook for the payment-service"

Investigation:
- Found deployment at k8s/payment-service/
- Found health endpoints: /health, /ready
- Dependencies: PostgreSQL (critical), Redis (cache), Stripe API
- Scaling: HPA configured, min 3, max 10 replicas
- Alerts: Prometheus rules in monitoring/

Generated Runbook: payment-service-runbook.md

## Overview
- Service: payment-service
- Owner: payments-team
- Criticality: P1

## Dependencies
| Dependency | Type | Criticality | Failure Impact |
|------------|------|-------------|----------------|
| PostgreSQL | Database | Critical | Full outage |
| Redis | Cache | High | Degraded latency |
| Stripe API | External | Critical | Payment failures |

## Procedures

### Deployment
1. Verify no active transactions
   ```bash
   kubectl exec -it payment-service-0 -- curl localhost:8080/metrics | grep active_transactions
   ```
2. Apply new deployment
   ```bash
   kubectl apply -f k8s/payment-service/deployment.yaml
   ```
3. Monitor rollout
   ```bash
   kubectl rollout status deployment/payment-service
   ```

### Scaling
```bash
kubectl scale deployment payment-service --replicas=5
```

## Troubleshooting

### High Latency
**Symptoms**: p99 latency > 500ms
**Diagnosis**:
```bash
kubectl top pods -l app=payment-service
kubectl logs -l app=payment-service --tail=100 | grep -i slow
```
**Resolution**: Check Redis connection, scale if CPU > 80%
````

Begin by identifying the service or operation to document and launching investigation tracks.
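The **Complete** constraint in the skill source asks every procedure to include verification and rollback. As an illustration only, the sketch below shows what a deployment step with both might look like. It is not part of the skill itself: `deploy_with_rollback`, the `KUBECTL` override variable, and the 120-second timeout are all assumptions made for this example. The deployment name and manifest path are the ones from the example in the source.

```shell
# Hypothetical sketch: apply a manifest, verify the rollout, and roll back on failure.
# KUBECTL is an assumed override hook (e.g. KUBECTL=echo) so the flow can be
# dry-run without a cluster; it defaults to the real kubectl binary.
KUBECTL="${KUBECTL:-kubectl}"

deploy_with_rollback() {
  local deployment="$1"
  local manifest="$2"

  # Step 1: apply the new manifest; if this fails there is nothing to roll back.
  if ! $KUBECTL apply -f "$manifest"; then
    echo "apply failed for $deployment; nothing to roll back" >&2
    return 1
  fi

  # Step 2: verify by waiting for the rollout (timeout value is an assumption).
  if $KUBECTL rollout status "deployment/$deployment" --timeout=120s; then
    echo "rollout of $deployment succeeded"
  else
    # Step 3: revert to the previous revision if verification fails.
    echo "rollout of $deployment failed; rolling back" >&2
    $KUBECTL rollout undo "deployment/$deployment"
    return 1
  fi
}
```

Setting `KUBECTL=echo` before calling `deploy_with_rollback payment-service k8s/payment-service/deployment.yaml` prints the commands instead of running them, which is one way to review a generated procedure before touching a cluster.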
Related Skills
Runbooks
Runbooks provide step-by-step procedures for operating and troubleshooting systems. Effective runbooks enable teams to handle incidents, perform maintenance, and operate systems consistently with clea
bgo
Automates the complete Blender build-go workflow, from building and packaging your extension/add-on to removing old versions, installing, enabling, and launching Blender for quick testing and iteration.
terraform-engineer
Use when implementing infrastructure as code with Terraform across AWS, Azure, or GCP. Invoke for module development, state management, provider configuration, multi-environment workflows, infrastructure testing.
terraform-diagrams
Generates architecture diagrams from Terraform code. Use when user has .tf files or asks to visualize Terraform infrastructure.
terraform-azurerm-set-diff-analyzer
Wave 5 migration placeholder for `awesome-copilot/terraform-azurerm-set-diff-analyzer` imported from antigravity-awesome-skills manifest.
terraform-aws-modules
Terraform module creation for AWS — reusable modules, state management, and HCL best practices. Use when building or reviewing Terraform AWS infrastructure.
terraform-analyzer
Specialized skill for analyzing Terraform configurations. Supports parsing, security scanning (tfsec, checkov), cost estimation (infracost), drift detection, and plan visualization across AWS, Azure, and GCP.
terradev-gpu-cloud
Cross-cloud GPU provisioning with NUMA-aligned topology optimization, K8s cluster creation, and inference overflow. Get real-time pricing across 11+ cloud providers, provision the cheapest GPUs in seconds, spin up production K8s clusters with automatic GPU-NIC pairing, and burst to cloud when your local GPU maxes out. BYOAPI — your keys never leave your machine.
tencent-cloud-pptx
Create professional Tencent Cloud themed presentations from markdown content. Use when users request: (1) Creating presentations with Tencent Cloud branding, (2) Converting markdown documents to PowerPoint slides, (3) Generating slides with automatic content structuring, (4) Creating bilingual (Chinese/English) technical presentations, (5) Adding AI-generated images to presentation slides. Keywords to watch: 腾讯云, Tencent Cloud, markdown to PPT, presentation generation, slides with images.
telegram-reminders
Send reminders and messages to Telegram with cloud-based scheduling. Use when the user wants to send immediate messages or schedule future reminders to Telegram. Supports text messages, timestamp-based scheduling, recurring reminders, viewing and canceling scheduled messages, and message history.
tech-detection
Detects project tech stack including languages, frameworks, package managers, and cloud platforms. Use when analyzing a project, detecting technologies, bootstrapping infrastructure, or setting up permissions. Generates project-context.json with detected stack.
team-lifecycle
Unified team skill for full lifecycle - spec/impl/test. All roles invoke this skill with --role arg for role-specific execution.