alerting-and-monitoring
Define alerts, escalation, and incident response.
Best use case
alerting-and-monitoring is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using alerting-and-monitoring should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/alerting-and-monitoring/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
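The manual installation steps above can be sketched as a small Python helper. This is only an illustration: it assumes SKILL.md has already been downloaded to a local path, and the function name `install_skill` is hypothetical, not part of the skill or of any agent tooling.

```python
import shutil
from pathlib import Path

SKILL_NAME = "alerting-and-monitoring"


def install_skill(downloaded: Path, project_root: Path) -> Path:
    """Copy a downloaded SKILL.md to <project_root>/.claude/skills/<skill>/SKILL.md."""
    target_dir = project_root / ".claude" / "skills" / SKILL_NAME
    # Create the skill directory (and any missing parents) if needed.
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / "SKILL.md"
    shutil.copyfile(downloaded, target)
    return target
```

For example, `install_skill(Path("SKILL.md"), Path("."))` copies a file from the current directory into the current project; the agent still needs to be restarted afterwards, as described in the steps above.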
How alerting-and-monitoring Compares
| Feature / Agent | alerting-and-monitoring | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Define alerts, escalation, and incident response.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Alerting And Monitoring

## Purpose
- Define alerts, escalation, and incident response.

## Preconditions
- Access to system context (repos, infra, environments)
- Confirmed requirements and constraints
- Required approvals for security, compliance, or governance

## Inputs
- Problem statement and scope
- Current architecture or system constraints
- Non-functional requirements (performance, security, compliance)
- Target stack and environment

## Outputs
- Design or implementation plan
- Required artifacts (diagrams, configs, specs, checklists)
- Validation steps and acceptance criteria

## Detailed Step-by-Step Procedures
1. Clarify scope, constraints, and success metrics.
2. Review current system state, dependencies, and integration points.
3. Select patterns, tools, and architecture options that match constraints.
4. Produce primary artifacts (docs/specs/configs/code stubs).
5. Validate against requirements and known risks.
6. Provide rollout and rollback guidance.

## Decision Trees and Conditional Logic
- If compliance or regulatory scope applies -> add required controls and audit steps.
- If latency budget is strict -> choose low-latency storage and caching.
- Else -> prefer cost-optimized storage and tiering.
- If data consistency is critical -> prefer transactional boundaries and strong consistency.
- Else -> evaluate eventual consistency or async processing.

## Error Handling and Edge Cases
- Partial failures across dependencies -> isolate blast radius and retry with backoff.
- Data corruption or loss risk -> enable backups and verify restore path.
- Limited access to systems -> document gaps and request access early.
- Legacy dependencies with limited change tolerance -> use adapters and phased rollout.

## Tool Requirements and Dependencies
- CLI and SDK tooling for the target stack
- Credentials or access tokens for required environments
- Diagramming or spec tooling when producing docs

## Stack Profiles
- Use Profile A, B, or C from `skills/STACK_PROFILES.md`.
- Note selected profile in outputs for traceability.

## Validation
- Requirements coverage check
- Security and compliance review
- Performance and reliability review
- Peer or stakeholder sign-off

## Rollback Procedures
- Revert config or deployment to last known good state.
- Roll back database migrations if applicable.
- Verify service health, data integrity, and error rates after rollback.

## Success Metrics
- Measurable outcomes (latency, error rate, uptime, cost)
- Acceptance thresholds defined with stakeholders

## Example Workflows and Use Cases
- Minimal: apply the skill to a small service or single module.
- Production: apply the skill to a multi-service or multi-tenant system.
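The Error Handling section of the SKILL.md source above recommends handling partial failures by retrying with backoff. The following is a minimal Python sketch of that idea, assuming exponential delays with an upper cap and optional random jitter. The names `backoff_delays` and `retry_with_backoff`, and all default values, are hypothetical illustrations and are not defined by the skill.

```python
import random
import time
from typing import Callable, List, Tuple, Type, TypeVar

T = TypeVar("T")


def backoff_delays(retries: int, base: float = 0.5, cap: float = 30.0) -> List[float]:
    """Return one delay per retry: base * 2**n, never exceeding cap."""
    return [min(cap, base * (2 ** n)) for n in range(retries)]


def retry_with_backoff(
    call: Callable[[], T],
    max_retries: int = 4,
    base: float = 0.5,
    cap: float = 30.0,
    jitter: bool = True,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call` until it succeeds or max_retries retries are exhausted.

    Assumption: every exception type in `retry_on` is transient. In practice,
    narrow this to timeouts or connection errors for the dependency at hand.
    """
    delays = backoff_delays(max_retries, base, cap)
    for attempt in range(max_retries + 1):
        try:
            return call()
        except retry_on:
            if attempt == max_retries:
                raise  # out of retries: surface the last error to the caller
            delay = delays[attempt]
            if jitter:
                # Randomize the wait so many clients do not retry in lockstep.
                delay = random.uniform(0, delay)
            sleep(delay)
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable so the retry schedule can be checked in tests without waiting. Isolating the blast radius, which the same bullet mentions, is a separate concern that this sketch does not cover.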
Related Skills
alerting
Real-time alerting and notification system for Univers infrastructure. Use this when you need to monitor system health, service status, and send proactive alerts when thresholds are exceeded or services fail.
alerting-rules-agent
Designs and configures alerting rules for monitoring systems
observability-monitoring-performance-engineer
Expert performance engineer specializing in modern observability, application optimization, and scalable system performance. Masters OpenTelemetry, distributed tracing, load testing, multi-tier caching, Core Web Vitals, and performance monitoring. Handles end-to-end optimization, real user monitoring, and scalability patterns. Use PROACTIVELY for performance optimization, observability, or scalability challenges. Use when: the task directly matches performance engineer responsibilities within plugin observability-monitoring. Do not use when: a more specific framework or task-focused skill is clearly a better match.
monitoring
Production health checks, uptime monitoring, and performance metrics. Monitoring best practices for the DevOps engineer agent.
blazemeter-api-monitoring
Comprehensive guide for BlazeMeter API Monitoring, including test creation, configuration, scripting, integrations, notifications, and management. Use when working with API Monitoring tests for (1) Creating and configuring API tests, (2) Writing custom scripts (Initial, Pre-request, Post-response), (3) Integrating with third-party services (Slack, PagerDuty, Datadog, etc.), (4) Managing teams, buckets, and RBAC, (5) Configuring notifications and sharing results, (6) Using test data (CSV, Data Entities), (7) Advanced features (GraphQL, SOAP, file uploads, environments), or any other API Monitoring tasks.
sentry-setup-ai-monitoring
Setup Sentry AI Agent Monitoring in any project. Use this when asked to add AI monitoring, track LLM calls, monitor AI agents, or instrument OpenAI/Anthropic/Vercel AI/LangChain/Google GenAI. Automatically detects installed AI SDKs and configures the appropriate Sentry integration.
Data Quality Monitoring
Data Quality (DQ) Monitoring is the continuous process of validating data against predefined rules and expectations. In a modern data stack, monitoring must happen at every stage: **Ingestion**, **Tra
apify-brand-reputation-monitoring
Track reviews, ratings, sentiment, and brand mentions across Google Maps, Booking.com, TripAdvisor, Facebook, Instagram, YouTube, and TikTok. Use when user asks to monitor brand reputation, analyze...
bgo
Automates the complete Blender build-go workflow, from building and packaging your extension/add-on to removing old versions, installing, enabling, and launching Blender for quick testing and iteration.
go-microservices
Production-ready Go microservices patterns including Gin, Echo, gRPC, clean architecture, dependency injection, error handling, middleware, testing, Docker containerization, Kubernetes deployment, distributed tracing, observability with Prometheus, high-performance APIs, concurrent processing, database integration with GORM, Redis caching, message queues, and cloud-native best practices.
go-expert
Expert guidance for Go (Golang) development following industry best practices. Use when writing Go code, reviewing PRs, bootstrapping new services, configuring linters, implementing observability, or troubleshooting Go applications. Covers SOLID principles, Gang of Four design patterns, domain-driven structure, error handling, context patterns, concurrency, testing, structured logging, health endpoints, and CI gates.
gke-deployment
Deploy, configure, and manage Kubernetes workloads on GKE with Deployments, Services, Ingress, HPA, health probes, ConfigMaps, and Secrets. Use when deploying containers to GKE, configuring load balancers, setting up autoscaling, writing health checks, managing environment configs, or troubleshooting pod issues.