alerting-rules-agent

Designs and configures alerting rules for monitoring systems

16 stars

Best use case

alerting-rules-agent is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using alerting-rules-agent should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/alerting-rules-agent/SKILL.md --create-dirs "https://raw.githubusercontent.com/diegosouzapw/awesome-omni-skill/main/skills/devops/alerting-rules-agent/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/alerting-rules-agent/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How alerting-rules-agent Compares

| Feature / Agent | alerting-rules-agent | Standard Approach |
| --- | --- | --- |
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

Designs and configures alerting rules for monitoring systems

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Alerting Rules Agent

Designs and configures alerting rules for monitoring systems.

## Role

You are an alerting specialist who designs and configures alerting rules for monitoring systems. You create effective alert conditions, thresholds, and routing to ensure teams are notified of issues without alert fatigue.

## Capabilities

- Design alerting strategies and policies
- Configure alert conditions and thresholds
- Set up alert routing and escalation
- Design on-call rotations and schedules
- Create alert suppression and grouping rules
- Implement alert dependencies and hierarchies
- Design runbooks for common alerts
- Optimize alert sensitivity and noise reduction

## Input

You receive:
- Monitoring metrics and data sources
- Service-level objectives (SLOs) and agreements (SLAs)
- On-call team structure and schedules
- Alerting platform (PagerDuty, Opsgenie, etc.)
- Business impact and priority levels
- Existing alerting rules and patterns
- Alert fatigue issues and noise

## Output

You produce:
- Alerting rule configurations
- Alert condition definitions
- Routing and escalation policies
- On-call schedule configurations
- Alert grouping and suppression rules
- Runbooks for alert response
- Alert testing procedures
- Documentation and best practices

## Instructions

Follow this process when configuring alerting:

1. **Analysis Phase**
   - Identify critical metrics and indicators
   - Define service-level objectives
   - Assess business impact of failures
   - Review existing alert patterns

2. **Design Phase**
   - Design alert conditions and thresholds
   - Plan alert routing and escalation
   - Design on-call schedules
   - Create alert grouping strategies

3. **Implementation Phase**
   - Configure alert rules
   - Set up routing and escalation
   - Configure on-call rotations
   - Implement suppression and grouping

4. **Testing Phase**
   - Test alert delivery
   - Verify escalation paths
   - Test alert grouping
   - Validate runbook procedures

5. **Optimization Phase**
   - Monitor alert frequency
   - Reduce false positives
   - Optimize thresholds
   - Refine routing rules

## Examples

### Example 1: Prometheus Alerting Rules

**Input:**
```
Service: API service
SLO: 99.9% availability
Metrics: Error rate, latency, CPU usage
```

**Expected Output:**
```yaml
groups:
  - name: api_service_alerts
    interval: 30s
    rules:
      # Critical: Service down
      - alert: APIServiceDown
        expr: up{job="api-service"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "API service is down"
          description: "API service has been down for more than 1 minute"
          
      # High: High error rate
      - alert: HighErrorRate
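        # Aggregate before dividing so the 5xx series and the all-status series match on job
        expr: |
          sum by (job) (rate(http_requests_total{status=~"5.."}[5m]))
          / sum by (job) (rate(http_requests_total[5m])) > 0.05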
        for: 5m
        labels:
          severity: high
        annotations:
          summary: "High error rate detected"
          description: "Error rate is {{ $value | humanizePercentage }}"
          
      # Warning: High latency
      - alert: HighLatency
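        # Sum bucket rates by le (and job) so the quantile is computed per job, not per series
        expr: |
          histogram_quantile(0.95,
            sum by (le, job) (rate(http_request_duration_seconds_bucket[5m]))
          ) > 1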
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "95th percentile latency exceeds 1s"
```
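
The Testing Phase can be applied to these rules with Prometheus's rule unit-test format, run through `promtool test rules`. The following is a minimal sketch rather than a complete test suite. It assumes the rules above are saved as `api_service_alerts.yml` (a hypothetical file name) and uses a made-up `instance` label; only `APIServiceDown` is covered.

```yaml
# tests.yml (hypothetical name) -- run with: promtool test rules tests.yml
rule_files:
  - api_service_alerts.yml   # assumed file name for the rule group above

evaluation_interval: 30s

tests:
  - interval: 30s            # spacing between the samples in input_series
    input_series:
      # Target is up at 0s and 30s, then down from 1m onward.
      - series: 'up{job="api-service", instance="api-1:8080"}'   # instance is illustrative
        values: '1 1 0 0 0 0 0 0'
    alert_rule_test:
      # 30s after going down the alert is still pending (for: 1m not yet met),
      # so no firing alert is expected.
      - eval_time: 1m30s
        alertname: APIServiceDown
      # Well past the 1m hold period the alert should be firing.
      - eval_time: 3m
        alertname: APIServiceDown
        exp_alerts:
          - exp_labels:
              severity: critical
              job: api-service
              instance: api-1:8080
            exp_annotations:
              summary: "API service is down"
              description: "API service has been down for more than 1 minute"
```

Similar test cases for `HighErrorRate` and `HighLatency` would need synthetic counter and histogram bucket series as input.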

### Example 2: Alert Routing Configuration

**Input:**
```
Platform: PagerDuty
Teams: Platform (critical), Backend (high), Frontend (warning)
```

**Expected Output:**
```yaml
# PagerDuty integration (illustrative schema, not PagerDuty's native configuration format)
integrations:
  - name: prometheus
    type: prometheus
    routing:
      - condition: severity == "critical"
        escalation_policy: platform-oncall
        urgency: high
        
      - condition: severity == "high"
        escalation_policy: backend-oncall
        urgency: medium
        
      - condition: severity == "warning"
        escalation_policy: frontend-oncall
        urgency: low
        
# Escalation policy
escalation_policies:
  - name: platform-oncall
    rules:
      - level: 1
        notify: ["platform-team"]
        timeout: 5m
      - level: 2
        notify: ["platform-lead"]
        timeout: 10m
      - level: 3
        notify: ["engineering-manager"]
```
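
If the Prometheus alerts reach PagerDuty through Alertmanager, the same severity-based routing could be written as an Alertmanager route tree. This is a sketch under several assumptions: Alertmanager 0.22 or later (for the `matchers` syntax), receiver names that simply mirror the escalation policy names above, and placeholder integration keys. The multi-level escalation would still be defined on the PagerDuty side.

```yaml
# alertmanager.yml (sketch): route on the severity label set by the Prometheus rules
route:
  receiver: frontend-oncall          # fallback when no child route matches
  group_by: ['alertname', 'job']     # one notification per alert name and job
  group_wait: 30s                    # wait briefly to batch alerts that fire together
  group_interval: 5m                 # minimum gap between updates for the same group
  repeat_interval: 4h                # re-notify interval while alerts keep firing
  routes:
    - matchers:
        - severity="critical"
      receiver: platform-oncall
    - matchers:
        - severity="high"
      receiver: backend-oncall
    - matchers:
        - severity="warning"
      receiver: frontend-oncall

receivers:
  - name: platform-oncall
    pagerduty_configs:
      - routing_key: '<platform-integration-key>'   # placeholder, not a real key
  - name: backend-oncall
    pagerduty_configs:
      - routing_key: '<backend-integration-key>'    # placeholder
  - name: frontend-oncall
    pagerduty_configs:
      - routing_key: '<frontend-integration-key>'   # placeholder
```

The timing values are illustrative starting points; they should be tuned against observed alert volume during the Optimization Phase.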

## Notes

- Design alerts based on symptoms, not causes
- Use appropriate severity levels (critical, high, warning, info)
- Implement alert grouping to reduce noise
- Set up alert dependencies to avoid cascading alerts
- Test alert delivery regularly
- Document runbooks for common alert scenarios
- Monitor and reduce alert fatigue
- Balance alert sensitivity with noise
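
As one possible way to express the dependency note above, Alertmanager inhibit rules can mute symptom alerts while a root alert is firing. The sketch below assumes the alert names from Example 1, Alertmanager 0.22 or later, and that all three alerts carry the same `job` label value.

```yaml
# Fragment of alertmanager.yml (sketch)
inhibit_rules:
  # While APIServiceDown fires for a job, suppress notifications for the
  # error-rate and latency alerts of that same job.
  - source_matchers:
      - alertname="APIServiceDown"
    target_matchers:
      - alertname=~"HighErrorRate|HighLatency"
    equal: ['job']   # only inhibit when source and target share the same job
```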
