prometheus-patterns
PromQL queries, alerting rules, recording rules, Grafana dashboard JSON, SLO
Best use case
prometheus-patterns is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using prometheus-patterns can expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/prometheus-patterns/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
How prometheus-patterns Compares
| Feature / Agent | prometheus-patterns | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
PromQL queries, alerting rules, recording rules, Grafana dashboard JSON, SLO
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Prometheus Patterns
## PromQL Essentials
### Rate and Error Calculations
```promql
# Request rate (per second, 5m window)
rate(http_requests_total[5m])

# Error rate percentage
sum(rate(http_requests_total{status=~"5.."}[5m]))
  / sum(rate(http_requests_total[5m])) * 100

# P99 latency from histogram
histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))

# P50 latency by endpoint
histogram_quantile(0.50,
  sum(rate(http_request_duration_seconds_bucket[5m])) by (le, handler)
)

# Saturation: CPU usage per pod
sum(rate(container_cpu_usage_seconds_total[5m])) by (pod)
  / sum(kube_pod_container_resource_limits{resource="cpu"}) by (pod) * 100
```
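The quantile queries above estimate a value by interpolating linearly inside the bucket that contains the target rank, which is why bucket boundaries limit the precision of any P50 or P99 figure. The following is a minimal Go sketch of that interpolation, not Prometheus's actual implementation. The names `bucket`, `inf`, and `estimateQuantile` are hypothetical, the input is assumed to be cumulative counts with a final +Inf bucket, and out-of-range quantiles are simply rejected with NaN as a simplification.

```go
package main

import (
	"math"
	"sort"
)

// inf is the upper bound of the final le="+Inf" bucket.
var inf = math.Inf(1)

// bucket is one cumulative histogram bucket: count observations were <= upperBound.
type bucket struct {
	upperBound float64
	count      float64
}

// estimateQuantile sketches linear interpolation over cumulative buckets.
// Assumption: counts are cumulative and the largest bucket is +Inf.
func estimateQuantile(q float64, buckets []bucket) float64 {
	if q < 0 || q > 1 || len(buckets) < 2 {
		return math.NaN() // simplification: reject out-of-range input
	}
	sorted := append([]bucket(nil), buckets...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].upperBound < sorted[j].upperBound })

	last := len(sorted) - 1
	total := sorted[last].count
	if !math.IsInf(sorted[last].upperBound, 1) || total == 0 {
		return math.NaN()
	}

	// Find the first bucket whose cumulative count reaches the target rank.
	rank := q * total
	i := sort.Search(len(sorted), func(k int) bool { return sorted[k].count >= rank })
	if i == last {
		// The rank falls in the +Inf bucket: the best available answer
		// is the highest finite boundary.
		return sorted[last-1].upperBound
	}

	lower, below := 0.0, 0.0 // the first bucket is assumed to start at 0
	if i > 0 {
		lower, below = sorted[i-1].upperBound, sorted[i-1].count
	}
	inBucket := sorted[i].count - below
	if inBucket == 0 {
		return lower
	}
	return lower + (sorted[i].upperBound-lower)*(rank-below)/inBucket
}
```

With cumulative counts of 50, 90, 99, and 100 for bounds 0.1, 0.25, 0.5, and +Inf, this sketch returns 0.1 for the median and 0.5 for the 99.9th percentile. The second result shows that latencies beyond the largest finite bucket cannot be resolved.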
### SLO: Error Budget
```promql
# SLO: 99.9% availability over 30 days
# Error budget = 0.1% = 43.2 minutes/month

# Current burn rate (observed error ratio / allowed error ratio; 1 = exactly on budget)
(
  1 - (
    sum(rate(http_requests_total{status!~"5.."}[1h]))
    / sum(rate(http_requests_total[1h]))
  )
) / (1 - 0.999)

# Remaining error budget (percentage)
(
  1 - (
    sum(increase(http_requests_total{status=~"5.."}[30d]))
    / (sum(increase(http_requests_total[30d])) * 0.001)
  )
) * 100
```
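The arithmetic behind these figures can be checked by hand. The Go sketch below reproduces it under the 99.9% / 30-day assumptions stated in the comments above. The function names are hypothetical and exist only for illustration.

```go
package main

// errorBudgetMinutes returns how many minutes of full outage an SLO
// tolerates over a window of the given length in days.
func errorBudgetMinutes(slo, windowDays float64) float64 {
	return (1 - slo) * windowDays * 24 * 60
}

// burnRate is the observed error ratio divided by the allowed error ratio.
// A value of 1 means the budget runs out exactly at the end of the window.
func burnRate(errorRatio, slo float64) float64 {
	return errorRatio / (1 - slo)
}

// budgetConsumed returns the fraction of the whole budget used when a
// burn rate is sustained for elapsedHours within a window of windowDays.
func budgetConsumed(rate, elapsedHours, windowDays float64) float64 {
	return rate * elapsedHours / (windowDays * 24)
}
```

For example, `errorBudgetMinutes(0.999, 30)` gives roughly 43.2, matching the comment in the query block, and `budgetConsumed(14.4, 1, 30)` gives roughly 0.02. In other words, one hour at a 14.4x burn rate uses about 2% of the monthly budget.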
## Recording Rules
```yaml
groups:
  - name: sli_rules
    interval: 30s
    rules:
      - record: job:http_request_rate:5m
        expr: sum(rate(http_requests_total[5m])) by (job)
      - record: job:http_error_rate:5m
        expr: |
          sum(rate(http_requests_total{status=~"5.."}[5m])) by (job)
          / sum(rate(http_requests_total[5m])) by (job)
      - record: job:http_latency_p99:5m
        expr: |
          histogram_quantile(0.99,
            sum(rate(http_request_duration_seconds_bucket[5m])) by (le, job)
          )
```
## Alerting Rules
```yaml
groups:
  - name: slo_alerts
    rules:
      - alert: HighErrorRate
        expr: job:http_error_rate:5m > 0.01
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Error rate above 1% for {{ $labels.job }}"
          runbook: "https://wiki.internal/runbooks/high-error-rate"
      - alert: HighLatency
        expr: job:http_latency_p99:5m > 0.5
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "P99 latency above 500ms for {{ $labels.job }}"
      - alert: ErrorBudgetBurn
        expr: |
          (
            sum(rate(http_requests_total{status=~"5.."}[1h]))
            / sum(rate(http_requests_total[1h]))
          ) > 14.4 * 0.001
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Burning error budget 14.4x faster than allowed"
```
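The ErrorBudgetBurn rule above checks a single 1h window, while the checklist asks for multi-window burn-rate alerts. The usual idea is to require that a short window also exceeds the threshold, so the alert clears soon after the errors stop. The Go sketch below shows only that decision logic. The function names are hypothetical, and pairing the 1h window with a 5m window at the same 14.4 factor is a common convention assumed here, not something this skill specifies.

```go
package main

// exceedsBurnRate reports whether an error ratio consumes budget faster
// than factor times the pace the SLO allows.
func exceedsBurnRate(errorRatio, slo, factor float64) bool {
	return errorRatio > factor*(1-slo)
}

// shouldPage implements the multi-window condition: the long window shows
// that a meaningful share of budget is gone, and the short window shows
// that the problem is still happening right now.
func shouldPage(longRatio, shortRatio, slo, factor float64) bool {
	return exceedsBurnRate(longRatio, slo, factor) &&
		exceedsBurnRate(shortRatio, slo, factor)
}
```

With a 99.9% SLO and a factor of 14.4 the threshold is an error ratio of 0.0144. `shouldPage(0.02, 0.03, 0.999, 14.4)` is true, whereas `shouldPage(0.02, 0.001, 0.999, 14.4)` is false because the short window shows the incident has already subsided.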
## Instrumentation (Go)
```go
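package metrics // hypothetical package name

import "github.com/prometheus/client_golang/prometheus"

var (
	// httpRequests counts requests; the status label feeds the error-rate queries.
	httpRequests = prometheus.NewCounterVec(
		prometheus.CounterOpts{
			Name: "http_requests_total",
			Help: "Total HTTP requests",
		},
		[]string{"method", "handler", "status"},
	)

	// httpDuration records latency in seconds; buckets span 10ms to 5s.
	httpDuration = prometheus.NewHistogramVec(
		prometheus.HistogramOpts{
			Name:    "http_request_duration_seconds",
			Help:    "HTTP request duration",
			Buckets: []float64{0.01, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5},
		},
		[]string{"method", "handler"},
	)
)

func init() {
	// Collectors are only exposed once they are registered.
	prometheus.MustRegister(httpRequests, httpDuration)
}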
```
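Filling the `status` label requires knowing the response code, which `net/http` does not hand back to a wrapping handler. One common approach, sketched below with the standard library only, is to wrap the `http.ResponseWriter` and remember the code. The names `statusRecorder`, `observeFunc`, `instrument`, and `observedStatus` are hypothetical. In a real service the hook would presumably call `httpRequests.WithLabelValues(method, handler, status).Inc()` and `httpDuration.WithLabelValues(method, handler).Observe(seconds)`.

```go
package main

import (
	"net/http"
	"net/http/httptest"
	"strconv"
	"time"
)

// statusRecorder wraps a ResponseWriter and remembers the status code.
// Note: wrapping hides optional interfaces such as http.Flusher.
type statusRecorder struct {
	http.ResponseWriter
	status int
}

func (r *statusRecorder) WriteHeader(code int) {
	r.status = code
	r.ResponseWriter.WriteHeader(code)
}

// observeFunc is a hypothetical hook that receives the label values and
// the elapsed time in seconds once the request has finished.
type observeFunc func(method, handler, status string, seconds float64)

// instrument wraps next and reports every request to observe. The handler
// argument is a fixed route name, which keeps label cardinality bounded.
func instrument(handler string, next http.Handler, observe observeFunc) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// Default to 200: handlers that never call WriteHeader reply with OK.
		rec := &statusRecorder{ResponseWriter: w, status: http.StatusOK}
		start := time.Now()
		next.ServeHTTP(rec, r)
		observe(r.Method, handler, strconv.Itoa(rec.status), time.Since(start).Seconds())
	})
}

// observedStatus is a demonstration helper: it sends one request through
// instrument to a handler that replies with code (or writes a body without
// calling WriteHeader when code is 0) and returns the status label seen.
func observedStatus(code int) string {
	var got string
	h := instrument("demo", http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		if code != 0 {
			w.WriteHeader(code)
		}
		w.Write([]byte("ok"))
	}), func(_, _, status string, _ float64) { got = status })
	h.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest(http.MethodGet, "/demo", nil))
	return got
}
```

Passing a route name rather than the raw URL path is a deliberate choice in this sketch, since raw paths often embed IDs and would inflate the `handler` label.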
## Checklist
- [ ] RED metrics (Rate, Errors, Duration) for every service
- [ ] Recording rules for expensive queries
- [ ] Alerts have runbook links
- [ ] Error budget alerts with multi-window burn rate
- [ ] Histogram buckets match expected latency distribution
- [ ] Labels have low cardinality (no user IDs, request IDs)
- [ ] Grafana dashboards use recording rules, not raw queries
- [ ] Alert severity matches response SLA
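The cardinality item in this checklist can be made concrete with a rough worst-case estimate. A classic histogram exposes one series per bucket (including +Inf) plus `_sum` and `_count`, for every combination of label values. The Go sketch below does that multiplication. The function name and the example label counts are hypothetical.

```go
package main

// seriesPerHistogram estimates the worst-case number of time series a
// classic histogram exposes: (finite buckets + the +Inf bucket + _sum +
// _count) for each combination of label values.
func seriesPerHistogram(finiteBuckets int, labelCardinalities ...int) int {
	combos := 1
	for _, n := range labelCardinalities {
		combos *= n
	}
	return combos * (finiteBuckets + 1 + 2)
}
```

With the 8 finite buckets from the instrumentation example, 5 methods, and 20 handlers, `seriesPerHistogram(8, 5, 20)` gives 1100 series. Adding a label with 10,000 distinct user IDs raises that to 11,000,000.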
## Anti-Patterns
- High cardinality labels (user_id, trace_id) causing metric explosion
- Using `avg()` for latency instead of histograms/quantiles
- Missing `for` clause in alerts causing alert storms
- Recording rules with too short intervals wasting resources
- Alerting on symptoms without linking to causes
- Not setting meaningful histogram bucket boundaries

Related Skills
websocket-patterns
Connection management, room patterns, reconnection strategies, message buffering, and binary protocol design.
vector-db-patterns
Embedding strategies, ANN algorithms, hybrid search, RAG chunking strategies, and reranking for semantic search and retrieval.
tracing-patterns
OpenTelemetry setup, span context propagation, sampling strategies, Jaeger queries
terraform-patterns
Module composition, state management, workspace strategy, provider versioning, and infrastructure-as-code best practices.
swift-patterns
SwiftUI view composition, @Observable patterns, async/await concurrency, TCA architecture, and Combine reactive streams.
springboot-patterns
Spring Boot architecture patterns, REST API design, layered services, data access, caching, async processing, and logging. Use for Java Spring Boot backend work.
seo-patterns
Meta tag patterns, structured data (JSON-LD), Core Web Vitals optimization, and SSR/SSG strategies for search visibility.
secret-patterns
30+ service-specific secret detection regex patterns, entropy-based detection, PEM/JWT/Base64 identification, and false positive filtering.
saas-payment-patterns
Payment provider abstraction, webhook security, subscription lifecycle, dunning flows, pricing models, invoicing, tax handling, and refund patterns for SaaS applications.
saas-auth-patterns
SaaS authentication and authorization patterns including JWT vs session strategies, multi-tenant isolation, RBAC, API key management, passwordless flows, MFA, and secure session handling.
saas-analytics-patterns
SaaS analytics event taxonomy, metric formulas (MRR, churn, LTV), provider-agnostic tracking, funnel analysis, cohort setup, and privacy-respecting instrumentation.
revenuecat-patterns
RevenueCat SDK integration patterns. Setup, offerings, entitlement checking, webhook integration, StoreKit 2 migration, and sandbox testing for iOS (Swift), Android (Kotlin), React Native, and Flutter.