grafana-dashboards
Create and manage production Grafana dashboards for real-time visualization of system and application metrics. Use when building monitoring dashboards, visualizing metrics, or creating operational ...
Best use case
grafana-dashboards is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Create and manage production Grafana dashboards for real-time visualization of system and application metrics. Use when building monitoring dashboards, visualizing metrics, or creating operational ...
Teams using grafana-dashboards should expect a more consistent output, faster repeated execution, less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in
.claude/skills/grafana-dashboards/SKILL.mdinside your project - Restart your AI agent — it will auto-discover the skill
How grafana-dashboards Compares
| Feature / Agent | grafana-dashboards | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Create and manage production Grafana dashboards for real-time visualization of system and application metrics. Use when building monitoring dashboards, visualizing metrics, or creating operational ...
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Grafana Dashboards
Create and manage production-ready Grafana dashboards for comprehensive system observability.
## Do not use this skill when
- The task is unrelated to grafana dashboards
- You need a different domain or tool outside this scope
## Instructions
- Clarify goals, constraints, and required inputs.
- Apply relevant best practices and validate outcomes.
- Provide actionable steps and verification.
- If detailed examples are required, open `resources/implementation-playbook.md`.
## Purpose
Design effective Grafana dashboards for monitoring applications, infrastructure, and business metrics.
## Use this skill when
- Visualize Prometheus metrics
- Create custom dashboards
- Implement SLO dashboards
- Monitor infrastructure
- Track business KPIs
## Dashboard Design Principles
### 1. Hierarchy of Information
```
┌─────────────────────────────────────┐
│ Critical Metrics (Big Numbers) │
├─────────────────────────────────────┤
│ Key Trends (Time Series) │
├─────────────────────────────────────┤
│ Detailed Metrics (Tables/Heatmaps) │
└─────────────────────────────────────┘
```
### 2. RED Method (Services)
- **Rate** - Requests per second
- **Errors** - Error rate
- **Duration** - Latency/response time
### 3. USE Method (Resources)
- **Utilization** - % time resource is busy
- **Saturation** - Queue length/wait time
- **Errors** - Error count
## Dashboard Structure
### API Monitoring Dashboard
```json
{
"dashboard": {
"title": "API Monitoring",
"tags": ["api", "production"],
"timezone": "browser",
"refresh": "30s",
"panels": [
{
"title": "Request Rate",
"type": "graph",
"targets": [
{
"expr": "sum(rate(http_requests_total[5m])) by (service)",
"legendFormat": "{{service}}"
}
],
"gridPos": {"x": 0, "y": 0, "w": 12, "h": 8}
},
{
"title": "Error Rate %",
"type": "graph",
"targets": [
{
"expr": "(sum(rate(http_requests_total{status=~\"5..\"}[5m])) / sum(rate(http_requests_total[5m]))) * 100",
"legendFormat": "Error Rate"
}
],
"alert": {
"conditions": [
{
"evaluator": {"params": [5], "type": "gt"},
"operator": {"type": "and"},
"query": {"params": ["A", "5m", "now"]},
"type": "query"
}
]
},
"gridPos": {"x": 12, "y": 0, "w": 12, "h": 8}
},
{
"title": "P95 Latency",
"type": "graph",
"targets": [
{
"expr": "histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le, service))",
"legendFormat": "{{service}}"
}
],
"gridPos": {"x": 0, "y": 8, "w": 24, "h": 8}
}
]
}
}
```
**Reference:** See `assets/api-dashboard.json`
## Panel Types
### 1. Stat Panel (Single Value)
```json
{
"type": "stat",
"title": "Total Requests",
"targets": [{
"expr": "sum(http_requests_total)"
}],
"options": {
"reduceOptions": {
"values": false,
"calcs": ["lastNotNull"]
},
"orientation": "auto",
"textMode": "auto",
"colorMode": "value"
},
"fieldConfig": {
"defaults": {
"thresholds": {
"mode": "absolute",
"steps": [
{"value": 0, "color": "green"},
{"value": 80, "color": "yellow"},
{"value": 90, "color": "red"}
]
}
}
}
}
```
### 2. Time Series Graph
```json
{
"type": "graph",
"title": "CPU Usage",
"targets": [{
"expr": "100 - (avg by (instance) (rate(node_cpu_seconds_total{mode=\"idle\"}[5m])) * 100)"
}],
"yaxes": [
{"format": "percent", "max": 100, "min": 0},
{"format": "short"}
]
}
```
### 3. Table Panel
```json
{
"type": "table",
"title": "Service Status",
"targets": [{
"expr": "up",
"format": "table",
"instant": true
}],
"transformations": [
{
"id": "organize",
"options": {
"excludeByName": {"Time": true},
"indexByName": {},
"renameByName": {
"instance": "Instance",
"job": "Service",
"Value": "Status"
}
}
}
]
}
```
### 4. Heatmap
```json
{
"type": "heatmap",
"title": "Latency Heatmap",
"targets": [{
"expr": "sum(rate(http_request_duration_seconds_bucket[5m])) by (le)",
"format": "heatmap"
}],
"dataFormat": "tsbuckets",
"yAxis": {
"format": "s"
}
}
```
## Variables
### Query Variables
```json
{
"templating": {
"list": [
{
"name": "namespace",
"type": "query",
"datasource": "Prometheus",
"query": "label_values(kube_pod_info, namespace)",
"refresh": 1,
"multi": false
},
{
"name": "service",
"type": "query",
"datasource": "Prometheus",
"query": "label_values(kube_service_info{namespace=\"$namespace\"}, service)",
"refresh": 1,
"multi": true
}
]
}
}
```
### Use Variables in Queries
```
sum(rate(http_requests_total{namespace="$namespace", service=~"$service"}[5m]))
```
## Alerts in Dashboards
```json
{
"alert": {
"name": "High Error Rate",
"conditions": [
{
"evaluator": {
"params": [5],
"type": "gt"
},
"operator": {"type": "and"},
"query": {
"params": ["A", "5m", "now"]
},
"reducer": {"type": "avg"},
"type": "query"
}
],
"executionErrorState": "alerting",
"for": "5m",
"frequency": "1m",
"message": "Error rate is above 5%",
"noDataState": "no_data",
"notifications": [
{"uid": "slack-channel"}
]
}
}
```
## Dashboard Provisioning
**dashboards.yml:**
```yaml
apiVersion: 1
providers:
- name: 'default'
orgId: 1
folder: 'General'
type: file
disableDeletion: false
updateIntervalSeconds: 10
allowUiUpdates: true
options:
path: /etc/grafana/dashboards
```
## Common Dashboard Patterns
### Infrastructure Dashboard
**Key Panels:**
- CPU utilization per node
- Memory usage per node
- Disk I/O
- Network traffic
- Pod count by namespace
- Node status
**Reference:** See `assets/infrastructure-dashboard.json`
### Database Dashboard
**Key Panels:**
- Queries per second
- Connection pool usage
- Query latency (P50, P95, P99)
- Active connections
- Database size
- Replication lag
- Slow queries
**Reference:** See `assets/database-dashboard.json`
### Application Dashboard
**Key Panels:**
- Request rate
- Error rate
- Response time (percentiles)
- Active users/sessions
- Cache hit rate
- Queue length
## Best Practices
1. **Start with templates** (Grafana community dashboards)
2. **Use consistent naming** for panels and variables
3. **Group related metrics** in rows
4. **Set appropriate time ranges** (default: Last 6 hours)
5. **Use variables** for flexibility
6. **Add panel descriptions** for context
7. **Configure units** correctly
8. **Set meaningful thresholds** for colors
9. **Use consistent colors** across dashboards
10. **Test with different time ranges**
## Dashboard as Code
### Terraform Provisioning
```hcl
resource "grafana_dashboard" "api_monitoring" {
config_json = file("${path.module}/dashboards/api-monitoring.json")
folder = grafana_folder.monitoring.id
}
resource "grafana_folder" "monitoring" {
title = "Production Monitoring"
}
```
### Ansible Provisioning
```yaml
- name: Deploy Grafana dashboards
copy:
src: "{{ item }}"
dest: /etc/grafana/dashboards/
with_fileglob:
- "dashboards/*.json"
notify: restart grafana
```
## Reference Files
- `assets/api-dashboard.json` - API monitoring dashboard
- `assets/infrastructure-dashboard.json` - Infrastructure dashboard
- `assets/database-dashboard.json` - Database monitoring dashboard
- `references/dashboard-design.md` - Dashboard design guide
## Related Skills
- `prometheus-configuration` - For metric collection
- `slo-implementation` - For SLO dashboardsRelated Skills
semgrep-rule-variant-creator
Creates language variants of existing Semgrep rules. Use when porting a Semgrep rule to specified target languages. Takes an existing rule and target languages as input, produces independent rule+test directories for each language.
searchnews
当用户要求"搜索新闻"、"查询AI新闻"、"整理新闻"、"获取某天的新闻",或提到需要搜索、整理、汇总指定日期的AI行业新闻时,应使用此技能。
search-specialist
Expert web researcher using advanced search techniques and
scorecard-marketing
Build quiz and assessment funnels that generate qualified leads at 30-50% conversion. Use when the user mentions "lead magnet", "quiz funnel", "assessment tool", "lead generation", or "score-based segmentation". Covers question design, dynamic results by tier, and automated follow-up sequences. For landing page conversion, see cro-methodology. For full marketing plans, see one-page-marketing.
scikit-learn
Machine learning in Python with scikit-learn. Use when working with supervised learning (classification, regression), unsupervised learning (clustering, dimensionality reduction), model evaluation, hyperparameter tuning, preprocessing, or building ML pipelines. Provides comprehensive reference documentation for algorithms, preprocessing techniques, pipelines, and best practices.
scholar-evaluation
Systematically evaluate scholarly work using the ScholarEval framework, providing structured assessment across research quality dimensions including problem formulation, methodology, analysis, and writing with quantitative scoring and actionable feedback.
sarif-parsing
Parses and processes SARIF files from static analysis tools like CodeQL, Semgrep, or other scanners. Triggers on "parse sarif", "read scan results", "aggregate findings", "deduplicate alerts", or "process sarif output". Handles filtering, deduplication, format conversion, and CI/CD integration of SARIF data. Does NOT run scans — use the Semgrep or CodeQL skills for that.
kaizen:root-cause-tracing
Use when errors occur deep in execution and you need to trace back to find the original trigger - systematically traces bugs backward through call stack, adding instrumentation when needed, to identify source of invalid data or incorrect behavior
rice
RICE prioritization scoring initiatives by Reach, Impact, Confidence, and Effort. Use for feature prioritization, roadmap planning, or when comparing initiatives objectively.
retro
Start-Stop-Continue retrospective identifying what to Start doing, Stop doing, and Continue doing. Use for sprint retros, personal reflection, team process reviews, or habit audits.
fpf:reset
Reset the FPF reasoning cycle to start fresh
research
Conduct preliminary research on a topic and generate research outline. For academic research, benchmark research, technology selection, etc.