grafana-platform-dashboard
Design, refactor, and validate Grafana dashboards for OpenShift/Kubernetes platform operations. Use when users ask to improve platform health dashboards, prioritize critical tenant-impacting signals, filter noise (for example ArgoCD), add Crossplane/Keycloak health panels, validate PromQL programmatically, or apply GrafanaDashboard CR changes live then promote to GitOps.
Best use case
grafana-platform-dashboard is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Design, refactor, and validate Grafana dashboards for OpenShift/Kubernetes platform operations. Use when users ask to improve platform health dashboards, prioritize critical tenant-impacting signals, filter noise (for example ArgoCD), add Crossplane/Keycloak health panels, validate PromQL programmatically, or apply GrafanaDashboard CR changes live then promote to GitOps.
Teams using grafana-platform-dashboard should expect a more consistent output, faster repeated execution, less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in
.claude/skills/grafana-platform-dashboard/SKILL.mdinside your project - Restart your AI agent — it will auto-discover the skill
How grafana-platform-dashboard Compares
| Feature / Agent | grafana-platform-dashboard | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Design, refactor, and validate Grafana dashboards for OpenShift/Kubernetes platform operations. Use when users ask to improve platform health dashboards, prioritize critical tenant-impacting signals, filter noise (for example ArgoCD), add Crossplane/Keycloak health panels, validate PromQL programmatically, or apply GrafanaDashboard CR changes live then promote to GitOps.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Grafana Platform Dashboard
Design platform operations dashboards so operators see tenant-impacting risk first, then drill into service-specific health without overload.
## Quick Start
Use this skill when the user asks for platform dashboard updates and reliability checks.
1. Confirm dashboard target:
```bash
oc --context <ctx> get grafanadashboard -A | rg -i '<dashboard-name-or-theme>'
```
2. Export dashboard and JSON:
```bash
skills/grafana-platform-dashboard/scripts/grafanadashboard_roundtrip.sh export \
--context <ctx> \
--namespace <ns> \
--name <grafanadashboard-name> \
--out-dir /tmp/<workspace>
```
3. Edit the JSON and validate all PromQL:
```bash
skills/grafana-platform-dashboard/scripts/promql_scan_thanos.sh \
--context <ctx> \
--dashboard-json /tmp/<workspace>/<name>.json
```
4. Apply live safely:
```bash
skills/grafana-platform-dashboard/scripts/grafanadashboard_roundtrip.sh apply \
--context <ctx> \
--namespace <ns> \
--name <grafanadashboard-name> \
--json /tmp/<workspace>/<name>.json
```
## Workflow
### 1) Lock Scope From Platform Contracts
Use the platform contract in [platform-contract.md](references/platform-contract.md) before editing panels.
1. Keep L1 command view constrained to critical pre-tenant-impact signals.
2. Use gate-aligned components first (critical CO gate, nodes, MCP, core API/etcd/ingress).
3. Keep service-specific sections (Crossplane, Keycloak) below L1.
### 2) Enforce Information Architecture
Use [layout-guidelines.md](references/layout-guidelines.md):
1. L1: critical-only, immediate action, minimal panel budget.
2. L2: platform services by dependency domain.
3. L3: deep dives (for example future GPU dashboard), not in L1.
### 3) Build Queries From Known Library
Use [promql-library.md](references/promql-library.md):
1. Start from known-good queries and adapt labels minimally.
2. Prefer counts and action tables over decorative charts.
3. Filter alert noise explicitly (for example ArgoCD/GitOps) when requested.
### 4) Validate Before Apply
Always run the scan script after edits:
```bash
skills/grafana-platform-dashboard/scripts/promql_scan_thanos.sh \
--context <ctx> \
--dashboard-json <file.json> \
--output <scan.tsv>
```
Pass criteria: all queries report `success`, zero bad/parse errors.
### 5) Apply and Verify Sync
Apply only after validation succeeds:
```bash
skills/grafana-platform-dashboard/scripts/grafanadashboard_roundtrip.sh apply ...
oc --context <ctx> -n <ns> get grafanadashboard <name> \
-o jsonpath='{.status.conditions[?(@.type=="DashboardSynchronized")].status}{"|"}{.status.conditions[?(@.type=="DashboardSynchronized")].reason}{"\n"}'
```
### 6) Close With Operator-Focused Summary
Report:
1. What changed (panel names and intent).
2. Validation result (query count and failures).
3. Sync status and any residual risk.
4. Next step: promote live changes into GitOps-managed source.
## Design Rules
1. Put critical tenant-impact predictors first.
2. Every red panel must imply an action path.
3. Avoid ambiguous panel names (for example replace “platform pods” with concrete namespace scope).
4. Keep L1 low-noise; move detail below or to dedicated dashboards.
5. Keep GPU deep diagnostics in a dedicated GPU dashboard, not mixed into L1.
## References
1. [Platform Contract](references/platform-contract.md)
2. [PromQL Panel Library](references/promql-library.md)
3. [Layout Guidelines](references/layout-guidelines.md)
## Local Resources
### references/
- [references/layout-guidelines.md](references/layout-guidelines.md)
- [references/platform-contract.md](references/platform-contract.md)
- [references/promql-library.md](references/promql-library.md)
### scripts/
- `scripts/grafanadashboard_roundtrip.sh`
- `scripts/promql_scan_thanos.sh`
- `scripts/validate.sh`Related Skills
vibe
Comprehensive code validation. Runs complexity analysis then multi-model council. Answer: Is this code ready to ship? Triggers: "vibe", "validate code", "check code", "review code", "code quality", "is this ready".
validation
Full validation phase orchestrator. Vibe + post-mortem + retro + forge. Reviews implementation quality, extracts learnings, feeds the knowledge flywheel. Triggers: "validation", "validate", "validate work", "review and learn", "validation phase", "post-implementation review".
update
Reinstall all AgentOps skills globally from the latest source. Triggers: "update skills", "reinstall skills", "sync skills".
trace
Trace design decisions and concepts through session history, handoffs, and git. Triggers: "trace decision", "how did we decide", "where did this come from", "design provenance", "decision history".
test
Test generation, coverage analysis, and TDD workflow. Triggers: "test", "generate tests", "test coverage", "write tests", "tdd", "add tests", "test strategy", "missing tests", "coverage gaps".
status
Single-screen dashboard showing current work, recent validations, flywheel health, and suggested next action. Triggers: "status", "dashboard", "what am I working on", "where was I".
standards
Language-specific coding standards and validation rules. Provides Python, Go, Rust, TypeScript, Shell, YAML, JSON, and Markdown standards. Auto-loaded by $vibe, $implement, $doc, $bug-hunt, $complexity based on file types.
shared
Shared reference documents for multi-agent skills (not directly invocable)
security
Continuous repository security scanning and release gating. Triggers: "security scan", "security audit", "pre-release security", "run scanners", "check vulnerabilities".
security-suite
Composable security suite for binary and prompt-surface assurance, static analysis, dynamic tracing, repo-native redteam scans, contract capture, baseline drift, and policy gating. Triggers: "binary security", "reverse engineer binary", "black-box binary test", "behavioral trace", "baseline diff", "prompt redteam", "security suite".
scenario
Author and manage holdout scenarios for behavioral validation. Scenarios are stored in .agents/holdout/ where implementing agents cannot see them. Triggers: "$scenario", "holdout", "behavioral scenario", "create scenario", "list scenarios".
scaffold
Project scaffolding, component generation, and boilerplate setup. Triggers: "scaffold", "new project", "init project", "create project", "generate component", "setup project", "starter", "boilerplate".