test-metrics-dashboard

Use when querying test history, analyzing flakiness rates, tracking MTTR, or building quality trend dashboards from test execution data.

Best use case

test-metrics-dashboard is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using test-metrics-dashboard can expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

curl -o ~/.claude/skills/test-metrics-dashboard/SKILL.md --create-dirs "https://raw.githubusercontent.com/proffesor-for-testing/agentic-qe/main/.claude/skills/test-metrics-dashboard/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/test-metrics-dashboard/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill
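
To verify the install, check that the file exists where the agent looks for it (the user-level path used by the curl command above; project-level installs live under .claude/skills/ in your repo):

ls -l ~/.claude/skills/test-metrics-dashboard/SKILL.md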

How test-metrics-dashboard Compares

Feature / Agent         | test-metrics-dashboard | Standard Approach
Platform Support        | Not specified          | Limited / Varies
Context Awareness       | High                   | Baseline
Installation Complexity | Unknown                | N/A

Frequently Asked Questions

What does this skill do?

Use when querying test history, analyzing flakiness rates, tracking MTTR, or building quality trend dashboards from test execution data.

Where can I find the source code?

The source code lives in the proffesor-for-testing/agentic-qe repository on GitHub, under .claude/skills/test-metrics-dashboard/.

SKILL.md Source

# Test Metrics Dashboard

Data & Analysis skill for querying test execution history, identifying trends, and surfacing actionable quality metrics.

## Activation

```
/test-metrics-dashboard
```

## Key Metrics

### Test Health Metrics

| Metric | Formula | Target | Alert |
|--------|---------|--------|-------|
| **Pass Rate** | Passed / Total | > 95% | < 90% |
| **Flakiness Rate** | Flaky / Total | < 5% | > 10% |
| **MTTR** | Avg time from failure to fix | < 4 hours | > 24 hours |
| **Execution Time** | Total suite duration | < 10 min | > 20 min |
| **Coverage Delta** | Current - Previous | >= 0% | < -2% |
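
As a quick health check, these thresholds can be scripted against the most recent exported run. A minimal sketch, assuming results are exported to `test-results/` as in the Data Collection section below:

```bash
# Compare the latest run's pass rate to the table's target/alert thresholds
latest=$(ls -t test-results/*.json | head -1)
rate=$(jq '(.numPassedTests / .numTotalTests) * 100' "$latest")
if awk -v r="$rate" 'BEGIN { exit !(r < 90) }'; then
  echo "ALERT: pass rate ${rate}% is below the 90% alert threshold"
elif awk -v r="$rate" 'BEGIN { exit !(r < 95) }'; then
  echo "WARN: pass rate ${rate}% is below the 95% target"
else
  echo "OK: pass rate ${rate}%"
fi
```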

### Data Collection

```bash
# Export Jest results to JSON
npx jest --json --outputFile=test-results/$(date +%Y-%m-%d).json

# Parse results for dashboard
# Note: Jest's JSON output has no native flaky flag; pending tests are used
# below as a rough proxy, so treat the `flaky` count as approximate
jq '{
  date: .startTime,
  total: .numTotalTests,
  passed: .numPassedTests,
  failed: .numFailedTests,
  duration_ms: (.testResults | map(.endTime - .startTime) | add),
  pass_rate: ((.numPassedTests / .numTotalTests) * 100),
  flaky: [.testResults[] | select(.numPendingTests > 0)] | length
}' test-results/$(date +%Y-%m-%d).json
```

### Trend Analysis

```bash
# Compare last 5 runs
for f in $(ls -t test-results/*.json | head -5); do
  jq --arg file "$f" '{
    file: $file,
    pass_rate: ((.numPassedTests / .numTotalTests) * 100 | floor),
    duration_s: ((.testResults | map(.endTime - .startTime) | add) / 1000 | floor)
  }' "$f"
done
```

### Top Failing Tests

```bash
# Find most frequently failing tests across runs
for f in test-results/*.json; do
  jq -r '.testResults[] | select(.numFailingTests > 0) | .testFilePath' "$f"
done | sort | uniq -c | sort -rn | head -10
```

## Run History

Store dashboard data in `${CLAUDE_PLUGIN_DATA}/test-metrics.log`:

```
2026-03-18|95.2|4.1|312|82.5|3
```

Format: `date|pass_rate|flakiness_rate|duration_s|coverage_pct|failed_count`
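
After each run, a new line can be appended from the exported results. A minimal sketch, assuming the Jest JSON export above plus a `json-summary` coverage report at `coverage/coverage-summary.json` (an assumed path; flakiness is logged as 0 because Jest emits no retry data by default):

```bash
f="test-results/$(date +%Y-%m-%d).json"
pass=$(jq '(.numPassedTests / .numTotalTests) * 100' "$f")
dur=$(jq '(.testResults | map(.endTime - .startTime) | add) / 1000 | floor' "$f")
fails=$(jq '.numFailedTests' "$f")
cov=$(jq '.total.lines.pct' coverage/coverage-summary.json)  # assumes json-summary reporter
echo "$(date +%Y-%m-%d)|${pass}|0|${dur}|${cov}|${fails}" >> "${CLAUDE_PLUGIN_DATA}/test-metrics.log"
```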

Read history for trend detection:
```bash
# Lowest coverage among the last 5 runs (field 5 = coverage_pct)
tail -5 "${CLAUDE_PLUGIN_DATA}/test-metrics.log" | awk -F'|' '{print $5}' | sort -n | head -1
```
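
To flag an actual downward trend rather than just the minimum, compare the oldest and newest of the last five entries (a sketch against the same log format):

```bash
# Exit non-zero when coverage fell between the oldest and newest of the last 5 runs
tail -5 "${CLAUDE_PLUGIN_DATA}/test-metrics.log" | awk -F'|' '
  NR == 1 { first = $5 }
  { last = $5 }
  END { if (last + 0 < first + 0) { print "Coverage drop: " first " -> " last; exit 1 } }'
```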

## Composition

Feeds into:
- **`/qe-quality-assessment`** — quality gate decisions based on metrics
- **`/test-failure-investigator`** — investigate top failing tests
- **`/coverage-drop-investigator`** — when coverage trends down

## Gotchas

- Metrics without baselines are meaningless — establish baselines before tracking trends
- Flakiness rate is underreported — a test that fails 1/100 times still breaks CI weekly
- Duration trends upward over time as test count grows — set alerts on rate of increase, not absolute value
- Agent may report metrics from a single run as "trends" — need 5+ data points for meaningful trends
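
A simple guard enforces the last point before any trend is reported (a sketch against the run-history log above):

```bash
# Require at least 5 recorded runs before computing trends
log="${CLAUDE_PLUGIN_DATA}/test-metrics.log"
runs=$(wc -l < "$log" 2>/dev/null || echo 0)
if [ "$runs" -lt 5 ]; then
  echo "Only ${runs} run(s) recorded; skipping trend analysis."
  exit 0
fi
```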

Related Skills

qe-visual-testing-advanced

from proffesor-for-testing/agentic-qe

Advanced visual regression testing with pixel-perfect comparison, AI-powered diff analysis, responsive design validation, and cross-browser visual consistency. Use when detecting UI regressions, validating designs, or ensuring visual consistency.

qe-testability-scoring

from proffesor-for-testing/agentic-qe

AI-powered testability assessment using 10 principles of intrinsic testability with Playwright and optional Vibium integration. Evaluates web applications against Observability, Controllability, Algorithmic Simplicity, Transparency, Stability, Explainability, Unbugginess, Smallness, Decomposability, and Similarity. Use when assessing software testability, evaluating test readiness, identifying testability improvements, or generating testability reports.

qe-test-reporting-analytics

from proffesor-for-testing/agentic-qe

Advanced test reporting, quality dashboards, predictive analytics, trend analysis, and executive reporting for QE metrics. Use when communicating quality status, tracking trends, or making data-driven decisions.

qe-test-idea-rewriting

from proffesor-for-testing/agentic-qe

Transform passive 'Verify X' test descriptions into active, observable test actions. Use when test ideas lack specificity, use vague language, or fail quality validation. Converts to action-verb format for clearer, more testable descriptions.

qe-test-environment-management

from proffesor-for-testing/agentic-qe

Test environment provisioning, infrastructure as code for testing, Docker/Kubernetes for test environments, service virtualization, and cost optimization. Use when managing test infrastructure, ensuring environment parity, or optimizing testing costs.

qe-test-design-techniques

from proffesor-for-testing/agentic-qe

Systematic test design with boundary value analysis, equivalence partitioning, decision tables, state transition testing, and combinatorial testing. Use when designing comprehensive test cases, reducing redundant tests, or ensuring systematic coverage.

qe-test-data-management

from proffesor-for-testing/agentic-qe

Strategic test data generation, management, and privacy compliance. Use when creating test data, handling PII, ensuring GDPR/CCPA compliance, or scaling data generation for realistic testing scenarios.

qe-test-automation-strategy

from proffesor-for-testing/agentic-qe

Design and implement effective test automation with proper pyramid, patterns, and CI/CD integration. Use when building automation frameworks or improving test efficiency.

qe-shift-right-testing

from proffesor-for-testing/agentic-qe

Testing in production with feature flags, canary deployments, synthetic monitoring, and chaos engineering. Use when implementing production observability or progressive delivery.

qe-shift-left-testing

from proffesor-for-testing/agentic-qe

Move testing activities earlier in the development lifecycle to catch defects when they're cheapest to fix. Use when implementing TDD, CI/CD, or early quality practices.

qe-security-visual-testing

from proffesor-for-testing/agentic-qe

Security-first visual testing combining URL validation, PII detection, and visual regression with parallel viewport support. Use when testing web applications that handle sensitive data, need visual regression coverage, or require WCAG accessibility compliance.

qe-security-testing

from proffesor-for-testing/agentic-qe

Test for security vulnerabilities using OWASP principles. Use when conducting security audits, testing auth, or implementing security practices.