Codex

regression-metrics

Track and analyze regression statistics, trends, hotspots, and health indicators across test suites

104 stars

Best use case

regression-metrics is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

It is a strong fit for teams already working in Codex.


Teams using regression-metrics should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/regression-metrics/SKILL.md --create-dirs "https://raw.githubusercontent.com/jmagly/aiwg/main/.agents/skills/regression-metrics/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/regression-metrics/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How regression-metrics Compares

| Feature / Agent | regression-metrics | Standard Approach |
|-----------------|--------------------|-------------------|
| Platform Support | Codex | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

Track and analyze regression statistics, trends, hotspots, and health indicators across test suites

Which AI agents support this skill?

This skill is designed for Codex.

Where can I find the source code?

The source is in the jmagly/aiwg repository on GitHub, at .agents/skills/regression-metrics/SKILL.md (the same file the installation command downloads).


SKILL.md Source

# regression-metrics

Track and analyze regression statistics, trends, and health indicators.

## Triggers


Alternate expressions and non-obvious activations (primary phrases are matched automatically from the skill description):

- "regression KPIs" → regression metric dashboard
- "flakiness score" → test stability metrics

## Purpose

This skill provides regression analytics by:
- Tracking regression occurrence rates
- Measuring time-to-detection and time-to-fix
- Analyzing regression patterns and hotspots
- Identifying high-risk areas
- Trending regression metrics over time
- Generating regression health dashboards

## Behavior

When triggered, this skill:

1. **Collects regression data**:
   - Parse regression test results
   - Load historical regression records
   - Gather bisect findings
   - Import baseline comparisons
   - Aggregate issue tracker data

2. **Calculates key metrics**:
   - Regression rate (per sprint/release)
   - Mean time to detect (MTTD)
   - Mean time to fix (MTTF)
   - Regression recurrence rate
   - Escape rate (production regressions)

3. **Identifies patterns**:
   - Common root causes
   - High-regression components
   - Time-of-day/sprint patterns
   - Correlation with code changes

4. **Analyzes trends**:
   - Regression rate over time
   - Detection speed improvements
   - Fix time trends
   - Quality trajectory

5. **Generates visualizations**:
   - Regression heatmaps
   - Trend charts
   - Burn-down tracking
   - Risk matrices

6. **Produces actionable insights**:
   - Prioritize high-risk areas
   - Recommend test improvements
   - Suggest process changes
   - Set quality goals
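Step 3's root-cause grouping can be sketched in Python with `collections.Counter`. This is an illustrative sketch, not the skill's implementation: it assumes each regression has already been tagged with exactly one root-cause string, and `root_cause_breakdown` is a hypothetical helper. The sample labels and counts mirror the Root Cause Analysis table in the example dashboard.

```python
from collections import Counter


def root_cause_breakdown(causes: list[str]) -> list[tuple[str, int, int]]:
    """Return (root cause, count, whole-number percent), most common first.

    Assumes one root-cause label per regression (hypothetical input shape).
    Causes with equal counts keep the order in which they were first seen.
    """
    counts = Counter(causes)
    total = sum(counts.values())
    return [(cause, n, round(100 * n / total)) for cause, n in counts.most_common()]


causes = (
    ["Missing test coverage"] * 5
    + ["Integration not tested"] * 3
    + ["Edge case not considered"] * 2
    + ["Flaky test masking issue", "Breaking dependency change"]
)
for cause, n, pct in root_cause_breakdown(causes):
    print(f"{cause}: {n} ({pct}%)")
```

With these 12 regressions the percentages come out as 42, 25, 17, 8, and 8, matching the dashboard table.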

## Key Metrics

### Regression Rate

```yaml
regression_rate:
  description: Number of regressions per time period
  formula: regressions_detected / time_period
  units: regressions per sprint/week/release

  targets:
    excellent: "< 2 per sprint"
    good: "2-5 per sprint"
    acceptable: "5-10 per sprint"
    poor: "> 10 per sprint"

  calculation:
    count: new regressions introduced
    period: sprint, release, or month
    exclude: known issues, flaky tests
```
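A minimal Python sketch of the `regression_rate` calculation above, assuming each regression record carries the period it was introduced in plus flags for the two `exclude` categories. The `Regression` record and `regression_rate` function are hypothetical names, not part of the skill.

```python
from dataclasses import dataclass


@dataclass
class Regression:
    # Hypothetical record shape: the period the regression was introduced in,
    # plus the two exclusion categories named in the `exclude` key above.
    period: str
    known_issue: bool = False
    flaky_test: bool = False


def regression_rate(regressions: list[Regression], periods: list[str]) -> float:
    """New regressions per period, excluding known issues and flaky tests."""
    if not periods:
        raise ValueError("at least one period is required")
    wanted = set(periods)
    counted = [
        r for r in regressions
        if r.period in wanted and not (r.known_issue or r.flaky_test)
    ]
    return len(counted) / len(periods)


records = [
    Regression("sprint-12"),
    Regression("sprint-12"),
    Regression("sprint-13", flaky_test=True),  # excluded from the count
    Regression("sprint-13"),
]
print(regression_rate(records, ["sprint-12", "sprint-13"]))  # 1.5
```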

### Mean Time to Detect (MTTD)

```yaml
mttd:
  description: Average time from regression introduction to detection
  formula: sum(detection_time) / regression_count
  units: hours or days

  targets:
    excellent: "< 4 hours"
    good: "< 24 hours"
    acceptable: "< 7 days"
    poor: "> 7 days"

  calculation:
    detection_time: commit_time_to_failure_report
    includes: automated and manual detection
```
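The MTTD formula can be sketched as follows, assuming the timestamp of the introducing commit is known for each regression (for example from the bisect findings the skill gathers) along with the time the failure was first reported. `mttd_hours` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def mttd_hours(events: list[tuple[datetime, datetime]]) -> float:
    """Mean time to detect, in hours.

    Each event is (introducing commit time, first failure report time), matching
    the `commit_time_to_failure_report` definition above.
    """
    if not events:
        raise ValueError("no regressions to average")
    total = timedelta()
    for committed, reported in events:
        if reported < committed:
            raise ValueError("failure reported before the commit that caused it")
        total += reported - committed
    return total.total_seconds() / 3600 / len(events)


t0 = datetime(2026, 1, 5, 9, 0)
events = [(t0, t0 + timedelta(hours=2)), (t0, t0 + timedelta(hours=6))]
print(mttd_hours(events))  # 4.0
```

Note that 4.0 hours lands just outside the `< 4 hours` excellent target and counts as good.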

### Mean Time to Fix (MTTF)

```yaml
mttf:
  description: Average time from detection to fix deployment
  formula: sum(fix_time) / regression_count
  units: hours or days

  targets:
    critical: "< 4 hours"
    high: "< 24 hours"
    medium: "< 7 days"
    low: "< 30 days"

  calculation:
    fix_time: detection_to_fix_deployed
    severity_weighted: true
```
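One reading of `severity_weighted: true` is that MTTF is reported per severity and each average is compared with its own target. The sketch below takes that reading, which is an assumption; `mttf_by_severity` and `MTTF_TARGET_HOURS` are hypothetical names, with the targets converted from the block above into hours.

```python
from collections import defaultdict
from statistics import mean

# Per-severity MTTF targets from the block above, converted to hours.
MTTF_TARGET_HOURS = {"critical": 4, "high": 24, "medium": 7 * 24, "low": 30 * 24}


def mttf_by_severity(fixes: list[tuple[str, float]]) -> dict[str, tuple[float, bool]]:
    """Map severity -> (mean hours from detection to fix deployed, met target?).

    `fixes` holds (severity, fix_hours) pairs; severity must be a key of
    MTTF_TARGET_HOURS, otherwise a KeyError is raised.
    """
    hours_by_severity: dict[str, list[float]] = defaultdict(list)
    for severity, hours in fixes:
        hours_by_severity[severity].append(hours)
    report = {}
    for severity, hours in hours_by_severity.items():
        avg = mean(hours)
        report[severity] = (avg, avg < MTTF_TARGET_HOURS[severity])
    return report


fixes = [("critical", 3.0), ("critical", 4.0), ("high", 12.5), ("medium", 28.4)]
print(mttf_by_severity(fixes))
# {'critical': (3.5, True), 'high': (12.5, True), 'medium': (28.4, True)}
```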

### Escape Rate

```yaml
escape_rate:
  description: Percentage of regressions reaching production
  formula: (production_regressions / total_regressions) * 100
  units: percentage

  targets:
    excellent: "< 5%"
    good: "5-10%"
    acceptable: "10-20%"
    poor: "> 20%"

  calculation:
    production_regressions: found by users/monitoring
    total_regressions: all detected including pre-release
```
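A direct transcription of the escape-rate formula, assuming the split between production and pre-release detections has already been made upstream. `escape_rate` is a hypothetical helper and the 3-of-25 input is illustrative only.

```python
def escape_rate(production_regressions: int, total_regressions: int) -> float:
    """Percentage of all detected regressions that were first found in production."""
    if total_regressions <= 0:
        raise ValueError("total_regressions must be positive")
    if not 0 <= production_regressions <= total_regressions:
        raise ValueError("production_regressions must be between 0 and the total")
    # Same as (production / total) * 100; multiplying the integers first means
    # only the final division rounds.
    return 100 * production_regressions / total_regressions


print(escape_rate(3, 25))  # 12.0
```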

### Recurrence Rate

```yaml
recurrence_rate:
  description: Percentage of regressions that recur after fix
  formula: (recurring_regressions / total_fixed) * 100
  units: percentage

  targets:
    excellent: "< 5%"
    good: "5-10%"
    acceptable: "10-15%"
    poor: "> 15%"

  indicates:
    - insufficient test coverage
    - lack of regression tests
    - poor fix quality
```
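A sketch of the recurrence formula using sets of issue IDs. It assumes recurrences have already been linked back to the original issue, which is an assumption about the input. The IDs come from the Regression Recurrence table in the example dashboard, and `recurrence_rate` is a hypothetical helper.

```python
def recurrence_rate(fixed_ids: set[str], recurred_ids: set[str]) -> float:
    """Percentage of fixed regressions that came back after their fix."""
    if not fixed_ids:
        raise ValueError("no fixed regressions")
    # Only count recurrences of issues that were fixed in this window.
    recurring = recurred_ids & fixed_ids
    return 100 * len(recurring) / len(fixed_ids)


fixed = {"AUTH-101", "API-205", "DB-089", "USER-145"}
print(recurrence_rate(fixed, {"AUTH-101"}))  # 25.0
```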

## Metrics Dashboard

````markdown
# Regression Metrics Dashboard

**Period**: Last 30 Days (2025-12-29 to 2026-01-28)
**Project**: User Service

## Executive Summary

| Metric | Current | Target | Status | Trend |
|--------|---------|--------|--------|-------|
| Regression Rate | 4.2/sprint | < 5 | ✅ Good | ↓ Improving |
| MTTD | 8.5 hours | < 24h | ✅ Good | ↓ Improving |
| MTTF | 18.7 hours | < 24h | ⚠️ Close | ↓ Improving |
| Escape Rate | 12% | < 10% | ⚠️ Above Target | ↓ Improving |
| Recurrence Rate | 7% | < 10% | ✅ Good | → Stable |

**Overall Health**: ⚠️ Good with Concerns
**Priority Focus**: Reduce production escapes

## Regression Trend (Last 6 Sprints)

```
Sprint 8:  ██████████ 10 regressions
Sprint 9:  ████████   8 regressions
Sprint 10: ██████     6 regressions
Sprint 11: █████      5 regressions
Sprint 12: ████       4 regressions
Sprint 13: ████       4 regressions
           ↓ -60% improvement since Sprint 8
```

**Analysis**: Significant improvement trend. Stabilizing around 4-5 per sprint.

## Detection Speed Trend

```
Week 1: 24h ████████████████████████
Week 2: 18h ██████████████████
Week 3: 12h ████████████
Week 4:  9h █████████
Week 5:  8h ████████
        ↓ -67% improvement in 5 weeks
```

**Analysis**: Automation improvements paying off. Most regressions now caught within hours.

## Component Heatmap

Regressions by component (last 30 days):

| Component | Regressions | Change | Risk Level |
|-----------|-------------|--------|------------|
| src/auth/ | 🔴🔴🔴 3 | +1 | High |
| src/api/ | 🟡🟡 2 | 0 | Medium |
| src/db/ | 🟡🟡 2 | -1 | Medium |
| src/user/ | 🟢 1 | -2 | Low |
| src/utils/ | 🟢 0 | 0 | Low |

**Hotspot Alert**: `src/auth/` showing increased regression rate

## Root Cause Analysis

| Root Cause | Count | % | Trend |
|------------|-------|---|-------|
| Missing test coverage | 5 | 42% | → |
| Integration not tested | 3 | 25% | ↑ |
| Edge case not considered | 2 | 17% | ↓ |
| Flaky test masking issue | 1 | 8% | → |
| Breaking dependency change | 1 | 8% | → |

**Insight**: 67% of regressions (missing coverage plus untested integration) could have been caught with better coverage or integration testing

## Severity Distribution

| Severity | Count | MTTF | Status |
|----------|-------|------|--------|
| Critical | 1 | 3.2h | ✅ Fast response |
| High | 4 | 12.5h | ✅ Within target |
| Medium | 6 | 28.4h | ⚠️ Above target |
| Low | 1 | 72h | ✅ Acceptable |

## Time-to-Detection Analysis

```
Detection Method:
  Automated Tests: 75% (avg 4.2h detection)
  Manual Testing:  17% (avg 32h detection)
  Production:       8% (avg 96h detection)
```

**Insight**: Automation catching most issues early. Need to reduce production escapes.

## Time-to-Fix Analysis

```
Fix Duration by Severity:
  Critical: ▓▓▓ 3.2h (target: 4h) ✅
  High:     ▓▓▓▓▓▓ 12.5h (target: 24h) ✅
  Medium:   ▓▓▓▓▓▓▓▓▓▓▓▓▓▓ 28.4h (target: 24h) ⚠️
  Low:      ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓ 72h ✅
```

**Issue**: Medium-severity fixes average 28.4h, about 18% over the 24h target

## Regression Recurrence

| Original Issue | Recurred | Reason |
|----------------|----------|--------|
| AUTH-101 | ✅ Yes | Missing regression test |
| API-205 | ❌ No | Regression test added |
| DB-089 | ❌ No | Regression test added |
| USER-145 | ❌ No | Regression test added |

**Recurrence (fixes listed above)**: 25% (1 of 4); the one recurrence, AUTH-101, had no regression test

## Production Escapes

Regressions that reached production:

| Issue | Severity | Detection | Impact | MTTD |
|-------|----------|-----------|--------|------|
| AUTH-203 | High | User report | 500 users | 12h |

**Analysis**: 1 escape this period. Auth module regression bypassed staging tests.

## Recommendations

### High Priority

1. **Add integration tests for auth flows**
   - Reason: 3 regressions in auth, 1 production escape
   - Impact: Reduce auth regressions by ~60%
   - Effort: 2 days

2. **Improve staging test coverage**
   - Reason: Production escape indicates gap
   - Impact: Reduce escape rate to <5%
   - Effort: 1 week

3. **Reduce medium-severity MTTF**
   - Reason: 28.4h vs 24h target
   - Impact: Faster user impact resolution
   - Effort: Process improvement

### Medium Priority

4. **Add regression tests for all fixes**
   - Reason: The only recurrence this period (AUTH-101) was a fix shipped without a regression test
   - Impact: Prevent repeat regressions; no fix shipped with a regression test recurred this period
   - Effort: Ongoing discipline

5. **Monitor auth module closely**
   - Reason: Highest regression count
   - Impact: Early detection of issues
   - Effort: Weekly review

## Historical Comparison

| Period | Reg Rate | MTTD | MTTF | Escape % |
|--------|----------|------|------|----------|
| 3 months ago | 8.2 | 36h | 48h | 18% |
| 2 months ago | 6.5 | 24h | 36h | 15% |
| 1 month ago | 5.1 | 12h | 24h | 13% |
| Current | 4.2 | 8.5h | 18.7h | 12% |

**Trend**: All metrics improving. Regression rate down 49%, detection 76% faster.

## Goals for Next Period

| Metric | Current | Goal | Strategy |
|--------|---------|------|----------|
| Regression Rate | 4.2 | < 4 | Improve auth testing |
| MTTD | 8.5h | < 8h | Add more automation |
| MTTF | 18.7h | < 18h | Faster review process |
| Escape Rate | 12% | < 10% | Better staging tests |

## Data Sources

- Regression tests: `.aiwg/testing/regression-results/`
- Bisect reports: `.aiwg/testing/regression-bisect-*/`
- Baseline comparisons: `.aiwg/testing/baseline-comparisons/`
- Issue tracker: GitHub Issues (label: regression)
- CI/CD logs: GitHub Actions
````
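The text bar charts in the dashboard can be produced with plain string formatting. A hypothetical sketch (`render_trend` is not part of the skill) that reproduces the Regression Trend layout, one block character per regression, followed by the overall percent change:

```python
def render_trend(counts: dict[str, int], block: str = "█") -> str:
    """Render one text bar per period (one block per regression) plus overall change."""
    if not counts:
        raise ValueError("no periods to render")
    labels, values = list(counts), list(counts.values())
    label_width = max(len(label) for label in labels) + 1  # room for the colon
    bar_width = max(values)
    lines = []
    for label, n in zip(labels, values):
        tag = label + ":"
        bar = block * n
        lines.append(f"{tag:<{label_width}} {bar:<{bar_width}} {n} regressions")
    # A percent change is undefined when the first period had zero regressions.
    if values[0]:
        change = round(100 * (values[-1] - values[0]) / values[0])
        lines.append(" " * label_width + f" {change:+d}% since {labels[0]}")
    return "\n".join(lines)


sprints = {"Sprint 8": 10, "Sprint 9": 8, "Sprint 10": 6,
           "Sprint 11": 5, "Sprint 12": 4, "Sprint 13": 4}
print(render_trend(sprints))
```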

## Usage Examples

### View Current Metrics

```
User: "Show regression metrics"

Skill executes:
1. Aggregate data from last 30 days
2. Calculate key metrics
3. Generate dashboard
4. Identify trends

Output:
"Regression Metrics (Last 30 Days)

Overall Health: ⚠️ Good with Concerns

Key Metrics:
- Regression Rate: 4.2/sprint ✅ (target < 5)
- MTTD: 8.5 hours ✅ (target < 24h)
- MTTF: 18.7 hours ⚠️ (target < 24h)
- Escape Rate: 12% ⚠️ (target < 10%)

Hotspots:
🔴 src/auth/ - 3 regressions this period
🟡 src/api/ - 2 regressions

Top Recommendation: Add integration tests for auth

Full dashboard: .aiwg/testing/regression-metrics-dashboard.md"
```

### Regression Trends

```
User: "Regression trends over time"

Skill analyzes:
- Last 6 sprints of data
- Calculate trend direction
- Identify patterns

Output:
"Regression Trends (Last 6 Sprints)

Sprint 8:  10 regressions
Sprint 9:   8 regressions (-20%)
Sprint 10:  6 regressions (-25%)
Sprint 11:  5 regressions (-17%)
Sprint 12:  4 regressions (-20%)
Sprint 13:  4 regressions (stable)

Overall: ↓ -60% improvement
Status: Stabilizing around 4-5/sprint

MTTD: 36h → 8.5h (-76%)
MTTF: 48h → 18.7h (-61%)

Conclusion: Strong improvement trend; now in the good band (2-5/sprint), with excellent at < 2/sprint."
```
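The percentages in the trend output are plain percent changes rounded to whole numbers. A short sketch that reproduces them; `pct_change` and `sprint_changes` are hypothetical helper names.

```python
def pct_change(old: float, new: float) -> int:
    """Whole-number percent change from old to new (negative means fewer or faster)."""
    if old == 0:
        raise ValueError("percent change is undefined for a zero baseline")
    return round(100 * (new - old) / old)


def sprint_changes(counts: list[int]) -> list[str]:
    """Label each sprint after the first relative to the sprint before it."""
    labels = []
    for prev, cur in zip(counts, counts[1:]):
        labels.append("stable" if cur == prev else f"{pct_change(prev, cur):+d}%")
    return labels


print(sprint_changes([10, 8, 6, 5, 4, 4]))  # ['-20%', '-25%', '-17%', '-20%', 'stable']
print(pct_change(10, 4), pct_change(36, 8.5), pct_change(48, 18.7))  # -60 -76 -61
```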

### Component Heatmap

```
User: "Which components have most regressions?"

Skill generates:
"Component Regression Heatmap (Last 30 Days)

High Risk:
🔴 src/auth/ - 3 regressions (+1 from last period)
   Most common: Missing integration tests

Medium Risk:
🟡 src/api/ - 2 regressions (no change)
🟡 src/db/ - 2 regressions (-1 from last period)

Low Risk:
🟢 src/user/ - 1 regression (-2 from last period)
🟢 src/utils/ - 0 regressions

Recommendation: Focus testing efforts on auth module"
```
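A sketch of how the heatmap rows could be derived from per-component counts for the current and previous periods. The High/Medium cut-offs (3 and 2 regressions) are assumptions chosen to match the example, not values the skill defines, and `component_heatmap` is a hypothetical helper.

```python
def component_heatmap(current: dict[str, int], previous: dict[str, int],
                      high: int = 3, medium: int = 2) -> list[tuple[str, int, int, str]]:
    """Rows of (component, regressions, change vs previous period, risk level).

    The `high` and `medium` count cut-offs are assumptions, not skill settings.
    """
    rows = []
    for component in sorted(set(current) | set(previous)):
        count = current.get(component, 0)
        change = count - previous.get(component, 0)
        risk = "High" if count >= high else "Medium" if count >= medium else "Low"
        rows.append((component, count, change, risk))
    # Most regressions first; the sort is stable, so ties stay alphabetical.
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


current = {"src/auth/": 3, "src/api/": 2, "src/db/": 2, "src/user/": 1, "src/utils/": 0}
previous = {"src/auth/": 2, "src/api/": 2, "src/db/": 3, "src/user/": 3, "src/utils/": 0}
rows = component_heatmap(current, previous)
hotspots = [c for c, _, change, risk in rows if risk == "High" and change > 0]
print(hotspots)  # ['src/auth/']
```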

## Integration

This skill uses:
- `regression-bisect`: Import bisect findings
- `regression-baseline`: Analyze baseline drift patterns
- `test-coverage`: Correlate coverage with regression rates
- `project-awareness`: Detect sprint/release boundaries

## Agent Orchestration

```yaml
agents:
  analysis:
    agent: metrics-analyst
    focus: Statistical analysis and trends

  visualization:
    agent: technical-writer
    focus: Dashboard and report generation

  recommendations:
    agent: test-architect
    focus: Process improvement suggestions
```

## Configuration

### Metric Collection

```yaml
collection_config:
  data_sources:
    - regression_test_results
    - bisect_reports
    - baseline_comparisons
    - issue_tracker
    - ci_cd_logs

  update_frequency: daily
  retention: 90 days
  aggregation: sprint, week, month
```
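The `retention` and weekly `aggregation` settings can be sketched as below, assuming each regression has a detection date. `weekly_counts` is a hypothetical helper; sprint aggregation is not shown because it needs sprint boundaries (which the Integration section attributes to `project-awareness`).

```python
from collections import Counter
from datetime import date, timedelta


def weekly_counts(detected_on: list[date], today: date,
                  retention_days: int = 90) -> dict[str, int]:
    """Count regressions per ISO week, ignoring records outside the retention window."""
    cutoff = today - timedelta(days=retention_days)
    counts = Counter()
    for day in detected_on:
        if cutoff <= day <= today:
            # Use the ISO year, not the calendar year: 2025-12-29 is in 2026-W01.
            iso_year, iso_week, _ = day.isocalendar()
            counts[f"{iso_year}-W{iso_week:02d}"] += 1
    return dict(sorted(counts.items()))


today = date(2026, 1, 28)
days = [date(2026, 1, 19), date(2026, 1, 26), date(2026, 1, 27), date(2025, 10, 1)]
print(weekly_counts(days, today))  # {'2026-W04': 1, '2026-W05': 2}
```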

### Thresholds

```yaml
thresholds:
  regression_rate:
    excellent: 2
    good: 5
    acceptable: 10

  mttd_hours:
    excellent: 4
    good: 24
    acceptable: 168  # 7 days

  mttf_hours:
    critical: 4
    high: 24
    medium: 168  # 7 days
    low: 720  # 30 days

  escape_rate_percent:
    excellent: 5
    good: 10
    acceptable: 20

  recurrence_rate_percent:
    excellent: 5
    good: 10
    acceptable: 15
```
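A sketch of how a measured value could be mapped to a band, assuming each threshold is an exclusive upper bound and lower values are better (consistent with targets such as `"< 2 per sprint"`). `classify` and `THRESHOLDS` are hypothetical names; `mttf_hours` is left out because it is keyed by severity rather than by band.

```python
# Upper bounds copied from the `thresholds` block above; lower values are better.
THRESHOLDS: dict[str, dict[str, float]] = {
    "regression_rate": {"excellent": 2, "good": 5, "acceptable": 10},
    "mttd_hours": {"excellent": 4, "good": 24, "acceptable": 168},
    "escape_rate_percent": {"excellent": 5, "good": 10, "acceptable": 20},
}


def classify(metric: str, value: float) -> str:
    """Return the first band whose exclusive upper bound exceeds value, else 'poor'."""
    for band, upper in THRESHOLDS[metric].items():
        if value < upper:
            return band
    return "poor"


for metric, value in [("regression_rate", 4.2), ("mttd_hours", 8.5), ("escape_rate_percent", 12)]:
    print(metric, classify(metric, value))
# regression_rate good
# mttd_hours good
# escape_rate_percent acceptable
```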

### Alert Rules

```yaml
alerts:
  regression_spike:
    condition: regression_rate > 10
    severity: high
    notification: team-channel

  escape_rate_high:
    condition: escape_rate > 20%
    severity: critical
    notification: leadership

  mttd_degrading:
    condition: mttd_trend_increase > 50%
    severity: medium
    notification: test-team
```
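The alert rules can be evaluated as a small table of predicates over a metrics snapshot. This is a sketch under assumptions: the snapshot keys are hypothetical, and `mttd_trend_increase` is read as the percent rise in MTTD versus the previous period, which the YAML does not spell out.

```python
from typing import Callable

Metrics = dict[str, float]


def mttd_trend_increase(previous_hours: float, current_hours: float) -> float:
    """Percent increase in MTTD versus the previous period (negative means faster)."""
    if previous_hours <= 0:
        raise ValueError("previous MTTD must be positive")
    return 100 * (current_hours - previous_hours) / previous_hours


# (alert name, condition, severity, notification) mirroring the YAML rules above.
RULES: list[tuple[str, Callable[[Metrics], bool], str, str]] = [
    ("regression_spike", lambda m: m["regression_rate"] > 10, "high", "team-channel"),
    ("escape_rate_high", lambda m: m["escape_rate_percent"] > 20, "critical", "leadership"),
    ("mttd_degrading", lambda m: m["mttd_trend_increase"] > 50, "medium", "test-team"),
]


def evaluate_alerts(metrics: Metrics) -> list[tuple[str, str, str]]:
    """Return (name, severity, notification) for every rule whose condition holds."""
    return [(name, sev, dest) for name, cond, sev, dest in RULES if cond(metrics)]


current = {
    "regression_rate": 4.2,
    "escape_rate_percent": 12.0,
    "mttd_trend_increase": mttd_trend_increase(12.0, 8.5),  # about -29%
}
print(evaluate_alerts(current))  # []
```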

## Output Locations

- Dashboards: `.aiwg/testing/regression-metrics-dashboard.md`
- Trends: `.aiwg/testing/regression-trends.json`
- Heatmaps: `.aiwg/testing/regression-heatmap.json`
- Historical data: `.aiwg/testing/metrics-history/`

## References

- @$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/schemas/metrics/regression-metrics-schema.yaml
- @$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/agents/metrics-analyst.md
- @$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/commands/metrics-dashboard.md

Related Skills (all from jmagly/aiwg)

  • regression-visual (Codex): Detect visual and UI regressions through screenshot comparison and pixel-diff analysis across browsers and viewports
  • regression-report (Codex): Generate comprehensive regression analysis reports combining bisect, baseline, and metrics data with actionable recommendations
  • regression-performance (Codex): Detect performance regressions by comparing benchmarks across versions with latency, throughput, and statistical significance analysis
  • regression-learning (Codex): Improve regression detection over time through cross-task pattern recognition, test prioritization, and historical analysis
  • regression-cicd-hooks (Codex): Integrate regression testing into CI/CD pipelines with baseline comparison and merge blocking on failure
  • regression-check (Codex): Compare current behavior against baseline to detect regressions
  • regression-bisect (Codex): Identify the commit that introduced a regression using git bisect with automated test execution and blame context
  • regression-baseline (Codex): Create and maintain regression test baselines for comparison and drift detection across versioned snapshots
  • metrics-tokens (Codex): Analyze token usage efficiency against the MetaGPT baseline and surface per-step optimization opportunities
  • aiwg-orchestrate: Route structured artifact work to AIWG workflows via MCP with zero parent context cost
  • venv-manager: Create, manage, and validate Python virtual environments. Use for project isolation and dependency management.
  • pytest-runner: Execute Python tests with pytest, supporting fixtures, markers, coverage, and parallel execution. Use for Python test automation.