Codex

regression-learning

Improve regression detection over time through cross-task pattern recognition, test prioritization, and historical analysis

104 stars

Best use case

regression-learning is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

It is a strong fit for teams already working in Codex.

Improve regression detection over time through cross-task pattern recognition, test prioritization, and historical analysis

Teams using regression-learning should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/regression-learning/SKILL.md --create-dirs "https://raw.githubusercontent.com/jmagly/aiwg/main/.agents/skills/regression-learning/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/regression-learning/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How regression-learning Compares

Feature / Agentregression-learningStandard Approach
Platform SupportCodexLimited / Varies
Context Awareness High Baseline
Installation ComplexityUnknownN/A

Frequently Asked Questions

What does this skill do?

Improve regression detection over time through cross-task pattern recognition, test prioritization, and historical analysis

Which AI agents support this skill?

This skill is designed for Codex.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

SKILL.md Source

# regression-learning

Cross-task learning skill for improving regression detection over time through pattern recognition, test prioritization, and historical analysis.

## Triggers


Alternate expressions and non-obvious activations (primary phrases are matched automatically from the skill description):

- "learn from regressions" → adaptive pattern recognition
- "improve detection accuracy" → regression ML tuning

## Purpose

This skill integrates with Al's cross-task learning system to continuously improve regression detection by:
- Learning from past regression patterns (REG-XXXX records)
- Identifying high-risk code patterns and hot spots
- Prioritizing tests by regression likelihood
- Building a taxonomy of root causes
- Suggesting fixes based on past resolutions
- Adapting test strategies based on effectiveness

Based on REF-013 MetaGPT's executable feedback pattern, this skill creates a persistent memory that accumulates knowledge across iterations and projects.

## Behavior

When triggered, this skill:

1. **Pattern Recognition**:
   - Analyze historical regression records
   - Identify recurring error patterns
   - Map code patterns to failure types
   - Detect hot spots in codebase
   - Build regression taxonomy

2. **Test Prioritization**:
   - Rank tests by regression detection likelihood
   - Prioritize based on:
     - Historical failure correlation
     - Code change patterns
     - Component risk scores
     - Recent regression trends
   - Generate optimal test execution order

3. **Root Cause Memory**:
   - Store root cause analyses from REG-XXXX
   - Link similar regressions across time
   - Build fix template library
   - Track fix effectiveness
   - Enable cross-project learning

4. **Predictive Analysis**:
   - Predict regression probability for code changes
   - Flag high-risk modifications
   - Suggest additional test coverage
   - Recommend preventive actions

5. **Feedback Loop**:
   - Learn from false positives/negatives
   - Adjust confidence thresholds
   - Incorporate human corrections
   - Track prediction accuracy
   - Improve over time

## Learning Schema

### Learning Record

```yaml
# .aiwg/ralph/learning/regression-patterns.yaml

learning_records:
  - pattern_id: "RP-001"
    pattern_name: "null_access_without_check"
    category: null_undefined_access

    occurrences:
      - regression_id: "REG-0023"
        file: "src/auth/validate.ts"
        root_cause: "Missing null check before property access"
        fix: "Added early return for null/undefined"
        effectiveness: 1.0  # 100% success rate

      - regression_id: "REG-0047"
        file: "src/api/users.ts"
        root_cause: "Null user object accessed"
        fix: "Added null guard clause"
        effectiveness: 1.0

      - regression_id: "REG-0091"
        file: "src/payments/process.ts"
        root_cause: "Transaction object null"
        fix: "Added null check with error throw"
        effectiveness: 1.0

    statistics:
      total_occurrences: 3
      avg_fix_time_hours: 1.5
      recurrence_rate: 0.0  # 0% recurrence after fix
      detection_methods:
        automated_test: 3
        production: 0

    fix_template:
      pattern: |
        // Before: obj.property
        // After:
        if (!obj) {
          throw new Error('Object is null or undefined');
        }
        // Or: if (!obj) return defaultValue;

      applicability_rules:
        - language: typescript
        - pattern: "Cannot read property .* of null"
        - context: function_entry

    confidence: 0.95
    last_updated: "2026-01-28T15:30:00Z"
```

### Hot Spot Tracking

```yaml
# .aiwg/ralph/learning/code-hotspots.yaml

hotspots:
  - file: "src/auth/validate.ts"
    risk_score: 8.5  # 0-10 scale

    regression_history:
      - REG-0023: null_access_without_check
      - REG-0056: type_mismatch
      - REG-0112: missing_validation

    total_regressions: 3
    regressions_per_kloc: 2.1  # Per 1000 lines of code

    recent_changes:
      last_30_days: 8
      last_7_days: 2

    contributors: 3

    risk_factors:
      high_complexity: true
      frequent_changes: true
      multiple_regressions: true
      critical_path: true

    recommended_actions:
      - "Increase test coverage (current: 65%, target: 80%)"
      - "Add null checks at function entry"
      - "Refactor complex validation logic"
      - "Add integration tests"

    test_priority: critical  # Run tests for this file first
```

### Test Effectiveness Tracking

```yaml
# .aiwg/ralph/learning/test-effectiveness.yaml

test_effectiveness:
  - test_path: "test/auth/validate.test.ts"

    regression_detection_rate: 0.85  # 85% of regressions caught

    detected_regressions:
      - REG-0023
      - REG-0056
      - REG-0112

    missed_regressions:
      - REG-0089: "Escaped to production"

    false_positive_rate: 0.02  # 2% false alarms

    execution_stats:
      avg_duration_ms: 450
      success_rate: 0.98
      flakiness_score: 0.01  # Very stable

    priority_score: 9.2  # High value test

    recommendations:
      - "Add edge case: empty string input"
      - "Test concurrent validation requests"
```

## Regression Pattern Taxonomy

### Pattern Categories

| Category | Description | Detection | Fix Template | Effectiveness |
|----------|-------------|-----------|--------------|---------------|
| **null_undefined_access** | Accessing properties of null/undefined | `Cannot read property .* of null` | Add null check | 95% |
| **type_mismatch** | Type coercion or wrong type | `Expected .* but got .*` | Add type validation | 90% |
| **off_by_one** | Array bounds or loop conditions | `Index out of bounds` | Fix boundary conditions | 88% |
| **race_condition** | Timing-dependent failures | `Intermittent failures` | Add synchronization | 75% |
| **missing_validation** | Inputs not validated | `Validation error` | Add input checks | 92% |
| **logic_error** | Incorrect business logic | `Wrong behavior` | Fix logic | 85% |
| **resource_leak** | Resources not released | `Memory leak / handles` | Add cleanup | 80% |
| **api_misuse** | Incorrect API usage | `API error` | Fix API call | 90% |
| **configuration_error** | Wrong config values | `Config mismatch` | Update config | 95% |

## Test Prioritization Algorithm

```yaml
test_prioritization:
  algorithm: weighted_risk_score

  factors:
    code_change_correlation: 0.30      # How often this test catches changes
    historical_regression_detection: 0.25  # Past regression detection rate
    code_hotspot_coverage: 0.20        # Tests high-risk code
    recent_failure_trend: 0.15         # Recently failing pattern
    execution_efficiency: 0.10         # Fast tests preferred

  output:
    - test_priority: critical | high | medium | low
    - execution_order: [test1, test2, ...]
    - skip_recommendations: [low-value tests]
    - expand_coverage: [high-risk areas]
```

### Example Prioritization

```yaml
# After code change to src/auth/validate.ts

prioritized_tests:
  critical:  # Run first
    - test/auth/validate.test.ts
    - test/integration/auth-flow.test.ts
    reason: "High regression detection rate for this file"

  high:  # Run next
    - test/api/login.test.ts
    - test/api/register.test.ts
    reason: "Depends on auth module"

  medium:  # Run if time allows
    - test/unit/utils.test.ts
    reason: "Low correlation with auth changes"

  low:  # Skip in fast-feedback mode
    - test/ui/theme.test.ts
    reason: "No historical correlation"

  skip_in_fast_mode:
    - test/e2e/full-flow.test.ts
    reason: "Slow (5 min), low failure rate for auth changes"
```

## Predictive Regression Analysis

### Risk Prediction

```yaml
# On code change: src/payments/process.ts

regression_prediction:
  file: "src/payments/process.ts"
  change_type: modification
  lines_changed: 45

  risk_analysis:
    overall_risk: HIGH  # critical | high | medium | low
    confidence: 0.82

    risk_factors:
      - factor: "Known hotspot (REG-0034, REG-0067)"
        weight: 0.35
      - factor: "Large change (45 lines)"
        weight: 0.25
      - factor: "Critical payment logic"
        weight: 0.20
      - factor: "Recent regression (REG-0103, 5 days ago)"
        weight: 0.20

    similar_past_changes:
      - commit: "abc123"
        changes: 38 lines
        result: "Regression REG-0067"
        time_to_detect: 2 hours

      - commit: "def456"
        changes: 52 lines
        result: "Clean - no regressions"

    recommendations:
      - "Run full payment test suite (priority: critical)"
      - "Add integration tests for new code paths"
      - "Manual QA recommended before deploy"
      - "Consider pair review with payments expert"
      - "Check for null handling (pattern RP-001)"

    suggested_tests:
      - test/payments/process.test.ts
      - test/integration/payment-flow.test.ts
      - test/regression/REG-0067-refund-handling.test.ts
```

## Integration with Al

### Debug Memory Integration

```yaml
# Al uses this skill to improve across iterations

ralph_integration:
  on_test_failure:
    - query_similar_past_failures
    - retrieve_fix_templates
    - apply_learned_patterns
    - update_pattern_effectiveness

  on_test_success:
    - record_successful_fix
    - increment_fix_effectiveness
    - update_confidence_scores

  on_regression_detected:
    - create_regression_record: REG-XXXX
    - analyze_root_cause
    - store_in_learning_memory
    - update_hotspot_scores
    - adjust_test_priorities

  cross_iteration_learning:
    enabled: true
    memory_path: .aiwg/ralph/learning/
    retention_days: 365
    share_across_projects: true  # Learn from all projects
```

### Learning Storage Structure

```
.aiwg/ralph/learning/
├── regression-patterns.yaml       # Pattern taxonomy
├── code-hotspots.yaml            # Risk scoring
├── test-effectiveness.yaml       # Test value metrics
├── fix-templates/                # Reusable fix patterns
│   ├── null-check.template
│   ├── type-validation.template
│   └── error-handling.template
├── prediction-models/            # ML models (if applicable)
│   └── risk-predictor.json
└── analytics/
    ├── pattern-trends.json       # Pattern frequency over time
    └── effectiveness-metrics.json # Fix success rates
```

## Usage Examples

### Analyze Regression Patterns

```
User: "Analyze regression patterns"

Skill executes:
1. Load all REG-XXXX records from .aiwg/testing/regressions/
2. Group by root cause category
3. Identify recurring patterns
4. Build pattern taxonomy
5. Generate recommendations

Output:
"Regression Pattern Analysis

Total Regressions: 47 (last 90 days)

Top Patterns:
1. null_undefined_access (12 occurrences, 25.5%)
   - Files: validate.ts, users.ts, process.ts
   - Fix template success rate: 95%
   - Avg time to fix: 1.5 hours

2. missing_validation (8 occurrences, 17.0%)
   - Files: api/*.ts
   - Fix template success rate: 92%
   - Avg time to fix: 2.0 hours

3. type_mismatch (6 occurrences, 12.8%)
   - Files: converters/*.ts
   - Fix template success rate: 90%
   - Avg time to fix: 1.0 hours

Code Hotspots:
- src/auth/validate.ts: 8.5 risk score (3 regressions)
- src/api/users.ts: 7.2 risk score (2 regressions)
- src/payments/process.ts: 8.0 risk score (2 regressions)

Recommendations:
- Increase test coverage in auth module (current: 65%)
- Add null checks at function boundaries
- Implement stricter TypeScript settings
- Refactor validate.ts (high complexity)

See full report: .aiwg/ralph/learning/pattern-analysis.md"
```

### Prioritize Tests for Change

```
User: "What tests should I run after changing src/auth/validate.ts?"

Skill analyzes:
"Test Prioritization for src/auth/validate.ts

Based on learning history:

Critical Priority (run first):
✓ test/auth/validate.test.ts
  - Regression detection rate: 85%
  - Detected: REG-0023, REG-0056, REG-0112
  - Duration: 450ms

✓ test/integration/auth-flow.test.ts
  - Regression detection rate: 75%
  - Detected: REG-0089
  - Duration: 2.1s

High Priority:
- test/api/login.test.ts
- test/api/register.test.ts

Recommended Additions:
- Add test for null input handling (pattern RP-001)
- Add test for concurrent requests (gap identified)

Skip in Fast Mode:
- test/ui/theme.test.ts (no correlation)
- test/e2e/full-flow.test.ts (5 min, low value for this change)

Estimated total time:
- Critical tests: 2.6s
- + High priority: 5.2s
- Full suite: 45s"
```

### Predict Regression Risk

```
User: "Predict regression risk for my changes"

Skill analyzes current diff:
"Regression Risk Prediction

Files Changed:
- src/payments/process.ts (+45 lines)
- src/payments/refund.ts (+12 lines)

Risk Assessment: HIGH (confidence: 82%)

Risk Factors:
1. Known hotspot (8.0 risk score)
   - Previous: REG-0034, REG-0067, REG-0103
2. Large change (45 lines in critical module)
3. Recent regression (REG-0103, 5 days ago)
4. Pattern match: Similar to commit abc123 → REG-0067

Predicted Failure Points:
- Null handling in refund flow (pattern RP-001)
- Type validation for amounts (pattern RP-005)
- Error handling for API timeouts (pattern RP-012)

Recommendations:
✓ Run full payment test suite (20 tests, 8.5s)
✓ Add integration test for refund edge cases
✓ Manual QA for payment flows
✓ Consider pair review with @payments-expert

Apply fix templates:
- null-check.template for refund object
- error-handling.template for API calls

Proceed with caution. Consider creating feature branch
for additional testing."
```

### Find Similar Past Failures

```
User: "Similar past failures for this error?"

Current error: "TypeError: Cannot read property 'amount' of undefined"

Skill searches learning memory:
"Similar Past Failures Found

Pattern: null_undefined_access (RP-001)
Occurrences: 12 total

Most Similar:
1. REG-0023: "Cannot read property 'length' of null"
   - File: src/auth/validate.ts
   - Fix: Added null check at function entry
   - Time to fix: 1.2 hours
   - Code:
     ```typescript
     if (!input) {
       throw new Error('Input is null or undefined');
     }
     ```

2. REG-0067: "Cannot read property 'total' of undefined"
   - File: src/payments/process.ts
   - Fix: Added null guard with default value
   - Time to fix: 0.8 hours
   - Code:
     ```typescript
     const total = transaction?.total ?? 0;
     ```

3. REG-0091: "Cannot read property 'id' of null"
   - File: src/api/users.ts
   - Fix: Early return for null case
   - Time to fix: 0.5 hours

Fix Template (95% success rate):
```typescript
// Pattern: obj.property causing error
// Fix: Add null/undefined check

if (!obj) {
  throw new Error('Object is required');
}
// OR
if (!obj) return defaultValue;
// OR
const value = obj?.property ?? defaultValue;
```

Apply this fix? [y/N]"
```

## Cross-Project Learning

### Shared Learning Pool

```yaml
# Enable learning across multiple projects

cross_project_learning:
  enabled: true

  shared_memory:
    location: ~/.aiwg/global-learning/

    shared_patterns:
      - null_undefined_access
      - type_mismatch
      - missing_validation
      - off_by_one
      - race_condition

    project_contributions:
      - project: "ai-writing-guide"
        patterns_contributed: 12
        patterns_consumed: 8

      - project: "payment-gateway"
        patterns_contributed: 15
        patterns_consumed: 12

  privacy:
    anonymize_file_paths: true
    anonymize_business_logic: true
    share_only_patterns: true  # Not sensitive code
```

## Agent Orchestration

```yaml
agents:
  pattern_analysis:
    agent: test-architect
    focus: Identify and categorize regression patterns

  risk_prediction:
    agent: reliability-engineer
    focus: Predict regression likelihood

  test_prioritization:
    agent: test-engineer
    focus: Optimize test execution order

  fix_suggestion:
    agent: software-implementer
    focus: Apply learned fix templates
```

## Configuration

```yaml
# .aiwg/config/regression-learning.yaml

regression_learning:
  enabled: true

  learning_sources:
    - .aiwg/testing/regressions/  # REG-XXXX records
    - .aiwg/ralph/debug-memory/   # Execution history
    - .aiwg/ralph/learning/       # Accumulated patterns

  pattern_detection:
    min_occurrences: 2  # Minimum to consider a pattern
    confidence_threshold: 0.70

  test_prioritization:
    enabled: true
    algorithm: weighted_risk_score

  risk_prediction:
    enabled: true
    confidence_threshold: 0.75

  cross_project_learning:
    enabled: false  # Privacy consideration

  retention:
    regression_records: 365 days
    pattern_library: forever
    effectiveness_metrics: 180 days
```

## Metrics

Track learning effectiveness:

| Metric | Target | Purpose |
|--------|--------|---------|
| pattern_accuracy | >85% | Pattern matching correct |
| fix_template_success_rate | >90% | Fixes work first time |
| prediction_accuracy | >75% | Risk predictions correct |
| test_prioritization_efficiency | >50% | Time saved by prioritization |
| false_positive_rate | <10% | Spurious alerts |
| cross_project_reuse | >30% | Patterns used across projects |

## Output Locations

- Learning memory: `.aiwg/ralph/learning/`
- Pattern analysis reports: `.aiwg/ralph/learning/analytics/`
- Fix templates: `.aiwg/ralph/learning/fix-templates/`
- Global learning: `~/.aiwg/global-learning/` (if enabled)

## References

- @$AIWG_ROOT/agentic/code/addons/ralph/schemas/debug-memory.yaml - Debug memory structure
- @$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/schemas/testing/regression.yaml - Regression schema
- @$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/rules/executable-feedback.md - Feedback loop patterns
- @.aiwg/research/findings/REF-013-metagpt.md - MetaGPT executable feedback
- @.aiwg/research/findings/REF-015-self-refine.md - Self-Refine learning patterns
- @$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/skills/regression-baseline/SKILL.md - Baseline management
- @$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/skills/regression-bisect/SKILL.md - Root cause analysis
- @$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/skills/regression-metrics/SKILL.md - Metrics tracking

Related Skills

regression-visual

104
from jmagly/aiwg

Detect visual and UI regressions through screenshot comparison and pixel-diff analysis across browsers and viewports

Codex

regression-report

104
from jmagly/aiwg

Generate comprehensive regression analysis reports combining bisect, baseline, and metrics data with actionable recommendations

Codex

regression-performance

104
from jmagly/aiwg

Detect performance regressions by comparing benchmarks across versions with latency, throughput, and statistical significance analysis

Codex

regression-metrics

104
from jmagly/aiwg

Track and analyze regression statistics, trends, hotspots, and health indicators across test suites

Codex

regression-cicd-hooks

104
from jmagly/aiwg

Integrate regression testing into CI/CD pipelines with baseline comparison and merge blocking on failure

Codex

regression-check

104
from jmagly/aiwg

Compare current behavior against baseline to detect regressions

Codex

regression-bisect

104
from jmagly/aiwg

Identify the commit that introduced a regression using git bisect with automated test execution and blame context

Codex

regression-baseline

104
from jmagly/aiwg

Create and maintain regression test baselines for comparison and drift detection across versioned snapshots

Codex

aiwg-orchestrate

104
from jmagly/aiwg

Route structured artifact work to AIWG workflows via MCP with zero parent context cost

venv-manager

104
from jmagly/aiwg

Create, manage, and validate Python virtual environments. Use for project isolation and dependency management.

pytest-runner

104
from jmagly/aiwg

Execute Python tests with pytest, supporting fixtures, markers, coverage, and parallel execution. Use for Python test automation.

vitest-runner

104
from jmagly/aiwg

Execute JavaScript/TypeScript tests with Vitest, supporting coverage, watch mode, and parallel execution. Use for JS/TS test automation.