AI Agent Skill HUB

Codex

regression-check

Compare current behavior against baseline to detect regressions

104 stars

View on GitHub Installation ↓

Best use case

regression-check is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

It is a strong fit for teams already working in Codex.

Compare current behavior against baseline to detect regressions

Teams using regression-check should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/regression-check/SKILL.md --create-dirs "https://raw.githubusercontent.com/jmagly/aiwg/main/.agents/skills/regression-check/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/regression-check/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How regression-check Compares

Feature / Agent	regression-check	Standard Approach
Platform Support	Codex	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Compare current behavior against baseline to detect regressions

Which AI agents support this skill?

This skill is designed for Codex.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

Cursor vs Codex for AI Workflows

Compare Cursor and Codex for AI coding workflows, repository assistance, debugging, refactoring, and reusable developer skills.

AI Agents for Coding

Browse AI agent skills for coding, debugging, testing, refactoring, code review, and developer workflows across Claude, Cursor, and Codex.

AI Agents for Marketing

Discover AI agents for marketing workflows, from SEO and content production to campaign research, outreach, and analytics.

SKILL.md Source

# Regression Check Command

Compare current system behavior against a baseline to detect regressions using executable feedback patterns.

## Task

I'll perform comprehensive regression detection by:

1. Establishing baseline comparison point (branch, tag, commit, or timestamp)
2. Identifying scope of changes to test
3. Executing relevant test suites with baseline comparison
4. Analyzing execution feedback for behavioral changes
5. Reporting detected regressions with severity and impact
6. Optionally correlating with bug reports

## Arguments

- `--baseline <ref>` - Comparison baseline (branch name, git tag, commit hash, or ISO timestamp)
  - Examples: `--baseline main`, `--baseline v2026.1.4`, `--baseline abc123`, `--baseline 2026-01-20T10:00:00Z`
  - Default: `main` branch or most recent release tag

- `--scope <level>` - Testing scope
  - `full` - Run complete test suite (default)
  - `module <name>` - Test specific module/component
  - `changed-files` - Test only files changed since baseline
  - Example: `--scope module auth`

- `--format <type>` - Output format
  - `summary` - High-level regression report (default)
  - `detailed` - Full execution logs with diffs
  - `json` - Machine-readable format for CI/CD integration

- `--bug-report <path>` - Correlate with bug report
  - Verifies bug fix doesn't introduce regressions
  - Cross-references test failures with reported issues
  - Example: `--bug-report .aiwg/bugs/BUG-042.md`

- `--save-baseline` - Save current execution as new baseline
  - Useful after confirming changes are intentional
  - Updates regression detection baseline for future runs

## Process

### Step 1: Baseline Establishment

```bash
# Determine baseline reference
BASELINE_REF="${--baseline:-main}"

# Checkout baseline (in temporary worktree)
git worktree add /tmp/baseline-check "$BASELINE_REF"

# Capture baseline test results
cd /tmp/baseline-check
npm test -- --json > /tmp/baseline-results.json
```

### Step 2: Scope Identification

**Full Scope**:
- Run entire test suite in both baseline and current
- Compare all test outcomes

**Module Scope**:
- Identify module-specific tests
- Run subset of test suite
- Focus regression detection on specified component

**Changed Files Scope**:
```bash
# Identify changed files
git diff --name-only "$BASELINE_REF" > /tmp/changed-files.txt

# Map to test files
# Example: src/auth/login.ts → test/unit/auth/login.test.ts

# Run only affected tests
```

### Step 3: Execution Comparison

Execute tests in current state:
```bash
# Run tests with same configuration as baseline
npm test -- --json > /tmp/current-results.json

# Capture additional metrics
# - Performance timing
# - Code coverage deltas
# - Error types and frequencies
```

### Step 4: Regression Analysis

Compare execution results:

```yaml
regression_detection:
  new_failures:
    - test: "test/unit/auth/login.test.ts::validateCredentials"
      baseline_status: PASS
      current_status: FAIL
      error: "Expected 200, received 401"
      severity: CRITICAL

  behavior_changes:
    - test: "test/integration/api/users.test.ts::createUser"
      baseline_output: "User created in 150ms"
      current_output: "User created in 450ms"
      delta: +200%
      severity: MAJOR

  fixed_tests:
    - test: "test/unit/utils/parser.test.ts::edgeCase"
      baseline_status: FAIL
      current_status: PASS
      note: "Intentional fix"
```

### Step 5: Reporting

**Summary Format**:
```markdown
## Regression Check Report

**Baseline**: main (commit abc123)
**Scope**: full
**Date**: 2026-01-25T15:30:00Z

### Summary

- **New Failures**: 3 tests
- **Behavior Changes**: 5 tests (performance degradation)
- **Fixed Tests**: 2 tests
- **Overall**: ⚠️ REGRESSIONS DETECTED

### Critical Regressions

1. **auth/login.test.ts::validateCredentials**
   - Status: PASS → FAIL
   - Error: "Expected 200, received 401"
   - Impact: Breaks user authentication
   - Action: BLOCK MERGE

2. **payments/process.test.ts::chargeCard**
   - Status: PASS → FAIL
   - Error: "Transaction timeout"
   - Impact: Payment processing broken
   - Action: BLOCK MERGE

### Behavior Changes

1. **api/users.test.ts::createUser**
   - Performance: 150ms → 450ms (+200%)
   - Severity: MAJOR
   - Action: INVESTIGATE

[... detailed report continues ...]
```

**JSON Format**:
```json
{
  "baseline": {
    "ref": "main",
    "commit": "abc123",
    "timestamp": "2026-01-20T10:00:00Z"
  },
  "current": {
    "commit": "def456",
    "timestamp": "2026-01-25T15:30:00Z"
  },
  "summary": {
    "new_failures": 3,
    "behavior_changes": 5,
    "fixed_tests": 2,
    "total_tests": 247
  },
  "regressions": [
    {
      "test": "auth/login.test.ts::validateCredentials",
      "type": "failure",
      "severity": "critical",
      "baseline_status": "PASS",
      "current_status": "FAIL",
      "error": "Expected 200, received 401",
      "action": "block"
    }
  ],
  "verdict": "FAIL"
}
```

## Bug Report Integration

When `--bug-report` is provided:

```bash
# Load bug report
BUG_REPORT=$(cat .aiwg/bugs/BUG-042.md)

# Extract:
# - Bug description
# - Affected components
# - Expected fix behavior

# Cross-reference regression results:
# 1. Did fix resolve reported bug? (test now passes)
# 2. Did fix introduce new failures? (regressions)
# 3. Did fix change behavior elsewhere? (side effects)

# Generate correlation report:
```

**Example Output**:
```markdown
## Bug Fix Verification: BUG-042

**Bug**: Login fails for users with special characters in email

**Fix Verification**: ✅ PASS
- Test `auth/login.test.ts::specialCharactersInEmail` now passes
- Previously failing with "Email validation error"

**Regression Check**: ⚠️ WARNING
- **New failure detected**: `auth/login.test.ts::standardLogin`
  - Error: "Unexpected character escape"
  - Likely caused by overly aggressive input sanitization
  - **Action**: Refine fix to avoid breaking standard login flow

**Recommendation**: REFINE FIX before merge
```

## Baseline Modes

### Branch Baseline

```bash
# Compare against branch
/regression-check --baseline develop --scope changed-files

# Use case: Feature branch testing
# Ensures changes don't break existing functionality
```

### Tag Baseline

```bash
# Compare against release
/regression-check --baseline v2026.1.4 --format detailed

# Use case: Release regression testing
# Verifies new version doesn't break previous release behavior
```

### Commit Baseline

```bash
# Compare against specific commit
/regression-check --baseline abc123def --scope module auth

# Use case: Pinpoint regression introduction
# Identifies exact commit that introduced regression
```

### Timestamp Baseline

```bash
# Compare against point in time
/regression-check --baseline 2026-01-20T10:00:00Z

# Use case: Time-based regression detection
# Finds when behavior changed over time
```

## Executable Feedback Pattern

Implements REF-013 MetaGPT executable feedback loop:

```yaml
executable_feedback_loop:
  1_execute_baseline:
    - checkout_baseline_ref
    - run_test_suite
    - capture_execution_results

  2_execute_current:
    - run_test_suite_current
    - capture_execution_results

  3_compare_feedback:
    - detect_new_failures
    - detect_behavior_changes
    - detect_performance_regressions

  4_analyze_root_cause:
    - diff_source_changes
    - identify_likely_culprits
    - suggest_fixes

  5_iterate_or_report:
    - if_regressions_critical: block_merge
    - if_regressions_minor: warn_and_document
    - if_no_regressions: approve
```

## Usage Examples

### Example 1: Pre-Merge Regression Check

```bash
# Before merging feature branch
/regression-check --baseline main --scope changed-files --format summary

# Expected outcome:
# - Fast execution (only changed files tested)
# - Summary of any regressions introduced
# - Recommendation: merge/block/investigate
```

### Example 2: Bug Fix Verification

```bash
# After implementing bug fix
/regression-check --baseline v2026.1.4 --bug-report .aiwg/bugs/BUG-042.md

# Expected outcome:
# - Verifies bug is fixed (test now passes)
# - Checks for side effects (new failures)
# - Provides merge recommendation
```

### Example 3: Release Regression Testing

```bash
# Before cutting release
/regression-check --baseline main --scope full --format detailed

# Expected outcome:
# - Comprehensive test suite comparison
# - Detailed logs for all detected regressions
# - Complete report for release decision
```

### Example 4: Module-Specific Testing

```bash
# After refactoring auth module
/regression-check --baseline main --scope module auth --format json

# Expected outcome:
# - Focused regression detection
# - JSON output for CI/CD integration
# - Fast feedback on module changes
```

### Example 5: Performance Regression Detection

```bash
# Detect performance regressions
/regression-check --baseline v2026.1.3 --format detailed

# Expected outcome:
# - Timing comparisons for all tests
# - Flags tests with >20% performance degradation
# - Detailed analysis of slow tests
```

## Integration with CI/CD

```yaml
# .github/workflows/regression-check.yml
name: Regression Check

on:
  pull_request:
    branches: [main]

jobs:
  regression:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
        with:
          fetch-depth: 0  # Full history for baseline comparison

      - name: Run regression check
        run: |
          aiwg regression-check \
            --baseline main \
            --scope changed-files \
            --format json \
            > regression-results.json

      - name: Post results
        if: failure()
        uses: actions/github-script@v6
        with:
          script: |
            const results = require('./regression-results.json');
            // Post comment to PR with regression report
```

## Output Artifacts

Generated files:

```
.aiwg/testing/regression-reports/
├── regression-{timestamp}.md         # Human-readable report
├── regression-{timestamp}.json       # Machine-readable results
├── baseline-execution.log           # Baseline test output
├── current-execution.log            # Current test output
└── diff-analysis.md                 # Source code diff analysis
```

## Exit Codes

```bash
0  - No regressions detected
1  - Critical regressions (blocking)
2  - Major regressions (warning)
3  - Minor regressions (informational)
4  - Execution error (baseline/current test failure)
```

## References

- @$AIWG_ROOT/docs/references/REF-013-metagpt-multi-agent-framework.md - Executable feedback pattern
- @$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/rules/executable-feedback.md - Feedback loop implementation rules
- @$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/templates/test/regression-test-set-card.md - Regression test template
- @.aiwg/testing/ - Test artifacts location
- @$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/rules/executable-feedback.md - Test-first principles

Related Skills

eslint-checker

from jmagly/aiwg

Run ESLint for JavaScript/TypeScript code quality and style enforcement. Use for static analysis and auto-fixing.

traceability-check

from jmagly/aiwg

Verify bidirectional traceability from requirements to code to tests and identify coverage gaps and orphan artifacts

regression-visual

from jmagly/aiwg

Detect visual and UI regressions through screenshot comparison and pixel-diff analysis across browsers and viewports

regression-report

from jmagly/aiwg

Generate comprehensive regression analysis reports combining bisect, baseline, and metrics data with actionable recommendations

regression-performance

from jmagly/aiwg

Detect performance regressions by comparing benchmarks across versions with latency, throughput, and statistical significance analysis

regression-metrics

from jmagly/aiwg

Track and analyze regression statistics, trends, hotspots, and health indicators across test suites

regression-learning

from jmagly/aiwg

Improve regression detection over time through cross-task pattern recognition, test prioritization, and historical analysis

regression-cicd-hooks

from jmagly/aiwg

Integrate regression testing into CI/CD pipelines with baseline comparison and merge blocking on failure

regression-bisect

from jmagly/aiwg

Identify the commit that introduced a regression using git bisect with automated test execution and blame context

regression-baseline

from jmagly/aiwg

Create and maintain regression test baselines for comparison and drift detection across versioned snapshots

quality-checker

from jmagly/aiwg

Validate skill quality, completeness, and adherence to standards. Use before packaging to ensure skill meets quality requirements.

project-health-check

from jmagly/aiwg

Analyze overall project health and metrics