regression-baseline
Create and maintain regression test baselines for comparison and drift detection across versioned snapshots
Best use case
regression-baseline is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
It is a strong fit for teams already working in Codex.
Teams using regression-baseline should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/regression-baseline/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
How regression-baseline Compares
| Feature / Agent | regression-baseline | Standard Approach |
|---|---|---|
| Platform Support | Codex | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Create and maintain regression test baselines for comparison and drift detection across versioned snapshots
Which AI agents support this skill?
This skill is designed for Codex.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# regression-baseline
Create and maintain regression test baselines for comparison and drift detection.
## Triggers
Alternate expressions and non-obvious activations (primary phrases are matched automatically from the skill description):
- "snapshot tests" → baseline creation shorthand
- "capture current behavior" → baseline establishment
## Purpose
This skill manages regression baselines by:
- Capturing known-good system states as baselines
- Storing snapshots of expected outputs
- Maintaining versioned baseline history
- Comparing current state to baseline
- Detecting drift from established baselines
- Managing baseline updates and approvals
## Behavior
When triggered, this skill:
1. **Identifies baseline scope**:
   - Determine what to baseline (tests, outputs, performance)
   - Select baseline type (functional, visual, performance, API)
   - Identify affected components
   - Check for existing baselines
2. **Captures baseline data**:
   - Run tests and capture outputs
   - Take snapshots (visual, data, API responses)
   - Measure performance metrics
   - Record system configuration
   - Generate checksums/hashes
3. **Stores baseline**:
   - Version baseline with semantic versioning
   - Tag with git commit/release
   - Store in `.aiwg/testing/baselines/`
   - Update baseline manifest
   - Link to requirements/features
4. **Validates baseline quality**:
   - Ensure baseline is stable (run multiple times)
   - Verify baseline is reproducible
   - Check for known issues in baseline
   - Require approval for critical baselines
5. **Documents baseline**:
   - Record what is baselined
   - Note when and why baseline was created
   - Document known deviations
   - Link to issues/requirements
6. **Manages baseline lifecycle**:
   - Archive old baselines
   - Update baselines on intentional changes
   - Track baseline drift over time
   - Alert on excessive drift
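The capture and validation steps above (generate checksums, run multiple times to confirm stability) could be sketched roughly as follows. This is an illustration under stated assumptions, not the skill's actual implementation: `checksum`, `capture_baseline`, and `fake_login` are hypothetical names, and the default of three runs simply mirrors the "run 3 times" step in the usage example.

```python
import hashlib
import json
from typing import Any, Callable


def checksum(output: Any) -> str:
    """Hash a JSON-serializable output; sort_keys makes the hash independent of key order."""
    canonical = json.dumps(output, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def capture_baseline(run: Callable[[], Any], runs: int = 3) -> dict:
    """Run the target several times; accept the output only if every run hashes identically."""
    outputs = [run() for _ in range(runs)]
    digests = {checksum(output) for output in outputs}
    if len(digests) != 1:
        raise ValueError(f"unstable output: {len(digests)} distinct results in {runs} runs")
    return {"output": outputs[0], "sha256": digests.pop(), "runs": runs}


def fake_login() -> dict:
    # Hypothetical stand-in for a real test that returns a response.
    return {"status": 200, "body": {"user": {"email": "user@example.com"}}}


baseline = capture_baseline(fake_login)
print(baseline["sha256"][:12], baseline["runs"])
```

Outputs that contain timestamps, random IDs, or real tokens would fail this stability check, so such fields would need to be masked or normalized before hashing.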
## Baseline Types
### Functional Baseline
```yaml
functional_baseline:
  description: Expected behavior outputs
  scope: Unit and integration tests
  format: JSON snapshots
  capture:
    - test_outputs
    - assertions
    - mock_data
    - expected_errors
  example:
    test: "user registration"
    baseline: "baselines/functional/user-registration-v1.json"
    includes:
      - success_response_format
      - validation_error_messages
      - database_state_after_registration
```
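Comparing a stored JSON snapshot against a current output amounts to a recursive structural diff. The sketch below is one possible way to do it, assuming both values are already parsed into Python objects; `diff_snapshot` and the `$`-rooted path notation are illustrative choices, not part of the skill.

```python
from typing import Any


def diff_snapshot(baseline: Any, current: Any, path: str = "$") -> list[str]:
    """Return one human-readable line per difference between two JSON-like values."""
    if isinstance(baseline, dict) and isinstance(current, dict):
        diffs = []
        for key in sorted(baseline.keys() | current.keys()):
            child = f"{path}.{key}"
            if key not in current:
                diffs.append(f"{child}: removed")
            elif key not in baseline:
                diffs.append(f"{child}: added")
            else:
                diffs.extend(diff_snapshot(baseline[key], current[key], child))
        return diffs
    if isinstance(baseline, list) and isinstance(current, list):
        diffs = []
        if len(baseline) != len(current):
            diffs.append(f"{path}: length {len(baseline)} -> {len(current)}")
        # Compare the overlapping prefix element by element.
        for index, (old, new) in enumerate(zip(baseline, current)):
            diffs.extend(diff_snapshot(old, new, f"{path}[{index}]"))
        return diffs
    if baseline != current:
        return [f"{path}: {baseline!r} -> {current!r}"]
    return []


baseline = {"status": 200, "body": {"user": {"id": "uuid", "email": "user@example.com"}}}
current = {
    "status": 200,
    "body": {"user": {"id": "uuid", "email": "user@example.com", "name": "John Doe"}},
}
print(diff_snapshot(baseline, current))  # ['$.body.user.name: added']
```

An empty list means the output matches the snapshot exactly; under strict matching, any entry (including an added field) counts as drift.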
### Visual Baseline
```yaml
visual_baseline:
  description: UI screenshots for visual regression
  scope: Frontend components
  format: PNG images with metadata
  capture:
    - component_screenshots
    - page_screenshots
    - responsive_breakpoints
    - interaction_states
  example:
    component: "LoginForm"
    baseline: "baselines/visual/login-form-v2.png"
    metadata:
      viewport: 1920x1080
      theme: dark
      state: default
```
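A visual comparison ultimately reduces to counting how many pixels differ between two screenshots. The sketch below assumes the PNGs have already been decoded into rows of RGB tuples by an image library (decoding is not shown); `pixel_diff_ratio` and the sample images are hypothetical, and the 0.001 threshold corresponds to the 0.1% value in the drift threshold configuration.

```python
Pixel = tuple[int, int, int]
Image = list[list[Pixel]]


def pixel_diff_ratio(baseline: Image, current: Image) -> float:
    """Fraction of pixels whose RGB value differs between two same-sized images."""
    if len(baseline) != len(current) or any(
        len(b_row) != len(c_row) for b_row, c_row in zip(baseline, current)
    ):
        raise ValueError("image dimensions differ")
    total = sum(len(row) for row in baseline)
    if total == 0:
        raise ValueError("empty image")
    changed = sum(
        1
        for b_row, c_row in zip(baseline, current)
        for b, c in zip(b_row, c_row)
        if b != c
    )
    return changed / total


WHITE = (255, 255, 255)
BLACK = (0, 0, 0)
baseline_img = [[WHITE] * 100 for _ in range(10)]  # 1000 pixels
current_img = [row[:] for row in baseline_img]
current_img[0][0] = BLACK
current_img[5][42] = BLACK

THRESHOLD = 0.001  # 0.1% of pixels may differ
ratio = pixel_diff_ratio(baseline_img, current_img)
print(ratio, ratio <= THRESHOLD)  # 0.002 False
```

A real tool would likely also mask ignored regions and tolerate small per-channel differences from anti-aliasing; both are left out here for brevity.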
### Performance Baseline
```yaml
performance_baseline:
  description: Performance benchmarks
  scope: API response times, load tests
  format: JSON metrics
  capture:
    - response_times_p50_p95_p99
    - throughput_requests_per_second
    - resource_utilization
    - database_query_times
  example:
    endpoint: "/api/users"
    baseline: "baselines/performance/api-users-v1.json"
    metrics:
      p50_response_time_ms: 45
      p95_response_time_ms: 120
      p99_response_time_ms: 250
      throughput_rps: 1000
```
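The percentile metrics in the example can be derived from raw response-time samples. The following sketch uses the nearest-rank definition of a percentile, which is one of several common definitions and is an assumption here, since the skill does not say which method it uses; `percentile` and `summarize` are hypothetical helper names.

```python
import math


def percentile(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(samples_ms: list[float]) -> dict:
    """Build the metric keys used in the performance baseline example."""
    return {
        "p50_response_time_ms": percentile(samples_ms, 50),
        "p95_response_time_ms": percentile(samples_ms, 95),
        "p99_response_time_ms": percentile(samples_ms, 99),
    }


# Made-up samples: 1 ms, 2 ms, ..., 100 ms.
samples = [float(ms) for ms in range(1, 101)]
metrics = summarize(samples)
print(metrics)
# {'p50_response_time_ms': 50.0, 'p95_response_time_ms': 95.0, 'p99_response_time_ms': 99.0}
```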
### API Contract Baseline
```yaml
api_baseline:
  description: API request/response contracts
  scope: REST/GraphQL APIs
  format: OpenAPI/JSON schemas
  capture:
    - request_schemas
    - response_schemas
    - status_codes
    - headers
  example:
    endpoint: "POST /api/auth/login"
    baseline: "baselines/api/auth-login-v3.yaml"
    contract:
      request_body_schema: LoginRequest
      success_response: 200 with token
      error_responses: [400, 401, 429]
```
## Baseline Manifest
```yaml
# .aiwg/testing/baselines/manifest.yaml
baselines:
  - id: functional-auth-v1
    type: functional
    created: 2026-01-28T10:00:00Z
    created_by: test-engineer
    git_commit: abc123def
    release: v1.2.0
    scope: Authentication flows
    files:
      - baselines/functional/login-v1.json
      - baselines/functional/logout-v1.json
      - baselines/functional/register-v1.json
    status: active
    approved_by: tech-lead
    notes: Initial baseline for auth module
  - id: visual-dashboard-v2
    type: visual
    created: 2026-01-20T14:30:00Z
    created_by: frontend-engineer
    git_commit: def456ghi
    release: v1.1.0
    scope: Dashboard UI components
    files:
      - baselines/visual/dashboard-desktop-v2.png
      - baselines/visual/dashboard-mobile-v2.png
    status: active
    approved_by: design-lead
    previous_baseline: visual-dashboard-v1
    changes: Updated color scheme per design system v2
  - id: performance-api-v1
    type: performance
    created: 2026-01-15T09:00:00Z
    created_by: devops-engineer
    git_commit: ghi789jkl
    release: v1.0.0
    scope: Core API endpoints
    files:
      - baselines/performance/api-benchmarks-v1.json
    status: active
    approved_by: architect
    notes: Pre-optimization baseline
```
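Once the manifest is parsed into Python dictionaries (YAML parsing is not shown), looking up the active baseline for a type and walking its version history via `previous_baseline` is straightforward. The sketch below is illustrative only: `find_active` and `lineage` are hypothetical helpers, and the `archived` status and the `visual-dashboard-v1` entry are assumptions added so the chain has something to point at.

```python
def find_active(entries: list[dict], baseline_type: str) -> list[dict]:
    """Return manifest entries of the given type whose status is active."""
    return [
        entry
        for entry in entries
        if entry["type"] == baseline_type and entry["status"] == "active"
    ]


def lineage(entries: list[dict], baseline_id: str) -> list[str]:
    """Follow previous_baseline links from newest to oldest, stopping on unknown IDs or cycles."""
    by_id = {entry["id"]: entry for entry in entries}
    chain: list[str] = []
    current = baseline_id
    while current is not None and current in by_id and current not in chain:
        chain.append(current)
        current = by_id[current].get("previous_baseline")
    return chain


entries = [
    # Hypothetical archived predecessor; not present in the manifest example.
    {"id": "visual-dashboard-v1", "type": "visual", "status": "archived"},
    {
        "id": "visual-dashboard-v2",
        "type": "visual",
        "status": "active",
        "previous_baseline": "visual-dashboard-v1",
    },
    {"id": "functional-auth-v1", "type": "functional", "status": "active"},
]

print([entry["id"] for entry in find_active(entries, "visual")])  # ['visual-dashboard-v2']
print(lineage(entries, "visual-dashboard-v2"))  # ['visual-dashboard-v2', 'visual-dashboard-v1']
```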
## Baseline Comparison Report
```markdown
# Baseline Comparison Report
**Date**: 2026-01-28
**Baseline**: functional-auth-v1
**Current State**: HEAD (commit xyz789)
**Comparison Type**: Functional
## Executive Summary
**Status**: ⚠️ Drift Detected
**Changes**: 2 outputs differ from baseline
**Severity**: Medium - Unexpected behavior changes
| Metric | Baseline | Current | Drift |
|--------|----------|---------|-------|
| Tests Passing | 45/45 | 43/45 | -2 |
| Output Matches | 100% | 95.6% | -4.4% |
| New Failures | 0 | 2 | +2 |
## Detailed Comparison
### Test: user-login
**Status**: ⚠️ Drift
**Baseline** (functional-auth-v1):
```json
{
  "status": 200,
  "body": {
    "token": "jwt.header.payload.signature",
    "user": {
      "id": "uuid",
      "email": "user@example.com"
    }
  }
}
```
**Current**:
```json
{
  "status": 200,
  "body": {
    "token": "jwt.header.payload.signature",
    "user": {
      "id": "uuid",
      "email": "user@example.com",
      "name": "John Doe"
    }
  }
}
```
**Analysis**: New `name` field added to response
**Impact**: Breaking change for clients expecting exact schema
**Recommendation**: Update baseline if intentional, or fix if bug
### Test: user-registration-invalid-email
**Status**: ❌ Failure
**Baseline**:
```json
{
  "status": 400,
  "body": {
    "error": "Invalid email format"
  }
}
```
**Current**:
```json
{
  "status": 500,
  "body": {
    "error": "Internal server error"
  }
}
```
**Analysis**: Validation now returns 500 instead of 400
**Impact**: Critical - Server error on invalid input
**Recommendation**: Fix immediately - regression in error handling
### Test: user-logout
**Status**: ✅ Match
Output matches baseline exactly. No drift detected.
## Drift Summary
| Test | Status | Drift Type | Severity |
|------|--------|------------|----------|
| user-login | ⚠️ Drift | Schema change | Medium |
| user-registration | ✅ Match | None | - |
| user-registration-invalid-email | ❌ Fail | Error code | High |
| user-logout | ✅ Match | None | - |
| token-refresh | ✅ Match | None | - |
## Root Cause Analysis
### Likely Causes
1. **user-login drift**: Intentional feature addition (commit abc123)
- PR #789: "Add user name to login response"
- Needs baseline update
2. **invalid-email failure**: Unintentional regression
- Validation middleware broken
- Returns 500 instead of 400
- Introduced in commit def456
## Recommendations
### Immediate Actions
- [ ] Fix validation error handling (returns 500 instead of 400)
- [ ] Add test for proper error codes
- [ ] Verify error handling across all validation
### Baseline Updates
- [ ] Update baseline for user-login if name field is intentional
- [ ] Document breaking change in API changelog
- [ ] Notify API consumers of schema change
### Process Improvements
- [ ] Require baseline review for API changes
- [ ] Add baseline diff to PR template
- [ ] Run baseline comparison in CI
## Approval Required
To update baseline with new outputs:
```bash
# Review changes
aiwg baseline compare functional-auth-v1
# Update if intentional
aiwg baseline update functional-auth-v1 \
--approve-changes user-login \
--justification "Added name field per REQ-123"
# Or create new baseline version
aiwg baseline create functional-auth-v2 \
--based-on v1 \
--changes "Added user name to login response"
```
```
## Usage Examples
### Create New Baseline
```
User: "Create baseline for authentication tests"
Skill executes:
1. Identify auth test suite
2. Run tests and capture outputs
3. Validate stability (run 3 times)
4. Store baseline with version
5. Update manifest
Output:
"Baseline Created: functional-auth-v1
Scope: Authentication tests (45 tests)
Format: JSON snapshots
Location: .aiwg/testing/baselines/functional/auth/
Files:
- login-v1.json
- logout-v1.json
- register-v1.json
- token-refresh-v1.json
Status: Active
Git Commit: abc123
Release: v1.2.0
Next: Baseline will be used for regression detection"
```
### Update Existing Baseline
```
User: "Update baseline for dashboard UI"
Skill checks current vs baseline:
"Comparing to visual-dashboard-v1:
- 5 screenshots differ
- Changes: Color scheme updated
Approve changes?
- dashboard-desktop.png
- dashboard-mobile.png
- widget-panel.png
Reason for update?"
User: "Design system v2 rollout"
Skill creates:
"Baseline Updated: visual-dashboard-v2
Previous: visual-dashboard-v1 (archived)
Changes: Design system v2 colors
Approved by: [current user]
Files updated: 5
New baseline active for regression tests"
```
### Compare to Baseline
```
User: "Compare current state to baseline"
Skill analyzes:
"Baseline Comparison: functional-auth-v1
Status: ⚠️ Drift Detected
Matches: 43/45 tests (95.6%)
Drifts: 2 tests show differences
Drift 1: user-login
- New field 'name' in response
- Severity: Medium
- Likely intentional (commit abc123)
Drift 2: invalid-email handling
- Returns 500 instead of 400
- Severity: High
- Likely regression (needs fix)
See full report: .aiwg/testing/baseline-comparison.md"
```
## Integration
This skill uses:
- `regression-bisect`: Find commits that broke baseline
- `regression-metrics`: Track baseline drift over time
- `test-coverage`: Ensure baselines cover critical paths
- `project-awareness`: Detect test framework and conventions
## Agent Orchestration
```yaml
agents:
  creation:
    agent: test-engineer
    focus: Baseline capture and validation
  analysis:
    agent: test-architect
    focus: Drift analysis and recommendations
  approval:
    agent: tech-lead
    focus: Baseline update approval
```
## Configuration
### Baseline Storage
```yaml
baseline_config:
  storage_path: .aiwg/testing/baselines/
  structure:
    - functional/
    - visual/
    - performance/
    - api/
  versioning: semantic  # v1, v2, v3
  compression: gzip  # for large baselines
  retention: 10  # keep last 10 versions
```
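The `retention` setting implies a split between versions to keep and versions to archive. A minimal sketch of that split follows, assuming baseline IDs end in `-v<N>` as in the manifest example; `version_number` and `split_by_retention` are hypothetical helpers, and how the skill actually archives files is not specified here.

```python
import re


def version_number(baseline_id: str) -> int:
    """Extract N from an ID ending in -v<N>, e.g. functional-auth-v2 -> 2."""
    match = re.search(r"-v(\d+)$", baseline_id)
    if match is None:
        raise ValueError(f"no version suffix in {baseline_id!r}")
    return int(match.group(1))


def split_by_retention(ids: list[str], retention: int = 10) -> tuple[list[str], list[str]]:
    """Return (keep, archive): the newest `retention` versions and everything older."""
    # Sort numerically so that v10 ranks above v9.
    ordered = sorted(ids, key=version_number, reverse=True)
    return ordered[:retention], ordered[retention:]


ids = [f"functional-auth-v{n}" for n in range(1, 13)]
keep, archive = split_by_retention(ids, retention=10)
print(archive)  # ['functional-auth-v2', 'functional-auth-v1']
```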
### Drift Thresholds
```yaml
drift_thresholds:
functional:
exact_match_required: true
allow_new_fields: false # strict schema matching
visual:
pixel_diff_threshold: 0.1% # 0.1% difference allowed
ignore_areas: [timestamp, dynamic-content]
performance:
p50_tolerance: 10% # ±10% from baseline
p95_tolerance: 15% # ±15% from baseline
p99_tolerance: 20% # ±20% from baseline
```
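To make the performance tolerances concrete, the sketch below checks current metrics against a baseline using the ±10%/15%/20% bands from the configuration. `performance_drift` is a hypothetical function, the baseline numbers come from the performance baseline example, and the current numbers are made up for illustration.

```python
TOLERANCES = {"p50": 0.10, "p95": 0.15, "p99": 0.20}


def performance_drift(baseline: dict, current: dict, tolerances: dict = TOLERANCES) -> dict:
    """Compute the relative change per percentile and flag values outside the ± tolerance band."""
    report = {}
    for name, tolerance in tolerances.items():
        key = f"{name}_response_time_ms"
        if baseline[key] == 0:
            raise ValueError(f"baseline {key} is zero; relative change is undefined")
        change = (current[key] - baseline[key]) / baseline[key]
        report[name] = {"change": change, "within_tolerance": abs(change) <= tolerance}
    return report


baseline = {"p50_response_time_ms": 45, "p95_response_time_ms": 120, "p99_response_time_ms": 250}
current = {"p50_response_time_ms": 48, "p95_response_time_ms": 141, "p99_response_time_ms": 260}

for name, result in performance_drift(baseline, current).items():
    status = "ok" if result["within_tolerance"] else "DRIFT"
    print(name, f"{result['change']:+.1%}", status)
```

With these numbers only p95 falls outside its band. Because the check uses the absolute change, an unexpectedly large speedup is flagged as drift too, which matches the ± notation in the configuration.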
### Approval Requirements
```yaml
approval_config:
require_approval_for:
- baseline_creation: true
- baseline_updates: true
- baseline_deletion: true
approvers:
- tech-lead
- architect
- test-lead
approval_via:
- pr_review
- issue_comment
- command_flag: --approved-by
```
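The approval rules above could be enforced with a small gate like the one sketched here. The role names and action names are taken from the configuration; `check_approval` itself is hypothetical, and how the approver identity is obtained (PR review, issue comment, or flag) is outside the sketch.

```python
from typing import Optional

APPROVERS = {"tech-lead", "architect", "test-lead"}
GATED_ACTIONS = {"baseline_creation", "baseline_updates", "baseline_deletion"}


def check_approval(action: str, approved_by: Optional[str]) -> None:
    """Raise PermissionError if a gated action lacks approval from an authorized role."""
    if action not in GATED_ACTIONS:
        return  # Action is not gated; nothing to check.
    if approved_by is None:
        raise PermissionError(f"{action} requires approval")
    if approved_by not in APPROVERS:
        raise PermissionError(f"{approved_by} is not an authorized approver")


check_approval("baseline_updates", "tech-lead")  # passes silently
```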
## Output Locations
- Baselines: `.aiwg/testing/baselines/{type}/`
- Manifest: `.aiwg/testing/baselines/manifest.yaml`
- Comparisons: `.aiwg/testing/baseline-comparisons/`
- Archived: `.aiwg/testing/baselines/archive/`
## References
- @$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/schemas/test/baseline-schema.yaml
- @$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/commands/baseline-create.md
- @$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/agents/test-engineer.md
Related Skills
regression-visual
Detect visual and UI regressions through screenshot comparison and pixel-diff analysis across browsers and viewports
regression-report
Generate comprehensive regression analysis reports combining bisect, baseline, and metrics data with actionable recommendations
regression-performance
Detect performance regressions by comparing benchmarks across versions with latency, throughput, and statistical significance analysis
regression-metrics
Track and analyze regression statistics, trends, hotspots, and health indicators across test suites
regression-learning
Improve regression detection over time through cross-task pattern recognition, test prioritization, and historical analysis
regression-cicd-hooks
Integrate regression testing into CI/CD pipelines with baseline comparison and merge blocking on failure
regression-check
Compare current behavior against baseline to detect regressions
regression-bisect
Identify the commit that introduced a regression using git bisect with automated test execution and blame context
aiwg-orchestrate
Route structured artifact work to AIWG workflows via MCP with zero parent context cost
venv-manager
Create, manage, and validate Python virtual environments. Use for project isolation and dependency management.
pytest-runner
Execute Python tests with pytest, supporting fixtures, markers, coverage, and parallel execution. Use for Python test automation.
vitest-runner
Execute JavaScript/TypeScript tests with Vitest, supporting coverage, watch mode, and parallel execution. Use for JS/TS test automation.