Best use case
test is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Comprehensive testing workflow - unit tests ∥ integration tests → E2E tests
Teams using test should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/test/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
How test Compares
| Feature / Agent | test | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Comprehensive testing workflow - unit tests ∥ integration tests → E2E tests
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# /test - Testing Workflow
Run comprehensive test suite with parallel execution.
## When to Use
- "Run all tests"
- "Test the feature"
- "Verify everything works"
- "Full test suite"
- Before releases or merges
- After major changes
## Workflow Overview
```
┌─────────────┐ ┌───────────┐
│ diagnostics │ ──▶ │ arbiter │ ─┐
│ (type check)│ │ (unit) │ │
└─────────────┘ └───────────┘ │
├──▶ ┌─────────┐
┌───────────┐ │ │ atlas │
│ arbiter │ ─┘ │ (e2e) │
│ (integ) │ └─────────┘
└───────────┘
Pre-flight Parallel Sequential
(~1 second) fast tests slow tests
```
## Agent Sequence
| # | Agent | Role | Execution |
|---|-------|------|-----------|
| 1 | **arbiter** | Unit tests, type checks, linting | Parallel |
| 1 | **arbiter** | Integration tests | Parallel |
| 2 | **atlas** | E2E/acceptance tests | After 1 passes |
## Why This Order?
1. **Fast feedback**: Unit tests fail fast
2. **Parallel efficiency**: No dependency between unit and integration
3. **E2E gating**: Only run slow E2E tests if faster tests pass
## Execution
### Phase 0: Pre-flight Diagnostics (NEW)
Before running tests, check for type errors - they often cause test failures:
```bash
tldr diagnostics . --project --format text 2>/dev/null | grep "^E " | head -10
```
**Why diagnostics first?**
- Type check is instant (~1s), tests take longer
- Diagnostics show ROOT CAUSE, tests show symptoms
- "Expected int, got str" is clearer than "AttributeError at line 50"
- Catches errors in untested code paths
**If errors found:** Fix them BEFORE running tests. Type errors usually mean tests will fail anyway.
**If clean:** Proceed to Phase 1.
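The Phase 0 gate can be sketched in plain shell. This is a sketch, not part of the skill itself: it assumes the diagnostics text is piped in on stdin (so the gating logic works even without `tldr` installed) and that error lines start with `E `, as in the grep above.

```shell
# Gate on pre-flight diagnostics: read diagnostics text on stdin and
# refuse to proceed if any error lines ("E ...") are present.
gate_on_type_errors() {
  errors=$(grep -c '^E ' || true)
  if [ "$errors" -gt 0 ]; then
    echo "Found $errors type error(s). Fix them before running tests."
    return 1
  fi
  echo "Diagnostics clean. Proceeding to Phase 1."
}

# Hypothetical wiring into the real command:
# tldr diagnostics . --project --format text 2>/dev/null | gate_on_type_errors
```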
### Phase 0.5: Change Impact (Optional)
For large test suites, find only affected tests:
```bash
tldr change-impact --session
# or for explicit files:
tldr change-impact src/changed_file.py
```
This returns which tests to run based on what changed. Skip this for small projects or when you want full coverage.
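Assuming `change-impact` prints one affected test path per line (an assumption about its output format, not something documented here), the subset could be handed straight to a runner with `xargs`. The helper below uses `echo` so the mapping is visible without running anything:

```shell
# build_cmd turns an affected-test list (stdin, one path per line) into a
# runner invocation. GNU xargs -r produces nothing when the list is empty.
build_cmd() { xargs -r echo pytest; }

# Hypothetical wiring: tldr change-impact --session | xargs -r pytest
printf 'tests/test_pay.py\ntests/test_refund.py\n' | build_cmd
# prints: pytest tests/test_pay.py tests/test_refund.py
```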
### Phase 1: Parallel Tests
```
# Run both in parallel
Task(
subagent_type="arbiter",
prompt="""
Run unit tests for: [SCOPE]
Include:
- Unit tests
- Type checking
- Linting
Report: Pass/fail count, failures detail
""",
run_in_background=true
)
Task(
subagent_type="arbiter",
prompt="""
Run integration tests for: [SCOPE]
Include:
- Integration tests
- API tests
- Database tests
Report: Pass/fail count, failures detail
""",
run_in_background=true
)
# Wait for both
[Check TaskOutput for both]
```
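Outside an agent runtime, the same fan-out can be approximated with background jobs and `wait`. The helper below takes the two suite commands as strings; the pytest paths in the usage line are assumptions for illustration:

```shell
# Run two commands concurrently; succeed only if both succeed.
run_parallel() {
  sh -c "$1" & pid1=$!
  sh -c "$2" & pid2=$!
  rc1=0; wait "$pid1" || rc1=$?
  rc2=0; wait "$pid2" || rc2=$?
  [ "$rc1" -eq 0 ] && [ "$rc2" -eq 0 ]
}

# Hypothetical usage mirroring Phase 1:
# run_parallel "pytest tests/unit" "pytest tests/integration" || exit 1
```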
### Phase 2: E2E Tests (If Phase 1 Passes)
```
Task(
subagent_type="atlas",
prompt="""
Run E2E tests for: [SCOPE]
Include:
- End-to-end flows
- Acceptance tests
- UI tests if applicable
Report: Pass/fail count, screenshots on failure
"""
)
```
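The Phase 1 → Phase 2 gate is just short-circuit evaluation. A minimal shell sketch, with suite commands passed as strings since the real commands depend on the project:

```shell
# Run the slow E2E suite only if the fast suites exited 0.
gate_e2e() {
  sh -c "$1" && sh -c "$2"
}

# e.g. (paths are assumptions):
# gate_e2e "pytest tests/unit tests/integration" "pytest tests/e2e"
```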
## Test Scopes
### Full Suite
```
User: /test
→ All unit + integration + E2E tests
```
### Feature Scope
```
User: /test authentication
→ Only auth-related tests
```
### Quick Check
```
User: /test --quick
→ Only unit tests (skip integration and E2E)
```
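One way to translate these scopes into concrete runner invocations is a small dispatcher; the pytest flags and paths here are assumptions for illustration, not part of the skill:

```shell
# Map a /test scope argument onto a concrete test command.
scope_to_cmd() {
  case "$1" in
    "")      echo "pytest" ;;              # full suite
    --quick) echo "pytest tests/unit" ;;   # unit tests only
    *)       echo "pytest -k $1" ;;        # feature scope via keyword filter
  esac
}
```

For example, `scope_to_cmd authentication` prints `pytest -k authentication`.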
## Example
```
User: /test the new payment feature
Claude: Starting /test workflow for payment feature...
Phase 0: Pre-flight diagnostics...
$ tldr diagnostics . --project --format text | grep "^E "
(no type errors found)
Phase 1: Running parallel tests...
┌─────────────────────────────────────────┐
│ arbiter: Running unit tests... │
│ arbiter: Running integration tests... │
└─────────────────────────────────────────┘
arbiter: ✅ 45/45 unit tests passing
arbiter: ✅ 12/12 integration tests passing
Phase 2: Running E2E tests...
atlas: ✅ 8/8 E2E tests passing
Test Summary:
┌─────────────┬─────────┬────────┐
│ Type │ Passed │ Failed │
├─────────────┼─────────┼────────┤
│ Unit │ 45 │ 0 │
│ Integration │ 12 │ 0 │
│ E2E │ 8 │ 0 │
├─────────────┼─────────┼────────┤
│ TOTAL │ 65 │ 0 │
└─────────────┴─────────┴────────┘
All tests passing! ✅
```
### Example with Type Errors
```
User: /test
Claude: Starting /test workflow...
Phase 0: Pre-flight diagnostics...
$ tldr diagnostics . --project --format text | grep "^E "
E src/payment.py:45:12: Argument of type 'str' not assignable to 'int'
E src/refund.py:23:8: Return type 'None' not assignable to 'float'
Found 2 type errors. Fixing before running tests...
[Claude fixes the type errors]
Re-running diagnostics... clean.
Phase 1: Running parallel tests...
```
## Failure Handling
If Phase 1 fails:
```
arbiter: ❌ 43/45 tests passing
2 failures:
- test_payment_validation: expected 'invalid' got 'valid'
- test_refund_calculation: off by $0.01
Stopping workflow. Fix failures before running E2E tests.
```
## Flags
- `--quick`: Unit tests only
- `--no-e2e`: Skip E2E tests
- `--coverage`: Include coverage report
- `--watch`: Re-run on file changes

Related Skills
- **test-strategy**: Test pyramid decision matrix, coverage targets, when to write which test type, mock vs real dependency decisions, and test ROI analysis.
- **python-testing**: Python testing strategies using pytest, TDD methodology, fixtures, mocking, parametrization, and coverage requirements.
- **property-based-testing**: Property-based testing (PBT) patterns with fast-check (JS/TS), Hypothesis (Python), and gopter (Go). Generate random inputs, define invariants, shrink failures to minimal cases. Adapted from Trail of Bits. Use when testing pure functions, parsers, serializers, state machines, or any code where example-based tests miss edge cases.
- **performance-testing**: Load testing with k6/Artillery, response time thresholds, memory leak detection, N+1 query detection, and CI integration.
- **load-testing-patterns**: k6 script templates, load profiles, response time thresholds, SLO validation, and performance testing strategies.
- **golang-testing**: Go testing patterns including table-driven tests, subtests, benchmarks, fuzzing, and test coverage. Follows TDD methodology with idiomatic Go practices.
- **contract-testing-patterns**: Pact consumer-driven contracts, provider verification, and schema evolution.
- **accessibility-testing**: axe-core integration, WCAG 2.2 AA checklist, keyboard navigation testing, screen reader testing, and ARIA pattern validation.
- **workflow-router**: Goal-based workflow orchestration that routes tasks to specialist agents based on user goals.
- **wiring**: Wiring verification.
- **websocket-patterns**: Connection management, room patterns, reconnection strategies, message buffering, and binary protocol design.
- **visual-verdict**: Screenshot comparison QA for frontend development. Takes a screenshot of the current implementation, scores it across multiple visual dimensions, and returns a structured PASS/REVISE/FAIL verdict with concrete fixes. Use when implementing UI from a design reference or verifying visual correctness.