orchestrating-test-execution

Coordinate parallel test execution across multiple environments and frameworks. Use when performing specialized testing. Trigger with phrases like "orchestrate tests", "run parallel tests", or "coordinate test execution".

1,868 stars

by jeremylongshore

Best use case

orchestrating-test-execution is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using orchestrating-test-execution should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/orchestrating-test-execution/SKILL.md --create-dirs "https://raw.githubusercontent.com/jeremylongshore/claude-code-plugins-plus-skills/main/plugins/testing/test-orchestrator/skills/orchestrating-test-execution/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/orchestrating-test-execution/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How orchestrating-test-execution Compares

Feature / Agent	orchestrating-test-execution	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

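What does this skill do?

It coordinates parallel test execution across multiple test suites, frameworks, and environments, covering test splitting, worker allocation, result aggregation, and retries for flaky tests.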

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

AI Agents for Coding

Browse AI agent skills for coding, debugging, testing, refactoring, code review, and developer workflows across Claude, Cursor, and Codex.

Best AI Skills for Claude

Explore the best AI skills for Claude and Claude Code across coding, research, workflow automation, documentation, and agent operations.

ChatGPT vs Claude for Agent Skills

Compare ChatGPT and Claude for AI agent skills across coding, writing, research, and reusable workflow execution.

SKILL.md Source

# Test Orchestrator

## Overview

Coordinate parallel test execution across multiple test suites, frameworks, and environments. Manages test splitting, worker allocation, result aggregation, and intelligent retry strategies.

## Prerequisites

- Test runner with parallel execution support (Jest, Vitest, pytest-xdist, Playwright, or JUnit 5)
- CI/CD platform configured (GitHub Actions, GitLab CI, CircleCI, or Jenkins)
- Test suite with consistent pass rates (flaky tests identified and tagged)
- Sufficient CI runner resources for parallel worker count
- Test result reporting tool (JUnit XML, Allure, or equivalent)

## Instructions

1. Analyze the existing test suite using Grep and Glob to catalog all test files, their framework, approximate run time, and dependency requirements.
2. Classify tests into execution tiers:
   - **Tier 1 (Fast)**: Unit tests with no I/O -- target under 30 seconds total.
   - **Tier 2 (Medium)**: Integration tests requiring local services -- target under 3 minutes.
   - **Tier 3 (Slow)**: E2E and browser tests -- target under 10 minutes.
3. Configure parallel execution for each tier:
   - Split unit tests across N workers using `jest --shard=i/N` or `pytest -n auto`.
   - Shard E2E tests by test file using Playwright `--shard=i/N` or Cypress parallelization.
   - Assign heavier integration tests to dedicated workers with more resources.
4. Create a CI pipeline configuration that runs tiers in parallel:
   - Tier 1 and Tier 2 run concurrently on separate jobs.
   - Tier 3 runs after a fast pre-check gate passes.
   - Each tier reports results to a unified collection step.
5. Implement intelligent retry logic for flaky tests:
   - Tag known flaky tests with `@flaky` or equivalent marker.
   - Retry failed tests up to 2 times before marking as failed.
   - Track flaky test frequency in a log file for triage.
6. Aggregate results from all parallel workers into a single report:
   - Merge JUnit XML files from each shard.
   - Calculate total pass/fail/skip counts and execution time.
   - Identify the slowest tests for optimization targets.
7. Write the orchestration configuration to the project's CI config file and validate it with a dry run.
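
The retry policy in step 5 can be sketched in a few lines of Python. This is an illustration only: in practice the runner's own retry feature (for example `jest.retryTimes`, Playwright `--retries`, or the pytest-rerunfailures `--reruns` option) would normally do this work. The names `run_with_retries`, `write_flaky_log`, and `MAX_RETRIES` are hypothetical, and a test is modeled as a callable that returns `True` on success.

```python
import json
from collections import Counter
from typing import Callable

MAX_RETRIES = 2  # step 5: retry up to 2 times before marking as failed


def run_with_retries(name: str, test: Callable[[], bool], flaky_counts: Counter) -> bool:
    """Run `test` once, then retry up to MAX_RETRIES times while it keeps failing.

    A test that fails at least once but eventually passes is counted as flaky.
    """
    for attempt in range(1 + MAX_RETRIES):
        if test():
            if attempt > 0:
                flaky_counts[name] += 1  # passed only after at least one retry
            return True
    return False  # failed the first run and every retry


def write_flaky_log(flaky_counts: Counter, path: str) -> None:
    """Persist flaky counts as JSON so they can be triaged later."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(dict(flaky_counts), fh, indent=2, sort_keys=True)


if __name__ == "__main__":
    outcomes = iter([False, False, True])  # simulated test: fails twice, then passes
    counts: Counter = Counter()
    print(run_with_retries("test_checkout", lambda: next(outcomes), counts))
    print(dict(counts))
```

Counting a test as flaky only when it eventually passes keeps genuine failures out of the flaky inventory.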

## Output

- CI pipeline configuration file (`.github/workflows/test.yml`, `.gitlab-ci.yml`, or equivalent)
- Test sharding configuration with worker count and split strategy
- Merged test result report in JUnit XML or JSON format
- Execution timeline showing parallel job durations and bottlenecks
- Flaky test inventory with retry counts and failure patterns
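
As a rough sketch of what the execution timeline might contain, the snippet below takes per-shard durations for each tier and reports the tier's wall-clock time, the shard that sets it, and the worker time spent waiting. The function name `tier_timeline` and the input shape (tier name mapped to a list of shard durations in seconds) are assumptions made for illustration, not part of the skill.

```python
def tier_timeline(shard_seconds: dict[str, list[float]]) -> dict[str, dict]:
    """For each tier, report wall-clock time and how unevenly its shards are loaded."""
    timeline = {}
    for tier, seconds in shard_seconds.items():
        wall = max(seconds)  # a parallel tier finishes when its slowest shard finishes
        timeline[tier] = {
            "wall_clock": wall,
            "bottleneck_shard": seconds.index(wall) + 1,  # 1-based, like --shard=i/N
            "idle": sum(wall - s for s in seconds),  # worker-seconds spent waiting
        }
    return timeline


if __name__ == "__main__":
    # Hypothetical durations, in seconds, for two tiers.
    print(tier_timeline({"tier1": [20, 25, 22], "tier3": [300, 540]}))
```

A large `idle` value relative to `wall_clock` suggests the split strategy is unbalanced and the bottleneck shard should be broken up.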

## Error Handling

| Error | Cause | Solution |
|-------|-------|---------|
| Shard produces zero tests | Uneven test distribution, more shards than test files, or an incorrect shard index | Keep the shard count at or below the number of test files; check that shard indices run from 1 to N; use file-based splitting |
| Worker out of memory | Too many parallel processes on one runner | Reduce `--maxWorkers` or `-n` count; increase runner memory; use `--workerIdleMemoryLimit` |
| Test ordering dependency | Tests pass in isolation but fail in specific shard order | Run with a randomized order (for example Jest's `--randomize`) to surface the dependency; fix shared state leaks; enforce test independence |
| Result aggregation mismatch | Missing shard results due to job timeout | Set job-level timeouts higher than test timeouts; add result upload as a separate step |
| CI cache miss slowing startup | Dependencies not cached between parallel jobs | Configure dependency caching per lockfile hash; use a shared setup job |
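
Empty or lopsided shards can be avoided by splitting on measured duration instead of file count. The sketch below uses a greedy longest-first assignment; it is not how Jest, Playwright, or pytest-xdist split tests internally. The function name `balance_shards`, the example file paths, and the duration map (file path to seconds from a previous run) are assumptions.

```python
import heapq


def balance_shards(durations: dict[str, float], shard_count: int) -> list[list[str]]:
    """Greedy longest-first split: each file goes to the currently lightest shard."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    # Never create more shards than files, so no shard ends up empty.
    shard_count = min(shard_count, len(durations)) or 1
    heap = [(0.0, index) for index in range(shard_count)]  # (total seconds, shard index)
    shards: list[list[str]] = [[] for _ in range(shard_count)]
    # Longest files first; ties broken by path so the result is deterministic.
    for path, seconds in sorted(durations.items(), key=lambda item: (-item[1], item[0])):
        total, index = heapq.heappop(heap)
        shards[index].append(path)
        heapq.heappush(heap, (total + seconds, index))
    return shards


if __name__ == "__main__":
    # Hypothetical test files with durations in seconds.
    timings = {"cart.test.js": 50, "auth.test.js": 30, "search.test.js": 20, "ui.test.js": 10}
    for number, files in enumerate(balance_shards(timings, 2), start=1):
        print(number, files)
```

With these timings both shards carry 60 seconds of tests, whereas an alphabetical split by file count would put 80 seconds on one shard and 30 on the other.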

## Examples

**GitHub Actions matrix strategy for Jest sharding:**
```yaml
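jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false            # let every shard finish so all results are collected
      matrix:
        shard: [1, 2, 3, 4]
    steps:
      - uses: actions/checkout@v4
      - run: npm ci               # assumes an npm project with jest and jest-junit installed
      - run: npx jest --shard=${{ matrix.shard }}/4 --ci --reporters=jest-junit
      - uses: actions/upload-artifact@v4
        if: always()              # upload results even when tests fail
        with:
          name: results-${{ matrix.shard }}
          path: junit.xml
  merge:
    needs: test
    if: always()                  # merge whatever results exist, even after failures
    runs-on: ubuntu-latest
    steps:
      # Without a name, each artifact is extracted into its own results-N/ directory.
      - uses: actions/download-artifact@v4
      - run: npx junit-merge -o merged-results.xml results-*/junit.xml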
```
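
The merge job above delegates to `junit-merge`. To show what aggregation involves (step 6 of the instructions), here is a minimal Python sketch that combines shard reports and computes totals and the slowest tests. It assumes flat reports in which `<testcase>` elements sit directly under `<testsuite>`; nested suites are not handled. The names `merge_junit` and `summarize` are hypothetical.

```python
import xml.etree.ElementTree as ET


def merge_junit(xml_documents: list[str]) -> ET.Element:
    """Collect the <testsuite> elements of several reports under one <testsuites> root."""
    merged = ET.Element("testsuites")
    for document in xml_documents:
        root = ET.fromstring(document)
        # A report's root is either <testsuites> or a single <testsuite>.
        suites = [root] if root.tag == "testsuite" else root.findall("testsuite")
        merged.extend(suites)
    return merged


def summarize(merged: ET.Element, slowest: int = 3) -> dict:
    """Count outcomes per <testcase> and list the slowest cases by reported time."""
    cases = merged.findall("./testsuite/testcase")
    failed = sum(
        1 for c in cases if c.find("failure") is not None or c.find("error") is not None
    )
    skipped = sum(1 for c in cases if c.find("skipped") is not None)
    by_time = sorted(cases, key=lambda c: float(c.get("time", 0)), reverse=True)
    return {
        "total": len(cases),
        "failed": failed,
        "skipped": skipped,
        "passed": len(cases) - failed - skipped,
        "time": sum(float(c.get("time", 0)) for c in cases),  # summed test time, not wall-clock
        "slowest": [c.get("name") for c in by_time[:slowest]],
    }
```

Note that the summed `time` is total test time across all shards; wall-clock time for the pipeline is set by the slowest shard.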

**pytest-xdist parallel execution:**
```bash
pytest -n auto --dist worksteal -q --junitxml=results.xml
```

**Playwright sharded execution:**
```bash
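# The junit reporter prints to stdout unless an output file is set.
PLAYWRIGHT_JUNIT_OUTPUT_NAME=results.xml npx playwright test --shard=1/3 --reporter=junit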
```

## Resources

- Jest sharding: https://jestjs.io/docs/cli#--shardshardindex-shardcount
- pytest-xdist: https://pytest-xdist.readthedocs.io/
- Playwright test sharding: https://playwright.dev/docs/test-sharding
- GitHub Actions matrix strategy: https://docs.github.com/en/actions/using-jobs/using-a-matrix-for-your-jobs
- JUnit XML merge tools: https://github.com/imsky/junit-merge

Related Skills

test-skill

1868

from jeremylongshore/claude-code-plugins-plus-skills

Test skill for E2E validation. Trigger with "run test skill" or "execute test". Use this skill when testing skill activation and tool permissions.

testing-visual-regression

1868

from jeremylongshore/claude-code-plugins-plus-skills

Detect visual changes in UI components using screenshot comparison. Use when detecting unintended UI changes or pixel differences. Trigger with phrases like "test visual changes", "compare screenshots", or "detect UI regressions".

generating-unit-tests

1868

from jeremylongshore/claude-code-plugins-plus-skills

Automatically generate comprehensive unit tests from source code covering happy paths, edge cases, and error conditions. Use when creating test coverage for functions, classes, or modules. Trigger with phrases like "generate unit tests", "create tests for", or "add test coverage".

generating-test-reports

1868

from jeremylongshore/claude-code-plugins-plus-skills

Generate comprehensive test reports with metrics, coverage, and visualizations. Use when performing specialized testing. Trigger with phrases like "generate test report", "create test documentation", or "show test metrics".

managing-test-environments

1868

from jeremylongshore/claude-code-plugins-plus-skills

Provision and manage isolated test environments with configuration and data. Use when performing specialized testing. Trigger with phrases like "manage test environment", "provision test env", or "setup test infrastructure".

generating-test-doubles

1868

from jeremylongshore/claude-code-plugins-plus-skills

Generate mocks, stubs, spies, and fakes for dependency isolation. Use when creating mocks, stubs, or test isolation fixtures. Trigger with phrases like "generate mocks", "create test doubles", or "setup stubs".

generating-test-data

1868

from jeremylongshore/claude-code-plugins-plus-skills

Generate realistic test data including edge cases and boundary conditions. Use when creating realistic fixtures or edge case test data. Trigger with phrases like "generate test data", "create fixtures", or "setup test database".

analyzing-test-coverage

1868

from jeremylongshore/claude-code-plugins-plus-skills

Analyze code coverage metrics and identify untested code paths. Use when analyzing untested code or coverage gaps. Trigger with phrases like "analyze coverage", "check test coverage", or "find untested code".

managing-snapshot-tests

1868

from jeremylongshore/claude-code-plugins-plus-skills

Create and validate component snapshots for UI regression testing. Use when performing specialized testing. Trigger with phrases like "update snapshots", "test UI snapshots", or "validate component snapshots".

running-smoke-tests

1868

from jeremylongshore/claude-code-plugins-plus-skills

Execute fast smoke tests validating critical functionality after deployment. Use when performing specialized testing. Trigger with phrases like "run smoke tests", "quick validation", or "test critical paths".

performing-security-testing

1868

from jeremylongshore/claude-code-plugins-plus-skills

Automate security vulnerability testing covering OWASP Top 10, SQL injection, XSS, CSRF, and authentication issues. Use when performing security assessments, penetration tests, or vulnerability scans. Trigger with phrases like "scan for vulnerabilities", "test security", or "run penetration test".

tracking-regression-tests

1868

from jeremylongshore/claude-code-plugins-plus-skills

Track and manage regression test suites across releases. Use when performing specialized testing. Trigger with phrases like "track regressions", "manage regression suite", or "validate against baseline".