Best use case
e2e-testing is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Playwright-based end-to-end testing workflow.
Teams using e2e-testing should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/e2e-testing/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
How e2e-testing Compares
| Feature / Agent | e2e-testing | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Playwright-based end-to-end testing workflow.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# E2E Testing Skill (Playwright)
Playwright-based E2E testing across four phases: Scaffold, Build, Run, Validate. Each phase produces a saved artifact and must pass its gate before the next phase begins.
## Instructions
### PHASE 1: SCAFFOLD
**Goal:** Verify Playwright is installed, create the directory structure, and generate `playwright.config.ts`.
**Actions:**
1. Check if `@playwright/test` is installed: `npx playwright --version`. If not, run `npm install -D @playwright/test` and `npx playwright install`.
2. Create directory structure:
```
tests/
  e2e/
    auth/
    features/
    api/
pages/            <- POM classes live here
artifacts/
  screenshots/
  traces/
  videos/
```
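3. Write `playwright.config.ts` using the template below. The config bakes in failure diagnostics by default: `screenshot: 'only-on-failure'`, `trace: 'on-first-retry'`, and `video: 'retain-on-failure'`, so failures produce actionable artifacts without manual setup. Note that `trace: 'on-first-retry'` records a trace only when a test is retried, so traces appear only where retries are enabled (CI, in this config). CI retries (`retries: process.env.CI ? 2 : 0`) absorb transient infrastructure flakiness without masking real bugs: a test that passes only on retry is reported as flaky, not as passed.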
4. Confirm `playwright.config.ts` is valid TypeScript: `npx tsc --noEmit`. Run this deterministic check before any subjective assessment of the config -- compiler errors are facts, opinions are not.
**Artifact:** `playwright.config.ts` + `tests/e2e/` directory structure.
**Gate:** `playwright.config.ts` exists AND `tests/e2e/` directory exists. If either is missing, do not proceed to Phase 2 -- diagnose and fix.
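The Phase 1 gate can also be checked mechanically rather than by eye. The following is a minimal sketch using Node's built-in `fs` and `path` modules; the function name `checkScaffoldGate` is hypothetical and is not part of Playwright or of this skill.

```typescript
import { existsSync, statSync } from "node:fs";
import { join } from "node:path";

// Hypothetical helper: returns the Phase 1 gate requirements that are NOT met.
// An empty array means the gate passes.
export function checkScaffoldGate(projectRoot: string): string[] {
  const missing: string[] = [];

  // Requirement 1: playwright.config.ts exists and is a regular file.
  const configPath = join(projectRoot, "playwright.config.ts");
  if (!existsSync(configPath) || !statSync(configPath).isFile()) {
    missing.push("playwright.config.ts");
  }

  // Requirement 2: tests/e2e exists and is a directory.
  const testDir = join(projectRoot, "tests", "e2e");
  if (!existsSync(testDir) || !statSync(testDir).isDirectory()) {
    missing.push("tests/e2e/");
  }

  return missing;
}
```

Calling `checkScaffoldGate(process.cwd())` from the project root and refusing to continue when the result is non-empty would mirror the gate described above.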
#### playwright.config.ts Template
```typescript
import { defineConfig, devices } from '@playwright/test';

export default defineConfig({
  testDir: './tests/e2e',
  fullyParallel: true,
  forbidOnly: !!process.env.CI,
  retries: process.env.CI ? 2 : 0,
  workers: process.env.CI ? 1 : undefined,
  reporter: [
    ['html', { outputFolder: 'playwright-report' }],
    ['json', { outputFile: 'playwright-results.json' }],
  ],
  use: {
    baseURL: process.env.BASE_URL || 'http://localhost:3000',
    trace: 'on-first-retry',
    screenshot: 'only-on-failure',
    video: 'retain-on-failure',
  },
  outputDir: 'artifacts/',
  projects: [
    { name: 'chromium', use: { ...devices['Desktop Chrome'] } },
    { name: 'firefox', use: { ...devices['Desktop Firefox'] } },
    { name: 'webkit', use: { ...devices['Desktop Safari'] } },
    {
      name: 'Mobile Chrome',
      use: { ...devices['Pixel 5'] },
    },
  ],
});
```
The multi-browser matrix (Chromium, Firefox, WebKit, plus a Mobile Chrome project) is the default because cross-browser bugs caught in CI are cheaper than cross-browser bugs caught in production. Remove browsers only when the project explicitly constrains the target set.
---
### PHASE 2: BUILD
**Goal:** Write POM classes for target feature areas, then write spec files that use those POMs.
Every page or feature area gets a typed Page Object class. Spec files never contain inline locators -- all selectors live in the POM. This separation means a selector change is a one-line POM edit, not a grep-and-replace across dozens of specs.
**Actions:**
1. Identify the feature areas under test (auth, checkout, dashboard, etc.).
2. For each area, create a POM class in `pages/` (see POM Pattern below). All locators must use `data-testid` attributes via `page.getByTestId()`. CSS selectors (`page.locator('.btn-primary')`) break silently when styles change. XPath breaks on DOM restructuring. Text matching (`page.locator('text=Submit')`) breaks on copy changes. `data-testid` is a testing contract that survives all three.
3. Write spec files in `tests/e2e/<area>/` using the POMs.
4. Run `npx tsc --noEmit` to verify all files compile.
5. Fix any TypeScript errors before proceeding.
**Artifact:** `tests/e2e/**/*.spec.ts` files + `pages/*.ts` POM classes, all compiling cleanly.
**Gate:** At least one `.spec.ts` exists under `tests/e2e/` AND `npx tsc --noEmit` exits 0. If compile fails, fix errors -- do not proceed to Phase 3 with broken TypeScript.
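The rule that spec files never contain inline locators can be spot-checked with a simple text scan. The sketch below is a heuristic only: the function name `findInlineLocators` and the pattern it uses are assumptions made for illustration, and a regex scan of this kind will miss locators built through aliases or helper variables.

```typescript
// Heuristic pattern for a selector written directly against `page` in a spec.
// Illustrative assumption: only `page.locator(...)` and `page.getBy*(...)` are checked.
const INLINE_LOCATOR_PATTERN = /\bpage\.(?:locator|getBy[A-Z]\w*)\s*\(/;

export interface LocatorViolation {
  line: number; // 1-based line number within the spec source
  text: string; // trimmed source line
}

// Hypothetical helper: lists lines in a spec file that appear to use inline locators.
export function findInlineLocators(specSource: string): LocatorViolation[] {
  const violations: LocatorViolation[] = [];
  specSource.split(/\r?\n/).forEach((text, index) => {
    if (INLINE_LOCATOR_PATTERN.test(text)) {
      violations.push({ line: index + 1, text: text.trim() });
    }
  });
  return violations;
}
```

Calls that go through a POM instance, such as `loginPage.login(...)`, and assertions on the page itself, such as `expect(page).toHaveURL(...)`, are not flagged by this pattern.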
#### POM Pattern
```typescript
// pages/LoginPage.ts
import { type Page, type Locator } from '@playwright/test';

export class LoginPage {
  readonly page: Page;
  readonly emailInput: Locator;
  readonly passwordInput: Locator;
  readonly submitButton: Locator;
  readonly errorMessage: Locator;

  constructor(page: Page) {
    this.page = page;
    this.emailInput = page.getByTestId('login-email');
    this.passwordInput = page.getByTestId('login-password');
    this.submitButton = page.getByTestId('login-submit');
    this.errorMessage = page.getByTestId('login-error');
  }

  async goto() {
    await this.page.goto('/login');
    await this.page.waitForLoadState('networkidle');
  }

  async login(email: string, password: string) {
    await this.emailInput.fill(email);
    await this.passwordInput.fill(password);
    await this.submitButton.click();
  }
}
```
```typescript
// tests/e2e/auth/login.spec.ts
import { test, expect } from '@playwright/test';
import { LoginPage } from '../../../pages/LoginPage';

test.describe('Login Flow', () => {
  let loginPage: LoginPage;

  test.beforeEach(async ({ page }) => {
    loginPage = new LoginPage(page);
    await loginPage.goto();
  });

  test('successful login redirects to dashboard', async ({ page }) => {
    await loginPage.login('user@example.com', 'password123');
    await expect(page).toHaveURL('/dashboard');
  });

  test('invalid credentials shows error message', async () => {
    await loginPage.login('bad@example.com', 'wrong');
    await expect(loginPage.errorMessage).toBeVisible();
    await expect(loginPage.errorMessage).toContainText('Invalid credentials');
  });
});
```
#### data-testid Convention
- **Format**: `<component>-<element>` -- e.g., `login-email`, `checkout-submit`, `nav-profile-link`
- **Scope**: Add `data-testid` to interactive elements and status regions the tests need to assert on
- **Stability**: `data-testid` attributes must not change with styling or refactoring -- they are a testing contract
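The `<component>-<element>` format lends itself to an automated check. The sketch below assumes lowercase kebab-case with at least two segments, which is inferred from the examples above rather than stated as a rule; `isValidTestId` is a hypothetical name.

```typescript
// Assumed convention: lowercase alphanumeric segments joined by single hyphens,
// with at least two segments (component plus element).
const TEST_ID_PATTERN = /^[a-z][a-z0-9]*(?:-[a-z0-9]+)+$/;

// Hypothetical helper: true when the id follows the assumed <component>-<element> shape.
export function isValidTestId(testId: string): boolean {
  return TEST_ID_PATTERN.test(testId);
}
```

Under these assumptions, `login-email` and `nav-profile-link` are accepted, while a bare `submit` (no component prefix) and `Login-Email` (uppercase) are rejected.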
#### Waiting and Timing
Never use `waitForTimeout()` or `setTimeout()` in tests. Arbitrary waits pass slowly on fast machines and fail on slow ones -- they encode a guess about timing instead of observing the actual condition. Use condition-based waiting instead:
| Instead of | Use |
|-----------|-----|
| `await page.waitForTimeout(2000)` | `await expect(locator).toBeVisible()` or `await page.waitForResponse(...)` |
| `await page.waitForTimeout(0)` to "flush" | `await page.waitForLoadState('networkidle')` |
| `page.click('button')` without waiting | `locator.click()` -- Playwright auto-waits for actionability |
Each test must own its own setup in `beforeEach`. Tests sharing state via global variables break parallel execution because Playwright runs specs concurrently by default.
---
### PHASE 3: RUN
**Goal:** Execute the test suite, capture the results JSON, and identify any failing or flaky tests.
**Actions:**
1. Ensure the application under test is running (or document the `BASE_URL` required).
2. Run the full suite with JSON reporter configured in `playwright.config.ts`:
```bash
npx playwright test
```
3. If any tests fail, run them in isolation with `--repeat-each=5` to distinguish flaky from consistently failing:
```bash
npx playwright test tests/e2e/auth/login.spec.ts --repeat-each=5
```
4. Quarantine confirmed flaky tests with `test.fixme()`. Never delete a failing test -- deleted tests leave silent coverage gaps. Quarantined tests are visible debt with tracking references:
```typescript
test.fixme('flaky: login redirects intermittently', async ({ page }) => {
  // TODO: #123 -- investigate race condition with auth cookie
  // ... original test body, unchanged ...
});
```
5. Do NOT use `test.skip()` to hide broken tests. `test.skip()` is for conditional environment guards (e.g., "skip on WebKit"), not for sweeping failures under the rug.
**Artifact:** `playwright-results.json` (presence is the gate -- pass rate is not).
**Gate:** `playwright-results.json` exists at the project root. The file must contain valid JSON. Pass rate does not block Phase 4 -- reporting on failures is Phase 4's job.
#### Flaky Test Quarantine Protocol
When a test fails intermittently:
1. **Reproduce**: `npx playwright test <file> --repeat-each=5` -- if it passes on some runs and fails on others, it is flaky. If it fails all 5 runs, it is consistently failing and needs a fix, not quarantine.
2. **Quarantine**: Replace `test(` with `test.fixme(` and add a comment with the symptom and a tracking reference.
3. **Do not delete**: Deleted tests leave coverage gaps. Quarantined tests are visible debt.
4. **Fix criteria**: Before removing `test.fixme`, the test must pass 10/10 with `--repeat-each=10`.
```typescript
// Before
test('checkout completes successfully', async ({ page }) => {
  // ... test body ...
});

// After quarantine
test.fixme('checkout completes successfully', async ({ page }) => {
  // FLAKY: intermittent race on payment confirmation response
  // TODO: #456 -- investigate network timing in checkout flow
  // ... test body, unchanged ...
});
```
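The decision logic behind the protocol can be written down explicitly. The sketch below assumes the pass/fail outcome of each repeated run has already been collected into an array of booleans; the names `classifyRepeatRuns` and `canUnquarantine` are hypothetical. A test that fails every run is treated as consistently failing rather than flaky, in line with the Phase 3 step that uses `--repeat-each` to tell the two apart.

```typescript
export type RepeatVerdict = "stable" | "flaky" | "consistently-failing";

// Hypothetical helper: classifies a test from the outcomes of its repeated runs.
// `passed[i]` is true when run i passed.
export function classifyRepeatRuns(passed: boolean[]): RepeatVerdict {
  if (passed.length === 0) {
    throw new Error("at least one run is required");
  }
  const passCount = passed.filter(Boolean).length;
  if (passCount === passed.length) return "stable";
  if (passCount === 0) return "consistently-failing";
  return "flaky"; // mixed outcomes
}

// Fix criteria from the protocol: the test must pass 10 out of 10 repeated runs.
export function canUnquarantine(passed: boolean[]): boolean {
  return passed.length >= 10 && passed.every(Boolean);
}
```

Only the `"flaky"` verdict leads to `test.fixme()`; a `"consistently-failing"` test points at a broken assertion, selector, or application bug instead.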
---
### PHASE 4: VALIDATE
**Goal:** Deterministic checks on test output, then structured report generation.
**Actions:**
1. **Deterministic checks first** -- run these before any LLM summary because compiler output and JSON parsing are facts, not opinions:
- `playwright-results.json` exists and parses as valid JSON.
- Extract counts: `python3 -c "import json,sys; d=json.load(open('playwright-results.json')); print(d.get('stats', d))"`
- Identify all `unexpected` (failed) and `flaky` result entries.
2. **LLM triage** (only after deterministic checks pass):
- For each failed test, identify whether it is: (a) a broken assertion, (b) a selector mismatch, (c) a timing/async issue, or (d) an application bug.
- Categorize flaky tests for quarantine vs. fix.
3. Write `e2e-report.md` using the report template below.
**Artifact:** `e2e-report.md`.
**Gate:** `e2e-report.md` exists. Skill is complete only when this file is written.
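The deterministic checks in step 1 can be done in TypeScript as well as with the Python one-liner. The sketch below assumes the report has a top-level `stats` object with numeric `expected`, `unexpected`, `flaky`, and `skipped` fields; verify that shape against the JSON your Playwright version actually produces. The name `summarizeResults` is hypothetical.

```typescript
export interface SuiteCounts {
  passed: number;
  failed: number;
  flaky: number;
  skipped: number;
  total: number;
}

// Hypothetical helper: parses the raw contents of playwright-results.json and
// extracts counts. Throws if the text is not valid JSON or has no `stats` object.
export function summarizeResults(rawJson: string): SuiteCounts {
  const report: unknown = JSON.parse(rawJson); // throws SyntaxError on invalid JSON
  if (typeof report !== "object" || report === null || !("stats" in report)) {
    throw new Error("report has no `stats` object");
  }
  // Assumed field names; missing fields default to 0.
  const stats = (report as { stats: Record<string, unknown> }).stats ?? {};
  const num = (key: string): number =>
    typeof stats[key] === "number" ? (stats[key] as number) : 0;

  const passed = num("expected");
  const failed = num("unexpected");
  const flaky = num("flaky");
  const skipped = num("skipped");
  return { passed, failed, flaky, skipped, total: passed + failed + flaky + skipped };
}
```

The resulting counts can be used to fill the Summary table in `e2e-report.md` before any LLM triage begins.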
#### e2e-report.md Template
```markdown
# E2E Test Report
**Date**: YYYY-MM-DD
**Playwright version**: X.X.X
**Base URL**: http://...
**Browsers tested**: Chromium, Firefox, WebKit
## Summary
| Status | Count |
|--------|-------|
| Passed | N |
| Failed | N |
| Flaky (quarantined) | N |
| Skipped | N |
| **Total** | N |
## Failed Tests
### <test name>
- **File**: `tests/e2e/.../file.spec.ts`
- **Error**: <assertion or timeout message>
- **Category**: broken-assertion | selector-mismatch | timing | app-bug
- **Action**: fix | quarantine | investigate
## Quarantined (test.fixme)
| Test | Issue | Tracking |
|------|-------|----------|
| <name> | <symptom> | <issue link or TODO> |
## Artifacts
| Type | Path |
|------|------|
| HTML Report | `playwright-report/index.html` |
| JSON Results | `playwright-results.json` |
| Screenshots | `artifacts/screenshots/` |
| Traces | `artifacts/traces/` |
| Videos | `artifacts/videos/` |
## Next Actions
- [ ] Fix broken assertions in: ...
- [ ] Investigate app bugs: ...
- [ ] Unquarantine after fix: ...
```
---
### CI/CD Integration
#### GitHub Actions Workflow Template
```yaml
name: E2E Tests
on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]
jobs:
  e2e:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
      - name: Install dependencies
        run: npm ci
      - name: Install Playwright browsers
        run: npx playwright install --with-deps
      - name: Start application
        run: npm run build && npm run start &
        env:
          NODE_ENV: test
      - name: Wait for application
        run: npx wait-on http://localhost:3000 --timeout 60000
      - name: Run E2E tests
        run: npx playwright test
        env:
          BASE_URL: http://localhost:3000
          CI: true
      - name: Upload test artifacts
        uses: actions/upload-artifact@v4
        if: always()
        with:
          name: playwright-artifacts
          path: |
            playwright-report/
            playwright-results.json
            artifacts/
          retention-days: 30
```
---
## Error Handling
| Symptom | Likely Cause | Fix |
|---------|-------------|-----|
| `npx tsc --noEmit` fails after Phase 1 | Bad config template or missing types | Check `@playwright/test` is in devDependencies, verify `tsconfig.json` includes the test directory |
| Tests pass locally, fail in CI | Missing browser deps or wrong `BASE_URL` | Use `npx playwright install --with-deps` in CI; verify `BASE_URL` env var matches the running app |
| `playwright-results.json` missing after run | Reporter not configured or test runner crashed | Verify `json` reporter is in `playwright.config.ts`; check for OOM or process kill signals |
| Locator timeout on element that exists | Element present but not actionable (hidden, disabled, covered) | Use `await expect(locator).toBeVisible()` before interaction; check for overlays or modals |
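| Typed text is appended to the existing value | `locator.pressSequentially()` (or the deprecated `type()`) sends keystrokes into a field that already has a value; `fill()` itself replaces the value | Use `locator.fill()`, or call `locator.clear()` before typing |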
| Flaky test passes 4/5 runs | Race condition, network timing, or animation interference | Quarantine with `test.fixme()`, reproduce with `--repeat-each=10`, check for missing `waitFor` conditions |
| Locators depending on `nth(0)` break randomly | DOM order is not stable | Add a `data-testid` to the specific element instead of relying on position |
---
## References
- [playwright-patterns.md](references/playwright-patterns.md) -- POM examples, condition-based waiting, multi-browser config, financial skip guards
- [wallet-testing.md](references/wallet-testing.md) -- Web3/MetaMask mock patterns with `addInitScript`
- [financial-flows.md](references/financial-flows.md) -- Production skip guards, blockchain confirmation waits
- [flakiness-triage.md](references/flakiness-triage.md) -- `--repeat-each`, `--retries`, quarantine decision tree
- [ADR-107](../../adr/ADR-107-e2e-testing.md) -- Decision record for this skill
- [Playwright docs](https://playwright.dev/docs/intro) -- Official API reference