playwright-testing

Plan, implement, and debug frontend tests: unit/integration/E2E/visual/a11y. Use for Playwright MCP browser automation, Vitest/Jest/RTL, flaky test triage, CI stabilization, and canvas/WebGL games (Phaser) needing deterministic input plus screenshot/state assertions. Trigger: "test", "E2E", "flaky", "visual regression", "Playwright", "game testing".

23 stars

Best use case

playwright-testing is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Plan, implement, and debug frontend tests: unit/integration/E2E/visual/a11y. Use for Playwright MCP browser automation, Vitest/Jest/RTL, flaky test triage, CI stabilization, and canvas/WebGL games (Phaser) needing deterministic input plus screenshot/state assertions. Trigger: "test", "E2E", "flaky", "visual regression", "Playwright", "game testing".

Teams using playwright-testing should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/playwright-testing/SKILL.md --create-dirs "https://raw.githubusercontent.com/chongdashu/threejs-tactics-game/main/.claude/skills/playwright-testing/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/playwright-testing/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How playwright-testing Compares

Feature / Agentplaywright-testingStandard Approach
Platform SupportNot specifiedLimited / Varies
Context Awareness High Baseline
Installation ComplexityUnknownN/A

Frequently Asked Questions

What does this skill do?

Plan, implement, and debug frontend tests: unit/integration/E2E/visual/a11y. Use for Playwright MCP browser automation, Vitest/Jest/RTL, flaky test triage, CI stabilization, and canvas/WebGL games (Phaser) needing deterministic input plus screenshot/state assertions. Trigger: "test", "E2E", "flaky", "visual regression", "Playwright", "game testing".

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

SKILL.md Source

# Frontend Testing

Unlock reliable confidence fast: enable safe refactors by choosing the right test layer, making the app observable, and eliminating nondeterminism so failures are actionable.

## Philosophy: Confidence Per Minute

Frontend tests fail for two reasons: the product is broken, or the test is lying. Your job is to maximize signal and minimize "test is lying".

**Before writing a test, ask**:
- What user risk am I covering (money, progression, auth, data loss, crashes)?
- What's the narrowest layer that catches this bug class (pure logic vs UI vs full browser)?
- What nondeterminism exists (time, RNG, async loading, network, animations, fonts, GPU)?
- What "ready" signal can I wait on besides `setTimeout`?
- What should a failure print/screenshot so it's diagnosable in CI?

**Core principles**:
1. **Test the contract, not the implementation**: assert stable user-meaningful outcomes and public seams.
2. **Prefer determinism over retries**: make time/RNG/network controllable; remove flake at the source.
3. **Observe like a debugger**: console errors, network failures, screenshots, and state dumps on failure.
4. **One critical flow first**: a reliable smoke test beats 50 flaky tests.

## Test Layer Decision Tree

Pick the cheapest layer that provides needed confidence:

| Layer | Speed | Use For |
|-------|-------|---------|
| **Unit** | Fastest | Pure functions, reducers, validators, math, pathfinding, deterministic simulation |
| **Component** | Medium | UI behavior with mocked IO (React Testing Library, Vue Testing Library) |
| **E2E** | Slowest | Critical user flows across routing, storage, real bundling/runtime |
| **Visual** | Specialized | Layout/pixel regressions; for canvas/WebGL, only after locking determinism |

## Quick Start: First Smoke Test

1. **Define 1 critical flow**: "page loads → user can start → one key action works"
2. **Add a test seam** to the app (see below)
3. **Choose runner**: Playwright MCP for E2E, unit tests for logic
4. **Fail loudly**: treat console errors and failed requests as test failures
5. **Stabilize**: seed RNG, freeze time, fix viewport, disable animations

## Concrete MCP Workflow: Testing a Game

Step-by-step sequence for testing a Phaser/canvas game:

```
1. mcp__playwright__browser_navigate
   → http://localhost:3000?test=1&seed=42

2. mcp__playwright__browser_evaluate
   → () => new Promise(r => { const c = () => window.__TEST__?.ready ? r(true) : setTimeout(c, 100); c(); })
   (Wait for game ready)

3. mcp__playwright__browser_console_messages
   → level: "error"
   (Fail if any errors)

4. mcp__playwright__browser_snapshot
   → Get UI state and refs

5. mcp__playwright__browser_click
   → element: "Start Button", ref: [from snapshot]

6. mcp__playwright__browser_evaluate
   → () => window.__TEST__.state()
   (Assert game state is correct)

7. mcp__playwright__browser_press_key
   → key: "ArrowRight" (or WASD for movement)

8. mcp__playwright__browser_evaluate
   → () => window.__TEST__.state().player.x
   (Verify movement happened)

9. mcp__playwright__browser_take_screenshot
   → filename: "gameplay-state.png"
   (Visual evidence after deterministic setup)
```

## Recommended Test Seams

Add to the app for testability (read-only, stable, minimal):

```javascript
window.__TEST__ = {
  ready: false,           // true after first interactive frame
  seed: null,             // current RNG seed
  sceneKey: null,         // current scene/route
  state: () => ({         // JSON-serializable snapshot
    scene: this.sceneKey,
    player: { x, y, hp },
    score: gameState.score,
    entities: entities.map(e => ({ id: e.id, type: e.type, x: e.x, y: e.y }))
  }),
  commands: {             // optional mutation commands
    reset: () => {},
    seed: (n) => {},
    skipIntro: () => {}
  }
};
```

**Rule**: Expose IDs + essential fields, not raw Phaser/engine objects.

## Anti-Patterns to Avoid

❌ **Testing the wrong layer**: E2E tests for pure logic
*Why tempting*: "Let's just test everything through the browser"
*Better*: Unit tests for logic; reserve E2E for integration contracts

❌ **Testing implementation details**: Asserting DOM structure/classnames
*Why tempting*: Easy to assert what you can see in DevTools
*Better*: Assert user-meaningful outputs (text, score, HP changes)

❌ **Sleep-driven tests**: `wait 2s then click`
*Why tempting*: Simple and "works on my machine"
*Better*: Wait on explicit readiness (DOM marker, `window.__TEST__.ready`)

❌ **Uncontrolled randomness**: RNG/time in assertions
*Why tempting*: "The game uses random, so the test should too"
*Better*: Seed RNG (`?seed=42`), freeze time, assert stable invariants

❌ **Pixel snapshots without determinism**: Canvas screenshots that flake
*Why tempting*: "I'll catch visual bugs automatically"
*Better*: Deterministic mode first; then screenshot at known stable frames

❌ **Retries as a strategy**: "Just bump retries to 3"
*Why tempting*: Quick fix that makes CI green
*Better*: Fix the flake source; retries hide real problems

## Debugging Failed Tests

When a test fails, gather evidence in this order:

1. **Console errors**: `mcp__playwright__browser_console_messages({ level: "error" })`
2. **Network failures**: `mcp__playwright__browser_network_requests()` → check for non-2xx
3. **Screenshot**: `mcp__playwright__browser_take_screenshot()` → visual state at failure
4. **App state**: `mcp__playwright__browser_evaluate({ function: "() => window.__TEST__.state()" })`
5. **Classify the flake** (see references/flake-reduction.md):
   - Readiness? → add explicit wait
   - Timing? → control animation/physics
   - Environment? → lock viewport/DPR
   - Data? → isolate test data

## Graduation Criteria: When Is Testing "Enough"?

Minimum viable test suite:
- [ ] **1 smoke test** that proves the app loads and primary action works
- [ ] **Test seam exists** (`window.__TEST__` with ready flag and state)
- [ ] **Deterministic mode** for canvas/games (`?test=1` enables seeding)
- [ ] **Console errors fail tests** (no silent failures)
- [ ] **CI runs tests** on every push

Level up when:
- Critical paths (auth, payment, save/load) have dedicated E2E
- Unit tests cover complex logic (pathfinding, damage calc, state machines)
- Visual regression on key screens (menu, HUD) with locked determinism

## Visual Regression with imgdiff.py

For pixel comparison of screenshots:

```bash
# Compare baseline to current
python scripts/imgdiff.py baseline.png current.png --out diff.png

# Allow small tolerance (anti-aliasing differences)
python scripts/imgdiff.py baseline.png current.png --max-rms 2.0
```

Exit codes: 0 = identical, 1 = different, 2 = error

## UI Slicing Regressions (Nine-Slice / Ribbons / Bars)

Canvas UI issues (panel seams, segmented ribbons, invisible HUD fills) are best caught with a dedicated UI harness instead of the full gameplay flow.

1. Build a simple `test.html`/scene that loads *only* the UI assets.
2. Render raw slices next to assembled panels (multi-size), and include ribbon/bars with both “raw crop + scale” and “stitched multi-slice” views.
3. Expose `window.__TEST__` with `.commands.showTest(n)` so Playwright can toggle each mode deterministically.
4. Capture targeted screenshots (panels, ribbons, bars) and diff them in CI.

See `references/phaser-canvas-testing.md` for the deterministic setup + screenshot workflow.

## Variation Guidance

Adapt approach based on context:
- **DOM app**: Standard Playwright selectors, wait for text/elements
- **Canvas game**: Test seams mandatory, wait via `window.__TEST__.ready`
- **Hybrid**: DOM for menus, test seams for gameplay
- **CI-only GPU**: May need software rendering flags or skip visual tests
- **UI slicing regressions**: For nine-slice/ribbon/bar artifacts, prefer a small UI harness scene/page with deterministic modes and targeted screenshots (`references/phaser-canvas-testing.md`).

## Bundled Resources

Read these when needed:
- `references/playwright-mcp-cheatsheet.md`: Detailed MCP tool patterns
- `references/phaser-canvas-testing.md`: Deterministic mode for Phaser games
- `references/flake-reduction.md`: Flake classification and fixes

## Remember

You can make almost any frontend (including canvas/WebGL games) testable by adding a tiny, stable seam for readiness + state. One reliable smoke test is the foundation. Aim for tests that are boring to maintain: deterministic, explicit about readiness, and rich in failure evidence. The goal is confidence, not coverage numbers.

Related Skills

threejs-builder

23
from chongdashu/threejs-tactics-game

Creates simple Three.js web apps with scene setup, lighting, geometries, materials animations, and responsive rendering. Use for: "Create a threejs scene/app/showcase" or when user wants 3D web content. Supports ES modules, modern Three.js r150+ APIs.

swift-protocol-di-testing

144923
from affaan-m/everything-claude-code

基于协议的依赖注入,用于可测试的Swift代码——使用聚焦协议和Swift Testing模拟文件系统、网络和外部API。

DevelopmentClaude

perl-testing

144923
from affaan-m/everything-claude-code

使用Test2::V0、Test::More、prove runner、模拟、Devel::Cover覆盖率和TDD方法的Perl测试模式。

DevelopmentClaude

ai-regression-testing

144923
from affaan-m/everything-claude-code

AI辅助开发的回归测试策略。沙盒模式API测试,无需依赖数据库,自动化的缺陷检查工作流程,以及捕捉AI盲点的模式,其中同一模型编写和审查代码。

Software TestingClaudeCursorCodex

rust-testing

144923
from affaan-m/everything-claude-code

Rust testing patterns including unit tests, integration tests, async testing, property-based testing, mocking, and coverage. Follows TDD methodology.

DevelopmentClaude

kotlin-testing

144923
from affaan-m/everything-claude-code

Kotest, MockK, coroutine testi, property-based testing ve Kover coverage ile Kotlin test kalıpları. İdiomatic Kotlin uygulamalarıyla TDD metodolojisini takip eder.

DevelopmentClaude

cpp-testing

144923
from affaan-m/everything-claude-code

C++ テストの作成/更新/修正、GoogleTest/CTest の設定、失敗またはフレーキーなテストの診断、カバレッジ/サニタイザーの追加時にのみ使用します。

DevelopmentClaude

python-testing

144923
from affaan-m/everything-claude-code

Python testing best practices using pytest including fixtures, parametrization, mocking, coverage analysis, async testing, and test organization. Use when writing or improving Python tests.

DevelopmentClaude

golang-testing

144923
from affaan-m/everything-claude-code

Go testing best practices including table-driven tests, test helpers, benchmarking, race detection, coverage analysis, and integration testing patterns. Use when writing or improving Go tests.

DevelopmentClaude

e2e-testing

144923
from affaan-m/everything-claude-code

Playwright E2E testing patterns, Page Object Model, configuration, CI/CD integration, artifact management, and flaky test strategies.

Software TestingClaude

k6-load-testing

31392
from sickn33/antigravity-awesome-skills

Comprehensive k6 load testing skill for API, browser, and scalability testing. Write realistic load scenarios, analyze results, and integrate with CI/CD.

Software TestingClaudeCursorGemini

javascript-testing-patterns

31392
from sickn33/antigravity-awesome-skills

Comprehensive guide for implementing robust testing strategies in JavaScript/TypeScript applications using modern testing frameworks and best practices.