visual-testing

Visual Regression Testing: tool comparison (Chromatic/Percy/Playwright screenshots/BackstopJS), pixel-diff vs AI-based comparison, baseline management, flakiness strategies (masks, tolerances, waitForLoadState), CI integration with GitHub Actions, and Storybook integration.

8 stars

Best use case

visual-testing is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Visual Regression Testing: tool comparison (Chromatic/Percy/Playwright screenshots/BackstopJS), pixel-diff vs AI-based comparison, baseline management, flakiness strategies (masks, tolerances, waitForLoadState), CI integration with GitHub Actions, and Storybook integration.

Teams using visual-testing should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/visual-testing/SKILL.md --create-dirs "https://raw.githubusercontent.com/marvinrichter/clarc/main/skills/visual-testing/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/visual-testing/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How visual-testing Compares

Feature / Agentvisual-testingStandard Approach
Platform SupportNot specifiedLimited / Varies
Context Awareness High Baseline
Installation ComplexityUnknownN/A

Frequently Asked Questions

What does this skill do?

Visual Regression Testing: tool comparison (Chromatic/Percy/Playwright screenshots/BackstopJS), pixel-diff vs AI-based comparison, baseline management, flakiness strategies (masks, tolerances, waitForLoadState), CI integration with GitHub Actions, and Storybook integration.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Visual Testing

Visual regression testing — detecting unintended UI changes automatically.

## When to Activate

- Setting up visual regression testing for a UI project
- Choosing between Chromatic, Percy, and Playwright screenshots
- Configuring baseline management (when to update, when to flag)
- Debugging flaky visual tests (antialiasing, font loading, animations)
- Integrating visual tests into CI/CD pipeline
- Preventing cross-platform baseline drift caused by macOS vs. Linux font rendering differences
- Masking dynamic content (timestamps, user avatars, random banners) to eliminate false positives

---

## Concept: Visual Regression Testing

Visual regression testing captures screenshots and compares them against a baseline — flagging any visual change for review.

```
Baseline (approved) ──────────────── Current screenshot
       │                                     │
       └──────── pixel diff ─────────────────┘
                      │
                 Difference > threshold?
                 YES → Fail test / flag for review
                 NO  → Pass
```

**Types of comparison:**

| Approach | Tool | Accuracy | Noise |
|----------|------|---------|-------|
| Pixel-exact | Playwright, BackstopJS | High | High (fonts, AA, rendering) |
| AI-based | Chromatic, Percy | Ignores irrelevant diffs | Low |
| Perceptual hash | Custom | Medium | Medium |

---

## Tool Comparison

| Tool | Type | Storybook | Multi-browser | Pricing |
|------|------|-----------|--------------|---------|
| **Chromatic** | Cloud (AI-based) | ✅ First-class | ✅ | Free up to 5k snapshots/mo |
| **Percy** (BrowserStack) | Cloud (AI-based) | ✅ | ✅ | Free up to 5k/mo |
| **Playwright** `toHaveScreenshot` | In-repo (pixel-diff) | ⚠️ Via test-runner | ✅ | Free (open source) |
| **BackstopJS** | Self-hosted (pixel-diff) | ⚠️ Manual setup | ✅ (Chromium) | Free |

**Decision guide:**
- Has Storybook + wants UI review workflow → **Chromatic**
- Has Storybook + wants multi-browser + cloud → **Percy**
- No Storybook + already uses Playwright → **Playwright `toHaveScreenshot`**
- On-prem / no cloud allowed → **BackstopJS**

---

## Chromatic (Storybook)

### Setup

```bash
npm install --save-dev chromatic

# Get project token from chromatic.com
npx chromatic --project-token=<your-token>
```

### GitHub Actions

```yaml
# .github/workflows/chromatic.yml
name: Chromatic

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  chromatic:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # Full history for baseline comparison

      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: npm

      - run: npm ci

      - name: Publish to Chromatic
        uses: chromaui/action@latest
        with:
          projectToken: ${{ secrets.CHROMATIC_PROJECT_TOKEN }}
          onlyChanged: true           # Only changed Stories
          exitZeroOnChanges: false    # Fail CI if visual changes found
          autoAcceptChanges: main     # Auto-accept changes on main branch
```

### PR Workflow

On PRs, Chromatic shows a visual diff UI:
- **Accept** — this change was intentional, update baseline
- **Deny** — this is a regression, must be fixed

```bash
# Update baselines locally (accept all current state)
npx chromatic --project-token=<token> --auto-accept-changes

# Build only changed stories (faster)
npx chromatic --project-token=<token> --only-changed
```

---

## Playwright `toHaveScreenshot`

### Setup

```bash
npm install --save-dev @playwright/test
npx playwright install
```

### Writing Visual Tests

```typescript
// tests/visual/homepage.spec.ts
import { test, expect } from '@playwright/test';

test.describe('Homepage', () => {
  test('renders correctly on desktop', async ({ page }) => {
    await page.goto('/');

    // Wait for everything to be stable
    await page.waitForLoadState('networkidle');
    await page.evaluate(() => document.fonts.ready);

    // Mask dynamic content (timestamps, avatars)
    await expect(page).toHaveScreenshot('homepage-desktop.png', {
      mask: [
        page.locator('[data-testid="last-updated"]'),
        page.locator('.user-avatar'),
      ],
      maxDiffPixelRatio: 0.01,  // Allow 1% pixel difference
    });
  });

  test('renders product card correctly', async ({ page }) => {
    await page.goto('/products');
    await page.waitForSelector('[data-testid="product-card"]');

    const card = page.locator('[data-testid="product-card"]').first();

    await expect(card).toHaveScreenshot('product-card.png', {
      animations: 'disabled',  // No animation artifacts
    });
  });

  test('mobile viewport', async ({ page }) => {
    await page.setViewportSize({ width: 375, height: 812 });
    await page.goto('/');
    await page.waitForLoadState('networkidle');

    await expect(page).toHaveScreenshot('homepage-mobile.png');
  });
});
```

### playwright.config.ts

```typescript
import { defineConfig, devices } from '@playwright/test';

export default defineConfig({
  testDir: './tests/visual',
  snapshotDir: './tests/visual/__snapshots__',
  updateSnapshots: 'none',  // Don't auto-update; use --update-snapshots flag

  use: {
    baseURL: 'http://localhost:3000',
    screenshot: 'only-on-failure',
  },

  // Run on multiple viewports
  projects: [
    {
      name: 'Desktop Chrome',
      use: { ...devices['Desktop Chrome'] },
    },
    {
      name: 'Mobile Safari',
      use: { ...devices['iPhone 13'] },
    },
  ],

  // Start dev server before tests
  webServer: {
    command: 'npm run dev',
    port: 3000,
    reuseExistingServer: !process.env.CI,
  },
});
```

### Update Baselines

```bash
# Create initial baselines or update intentionally
npx playwright test --update-snapshots

# Update only specific tests
npx playwright test homepage --update-snapshots

# Run comparison (CI — never update automatically)
npx playwright test
```

### CI Integration

```yaml
# .github/workflows/visual.yml
name: Visual Regression

on:
  push:
    branches: [main]
  pull_request:

jobs:
  visual:
    runs-on: ubuntu-latest  # Use SAME OS as baseline generation!
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-node@v4
        with: { node-version: 20, cache: npm }

      - run: npm ci
      - run: npx playwright install --with-deps chromium

      - name: Run visual tests
        run: npx playwright test tests/visual/

      - name: Upload diff artifacts on failure
        if: failure()
        uses: actions/upload-artifact@v4
        with:
          name: visual-diff
          path: tests/visual/__snapshots__/
          retention-days: 30
```

---

## BackstopJS (Open Source)

```bash
npm install --save-dev backstopjs

# Initialize
npx backstop init

# Generates backstop.json with default scenarios
```

```json
// backstop.json
{
  "id": "my-app",
  "viewports": [
    { "label": "desktop", "width": 1280, "height": 900 },
    { "label": "mobile", "width": 375, "height": 812 }
  ],
  "scenarios": [
    {
      "label": "Homepage",
      "url": "http://localhost:3000",
      "delay": 500,
      "hideSelectors": [".timestamp", ".avatar"],
      "misMatchThreshold": 0.1,
      "requireSameDimensions": true
    },
    {
      "label": "Product Card",
      "url": "http://localhost:3000/products",
      "selectors": ["[data-testid='product-card']"],
      "delay": 300,
      "misMatchThreshold": 0.5
    }
  ],
  "paths": {
    "bitmaps_reference": "backstop_data/bitmaps_reference",
    "bitmaps_test": "backstop_data/bitmaps_test",
    "html_report": "backstop_data/html_report"
  },
  "engine": "playwright",
  "report": ["browser", "CI"]
}
```

```bash
# Create baseline
npx backstop reference

# Run test
npx backstop test

# Approve new baseline (after intentional changes)
npx backstop approve
```

---

## Flakiness Prevention

Visual tests are the most flaky test type. Key strategies:

### 1. Wait for Stable State

```typescript
// WRONG: screenshot before fonts/network settle
await page.goto('/');
await expect(page).toHaveScreenshot();

// CORRECT: wait for everything
await page.goto('/');
await page.waitForLoadState('networkidle');
await page.evaluate(() => document.fonts.ready);
await page.waitForTimeout(200);  // Extra buffer for CSS animations
await expect(page).toHaveScreenshot({ animations: 'disabled' });
```

### 2. Mask Dynamic Content

```typescript
// Mask elements that change every render
await expect(page).toHaveScreenshot({
  mask: [
    page.locator('[data-testid="current-time"]'),
    page.locator('.random-promo-banner'),
    page.locator('[data-testid="session-id"]'),
  ],
});
```

### 3. Disable Animations

```typescript
// Option A: Playwright config
use: {
  launchOptions: {
    args: ['--force-prefers-reduced-motion'],  // Chromium
  },
}

// Option B: Inject CSS
await page.addStyleTag({
  content: `
    *, *::before, *::after {
      animation-duration: 0s !important;
      transition-duration: 0s !important;
    }
  `,
});

// Option C: toHaveScreenshot option
await expect(page).toHaveScreenshot({ animations: 'disabled' });
```

### 4. Platform Consistency

```bash
# Generate baselines on exactly the same OS as CI
# WRONG: generate on macOS, compare on Linux CI → font rendering differs

# CORRECT: Use Docker for local baseline generation
docker run --rm -v $(pwd):/work mcr.microsoft.com/playwright:v1.44.0-jammy bash -c \
  "cd /work && npm ci && npx playwright test --update-snapshots"
```

### 5. Tolerances

```typescript
// Small tolerance for anti-aliasing and sub-pixel rendering
await expect(page).toHaveScreenshot({
  maxDiffPixelRatio: 0.01,  // 1% of all pixels can differ
  // OR
  maxDiffPixels: 50,         // At most 50 pixels can differ
  // OR (Playwright 1.50+)
  threshold: 0.2,            // Per-pixel color difference threshold (0-1)
});
```

---

## Baseline Management Best Practices

```
DO commit baselines to git — they're your visual contract
DO update baselines intentionally (not automatically in CI)
DO run visual tests in the same Docker image as CI
DO organize by component/page, not randomly

DON'T auto-update baselines on every push
DON'T skip visual tests to "fix" CI
DON'T generate baselines on developer machines if CI uses different fonts/OS
```

```bash
# Baseline update workflow (team process):
# 1. Make UI change (intentional)
# 2. Update baselines locally
npx playwright test --update-snapshots

# 3. Review diff in PR — confirm change is intentional
# 4. Merge PR with updated snapshots committed
git add tests/visual/__snapshots__/
git commit -m "chore(visual): update baselines for new button style"
```

## Reference

- `storybook-patterns` — Storybook CSF3, play functions, Chromatic CI integration
- `e2e-testing` — Playwright E2E tests (functional, not visual)

Related Skills

visual-identity

8
from marvinrichter/clarc

Brand identity development: color palette construction (primary/secondary/semantic/neutral), logo concept brief writing, typeface pairings, brand voice definition, mood board direction, and Brand Guidelines document structure. Use when establishing or evolving a visual brand — not for implementing existing tokens.

typescript-testing

8
from marvinrichter/clarc

TypeScript testing patterns: Vitest for unit/integration, Playwright for E2E, MSW for API mocking, Testing Library for React components. Core TDD methodology for TypeScript/JavaScript projects.

swift-testing

8
from marvinrichter/clarc

Swift testing patterns: Swift Testing framework (Swift 6+), XCTest for UI tests, async/await test cases, actor testing, Combine testing, and XCUITest for UI automation. TDD for Swift/SwiftUI.

swift-protocol-di-testing

8
from marvinrichter/clarc

Protocol-based dependency injection for testable Swift code — mock file system, network, and external APIs using focused protocols and Swift Testing.

scala-testing

8
from marvinrichter/clarc

Scala testing with ScalaTest, MUnit, and ScalaCheck: FunSpec/FlatSpec test structure, property-based testing with forAll, mocking with MockitoSugar, Cats Effect testing with munit-cats-effect (runTest/IOSuite), ZIO Test, Testcontainers-Scala for database integration tests, and CI integration with sbt. Use when writing or reviewing Scala tests.

rust-testing

8
from marvinrichter/clarc

Rust testing patterns — unit tests with mockall, integration tests with sqlx transactions, HTTP handler testing (axum), benchmarks (criterion), property tests (proptest), fuzzing, and CI with cargo-nextest.

rust-testing-advanced

8
from marvinrichter/clarc

Advanced Rust testing anti-patterns and corrections — cfg(test) placement, expect() over unwrap(), mockall expectation ordering, executor mixing (#[tokio::test] vs block_on), PgPool isolation with

ruby-testing

8
from marvinrichter/clarc

RSpec testing patterns for Ruby and Rails — factories, mocks, request specs, feature specs, VCR, and SimpleCov coverage.

r-testing

8
from marvinrichter/clarc

R testing patterns: testthat 3e with expect_* assertions, snapshot testing, mocking with mockery and httptest2, covr code coverage, lintr static analysis, property-based testing with hedgehog, testing Shiny apps with shinytest2. Use when writing or reviewing R tests.

python-testing

8
from marvinrichter/clarc

Python testing strategies using pytest, TDD methodology, fixtures, mocking, and parametrization. Core testing fundamentals.

python-testing-advanced

8
from marvinrichter/clarc

Advanced Python testing — async testing with pytest-asyncio, exception/side-effect testing, test organization, common patterns (API, database, class methods), pytest configuration, and CLI reference. Extends python-testing.

php-testing

8
from marvinrichter/clarc

PHP testing patterns: PHPUnit 11 with mocks and data providers, Pest v3 with expectations and datasets, Laravel feature/HTTP tests with RefreshDatabase, Symfony WebTestCase, PHPStan static analysis, Infection mutation testing. Use when writing or reviewing PHP tests.