visual-review

Take screenshots of web pages and UI using Playwright for visual review and iteration

191 stars

bydsifry

View on GitHub Installation ↓

Best use case

visual-review is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Take screenshots of web pages and UI using Playwright for visual review and iteration

Teams using visual-review should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/visual-review/SKILL.md --create-dirs "https://raw.githubusercontent.com/dsifry/metaswarm/main/skills/visual-review/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/visual-review/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How visual-review Compares

Feature / Agent	visual-review	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Take screenshots of web pages and UI using Playwright for visual review and iteration

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Visual Review

## Purpose

Agents cannot see rendered web pages, presentations, or UI. This skill bridges that gap by using Playwright to capture screenshots of web pages (local files, localhost servers, or deployed URLs) so agents can visually inspect their work, identify layout issues, and iterate.

This is especially useful for:
- **Reveal.js presentations** — capture each slide individually
- **Web UIs** — verify layout, styling, and responsiveness
- **Landing pages** — check visual design matches intent
- **Email templates** — verify rendering

## Prerequisites

This skill requires **Playwright** to be available. Before first use, check and install if needed:

```bash
# Check if Playwright is available
npx playwright --version 2>/dev/null

# If not available, install it (one-time setup)
npx playwright install chromium
```

**Important:** Only the `chromium` browser is needed. Do not install all browsers — it wastes disk space and time.

If `npx playwright` fails entirely, the user may need to install it:
```bash
npm install -g playwright
npx playwright install chromium
```

## Workflow

### Phase 1: Setup

1. **Verify Playwright is available** — run `npx playwright --version`
2. **If not installed** — run `npx playwright install chromium` and inform the user
3. **Create output directory** — `mkdir -p /tmp/visual-review`
4. **Determine the target** — local file path, localhost URL, or deployed URL

### Phase 2: Capture Screenshots

#### For a single page:

```bash
npx playwright screenshot \
  --browser chromium \
  --viewport-size "1456,816" \
  --wait-for-timeout 3000 \
  "<URL_OR_FILE_PATH>" \
  /tmp/visual-review/page.png
```

**Parameters:**
- `--viewport-size "1456,816"` — 16:9 aspect ratio, good for presentations
- `--wait-for-timeout 3000` — wait 3 seconds for fonts, images, and animations to settle
- Use `file:///absolute/path/to/file.html` for local files
- Use full URLs for deployed sites

#### For Reveal.js presentations (multiple slides):

Capture each slide by appending the slide hash to the URL:

```bash
# Capture all slides (adjust count as needed)
for i in $(seq 0 15); do
  npx playwright screenshot \
    --browser chromium \
    --viewport-size "1456,816" \
    --wait-for-timeout 2000 \
    "<BASE_URL>#/$i" \
    "/tmp/visual-review/slide-$i.png"
done
```

**Tip:** To capture slides with all fragments visible, add `?fragments=true` or use JavaScript injection via a Playwright script. The simpler approach is to accept that fragment-heavy slides will appear in their initial (pre-click) state.

#### For responsive testing:

```bash
# Mobile
npx playwright screenshot --viewport-size "390,844" "<URL>" /tmp/visual-review/mobile.png

# Tablet
npx playwright screenshot --viewport-size "768,1024" "<URL>" /tmp/visual-review/tablet.png

# Desktop wide
npx playwright screenshot --viewport-size "1920,1080" "<URL>" /tmp/visual-review/desktop.png
```

### Phase 3: Show and Review Screenshots

Use the Read tool to analyze each screenshot:

```
Read /tmp/visual-review/slide-0.png
Read /tmp/visual-review/slide-1.png
...
```

**Showing screenshots to the user:** The agent can see screenshots via Read, but the user cannot. When the user wants to see what you see — or when you want to discuss a visual issue together — there are two approaches depending on the environment:

**Local machine (has a display):**

```bash
# Open a specific screenshot
open /tmp/visual-review/slide-2.png

# Open all screenshots at once
open /tmp/visual-review/slide-*.png
```

**Remote/headless machine (SSH, tmux, etc.):**

Serve the screenshots directory over HTTP so the user can view them in their local browser:

```bash
# Start a simple file server (runs in background)
# Use a high port (>10000) to avoid conflicts with dev tools
python3 -m http.server 18080 --directory /tmp/visual-review &

# User can now browse to:
#   http://<hostname>:18080/
#   http://<hostname>:18080/slide-2.png
#   etc.
```

Pick any open high-numbered port (18080, 19090, etc.) — low ports like 8080 are often taken by dev tools. This serves the entire `/tmp/visual-review/` directory with an auto-generated file listing. The user can click through all screenshots in their browser. Stop the server when done:

```bash
# Stop the background server
kill %1
# Or find and kill by port
lsof -ti:18080 | xargs kill
```

Use whichever approach fits the environment. This is useful when collaborating on visual issues, when the user asks what something looks like, or when you want confirmation on a subjective design choice.

For each screenshot, evaluate:
- **Layout** — Is content centered/aligned as intended?
- **Typography** — Are headings, body text, and code readable?
- **Colors** — Do accent colors, backgrounds, and contrasts look right?
- **Spacing** — Is there enough whitespace? Any cramped areas?
- **Content** — Is all expected content visible? (Note: fragment content may be hidden)
- **Responsiveness** — Does it work at the target viewport size?

### Phase 4: Document Findings

Create a findings summary:

```markdown
## Visual Review — [Project Name]

### Reviewed: [date]

| Slide/Page | Status | Notes |
|-----------|--------|-------|
| slide-0   | Good   | Title centered, accent color correct |
| slide-1   | Issue  | Text overflows right edge on mobile |
| ...       | ...    | ... |

### Issues to Fix
1. [Description of issue + which slide/page]
2. [...]

### Looks Good
- [Things that are working well]
```

### Phase 5: Fix and Re-capture

After making fixes:
1. Re-capture only the affected slides/pages
2. Review the new screenshots
3. Repeat until satisfied

## Reveal.js-Specific Tips

- **Slide numbering**: Reveal.js uses `#/0` for the first slide, `#/1` for the second, etc.
- **Vertical slides**: Use `#/horizontal/vertical` format (e.g., `#/2/1`)
- **Fragments**: Content with `class="fragment"` is hidden until clicked. Screenshots show the initial state. This is expected behavior, not a bug.
- **Speaker notes**: Not visible in screenshots (they appear in a separate window with `S` key)
- **Theme loading**: Use `--wait-for-timeout 3000` to ensure CDN-loaded themes and fonts are ready

## Common Mistakes

| Mistake | Why It's Wrong | Do This Instead |
|---------|---------------|-----------------|
| Not waiting for page load | Fonts/images may not render | Use `--wait-for-timeout 3000` |
| Using `http://` for local files | File URLs need `file:///` protocol | Use `file:///absolute/path` |
| Installing all Playwright browsers | Wastes time and disk | Only install `chromium` |
| Treating fragment-hidden content as a bug | Fragments are designed to reveal on click | Note it as expected |
| Forgetting to create output directory | Screenshots fail silently | Always `mkdir -p /tmp/visual-review` first |
| Taking too many screenshots at once | Floods context window | Review in batches of 4-6 |
| Not offering to `open` when discussing visuals | User can't see what you're describing | Use `open` when collaborating on visual issues |

## Integration Points

**Upstream:** Any skill that produces visual output (presentations, web UIs, email templates)

**Downstream:** Fixes feed back into the source files, then re-capture to verify

## Verification Checklist

- [ ] Playwright is installed and working
- [ ] Screenshots captured at correct viewport size
- [ ] All pages/slides reviewed
- [ ] Issues documented with specific descriptions
- [ ] Fixes verified with re-captured screenshots

Related Skills

plan-review-gate

191

from dsifry/metaswarm

Automatic adversarial review gate that spawns 3 independent reviewers in parallel after any plan is drafted - all must PASS before presenting to user

design-review-gate

191

from dsifry/metaswarm

Automatic review gate that runs after brainstorming completes - spawns PM, Architect, Designer, Security, and CTO agents in parallel, iterates until all approve

status

191

from dsifry/metaswarm

Diagnostic status report — shows metaswarm installation state, project setup, and potential issues

start

191

from dsifry/metaswarm

Use when starting work on any task, when the user mentions metaswarm, or when the user wants to begin tracked development work

setup

191

from dsifry/metaswarm

Interactive project setup — detects your project, configures metaswarm, writes project-local files

pr-shepherd

191

from dsifry/metaswarm

Monitor a PR through to merge — handle CI failures, review comments, and thread resolution automatically until all checks pass

orchestrated-execution

191

from dsifry/metaswarm

4-phase execution loop for work units - IMPLEMENT, VALIDATE, ADVERSARIAL REVIEW, COMMIT

migrate

191

from dsifry/metaswarm

Migrate from npm-installed metaswarm to the marketplace plugin — removes redundant files with safety checks

handling-pr-comments

191

from dsifry/metaswarm

Address PR review feedback systematically — fetch inline comments, review bodies, handle outside-diff-range comments, resolve threads with proper attribution

external-tools

191

from dsifry/metaswarm

Delegate implementation and review tasks to external AI CLI tools (Codex, Gemini) with cross-model adversarial review

create-issue

191

from dsifry/metaswarm

Create comprehensive GitHub issues with TDD plans, acceptance criteria, and agent instructions for autonomous PR lifecycle management

brainstorming-extension

191

from dsifry/metaswarm

Enforces design review gate after brainstorming — bridges superpowers:brainstorming into the metaswarm quality pipeline