browser-testing-with-screenshots
Use when testing web applications with visual verification - automates Chrome browser interactions, element selection, and screenshot capture for confirming UI functionality
Best use case
browser-testing-with-screenshots is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Use when testing web applications with visual verification - automates Chrome browser interactions, element selection, and screenshot capture for confirming UI functionality
Teams using browser-testing-with-screenshots should expect a more consistent output, faster repeated execution, less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in
.claude/skills/browser-testing-with-screenshots/SKILL.mdinside your project - Restart your AI agent — it will auto-discover the skill
How browser-testing-with-screenshots Compares
| Feature / Agent | browser-testing-with-screenshots | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Use when testing web applications with visual verification - automates Chrome browser interactions, element selection, and screenshot capture for confirming UI functionality
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Browser Testing with Screenshots
## Overview
**Automate Chrome browser testing with visual verification using browser-tools.** Connect to Chrome DevTools Protocol for navigation, interaction, and screenshot capture to confirm application functionality.
## Prerequisites
**REQUIRED:** Install agent-tools from https://github.com/badlogic/agent-tools
```bash
# Clone and install agent-tools
git clone https://github.com/badlogic/agent-tools.git
cd agent-tools
# Follow installation instructions in the repository
# Ensure all executables (browser-start.js, browser-nav.js, etc.) are in your PATH
```
**Verify installation:**
```bash
# Check that browser tools are available
which browser-start.js
which browser-nav.js
which browser-screenshot.js
```
All browser-* commands referenced in this skill come from the agent-tools repository and must be properly installed and accessible in your system PATH.
## When to Use
**Use this skill when:**
- Testing web application UI flows
- Verifying visual changes or layouts
- Automating repetitive browser interactions
- Documenting application behavior with screenshots
- Testing localhost applications during development
- Need to interact with elements that require human-like selection
**Don't use for:**
- API testing (use direct HTTP calls)
- Headless testing where visuals don't matter
- Simple page content validation (use curl/wget)
## Quick Reference
| Task | Command | Purpose |
|------|---------|---------|
| Start browser | `browser-start.js` | Launch Chrome with debugging |
| Navigate | `browser-nav.js http://localhost:5172/dashboard` | Go to specific URL |
| Take screenshot | `browser-screenshot.js` | Capture current viewport |
| Pick elements | `browser-pick.js "Select the login button"` | Interactive element selection |
| Run JavaScript | `browser-eval.js 'document.title'` | Execute code in page context |
| Extract content | `browser-content.js` | Get readable page content |
| View cookies | `browser-cookies.js` | List session cookies |
## Setup and Basic Workflow
### 1. Start Chrome with Remote Debugging
```bash
# Launch Chrome with debugging enabled (preserves user profile)
browser-start.js
# Or start fresh (no cookies, clean state)
browser-start.js --fresh
```
**Expected Result**: Chrome opens on port 9222 with DevTools Protocol enabled
### 2. Navigate to Application
```bash
# Go to your application starting point
browser-nav.js http://localhost:5172/dashboard
```
**Verify**: Browser navigates to dashboard page
### 3. Capture Baseline Screenshot
```bash
# Take initial screenshot to confirm page loaded
browser-screenshot.js
```
**Output**: Returns path to screenshot file (e.g., `screenshot_20231203_141532.png`)
## Testing Workflow with Screenshots
### Complete Test Scenario Example
```bash
#!/bin/bash
# Test login and dashboard functionality
echo "🚀 Starting browser test..."
# 1. Launch browser
browser-start.js --fresh
# 2. Navigate to login page
browser-nav.js http://localhost:5172/login
sleep 2
# 3. Take screenshot of login page
LOGIN_SHOT=$(browser-screenshot.js)
echo "📸 Login page: $LOGIN_SHOT"
# 4. Fill login form (interactive element picking)
browser-pick.js "Click the username field"
browser-eval.js 'document.activeElement.value = "testuser"'
browser-pick.js "Click the password field"
browser-eval.js 'document.activeElement.value = "password123"'
# 5. Screenshot filled form
FORM_SHOT=$(browser-screenshot.js)
echo "📸 Filled form: $FORM_SHOT"
# 6. Submit form
browser-pick.js "Click the login button"
sleep 3
# 7. Verify dashboard loaded
browser-nav.js http://localhost:5172/dashboard
DASHBOARD_SHOT=$(browser-screenshot.js)
echo "📸 Dashboard: $DASHBOARD_SHOT"
# 8. Verify specific dashboard elements
browser-pick.js "Select the navigation menu"
browser-eval.js 'console.log("Navigation found:", !!document.querySelector(".nav"))'
echo "✅ Test complete. Screenshots saved."
```
### Element Interaction Pattern
```bash
# Interactive element selection (best for dynamic content)
browser-pick.js "Select the submit button"
# User clicks element in browser → returns CSS selector
# Use returned selector for automation
SELECTOR=$(browser-pick.js "Select the submit button" | grep "selector:")
browser-eval.js "document.querySelector('$SELECTOR').click()"
# Take screenshot to verify action
browser-screenshot.js
```
## Advanced Usage
### JavaScript Evaluation for Complex Interactions
```bash
# Check if element exists before interaction
browser-eval.js 'document.querySelector("#login-form") !== null'
# Wait for dynamic content
browser-eval.js '
new Promise(resolve => {
const check = () => {
if (document.querySelector(".loaded")) resolve(true);
else setTimeout(check, 100);
};
check();
})
'
# Extract form data
browser-eval.js 'JSON.stringify(Object.fromEntries(new FormData(document.querySelector("form"))))'
```
### Screenshot with Timing
```bash
# Navigate and wait before screenshot
browser-nav.js http://localhost:5172/slow-page
sleep 5 # Wait for animations/loading
browser-screenshot.js
```
### Content Extraction for Verification
```bash
# Get page title
PAGE_TITLE=$(browser-eval.js 'document.title')
echo "Current page: $PAGE_TITLE"
# Extract readable content
browser-content.js > page_content.md
# Check for specific text
browser-eval.js 'document.body.textContent.includes("Welcome to Dashboard")'
```
## Common Mistakes
| Mistake | Problem | Solution |
|---------|---------|----------|
| No sleep after navigation | Screenshots of loading page | Add `sleep 2-5` after nav |
| Hardcoded selectors | Breaks when UI changes | Use `browser-pick.js` for selection |
| Missing Chrome setup | "Connection refused" errors | Run `browser-start.js` first |
| Wrong localhost port | Navigation fails | Verify application is running on correct port |
| Screenshot timing | Captures before content loads | Wait for page load or specific elements |
| Not preserving state | Login lost between commands | Use default profile, not `--fresh` |
## Error Handling
```bash
# Check if Chrome is running
if ! browser-eval.js 'true' 2>/dev/null; then
echo "❌ Chrome not connected. Running browser-start.js..."
browser-start.js
fi
# Verify navigation succeeded
if browser-eval.js 'location.href.includes("dashboard")'; then
echo "✅ Navigation successful"
else
echo "❌ Navigation failed"
exit 1
fi
```
## File Output Patterns
- **Screenshots**: `screenshot_YYYYMMDD_HHMMSS.png` in current directory
- **Content**: Markdown format via stdout from `browser-content.js`
- **Selectors**: CSS selectors from `browser-pick.js` interaction
- **JavaScript results**: JSON or string values from `browser-eval.js`
## Integration with Testing Frameworks
```bash
# Create test evidence directory
mkdir -p test-results/$(date +%Y%m%d_%H%M%S)
cd test-results/$(date +%Y%m%d_%H%M%S)
# Run tests with organized screenshots
browser-start.js
for page in login dashboard profile; do
browser-nav.js "http://localhost:5172/$page"
sleep 2
screenshot=$(browser-screenshot.js)
mv "$screenshot" "${page}_page.png"
echo "✅ $page page tested"
done
```
## Real-World Impact
**Benefits:**
- **Visual verification**: Screenshots provide immediate feedback on UI state
- **Interactive debugging**: Element picker works with dynamic/complex selectors
- **State preservation**: Maintains login sessions between commands
- **Evidence collection**: Automated screenshot capture for test documentation
- **Development workflow**: Quick verification of localhost changes
**Results:**
- Faster UI testing iteration (visual confirmation vs manual checking)
- Reliable element selection (human picks, automation uses)
- Test documentation with visual proof
- Catches visual regressions immediately
---
**Key principle:** Combine automated navigation with human element selection for robust, maintainable browser testing.Related Skills
agent-relay-orchestrator
Run headless multi-agent orchestration sessions via Agent Relay. Use when spawning teams of agents, creating channels for coordination, managing agent lifecycle, and running parallel workloads across Claude/Codex/Gemini/Pi/Droid agents.
adding-swarm-patterns
Use when adding new multi-agent coordination patterns to agent-relay - provides checklist for types, schema, templates, and docs updates
agent-relay
Use when you need Codex to coordinate multiple agents through Relaycast for peer-to-peer messaging, lead/worker handoffs, or shared status tracking across sub-agents and terminals.
openclaw-relay
Real-time messaging across OpenClaw instances (channels, DMs, threads, reactions, search).
prpm-json-best-practices
Best practices for structuring prpm.json package manifests with required fields, tags, organization, multi-package management, enhanced file format, eager/lazy activation, and conversion hints
implementing-command-palettes
Use when building Cmd+K command palettes in React - covers keyboard navigation with arrow keys, keeping selected items in view with scrollIntoView, filtering with shortcut matching, and preventing infinite re-renders from reference instability
github-oauth-nango-integration
Use when implementing GitHub OAuth + GitHub App authentication with Nango - provides two-connection pattern for user login and repo access with webhook handling
frontend-design
Create distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics.
deploying-to-staging-environment
Use when deploying changes to staging across relay, relay-dashboard, and relay-cloud repos - coordinates multi-repo branch syncing using git worktrees, automatically triggers staging deployments via GitHub Actions
debugging-websocket-issues
Use when seeing WebSocket errors like "Invalid frame header", "RSV1 must be clear", or "WS_ERR_UNEXPECTED_RSV_1" - covers multiple WebSocketServer conflicts, compression issues, and raw frame debugging techniques
creating-skills
Use when creating new Claude Code skills or improving existing ones - ensures skills are discoverable, scannable, and effective through proper structure, CSO optimization, and real examples
creating-claude-rules
Use when creating or fixing .claude/rules/ files - provides correct paths frontmatter (not globs), glob patterns, and avoids Cursor-specific fields like alwaysApply