Browser
Code-first Playwright automation via TypeScript scripts. USE WHEN writing reusable automation scripts, VERIFY phase (confirming a web change actually works), headless programmatic testing, or need token-efficient browser automation in code. NOT for quick one-off CLI tasks (use AgentBrowser), NOT for authenticated sites with saved logins (use ChromeMCP+WebExplore), NOT for documenting a UI into a spec (use WebExplore).
Best use case
Browser is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using Browser should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/browser/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
How Browser Compares
| Feature / Agent | Browser | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
SKILL.md Source
# Browser - Code-First Browser Automation
**Browser automation and web verification using code-first Playwright.**
---
## 🔌 File-Based MCP
This skill is a **file-based MCP** - a code-first API wrapper that replaces token-heavy MCP protocol calls.
**Why file-based?** Data is filtered in code before it is returned to the model context, which cuts token use by roughly 99%.
**Architecture:** See `$PAI_DIR/skills/CORE/SYSTEM/DOCUMENTATION/FileBasedMCPs.md`
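As a rough illustration of the filter-in-code idea, the sketch below reduces a list of captured network entries to a short summary before anything is handed back to the model. The `NetworkEntry` shape and the `summarizeFailures` helper are hypothetical names made up for this example; the real return type of `getNetworkLogs()` is defined in `index.ts` and may differ.

```typescript
// Hypothetical shape of one captured request; the skill's real log type may differ.
interface NetworkEntry {
  url: string
  method: string
  status: number
  sizeBytes: number
}

// Keep only failed requests and collapse each into one short line,
// so the model sees a brief summary instead of the full log.
function summarizeFailures(logs: NetworkEntry[], limit = 5): string {
  const failures = logs.filter((entry) => entry.status >= 400)
  const lines = failures
    .slice(0, limit)
    .map((entry) => `${entry.status} ${entry.method} ${entry.url}`)
  const header = `${failures.length} of ${logs.length} requests failed`
  return [header, ...lines].join('\n')
}

// Sample input standing in for a real capture.
const sampleLogs: NetworkEntry[] = [
  { url: 'https://example.com/', method: 'GET', status: 200, sizeBytes: 5120 },
  { url: 'https://example.com/api/user', method: 'GET', status: 500, sizeBytes: 312 },
  { url: 'https://example.com/missing.css', method: 'GET', status: 404, sizeBytes: 0 },
]

console.log(summarizeFailures(sampleLogs))
```

Only the three-line summary reaches the model context; the raw entries stay inside the script.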
---
## Quick Start
```typescript
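// Replace $PAI_DIR with the real path to your PAI directory; import specifiers do not expand environment variables.
import { PlaywrightBrowser } from '$PAI_DIR/skills/Browser/index.ts'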
const browser = new PlaywrightBrowser()
await browser.launch()
await browser.navigate('https://example.com')
await browser.screenshot({ path: 'screenshot.png' })
await browser.close()
```
**Why This Approach:**
- MCP loads ~13,700 tokens at startup
- Code-first loads ~50-200 tokens per operation
- Full Playwright API access, not limited to 21 MCP tools
---
## Voice Notification
**When executing a Browser workflow, do BOTH:**
1. **Send voice notification**:
```bash
curl -s -X POST http://localhost:8888/notify \
-H "Content-Type: application/json" \
-d '{"message": "Running the Browser workflow"}' \
> /dev/null 2>&1 &
```
2. **Output text notification**:
```
Running the **Browser** workflow...
```
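When the workflow is already running inside a TypeScript script, the same two notifications could be sent without shelling out to curl. This is a minimal sketch, assuming a global `fetch` (Bun or Node 18+); `buildNotifyRequest` and `notifyWorkflow` are hypothetical helper names, and only the endpoint and payload come from the curl command above.

```typescript
// Endpoint taken from the curl command above.
const NOTIFY_URL = 'http://localhost:8888/notify'

interface NotifyRequest {
  url: string
  init: { method: string; headers: Record<string, string>; body: string }
}

// Build the request separately so it can be inspected without a running server.
function buildNotifyRequest(workflow: string): NotifyRequest {
  return {
    url: NOTIFY_URL,
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ message: `Running the ${workflow} workflow` }),
    },
  }
}

// Send the voice notification without awaiting it, then print the text notification.
function notifyWorkflow(workflow: string): void {
  const { url, init } = buildNotifyRequest(workflow)
  fetch(url, init).catch(() => {
    // Notification server not reachable; ignore, as the curl version does.
  })
  console.log(`Running the **${workflow}** workflow...`)
}
```

Calling `notifyWorkflow('Browser')` mirrors the backgrounded curl call: the request is not awaited and a failure is swallowed, so an unreachable notification server does not block the workflow.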
---
## Workflow Routing
| Trigger | Workflow |
|---------|----------|
| Navigate to URL, take screenshot | `Workflows/Screenshot.md` |
| Verify page loads correctly | `Workflows/VerifyPage.md` |
| Fill forms, interact with page | `Workflows/Interact.md` |
| Extract page content | `Workflows/Extract.md` |
---
## API Reference
### Navigation
```typescript
await browser.launch(options?) // Start browser
await browser.navigate(url) // Go to URL
await browser.goBack() // History back
await browser.goForward() // History forward
await browser.reload() // Refresh
browser.getUrl() // Current URL
await browser.getTitle() // Page title
await browser.close() // Shut down browser
```
### Capture
```typescript
await browser.screenshot({ path, fullPage, selector })
await browser.getVisibleText(selector?)
await browser.getVisibleHtml({ removeScripts, minify })
await browser.savePdf(path, { format })
await browser.getAccessibilityTree()
```
### Network Monitoring
```typescript
browser.getNetworkLogs(options?) // Get all network requests/responses
browser.getNetworkStats() // Get summary statistics
browser.clearNetworkLogs() // Clear captured logs
```
### Dialog Handling
```typescript
browser.setDialogHandler(auto, response?) // Configure auto-handling
browser.getPendingDialog() // Get current dialog info
await browser.handleDialog(action, promptText?) // Handle dialog manually
```
### Tab Management
```typescript
browser.getTabs() // List all open tabs
await browser.newTab(url?) // Open new tab
await browser.switchTab(index) // Switch to tab by index
await browser.closeTab() // Close current tab
```
### Interaction
```typescript
await browser.click(selector)
await browser.hover(selector)
await browser.fill(selector, value)
await browser.type(selector, text, delay?)
await browser.select(selector, value)
await browser.pressKey(key, selector?)
await browser.drag(source, target)
await browser.uploadFile(selector, path)
```
### Waiting
```typescript
await browser.waitForSelector(selector, { state, timeout })
await browser.waitForText(text, { state, timeout })
await browser.waitForNavigation({ url, timeout })
await browser.waitForNetworkIdle(timeout?)
await browser.wait(ms)
await browser.waitForResponse(urlPattern)
```
### JavaScript
```typescript
await browser.evaluate(script)
browser.getConsoleLogs({ type, search, limit, clear })
await browser.setUserAgent(ua)
```
### Viewport
```typescript
await browser.resize(width, height)
await browser.setDevice('iPhone 14')
```
### iFrame
```typescript
await browser.iframeClick(iframeSelector, elementSelector)
await browser.iframeFill(iframeSelector, elementSelector, value)
```
---
## VERIFY Phase Integration
**The Browser skill is MANDATORY for the VERIFY phase of web changes.**
Before claiming ANY web change is "live" or "working":
1. Launch browser
2. Navigate to the EXACT URL
3. Verify the EXACT element that changed
4. Take screenshot as evidence
5. Close browser
```typescript
// VERIFY Phase Pattern
import { PlaywrightBrowser } from '$PAI_DIR/skills/Browser/index.ts'
const browser = new PlaywrightBrowser()
await browser.launch({ headless: true })
await browser.navigate('https://example.com/changed-page')
await browser.waitForSelector('.changed-element')
const text = await browser.getVisibleText('.changed-element')
await browser.screenshot({ path: '/tmp/verify.png' })
await browser.close()
console.log(`Verified: "${text}"`)
```
**If you haven't LOOKED at the rendered page, you CANNOT claim it works.**
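The five steps can also be wrapped in a reusable function that closes the browser even when a step fails. The sketch below is written against a small interface rather than the real class; the `VerifyBrowser` signatures are assumptions inferred from the API Reference, and `verifyChange` and `VerifyResult` are hypothetical names for this example.

```typescript
// Minimal subset of the methods used in the VERIFY pattern.
// Signatures are assumed from the API Reference; check index.ts for the real types.
interface VerifyBrowser {
  launch(options?: { headless?: boolean }): Promise<void>
  navigate(url: string): Promise<void>
  waitForSelector(selector: string): Promise<void>
  getVisibleText(selector?: string): Promise<string>
  screenshot(options: { path: string }): Promise<void>
  close(): Promise<void>
}

interface VerifyResult {
  ok: boolean
  text: string
  screenshotPath: string
}

// Runs the five VERIFY steps; the browser is closed even if a step after launch throws.
async function verifyChange(
  browser: VerifyBrowser,
  url: string,
  selector: string,
  expected: string,
  screenshotPath = '/tmp/verify.png',
): Promise<VerifyResult> {
  await browser.launch({ headless: true }) // 1. launch
  try {
    await browser.navigate(url) // 2. the exact URL
    await browser.waitForSelector(selector) // 3. the exact element
    const text = await browser.getVisibleText(selector)
    await browser.screenshot({ path: screenshotPath }) // 4. evidence
    return { ok: text.includes(expected), text, screenshotPath }
  } finally {
    await browser.close() // 5. close
  }
}
```

If `PlaywrightBrowser` structurally matches this interface, a call might look like `await verifyChange(new PlaywrightBrowser(), 'https://example.com/changed-page', '.changed-element', 'New heading')`. Returning `ok` together with the captured text keeps the claim ("it works") tied to what was actually rendered.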
---
## CLI Tool
**Location:** `Tools/Browse.ts`
```bash
# Open URL in visible browser
bun run $PAI_DIR/skills/Browser/Tools/Browse.ts open <url>
# Take screenshot
bun run $PAI_DIR/skills/Browser/Tools/Browse.ts screenshot <url> [path]
# Verify element exists
bun run $PAI_DIR/skills/Browser/Tools/Browse.ts verify <url> <selector>
```
**Examples:**
```bash
bun run $PAI_DIR/skills/Browser/Tools/Browse.ts open https://danielmiessler.com
bun run $PAI_DIR/skills/Browser/Tools/Browse.ts screenshot https://example.com /tmp/shot.png
bun run $PAI_DIR/skills/Browser/Tools/Browse.ts verify https://example.com "body"
```
---
## Examples
### Verify Page Loads
```bash
bun $PAI_DIR/skills/Browser/examples/verify-page.ts https://danielmiessler.com
```
### Take Screenshot
```bash
bun $PAI_DIR/skills/Browser/examples/screenshot.ts https://example.com screenshot.png
```
### Fill Form
```typescript
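// $PAI_DIR is a placeholder for your PAI install path
import { PlaywrightBrowser } from '$PAI_DIR/skills/Browser/index.ts'
const browser = new PlaywrightBrowser()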
await browser.launch()
await browser.navigate('https://example.com/form')
await browser.fill('#email', 'test@example.com')
await browser.fill('#password', 'secret')
await browser.click('button[type="submit"]')
await browser.waitForNavigation()
await browser.close()
```
---
## Alternative Implementations (Reference Only)
### Option A: Playwright MCP (Microsoft Official)
```bash
# npx @playwright/mcp@latest
# 25K GitHub stars, uses accessibility tree
# Pro: Official Microsoft support, well-maintained
# Con: 13,700 tokens at startup
```
### Option B: Chrome DevTools MCP (Google Official)
```bash
# npx chrome-devtools-mcp@latest
# Best debugging capabilities, CDP protocol
# Pro: Deep browser internals access
# Con: Chrome-only, complex setup
```
### Option C: claude --chrome (Native Anthropic)
```bash
# claude --chrome
# Simplest option - built into Claude Code
# Pro: Zero configuration, native integration
# Con: Limited API compared to Playwright
```
### Option D: Stagehand (Browserbase)
```bash
# npx stagehand
# 19.9K stars, won Anthropic hackathon
# Pro: AI-native actions (act, extract, observe)
# Con: Emerging, less mature than Playwright
```
---
## Token Savings Comparison
| Approach | Tokens | Notes |
|----------|--------|-------|
| Playwright MCP | ~13,700 | Loaded at startup, always |
| Code-first | ~50-200 | Only what you use |
| **Savings** | **~98.5-99.6%** | Per operation, relative to the MCP startup cost |
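
The savings figure follows from simple arithmetic on the two token counts in the table. The sketch below works it out and also estimates how many code-first operations fit into one MCP startup budget. Both helper names are hypothetical, and the break-even count deliberately ignores the per-call tokens that MCP tools also consume, so treat it as a rough lower bound.

```typescript
// Percentage of tokens saved when a per-operation cost replaces a fixed baseline.
function savingsPercent(baselineTokens: number, perOperationTokens: number): number {
  return (1 - perOperationTokens / baselineTokens) * 100
}

// Number of whole code-first operations that cost no more than the baseline.
function breakEvenOperations(baselineTokens: number, perOperationTokens: number): number {
  return Math.floor(baselineTokens / perOperationTokens)
}

const MCP_STARTUP_TOKENS = 13700 // from the table above

console.log(savingsPercent(MCP_STARTUP_TOKENS, 50).toFixed(1)) // "99.6"
console.log(savingsPercent(MCP_STARTUP_TOKENS, 200).toFixed(1)) // "98.5"
console.log(breakEvenOperations(MCP_STARTUP_TOKENS, 200)) // 68
console.log(breakEvenOperations(MCP_STARTUP_TOKENS, 50)) // 274
```

In other words, a script would need dozens of operations at the high end of the per-operation range before its total matched the MCP startup cost alone.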
---
## Full Documentation
**CLI Tool:** `Tools/Browse.ts`
**Implementation:** `README.md`
**API Reference:** `index.ts`
**Examples:** `examples/`