Browser

Code-first Playwright automation via TypeScript scripts. USE WHEN writing reusable automation scripts, running the VERIFY phase (confirming a web change actually works), doing headless programmatic testing, or needing token-efficient browser automation in code. NOT for quick one-off CLI tasks (use AgentBrowser), NOT for authenticated sites with saved logins (use ChromeMCP+WebExplore), NOT for documenting a UI into a spec (use WebExplore).


by diegosouzapw


Best use case

Browser is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using Browser can expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

curl -o ~/.claude/skills/browser/SKILL.md --create-dirs "https://raw.githubusercontent.com/diegosouzapw/awesome-omni-skill/main/skills/cli-automation/browser/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/browser/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill


Frequently Asked Questions

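What does this skill do?

It provides code-first Playwright browser automation through TypeScript scripts: reusable automation scripts, VERIFY-phase checks that a web change actually works, and headless programmatic testing with low token usage.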

Where can I find the source code?

The source is in the diegosouzapw/awesome-omni-skill repository on GitHub, at skills/cli-automation/browser/SKILL.md.


SKILL.md Source

# Browser - Code-First Browser Automation

**Browser automation and web verification using code-first Playwright.**

---

## 🔌 File-Based MCP

This skill is a **file-based MCP** - a code-first API wrapper that replaces token-heavy MCP protocol calls.

**Why file-based?** Data is filtered in code BEFORE it is returned to the model context, which yields roughly 99% token savings.

**Architecture:** See `$PAI_DIR/skills/CORE/SYSTEM/DOCUMENTATION/FileBasedMCPs.md`
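
To illustrate the filter-before-return idea, here is a minimal sketch in plain TypeScript. It assumes you already have the page's visible text (for example from `getVisibleText()`); the helper `relevantLines` and its parameters are hypothetical, not part of the skill's API. Only the matching lines, capped at a limit, would go back to the model instead of the whole page.

```typescript
// Hypothetical helper: keep only the lines of a large page text that mention
// a keyword, so the model sees a few lines instead of the full document.
function relevantLines(pageText: string, keyword: string, maxLines = 5): string[] {
  const needle = keyword.toLowerCase()
  return pageText
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.length > 0 && line.toLowerCase().includes(needle))
    .slice(0, maxLines)
}

// Example: a long page reduced to the two lines that matter
const fullText = ['Welcome', 'Pricing: $10/month', 'About us', 'Pricing FAQ', 'Contact'].join('\n')
console.log(relevantLines(fullText, 'pricing'))
```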

---

## Quick Start

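```typescript
// $PAI_DIR is a placeholder: import specifiers are not shell-expanded,
// so substitute the real path to your PAI directory.
import { PlaywrightBrowser } from '$PAI_DIR/skills/Browser/index.ts'

const browser = new PlaywrightBrowser()
try {
  await browser.launch()
  await browser.navigate('https://example.com')
  await browser.screenshot({ path: 'screenshot.png' })
} finally {
  // Always release the browser process, even if a step throws
  await browser.close()
}
```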

**Why This Approach:**
- Playwright MCP loads ~13,700 tokens of tool definitions at startup
- Code-first loads ~50-200 tokens per operation
- Full Playwright API access, not limited to the 21 tools Playwright MCP exposes

---

## Voice Notification

**When executing a Browser workflow, do BOTH:**

1. **Send voice notification**:
   ```bash
   curl -s -X POST http://localhost:8888/notify \
     -H "Content-Type: application/json" \
     -d '{"message": "Running the Browser workflow"}' \
     > /dev/null 2>&1 &
   ```

2. **Output text notification**:
   ```
   Running the **Browser** workflow...
   ```
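
If a workflow is driven from TypeScript rather than the shell, the same voice notification can be sent with the built-in `fetch` (available in Bun and Node 18+). This is a sketch that only mirrors the `curl` call in step 1; `buildNotifyRequest` and `notify` are hypothetical names, and it assumes the notification server on `localhost:8888` accepts the same JSON body.

```typescript
// Hypothetical request description mirroring the curl command in step 1.
interface NotifyRequest {
  url: string
  method: 'POST'
  headers: Record<string, string>
  body: string
}

function buildNotifyRequest(message: string): NotifyRequest {
  return {
    url: 'http://localhost:8888/notify',
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ message }),
  }
}

// Fire-and-forget: errors are ignored, like `> /dev/null 2>&1 &` in the shell version,
// so a missing voice server never breaks the workflow.
function notify(message: string): void {
  const req = buildNotifyRequest(message)
  fetch(req.url, { method: req.method, headers: req.headers, body: req.body }).catch(() => {})
}
```

Calling `notify('Running the Browser workflow')` at the start of a workflow is the code equivalent of step 1; the text notification in step 2 still applies.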

---

## Workflow Routing

| Trigger | Workflow |
|---------|----------|
| Navigate to URL, take screenshot | `Workflows/Screenshot.md` |
| Verify page loads correctly | `Workflows/VerifyPage.md` |
| Fill forms, interact with page | `Workflows/Interact.md` |
| Extract page content | `Workflows/Extract.md` |
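
The table is a lookup from the kind of request to a workflow file. The skill does not say how a trigger is matched, so the following is only a sketch assuming simple keyword matching; the keyword lists and `routeWorkflow` are hypothetical, while the file paths come from the table.

```typescript
// Hypothetical keyword-based router over the workflow table.
// Order matters: the first route whose keywords match wins.
const routes: Array<{ keywords: string[]; workflow: string }> = [
  { keywords: ['screenshot'], workflow: 'Workflows/Screenshot.md' },
  { keywords: ['verify', 'loads correctly'], workflow: 'Workflows/VerifyPage.md' },
  { keywords: ['fill', 'form', 'click', 'interact'], workflow: 'Workflows/Interact.md' },
  { keywords: ['extract', 'scrape', 'content'], workflow: 'Workflows/Extract.md' },
]

// Returns undefined when no trigger matches, leaving the choice to the agent.
function routeWorkflow(request: string): string | undefined {
  const text = request.toLowerCase()
  return routes.find((route) => route.keywords.some((k) => text.includes(k)))?.workflow
}

console.log(routeWorkflow('Take a screenshot of the pricing page'))
```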

---

## API Reference

### Navigation
```typescript
await browser.launch(options?)      // Start browser
await browser.navigate(url)         // Go to URL
await browser.goBack()              // History back
await browser.goForward()           // History forward
await browser.reload()              // Refresh
browser.getUrl()                    // Current URL
await browser.getTitle()            // Page title
await browser.close()               // Shut down browser
```

### Capture
```typescript
await browser.screenshot({ path, fullPage, selector })
await browser.getVisibleText(selector?)
await browser.getVisibleHtml({ removeScripts, minify })
await browser.savePdf(path, { format })
await browser.getAccessibilityTree()
```

### Network Monitoring
```typescript
browser.getNetworkLogs(options?)    // Get all network requests/responses
browser.getNetworkStats()           // Get summary statistics
browser.clearNetworkLogs()          // Clear captured logs
```
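
`getNetworkLogs()` returns every captured request, which is exactly the kind of payload the file-based approach avoids sending to the model. Its return shape is not documented here, so this sketch assumes a hypothetical `NetworkEntry` with `url`, `method`, and `status` fields and reduces the log to failed requests only.

```typescript
// Hypothetical entry shape; the real getNetworkLogs() entries may differ.
interface NetworkEntry {
  url: string
  method: string
  status: number // HTTP status; 0 stands for "no response" (assumption)
}

// Keep only failures (no response or status >= 400), grouped by status code.
function summarizeFailures(logs: NetworkEntry[]): Record<number, string[]> {
  const failures: Record<number, string[]> = {}
  for (const entry of logs) {
    if (entry.status === 0 || entry.status >= 400) {
      (failures[entry.status] ??= []).push(`${entry.method} ${entry.url}`)
    }
  }
  return failures
}

const sample: NetworkEntry[] = [
  { url: 'https://example.com/', method: 'GET', status: 200 },
  { url: 'https://example.com/api/user', method: 'GET', status: 404 },
  { url: 'https://example.com/api/save', method: 'POST', status: 500 },
]
console.log(summarizeFailures(sample))
```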

### Dialog Handling
```typescript
browser.setDialogHandler(auto, response?)   // Configure auto-handling
browser.getPendingDialog()                   // Get current dialog info
await browser.handleDialog(action, promptText?)  // Handle dialog manually
```

### Tab Management
```typescript
browser.getTabs()                   // List all open tabs
await browser.newTab(url?)          // Open new tab
await browser.switchTab(index)      // Switch to tab by index
await browser.closeTab()            // Close current tab
```

### Interaction
```typescript
await browser.click(selector)
await browser.hover(selector)
await browser.fill(selector, value)
await browser.type(selector, text, delay?)
await browser.select(selector, value)
await browser.pressKey(key, selector?)
await browser.drag(source, target)
await browser.uploadFile(selector, path)
```

### Waiting
```typescript
await browser.waitForSelector(selector, { state, timeout })
await browser.waitForText(text, { state, timeout })
await browser.waitForNavigation({ url, timeout })
await browser.waitForNetworkIdle(timeout?)
await browser.wait(ms)
await browser.waitForResponse(urlPattern)
```
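
Condition-based waits are generally more reliable than a fixed `wait(ms)`: they finish as soon as the page is ready and fail loudly when it never is. The underlying pattern can be sketched without a browser by polling a condition until it holds or a deadline passes. `waitUntil` and its options are hypothetical and not part of the skill's API.

```typescript
// Hypothetical polling helper (not part of the Browser skill's API):
// resolves as soon as `condition` returns true, rejects once `timeout` ms have passed.
async function waitUntil(
  condition: () => boolean | Promise<boolean>,
  { timeout = 5000, interval = 100 }: { timeout?: number; interval?: number } = {},
): Promise<void> {
  const deadline = Date.now() + timeout
  while (true) {
    if (await condition()) return
    if (Date.now() >= deadline) {
      throw new Error(`Condition not met within ${timeout} ms`)
    }
    // Sleep between checks instead of busy-looping
    await new Promise((resolve) => setTimeout(resolve, interval))
  }
}

// Example: a flag that flips after 300 ms is detected well before the 2 s timeout
let ready = false
setTimeout(() => { ready = true }, 300)
waitUntil(() => ready, { timeout: 2000 }).then(() => console.log('ready'))
```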

### JavaScript
```typescript
await browser.evaluate(script)
browser.getConsoleLogs({ type, search, limit, clear })
await browser.setUserAgent(ua)
```
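
The options on `getConsoleLogs` suggest filtering in code before anything reaches the model. Their exact semantics are not documented here, so this sketch assumes `type` is an exact match, `search` a case-insensitive substring, and `limit` keeps the most recent N entries (`clear` is left out). `ConsoleEntry` and `filterConsoleLogs` are hypothetical; the type names follow Playwright's console message types.

```typescript
// Hypothetical entry shape; types such as 'log', 'warning', 'error' follow Playwright.
interface ConsoleEntry {
  type: string
  text: string
}

interface ConsoleFilter {
  type?: string   // exact message type, e.g. 'error'
  search?: string // case-insensitive substring match on the text
  limit?: number  // keep only the most recent N matches (assumed semantics)
}

function filterConsoleLogs(entries: ConsoleEntry[], { type, search, limit }: ConsoleFilter = {}): ConsoleEntry[] {
  const needle = search?.toLowerCase()
  const matches = entries.filter(
    (e) =>
      (type === undefined || e.type === type) &&
      (needle === undefined || e.text.toLowerCase().includes(needle)),
  )
  // Keep the last `limit` entries; limit 0 yields an empty list
  return limit === undefined ? matches : matches.slice(Math.max(matches.length - limit, 0))
}

const logs: ConsoleEntry[] = [
  { type: 'log', text: 'app started' },
  { type: 'error', text: 'Failed to load /api/user' },
  { type: 'warning', text: 'deprecated API' },
  { type: 'error', text: 'Uncaught TypeError' },
]
console.log(filterConsoleLogs(logs, { type: 'error', limit: 1 }))
```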

### Viewport
```typescript
await browser.resize(width, height)
await browser.setDevice('iPhone 14')
```

### iFrame
```typescript
await browser.iframeClick(iframeSelector, elementSelector)
await browser.iframeFill(iframeSelector, elementSelector, value)
```

---

## VERIFY Phase Integration

**The Browser skill is MANDATORY for the VERIFY phase of web changes.**

Before claiming ANY web change is "live" or "working":

1. Launch browser
2. Navigate to the EXACT URL
3. Verify the EXACT element that changed
4. Take screenshot as evidence
5. Close browser

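```typescript
// VERIFY Phase Pattern
// $PAI_DIR is a placeholder; substitute the real path to your PAI directory.
import { PlaywrightBrowser } from '$PAI_DIR/skills/Browser/index.ts'

const expected = 'New headline' // hypothetical: the text the change should produce
const browser = new PlaywrightBrowser()
try {
  await browser.launch({ headless: true })
  await browser.navigate('https://example.com/changed-page')
  await browser.waitForSelector('.changed-element')
  const text = await browser.getVisibleText('.changed-element')
  // Capture evidence before checking, so a failure is also documented
  await browser.screenshot({ path: '/tmp/verify.png' })
  if (!text.includes(expected)) {
    throw new Error(`Verification failed: expected "${expected}", got "${text}"`)
  }
  console.log(`Verified: "${text}"`)
} finally {
  await browser.close()
}
```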

**If you haven't LOOKED at the rendered page, you CANNOT claim it works.**

---

## CLI Tool

**Location:** `Tools/Browse.ts`

```bash
# Open URL in visible browser
bun run $PAI_DIR/skills/Browser/Tools/Browse.ts open <url>

# Take screenshot
bun run $PAI_DIR/skills/Browser/Tools/Browse.ts screenshot <url> [path]

# Verify element exists
bun run $PAI_DIR/skills/Browser/Tools/Browse.ts verify <url> <selector>
```
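
The three subcommands share a small positional contract. As an illustration only (this is not the code in `Tools/Browse.ts`), a parser for that contract could look like this; `parseBrowseArgs` and the `BrowseCommand` type are hypothetical.

```typescript
// Hypothetical model of the CLI's positional arguments.
type BrowseCommand =
  | { command: 'open'; url: string }
  | { command: 'screenshot'; url: string; path?: string }
  | { command: 'verify'; url: string; selector: string }

function parseBrowseArgs(args: string[]): BrowseCommand {
  const [command, url, extra] = args
  if (!url) throw new Error('Usage: Browse.ts <open|screenshot|verify> <url> ...')
  switch (command) {
    case 'open':
      return { command: 'open', url }
    case 'screenshot':
      // [path] is optional; the default Browse.ts uses is not documented here
      return extra ? { command: 'screenshot', url, path: extra } : { command: 'screenshot', url }
    case 'verify':
      if (!extra) throw new Error('verify requires a <selector>')
      return { command: 'verify', url, selector: extra }
    default:
      throw new Error(`Unknown command: ${command}`)
  }
}

// In a script, the input would typically be process.argv.slice(2)
console.log(parseBrowseArgs(['verify', 'https://example.com', 'body']))
```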

**Examples:**
```bash
bun run $PAI_DIR/skills/Browser/Tools/Browse.ts open https://danielmiessler.com
bun run $PAI_DIR/skills/Browser/Tools/Browse.ts screenshot https://example.com /tmp/shot.png
bun run $PAI_DIR/skills/Browser/Tools/Browse.ts verify https://example.com "body"
```

---

## Examples

### Verify Page Loads

```bash
bun $PAI_DIR/skills/Browser/examples/verify-page.ts https://danielmiessler.com
```

### Take Screenshot

```bash
bun $PAI_DIR/skills/Browser/examples/screenshot.ts https://example.com screenshot.png
```

### Fill Form

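```typescript
import { PlaywrightBrowser } from '$PAI_DIR/skills/Browser/index.ts'

const browser = new PlaywrightBrowser()
try {
  await browser.launch()
  await browser.navigate('https://example.com/form')
  await browser.fill('#email', 'test@example.com')
  await browser.fill('#password', 'secret')
  // Start waiting before the click so a fast navigation is not missed
  await Promise.all([
    browser.waitForNavigation(),
    browser.click('button[type="submit"]'),
  ])
} finally {
  await browser.close()
}
```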

---

## Alternative Implementations (Reference Only)

### Option A: Playwright MCP (Microsoft Official)
```bash
# npx @playwright/mcp@latest
# 25K GitHub stars, uses accessibility tree
# Pro: Official Microsoft support, well-maintained
# Con: 13,700 tokens at startup
```

### Option B: Chrome DevTools MCP (Google Official)
```bash
# npx chrome-devtools-mcp@latest
# Best debugging capabilities, CDP protocol
# Pro: Deep browser internals access
# Con: Chrome-only, complex setup
```

### Option C: claude --chrome (Native Anthropic)
```bash
# claude --chrome
# Simplest option - built into Claude Code
# Pro: Zero configuration, native integration
# Con: Limited API compared to Playwright
```

### Option D: Stagehand (Browserbase)
```bash
# npx stagehand
# 19.9K stars, won Anthropic hackathon
# Pro: AI-native actions (act, extract, observe)
# Con: Emerging, less mature than Playwright
```

---

## Token Savings Comparison

| Approach | Tokens | Notes |
|----------|--------|-------|
| Playwright MCP | ~13,700 | Loaded at startup, always |
| Code-first | ~50-200 | Only what you use |
| **Savings** | **~98.5-99.6%** | Per operation, relative to the MCP startup load |
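
The savings figure is simple arithmetic: savings = (1 - code-first tokens / MCP startup tokens) × 100. This sketch applies it to the table's own numbers; `savingsPercent` is just an illustrative name.

```typescript
// Percentage of tokens saved by code-first relative to the Playwright MCP startup load.
function savingsPercent(mcpTokens: number, codeFirstTokens: number): number {
  return (1 - codeFirstTokens / mcpTokens) * 100
}

// Figures from the table: ~13,700 tokens for MCP, 50-200 per code-first operation
for (const perOp of [50, 200]) {
  console.log(`${perOp} tokens: ${savingsPercent(13_700, perOp).toFixed(1)}% saved`)
}
// 50 tokens: 99.6% saved
// 200 tokens: 98.5% saved
```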

---

## Full Documentation

- **CLI Tool:** `Tools/Browse.ts`
- **Implementation:** `README.md`
- **API Reference:** `index.ts`
- **Examples:** `examples/`
