browser

Web browser automation with AI-optimized snapshots for claude-flow agents

41 stars

byspencermarx

View on GitHub Installation ↓

Best use case

browser is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Web browser automation with AI-optimized snapshots for claude-flow agents

Teams using browser should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/browser/SKILL.md --create-dirs "https://raw.githubusercontent.com/spencermarx/open-code-review/main/.claude/skills/browser/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/browser/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How browser Compares

Feature / Agent	browser	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Web browser automation with AI-optimized snapshots for claude-flow agents

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

Best AI Skills for Claude

Explore the best AI skills for Claude and Claude Code across coding, research, workflow automation, documentation, and agent operations.

SKILL.md Source

# Browser Automation Skill

Web browser automation using agent-browser with AI-optimized snapshots. Reduces context by 93% using element refs (@e1, @e2) instead of full DOM.

## Core Workflow

```bash
# 1. Navigate to page
agent-browser open <url>

# 2. Get accessibility tree with element refs
agent-browser snapshot -i    # -i = interactive elements only

# 3. Interact using refs from snapshot
agent-browser click @e2
agent-browser fill @e3 "text"

# 4. Re-snapshot after page changes
agent-browser snapshot -i
```

## Quick Reference

### Navigation
| Command | Description |
|---------|-------------|
| `open <url>` | Navigate to URL |
| `back` | Go back |
| `forward` | Go forward |
| `reload` | Reload page |
| `close` | Close browser |

### Snapshots (AI-Optimized)
| Command | Description |
|---------|-------------|
| `snapshot` | Full accessibility tree |
| `snapshot -i` | Interactive elements only (buttons, links, inputs) |
| `snapshot -c` | Compact (remove empty elements) |
| `snapshot -d 3` | Limit depth to 3 levels |
| `screenshot [path]` | Capture screenshot (base64 if no path) |

### Interaction
| Command | Description |
|---------|-------------|
| `click <sel>` | Click element |
| `fill <sel> <text>` | Clear and fill input |
| `type <sel> <text>` | Type with key events |
| `press <key>` | Press key (Enter, Tab, etc.) |
| `hover <sel>` | Hover element |
| `select <sel> <val>` | Select dropdown option |
| `check/uncheck <sel>` | Toggle checkbox |
| `scroll <dir> [px]` | Scroll page |

### Get Info
| Command | Description |
|---------|-------------|
| `get text <sel>` | Get text content |
| `get html <sel>` | Get innerHTML |
| `get value <sel>` | Get input value |
| `get attr <sel> <attr>` | Get attribute |
| `get title` | Get page title |
| `get url` | Get current URL |

### Wait
| Command | Description |
|---------|-------------|
| `wait <selector>` | Wait for element |
| `wait <ms>` | Wait milliseconds |
| `wait --text "text"` | Wait for text |
| `wait --url "pattern"` | Wait for URL |
| `wait --load networkidle` | Wait for load state |

### Sessions
| Command | Description |
|---------|-------------|
| `--session <name>` | Use isolated session |
| `session list` | List active sessions |

## Selectors

### Element Refs (Recommended)
```bash
# Get refs from snapshot
agent-browser snapshot -i
# Output: button "Submit" [ref=e2]

# Use ref to interact
agent-browser click @e2
```

### CSS Selectors
```bash
agent-browser click "#submit"
agent-browser fill ".email-input" "test@test.com"
```

### Semantic Locators
```bash
agent-browser find role button click --name "Submit"
agent-browser find label "Email" fill "test@test.com"
agent-browser find testid "login-btn" click
```

## Examples

### Login Flow
```bash
agent-browser open https://example.com/login
agent-browser snapshot -i
agent-browser fill @e2 "user@example.com"
agent-browser fill @e3 "password123"
agent-browser click @e4
agent-browser wait --url "**/dashboard"
```

### Form Submission
```bash
agent-browser open https://example.com/contact
agent-browser snapshot -i
agent-browser fill @e1 "John Doe"
agent-browser fill @e2 "john@example.com"
agent-browser fill @e3 "Hello, this is my message"
agent-browser click @e4
agent-browser wait --text "Thank you"
```

### Data Extraction
```bash
agent-browser open https://example.com/products
agent-browser snapshot -i
# Iterate through product refs
agent-browser get text @e1  # Product name
agent-browser get text @e2  # Price
agent-browser get attr @e3 href  # Link
```

### Multi-Session (Swarm)
```bash
# Session 1: Navigator
agent-browser --session nav open https://example.com
agent-browser --session nav state save auth.json

# Session 2: Scraper (uses same auth)
agent-browser --session scrape state load auth.json
agent-browser --session scrape open https://example.com/data
agent-browser --session scrape snapshot -i
```

## Integration with Claude Flow

### MCP Tools
All browser operations are available as MCP tools with `browser/` prefix:
- `browser/open`
- `browser/snapshot`
- `browser/click`
- `browser/fill`
- `browser/screenshot`
- etc.

### Memory Integration
```bash
# Store successful patterns
npx @claude-flow/cli memory store --namespace browser-patterns --key "login-flow" --value "snapshot->fill->click->wait"

# Retrieve before similar task
npx @claude-flow/cli memory search --query "login automation"
```

### Hooks
```bash
# Pre-browse hook (get context)
npx @claude-flow/cli hooks pre-edit --file "browser-task.ts"

# Post-browse hook (record success)
npx @claude-flow/cli hooks post-task --task-id "browse-1" --success true
```

## Tips

1. **Always use snapshots** - They're optimized for AI with refs
2. **Prefer `-i` flag** - Gets only interactive elements, smaller output
3. **Use refs, not selectors** - More reliable, deterministic
4. **Re-snapshot after navigation** - Page state changes
5. **Use sessions for parallel work** - Each session is isolated

Related Skills

ocr

from spencermarx/open-code-review

AI-powered multi-agent code review. Simulates a team of Principal Engineers reviewing code from different perspectives. Use when asked to review code, check a PR, analyze changes, or perform code review.

Verification & Quality Assurance

from spencermarx/open-code-review

Comprehensive truth scoring, code quality verification, and automatic rollback system with 0.95 accuracy threshold for ensuring high-quality agent outputs and codebase reliability.

V3 Swarm Coordination

from spencermarx/open-code-review

15-agent hierarchical mesh coordination for v3 implementation. Orchestrates parallel execution across security, core, and integration domains following 10 ADRs with 14-week timeline.

V3 Security Overhaul

from spencermarx/open-code-review

Complete security architecture overhaul for claude-flow v3. Addresses critical CVEs (CVE-1, CVE-2, CVE-3) and implements secure-by-default patterns. Use for security-first v3 implementation.

V3 Performance Optimization

from spencermarx/open-code-review

Achieve aggressive v3 performance targets: 2.49x-7.47x Flash Attention speedup, 150x-12,500x search improvements, 50-75% memory reduction. Comprehensive benchmarking and optimization suite.

V3 Memory Unification

from spencermarx/open-code-review

Unify 6+ memory systems into AgentDB with HNSW indexing for 150x-12,500x search improvements. Implements ADR-006 (Unified Memory Service) and ADR-009 (Hybrid Memory Backend).

V3 MCP Optimization

from spencermarx/open-code-review

MCP server optimization and transport layer enhancement for claude-flow v3. Implements connection pooling, load balancing, tool registry optimization, and performance monitoring for sub-100ms response times.

V3 Deep Integration

from spencermarx/open-code-review

Deep agentic-flow@alpha integration implementing ADR-001. Eliminates 10,000+ duplicate lines by building claude-flow as specialized extension rather than parallel implementation.

V3 DDD Architecture

from spencermarx/open-code-review

Domain-Driven Design architecture for claude-flow v3. Implements modular, bounded context architecture with clean separation of concerns and microkernel pattern.

V3 Core Implementation

from spencermarx/open-code-review

Core module implementation for claude-flow v3. Implements DDD domains, clean architecture patterns, dependency injection, and modular TypeScript codebase with comprehensive testing.

V3 CLI Modernization

from spencermarx/open-code-review

CLI modernization and hooks system enhancement for claude-flow v3. Implements interactive prompts, command decomposition, enhanced hooks integration, and intelligent workflow automation.

Swarm Orchestration

from spencermarx/open-code-review

Orchestrate multi-agent swarms with agentic-flow for parallel task execution, dynamic topology, and intelligent coordination. Use when scaling beyond single agents, implementing complex workflows, or building distributed AI systems.