browser
Web browser automation with AI-optimized snapshots for claude-flow agents
Best use case
browser is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using browser should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/browser/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
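The manual steps above amount to copying one file into a fixed path. The following is a minimal sketch. It assumes SKILL.md has already been downloaded, since the page does not give the download URL here. `install_browser_skill` is a hypothetical helper name and is not part of the skill:

```bash
# Hypothetical helper (not shipped with the skill): performs the manual
# installation steps for a SKILL.md that has already been downloaded.
install_browser_skill() {
  local src="${1:-SKILL.md}"    # downloaded file; defaults to ./SKILL.md
  local project="${2:-.}"       # project root; defaults to the current directory
  local dest="$project/.claude/skills/browser"

  # Fail early if the file has not been downloaded yet.
  if [ ! -f "$src" ]; then
    echo "error: $src not found; download SKILL.md from GitHub first" >&2
    return 1
  fi

  # Create .claude/skills/browser/ and copy the file into place.
  mkdir -p "$dest"
  cp "$src" "$dest/SKILL.md"
  echo "installed $dest/SKILL.md; restart your AI agent so it auto-discovers the skill"
}
```

For example, run `install_browser_skill ~/Downloads/SKILL.md .` from the project root. Restarting the agent remains a manual step.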
How browser Compares
| Feature / Agent | browser | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Web browser automation with AI-optimized snapshots for claude-flow agents
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Browser Automation Skill

Web browser automation using agent-browser with AI-optimized snapshots. Reduces context by 93% using element refs (@e1, @e2) instead of full DOM.

## Core Workflow

```bash
# 1. Navigate to page
agent-browser open <url>

# 2. Get accessibility tree with element refs
agent-browser snapshot -i  # -i = interactive elements only

# 3. Interact using refs from snapshot
agent-browser click @e2
agent-browser fill @e3 "text"

# 4. Re-snapshot after page changes
agent-browser snapshot -i
```

## Quick Reference

### Navigation

| Command | Description |
|---------|-------------|
| `open <url>` | Navigate to URL |
| `back` | Go back |
| `forward` | Go forward |
| `reload` | Reload page |
| `close` | Close browser |

### Snapshots (AI-Optimized)

| Command | Description |
|---------|-------------|
| `snapshot` | Full accessibility tree |
| `snapshot -i` | Interactive elements only (buttons, links, inputs) |
| `snapshot -c` | Compact (remove empty elements) |
| `snapshot -d 3` | Limit depth to 3 levels |
| `screenshot [path]` | Capture screenshot (base64 if no path) |

### Interaction

| Command | Description |
|---------|-------------|
| `click <sel>` | Click element |
| `fill <sel> <text>` | Clear and fill input |
| `type <sel> <text>` | Type with key events |
| `press <key>` | Press key (Enter, Tab, etc.) |
| `hover <sel>` | Hover element |
| `select <sel> <val>` | Select dropdown option |
| `check/uncheck <sel>` | Toggle checkbox |
| `scroll <dir> [px]` | Scroll page |

### Get Info

| Command | Description |
|---------|-------------|
| `get text <sel>` | Get text content |
| `get html <sel>` | Get innerHTML |
| `get value <sel>` | Get input value |
| `get attr <sel> <attr>` | Get attribute |
| `get title` | Get page title |
| `get url` | Get current URL |

### Wait

| Command | Description |
|---------|-------------|
| `wait <selector>` | Wait for element |
| `wait <ms>` | Wait milliseconds |
| `wait --text "text"` | Wait for text |
| `wait --url "pattern"` | Wait for URL |
| `wait --load networkidle` | Wait for load state |

### Sessions

| Command | Description |
|---------|-------------|
| `--session <name>` | Use isolated session |
| `session list` | List active sessions |

## Selectors

### Element Refs (Recommended)

```bash
# Get refs from snapshot
agent-browser snapshot -i
# Output: button "Submit" [ref=e2]

# Use ref to interact
agent-browser click @e2
```

### CSS Selectors

```bash
agent-browser click "#submit"
agent-browser fill ".email-input" "test@test.com"
```

### Semantic Locators

```bash
agent-browser find role button click --name "Submit"
agent-browser find label "Email" fill "test@test.com"
agent-browser find testid "login-btn" click
```

## Examples

### Login Flow

```bash
agent-browser open https://example.com/login
agent-browser snapshot -i
agent-browser fill @e2 "user@example.com"
agent-browser fill @e3 "password123"
agent-browser click @e4
agent-browser wait --url "**/dashboard"
```

### Form Submission

```bash
agent-browser open https://example.com/contact
agent-browser snapshot -i
agent-browser fill @e1 "John Doe"
agent-browser fill @e2 "john@example.com"
agent-browser fill @e3 "Hello, this is my message"
agent-browser click @e4
agent-browser wait --text "Thank you"
```

### Data Extraction

```bash
agent-browser open https://example.com/products
agent-browser snapshot -i
# Iterate through product refs
agent-browser get text @e1        # Product name
agent-browser get text @e2        # Price
agent-browser get attr @e3 href   # Link
```

### Multi-Session (Swarm)

```bash
# Session 1: Navigator
agent-browser --session nav open https://example.com
agent-browser --session nav state save auth.json

# Session 2: Scraper (uses same auth)
agent-browser --session scrape state load auth.json
agent-browser --session scrape open https://example.com/data
agent-browser --session scrape snapshot -i
```

## Integration with Claude Flow

### MCP Tools

All browser operations are available as MCP tools with `browser/` prefix:

- `browser/open`
- `browser/snapshot`
- `browser/click`
- `browser/fill`
- `browser/screenshot`
- etc.

### Memory Integration

```bash
# Store successful patterns
npx @claude-flow/cli memory store --namespace browser-patterns --key "login-flow" --value "snapshot->fill->click->wait"

# Retrieve before similar task
npx @claude-flow/cli memory search --query "login automation"
```

### Hooks

```bash
# Pre-browse hook (get context)
npx @claude-flow/cli hooks pre-edit --file "browser-task.ts"

# Post-browse hook (record success)
npx @claude-flow/cli hooks post-task --task-id "browse-1" --success true
```

## Tips

1. **Always use snapshots** - They're optimized for AI with refs
2. **Prefer `-i` flag** - Gets only interactive elements, smaller output
3. **Use refs, not selectors** - More reliable, deterministic
4. **Re-snapshot after navigation** - Page state changes
5. **Use sessions for parallel work** - Each session is isolated
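The tips advise re-snapshotting after navigation, so refs such as `@e2` may change from page to page. A script therefore cannot safely hard-code them the way the examples do.

The following is a minimal sketch of reading a ref back out of the snapshot text. It assumes `snapshot -i` prints one element per line in the `button "Submit" [ref=e2]` form shown under Element Refs, optionally indented or bulleted. `ref_for` is a hypothetical helper, and the real output format may differ:

```bash
# Hypothetical helper (not part of agent-browser): print the ref for the first
# element whose role and accessible name match, reading snapshot text on stdin.
# Assumed line format, taken from the Element Refs example:
#   button "Submit" [ref=e2]
# Leading indentation and a "- " bullet are tolerated.
ref_for() {
  local role="$1" name="$2"
  awk -v role="$role" -v name="$name" '
    {
      line = $0
      sub(/^[ \t]*-?[ \t]*/, "", line)          # strip indentation / bullet
      want = role " \"" name "\""               # e.g. button "Submit"
      if (index(line, want) == 1 && match(line, /\[ref=e[0-9]+\]/)) {
        # RSTART/RLENGTH cover "[ref=eN]"; keep only "eN" and prefix "@"
        print "@" substr(line, RSTART + 5, RLENGTH - 6)
        found = 1
        exit
      }
    }
    END { if (!found) exit 1 }                  # non-zero status when nothing matched
  '
}
```

Usage, assuming `agent-browser snapshot -i` writes to stdout: `ref=$(agent-browser snapshot -i | ref_for button Submit) || exit 1`, then `agent-browser click "$ref"`. Because the helper exits non-zero when nothing matches, a script stops rather than clicking a stale or wrong ref.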
Related Skills
ocr
AI-powered multi-agent code review. Simulates a team of Principal Engineers reviewing code from different perspectives. Use when asked to review code, check a PR, analyze changes, or perform code review.
Verification & Quality Assurance
Comprehensive truth scoring, code quality verification, and automatic rollback system with 0.95 accuracy threshold for ensuring high-quality agent outputs and codebase reliability.
V3 Swarm Coordination
15-agent hierarchical mesh coordination for v3 implementation. Orchestrates parallel execution across security, core, and integration domains following 10 ADRs with 14-week timeline.
V3 Security Overhaul
Complete security architecture overhaul for claude-flow v3. Addresses critical CVEs (CVE-1, CVE-2, CVE-3) and implements secure-by-default patterns. Use for security-first v3 implementation.
V3 Performance Optimization
Achieve aggressive v3 performance targets: 2.49x-7.47x Flash Attention speedup, 150x-12,500x search improvements, 50-75% memory reduction. Comprehensive benchmarking and optimization suite.
V3 Memory Unification
Unify 6+ memory systems into AgentDB with HNSW indexing for 150x-12,500x search improvements. Implements ADR-006 (Unified Memory Service) and ADR-009 (Hybrid Memory Backend).
V3 MCP Optimization
MCP server optimization and transport layer enhancement for claude-flow v3. Implements connection pooling, load balancing, tool registry optimization, and performance monitoring for sub-100ms response times.
V3 Deep Integration
Deep agentic-flow@alpha integration implementing ADR-001. Eliminates 10,000+ duplicate lines by building claude-flow as specialized extension rather than parallel implementation.
V3 DDD Architecture
Domain-Driven Design architecture for claude-flow v3. Implements modular, bounded context architecture with clean separation of concerns and microkernel pattern.
V3 Core Implementation
Core module implementation for claude-flow v3. Implements DDD domains, clean architecture patterns, dependency injection, and modular TypeScript codebase with comprehensive testing.
V3 CLI Modernization
CLI modernization and hooks system enhancement for claude-flow v3. Implements interactive prompts, command decomposition, enhanced hooks integration, and intelligent workflow automation.
Swarm Orchestration
Orchestrate multi-agent swarms with agentic-flow for parallel task execution, dynamic topology, and intelligent coordination. Use when scaling beyond single agents, implementing complex workflows, or building distributed AI systems.