browser
Web browser automation with AI-optimized snapshots for claude-flow agents
Best use case
browser is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using browser should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/browser/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
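The manual steps above can be sketched in shell. This is a minimal sketch, assuming SKILL.md has already been downloaded to a local path; the `install_skill` helper and its two arguments are hypothetical names introduced for illustration, and only the destination path `.claude/skills/browser/SKILL.md` comes from the steps above.

```bash
# Hypothetical helper (not part of the skill): copy a downloaded SKILL.md
# into the location where the agent looks for the browser skill.
# Usage: install_skill <path-to-downloaded-SKILL.md> <project-dir>
install_skill() {
  src="$1"
  project="$2"
  dest="$project/.claude/skills/browser"

  # Create the skill directory if it does not exist, then copy the file in.
  mkdir -p "$dest" && cp "$src" "$dest/SKILL.md"
}
```

For example, `install_skill ./SKILL.md .` would install the skill into the current project; the agent still needs to be restarted afterwards.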
How browser Compares
| Feature / Agent | browser | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Web browser automation with AI-optimized snapshots for claude-flow agents
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Browser Automation Skill

Web browser automation using agent-browser with AI-optimized snapshots. Reduces context by 93% using element refs (@e1, @e2) instead of full DOM.

## Core Workflow

```bash
# 1. Navigate to page
agent-browser open <url>

# 2. Get accessibility tree with element refs
agent-browser snapshot -i  # -i = interactive elements only

# 3. Interact using refs from snapshot
agent-browser click @e2
agent-browser fill @e3 "text"

# 4. Re-snapshot after page changes
agent-browser snapshot -i
```

## Quick Reference

### Navigation

| Command | Description |
|---------|-------------|
| `open <url>` | Navigate to URL |
| `back` | Go back |
| `forward` | Go forward |
| `reload` | Reload page |
| `close` | Close browser |

### Snapshots (AI-Optimized)

| Command | Description |
|---------|-------------|
| `snapshot` | Full accessibility tree |
| `snapshot -i` | Interactive elements only (buttons, links, inputs) |
| `snapshot -c` | Compact (remove empty elements) |
| `snapshot -d 3` | Limit depth to 3 levels |
| `screenshot [path]` | Capture screenshot (base64 if no path) |

### Interaction

| Command | Description |
|---------|-------------|
| `click <sel>` | Click element |
| `fill <sel> <text>` | Clear and fill input |
| `type <sel> <text>` | Type with key events |
| `press <key>` | Press key (Enter, Tab, etc.) |
| `hover <sel>` | Hover element |
| `select <sel> <val>` | Select dropdown option |
| `check/uncheck <sel>` | Toggle checkbox |
| `scroll <dir> [px]` | Scroll page |

### Get Info

| Command | Description |
|---------|-------------|
| `get text <sel>` | Get text content |
| `get html <sel>` | Get innerHTML |
| `get value <sel>` | Get input value |
| `get attr <sel> <attr>` | Get attribute |
| `get title` | Get page title |
| `get url` | Get current URL |

### Wait

| Command | Description |
|---------|-------------|
| `wait <selector>` | Wait for element |
| `wait <ms>` | Wait milliseconds |
| `wait --text "text"` | Wait for text |
| `wait --url "pattern"` | Wait for URL |
| `wait --load networkidle` | Wait for load state |

### Sessions

| Command | Description |
|---------|-------------|
| `--session <name>` | Use isolated session |
| `session list` | List active sessions |

## Selectors

### Element Refs (Recommended)

```bash
# Get refs from snapshot
agent-browser snapshot -i
# Output: button "Submit" [ref=e2]

# Use ref to interact
agent-browser click @e2
```

### CSS Selectors

```bash
agent-browser click "#submit"
agent-browser fill ".email-input" "test@test.com"
```

### Semantic Locators

```bash
agent-browser find role button click --name "Submit"
agent-browser find label "Email" fill "test@test.com"
agent-browser find testid "login-btn" click
```

## Examples

### Login Flow

```bash
agent-browser open https://example.com/login
agent-browser snapshot -i
agent-browser fill @e2 "user@example.com"
agent-browser fill @e3 "password123"
agent-browser click @e4
agent-browser wait --url "**/dashboard"
```

### Form Submission

```bash
agent-browser open https://example.com/contact
agent-browser snapshot -i
agent-browser fill @e1 "John Doe"
agent-browser fill @e2 "john@example.com"
agent-browser fill @e3 "Hello, this is my message"
agent-browser click @e4
agent-browser wait --text "Thank you"
```

### Data Extraction

```bash
agent-browser open https://example.com/products
agent-browser snapshot -i

# Iterate through product refs
agent-browser get text @e1       # Product name
agent-browser get text @e2       # Price
agent-browser get attr @e3 href  # Link
```

### Multi-Session (Swarm)

```bash
# Session 1: Navigator
agent-browser --session nav open https://example.com
agent-browser --session nav state save auth.json

# Session 2: Scraper (uses same auth)
agent-browser --session scrape state load auth.json
agent-browser --session scrape open https://example.com/data
agent-browser --session scrape snapshot -i
```

## Integration with Claude Flow

### MCP Tools

All browser operations are available as MCP tools with `browser/` prefix:

- `browser/open`
- `browser/snapshot`
- `browser/click`
- `browser/fill`
- `browser/screenshot`
- etc.

### Memory Integration

```bash
# Store successful patterns
npx @claude-flow/cli memory store --namespace browser-patterns --key "login-flow" --value "snapshot->fill->click->wait"

# Retrieve before similar task
npx @claude-flow/cli memory search --query "login automation"
```

### Hooks

```bash
# Pre-browse hook (get context)
npx @claude-flow/cli hooks pre-edit --file "browser-task.ts"

# Post-browse hook (record success)
npx @claude-flow/cli hooks post-task --task-id "browse-1" --success true
```

## Tips

1. **Always use snapshots** - They're optimized for AI with refs
2. **Prefer `-i` flag** - Gets only interactive elements, smaller output
3. **Use refs, not selectors** - More reliable, deterministic
4. **Re-snapshot after navigation** - Page state changes
5. **Use sessions for parallel work** - Each session is isolated
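Because snapshot lines carry refs in the form `[ref=e2]` while commands take them as `@e2`, a small filter can turn saved snapshot text into a list of usable refs. The following is a minimal sketch, not part of agent-browser: `extract_refs` is a hypothetical helper, and the sample input is only modelled on the single output line shown in the skill (`button "Submit" [ref=e2]`); the second sample line is invented for illustration, and real snapshot output may be formatted differently.

```bash
# Hypothetical helper (not part of agent-browser): read snapshot text on
# stdin and print each element ref in the @eN form used by click/fill/get.
extract_refs() {
  # Keep only the "[ref=eN]" tokens, then rewrite each one as "@eN".
  grep -o '\[ref=e[0-9][0-9]*]' | sed 's/^\[ref=/@/; s/]$//'
}

# Illustrative input only; assumes one "[ref=eN]" marker per element line.
printf '%s\n' 'button "Submit" [ref=e2]' 'textbox "Email" [ref=e3]' | extract_refs
```

In practice the input would come from `agent-browser snapshot -i` piped into the helper, and the list should be regenerated after every navigation, since refs are only meaningful for the snapshot they came from.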
Related Skills
qe-browser
Browser automation for QE agents using Vibium (WebDriver BiDi) with assertions, batch execution, visual diff, prompt-injection scanning, and semantic intents. Use when any QE skill needs to drive a real browser — visual testing, accessibility audits, E2E flow verification, pentest validation, or exploratory testing.
qe-visual-testing-advanced
Advanced visual regression testing with pixel-perfect comparison, AI-powered diff analysis, responsive design validation, and cross-browser visual consistency. Use when detecting UI regressions, validating designs, or ensuring visual consistency.
qe-verification-quality
Comprehensive truth scoring, code quality verification, and automatic rollback system with 0.95 accuracy threshold for ensuring high-quality agent outputs and codebase reliability.
qe-testability-scoring
AI-powered testability assessment using 10 principles of intrinsic testability with Playwright and optional Vibium integration. Evaluates web applications against Observability, Controllability, Algorithmic Simplicity, Transparency, Stability, Explainability, Unbugginess, Smallness, Decomposability, and Similarity. Use when assessing software testability, evaluating test readiness, identifying testability improvements, or generating testability reports.
qe-test-reporting-analytics
Advanced test reporting, quality dashboards, predictive analytics, trend analysis, and executive reporting for QE metrics. Use when communicating quality status, tracking trends, or making data-driven decisions.
qe-test-idea-rewriting
Transform passive 'Verify X' test descriptions into active, observable test actions. Use when test ideas lack specificity, use vague language, or fail quality validation. Converts to action-verb format for clearer, more testable descriptions.
qe-test-environment-management
Test environment provisioning, infrastructure as code for testing, Docker/Kubernetes for test environments, service virtualization, and cost optimization. Use when managing test infrastructure, ensuring environment parity, or optimizing testing costs.
qe-test-design-techniques
Systematic test design with boundary value analysis, equivalence partitioning, decision tables, state transition testing, and combinatorial testing. Use when designing comprehensive test cases, reducing redundant tests, or ensuring systematic coverage.
qe-test-data-management
Strategic test data generation, management, and privacy compliance. Use when creating test data, handling PII, ensuring GDPR/CCPA compliance, or scaling data generation for realistic testing scenarios.
qe-test-automation-strategy
Design and implement effective test automation with proper pyramid, patterns, and CI/CD integration. Use when building automation frameworks or improving test efficiency.
qe-technical-writing
Write clear, engaging technical content from real experience. Use when writing blog posts, documentation, tutorials, or technical articles.
qe-tdd-london-chicago
Apply London (mock-based) and Chicago (state-based) TDD schools. Use when practicing test-driven development or choosing testing style for your context.