browser

Web browser automation with AI-optimized snapshots for claude-flow agents

298 stars

byproffesor-for-testing

View on GitHub Installation ↓

Best use case

browser is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Web browser automation with AI-optimized snapshots for claude-flow agents

Teams using browser should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/browser/SKILL.md --create-dirs "https://raw.githubusercontent.com/proffesor-for-testing/agentic-qe/main/.claude/skills/browser/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/browser/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How browser Compares

Feature / Agent	browser	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Web browser automation with AI-optimized snapshots for claude-flow agents

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Browser Automation Skill

Web browser automation using agent-browser with AI-optimized snapshots. Reduces context by 93% using element refs (@e1, @e2) instead of full DOM.

## Core Workflow

```bash
# 1. Navigate to page
agent-browser open <url>

# 2. Get accessibility tree with element refs
agent-browser snapshot -i    # -i = interactive elements only

# 3. Interact using refs from snapshot
agent-browser click @e2
agent-browser fill @e3 "text"

# 4. Re-snapshot after page changes
agent-browser snapshot -i
```

## Quick Reference

### Navigation
| Command | Description |
|---------|-------------|
| `open <url>` | Navigate to URL |
| `back` | Go back |
| `forward` | Go forward |
| `reload` | Reload page |
| `close` | Close browser |

### Snapshots (AI-Optimized)
| Command | Description |
|---------|-------------|
| `snapshot` | Full accessibility tree |
| `snapshot -i` | Interactive elements only (buttons, links, inputs) |
| `snapshot -c` | Compact (remove empty elements) |
| `snapshot -d 3` | Limit depth to 3 levels |
| `screenshot [path]` | Capture screenshot (base64 if no path) |

### Interaction
| Command | Description |
|---------|-------------|
| `click <sel>` | Click element |
| `fill <sel> <text>` | Clear and fill input |
| `type <sel> <text>` | Type with key events |
| `press <key>` | Press key (Enter, Tab, etc.) |
| `hover <sel>` | Hover element |
| `select <sel> <val>` | Select dropdown option |
| `check/uncheck <sel>` | Toggle checkbox |
| `scroll <dir> [px]` | Scroll page |

### Get Info
| Command | Description |
|---------|-------------|
| `get text <sel>` | Get text content |
| `get html <sel>` | Get innerHTML |
| `get value <sel>` | Get input value |
| `get attr <sel> <attr>` | Get attribute |
| `get title` | Get page title |
| `get url` | Get current URL |

### Wait
| Command | Description |
|---------|-------------|
| `wait <selector>` | Wait for element |
| `wait <ms>` | Wait milliseconds |
| `wait --text "text"` | Wait for text |
| `wait --url "pattern"` | Wait for URL |
| `wait --load networkidle` | Wait for load state |

### Sessions
| Command | Description |
|---------|-------------|
| `--session <name>` | Use isolated session |
| `session list` | List active sessions |

## Selectors

### Element Refs (Recommended)
```bash
# Get refs from snapshot
agent-browser snapshot -i
# Output: button "Submit" [ref=e2]

# Use ref to interact
agent-browser click @e2
```

### CSS Selectors
```bash
agent-browser click "#submit"
agent-browser fill ".email-input" "test@test.com"
```

### Semantic Locators
```bash
agent-browser find role button click --name "Submit"
agent-browser find label "Email" fill "test@test.com"
agent-browser find testid "login-btn" click
```

## Examples

### Login Flow
```bash
agent-browser open https://example.com/login
agent-browser snapshot -i
agent-browser fill @e2 "user@example.com"
agent-browser fill @e3 "password123"
agent-browser click @e4
agent-browser wait --url "**/dashboard"
```

### Form Submission
```bash
agent-browser open https://example.com/contact
agent-browser snapshot -i
agent-browser fill @e1 "John Doe"
agent-browser fill @e2 "john@example.com"
agent-browser fill @e3 "Hello, this is my message"
agent-browser click @e4
agent-browser wait --text "Thank you"
```

### Data Extraction
```bash
agent-browser open https://example.com/products
agent-browser snapshot -i
# Iterate through product refs
agent-browser get text @e1  # Product name
agent-browser get text @e2  # Price
agent-browser get attr @e3 href  # Link
```

### Multi-Session (Swarm)
```bash
# Session 1: Navigator
agent-browser --session nav open https://example.com
agent-browser --session nav state save auth.json

# Session 2: Scraper (uses same auth)
agent-browser --session scrape state load auth.json
agent-browser --session scrape open https://example.com/data
agent-browser --session scrape snapshot -i
```

## Integration with Claude Flow

### MCP Tools
All browser operations are available as MCP tools with `browser/` prefix:
- `browser/open`
- `browser/snapshot`
- `browser/click`
- `browser/fill`
- `browser/screenshot`
- etc.

### Memory Integration
```bash
# Store successful patterns
npx @claude-flow/cli memory store --namespace browser-patterns --key "login-flow" --value "snapshot->fill->click->wait"

# Retrieve before similar task
npx @claude-flow/cli memory search --query "login automation"
```

### Hooks
```bash
# Pre-browse hook (get context)
npx @claude-flow/cli hooks pre-edit --file "browser-task.ts"

# Post-browse hook (record success)
npx @claude-flow/cli hooks post-task --task-id "browse-1" --success true
```

## Tips

1. **Always use snapshots** - They're optimized for AI with refs
2. **Prefer `-i` flag** - Gets only interactive elements, smaller output
3. **Use refs, not selectors** - More reliable, deterministic
4. **Re-snapshot after navigation** - Page state changes
5. **Use sessions for parallel work** - Each session is isolated

Related Skills

qe-browser

298

from proffesor-for-testing/agentic-qe

Browser automation for QE agents using Vibium (WebDriver BiDi) with assertions, batch execution, visual diff, prompt-injection scanning, and semantic intents. Use when any QE skill needs to drive a real browser — visual testing, accessibility audits, E2E flow verification, pentest validation, or exploratory testing.

qe-visual-testing-advanced

298

from proffesor-for-testing/agentic-qe

Advanced visual regression testing with pixel-perfect comparison, AI-powered diff analysis, responsive design validation, and cross-browser visual consistency. Use when detecting UI regressions, validating designs, or ensuring visual consistency.

qe-verification-quality

298

from proffesor-for-testing/agentic-qe

Comprehensive truth scoring, code quality verification, and automatic rollback system with 0.95 accuracy threshold for ensuring high-quality agent outputs and codebase reliability.

qe-testability-scoring

298

from proffesor-for-testing/agentic-qe

AI-powered testability assessment using 10 principles of intrinsic testability with Playwright and optional Vibium integration. Evaluates web applications against Observability, Controllability, Algorithmic Simplicity, Transparency, Stability, Explainability, Unbugginess, Smallness, Decomposability, and Similarity. Use when assessing software testability, evaluating test readiness, identifying testability improvements, or generating testability reports.

qe-test-reporting-analytics

298

from proffesor-for-testing/agentic-qe

Advanced test reporting, quality dashboards, predictive analytics, trend analysis, and executive reporting for QE metrics. Use when communicating quality status, tracking trends, or making data-driven decisions.

qe-test-idea-rewriting

298

from proffesor-for-testing/agentic-qe

Transform passive 'Verify X' test descriptions into active, observable test actions. Use when test ideas lack specificity, use vague language, or fail quality validation. Converts to action-verb format for clearer, more testable descriptions.

qe-test-environment-management

298

from proffesor-for-testing/agentic-qe

Test environment provisioning, infrastructure as code for testing, Docker/Kubernetes for test environments, service virtualization, and cost optimization. Use when managing test infrastructure, ensuring environment parity, or optimizing testing costs.

qe-test-design-techniques

298

from proffesor-for-testing/agentic-qe

Systematic test design with boundary value analysis, equivalence partitioning, decision tables, state transition testing, and combinatorial testing. Use when designing comprehensive test cases, reducing redundant tests, or ensuring systematic coverage.

qe-test-data-management

298

from proffesor-for-testing/agentic-qe

Strategic test data generation, management, and privacy compliance. Use when creating test data, handling PII, ensuring GDPR/CCPA compliance, or scaling data generation for realistic testing scenarios.

qe-test-automation-strategy

298

from proffesor-for-testing/agentic-qe

Design and implement effective test automation with proper pyramid, patterns, and CI/CD integration. Use when building automation frameworks or improving test efficiency.

qe-technical-writing

298

from proffesor-for-testing/agentic-qe

Write clear, engaging technical content from real experience. Use when writing blog posts, documentation, tutorials, or technical articles.

qe-tdd-london-chicago

298

from proffesor-for-testing/agentic-qe

Apply London (mock-based) and Chicago (state-based) TDD schools. Use when practicing test-driven development or choosing testing style for your context.