agent-browser
Automates browser interactions via CLI using agent-browser by Vercel Labs. Covers navigation, clicking, form filling, snapshots, refs-based selectors, agent mode with JSON output, session management, and CDP integration. Use when the user needs to automate web browsing, scrape pages, fill forms, or integrate browser automation into AI agent workflows.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in
.claude/skills/agent-browser/SKILL.mdinside your project - Restart your AI agent — it will auto-discover the skill
How agent-browser Compares
| Feature / Agent | agent-browser | Standard Approach |
|---|---|---|
| Platform Support | multi | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Automates browser interactions via CLI using agent-browser by Vercel Labs. Covers navigation, clicking, form filling, snapshots, refs-based selectors, agent mode with JSON output, session management, and CDP integration. Use when the user needs to automate web browsing, scrape pages, fill forms, or integrate browser automation into AI agent workflows.
Which AI agents support this skill?
This skill is compatible with multi.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
## When to use this skill Use this skill whenever the user wants to: - Automate browser interactions (click, fill, navigate, screenshot) via CLI - Scrape web content or extract data from pages - Build AI agent workflows that interact with websites - Use refs-based element selection for deterministic automation - Run browser automation in agent mode with JSON output - Manage authenticated sessions with custom headers or CDP ## How to use this skill This skill is organized to match the agent-browser official documentation structure (https://github.com/vercel-labs/agent-browser/blob/main/README.md). When working with agent-browser: ### Quick-Start Example: Snapshot → Identify → Interact ```bash # 1. Install npm install -g @anthropic-ai/agent-browser # 2. Open a page and take a snapshot to get element refs agent-browser open "https://example.com" agent-browser snapshot # Output includes refs like @e1, @e2, @e3 for each element # 3. Click an element by ref agent-browser click @e3 # 4. Fill a form field agent-browser fill @e5 "hello@example.com" # 5. Agent mode (JSON output for programmatic use) agent-browser snapshot --json ``` ### Detailed Documentation 1. **Install agent-browser**: - Load `examples/getting-started/installation.md` for installation instructions 2. **Quick Start**: - Load `examples/quick-start/quick-start.md` for basic workflow examples 3. **Learn core commands**: - Load `examples/commands/basic-commands.md` for basic commands (open, click, fill, etc.) - Load `examples/commands/advanced-commands.md` for advanced commands (snapshot, eval, etc.) - Load `examples/commands/get-info/` for information retrieval commands - Load `examples/commands/check-state/` for state checking commands - Load `examples/commands/find-elements/` for semantic locator commands - Load `examples/commands/wait/` for wait commands - Load `examples/commands/mouse-control/` for mouse control commands - Load `examples/commands/browser-settings/` for browser configuration - Load `examples/commands/cookies-storage/` for cookies and storage management - Load `examples/commands/network/` for network interception - Load `examples/commands/tabs-windows/` for tab and window management - Load `examples/commands/frames/` for iframe handling - Load `examples/commands/dialogs/` for dialog handling - Load `examples/commands/debug/` for debugging commands - Load `examples/commands/navigation/` for navigation commands - Load `examples/commands/setup/` for setup commands 4. **Understand selectors**: - Load `examples/selectors/refs.md` for refs-based selection (@e1, @e2, etc.) - Load `examples/selectors/traditional-selectors.md` for CSS, XPath, and semantic locators 5. **Use agent mode**: - Load `examples/agent-mode/introduction.md` for agent mode overview - Load `examples/agent-mode/optimal-workflow.md` for optimal AI workflow - Load `examples/agent-mode/integration.md` for integrating with AI agents 6. **Advanced features**: - Load `examples/advanced/sessions.md` for session management - Load `examples/advanced/headed-mode.md` for debugging with visible browser - Load `examples/advanced/authenticated-sessions.md` for authentication via headers - Load `examples/advanced/custom-executable.md` for custom browser executable - Load `examples/advanced/cdp-mode.md` for Chrome DevTools Protocol integration - Load `examples/advanced/streaming.md` for browser viewport streaming - Load `examples/advanced/architecture.md` for architecture overview - Load `examples/advanced/platforms.md` for platform support - Load `examples/advanced/usage-with-agents.md` for AI agent integration patterns 7. **Configure options**: - Load `examples/options/global-options.md` for global CLI options - Load `examples/options/snapshot-options.md` for snapshot-specific options - Load `examples/options/session-options.md` for session management options 8. **Reference API documentation** when needed: - `api/commands.md` - Complete command reference - `api/selectors.md` - Selector reference - `api/options.md` - Options reference 9. **Use templates** for quick start: - `templates/basic-automation.md` - Basic automation workflow - `templates/ai-agent-workflow.md` - AI agent workflow template ### Doc mapping (one-to-one with official documentation) - See examples and API files → https://github.com/vercel-labs/agent-browser ## Examples and Templates This skill includes detailed examples organized to match the official documentation structure. All examples are in the `examples/` directory (see mapping above). **To use examples:** - Identify the topic from the user's request - Load the appropriate example file from the mapping above - Follow the instructions, syntax, and best practices in that file - Adapt the code examples to your specific use case **To use templates:** - Reference templates in `templates/` directory for common scaffolding - Adapt templates to your specific needs and coding style ## API Reference - **Commands API**: `api/commands.md` - Complete command reference with syntax and examples - **Selectors API**: `api/selectors.md` - Selector types and usage reference - **Options API**: `api/options.md` - All options reference ## Best Practices 1. **Use Refs**: Prefer refs (@e1, @e2) over traditional selectors for deterministic automation 2. **Snapshot First**: Always snapshot before interacting with elements to get refs 3. **Agent Mode**: Use `--json` flag for machine-readable output in agent mode 4. **Session Management**: Use `--session` to maintain state across commands 5. **Interactive Snapshot**: Use `-i` flag for interactive snapshot selection 6. **Semantic Locators**: Use semantic locators (role/name) when refs are not available 7. **Error Handling**: Check command exit codes and error messages 8. **Wait for Navigation**: Commands automatically wait for navigation to complete 9. **Headed Mode**: Use `--headed` for debugging, headless for production 10. **CDP Integration**: Use `--cdp` for Chrome DevTools Protocol integration 11. **Streaming**: Use `AGENT_BROWSER_STREAM_PORT` for live browser preview 12. **Authenticated Sessions**: Use `--headers` for authentication without login flows 13. **Custom Executable**: Use `--executable-path` for serverless deployments or custom browsers 14. **Snapshot Options**: Combine `-i`, `-c`, `-d`, `-s` options to optimize snapshot output ## Resources - **GitHub Repository**: https://github.com/vercel-labs/agent-browser - **Official README**: https://github.com/vercel-labs/agent-browser/blob/main/README.md - **Agent Mode Documentation**: https://agent-browser.dev/agent-mode - **Issues**: https://github.com/vercel-labs/agent-browser/issues ## Keywords agent-browser, CLI browser automation, AI agents, browser automation CLI, refs, snapshot, agent mode, semantic locators, browser automation tool, command-line browser, AI agent browser, deterministic selectors, accessibility tree, browser commands, web automation CLI, sessions, headed mode, authenticated sessions, CDP mode, streaming, Chrome DevTools Protocol, Playwright, browser automation for AI