agent-browser
Browser automation CLI for AI agents. Use when the user needs to interact with websites, including navigating pages, filling forms, clicking buttons, taking screenshots, extracting data, testing web apps, or automating any browser task. Triggers include requests to "open a website", "fill out a form", "click a button", "take a screenshot", "scrape data from a page", "test this web app", "login to a site", "automate browser actions", or any task requiring programmatic web interaction.
Best use case
agent-browser is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Browser automation CLI for AI agents. Use when the user needs to interact with websites, including navigating pages, filling forms, clicking buttons, taking screenshots, extracting data, testing web apps, or automating any browser task. Triggers include requests to "open a website", "fill out a form", "click a button", "take a screenshot", "scrape data from a page", "test this web app", "login to a site", "automate browser actions", or any task requiring programmatic web interaction.
Teams using agent-browser should expect a more consistent output, faster repeated execution, less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in
.claude/skills/agent-browser/SKILL.mdinside your project - Restart your AI agent — it will auto-discover the skill
How agent-browser Compares
| Feature / Agent | agent-browser | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Browser automation CLI for AI agents. Use when the user needs to interact with websites, including navigating pages, filling forms, clicking buttons, taking screenshots, extracting data, testing web apps, or automating any browser task. Triggers include requests to "open a website", "fill out a form", "click a button", "take a screenshot", "scrape data from a page", "test this web app", "login to a site", "automate browser actions", or any task requiring programmatic web interaction.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
Related Guides
SKILL.md Source
# Browser Automation with agent-browser ## Core Workflow Every browser automation follows this pattern: 1. **Navigate**: `agent-browser open <url>` 2. **Snapshot**: `agent-browser snapshot -i` (get element refs like `@e1`, `@e2`) 3. **Interact**: Use refs to click, fill, select 4. **Re-snapshot**: After navigation or DOM changes, get fresh refs ```bash agent-browser open https://example.com/form agent-browser snapshot -i # Output: @e1 [input type="email"], @e2 [input type="password"], @e3 [button] "Submit" agent-browser fill @e1 "user@example.com" agent-browser fill @e2 "password123" agent-browser click @e3 agent-browser wait --load networkidle agent-browser snapshot -i # Check result ``` ## Essential Commands ```bash # Navigation agent-browser open <url> # Navigate (aliases: goto, navigate) agent-browser close # Close browser # Snapshot agent-browser snapshot -i # Interactive elements with refs (recommended) agent-browser snapshot -s "#selector" # Scope to CSS selector # Interaction (use @refs from snapshot) agent-browser click @e1 # Click element agent-browser fill @e2 "text" # Clear and type text agent-browser type @e2 "text" # Type without clearing agent-browser select @e1 "option" # Select dropdown option agent-browser check @e1 # Check checkbox agent-browser press Enter # Press key agent-browser scroll down 500 # Scroll page # Get information agent-browser get text @e1 # Get element text agent-browser get url # Get current URL agent-browser get title # Get page title # Wait agent-browser wait @e1 # Wait for element agent-browser wait --load networkidle # Wait for network idle agent-browser wait --url "**/page" # Wait for URL pattern agent-browser wait 2000 # Wait milliseconds # Capture agent-browser screenshot # Screenshot to temp dir agent-browser screenshot --full # Full page screenshot agent-browser pdf output.pdf # Save as PDF ``` ## Common Patterns ### Form Submission ```bash agent-browser open https://example.com/signup agent-browser snapshot -i agent-browser fill @e1 "Jane Doe" agent-browser fill @e2 "jane@example.com" agent-browser select @e3 "California" agent-browser check @e4 agent-browser click @e5 agent-browser wait --load networkidle ``` ### Authentication with State Persistence ```bash # Login once and save state agent-browser open https://app.example.com/login agent-browser snapshot -i agent-browser fill @e1 "$USERNAME" agent-browser fill @e2 "$PASSWORD" agent-browser click @e3 agent-browser wait --url "**/dashboard" agent-browser state save auth.json # Reuse in future sessions agent-browser state load auth.json agent-browser open https://app.example.com/dashboard ``` ### Data Extraction ```bash agent-browser open https://example.com/products agent-browser snapshot -i agent-browser get text @e5 # Get specific element text agent-browser get text body > page.txt # Get all page text # JSON output for parsing agent-browser snapshot -i --json agent-browser get text @e1 --json ``` ### Parallel Sessions ```bash agent-browser --session site1 open https://site-a.com agent-browser --session site2 open https://site-b.com agent-browser --session site1 snapshot -i agent-browser --session site2 snapshot -i agent-browser session list ``` ### Visual Browser (Debugging) ```bash agent-browser --headed open https://example.com agent-browser highlight @e1 # Highlight element agent-browser record start demo.webm # Record session ``` ### iOS Simulator (Mobile Safari) ```bash # List available iOS simulators agent-browser device list # Launch Safari on a specific device agent-browser -p ios --device "iPhone 16 Pro" open https://example.com # Same workflow as desktop - snapshot, interact, re-snapshot agent-browser -p ios snapshot -i agent-browser -p ios tap @e1 # Tap (alias for click) agent-browser -p ios fill @e2 "text" agent-browser -p ios swipe up # Mobile-specific gesture # Take screenshot agent-browser -p ios screenshot mobile.png # Close session (shuts down simulator) agent-browser -p ios close ``` **Requirements:** macOS with Xcode, Appium (`npm install -g appium && appium driver install xcuitest`) **Real devices:** Works with physical iOS devices if pre-configured. Use `--device "<UDID>"` where UDID is from `xcrun xctrace list devices`. ## Ref Lifecycle (Important) Refs (`@e1`, `@e2`, etc.) are invalidated when the page changes. Always re-snapshot after: - Clicking links or buttons that navigate - Form submissions - Dynamic content loading (dropdowns, modals) ```bash agent-browser click @e5 # Navigates to new page agent-browser snapshot -i # MUST re-snapshot agent-browser click @e1 # Use new refs ``` ## Semantic Locators (Alternative to Refs) When refs are unavailable or unreliable, use semantic locators: ```bash agent-browser find text "Sign In" click agent-browser find label "Email" fill "user@test.com" agent-browser find role button click --name "Submit" agent-browser find placeholder "Search" type "query" agent-browser find testid "submit-btn" click ``` ## Deep-Dive Documentation | Reference | When to Use | |-----------|-------------| | [references/commands.md](references/commands.md) | Full command reference with all options | | [references/snapshot-refs.md](references/snapshot-refs.md) | Ref lifecycle, invalidation rules, troubleshooting | | [references/session-management.md](references/session-management.md) | Parallel sessions, state persistence, concurrent scraping | | [references/authentication.md](references/authentication.md) | Login flows, OAuth, 2FA handling, state reuse | | [references/video-recording.md](references/video-recording.md) | Recording workflows for debugging and documentation | | [references/proxy-support.md](references/proxy-support.md) | Proxy configuration, geo-testing, rotating proxies | ## Ready-to-Use Templates | Template | Description | |----------|-------------| | [templates/form-automation.sh](templates/form-automation.sh) | Form filling with validation | | [templates/authenticated-session.sh](templates/authenticated-session.sh) | Login once, reuse state | | [templates/capture-workflow.sh](templates/capture-workflow.sh) | Content extraction with screenshots | ```bash ./templates/form-automation.sh https://example.com/form ./templates/authenticated-session.sh https://app.example.com/login ./templates/capture-workflow.sh https://example.com ./output ```
Related Skills
leanspec
The spec-coding methodology for AI-assisted development. Use when planning features, creating/refining/implementing/verifying specs, or organising a project. Works with whatever spec backend your team already uses — local markdown, GitHub Issues, Azure DevOps, Jira — by delegating platform-specific details to a LeanSpec adapter.
watch-ci
Watch GitHub Actions CI status for the current commit until completion. Use after pushing changes to monitor build results.
parallel-worktrees
Run multiple AI coding agent sessions in parallel using git worktrees — each agent isolated in its own worktree, working on a separate branch. Use this skill whenever the user wants to: run two or more AI agents simultaneously on different features or bugs, set up isolated agent workspaces in the same repo, push parallel branches to GitHub and open/update PRs, coordinate between concurrent agent sessions, or clean up after merging. Triggers on: "parallel agents", "multiple agent sessions", "git worktree", "run agents in parallel", "work on two things at once", "isolated agent workspace", "spin up another agent", or any request involving simultaneous AI-assisted development streams.
leanspec-development
Development workflows, commands, publishing, CI/CD, changelog management, and contribution guidelines for LeanSpec. Use when contributing code, fixing bugs, setting up dev environment, running tests or linting, working with the monorepo structure, looking up build/dev/test/publish/format/lint commands, preparing releases, publishing to npm, bumping versions, syncing package versions, testing dev builds, troubleshooting npm distribution, updating changelogs, triggering CI/CD workflows, monitoring build status, debugging failed runs, managing artifacts, checking CI before releases, or researching AI agent runners. Triggers include any development, scripting, publishing, CI/CD, changelog, or runner research task in this project.
github-integration
Enable the GitHub CLI (`gh`) in Claude Code cloud sessions and GitHub Copilot coding agent environments. Use this skill when: (1) setting up a project so cloud AI agents can use `gh` for PRs, issues, and releases, (2) configuring setup scripts or SessionStart hooks for `gh` installation, (3) adding `copilot-setup-steps.yml` for GitHub Copilot agents, (4) troubleshooting `gh` auth failures in cloud sessions, or (5) configuring `GH_TOKEN` for headless environments. Triggers on: "enable gh", "github integration", "Claude Code cloud setup", "copilot setup steps", "gh auth in cloud", "gh not working in cloud", "setup script", or any request involving GitHub CLI access from cloud-based AI coding agents.
browser-qa
## When to use
browser-extension-builder
Expert in building browser extensions that solve real problems - Chrome, Firefox, and cross-browser extensions. Covers extension architecture, manifest v3, content scripts, popup UIs, monetization strategies, and Chrome Web Store publishing.
browser-automation
Browser automation powers web testing, scraping, and AI agent interactions. The difference between a flaky script and a reliable system comes down to understanding selectors, waiting strategies, and anti-detection patterns.
my-browser-agent
A custom browser automation skill using Playwright.
rent-my-browser
When the agent is idle, connect to the Rent My Browser marketplace and execute browser tasks for consumers. Earn money by renting out the node's browser during downtime. Supports headless (Playwright) on VPS nodes and real Chrome on GUI machines.
browser-cdp
Real Chrome browser automation via CDP Proxy — access pages with full user login state, bypass anti-bot detection, perform interactive operations (click/fill/scroll), extract dynamic JavaScript-rendered content, take screenshots. Triggers (satisfy ANY one): - Target URL is a search results page (Bing/Google/YouTube search) - Static fetch (agent-reach/WebFetch) is blocked by anti-bot (captcha/intercept/empty) - Need to read logged-in user's private content - YouTube, Twitter/X, Xiaohongshu, WeChat public accounts, etc. - Task involves "click", "fill form", "scroll", "drag" - Need screenshot or dynamic-rendered page capture
browser-agent-server
Deploy a full-stack multi-model Browser Agent system with FastAPI server, real-time dashboard, VNC streaming, and LLM Council mode. Use when the user asks to set up browser automation, build a browser agent, deploy an AI web agent, create a browser-use server, or needs multi-model browser automation with strategies like council, consensus, fallback chain, or planner-executor.