agent-browser

Browser automation CLI for direct website interaction. Use when the user needs to open URLs, click buttons, fill forms, take screenshots, log in, or test web apps. NOT for web search.

586 stars

Best use case

agent-browser is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Browser automation CLI for direct website interaction. Use when the user needs to open URLs, click buttons, fill forms, take screenshots, log in, or test web apps. NOT for web search.

Teams using agent-browser should expect a more consistent output, faster repeated execution, less prompt rewriting, better workflow continuity with your supporting tools.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.
  • You already have the supporting tools or dependencies needed by this skill.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/agent-browser/SKILL.md --create-dirs "https://raw.githubusercontent.com/CodePhiliaX/youclaw/main/skills/agent-browser/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/agent-browser/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How agent-browser Compares

Feature / Agentagent-browserStandard Approach
Platform SupportNot specifiedLimited / Varies
Context Awareness High Baseline
Installation ComplexityUnknownN/A

Frequently Asked Questions

What does this skill do?

Browser automation CLI for direct website interaction. Use when the user needs to open URLs, click buttons, fill forms, take screenshots, log in, or test web apps. NOT for web search.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

SKILL.md Source

# Browser Automation with agent-browser

## Performance Rules (CRITICAL)

- **ALWAYS chain commands with `&&`** when you don't need intermediate output. Each separate tool call costs seconds of round-trip latency.
- **ALWAYS combine** open + wait + snapshot into one call: `agent-browser open <url> && agent-browser wait --load load && agent-browser snapshot -i`
- **ALWAYS batch** multiple interactions (fill, click, select) into one `&&` chain when refs are already known.
- **Use `--load load`** (DOM load event) by default. Only use `networkidle` when you specifically need all XHR/fetch to complete (e.g., waiting for API-driven content).
- **Do NOT snapshot after every interaction** — only re-snapshot when you need to discover new element refs (after navigation or major DOM changes).

## Core Workflow

Every browser automation follows this pattern:

1. **Navigate + Snapshot** (one call): `agent-browser open <url> && agent-browser wait --load load && agent-browser snapshot -i`
2. **Interact**: Batch all interactions using known refs in one `&&` chain
3. **Re-snapshot**: Only after navigation or major DOM changes

```bash
# Step 1: Open and discover elements (ONE tool call)
agent-browser open https://example.com/form && agent-browser wait --load load && agent-browser snapshot -i
# Output: @e1 [input type="email"], @e2 [input type="password"], @e3 [button] "Submit"

# Step 2: Batch all interactions (ONE tool call)
agent-browser fill @e1 "user@example.com" && agent-browser fill @e2 "password123" && agent-browser click @e3 && agent-browser wait --load load

# Step 3: Only snapshot if you need to verify or discover new elements
agent-browser snapshot -i
```

This reduces 7+ round-trips to just 2-3.

## Essential Commands

```bash
# Navigation
agent-browser open <url>              # Navigate (aliases: goto, navigate)
agent-browser close                   # Close browser

# Snapshot
agent-browser snapshot -i             # Interactive elements with refs (recommended)
agent-browser snapshot -i -C          # Include cursor-interactive elements
agent-browser snapshot -s "#selector" # Scope to CSS selector

# Interaction (use @refs from snapshot)
agent-browser click @e1               # Click element
agent-browser fill @e2 "text"         # Clear and type text
agent-browser type @e2 "text"         # Type without clearing
agent-browser select @e1 "option"     # Select dropdown option
agent-browser check @e1               # Check checkbox
agent-browser press Enter             # Press key
agent-browser scroll down 500         # Scroll page

# Get information
agent-browser get text @e1            # Get element text
agent-browser get url                 # Get current URL
agent-browser get title               # Get page title

# Wait
agent-browser wait @e1                # Wait for element
agent-browser wait --load load        # Wait for DOM load (fast, preferred)
agent-browser wait --load networkidle # Wait for network idle (slow, use only when needed)
agent-browser wait --url "**/page"    # Wait for URL pattern
agent-browser wait 2000               # Wait milliseconds

# Capture
agent-browser screenshot              # Screenshot to temp dir
agent-browser screenshot --full       # Full page screenshot
agent-browser screenshot --annotate   # Annotated screenshot with numbered element labels
agent-browser pdf output.pdf          # Save as PDF

# Diff (compare page states)
agent-browser diff snapshot                          # Compare current vs last snapshot
agent-browser diff screenshot --baseline before.png  # Visual pixel diff
agent-browser diff url <url1> <url2>                 # Compare two pages
```

## Common Patterns

### Form Submission

```bash
# Step 1: Open and discover elements (ONE call)
agent-browser open https://example.com/signup && agent-browser wait --load load && agent-browser snapshot -i

# Step 2: Fill and submit (ONE call)
agent-browser fill @e1 "Jane Doe" && agent-browser fill @e2 "jane@example.com" && agent-browser select @e3 "California" && agent-browser check @e4 && agent-browser click @e5 && agent-browser wait --load load
```

### Authentication with State Persistence

```bash
# Login: open + snapshot (ONE call)
agent-browser open https://app.example.com/login && agent-browser wait --load load && agent-browser snapshot -i

# Fill credentials + submit (ONE call)
agent-browser fill @e1 "$USERNAME" && agent-browser fill @e2 "$PASSWORD" && agent-browser click @e3 && agent-browser wait --url "**/dashboard"
agent-browser state save auth.json

# Reuse in future sessions
agent-browser state load auth.json && agent-browser open https://app.example.com/dashboard
```

### Data Extraction

```bash
# Open + snapshot in one call
agent-browser open https://example.com/products && agent-browser wait --load load && agent-browser snapshot -i
agent-browser get text @e5           # Get specific element text
agent-browser get text body > page.txt  # Get all page text

# JSON output for parsing
agent-browser snapshot -i --json
```

## Ref Lifecycle (Important)

Refs (`@e1`, `@e2`, etc.) are invalidated when the page changes. Always re-snapshot after:

- Clicking links or buttons that navigate
- Form submissions
- Dynamic content loading (dropdowns, modals)

```bash
agent-browser click @e5              # Navigates to new page
agent-browser snapshot -i            # MUST re-snapshot
agent-browser click @e1              # Use new refs
```

## Annotated Screenshots (Vision Mode)

Use `--annotate` to take a screenshot with numbered labels overlaid on interactive elements.

```bash
agent-browser screenshot --annotate
# Output includes the image path and a legend:
#   [1] @e1 button "Submit"
#   [2] @e2 link "Home"
#   [3] @e3 textbox "Email"
agent-browser click @e2              # Click using ref from annotated screenshot
```

## Semantic Locators (Alternative to Refs)

When refs are unavailable or unreliable, use semantic locators:

```bash
agent-browser find text "Sign In" click
agent-browser find label "Email" fill "user@test.com"
agent-browser find role button click --name "Submit"
agent-browser find placeholder "Search" type "query"
```

## Session Management

Always close your browser session when done:

```bash
agent-browser close                    # Close default session
agent-browser --session agent1 close   # Close specific session
```

## Browser Profile (Persistent Login)

If the system prompt provides `--session`, `--profile`, `--headed`, and/or `--executable-path` parameters, you MUST include them in every `agent-browser` command. This allows reusing persistent login state (cookies, localStorage, etc.).

```bash
agent-browser --session my-profile --profile /path/to/profile --headed open https://app.example.com
agent-browser --session my-profile --profile /path/to/profile --headed snapshot -i
```

### Troubleshooting
If `agent-browser` fails to launch:
1. Try `agent-browser install chrome` then retry once
2. If headed mode fails, try without `--headed` (headless) while keeping `--profile` and `--session`
3. Do NOT retry the same failing command more than 2 times

## Default Mode (No Profile)

When no browser profile is specified in the system prompt, use agent-browser without `--headed`, `--profile`, or `--session` flags. The browser runs in headless mode by default.

Related Skills

agent-browser

13089
from EveryInc/compound-engineering-plugin

Browser automation CLI for AI agents. Use when the user needs to interact with websites, including navigating pages, filling forms, clicking buttons, taking screenshots, extracting data, testing web apps, or automating any browser task. Triggers include requests to "open a website", "fill out a form", "click a button", "take a screenshot", "scrape data from a page", "test this web app", "login to a site", "automate browser actions", or any task requiring programmatic web interaction.

agent-browser

5966
from JimmyLv/BibiGPT-v1

Browser automation CLI for AI agents. Use when the user needs to interact with websites, including navigating pages, filling forms, clicking buttons, taking screenshots, extracting data, testing web apps, or automating any browser task. Triggers include requests to "open a website", "fill out a form", "click a button", "take a screenshot", "scrape data from a page", "test this web app", "login to a site", "automate browser actions", or any task requiring programmatic web interaction.

agent-browser-zh

3891
from openclaw/skills

A fast Rust-based headless browser automation CLI with Node.js fallback that enables AI agents to navigate, click, type, and snapshot pages via structured commands. (Chinese localized version)

agent-browser-local

3891
from openclaw/skills

Headless browser automation CLI optimized for AI agents with accessibility tree snapshots and ref-based element selection

agent-browser-with-camoufox

3891
from openclaw/skills

One-click deployment of camoufox anti-detection browser with modified agent-browser. Patches agent-browser to auto-detect camoufox/firefox from executable path instead of defaulting to chromium. Includes SkillsI integration for seamless browser automation workflows.

agent-browser-automation

3817
from openclaw/skills

Headless browser automation CLI for AI agents using native Rust binary with Chrome DevTools Protocol

agent-browser

1864
from LeoYeAI/openclaw-master-skills

A fast Rust-based headless browser automation CLI with Node.js fallback that enables AI agents to navigate, click, type, and snapshot pages via structured commands.

agent-browser

1802
from FreedomIntelligence/OpenClaw-Medical-Skills

Browse the web for any task — research topics, read articles, interact with web apps, fill forms, take screenshots, extract data, and test web pages. Use whenever a browser would be useful, not just when the user explicitly asks.

agent-browser

1592
from openakita/openakita

Browser automation for AI agents via inference.sh. Navigate web pages, interact with elements using @e refs, take screenshots, record video. Capabilities: web scraping, form filling, clicking, typing, drag-drop, file upload, JavaScript execution. Use for: web automation, data extraction, testing, agent browsing, research. Triggers: browser, web automation, scrape, navigate, click, fill form, screenshot, browse web, playwright, headless browser, web agent, surf internet, record video

agent-browser

1172
from inclusionAI/AWorld

Automates browser interactions for web testing, form filling, screenshots, and data extraction. Use when the user needs to navigate websites, interact with web pages, fill forms, take screenshots, test web applications, or extract information from web pages.

core-agent-browser

984
from actionbook/rust-skills

No description provided.

agent-browser

685
from openai/plugins

Browser automation CLI for AI agents. Use when the user needs to interact with websites, verify dev server output, test web apps, navigate pages, fill forms, click buttons, take screenshots, extract data, or automate any browser task. Also triggers when a dev server starts so you can verify it visually.