Agent Browser Automation

Fast Rust-based headless browser automation CLI with Node.js fallback for AI agents, featuring navigation, clicking, typing, snapshots, and structured commands optimized for agent workflows.

97 stars

byPramodDutta

View on GitHub Installation ↓

Best use case

Agent Browser Automation is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Fast Rust-based headless browser automation CLI with Node.js fallback for AI agents, featuring navigation, clicking, typing, snapshots, and structured commands optimized for agent workflows.

Teams using Agent Browser Automation should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/agent-browser/SKILL.md --create-dirs "https://raw.githubusercontent.com/PramodDutta/qaskills/main/seed-skills/agent-browser/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/agent-browser/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How Agent Browser Automation Compares

Feature / Agent	Agent Browser Automation	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Fast Rust-based headless browser automation CLI with Node.js fallback for AI agents, featuring navigation, clicking, typing, snapshots, and structured commands optimized for agent workflows.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Agent Browser Automation Skill

You are an expert in browser automation using agent-browser, a fast Rust-based CLI tool designed specifically for AI coding agents. When the user asks you to automate browser interactions, perform web scraping, or test web applications, follow these detailed instructions.

## Core Principles

1. **Speed-first architecture** -- Rust core provides millisecond-level responsiveness for agent commands.
2. **Structured output** -- All commands return JSON output optimized for AI agent parsing.
3. **Fallback resilience** -- Automatically falls back to Node.js implementation when Rust binary is unavailable.
4. **Agent-optimized commands** -- CLI designed for programmatic control, not human interaction.
5. **Snapshot-driven debugging** -- Every action can capture page snapshots for AI agent analysis.

## Installation

```bash
# Install globally via npm
npm install -g agent-browser

# Or use npx without installation
npx agent-browser --version

# Verify installation
agent-browser --help
```

## Project Structure for Agent-Driven Testing

```
tests/
  browser-automation/
    scenarios/
      login-flow.json
      checkout-flow.json
      search-navigation.json
    snapshots/
      login-success.png
      cart-state.png
    scripts/
      run-scenario.sh
      batch-test.sh
  config/
    browser-config.json
    viewport-sizes.json
```

## Core Commands

### Navigation Commands

```bash
# Navigate to URL
agent-browser navigate --url "https://example.com"

# Navigate with custom viewport
agent-browser navigate --url "https://example.com" --viewport "1920x1080"

# Navigate with custom user agent
agent-browser navigate --url "https://example.com" \
  --user-agent "Mozilla/5.0 (iPhone; CPU iPhone OS 14_0 like Mac OS X)"

# Navigate and wait for network idle
agent-browser navigate --url "https://example.com" --wait-until "networkidle"
```

**Output:**
```json
{
  "success": true,
  "url": "https://example.com",
  "title": "Example Domain",
  "loadTime": 234,
  "timestamp": "2024-02-12T10:30:00Z"
}
```

### Clicking Elements

```bash
# Click by selector
agent-browser click --selector "button#submit"

# Click by text content
agent-browser click --text "Sign In"

# Click with coordinate offset
agent-browser click --selector ".dropdown" --offset "10,20"

# Wait for element before clicking
agent-browser click --selector "button.load-more" --wait 5000
```

**Output:**
```json
{
  "success": true,
  "element": "button#submit",
  "action": "click",
  "timestamp": "2024-02-12T10:30:01Z"
}
```

### Typing Input

```bash
# Type into input field
agent-browser type --selector "input#email" --text "user@example.com"

# Type with delay between keystrokes (human-like)
agent-browser type --selector "input#search" --text "automation" --delay 50

# Type and press Enter
agent-browser type --selector "input#search" --text "query" --enter

# Clear field before typing
agent-browser type --selector "input#username" --text "newuser" --clear
```

**Output:**
```json
{
  "success": true,
  "element": "input#email",
  "action": "type",
  "length": 17,
  "timestamp": "2024-02-12T10:30:02Z"
}
```

### Taking Snapshots

```bash
# Full page screenshot
agent-browser snapshot --output "screenshot.png"

# Snapshot specific element
agent-browser snapshot --selector "#content" --output "content.png"

# Snapshot with custom dimensions
agent-browser snapshot --viewport "1920x1080" --output "desktop.png"

# Get page HTML
agent-browser snapshot --format "html" --output "page.html"

# Get page as PDF
agent-browser snapshot --format "pdf" --output "document.pdf"
```

**Output:**
```json
{
  "success": true,
  "format": "png",
  "path": "screenshot.png",
  "size": 245678,
  "dimensions": {
    "width": 1280,
    "height": 720
  },
  "timestamp": "2024-02-12T10:30:03Z"
}
```

### Extracting Data

```bash
# Extract text from element
agent-browser extract --selector "h1.title" --attribute "text"

# Extract attribute value
agent-browser extract --selector "a.link" --attribute "href"

# Extract multiple elements
agent-browser extract --selector "li.item" --all

# Extract as JSON
agent-browser extract --selector ".product-card" --json \
  --fields "title:.title,price:.price,image:img@src"
```

**Output:**
```json
{
  "success": true,
  "selector": "h1.title",
  "results": [
    {
      "text": "Welcome to Our Site",
      "visible": true
    }
  ],
  "count": 1,
  "timestamp": "2024-02-12T10:30:04Z"
}
```

### Waiting for Elements

```bash
# Wait for element to appear
agent-browser wait --selector ".loading-complete" --timeout 10000

# Wait for element to disappear
agent-browser wait --selector ".spinner" --hidden --timeout 5000

# Wait for text to appear
agent-browser wait --text "Success" --timeout 3000

# Wait for custom condition
agent-browser wait --script "document.readyState === 'complete'"
```

**Output:**
```json
{
  "success": true,
  "condition": "element_visible",
  "selector": ".loading-complete",
  "waited": 1234,
  "timestamp": "2024-02-12T10:30:05Z"
}
```

## Scenario-Based Automation

### Login Flow Example

```bash
#!/bin/bash
# login-test.sh

# Navigate to login page
agent-browser navigate --url "https://app.example.com/login"

# Fill email
agent-browser type --selector "input#email" --text "user@example.com"

# Fill password
agent-browser type --selector "input#password" --text "$PASSWORD"

# Click submit
agent-browser click --selector "button[type='submit']"

# Wait for dashboard
agent-browser wait --selector ".dashboard" --timeout 5000

# Take success snapshot
agent-browser snapshot --output "login-success.png"

# Extract user info
agent-browser extract --selector ".user-name" --attribute "text"
```

### E2E Shopping Flow

```bash
#!/bin/bash
# checkout-flow.sh

set -e  # Exit on any error

echo "Starting checkout flow test..."

# 1. Navigate to product page
agent-browser navigate --url "https://shop.example.com/products/widget-123"
agent-browser wait --selector ".product-details" --timeout 5000

# 2. Add to cart
agent-browser click --selector "button.add-to-cart"
agent-browser wait --text "Added to cart" --timeout 3000

# 3. Open cart
agent-browser click --selector ".cart-icon"
agent-browser wait --selector ".cart-items" --timeout 2000

# 4. Take cart snapshot
agent-browser snapshot --selector ".cart-summary" --output "cart-state.png"

# 5. Proceed to checkout
agent-browser click --text "Proceed to Checkout"
agent-browser wait --selector ".checkout-form" --timeout 5000

# 6. Fill shipping info
agent-browser type --selector "#shipping-name" --text "John Doe"
agent-browser type --selector "#shipping-address" --text "123 Main St"
agent-browser type --selector "#shipping-city" --text "San Francisco"

# 7. Take final snapshot
agent-browser snapshot --output "checkout-complete.png"

echo "Checkout flow completed successfully"
```

## Integration with AI Agent Workflows

### TypeScript/JavaScript Integration

```typescript
import { execSync } from 'child_process';

interface BrowserResult {
  success: boolean;
  [key: string]: any;
}

class AgentBrowser {
  private baseCommand = 'agent-browser';

  async navigate(url: string, options: { viewport?: string; waitUntil?: string } = {}): Promise<BrowserResult> {
    const args = [`navigate`, `--url`, `"${url}"`];
    if (options.viewport) args.push(`--viewport`, options.viewport);
    if (options.waitUntil) args.push(`--wait-until`, options.waitUntil);

    return this.exec(args);
  }

  async click(selector: string, options: { wait?: number; offset?: string } = {}): Promise<BrowserResult> {
    const args = [`click`, `--selector`, `"${selector}"`];
    if (options.wait) args.push(`--wait`, options.wait.toString());
    if (options.offset) args.push(`--offset`, options.offset);

    return this.exec(args);
  }

  async type(selector: string, text: string, options: { delay?: number; enter?: boolean; clear?: boolean } = {}): Promise<BrowserResult> {
    const args = [`type`, `--selector`, `"${selector}"`, `--text`, `"${text}"`];
    if (options.delay) args.push(`--delay`, options.delay.toString());
    if (options.enter) args.push(`--enter`);
    if (options.clear) args.push(`--clear`);

    return this.exec(args);
  }

  async snapshot(output: string, options: { format?: string; selector?: string; viewport?: string } = {}): Promise<BrowserResult> {
    const args = [`snapshot`, `--output`, `"${output}"`];
    if (options.format) args.push(`--format`, options.format);
    if (options.selector) args.push(`--selector`, `"${options.selector}"`);
    if (options.viewport) args.push(`--viewport`, options.viewport);

    return this.exec(args);
  }

  async extract(selector: string, options: { attribute?: string; all?: boolean; json?: boolean } = {}): Promise<BrowserResult> {
    const args = [`extract`, `--selector`, `"${selector}"`];
    if (options.attribute) args.push(`--attribute`, options.attribute);
    if (options.all) args.push(`--all`);
    if (options.json) args.push(`--json`);

    return this.exec(args);
  }

  private exec(args: string[]): BrowserResult {
    const command = `${this.baseCommand} ${args.join(' ')}`;
    try {
      const output = execSync(command, { encoding: 'utf-8' });
      return JSON.parse(output);
    } catch (error: any) {
      return {
        success: false,
        error: error.message,
        command,
      };
    }
  }
}

// Usage example
async function testLoginFlow() {
  const browser = new AgentBrowser();

  // Navigate
  const navResult = await browser.navigate('https://app.example.com/login', {
    waitUntil: 'networkidle',
  });

  if (!navResult.success) {
    throw new Error(`Navigation failed: ${navResult.error}`);
  }

  // Login
  await browser.type('#email', 'user@example.com');
  await browser.type('#password', process.env.PASSWORD || '', { enter: true });

  // Wait for dashboard
  await browser.wait('.dashboard', { timeout: 5000 });

  // Capture success state
  await browser.snapshot('login-success.png');

  console.log('Login test completed successfully');
}
```

### Python Integration

```python
import json
import subprocess
from typing import Dict, Any, Optional

class AgentBrowser:
    def __init__(self, browser_path: str = "agent-browser"):
        self.browser_path = browser_path

    def navigate(self, url: str, viewport: Optional[str] = None, wait_until: Optional[str] = None) -> Dict[str, Any]:
        args = [self.browser_path, "navigate", "--url", url]
        if viewport:
            args.extend(["--viewport", viewport])
        if wait_until:
            args.extend(["--wait-until", wait_until])
        return self._exec(args)

    def click(self, selector: str, wait: Optional[int] = None, offset: Optional[str] = None) -> Dict[str, Any]:
        args = [self.browser_path, "click", "--selector", selector]
        if wait:
            args.extend(["--wait", str(wait)])
        if offset:
            args.extend(["--offset", offset])
        return self._exec(args)

    def type(self, selector: str, text: str, delay: Optional[int] = None, enter: bool = False, clear: bool = False) -> Dict[str, Any]:
        args = [self.browser_path, "type", "--selector", selector, "--text", text]
        if delay:
            args.extend(["--delay", str(delay)])
        if enter:
            args.append("--enter")
        if clear:
            args.append("--clear")
        return self._exec(args)

    def snapshot(self, output: str, format: Optional[str] = None, selector: Optional[str] = None) -> Dict[str, Any]:
        args = [self.browser_path, "snapshot", "--output", output]
        if format:
            args.extend(["--format", format])
        if selector:
            args.extend(["--selector", selector])
        return self._exec(args)

    def extract(self, selector: str, attribute: Optional[str] = None, all: bool = False) -> Dict[str, Any]:
        args = [self.browser_path, "extract", "--selector", selector]
        if attribute:
            args.extend(["--attribute", attribute])
        if all:
            args.append("--all")
        return self._exec(args)

    def _exec(self, args: list) -> Dict[str, Any]:
        try:
            result = subprocess.run(args, capture_output=True, text=True, check=True)
            return json.loads(result.stdout)
        except subprocess.CalledProcessError as e:
            return {
                "success": False,
                "error": e.stderr,
                "command": " ".join(args)
            }

# Usage example
def test_search_flow():
    browser = AgentBrowser()

    # Navigate to search page
    nav = browser.navigate("https://example.com/search")
    assert nav["success"], f"Navigation failed: {nav.get('error')}"

    # Enter search query
    browser.type("input#search", "automation testing", enter=True)

    # Wait for results
    browser.wait(".search-results", timeout=5000)

    # Extract result titles
    results = browser.extract(".result-item h3", attribute="text", all=True)

    print(f"Found {len(results.get('results', []))} search results")

    # Take snapshot
    browser.snapshot("search-results.png")
```

## Configuration Management

### Browser Config File

```json
{
  "viewport": {
    "width": 1920,
    "height": 1080
  },
  "userAgent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36",
  "timeout": {
    "navigation": 30000,
    "element": 10000,
    "network": 5000
  },
  "screenshot": {
    "format": "png",
    "quality": 80,
    "fullPage": true
  },
  "headless": true,
  "slowMo": 0,
  "devtools": false
}
```

### Load Config in Commands

```bash
# Use config file
agent-browser --config ./browser-config.json navigate --url "https://example.com"

# Override config values
agent-browser --config ./browser-config.json --viewport "1280x720" navigate --url "https://example.com"
```

## Performance Optimization

### Rust vs Node.js Performance

**Rust Implementation Benefits:**
- 10-50x faster command execution
- Lower memory footprint (20-30 MB vs 100+ MB for Node.js)
- Instant startup time (<100ms vs 500-1000ms)
- Compiled binary distribution (no runtime dependencies)

**When Node.js Fallback Activates:**
- Rust binary not available for platform
- Specific browser features requiring Node.js Puppeteer/Playwright
- Debug mode requiring Chrome DevTools Protocol

### Batch Command Optimization

```bash
# Chain commands for efficiency
agent-browser chain \
  "navigate --url https://example.com" \
  "type --selector #search --text query" \
  "click --selector button" \
  "wait --selector .results" \
  "snapshot --output results.png"

# Output includes timing for each step
```

## Best Practices

1. **Use JSON output parsing** -- Always parse JSON responses for reliable AI agent integration.
2. **Implement retry logic** -- Network issues and timing can cause transient failures.
3. **Take snapshots liberally** -- Visual snapshots help AI agents understand page state.
4. **Use semantic selectors** -- Prefer IDs and stable data attributes over brittle CSS selectors.
5. **Set appropriate timeouts** -- Balance speed with reliability based on page complexity.
6. **Chain related commands** -- Use command chaining to reduce overhead.
7. **Cache browser instances** -- Reuse browser contexts for faster subsequent commands.
8. **Validate responses** -- Check success field before proceeding with workflow.
9. **Log all actions** -- Maintain audit trail for debugging and compliance.
10. **Use structured scenarios** -- Define reusable JSON scenarios for common flows.

## Anti-Patterns to Avoid

1. **Hardcoded waits** -- Avoid fixed delays; use element waiting instead.
2. **Ignoring errors** -- Always check success field and handle failures gracefully.
3. **Over-reliance on XPath** -- Prefer CSS selectors for better performance.
4. **Snapshot overload** -- Don't snapshot every step; focus on key states.
5. **Single-use scripts** -- Create reusable functions instead of one-off scripts.
6. **No error recovery** -- Implement retry and fallback mechanisms.
7. **Unvalidated input** -- Sanitize and validate all user input before typing.
8. **Resource leaks** -- Always close browser instances when done.
9. **Ignoring performance** -- Monitor command execution times and optimize.
10. **Poor selector strategies** -- Avoid deep nesting and position-dependent selectors.

## Common Scenarios

### Form Automation

```bash
# Registration form
agent-browser navigate --url "https://app.example.com/register"
agent-browser type --selector "#firstName" --text "John"
agent-browser type --selector "#lastName" --text "Doe"
agent-browser type --selector "#email" --text "john@example.com"
agent-browser type --selector "#password" --text "SecurePass123!"
agent-browser click --selector "#terms-checkbox"
agent-browser click --selector "button[type='submit']"
agent-browser wait --text "Registration successful" --timeout 5000
agent-browser snapshot --output "registration-success.png"
```

### Data Extraction

```bash
# Extract product information
agent-browser navigate --url "https://shop.example.com/products"
agent-browser wait --selector ".product-grid" --timeout 5000
agent-browser extract --selector ".product-card" --json \
  --fields "name:.product-name,price:.product-price,rating:.rating" \
  --all > products.json
```

### Multi-Step Workflow

```bash
# Complex dashboard interaction
agent-browser navigate --url "https://dashboard.example.com"
agent-browser wait --selector ".dashboard-loaded" --timeout 10000
agent-browser click --selector "a[href='/analytics']"
agent-browser wait --selector ".chart-container" --timeout 5000
agent-browser click --selector "button.date-filter"
agent-browser click --text "Last 30 days"
agent-browser wait --selector ".chart-updated" --timeout 3000
agent-browser snapshot --selector ".analytics-panel" --output "analytics.png"
```

## Debugging and Troubleshooting

### Enable Debug Mode

```bash
# Verbose output
agent-browser --debug navigate --url "https://example.com"

# Show browser window (non-headless)
agent-browser --headless false navigate --url "https://example.com"

# Save DevTools logs
agent-browser --console-log ./console.log navigate --url "https://example.com"
```

### Error Handling

```typescript
async function robustBrowserAction(action: () => Promise<BrowserResult>): Promise<BrowserResult> {
  const maxRetries = 3;
  let lastError: any;

  for (let i = 0; i < maxRetries; i++) {
    try {
      const result = await action();
      if (result.success) {
        return result;
      }
      lastError = result.error;
    } catch (error) {
      lastError = error;
    }

    // Exponential backoff
    await new Promise(resolve => setTimeout(resolve, Math.pow(2, i) * 1000));
  }

  throw new Error(`Action failed after ${maxRetries} retries: ${lastError}`);
}
```

## Integration with AI Agents

When AI agents use agent-browser, they should:

1. **Parse JSON responses** to understand command results
2. **Analyze snapshots** to verify visual state
3. **Adapt selectors** based on page structure
4. **Handle errors gracefully** with retry logic
5. **Chain commands efficiently** to minimize overhead
6. **Validate outcomes** by checking both JSON and visual snapshots
7. **Maintain context** across multi-step flows
8. **Learn from failures** by analyzing error messages and snapshots

This tool empowers AI agents to interact with web applications at high speed with structured, parseable output optimized for autonomous decision-making.

Related Skills

Vibe Check - Browser Automation

from PramodDutta/qaskills

Browser automation for AI agents. Navigate pages, fill forms, click elements, take screenshots, and manage tabs — all through simple CLI commands. 2.6k+ GitHub stars.

REST Assured API Automation Framework

from PramodDutta/qaskills

Production-grade REST API automation framework with REST Assured, POJO serialization using GSON, PayloadManager pattern, E2E integration workflows with TestNG ITestContext, and Allure reporting.

Responsive Design Testing Automation

from PramodDutta/qaskills

Automated responsive design testing across breakpoints, viewports, and devices with visual comparison and layout verification.

Postman & Newman Automation

from PramodDutta/qaskills

Automated API testing using Postman collections with Newman CLI for CI/CD integration, environment management, and test reporting.

Playwright CLI Browser Automation

from PramodDutta/qaskills

Command-line browser automation with Playwright CLI for navigation, snapshots, uploads, downloads, tracing, and QA workflows.

Playwright CLI Automation

from PramodDutta/qaskills

CLI-first browser automation using Playwright CLI for navigation, form filling, snapshots, screenshots, data extraction, and UI-flow debugging from the terminal.

Cross-Browser Compatibility Testing

from PramodDutta/qaskills

Automated cross-browser testing across Chrome, Firefox, Safari, and Edge with visual comparison, feature detection, and polyfill verification.

BrowserStack Cloud Testing

from PramodDutta/qaskills

Cloud-based cross-browser and cross-device testing with BrowserStack including Automate, App Automate, Percy visual testing, Observability, and integration with Playwright, Selenium, and CI/CD pipelines.

Browser-Use Automation

from PramodDutta/qaskills

CLI tool for persistent browser automation with multi-session support, featuring Chromium/Real/Remote browser modes, cookie management, JavaScript execution, and long-running automation workflows.

Browser Extension Testing

from PramodDutta/qaskills

Testing browser extensions including content script injection, background worker testing, popup UI testing, storage API testing, and cross-browser compatibility.

axe-core Accessibility Automation

from PramodDutta/qaskills

Automated accessibility testing with axe-core integrated into CI pipelines, including custom rule configuration, issue prioritization, and remediation guidance.

Zod Schema Testing

from PramodDutta/qaskills

Comprehensive testing patterns for Zod schemas covering validation testing, transform testing, error message verification, and integration with API endpoints and forms