browser-skill

Interactive browser automation - navigate, click, type, fill forms, take screenshots, get accessibility snapshots. Supports system Chrome/Edge via auto-detection.

100 stars

Best use case

browser-skill is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Interactive browser automation - navigate, click, type, fill forms, take screenshots, get accessibility snapshots. Supports system Chrome/Edge via auto-detection.

Teams using browser-skill should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/browser-skill/SKILL.md --create-dirs "https://raw.githubusercontent.com/trohitg/MachinaOS/main/server/skills/web_agent/browser-skill/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/browser-skill/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How browser-skill Compares

Feature / Agentbrowser-skillStandard Approach
Platform SupportNot specifiedLimited / Varies
Context Awareness High Baseline
Installation ComplexityUnknownN/A

Frequently Asked Questions

What does this skill do?

Interactive browser automation - navigate, click, type, fill forms, take screenshots, get accessibility snapshots. Supports system Chrome/Edge via auto-detection.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Browser Automation Skill

## Core Workflow

Use the **snapshot -> act -> snapshot** loop:

1. `navigate` to a URL
2. `snapshot` to see interactive elements (returns `@eN` refs)
3. `click` / `type` / `fill` / `select` using `@eN` refs as selectors
4. `snapshot` again to see the result
5. Repeat until task is complete

## browser Tool

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| operation | string | Yes | One of: navigate, click, type, fill, screenshot, snapshot, get_text, get_html, eval, wait, scroll, select, console, errors |
| url | string | navigate | URL to open |
| selector | string | click/type/fill/get_text/get_html/wait/select | CSS selector or `@eN` ref from snapshot |
| text | string | type | Text to type keystroke by keystroke |
| value | string | fill/select | Value to fill or dropdown option to select |
| expression | string | eval | JavaScript to execute in page context |
| direction | string | scroll | up, down, left, right (default: down) |
| amount | int | scroll | Pixels to scroll (default: 500) |
| fullPage | bool | screenshot | Capture full scrollable page (default: false) |
| annotate | bool | screenshot | Add numbered labels to elements (default: false) |
| screenshotFormat | string | screenshot | Image format: png (default) or jpeg |
| screenshotQuality | int | screenshot | JPEG quality 1-100 (default: 80, only for jpeg) |

## Operations

### navigate
Open a URL in the browser.
```json
{"operation": "navigate", "url": "https://example.com"}
```

### snapshot
Get the accessibility tree with `@eN` element refs. This is the primary way to see what is on the page.
```json
{"operation": "snapshot"}
```
Returns interactive elements like:
```
- heading "Example Domain" [ref=@e1]
- link "More information..." [ref=@e2]
- textbox "Search" [ref=@e3]
```

### click
Click an element using its `@eN` ref or CSS selector.
```json
{"operation": "click", "selector": "@e2"}
```

### type
Type text into an element keystroke by keystroke.
```json
{"operation": "type", "selector": "@e3", "text": "search query"}
```

### fill
Clear an input field and fill it with a value.
```json
{"operation": "fill", "selector": "@e3", "value": "new value"}
```

### screenshot
Take a screenshot of the current page.
```json
{"operation": "screenshot", "fullPage": true}
```

**Annotated screenshot** (numbered labels on interactive elements -- best for AI vision):
```json
{"operation": "screenshot", "annotate": true}
```

**JPEG format** (smaller file size):
```json
{"operation": "screenshot", "screenshotFormat": "jpeg", "screenshotQuality": 80}
```

### get_text
Extract text content from an element.
```json
{"operation": "get_text", "selector": "@e1"}
```

### eval
Execute JavaScript in the page context.
```json
{"operation": "eval", "expression": "document.title"}
```

### wait
Wait for an element to appear on the page.
```json
{"operation": "wait", "selector": "#results"}
```

### scroll
Scroll the page.
```json
{"operation": "scroll", "direction": "down", "amount": 500}
```

### select
Select a dropdown option.
```json
{"operation": "select", "selector": "@e5", "value": "option-value"}
```

### console
Get browser console output (log, warn, error messages).
```json
{"operation": "console"}
```
Returns `{"messages": [{"text": "hello", "type": "log"}, ...]}`.

### errors
Get JavaScript errors from the page.
```json
{"operation": "errors"}
```
Returns `{"errors": [...]}`.

## Using Your Real Browser

By default the browser node uses a bundled Chromium. To use your system browser (with existing logins, extensions, etc.), select it in the **Browser** dropdown under Advanced:

| Option | Description |
|--------|-------------|
| **Bundled Chrome** | Default. Downloads and uses its own Chromium. |
| **Google Chrome** | Auto-detected from system PATH or Windows registry. |
| **Microsoft Edge** | Auto-detected from system PATH or Windows registry. |
| **Chromium** | Auto-detected from system PATH or Windows registry. |
| **Custom Path** | Manually specify an executable path. |

Browser detection uses `shutil.which()` on Linux/macOS (PATH lookup) and the Windows App Paths registry (`HKLM\...\App Paths\chrome.exe`) -- the same method Selenium and Playwright use. No hardcoded paths.

### Additional options

- **New Window**: Opens a new browser window instead of a tab in an existing instance. On by default when using a system browser. Only visible for non-bundled browsers.
- **Chrome Profile**: Reuse login state from a named Chrome profile (e.g. `Default`, `Profile 1`).
- **Auto Connect**: Attach to an already-running Chrome with remote debugging:
  ```
  chrome --remote-debugging-port=9222
  ```

### Lifecycle

The browser daemon auto-starts on first use and persists between commands (for session reuse). It is automatically shut down when MachinaOs stops -- no manual cleanup needed.

## Stealth / Anti-Detection

These settings reduce bot detection. Configure them in the node's Advanced section, not as tool arguments.

- **Action Delay**: Native wait (ms) before each action. Set 500-2000ms for bot-protected sites.
- **User Agent**: Custom user-agent string to override Chrome default.
- **Proxy**: Route all browser traffic through a proxy (e.g. `http://user:pass@host:port`).

## Tips

- Always `snapshot` first to discover `@eN` element refs before interacting.
- Prefer `@eN` refs over CSS selectors -- they are stable across the session.
- Use `fill` for form inputs (clears first), `type` for search boxes (keystroke events).
- Use `screenshot` to visually verify page state when uncertain.
- Use `wait` before interacting with dynamically loaded elements.
- Use `eval` sparingly -- prefer snapshot + click/fill for most tasks.
- Select **Google Chrome** or **Microsoft Edge** in the Browser dropdown to use your real browser with existing logins.

Related Skills

serper-search-skill

100
from trohitg/MachinaOS

Search the web using Serper API for Google-powered search results including web, news, images, and places.

proxy-config-skill

100
from trohitg/MachinaOS

Configure residential proxy providers and make proxied HTTP requests with geo-targeting.

perplexity-search-skill

100
from trohitg/MachinaOS

Search the web using Perplexity Sonar AI for synthesized answers with citations, related questions, and optional images.

http-request-skill

100
from trohitg/MachinaOS

Make HTTP requests to external APIs and web services. Supports GET, POST, PUT, DELETE, PATCH methods with headers and JSON body.

duckduckgo-search-skill

100
from trohitg/MachinaOS

Search the web using DuckDuckGo for free, privacy-focused results with no API key required.

crawlee-scraper-skill

100
from trohitg/MachinaOS

Read and extract content from any web page URL.

brave-search-skill

100
from trohitg/MachinaOS

Search the web using Brave Search API for privacy-focused, independent search results with no tracking.

apify-skill

100
from trohitg/MachinaOS

Run web scrapers and extract data from websites and social media platforms using Apify actors. Supports Instagram, TikTok, Twitter/X, LinkedIn, Facebook, YouTube, Google Search, and general web crawling.

nearby-places-skill

100
from trohitg/MachinaOS

Search for nearby places like restaurants, cafes, stores, and services using Google Places API. Find places by type and location.

shell-skill

100
from trohitg/MachinaOS

Execute short-lived shell commands in a sandboxed environment. No PATH access -- use process_manager for npm/python/node commands.

process-manager-skill

100
from trohitg/MachinaOS

Start, stop, and manage long-running processes with full system PATH. Use for npm, python, node, dev servers, watchers, build tools. Destructive file commands blocked.

powershell-skill

100
from trohitg/MachinaOS

Windows PowerShell commands and patterns for process management, file operations, and system administration.