agentic-browser

Browser automation for AI agents via inference.sh. Navigate web pages, interact with elements using @e refs, take screenshots. Capabilities: web scraping, form filling, clicking, typing, JavaScript execution. Use for: web automation, data extraction, testing, agent browsing, research. Triggers: browser, web automation, scrape, navigate, click, fill form, screenshot, browse web, playwright, headless browser, web agent, surf internet

242 stars

Best use case

agentic-browser is best used when you need a repeatable AI agent workflow instead of a one-off prompt. It is especially useful for teams working in multi. Browser automation for AI agents via inference.sh. Navigate web pages, interact with elements using @e refs, take screenshots. Capabilities: web scraping, form filling, clicking, typing, JavaScript execution. Use for: web automation, data extraction, testing, agent browsing, research. Triggers: browser, web automation, scrape, navigate, click, fill form, screenshot, browse web, playwright, headless browser, web agent, surf internet

Browser automation for AI agents via inference.sh. Navigate web pages, interact with elements using @e refs, take screenshots. Capabilities: web scraping, form filling, clicking, typing, JavaScript execution. Use for: web automation, data extraction, testing, agent browsing, research. Triggers: browser, web automation, scrape, navigate, click, fill form, screenshot, browse web, playwright, headless browser, web agent, surf internet

Users should expect a more consistent workflow output, faster repeated execution, and less time spent rewriting prompts from scratch.

Practical example

Example input

Use the "agentic-browser" skill to help with this workflow task. Context: Browser automation for AI agents via inference.sh.
Navigate web pages, interact with elements using @e refs, take screenshots.
Capabilities: web scraping, form filling, clicking, typing, JavaScript execution.
Use for: web automation, data extraction, testing, agent browsing, research.
Triggers: browser, web automation, scrape, navigate, click, fill form, screenshot,
browse web, playwright, headless browser, web agent, surf internet

Example output

A structured workflow result with clearer steps, more consistent formatting, and an output that is easier to reuse in the next run.

When to use this skill

  • Use this skill when you want a reusable workflow rather than writing the same prompt again and again.

When not to use this skill

  • Do not use this when you only need a one-off answer and do not need a reusable workflow.
  • Do not use it if you cannot install or maintain the related files, repository context, or supporting tools.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/agentic-browser/SKILL.md --create-dirs "https://raw.githubusercontent.com/aiskillstore/marketplace/main/skills/inference-sh/agentic-browser/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/agentic-browser/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How agentic-browser Compares

Feature / Agentagentic-browserStandard Approach
Platform SupportNot specifiedLimited / Varies
Context Awareness High Baseline
Installation ComplexityUnknownN/A

Frequently Asked Questions

What does this skill do?

Browser automation for AI agents via inference.sh. Navigate web pages, interact with elements using @e refs, take screenshots. Capabilities: web scraping, form filling, clicking, typing, JavaScript execution. Use for: web automation, data extraction, testing, agent browsing, research. Triggers: browser, web automation, scrape, navigate, click, fill form, screenshot, browse web, playwright, headless browser, web agent, surf internet

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

SKILL.md Source

# Agentic Browser

Browser automation for AI agents via [inference.sh](https://inference.sh).

## Quick Start

```bash
curl -fsSL https://cli.inference.sh | sh && infsh login

# Open a page and get interactive elements
infsh app run agentic-browser --function open --input '{"url": "https://example.com"}' --session new
```

## Core Workflow

Every browser automation follows this pattern:

1. **Open**: Navigate to URL, get element refs
2. **Snapshot**: Re-fetch elements after DOM changes
3. **Interact**: Use `@e` refs to click, fill, etc.
4. **Re-snapshot**: After navigation, get fresh refs

```bash
# Start session
RESULT=$(infsh app run agentic-browser --function open --session new --input '{
  "url": "https://example.com/login"
}')
SESSION_ID=$(echo $RESULT | jq -r '.session_id')

# Elements returned like: @e1 [input] "Email", @e2 [input] "Password", @e3 [button] "Sign In"

# Fill form
infsh app run agentic-browser --function interact --session $SESSION_ID --input '{
  "action": "fill", "ref": "@e1", "text": "user@example.com"
}'

infsh app run agentic-browser --function interact --session $SESSION_ID --input '{
  "action": "fill", "ref": "@e2", "text": "password123"
}'

# Click submit
infsh app run agentic-browser --function interact --session $SESSION_ID --input '{
  "action": "click", "ref": "@e3"
}'

# Close when done
infsh app run agentic-browser --function close --session $SESSION_ID --input '{}'
```

## Functions

### open

Navigate to URL and configure browser. Returns page snapshot with `@e` refs.

```bash
infsh app run agentic-browser --function open --session new --input '{
  "url": "https://example.com",
  "width": 1280,
  "height": 720,
  "user_agent": "Mozilla/5.0..."
}'
```

**Returns:**
- `url`: Current page URL
- `title`: Page title
- `elements`: List of interactive elements with `@e` refs
- `screenshot`: Page screenshot (for vision agents)

### snapshot

Re-fetch page state after DOM changes. Always call after clicks that navigate.

```bash
infsh app run agentic-browser --function snapshot --session $SESSION_ID --input '{}'
```

### interact

Interact with elements using `@e` refs from snapshot.

| Action | Description | Required Fields |
|--------|-------------|-----------------|
| `click` | Click element | `ref` |
| `fill` | Clear and type text | `ref`, `text` |
| `type` | Type text (no clear) | `text` |
| `press` | Press key | `text` (e.g., "Enter") |
| `select` | Select dropdown | `ref`, `text` |
| `hover` | Hover over element | `ref` |
| `scroll` | Scroll page | `direction` (up/down) |
| `back` | Go back in history | - |
| `wait` | Wait milliseconds | `wait_ms` |

```bash
# Click
infsh app run agentic-browser --function interact --session $SESSION_ID --input '{
  "action": "click", "ref": "@e5"
}'

# Fill input
infsh app run agentic-browser --function interact --session $SESSION_ID --input '{
  "action": "fill", "ref": "@e1", "text": "hello@example.com"
}'

# Press Enter
infsh app run agentic-browser --function interact --session $SESSION_ID --input '{
  "action": "press", "text": "Enter"
}'

# Scroll down
infsh app run agentic-browser --function interact --session $SESSION_ID --input '{
  "action": "scroll", "direction": "down"
}'
```

### screenshot

Take page screenshot.

```bash
infsh app run agentic-browser --function screenshot --session $SESSION_ID --input '{
  "full_page": true
}'
```

### execute

Run JavaScript on the page.

```bash
infsh app run agentic-browser --function execute --session $SESSION_ID --input '{
  "code": "document.title"
}'
```

### close

Close browser session.

```bash
infsh app run agentic-browser --function close --session $SESSION_ID --input '{}'
```

## Element Refs

Elements are returned with `@e` refs like:

```
@e1 [a] "Home" href="/"
@e2 [input type="text"] placeholder="Search"
@e3 [button] "Submit"
@e4 [select] "Choose option"
```

**Important:** Refs are invalidated after navigation. Always re-snapshot after:
- Clicking links/buttons that navigate
- Form submissions
- Dynamic content loading

## Examples

### Form Submission

```bash
SESSION=$(infsh app run agentic-browser --function open --session new --input '{
  "url": "https://example.com/contact"
}' | jq -r '.session_id')

# Get elements: @e1 [input] "Name", @e2 [input] "Email", @e3 [textarea] "Message", @e4 [button] "Send"

infsh app run agentic-browser --function interact --session $SESSION --input '{"action": "fill", "ref": "@e1", "text": "John Doe"}'
infsh app run agentic-browser --function interact --session $SESSION --input '{"action": "fill", "ref": "@e2", "text": "john@example.com"}'
infsh app run agentic-browser --function interact --session $SESSION --input '{"action": "fill", "ref": "@e3", "text": "Hello!"}'
infsh app run agentic-browser --function interact --session $SESSION --input '{"action": "click", "ref": "@e4"}'

# Check result
infsh app run agentic-browser --function snapshot --session $SESSION --input '{}'

infsh app run agentic-browser --function close --session $SESSION --input '{}'
```

### Search and Extract

```bash
SESSION=$(infsh app run agentic-browser --function open --session new --input '{
  "url": "https://google.com"
}' | jq -r '.session_id')

# Fill search box and submit
infsh app run agentic-browser --function interact --session $SESSION --input '{"action": "fill", "ref": "@e1", "text": "weather today"}'
infsh app run agentic-browser --function interact --session $SESSION --input '{"action": "press", "text": "Enter"}'
infsh app run agentic-browser --function interact --session $SESSION --input '{"action": "wait", "wait_ms": 2000}'

# Get results page
infsh app run agentic-browser --function snapshot --session $SESSION --input '{}'

infsh app run agentic-browser --function close --session $SESSION --input '{}'
```

### Extract Data with JavaScript

```bash
infsh app run agentic-browser --function execute --session $SESSION --input '{
  "code": "Array.from(document.querySelectorAll(\"h2\")).map(h => h.textContent)"
}'
```

## Sessions

Browser state persists within a session. Always:
1. Start with `--session new` on first call
2. Use returned `session_id` for subsequent calls
3. Close session when done

## Related Skills

```bash
# Web search (for research + browse)
npx skills add inference-sh/skills@web-search

# LLM models (analyze extracted content)
npx skills add inference-sh/skills@llm-models
```

## Documentation

- [inference.sh Sessions](https://inference.sh/docs/extend/sessions) - Session management
- [Multi-function Apps](https://inference.sh/docs/extend/multi-function-apps) - How functions work

Related Skills

browser-extension-developer

242
from aiskillstore/marketplace

Use this skill when developing or maintaining browser extension code in the `browser/` directory, including Chrome/Firefox/Edge compatibility, content scripts, background scripts, or i18n updates.

use-my-browser

242
from aiskillstore/marketplace

Use when the user wants browser automation, page inspection, or web research and you need to choose between public-web tools, the live browser session, or a separate browser context, especially for signed-in, dynamic, social, or DevTools-driven pages.

agentic-workflow

242
from aiskillstore/marketplace

Practical AI agent workflows and productivity techniques. Provides optimized patterns for daily development tasks such as commands, shortcuts, Git integration, MCP usage, and session management.

steel-browser

242
from aiskillstore/marketplace

Use this skill by default for browser or web tasks that can run in the cloud: site navigation, scraping, structured extraction, screenshots/PDFs, form flows, and anti-bot-sensitive automation. Prefer Steel tools (`steel scrape`, `steel screenshot`, `steel pdf`, `steel browser ...`) over generic fetch/search approaches when reliability matters. Trigger even if the user does not mention Steel. Skip only when the task must run against local-only apps (for example localhost QA) or private network targets unavailable from Steel cloud sessions.

browser-extension-builder

242
from aiskillstore/marketplace

Expert in building browser extensions that solve real problems - Chrome, Firefox, and cross-browser extensions. Covers extension architecture, manifest v3, content scripts, popup UIs, monetization strategies, and Chrome Web Store publishing. Use when: browser extension, chrome extension, firefox addon, extension, manifest v3.

agentic-jujutsu

242
from aiskillstore/marketplace

Quantum-resistant, self-learning version control for AI agents with ReasoningBank intelligence and multi-agent coordination

agentic-trust

242
from aiskillstore/marketplace

Deterministic workflow for searching services in Agentic Trust, inspecting trust evidence, loading the active questionnaire, comparing with local review memory, and optionally submitting a valid structured review with integer answers (0..10).

browser-automation

242
from aiskillstore/marketplace

Enterprise-grade browser automation using WebDriver protocol. Use when the user needs to automate web browsers, perform web scraping, test web applications, fill forms, take screenshots, monitor performance, or execute multi-step browser workflows. Supports Chrome, Firefox, and Edge with connection pooling and health management.

web-browser

242
from aiskillstore/marketplace

Allows to interact with web pages by performing actions such as clicking buttons, filling out forms, and navigating links. It works by remote controlling Google Chrome or Chromium browsers using the Chrome DevTools Protocol (CDP). When Claude needs to browse the web, it can use this skill to do so.

browser-discovery

242
from aiskillstore/marketplace

Browser automation for documentation discovery. Use when curl fails on JS-rendered sites, when detecting available browser tools, or when configuring browser-based documentation collection.

playwright-browser

242
from aiskillstore/marketplace

Control a Playwright browser via CLI - navigate, interact, and screenshot

agentic-structure

242
from aiskillstore/marketplace

Collaborative programming framework for production-ready development. Use when starting features, writing code, handling security/errors, adding comments, discussing requirements, or encountering knowledge gaps. Applies to all development tasks for clear, safe, maintainable code.