web-browser

Interact with web pages using agent-browser CLI. MUST run 'browser connect 9222' FIRST to use existing browser with authenticated sessions.

215 stars

by megalithic

Best use case

web-browser is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using web-browser should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/web-browser/SKILL.md --create-dirs "https://raw.githubusercontent.com/megalithic/dotfiles/main/home/common/programs/ai/pi-coding-agent/skills/web-browser/SKILL.md"

Manual Installation

1. Download SKILL.md from GitHub.
2. Place it in .claude/skills/web-browser/SKILL.md inside your project.
3. Restart your AI agent; it will auto-discover the skill.

How web-browser Compares

Feature / Agent	web-browser	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Interact with web pages using agent-browser CLI. MUST run 'browser connect 9222' FIRST to use existing browser with authenticated sessions.

Where can I find the source code?

The source is the SKILL.md file in the megalithic/dotfiles repository on GitHub; the full URL appears in the Installation section.

SKILL.md Source

# Web Browser Skill

Browser automation using `agent-browser` CLI connected to your running browser.

## 🚨 MANDATORY FIRST STEP

**EVERY browser session MUST start with:**

```bash
browser connect 9222
```

This connects to your running browser with all authenticated sessions (Asana, Figma, GitHub, etc.).

**WITHOUT THIS STEP:**
- Commands will fail or time out
- You'll get isolated sessions without logins
- User will have to re-authenticate everything

## ⚠️ CRITICAL REQUIREMENTS

### 1. ALWAYS connect to port 9222 FIRST

Before ANY browser operation, you MUST connect to the remote debugging port:

```bash
browser connect 9222
```

This is REQUIRED for accessing authenticated sessions. Without this step, commands will fail or create isolated sessions without your logins.

### 2. NEVER take over existing tabs

When navigating to a URL:
- First check whether a tab for that URL already exists: `browser tab list`
- If found, switch to it: `browser tab <index>`
- If NOT found, open a NEW tab: `browser open <url>`

**NEVER navigate an existing tab to a different URL** - this destroys the user's work/context.

## Correct workflow

```bash
# 1. ALWAYS connect first (required every session)
browser connect 9222

# 2. Check for existing tab
browser tab list

# 3a. If tab exists for your URL, switch to it
browser tab 14

# 3b. If tab doesn't exist, open NEW tab
browser open https://app.asana.com/...

# 4. Interact
browser snapshot -i
browser click @e5
```

## Check if browser is listening

```bash
lsof -i :9222 -sTCP:LISTEN
```
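
The check above can gate the connect step so that a missing debugging port is reported instead of producing a timeout. This is a minimal sketch, assuming `lsof` is installed and `browser` is on your `PATH`; the function name `connect_if_listening` is hypothetical and not part of agent-browser.

```bash
# Hypothetical helper (not part of agent-browser): connect only if
# something is already listening on the remote debugging port.
connect_if_listening() {
  local port="${1:-9222}"
  if lsof -i ":${port}" -sTCP:LISTEN >/dev/null 2>&1; then
    browser connect "${port}"
  else
    echo "Nothing is listening on port ${port}; start the browser with --remote-debugging-port=${port}" >&2
    return 1
  fi
}
```

Call it as `connect_if_listening 9222` at the start of a session; a non-zero exit status means the browser was not reachable.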

## Common commands

After connecting, use standard agent-browser commands:

### Navigation & tabs
```bash
browser tab list                    # List all tabs
browser tab 14                      # Switch to tab by index
browser open https://example.com    # Open URL (NEW tab)
browser back                        # Go back
browser reload                      # Reload page
```

### Inspection
```bash
browser snapshot -i                 # Get interactive elements with @refs
browser screenshot                  # Take screenshot
browser get title                   # Get page title
browser get url                     # Get current URL
browser get text @e1                # Get text of element
```

### Interaction
```bash
browser click @e1                   # Click element
browser fill @e2 "search text"      # Clear and type
browser type @e3 "append text"      # Type without clearing
browser select @e4 "option"         # Select dropdown
browser press Enter                 # Press key
browser scroll down 500             # Scroll
```

### Waiting
```bash
browser wait @e1                    # Wait for element
browser wait 2000                   # Wait milliseconds
```

## Tab targeting by URL

Instead of remembering tab numbers, find tabs by URL:

```bash
browser tab list | rg -i asana
browser tab list | rg -i localhost:4000
```
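
The lookup can be folded into a helper that switches to a matching tab or opens a new one, in line with the never-take-over rule. This is a sketch under a loud assumption: this document does not show the output format of `browser tab list`, so the parser below assumes each tab is printed on one line whose first number is the tab index. Check the real output before relying on it. `tab_index_for` and `switch_or_open` are hypothetical names, and `grep` stands in for `rg` only for portability.

```bash
# Hypothetical helper: read `browser tab list` output on stdin and print
# the index of the first tab whose line contains the text in $1.
# ASSUMPTION: the tab index is the first number on each line.
tab_index_for() {
  grep -i -F -m1 -- "$1" | grep -o -E '[0-9]+' | head -n1
}

# Hypothetical helper: switch to an existing tab matching $1,
# otherwise open $2 in a NEW tab. Never navigates an existing tab.
switch_or_open() {
  local pattern="$1" url="$2" index
  # No match is not an error here, hence the `|| true`.
  index="$(browser tab list | tab_index_for "$pattern" || true)"
  if [ -n "$index" ]; then
    browser tab "$index"
  else
    browser open "$url"
  fi
}
```

For example, `switch_or_open asana https://app.asana.com/` would reuse an open Asana tab if one is listed and open a new tab otherwise. Run `browser connect 9222` first, as with every session.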

## Notes

- Tabs are numbered by CDP (Chrome DevTools Protocol), not by their visual order in the browser
- `snapshot -i` gives @refs like @e1, @e2 for clicking
- After page changes (navigation, clicks), re-run `snapshot -i`
- Your browser must be running with `--remote-debugging-port=9222`
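
Since the notes call for re-running `snapshot -i` after a click, the two commands can be paired in one small wrapper. A minimal sketch using only the commands listed above; `click_and_resnap` is a hypothetical name, not an agent-browser command.

```bash
# Hypothetical wrapper: click an element ref, then take a fresh
# snapshot so the next command uses up-to-date @refs.
click_and_resnap() {
  browser click "$1" && browser snapshot -i
}
```

Usage: `click_and_resnap @e5`. The snapshot runs only if the click succeeds.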

Related Skills

writing-clearly-and-concisely

215

from megalithic/dotfiles

Apply Strunk's timeless writing rules to ANY prose humans will read - documentation, commit messages, error messages, explanations, reports, or UI text. Makes your writing clearer, stronger, and more professional.

web-search

215

from megalithic/dotfiles

Web search using DuckDuckGo (free, unlimited). Falls back to pi-web-access extension for content extraction.

tmux

215

from megalithic/dotfiles

Remote control tmux sessions for interactive CLIs (python, gdb, etc.) by sending keystrokes and scraping pane output.

ticket-worker

215

from megalithic/dotfiles

Work on a single tk ticket end-to-end. Use when the user says 'work on ticket X' or when spawned by work-tickets.sh.

ticket-creator

215

from megalithic/dotfiles

Create and refine tickets for the tk ticket system. Use when the user says 'create tickets for X', 'refine ticket X', 'break this into tickets', 'seed tickets from plan', or anything about creating or refining tk tickets.

tell

215

from megalithic/dotfiles

Delegate tasks to other agents - pi sessions or external agents (claude, opencode, aider). Non-blocking with task tracking and completion notifications.

task-pipeline

215

from megalithic/dotfiles

Structured workflow for research → plan → tickets → work. Use when starting or continuing a task with /task, /plan, or /tickets commands.

preview

215

from megalithic/dotfiles

Display code, diffs, images, and other content in a tmux pane or popup. Auto-detects nvim/megaterm for floating popups.

mcpctl

215

from megalithic/dotfiles

Manage MCP server configurations — add, remove, list, inspect, troubleshoot. Use when asked to "add mcp server", "remove mcp", "list mcp servers", "mcp status", "configure mcp", "troubleshoot mcp", or any MCP server management task.

handoff

215

from megalithic/dotfiles

Save session state for later pickup. Use /handoff when context is degrading, /pickup to resume in a new session.

github

215

from megalithic/dotfiles

Interact with GitHub using the `gh` CLI. Use `gh issue`, `gh pr`, `gh run`, and `gh api` for issues, PRs, CI runs, and advanced queries.

git-worktrees

215

from megalithic/dotfiles

Git worktree conventions and commands. Use when creating, switching to, or cleaning up git worktrees for branch work.