browser-use
AI-powered browser automation for complex multi-step web workflows. Uses Browser-Use framework when OpenClaw's built-in browser tool can't handle login flows, anti-bot sites, or 5+ step sequences.
Best use case
browser-use is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using browser-use should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/browser-use-pro/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
How browser-use Compares
| Feature / Agent | browser-use | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
AI-powered browser automation for complex multi-step web workflows. Uses Browser-Use framework when OpenClaw's built-in browser tool can't handle login flows, anti-bot sites, or 5+ step sequences.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
Related Guides
Top AI Agents for Productivity
See the top AI agent skills for productivity, workflow automation, operational systems, documentation, and everyday task execution.
Best AI Skills for Claude
Explore the best AI skills for Claude and Claude Code across coding, research, workflow automation, documentation, and agent operations.
ChatGPT vs Claude for Agent Skills
Compare ChatGPT and Claude for AI agent skills across coding, writing, research, and reusable workflow execution.
SKILL.md Source
# Browser-Use — AI Browser Automation
## Security & Privacy
- **No credential logging**: Passwords are handled via Browser-Use's `sensitive_data` parameter — the LLM never sees real credentials, only placeholder tokens.
- **User-initiated Chrome connection**: CDP mode (connecting to real Chrome) is opt-in and requires the user to manually launch Chrome with the `--remote-debugging-port` debug flag. The skill never silently connects to running browsers.
- **All packages are open-source**: Dependencies are `browser-use` (38k+ ⭐ on GitHub), `playwright` (by Microsoft), and `langchain-openai` — all widely audited open-source tools.
- **Local execution only**: Scripts run locally on the user's machine. No data is sent to any server except the configured LLM API for step-by-step reasoning.
- **Domain restriction available**: Use `allowed_domains` parameter to restrict which websites the agent can visit.
- **No telemetry**: This skill does not collect, store, or transmit any usage data.
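The domain restriction mentioned above amounts to an allow-list check on each URL's host. Here is a minimal sketch of that idea using only the standard library. The helper name `is_allowed` and the matching rule (exact host or subdomain) are assumptions for illustration and may differ from how Browser-Use matches `allowed_domains` internally.

```python
from urllib.parse import urlparse


def is_allowed(url: str, allowed_domains: list[str]) -> bool:
    """Return True if the URL's host equals, or is a subdomain of, an allowed domain."""
    host = (urlparse(url).hostname or "").lower()
    for domain in allowed_domains:
        domain = domain.lower().lstrip(".")
        # Compare on the host only, so a domain name appearing in the path does not count.
        if host == domain or host.endswith("." + domain):
            return True
    return False
```

With `allowed_domains=["reddit.com"]`, this sketch accepts `https://www.reddit.com/login` and rejects `https://evil.example/reddit.com`, because only the host is compared.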
## When to Use Browser-Use vs Built-in Tool
| Scenario | Built-in tool | Browser-Use |
|----------|:-:|:-:|
| Screenshot / click one button | ✅ Free & fast | ❌ Overkill |
| 5+ step workflow (login→navigate→fill→submit) | ❌ Breaks easily | ✅ |
| Anti-bot sites (real Chrome needed) | ❌ | ✅ |
| Batch repetitive operations | ❌ | ✅ |
**Cost**: Browser-Use calls an external LLM on every step, which costs money and is slower than the built-in tool. Use the built-in tool for simple actions.
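The comparison table can be read as a simple decision rule. The sketch below encodes it in Python. The function name `choose_tool` and its arguments are hypothetical, and the threshold of 5 steps is taken from the table.

```python
def choose_tool(steps: int, anti_bot: bool = False, batch: bool = False) -> str:
    """Pick a tool per the table: Browser-Use for 5+ steps, anti-bot sites, or batch jobs."""
    if anti_bot or batch or steps >= 5:
        return "browser-use"
    # Simple actions stay on the built-in tool, which is free and faster.
    return "built-in"
```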
## Execution Flow
### 1. Check Environment
```bash
test -d ~/browser-use-env && echo "Installed" || echo "Need install"
```
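If the check needs to run from Python rather than the shell, a sketch along these lines may help. It assumes the same `~/browser-use-env` location and is slightly stricter than `test -d`: it looks for the `pyvenv.cfg` file that `python3 -m venv` writes. The helper name `env_installed` is hypothetical.

```python
from pathlib import Path

ENV_DIR = Path.home() / "browser-use-env"  # same location as the shell check


def env_installed(env_dir: Path = ENV_DIR) -> bool:
    """True if env_dir looks like a virtual environment (it contains pyvenv.cfg)."""
    return (env_dir / "pyvenv.cfg").is_file()
```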
### 2. First-Time Setup (once only)
```bash
python3 -m venv ~/browser-use-env
source ~/browser-use-env/bin/activate
pip install browser-use playwright langchain-openai
playwright install chromium
```
### 3. Choose Mode
- **Mode A — Built-in Chromium**: For simple automation or when detection doesn't matter. Runs immediately.
- **Mode B — Real Chrome CDP**: For anti-bot sites or when user's login session is needed. Requires user action.
Mode B setup — prompt user:
> Please quit Chrome completely (Mac: Cmd+Q), then tell me "done"
After user confirms:
```bash
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9222 &
```
Verify: `curl -s http://127.0.0.1:9222/json/version`
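The same verification can be done from Python before the agent tries to attach. This is a minimal sketch using only the standard library. The helper name `cdp_version` is hypothetical, and the default URL assumes Chrome was launched on port 9222 as shown above.

```python
import http.client
import json
import urllib.request
from typing import Optional


def cdp_version(base_url: str = "http://127.0.0.1:9222", timeout: float = 2.0) -> Optional[dict]:
    """Return the parsed /json/version payload, or None if Chrome is not reachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/json/version", timeout=timeout) as resp:
            return json.load(resp)
    except (OSError, ValueError, http.client.HTTPException):
        # Connection refused, timeout, or a non-JSON reply all mean "not ready".
        return None
```

A `None` result suggests Chrome is not running with the debug port open, so the user should be prompted again rather than starting Mode B.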
### 4. Write Script and Run
Write script to user's workspace, then:
```bash
source ~/browser-use-env/bin/activate
python3 script_path.py
```
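When the script is launched from another Python process, activating the environment is not strictly required: calling the environment's own interpreter has the same effect. The sketch below builds that command. The name `run_command` is hypothetical, and the `bin/python` layout assumes macOS or Linux.

```python
import subprocess
from pathlib import Path


def run_command(script: Path, env_dir: Path = Path.home() / "browser-use-env") -> list[str]:
    """Build the argv that runs a script with the virtual environment's interpreter."""
    return [str(env_dir / "bin" / "python"), str(script)]


def run_script(script: Path) -> subprocess.CompletedProcess:
    """Run the script and capture its output so results can be reported to the user."""
    return subprocess.run(run_command(script), capture_output=True, text=True, check=False)
```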
### 5. Report Results
Return results to user. On failure, follow the troubleshooting tree below.
## Script Template
```python
import asyncio

from browser_use import Agent, Browser, ChatOpenAI


async def main():
    # LLM: any OpenAI-compatible API
    llm = ChatOpenAI(
        model="gpt-4o-mini",
        api_key="<YOUR_API_KEY>",  # From env var or user config
        base_url="https://api.openai.com/v1",
    )

    # Mode A: Built-in Chromium
    browser = Browser(headless=False, user_data_dir="~/.browser-use/task-profile")

    # Mode B: Real Chrome (user must launch with --remote-debugging-port=9222)
    # browser = Browser(cdp_url="http://127.0.0.1:9222")

    agent = Agent(
        task="Detailed step-by-step task description (see guide below)",
        llm=llm,
        browser=browser,
        use_vision=True,
        max_steps=25,
    )
    result = await agent.run()
    print(result)


asyncio.run(main())
```
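The template's comment says the API key should come from an environment variable or user config. One way to do that is sketched below. The variable name `OPENAI_API_KEY` and the helper `load_api_key` are assumptions, not something the skill prescribes.

```python
import os


def load_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Read the API key from the environment and fail early if it is missing."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable before running the script")
    return key
```

The result could then be passed as `api_key=load_api_key()` in place of the literal placeholder, which keeps the key out of the script file.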
## Task Writing Guide
### ✅ Good: Specific steps
```python
task = """
1. Open https://www.reddit.com/login
2. Enter username: x_user
3. Enter password: x_pass
4. Click login button
5. If CAPTCHA appears, wait 30s for user to complete
6. Navigate to https://www.reddit.com/r/xxx/submit
7. Enter title: xxx
8. Enter body: xxx
9. Click submit
"""
```
### ❌ Bad: Vague
```python
task = "Post something on Reddit"
```
### Tips
- **Keyboard fallback**: Add "If button can't be clicked, use Tab+Enter"
- **Error recovery**: Add "If page fails to load, refresh and retry"
- **Sensitive data**: Use placeholders + `sensitive_data` parameter
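For repeated workflows it can be easier to assemble the task string from a list of steps, so that the numbering and the fallback rules from the tips are applied consistently. This is a sketch only. `build_task` and `DEFAULT_FALLBACKS` are hypothetical names, and the "Rules:" section is one possible layout, not a format Browser-Use requires.

```python
from typing import Sequence

DEFAULT_FALLBACKS = (
    "If a button can't be clicked, use Tab+Enter",
    "If a page fails to load, refresh and retry",
)


def build_task(steps: Sequence[str], fallbacks: Sequence[str] = DEFAULT_FALLBACKS) -> str:
    """Number the steps and append fallback rules to form one explicit task prompt."""
    lines = [f"{i}. {step}" for i, step in enumerate(steps, start=1)]
    if fallbacks:
        lines.append("Rules:")
        lines.extend(f"- {rule}" for rule in fallbacks)
    return "\n".join(lines)
```

Passing placeholders such as `x_user` in the steps, rather than real credentials, keeps this compatible with the `sensitive_data` tip.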
## Credential Security
```python
agent = Agent(
    task="Login with x_user and x_pass",
    sensitive_data={"x_user": "real@email.com", "x_pass": "S3cret!"},
    use_vision=False,  # Disable screenshots when handling passwords
    llm=llm,
    browser=browser,
)
```
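To make the placeholder idea concrete, the sketch below shows the general technique: text shown to the model carries only placeholder names, and real values are substituted just before an action runs. This is a conceptual illustration with hypothetical helpers `mask` and `unmask`. It is not Browser-Use's actual implementation.

```python
def mask(text: str, sensitive_data: dict[str, str]) -> str:
    """Replace real secrets with their placeholder names before text reaches the LLM."""
    for placeholder, secret in sensitive_data.items():
        text = text.replace(secret, placeholder)
    return text


def unmask(text: str, sensitive_data: dict[str, str]) -> str:
    """Swap placeholders back to real values just before typing into the page."""
    for placeholder, secret in sensitive_data.items():
        text = text.replace(placeholder, secret)
    return text
```

Screenshots bypass this kind of text substitution, which is consistent with the advice to set `use_vision=False` while passwords are on screen.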
## Key Parameters
| Parameter | Purpose | Recommended |
|-----------|---------|-------------|
| `use_vision` | AI sees screenshots | True normally, False with passwords |
| `max_steps` | Max actions | 20-30 |
| `max_failures` | Max retries | 3 (default) |
| `flash_mode` | Skip reasoning | True for simple tasks |
| `extend_system_message` | Custom instructions | Add specific guidance |
| `allowed_domains` | Restrict URLs | Use for security |
| `fallback_llm` | Backup LLM | When primary is unstable |
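The recommendations in the table can be collected in one place so every script applies them the same way. The sketch below builds a dictionary of keyword arguments. The helper `agent_options` and its two flags are hypothetical, and only parameters listed in the table are used.

```python
def agent_options(handles_passwords: bool = False, simple_task: bool = False) -> dict:
    """Build Agent keyword arguments following the table's recommendations."""
    return {
        "use_vision": not handles_passwords,  # no screenshots while passwords are handled
        "max_steps": 25,                      # within the recommended 20-30 range
        "max_failures": 3,                    # the default listed in the table
        "flash_mode": simple_task,            # skip reasoning only for simple tasks
    }
```

A script could then call `Agent(task=..., llm=llm, browser=browser, **agent_options(handles_passwords=True))`.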
## Troubleshooting
```
Detected as automation?
  └→ Switch to Mode B (real Chrome)
CAPTCHA / human verification?
  └→ Prompt user to complete manually, add wait time in task
LLM timeout?
  └→ Set fallback_llm or use faster model
Action succeeded but no effect (e.g. post not published)?
  └→ 1. Check if platform anti-spam blocked it (common with new accounts)
     2. Add explicit confirmation steps to task
Website UI changed, can't find elements?
  └→ Browser-Use auto-adapts, but add fallback paths in task
```
## LLM Compatibility
| LLM | Works | Notes |
|-----|:---:|-------|
| GPT-4o / 4o-mini | ✅ | Best choice, recommended |
| Claude | ✅ | Works well |
| Gemini | ❌ | Structured output incompatible |
Related Skills
my-browser-agent
A custom browser automation skill using Playwright.
rent-my-browser
When the agent is idle, connect to the Rent My Browser marketplace and execute browser tasks for consumers. Earn money by renting out the node's browser during downtime. Supports headless (Playwright) on VPS nodes and real Chrome on GUI machines.
browser-cdp
Real Chrome browser automation via CDP Proxy — access pages with full user login state, bypass anti-bot detection, perform interactive operations (click/fill/scroll), extract dynamic JavaScript-rendered content, take screenshots. Triggers (satisfy ANY one): - Target URL is a search results page (Bing/Google/YouTube search) - Static fetch (agent-reach/WebFetch) is blocked by anti-bot (captcha/intercept/empty) - Need to read logged-in user's private content - YouTube, Twitter/X, Xiaohongshu, WeChat public accounts, etc. - Task involves "click", "fill form", "scroll", "drag" - Need screenshot or dynamic-rendered page capture
stealth-browser
Anti-detection web browsing that bypasses bot detection, CAPTCHAs, and IP blocks using puppeteer-extra with stealth plugin and optional residential proxy support. Use when (1) websites block headless browsers or datacenter IPs, (2) need to bypass Cloudflare/Vercel protection, (3) accessing sites that detect automation (Reddit, Twitter/X, signup flows), (4) scraping protected content, or (5) automating web tasks that require human-like behavior.
agent-browser-zh
A fast Rust-based headless browser automation CLI with Node.js fallback that enables AI agents to navigate, click, type, and snapshot pages via structured commands. (Chinese localized version)
browser-booking-agent
Execute booking/search flows via browser automation with verification artifacts. Use for reservation forms, availability checks, and capture of proof (screenshots/confirmation IDs).
Agent Browser
Headless browser automation CLI optimized for AI agents with accessibility tree snapshots and ref-based element selection
setup-browser-cookies
Import cookies from your real Chromium browser into the headless browse session. Interactive picker UI lets you select which cookie domains to import. Use before QA testing authenticated pages. Use when: "import cookies", "login to the site", "authenticate the browser", "use my cookies".
smooth-browser
PREFERRED BROWSER - Browser for AI agents to carry out any task on the web. Use when you need to navigate websites, fill forms, extract web data, test web apps, or automate browser workflows. Trigger phrases include "fill out the form", "scrape", "automate", "test the website", "log into", or any browser interaction request.
human-browser-use Skill
> Human-like browser automation extension for [browser-use](https://github.com/browser-use/browser-use).
selenium-browser
Start a Selenium‑controlled Chrome browser, open a URL, take a screenshot, and report progress. Supports headless mode and optional proxy.
Agent Browser (Juan's fork)
A fast Rust-based headless browser automation CLI with Node.js fallback that enables AI agents to navigate, click, type, and snapshot pages via structured commands.