browser-use
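An AI-driven browser automation tool. It uses an LLM to understand pages and carry out tasks automatically, which makes it smarter and more token-efficient than traditional Playwright scripting. It suits complex interactions, dynamic pages, and browser operations that require judgment. Chrome is the preferred browser.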
Best use case
browser-use is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using browser-use can expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/browser-use/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
How browser-use Compares
| Feature / Agent | browser-use | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
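It gives an AI agent LLM-driven browser automation through browser-use: the model reads each page and decides the next action itself, which suits complex interactions, dynamic pages, and tasks that need judgment. It aims to be smarter and more token-efficient than traditional Playwright scripting, and Chrome is the preferred browser.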
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
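# browser-use Intelligent Browser Automation
## Overview
browser-use is an AI-driven browser automation tool that uses an LLM to:
- understand web page content
- decide intelligently on the next action
- complete tasks automatically
How it differs from Playwright:
| Feature | Playwright | browser-use |
|------|-----------|-------------|
| Control | Pre-programmed scripts | AI decision-making |
| Adaptability | Page changes require rewrites | Adapts automatically |
| Token usage | Lower, but requires debugging | Kept lean by the AI |
| Complex interactions | Need precise selectors | Described in natural language |
| Maintenance cost | High | Low |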
## ⚠️ Resource Cleanup Rule (Mandatory)
**Every cron job that uses a browser must automatically shut down its Chrome processes when it finishes!**
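```python
import subprocess

from browser_use import Agent


async def main(llm):
    agent = Agent(task="...", llm=llm)
    try:
        result = await agent.run()
    finally:
        # ⚠️ Always close the browser explicitly once the task ends, even if it failed
        # (attribute name as in the original; newer releases may expose `agent.browser_session` instead)
        if hasattr(agent, 'browser') and agent.browser:
            await agent.browser.close()
        # ⚠️ Recommended: force-clean any leftover Chrome processes at the end of the script.
        # Caution: `pkill -f chrome` matches every process with "chrome" in its command line.
        subprocess.run(['pkill', '-f', 'chrome'], capture_output=True)
    return result
```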
**Why**: this prevents memory leaks and lingering resource usage, and keeps the Gateway from being pinned at 100% CPU.
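A narrower alternative to `pkill -f chrome`, which also kills unrelated Chrome windows on the same machine, is to track the processes your job starts and stop only those. The sketch below is a generic standard-library context manager, assuming you launch the browser process yourself via `subprocess`; browser-use normally launches Chrome internally, so `managed_process` is a hypothetical helper, not a browser-use API.

```python
import subprocess
from contextlib import contextmanager


@contextmanager
def managed_process(cmd, timeout=5.0):
    """Start `cmd` and guarantee it is stopped when the `with` block exits.

    Hypothetical helper: stops only the process this job started, instead of
    every process whose command line matches "chrome".
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    try:
        yield proc
    finally:
        if proc.poll() is None:          # still running
            proc.terminate()             # polite SIGTERM first
            try:
                proc.wait(timeout=timeout)
            except subprocess.TimeoutExpired:
                proc.kill()              # force SIGKILL if SIGTERM is ignored
                proc.wait()
```

Chrome starts helper processes of its own; if some outlive the parent, launching with `start_new_session=True` and signalling the whole group via `os.killpg` is the usual next step.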
## Installation Status
✅ Installed:
- browser-use 0.11.11
- browser-use-sdk 2.0.15
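To confirm that a machine actually matches the versions listed above, the installed distributions can be queried with the standard library's `importlib.metadata`. The expected versions are copied from this list; `installed_version` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Versions taken from the installation status list in this document.
EXPECTED = {"browser-use": "0.11.11", "browser-use-sdk": "2.0.15"}

for name, wanted in EXPECTED.items():
    found = installed_version(name)
    status = "OK" if found == wanted else f"expected {wanted}"
    print(f"{name}: {found or 'not installed'} ({status})")
```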
## Quick Start
### Basic Usage
```python
import asyncio

from browser_use import Agent
from browser_use.llm.openai.chat import ChatOpenAI  # browser-use's own wrapper, as used elsewhere in this guide


async def main():
    agent = Agent(
        task="Open polymarket.com and look at the Fed interest-rate market",
        llm=ChatOpenAI(model="gpt-4o"),
    )
    result = await agent.run()
    print(result)


asyncio.run(main())
```
### Using a Custom LLM (Recommended Configuration)
**⚠️ Important**: the your-provider API uses the **Anthropic format**, not the OpenAI format! You must use `ChatAnthropic`.
```python
from browser_use import Agent
from browser_use.llm.anthropic.chat import ChatAnthropic

# your-provider API (Anthropic-compatible) ✅ recommended
llm = ChatAnthropic(
    model="claude-sonnet-4-6",
    base_url="https://your-anthropic-proxy.example.com",  # note: no /v1 suffix
    api_key="your-api-key",  # or fetch it with `pass show api/your-provider`
)
agent = Agent(
    task="your task",
    llm=llm,
)
```
```python
# ❌ Wrong: do not combine ChatOpenAI with your-provider
# from browser_use.llm.openai.chat import ChatOpenAI  # this will not work! your-provider does not support the OpenAI format
```
```python
# For an OpenAI-compatible API (e.g. Provider-B), use ChatOpenAI instead:
from browser_use.llm.openai.chat import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-4o",
    base_url="https://ai.9w7.cn/v1",
    api_key="your-api-key",
)
```
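The two configurations above differ in two ways: the wrapper class (`ChatAnthropic` for Anthropic-format APIs, `ChatOpenAI` for OpenAI-compatible ones) and the `base_url` convention (no `/v1` for the Anthropic-format proxy, `/v1` for the OpenAI-compatible one). A hypothetical helper that encodes this rule, so the mix-up flagged above cannot creep back in, might look like this; the mapping reflects only the two provider styles described here.

```python
def llm_settings(api_format, base_url):
    """Return (wrapper class name, normalized base_url) for a provider.

    Hypothetical helper:
      "anthropic" -> ChatAnthropic, base_url without a /v1 suffix
      "openai"    -> ChatOpenAI, base_url with exactly one /v1 suffix
    """
    url = base_url.rstrip("/")
    if api_format == "anthropic":
        if url.endswith("/v1"):
            url = url[: -len("/v1")]
        return "ChatAnthropic", url
    if api_format == "openai":
        if not url.endswith("/v1"):
            url += "/v1"
        return "ChatOpenAI", url
    raise ValueError(f"unknown API format: {api_format!r}")
```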
### Using the Chrome Browser
```python
from browser_use import Agent, BrowserProfile, BrowserSession

# Configure browser-use to drive Chrome
profile = BrowserProfile(
    executable_path="/usr/bin/google-chrome-stable",
    headless=True,
    disable_security=True,  # turns off some browser security checks; avoid on untrusted sites
)
session = BrowserSession(browser_profile=profile)
agent = Agent(
    task="task description",
    llm=llm,  # an LLM created as in the previous section
    browser_session=session,
)
```
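The profile above hard-codes `/usr/bin/google-chrome-stable`. If the job also runs on machines where Chrome lives elsewhere, a lookup on `PATH` with `shutil.which` is a simple fallback. The candidate executable names are assumptions for typical Linux installs, and `find_chrome` is a hypothetical helper.

```python
import shutil

# Common Chrome/Chromium executable names on Linux (an assumption; adjust for your OS).
CANDIDATES = ("google-chrome-stable", "google-chrome", "chromium", "chromium-browser")


def find_chrome(candidates=CANDIDATES):
    """Return the full path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None
```

It would plug in as `executable_path=find_chrome() or "/usr/bin/google-chrome-stable"`, keeping the original path as the default.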
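### Saving/Loading Login State
```python
# Note: `browser_context` and `BrowserContextConfig` come from older browser-use releases;
# check the docs for your installed version (0.11.x) before relying on these names.
# Save the login state
await browser_context.save_storage_state(path="polymarket_auth.json")
# Load the login state
context = BrowserContextConfig(
    storage_state="polymarket_auth.json"
)
```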
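A saved state file is only useful while its session cookies are still valid. The sketch below checks a storage-state file before reuse, assuming the standard Playwright storage-state JSON layout (a `cookies` list whose entries carry `domain` and `expires`, with `-1` meaning a session cookie). `has_live_cookies` is a hypothetical helper, and counting session cookies as live is a deliberately lenient choice.

```python
import json
import time


def has_live_cookies(state_path, domain, now=None):
    """Return True if a Playwright-style storage-state file still holds an unexpired cookie for `domain`.

    Assumes the layout {"cookies": [...], "origins": [...]}, where each cookie's
    "expires" is a Unix timestamp and -1 marks a session cookie.
    """
    now = time.time() if now is None else now
    try:
        with open(state_path, encoding="utf-8") as f:
            state = json.load(f)
    except (OSError, json.JSONDecodeError):
        return False  # missing or corrupt file: log in again
    for cookie in state.get("cookies", []):
        cookie_domain = cookie.get("domain", "").lstrip(".")
        if cookie_domain == domain or cookie_domain.endswith("." + domain):
            expires = cookie.get("expires", -1)
            if expires == -1 or expires > now:
                return True
    return False
```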
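## Common Configuration
### Minimizing Token Usage
```python
agent = Agent(
    task="task",
    llm=llm,
    use_vision=False,         # disable vision (screenshots) to save tokens
    max_actions_per_step=3,   # cap the number of actions per step
    message_compaction=True,  # compress the message history
)
```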
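### Debug Mode
```python
from browser_use import Agent, BrowserProfile, BrowserSession

# Browser display options belong on the browser profile, not on Agent
profile = BrowserProfile(
    headless=False,  # show the browser window
    slow_mo=1000,    # slow motion: ~1 s delay per browser operation (check your version supports it)
)
agent = Agent(
    task="task",
    llm=llm,
    browser_session=BrowserSession(browser_profile=profile),
    save_conversation_path="debug_log/",  # save the conversation log
)
```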
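With a fixed `save_conversation_path`, logs from successive debug runs land in the same directory. A hypothetical helper that creates one timestamped subdirectory per run keeps them apart; pass `str(make_run_dir())` as `save_conversation_path`. This assumes browser-use accepts a directory path there, as the `"debug_log/"` value suggests.

```python
from datetime import datetime
from pathlib import Path


def make_run_dir(base="debug_log"):
    """Create and return a fresh per-run subdirectory such as debug_log/20250101-120000-123456/.

    Hypothetical helper: one directory per run, so conversation logs
    from different debug sessions do not mix.
    """
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S-%f")
    run_dir = Path(base) / stamp
    run_dir.mkdir(parents=True, exist_ok=False)
    return run_dir
```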
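### Extracting Structured Data
```python
from pydantic import BaseModel

from browser_use import Agent


class MarketData(BaseModel):
    question: str
    yes_price: float
    no_price: float


agent = Agent(
    task="Get the data for the Polymarket Fed interest-rate market",
    llm=llm,
    output_model_schema=MarketData,
)
history = await agent.run()
# run() returns the run history, not a MarketData instance;
# parse the agent's final answer into the model yourself
data = MarketData.model_validate_json(history.final_result())
```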
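When pydantic is unavailable or you want extra checks, the parsing step can be sketched with the standard library alone. This takes the agent's final answer as a JSON string (what `final_result()` is expected to return when an output schema is set). The `[0, 1]` range check is an assumption based on Polymarket prices being probabilities, and `parse_market` is a hypothetical helper.

```python
import json
from dataclasses import dataclass


@dataclass
class MarketData:
    question: str
    yes_price: float
    no_price: float

    def __post_init__(self):
        # Binary-market prices are probabilities, so both must lie in [0, 1].
        for name in ("yes_price", "no_price"):
            value = float(getattr(self, name))
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name}={value} is outside [0, 1]")
            setattr(self, name, value)


def parse_market(final_text):
    """Parse the agent's final JSON text into a validated MarketData."""
    data = json.loads(final_text)
    return MarketData(
        question=str(data["question"]),
        yes_price=data["yes_price"],
        no_price=data["no_price"],
    )
```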
## Polymarket Integration
### Viewing a Market
```python
agent = Agent(
    task="""
    1. Open https://polymarket.com/event/fed-decision-in-march-885
    2. Extract the following information:
       - Market question
       - Yes price
       - No price
       - Trading volume
    3. Return the data as JSON
    """,
    llm=llm,
)
```
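The market-viewing task asks the agent to return JSON data, but an LLM answer may wrap that JSON in a fenced code block or surrounding prose. A small extraction helper makes downstream parsing tolerant of both. `extract_json` is hypothetical and assumes a single top-level JSON object per answer.

```python
import json
import re

# A fenced block: three backticks, an optional "json" tag, then the body up to the closing backticks.
_FENCE = re.compile(r"`{3}(?:json)?\s*(.*?)`{3}", re.DOTALL)


def extract_json(text):
    """Return the first JSON object found in an LLM answer, with or without a code fence."""
    match = _FENCE.search(text)
    candidate = match.group(1) if match else text
    start, end = candidate.find("{"), candidate.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object in text")
    return json.loads(candidate[start : end + 1])
```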
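### Executing a Trade
```python
agent = Agent(
    task="""
    1. Open Polymarket
    2. Connect the wallet (if needed)
    3. Navigate to the Fed interest-rate market
    4. Buy $0.60 of No (price ≥ 0.85)
    5. Confirm the trade
    """,
    llm=llm,
    # sensitive_data keys act as placeholder names; browser-use is designed to
    # keep the real values out of what the LLM sees
    sensitive_data={
        "wallet_address": "0x...",
    },
)
```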
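The trading task leaves the price condition (`≥ 0.85`) for the LLM to enforce. Assuming that condition means "only buy No at 0.85 or higher", a deterministic check in your own code is a safer place for it. For example, you can run it on a price obtained by a read-only extraction run before you build the trading agent at all. `TradeRule`, `approve_trade`, and the $1.00 cap are hypothetical names and values, not browser-use APIs. Because the price can move between the check and the confirmation, this narrows the risk rather than removing it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TradeRule:
    min_price: float = 0.85       # only buy when the outcome trades at or above this price
    amount_usd: float = 0.60      # stake per trade
    max_amount_usd: float = 1.00  # hypothetical hard cap on any single stake


def approve_trade(rule, observed_price):
    """Return True only if the observed price satisfies the rule and the stake is within the cap."""
    if not 0.0 <= observed_price <= 1.0:
        return False  # not a valid binary-market price
    if rule.amount_usd > rule.max_amount_usd:
        return False
    return observed_price >= rule.min_price
```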
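## Best Practices
### 1. Write Clear Task Descriptions
```python
# Good ✅
task="Open polymarket.com, find the Fed interest-rate market, and extract the Yes/No prices"
# Bad ❌
task="Take a look at that market for me"
```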
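One way to keep task descriptions explicit is to generate them from a fixed template, so every run names the URL, the goal, the fields, and the output format. `build_task` is a hypothetical helper; its output is just a string you would pass as `task=`.

```python
def build_task(url, goal, fields):
    """Assemble an explicit, numbered task description (hypothetical helper)."""
    lines = [f"1. Open {url}", f"2. {goal}", "3. Extract the following fields:"]
    lines += [f"   - {field}" for field in fields]
    lines.append("4. Return the result as JSON")
    return "\n".join(lines)


task = build_task(
    "https://polymarket.com",
    "Find the Fed interest-rate market",
    ["Yes price", "No price"],
)
```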
### 2. Use Structured Output
```python
from pydantic import BaseModel


class TradingResult(BaseModel):
    success: bool
    market: str
    action: str  # "buy_yes", "buy_no"
    amount: float
    price: float
    tx_hash: str | None  # the `str | None` syntax needs Python 3.10+


agent = Agent(
    task="Execute the trade...",
    llm=llm,
    output_model_schema=TradingResult,
)
```
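The `action` field above is typed as plain `str`, so the comment listing `"buy_yes"` and `"buy_no"` is documentation only. Typing it as `Literal["buy_yes", "buy_no"]` lets pydantic reject anything else. The sketch below shows the same constraint with a standard-library dataclass so it runs without pydantic. The rule that a successful trade must carry a `tx_hash` is an added assumption, not something the original schema requires.

```python
from dataclasses import dataclass
from typing import Literal, Optional, get_args

Action = Literal["buy_yes", "buy_no"]


@dataclass
class TradingResult:
    success: bool
    market: str
    action: Action
    amount: float
    price: float
    tx_hash: Optional[str] = None

    def __post_init__(self):
        # Enforce the allowed actions at runtime; dataclasses do not check annotations.
        if self.action not in get_args(Action):
            raise ValueError(f"unexpected action: {self.action!r}")
        # Assumption: a successful on-chain trade should report its transaction hash.
        if self.success and not self.tx_hash:
            raise ValueError("success=True but no tx_hash was reported")
```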
### 3. Error Handling
```python
try:
    result = await agent.run()
except Exception as e:
    print(f"Task failed: {e}")
    # fall back to the browser tool if needed
```
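The error-handling snippet above tries once and then gives up. A common extension, sketched here under the assumption that the task is safe to repeat, is to retry with exponential backoff before falling back. `run_with_retry` is hypothetical. Pass it a zero-argument coroutine function such as `lambda: make_agent().run()`, where `make_agent` is a hypothetical factory that builds a fresh agent (and browser) per attempt. Do not wrap trade-executing tasks this way, because a retry after a partial failure could place the order twice.

```python
import asyncio


async def run_with_retry(run, attempts=3, base_delay=1.0, fallback=None):
    """Await `run()` up to `attempts` times with exponential backoff; then use `fallback` if given.

    `run` and `fallback` are zero-argument callables returning awaitables.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return await run()
        except Exception as exc:  # broad on purpose: browser tasks fail in many ways
            last_error = exc
            if attempt < attempts - 1:
                await asyncio.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ... by default
    if fallback is not None:
        return await fallback()
    raise last_error
```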
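### 4. Reuse Login State
```python
import os

# Save the state after the first login, then load it on later runs to avoid logging in again
browser_context = BrowserContextConfig(
    storage_state=os.path.expanduser("~/.playwright-data/polymarket/auth.json")  # expand "~" explicitly
)
```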
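Python does not expand `~` in file paths on its own, and the browser layer is not guaranteed to either, so an unexpanded path can silently miss the saved file. The hypothetical helper below resolves the path and returns `None` when no state has been saved yet, which signals that a first, manual login is still needed.

```python
import os


def storage_state_or_none(path):
    """Expand `~`, and return the absolute path if the saved login state exists, else None."""
    full = os.path.abspath(os.path.expanduser(path))
    return full if os.path.isfile(full) else None
```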
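## Token Usage Optimization
### Comparison (Same Task)
| Tool | Avg. tokens | Reason |
|------|-------------|--------|
| browser tool | ~5000-10000 | Snapshots the full page every time |
| Playwright | ~1000-2000 | Needs several rounds of debugging |
| browser-use | ~2000-4000 | AI keeps its decisions lean |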
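The table gives ranges. A quick way to compare them is to take each range's midpoint. This is simple arithmetic on the figures above, not a new measurement, and the ranges themselves are this document's estimates.

```python
# Token ranges copied from the comparison table above: (low, high).
RANGES = {
    "browser tool": (5000, 10000),
    "Playwright": (1000, 2000),
    "browser-use": (2000, 4000),
}


def midpoint(low_high):
    low, high = low_high
    return (low + high) / 2


baseline = midpoint(RANGES["browser tool"])   # 7500.0
candidate = midpoint(RANGES["browser-use"])   # 3000.0
savings = (baseline - candidate) / baseline   # fraction of tokens saved per run
print(f"browser-use uses about {savings:.0%} fewer tokens than the browser tool")
```

By the same midpoints, Playwright (1500) remains the cheapest per run, which matches the table's note that its cost is debugging effort rather than tokens.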
### Optimization Tips
1. **Disable vision**: `use_vision=False`
2. **Limit history**: `max_history_items=10`
3. **Compress messages**: `message_compaction=True`
4. **Fewer actions per step**: `max_actions_per_step=3`
5. **Use Flash mode**: `flash_mode=True` (fast mode)
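Limiting history bounds how much earlier conversation is resent to the LLM at each step, which typically accounts for much of the per-step token cost. The sketch below only illustrates that idea (keep the task prompt, drop the oldest intermediate messages). `trim_history` is a hypothetical helper, not browser-use's actual trimming logic, whose details may differ.

```python
def trim_history(messages, max_items):
    """Keep the first message (task/system prompt) plus the newest messages, max_items in total.

    A conceptual illustration of history capping, not browser-use internals.
    """
    if max_items < 2:
        raise ValueError("max_items must be at least 2")
    if len(messages) <= max_items:
        return list(messages)
    return [messages[0]] + list(messages[-(max_items - 1):])
```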
## Complete Example
```python
import asyncio
import os

from pydantic import BaseModel

from browser_use import Agent
from browser_use.llm.anthropic.chat import ChatAnthropic


class MarketInfo(BaseModel):
    question: str
    yes_price: float
    no_price: float
    volume: str


async def check_polymarket():
    # Local LLM API: the proxy speaks the Anthropic format, so use ChatAnthropic (not ChatOpenAI)
    llm = ChatAnthropic(
        model="claude-3-5-sonnet-20241022",
        base_url="https://your-anthropic-proxy.example.com",  # no /v1 suffix
        api_key=os.environ.get("XSC_API_KEY"),
    )
    agent = Agent(
        task="""
        Visit the Polymarket Fed interest-rate market:
        https://polymarket.com/event/fed-decision-in-march-885
        Extract and return:
        - Market question
        - Yes price (0-1)
        - No price (0-1)
        - 24h trading volume
        """,
        llm=llm,
        output_model_schema=MarketInfo,
        use_vision=False,
        max_actions_per_step=5,
    )
    history = await agent.run()
    # Parse the agent's final answer into the schema
    return MarketInfo.model_validate_json(history.final_result())


if __name__ == "__main__":
    print(asyncio.run(check_polymarket()))
```
## Resources
- Official docs: https://docs.browser-use.com
- GitHub: https://github.com/browser-use/browser-use
- Examples: `~/clawd/skills/browser-use/examples/`
## Quick Commands
```bash
# Install (already done)
pip install browser-use
# Run a script
python3 script.py
# Smoke test
python3 -c "from browser_use import Agent; print('✅ OK')"
```
---
**Remember**: browser-use makes browser automation smarter and spares you the pain of debugging selectors! 🚀
Related Skills
openclaw-browser-chain-debug
Diagnose OpenClaw browser control failures including browser start timeouts, Chrome CDP startup failures, missing DISPLAY, browser profile launch issues, and gateway/browser environment mismatches. Use when browser automation, browser-based cron jobs, or profile openclaw fails to start, times out, or returns Request was aborted after browser steps. Also use when deciding whether a task should run with a profile browser versus an attach browser: prefer profile for unattended automation and recurring jobs; prefer attach when a human's already-open logged-in tab or manual cooperation is required.
fast-browser-use
Use when the user wants extremely fast browser automation via fast-browser-use / fbu, especially for DOM-heavy pages, fast extraction, or browser tasks on macOS/Linux with Chrome installed.
browser-profile-guide
The complete guide to the OpenClaw Browser profile system: which profile to use when, and how to choose
browser-login-monitor
Browser login security monitoring: tracks browser session state and login security
Agent Browser
A fast Rust-based headless browser automation CLI with Node.js fallback that enables AI agents to navigate, click, type, and snapshot pages via structured commands.
wemp-operator
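Full-featured WeChat Official Account (微信公众号) operations: API wrappers for drafts, publishing, comments, users, media assets, mass messaging, statistics, menus, and QR codes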
zsxq-smart-publish
Publish and manage content on 知识星球 (zsxq.com). Supports talk posts, Q&A, long articles, file sharing, digest/bookmark, homework tasks, and tag management. Use when publishing content to 知识星球, creating/editing posts, uploading files/images/audio, managing digests, batch publishing, or formatting content for 知识星球.
zoom-automation
Automate Zoom meeting creation, management, recordings, webinars, and participant tracking via Rube MCP (Composio). Always search tools first for current schemas.
zoho-crm-automation
Automate Zoho CRM tasks via Rube MCP (Composio): create/update records, search contacts, manage leads, and convert leads. Always search tools first for current schemas.
ziliu-publisher
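Ziliu (字流): an AI-driven multi-platform content distribution tool. Create once, adapt the formatting intelligently, and publish in one click to 16+ platforms (WeChat Official Accounts, Zhihu, Xiaohongshu, Bilibili, Douyin, Weibo, X, etc.). Use it when the user needs multi-platform publishing, content formatting, or format adaptation. Trigger words: 字流, ziliu, 多平台发布 (multi-platform publishing), 一键分发 (one-click distribution), 内容分发 (content distribution), 排版发布 (format and publish).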
zhihu-post-skill
Zhihu article publishing: automated content creation and publishing on the Zhihu platform
zendesk-automation
Automate Zendesk tasks via Rube MCP (Composio): tickets, users, organizations, replies. Always search tools first for current schemas.