browser-use

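An AI-driven browser automation tool. It uses an LLM to understand pages and carry out tasks automatically, making it smarter and more token-efficient than traditional Playwright scripting. It suits complex interactions, dynamic pages, and browser operations that need intelligent decisions. Chrome is the preferred browser.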

33 stars

Best use case

browser-use is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using browser-use should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/browser-use/SKILL.md --create-dirs "https://raw.githubusercontent.com/aAAaqwq/AGI-Super-Team/main/skills/browser-use/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/browser-use/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How browser-use Compares

| Feature / Agent | browser-use | Standard Approach |
|------|-----------|-------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

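It is an AI-driven browser automation tool. It uses an LLM to understand pages and carry out tasks automatically, making it smarter and more token-efficient than traditional Playwright scripting. It suits complex interactions, dynamic pages, and browser operations that need intelligent decisions. Chrome is the preferred browser.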

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

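# browser-use: Intelligent Browser Automation

## Overview

browser-use is an AI-driven browser automation tool that uses an LLM to:
- Understand page content
- Decide the next action
- Complete tasks automatically

How it differs from Playwright:
| Feature | Playwright | browser-use |
|------|-----------|-------------|
| Control model | Pre-programmed scripts | AI decision-making |
| Adaptability | Page changes require rewrites | Adapts automatically |
| Token usage | Lower, but needs debugging | Compacted by the AI |
| Complex interactions | Need precise selectors | Natural-language descriptions |
| Maintenance cost | High | Low |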
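## ⚠️ Resource Cleanup Rule (Mandatory)

**Every cron job that uses a browser must close its Chrome processes automatically when it finishes!**

```python
import asyncio
import subprocess
from browser_use import Agent

async def main():
    # `llm` must be configured beforehand (see the LLM examples in this document)
    agent = Agent(task="...", llm=llm)
    try:
        result = await agent.run()
    finally:
        # ⚠️ Always close the browser explicitly, even if the task raised
        if hasattr(agent, 'browser') and agent.browser:
            await agent.browser.close()

        # ⚠️ Recommended at script exit: force-kill leftover processes.
        # Note: this matches every process with "chrome" in its command line,
        # including browsers this script did not start.
        subprocess.run(['pkill', '-f', 'chrome'], capture_output=True)

    return result
```

**Why**: this avoids memory leaks and lingering resource usage, and keeps the Gateway from being overloaded at 100% CPU.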
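
The cleanup rule above comes down to guaranteeing that `close()` runs whether or not the task succeeds. The sketch below shows that pattern with the standard library only. `FakeBrowser`, `managed`, and `run_job` are hypothetical names introduced for illustration, and `FakeBrowser` merely stands in for whatever browser handle the real script holds.

```python
import asyncio
from contextlib import asynccontextmanager

class FakeBrowser:
    """Hypothetical stand-in for a browser handle; only close() matters here."""
    def __init__(self):
        self.closed = False

    async def close(self):
        self.closed = True

@asynccontextmanager
async def managed(resource):
    """Yield the resource and always await its close(), even if the body raises."""
    try:
        yield resource
    finally:
        await resource.close()

async def run_job(browser, fail=False):
    async with managed(browser):
        if fail:
            raise RuntimeError("task failed")
        return "done"

async def demo():
    ok = FakeBrowser()
    result = await run_job(ok)
    bad = FakeBrowser()
    try:
        await run_job(bad, fail=True)
    except RuntimeError:
        pass  # the failure is expected; the browser must still be closed
    return result, ok.closed, bad.closed

demo_result = asyncio.run(demo())
```

Closing only the handle the script created is narrower than a blanket `pkill`, which may be preferable on machines where other Chrome instances are running.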

## Installation Status

✅ Installed:
- browser-use 0.11.11
- browser-use-sdk 2.0.15
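
The versions listed above can be checked at runtime rather than assumed. This is a small sketch using only `importlib.metadata` from the standard library; `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None

for name in ("browser-use", "browser-use-sdk"):
    print(name, installed_version(name) or "not installed")
```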

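## Quick Start

### Basic Usage

```python
import asyncio
from browser_use import Agent
# Recent browser-use releases expect their own chat model classes
# rather than LangChain ones
from browser_use.llm.openai.chat import ChatOpenAI

async def main():
    agent = Agent(
        task="Open polymarket.com and look at the Fed interest rate market",
        llm=ChatOpenAI(model="gpt-4o"),
    )
    result = await agent.run()
    print(result)

asyncio.run(main())
```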

### Using a Custom LLM (Recommended Configuration)

**⚠️ Important**: the your-provider API uses the **Anthropic format**, not the OpenAI format. You must use `ChatAnthropic`.

```python
from browser_use import Agent
from browser_use.llm.anthropic.chat import ChatAnthropic

# your-provider API (Anthropic-compatible) ✅ recommended
llm = ChatAnthropic(
    model="claude-sonnet-4-6",
    base_url="https://your-anthropic-proxy.example.com",  # note: no /v1 suffix
    api_key="your-api-key",  # or fetch it with: pass show api/your-provider
)

agent = Agent(
    task="your task",
    llm=llm,
)
```

```python
# ❌ Wrong: do not combine ChatOpenAI with your-provider
# from browser_use.llm.openai.chat import ChatOpenAI  # does not work: your-provider does not support the OpenAI format
```

```python
# For an OpenAI-compatible API (such as Provider-B), use ChatOpenAI:
from browser_use.llm.openai.chat import ChatOpenAI
llm = ChatOpenAI(
    model="gpt-4o",
    base_url="https://ai.9w7.cn/v1",
    api_key="your-api-key",
)
```
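
The snippets above pass the API key as a literal string. A minimal sketch of reading it from the environment instead is shown here; the helper name `load_api_key` and the variable name `LLM_API_KEY` are assumptions made for illustration, not names defined by browser-use.

```python
import os

def load_api_key(env_var: str = "LLM_API_KEY") -> str:
    """Read the API key from the environment and fail early if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before running")
    return key
```

The returned value can then be passed as `api_key=load_api_key()` when constructing the LLM object, which keeps the key out of the script and out of version control.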

### Using the Chrome Browser

```python
from browser_use import Agent, BrowserProfile, BrowserSession

# Configure browser-use to launch Chrome
profile = BrowserProfile(
    executable_path="/usr/bin/google-chrome-stable",
    headless=True,
    disable_security=True,  # turns off browser security features
)
session = BrowserSession(browser_profile=profile)

agent = Agent(
    task="task description",
    llm=llm,  # an LLM object configured beforehand
    browser_session=session,
)
```
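
The hard-coded `executable_path` above only works where Chrome is installed at that exact location. As a hedged sketch, the path can be looked up on `PATH` with `shutil.which`; the candidate names and the helper `find_browser` are assumptions for illustration and may need adjusting per platform.

```python
import shutil
from typing import Iterable, Optional

# Common binary names on Linux; an assumption, not an exhaustive list
CANDIDATES = ("google-chrome-stable", "google-chrome", "chromium", "chromium-browser")

def find_browser(candidates: Iterable[str] = CANDIDATES) -> Optional[str]:
    """Return the first candidate found on PATH, or None if none is installed."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None

chrome_path = find_browser()
print(chrome_path or "no Chrome or Chromium binary found on PATH")
```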

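### Saving and Loading Login State

```python
# Save the login state
# (`browser_context` is an already-open browser context object)
await browser_context.save_storage_state(path="polymarket_auth.json")

# Load the login state
# (`BrowserContextConfig` must be imported first; its import path depends on the installed release)
context = BrowserContextConfig(
    storage_state="polymarket_auth.json"
)
```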

## Common Configuration

### Minimizing Token Usage

```python
agent = Agent(
    task="task",
    llm=llm,
    use_vision=False,  # disable vision to save tokens
    max_actions_per_step=3,  # cap the number of actions per step
    message_compaction=True,  # compact the message history
)
```

### Debug Mode

```python
agent = Agent(
    task="task",
    llm=llm,
    headless=False,  # show the browser window
    slow_mo=1000,  # slow motion: delay each step by 1 second
    save_conversation_path="debug_log/",  # save the conversation log
)
```

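### Extracting Structured Data

```python
from pydantic import BaseModel

class MarketData(BaseModel):
    question: str
    yes_price: float
    no_price: float

agent = Agent(
    task="Get the Polymarket Fed interest rate market data",
    llm=llm,
    output_model_schema=MarketData,
)
history = await agent.run()
# agent.run() returns the run history; the final answer is a JSON string
# that follows the MarketData schema and can be validated into the model
market = MarketData.model_validate_json(history.final_result())
```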
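
Whatever schema is requested, the extracted values are still worth sanity-checking before they are used. The sketch below is a standard-library stand-in for that validation step: it uses a `dataclass` instead of pydantic so it runs without extra dependencies, and `parse_market` plus the sample JSON are hypothetical.

```python
import json
from dataclasses import dataclass

@dataclass
class MarketData:
    question: str
    yes_price: float
    no_price: float

def parse_market(raw: str) -> MarketData:
    """Parse a JSON string and check that both prices lie in [0, 1]."""
    data = json.loads(raw)
    market = MarketData(
        question=str(data["question"]),
        yes_price=float(data["yes_price"]),
        no_price=float(data["no_price"]),
    )
    for name in ("yes_price", "no_price"):
        value = getattr(market, name)
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} out of range: {value}")
    return market

# Made-up sample payload, not real market data
sample = '{"question": "Fed decision in March?", "yes_price": 0.12, "no_price": 0.88}'
market = parse_market(sample)
print(market)
```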

## Polymarket Integration

### Viewing a Market

```python
agent = Agent(
    task="""
    1. Open https://polymarket.com/event/fed-decision-in-march-885
    2. Extract the following:
       - Market question
       - Yes price
       - No price
       - Trading volume
    3. Return the data as JSON
    """,
    llm=llm,
)
```

### Executing a Trade

```python
agent = Agent(
    task="""
    1. Open Polymarket
    2. Connect the wallet (if required)
    3. Navigate to the Fed interest rate market
    4. Buy $0.60 of No (price ≥ 0.85)
    5. Confirm the trade
    """,
    llm=llm,
    sensitive_data={
        "wallet_address": "0x...",
    }
)
```
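
The price condition in step 4 can also be enforced in ordinary code before any agent is started, so the decision does not rest on the LLM alone. This is a sketch under that assumption; `build_trade_task` is a hypothetical helper, and the observed price is expected to come from an earlier read-only step.

```python
from typing import Optional

def build_trade_task(side: str, amount_usd: float, min_price: float,
                     observed_price: float) -> Optional[str]:
    """Return a trade task only when the observed price meets the threshold."""
    if side not in ("Yes", "No"):
        raise ValueError(f"unknown side: {side}")
    if observed_price < min_price:
        return None  # condition not met: do not start an agent at all
    return (
        f"Buy ${amount_usd:.2f} of {side} in the Fed interest rate market, "
        f"but only if the price is still at least {min_price:.2f}. "
        "Then confirm the trade."
    )

task = build_trade_task("No", 0.60, 0.85, observed_price=0.88)
skipped = build_trade_task("No", 0.60, 0.85, observed_price=0.80)
```

Here `task` is a prompt string ready to hand to an agent, while `skipped` is `None` because 0.80 is below the 0.85 threshold.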

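## Best Practices

### 1. Write Clear Task Descriptions

```python
# Good ✅
task="Open polymarket.com, find the Fed interest rate market, and extract the Yes/No prices"

# Bad ❌
task="Take a look at that market for me"
```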
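
One way to keep task descriptions consistently explicit is to assemble them from structured inputs. The helper below is a hypothetical sketch (`build_task` is not part of browser-use) that produces a numbered prompt in the same shape as the examples in this document.

```python
from typing import List

def build_task(url: str, fields: List[str]) -> str:
    """Build a numbered, explicit task description from a URL and field names."""
    lines = [f"1. Open {url}", "2. Extract the following:"]
    lines += [f"   - {field}" for field in fields]
    lines.append("3. Return the data as JSON")
    return "\n".join(lines)

task = build_task("https://polymarket.com", ["Yes price", "No price"])
print(task)
```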

### 2. Use Structured Output

```python
from pydantic import BaseModel

class TradingResult(BaseModel):
    success: bool
    market: str
    action: str  # "buy_yes", "buy_no"
    amount: float
    price: float
    tx_hash: str | None

agent = Agent(
    task="Execute the trade...",
    llm=llm,
    output_model_schema=TradingResult,
)
```

### 3. Error Handling

```python
try:
    result = await agent.run()
except Exception as e:
    print(f"Task failed: {e}")
    # The browser tool can serve as a fallback here
```
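
Beyond logging the failure, transient errors can be retried. The following is a minimal sketch of retry with exponential backoff using only `asyncio`; `run_with_retry` and `flaky_run` are hypothetical names, and `flaky_run` simulates a task that fails twice before succeeding.

```python
import asyncio

async def run_with_retry(make_run, attempts: int = 3, base_delay: float = 0.01):
    """Await make_run() up to `attempts` times, doubling the delay between tries."""
    for attempt in range(attempts):
        try:
            return await make_run()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            await asyncio.sleep(base_delay * 2 ** attempt)

calls = {"count": 0}

async def flaky_run():
    """Hypothetical task that fails twice before succeeding."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

outcome = asyncio.run(run_with_retry(flaky_run))
```

`make_run` is a factory rather than a coroutine object because a coroutine can be awaited only once. In a real script it would presumably create a fresh agent on each attempt, with a much larger `base_delay`.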

### 4. Reuse Login State

```python
# Save the state after the first login,
# then load it on later runs to avoid logging in again

browser_context = BrowserContextConfig(
    storage_state="~/.playwright-data/polymarket/auth.json"
)
```
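
Python does not expand `~` in a path string on its own, so it may be safer to resolve the path and confirm the file is usable before handing it over. The sketch below assumes the state file is plain JSON; `resolve_storage_state` is a hypothetical helper.

```python
import json
from pathlib import Path
from typing import Optional

def resolve_storage_state(path: str) -> Optional[str]:
    """Expand ~ and return the path only if it points to a readable JSON file."""
    resolved = Path(path).expanduser()
    if not resolved.is_file():
        return None  # no saved state yet: a fresh login is needed
    try:
        json.loads(resolved.read_text(encoding="utf-8"))
    except ValueError:
        return None  # corrupt or non-JSON file: treat it as missing
    return str(resolved)

state_path = resolve_storage_state("~/.playwright-data/polymarket/auth.json")
print(state_path or "no saved login state; log in first")
```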

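## Token Usage Optimization

### Comparison (Same Task)

| Tool | Average tokens | Reason |
|------|-----------|------|
| browser tool | ~5000-10000 | Snapshots the full page every time |
| Playwright | ~1000-2000 | Needs several debugging rounds |
| browser-use | ~2000-4000 | AI-compacted decisions |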
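
To read the table's ranges as rough relative costs, one option is to compare midpoints. The sketch below uses only the approximate figures from the table; they are estimates, not measurements, and taking the midpoint is an assumption made for illustration.

```python
# Approximate token ranges copied from the comparison table
TOKEN_RANGES = {
    "browser tool": (5000, 10000),
    "Playwright": (1000, 2000),
    "browser-use": (2000, 4000),
}

def midpoint(bounds):
    """Return the midpoint of a (low, high) range."""
    low, high = bounds
    return (low + high) / 2

mid = {tool: midpoint(bounds) for tool, bounds in TOKEN_RANGES.items()}
saving_vs_browser_tool = 1 - mid["browser-use"] / mid["browser tool"]
ratio_vs_playwright = mid["browser-use"] / mid["Playwright"]

print(f"browser-use vs browser tool: {saving_vs_browser_tool:.0%} fewer tokens")
print(f"browser-use vs Playwright: {ratio_vs_playwright:.1f}x the tokens")
```

On these figures, browser-use uses about 60% fewer tokens than the browser tool but about twice as many as a scripted Playwright run.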

### Optimization Tips

1. **Disable vision**: `use_vision=False`
2. **Limit history**: `max_history_items=10`
3. **Compact messages**: `message_compaction=True`
4. **Reduce steps**: `max_actions_per_step=3`
5. **Use Flash mode**: `flash_mode=True` (fast mode)

## Complete Example

```python
import asyncio
from browser_use import Agent
from browser_use.llm.anthropic.chat import ChatAnthropic
from pydantic import BaseModel
import os

class MarketInfo(BaseModel):
    question: str
    yes_price: float
    no_price: float
    volume: str

async def check_polymarket():
    # The proxy speaks the Anthropic format, so use ChatAnthropic (not ChatOpenAI)
    llm = ChatAnthropic(
        model="claude-3-5-sonnet-20241022",
        base_url="https://your-anthropic-proxy.example.com",
        api_key=os.environ.get("XSC_API_KEY"),
    )

    agent = Agent(
        task="""
        Visit the Polymarket Fed interest rate market:
        https://polymarket.com/event/fed-decision-in-march-885

        Extract and return:
        - Market question
        - Yes price (0-1)
        - No price (0-1)
        - 24h trading volume
        """,
        llm=llm,
        output_model_schema=MarketInfo,
        use_vision=False,
        max_actions_per_step=5,
    )

    result = await agent.run()
    return result

if __name__ == "__main__":
    asyncio.run(check_polymarket())
```

## Resources

- Official documentation: https://docs.browser-use.com
- GitHub: https://github.com/browser-use/browser-use
- Examples: `~/clawd/skills/browser-use/examples/`

## Quick Commands

```bash
# Install (already done)
pip install browser-use

# Run a script
python3 script.py

# Smoke test
python3 -c "from browser_use import Agent; print('✅ OK')"
```

---

**Remember**: browser-use makes browser automation smarter and spares you the pain of debugging selectors! 🚀

Related Skills

openclaw-browser-chain-debug

33
from aAAaqwq/AGI-Super-Team

Diagnose OpenClaw browser control failures including browser start timeouts, Chrome CDP startup failures, missing DISPLAY, browser profile launch issues, and gateway/browser environment mismatches. Use when browser automation, browser-based cron jobs, or profile openclaw fails to start, times out, or returns Request was aborted after browser steps. Also use when deciding whether a task should run with a profile browser versus an attach browser: prefer profile for unattended automation and recurring jobs; prefer attach when a human's already-open logged-in tab or manual cooperation is required.

fast-browser-use

33
from aAAaqwq/AGI-Super-Team

Use when the user wants extremely fast browser automation via fast-browser-use / fbu, especially for DOM-heavy pages, fast extraction, or browser tasks on macOS/Linux with Chrome installed.

browser-profile-guide

33
from aAAaqwq/AGI-Super-Team

A complete guide to the OpenClaw Browser profile system: which profile to use when, and how to choose.

browser-login-monitor

33
from aAAaqwq/AGI-Super-Team

> Browser login security monitoring: tracks browser session state and login security

Agent Browser

33
from aAAaqwq/AGI-Super-Team

A fast Rust-based headless browser automation CLI with Node.js fallback that enables AI agents to navigate, click, type, and snapshot pages via structured commands.

wemp-operator

33
from aAAaqwq/AGI-Super-Team

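> Full-featured WeChat Official Account operations: API wrappers for drafts, publishing, comments, users, media assets, mass messaging, statistics, menus, and QR codes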

Content & Documentation

zsxq-smart-publish

33
from aAAaqwq/AGI-Super-Team

Publish and manage content on 知识星球 (zsxq.com). Supports talk posts, Q&A, long articles, file sharing, digest/bookmark, homework tasks, and tag management. Use when publishing content to 知识星球, creating/editing posts, uploading files/images/audio, managing digests, batch publishing, or formatting content for 知识星球.

zoom-automation

33
from aAAaqwq/AGI-Super-Team

Automate Zoom meeting creation, management, recordings, webinars, and participant tracking via Rube MCP (Composio). Always search tools first for current schemas.

zoho-crm-automation

33
from aAAaqwq/AGI-Super-Team

Automate Zoho CRM tasks via Rube MCP (Composio): create/update records, search contacts, manage leads, and convert leads. Always search tools first for current schemas.

ziliu-publisher

33
from aAAaqwq/AGI-Super-Team

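字流 (Ziliu): an AI-driven multi-platform content distribution tool. Write once, adapt the layout automatically, and publish to 16+ platforms in one click (WeChat Official Accounts, Zhihu, Xiaohongshu, Bilibili, Douyin, Weibo, X, and others). Use it when the user needs multi-platform publishing, content layout, or format adaptation. Trigger words: 字流, ziliu, 多平台发布, 一键分发, 内容分发, 排版发布.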

zhihu-post-skill

33
from aAAaqwq/AGI-Super-Team

> Zhihu article publishing: automates content creation and publishing on the Zhihu platform

zendesk-automation

33
from aAAaqwq/AGI-Super-Team

Automate Zendesk tasks via Rube MCP (Composio): tickets, users, organizations, replies. Always search tools first for current schemas.