browser-task

Smart browser task agent: describe what you want done in natural language, and it completes the task automatically. It is the PREFERRED tool for multi-step browser operations such as searching, form filling, and data extraction.

1,592 stars

Best use case

browser-task is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using browser-task should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/browser-task/SKILL.md --create-dirs "https://raw.githubusercontent.com/openakita/openakita/main/skills/system/browser-task/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/browser-task/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How browser-task Compares

| Feature / Agent | browser-task | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

Where can I find the source code?

The source is in the openakita/openakita repository on GitHub, at skills/system/browser-task/SKILL.md (the same file the installation command downloads).

SKILL.md Source

# browser_task - Smart Browser Task

**Recommended as the first choice** - this is the preferred tool for browser operations.

Implemented on top of the open-source [browser-use](https://github.com/browser-use/browser-use) project.

## Usage

```python
browser_task(
    task="Description of the task to complete",
    max_steps=15  # optional, defaults to 15
)
```

## Parameters

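| Parameter | Type | Required | Description |
|------|------|------|------|
| task | string | Yes | Task description: state in natural language what you want done |
| max_steps | integer | No | Maximum number of execution steps, default 15 |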
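The parameter contract above can be captured in a small validation helper before calling the tool. This is a minimal sketch: `build_browser_task_args` is a hypothetical name, and rejecting a `max_steps` below 1 is an assumption (the table only says it is an optional integer defaulting to 15).

```python
from typing import Any, Dict, Optional

DEFAULT_MAX_STEPS = 15  # default documented in the parameter table


def build_browser_task_args(task: str, max_steps: Optional[int] = None) -> Dict[str, Any]:
    """Hypothetical helper: validate and normalize the arguments for browser_task."""
    # `task` is required and must be a non-empty natural-language description.
    if not isinstance(task, str) or not task.strip():
        raise ValueError("task must be a non-empty string")
    # `max_steps` is optional; fall back to the documented default.
    if max_steps is None:
        max_steps = DEFAULT_MAX_STEPS
    # Assumption: a step budget below 1 is meaningless, so reject it (bool is excluded explicitly).
    if isinstance(max_steps, bool) or not isinstance(max_steps, int) or max_steps < 1:
        raise ValueError("max_steps must be a positive integer")
    return {"task": task.strip(), "max_steps": max_steps}
```

For example, `build_browser_task_args("Open the GitHub homepage")` yields `{"task": "Open the GitHub homepage", "max_steps": 15}`, which can then be passed as keyword arguments.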

## When to Use (Preferred)

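- Any browser operation that involves multiple steps
- Web search, form filling, information extraction
- When you are not sure of the exact steps
- Complex web interaction flows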

## Examples

### Search task
```python
browser_task(task="Open Baidu and search for the weather in Fuzhou, Fujian")
```

### Form filling
```python
browser_task(task="Open the registration page on example.com and fill in the username test123")
```

### Information extraction
```python
browser_task(task="Open the GitHub homepage and get the names of today's trending projects")
```

### Screenshot task
```python
browser_task(task="Open Baidu, search for Fuzhou, Fujian, and save a screenshot")
```

## Choosing Tools for Browser & Website Operations

The system provides several ways to operate websites and browsers. Choose the most reliable option for each scenario:

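| Scenario | Recommended tool | Notes |
|------|---------|------|
| Target site has an opencli adapter | `opencli_run` (most reliable) | Deterministic commands + JSON output; reuses the Chrome login session |
| Login required but no adapter | `browser_task` → manual combination | Try browser_task first; if it fails, operate manually with click/type |
| Only need to read page content | `web_fetch` | Fastest and lightest; no browser needed |
| Only need to search | `web_search` | Searches DuckDuckGo directly |
| Complex multi-step browser interaction | `browser_task` | Suited to logging in, filling forms, filtering, etc. |
| Single-step browser operation | `browser_navigate`/`browser_click`, etc. | Precise control over a single action |
| Operate the user's logged-in Chrome | `call_mcp_tool("chrome-devtools", ...)` | Requires the user's Chrome to have the debugging port enabled |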

Decision order: `opencli_run` (when an adapter exists) → `web_fetch`/`web_search` (when read-only) → `browser_task` → manual browser_click/type combination → chrome-devtools MCP.
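The decision order above reads naturally as a fallback chain. The sketch below is an illustration of that documented priority, not OpenAkita's actual routing logic: the `Scenario` record and its field names are hypothetical, and the function only returns tool names to try in order.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Scenario:
    """Hypothetical description of a browser/website job (field names are illustrative)."""
    has_opencli_adapter: bool = False  # target site has an opencli adapter
    read_only: bool = False            # only need to read page content
    search_only: bool = False          # only need a web search


def candidate_tools(s: Scenario) -> List[str]:
    """Return tool names to try, following the documented decision order."""
    order: List[str] = []
    if s.has_opencli_adapter:  # 1. deterministic adapter, most reliable
        order.append("opencli_run")
    if s.search_only:          # 2. read-only: plain search ...
        order.append("web_search")
    elif s.read_only:          #    ... or fetch the page content
        order.append("web_fetch")
    # 3-5. interactive fallbacks, from highest-level to lowest-level
    order += ["browser_task", "browser_click/browser_type", "chrome-devtools MCP"]
    return order
```

A caller would try each returned tool in turn and stop at the first one that succeeds, so the cheap, deterministic options always get the first attempt.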

## When to Use Fine-Grained Tools

Use fine-grained tools such as `browser_navigate` and `browser_click` only in the following cases:

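- `browser_task` fails and manual intervention is needed
- Only a single step is needed (e.g. just a screenshot with `browser_screenshot`)
- You need precise control over a specific element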
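The first case, falling back to fine-grained tools when `browser_task` fails, can be sketched as below. This is a minimal sketch under assumptions: `run_with_fallback` is a hypothetical helper, the tools are passed in as plain callables so real tools or test doubles can be plugged in, and treating a raised exception as "needs manual intervention" is an assumption; only the `success` field comes from the documented return value.

```python
from typing import Any, Callable, Dict, List


def run_with_fallback(
    browser_task: Callable[..., Dict[str, Any]],
    manual_steps: List[Callable[[], Any]],
    task: str,
    max_steps: int = 15,
) -> Dict[str, Any]:
    """Try browser_task first; if it does not succeed, run hand-written fine-grained steps."""
    try:
        result = browser_task(task=task, max_steps=max_steps)
        if result.get("success"):
            return {"mode": "browser_task", "result": result}
    except Exception as exc:  # assumption: any error means manual intervention is needed
        result = {"success": False, "error": str(exc)}
    # Manual intervention: run each fine-grained step (e.g. navigate, click, type) in order.
    outputs = [step() for step in manual_steps]
    return {"mode": "manual", "first_attempt": result, "outputs": outputs}
```

Keeping the failed first attempt in the returned dict makes it easier to see why the manual path was taken.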

## Return Value

```json
{
    "success": true,
    "result": {
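        "task": "Open Baidu and search for Fuzhou, Fujian",
        "steps_taken": 5,
        "final_result": "Search complete; results for Fuzhou, Fujian are displayed",
        "message": "Task complete: Open Baidu and search for Fuzhou, Fujian"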
    }
}
```
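A caller can consume this return value with Python's standard `json` module. In this sketch the helper name `summarize_browser_task_result` is hypothetical, and the failure shape (`"success": false`, possibly without a `result` object) is an assumption, since only the success case is documented.

```python
import json
from typing import Any, Dict


def summarize_browser_task_result(raw: str) -> str:
    """Turn the JSON returned by browser_task into a one-line status string."""
    data: Dict[str, Any] = json.loads(raw)
    result = data.get("result") or {}  # assumption: may be missing on failure
    status = "OK" if data.get("success") else "FAILED"
    steps = result.get("steps_taken", "?")
    final = result.get("final_result", "")
    return f"[{status}] {steps} steps: {final}"
```

Applied to the example above, this yields a line starting with `[OK] 5 steps:` followed by the `final_result` text.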

## Notes

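1. Make the task description clear and specific, and avoid ambiguity
2. Complex tasks may need a higher max_steps
3. The first call automatically launches the browser (in visible mode)
4. **Automatically inherits the system LLM configuration**; no extra API key setup is needed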
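Note 2 suggests raising `max_steps` for complex tasks. One way to do that automatically is to retry with a growing step budget. This is a hedged sketch: `run_with_step_budgets` is a hypothetical helper, the budgets after the documented default of 15 are illustrative, and the tool is passed in as a callable.

```python
from typing import Any, Callable, Dict, Sequence


def run_with_step_budgets(
    browser_task: Callable[..., Dict[str, Any]],
    task: str,
    budgets: Sequence[int] = (15, 30, 50),  # 15 is the documented default; the rest are illustrative
) -> Dict[str, Any]:
    """Re-run a task with a larger max_steps each time it does not succeed."""
    result: Dict[str, Any] = {"success": False}
    for max_steps in budgets:
        result = browser_task(task=task, max_steps=max_steps)
        if result.get("success"):
            break  # stop at the smallest budget that works
    return result
```

Each retry re-executes the whole task, so this pattern suits read-mostly tasks such as searching or extraction better than tasks that submit forms.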

## Technical Details

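- Reuses the browser already launched by OpenAkita via CDP (Chrome DevTools Protocol)
- Automatically inherits the LLM configured in OpenAkita's system settings (from llm_endpoints.json)
- Built on the open-source [browser-use](https://github.com/browser-use/browser-use) project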

## Advanced: Operating a Chrome Window You Already Have Open

To let OpenAkita operate pages in a Chrome window you already have open, start Chrome in debugging mode:

**Windows:**
```cmd
"C:\Program Files\Google\Chrome\Application\chrome.exe" --remote-debugging-port=9222
```

**macOS:**
```bash
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9222
```

**Linux:**
```bash
google-chrome --remote-debugging-port=9222
```

Once Chrome is running this way, OpenAkita detects it and connects automatically, and can then operate your open tabs.
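OpenAkita's own detection code is not shown here. As an assumption-labeled sketch, one way to check whether Chrome was started with `--remote-debugging-port=9222` is to request the DevTools HTTP endpoint `/json/version`, which Chrome serves on that port and whose JSON includes the `webSocketDebuggerUrl` that CDP clients connect to. The function name `probe_chrome_debugger` is hypothetical.

```python
import http.client
import json
import urllib.request
from typing import Any, Dict, Optional


def probe_chrome_debugger(host: str = "127.0.0.1", port: int = 9222,
                          timeout: float = 1.0) -> Optional[Dict[str, Any]]:
    """Return the JSON from Chrome's /json/version endpoint, or None if unreachable."""
    url = f"http://{host}:{port}/json/version"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except (OSError, http.client.HTTPException, ValueError):
        # OSError covers URLError, connection refused, and timeouts;
        # ValueError covers a reply that is not valid JSON.
        return None
```

If the call returns `None`, Chrome was most likely not started with the debugging flag; otherwise `info.get("webSocketDebuggerUrl")` gives the browser-level CDP endpoint.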

## Related Skills

- `browser_screenshot` - take a screenshot on its own
- `browser_navigate` - navigate on its own
- `deliver_artifacts` - send results to the user

Related Skills

openakita/skills@todoist-task

from openakita/openakita

Manage Todoist tasks, projects, sections, labels, and filters via REST API v2. Supports task CRUD, due dates, priorities, recurring tasks, project organization, and advanced filtering. Based on doggy8088/agent-skills/todoist-api, using curl + jq.

update-scheduled-task

from openakita/openakita

Modify a scheduled task's settings WITHOUT deleting it. Can change notify_on_start, notify_on_complete, and enabled. Common uses: (1) 'Turn off notification' = notify=false, (2) 'Pause task' = enabled=false, (3) 'Resume task' = enabled=true.

trigger-scheduled-task

from openakita/openakita

Immediately trigger a scheduled task without waiting for its scheduled time. Use it when you need to test task execution or run a task ahead of schedule.

set-task-timeout

from openakita/openakita

Adjust the current task's timeout policy. Use it when the task is expected to take a long time, or when the system is switching models too aggressively. For long-running tasks that are making steady progress, prefer increasing the timeout.

schedule-task

from openakita/openakita

Create a scheduled task or reminder. IMPORTANT: you must actually call this tool to create the task; just saying 'OK, I will remind you' does NOT create it. Task types: (1) reminder, for simple messages; (2) task, for AI operations.

list-scheduled-tasks

from openakita/openakita

List all scheduled tasks with their ID, name, type, status, and next execution time. Use it when you need to check existing tasks, find a task ID to cancel or update, or verify that a task was created.

cancel-scheduled-task

from openakita/openakita

PERMANENTLY DELETE a scheduled task. Use this when the user says 'cancel/delete task'. When the user says 'turn off notification', use update_scheduled_task with notify=false; when the user says 'pause task', use update_scheduled_task with enabled=false.

browser-type

from openakita/openakita

Type text into input fields on a webpage. Use it when you need to fill forms, enter search queries, or input data. PREREQUISITE: you must call browser_navigate first. You may need to click the field first to give it focus.

browser-switch-tab

from openakita/openakita

Switch to a specific browser tab by index. Use it when you need to work with a different tab or return to a previous page. Use browser_list_tabs to get tab indices.

browser-status

from openakita/openakita

Check the browser's current state, including open status, current URL, page title, and tab count. Useful for checking the current page URL/title. Note: browser_open already includes a status check and auto-starts the browser if needed, so you don't need to call browser_status before browser_open.

browser-screenshot

from openakita/openakita

Capture a screenshot of the browser page (webpage content only, not the desktop). Use it when you need to show page state, document results, or debug issues. For desktop screenshots, use desktop_screenshot instead.

browser-open

from openakita/openakita

Launch the browser or check its status. Returns the current state (is_open, url, title, tab_count). If the browser is already running, returns its status without restarting. Handles everything automatically, so there is no need to call browser_status first.