browser-task
Smart browser task agent - describe what you want done in natural language and it completes automatically. PREFERRED tool for multi-step browser operations like searching, form filling, and data extraction.
Best use case
browser-task is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using browser-task can expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/browser-task/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
How browser-task Compares
| Feature / Agent | browser-task | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Smart browser task agent - describe what you want done in natural language and it completes automatically. PREFERRED tool for multi-step browser operations like searching, form filling, and data extraction.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
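# browser_task - Smart Browser Task
**Recommended first choice** - this is the preferred tool for browser operations.
Built on the open-source [browser-use](https://github.com/browser-use/browser-use) project.
## Usage
```python
browser_task(
    task="Description of the task to complete",
    max_steps=15  # optional, defaults to 15
)
```
## Parameters
| Parameter | Type | Required | Description |
|------|------|------|------|
| task | string | Yes | Task description; describe the operation you want performed in natural language |
| max_steps | integer | No | Maximum number of execution steps; defaults to 15 |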
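The parameter table can be mirrored in a small validation helper before the tool is invoked. This is only a sketch: `build_browser_task_args` and `DEFAULT_MAX_STEPS` are hypothetical names introduced here for illustration, and the source does not say how the tool itself validates its inputs.

```python
from typing import Any

# Default taken from the parameter table; the constant name is an assumption.
DEFAULT_MAX_STEPS = 15


def build_browser_task_args(task: str, max_steps: int = DEFAULT_MAX_STEPS) -> dict[str, Any]:
    """Validate and assemble arguments for a browser_task call (hypothetical helper)."""
    if not isinstance(task, str) or not task.strip():
        raise ValueError("task must be a non-empty natural-language description")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(max_steps, bool) or not isinstance(max_steps, int) or max_steps < 1:
        raise ValueError("max_steps must be a positive integer")
    return {"task": task.strip(), "max_steps": max_steps}
```

Calling `build_browser_task_args("Open example.com")` yields a dictionary with the default step limit filled in, which could then be passed on as keyword arguments.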
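## When to Use (Preferred)
- Any browser operation that involves multiple steps
- Web searches, form filling, and information extraction
- When you are unsure of the exact steps required
- Complex web interaction flows
## Examples
### Search task
```python
browser_task(task="Open Baidu and search for the weather in Fuzhou, Fujian")
```
### Form filling
```python
browser_task(task="Open the registration page on example.com and fill in the username test123")
```
### Information extraction
```python
browser_task(task="Open the GitHub home page and get the names of today's trending projects")
```
### Screenshot task
```python
browser_task(task="Open Baidu, search for Fuzhou, Fujian, and save a screenshot")
```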
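## Browser & Website Tool Selection Guide
The system offers several ways to operate websites and browsers; choose the most reliable option for each scenario:
| Scenario | Recommended tool | Notes |
|------|---------|------|
| Target site has an opencli adapter | `opencli_run` (most reliable) | Deterministic commands with JSON output; reuses the Chrome login session |
| Login required but no adapter | `browser_task` → manual combination | Try browser_task first; if it fails, operate manually with click/type |
| Only need to read page content | `web_fetch` | Fastest and lightest; no browser needed |
| Only need to search | `web_search` | Searches DuckDuckGo directly |
| Complex multi-step browser interaction | `browser_task` | Suited to login, form filling, filtering, and similar flows |
| Single-step browser operation | `browser_navigate`/`browser_click`, etc. | Precise control over a single operation |
| Operating the user's already logged-in Chrome | `call_mcp_tool("chrome-devtools", ...)` | Requires the user's Chrome to have the debugging port open |
Decision order: `opencli_run` (when an adapter exists) → `web_fetch`/`web_search` (for read-only needs) → `browser_task` → manual browser_click/type combination → chrome-devtools MCP.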
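The decision order above can be expressed as a small function that returns the tools to try, most preferred first. This is a sketch under assumptions: the function name `tool_candidates` and its boolean flags are hypothetical, and the returned strings are labels for the tools named in the table, not callable APIs.

```python
def tool_candidates(has_adapter: bool, read_only: bool, search_only: bool = False) -> list[str]:
    """Return tool labels in the order they should be tried (hypothetical helper)."""
    order: list[str] = []
    if has_adapter:
        # An opencli adapter is the most reliable path when one exists.
        order.append("opencli_run")
    if search_only:
        order.append("web_search")
    elif read_only:
        order.append("web_fetch")
    # Interactive paths, from most automated to most manual.
    order += ["browser_task", "browser_click/browser_type", "chrome-devtools"]
    return order
```

For a site with no adapter where interaction is required, the list starts at `browser_task` and falls back to the manual tools.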
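## When to Use Fine-Grained Tools
Use fine-grained tools such as `browser_navigate` and `browser_click` only when:
- `browser_task` has failed and manual intervention is needed
- Only a single step is needed (for example, just a screenshot with `browser_screenshot`)
- Precise control over a specific element is required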
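The first case in the list, falling back to manual steps after `browser_task` fails, might look like the sketch below. The tools are passed in as plain callables because the source does not define a Python binding for them; `run_with_fallback`, `run_task`, `manual_steps`, and the `"error"` key are all assumptions made for illustration.

```python
from typing import Any, Callable

Result = dict[str, Any]


def run_with_fallback(
    task: str,
    run_task: Callable[[str], Result],
    manual_steps: Callable[[], Result],
) -> Result:
    """Try the high-level agent first; use manual steps only if it fails."""
    try:
        outcome = run_task(task)
    except Exception as exc:  # treat any raised error as a failed attempt
        outcome = {"success": False, "error": str(exc)}
    if outcome.get("success"):
        return outcome
    # Manual path, e.g. a sequence of navigate/click/type calls.
    return manual_steps()
```

Keeping the manual sequence in its own callable means it only runs when the agent path has actually failed.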
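## Return Value
```json
{
  "success": true,
  "result": {
    "task": "Open Baidu and search for Fuzhou, Fujian",
    "steps_taken": 5,
    "final_result": "Search complete; results related to Fuzhou, Fujian are displayed",
    "message": "Task complete: Open Baidu and search for Fuzhou, Fujian"
  }
}
```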
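A caller would typically check `success` before reading the nested `result` fields. The sketch below assumes the response arrives as a JSON string shaped like the sample above; `summarize_result` is a hypothetical helper, and the source does not show what a failed response contains.

```python
import json


def summarize_result(raw: str) -> str:
    """Turn a browser_task response (JSON text) into a one-line summary."""
    payload = json.loads(raw)
    if not payload.get("success"):
        return "browser_task did not report success"
    result = payload.get("result", {})
    final = result.get("final_result", "")
    steps = result.get("steps_taken", "?")
    return f"{final} ({steps} steps)"


sample = '{"success": true, "result": {"steps_taken": 5, "final_result": "Search complete"}}'
print(summarize_result(sample))
```

Using `.get` with defaults keeps the helper from raising if a field is missing from the response.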
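## Notes
1. Make the task description clear and specific; avoid ambiguity
2. Complex tasks may need a larger max_steps
3. The browser starts automatically on first use (in visible mode)
4. **The system LLM configuration is inherited automatically**; no separate API key setup is required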
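Note 2 suggests raising `max_steps` for complex tasks. One way to do that is to retry with progressively larger budgets, as sketched below. The helper name, the budget values beyond the default of 15, and the assumption that an exhausted step budget shows up as `"success": false` are all illustrative; the source does not describe how a step-limit failure is reported.

```python
from typing import Any, Callable, Sequence


def run_with_step_budget(
    task: str,
    run_task: Callable[..., dict[str, Any]],
    budgets: Sequence[int] = (15, 30, 60),
) -> dict[str, Any]:
    """Call run_task with increasing max_steps until it reports success."""
    last: dict[str, Any] = {"success": False}
    for max_steps in budgets:
        last = run_task(task=task, max_steps=max_steps)
        if last.get("success"):
            break
    # Either the first successful response or the last failed one.
    return last
```

Each retry repeats the whole task from the start, so this approach fits tasks that are safe to run more than once.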
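## Technical Details
- Reuses the browser already started by OpenAkita via CDP (Chrome DevTools Protocol)
- Automatically inherits the LLM configured in the OpenAkita system settings (from llm_endpoints.json)
- Built on the open-source [browser-use](https://github.com/browser-use/browser-use) project
## Advanced: Operating a Chrome Instance the User Already Has Open
If you want OpenAkita to operate Chrome pages you already have open, start Chrome in debugging mode: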
**Windows:**
```cmd
"C:\Program Files\Google\Chrome\Application\chrome.exe" --remote-debugging-port=9222
```
**macOS:**
```bash
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9222
```
**Linux:**
```bash
google-chrome --remote-debugging-port=9222
```
Once Chrome is started this way, OpenAkita detects it and connects automatically, and can then operate the tabs you already have open.
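Before relying on automatic detection, it can help to confirm that something is listening on the debugging port. The sketch below is a plain TCP check using only the standard library; `debug_port_open` is a hypothetical helper, not part of OpenAkita, and a successful connection shows only that the port is open, not that Chrome is the process behind it.

```python
import socket


def debug_port_open(host: str = "127.0.0.1", port: int = 9222, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    print("debugging port reachable:", debug_port_open())
```

For a stronger check, opening `http://127.0.0.1:9222/json/version` in a browser should return a small JSON document when Chrome is running with remote debugging enabled.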
## Related Skills
- `browser_screenshot` - take a screenshot on its own
- `browser_navigate` - navigate on its own
- `deliver_artifacts` - send results to the user
Related Skills
openakita/skills@todoist-task
Manage Todoist tasks, projects, sections, labels, and filters via REST API v2. Supports task CRUD, due dates, priorities, recurring tasks, project organization, and advanced filtering. Based on doggy8088/agent-skills/todoist-api, using curl + jq.
update-scheduled-task
Modify scheduled task settings WITHOUT deleting. Can modify notify_on_start, notify_on_complete, enabled. Common uses - (1) 'Turn off notification' = notify=false, (2) 'Pause task' = enabled=false, (3) 'Resume task' = enabled=true.
trigger-scheduled-task
Immediately trigger scheduled task without waiting for scheduled time. When you need to test task execution or run task ahead of schedule.
set-task-timeout
Adjust current task timeout policy. Use when the task is expected to take long, or when the system is too aggressive switching models. Prefer increasing timeout for long-running tasks with steady progress.
schedule-task
Create scheduled task or reminder. IMPORTANT - must actually call this tool to create task. Just saying 'OK I will remind you' does NOT create the task. Task types - (1) reminder for simple messages, (2) task for AI operations.
list-scheduled-tasks
List all scheduled tasks with their ID, name, type, status, and next execution time. When you need to check existing tasks, find task ID for cancel/update, or verify task creation.
cancel-scheduled-task
PERMANENTLY DELETE scheduled task. When user says 'cancel/delete task' use this. When user says 'turn off notification' use update_scheduled_task with notify=false. When user says 'pause task' use update_scheduled_task with enabled=false.
browser-type
Type text into input fields on webpage. When you need to fill forms, enter search queries, or input data. PREREQUISITE - must use browser_navigate first. May need to click field first for focus.
browser-switch-tab
Switch to a specific browser tab by index. When you need to work with a different tab or return to previous page. Use browser_list_tabs to get tab indices.
browser-status
Check browser current state including open status, current URL, page title, tab count. Useful for checking current page URL/title. Note - browser_open already includes status check and auto-starts if needed, so you don't need to call browser_status before browser_open.
browser-screenshot
Capture browser page screenshot (webpage content only, not desktop). When you need to show page state, document results, or debug issues. For desktop screenshots, use desktop_screenshot instead.
browser-open
Launch browser or check its status. Returns current state (is_open, url, title, tab_count). If already running, returns status without restarting. Auto-handles everything - no need to call browser_status first.