browser-use-local
Use when you need browser automation via the browser-use CLI or Python code in this OpenClaw container/host: open pages, click/type, take screenshots, extract HTML/links, or run an Agent with an OpenAI-compatible LLM (e.g. Moonshot/Kimi) using a custom base_url. Also use for debugging browser-use sessions (state empty, page readiness timeouts), and for extracting login QR codes from demo/login pages via screenshots or HTML data:image.
Best use case
browser-use-local is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using browser-use-local should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/browser-use-local/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
How browser-use-local Compares
| Feature / Agent | browser-use-local | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
SKILL.md Source
# browser-use (local) playbook
## Default constraints in this environment
- Prefer **browser-use** (CLI/Python) over the OpenClaw `browser` tool here; the OpenClaw `browser` tool may fail if no supported system browser is present.
- Use **persistent sessions** for multi-step flows: `--session <name>`.
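When the CLI is driven from Python, a small helper can keep the session name and flag ordering consistent. The sketch below only builds the argument list. `build_command` is a hypothetical helper name, not part of browser-use, and it assumes the `--session` and `--json` global flags used in this playbook, placed before the subcommand.

```python
import shlex


def build_command(subcommand, *args, session="demo", json_output=False):
    """Build a browser-use argv with global flags before the subcommand.

    Hypothetical helper for illustration; it only assembles arguments
    and does not run anything.
    """
    argv = ["browser-use", "--session", session]
    if json_output:
        argv.append("--json")
    argv.append(subcommand)
    argv.extend(args)
    return argv


if __name__ == "__main__":
    # Print the shell form of two commands used in this playbook.
    print(shlex.join(build_command("open", "https://example.com")))
    print(shlex.join(build_command("state", json_output=True)))
```

The resulting list can be passed to `subprocess.run(argv, check=True)` to execute the command.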
## Quick CLI workflow (non-agent)
1) Open
```bash
browser-use --session demo open https://example.com
```
2) Inspect (sometimes `state` returns 0 elements on heavy/JS sites)
```bash
browser-use --session demo --json state | jq '.data | {url,title,elements:(.elements|length)}'
```
3) Screenshot (always works; best debugging primitive)
```bash
browser-use --session demo screenshot /home/node/.openclaw/workspace/page.png
```
4) HTML for link discovery (works even when `state` is empty)
```bash
browser-use --session demo --json get html > /tmp/page_html.json
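python3 - <<'PY'
import json, re

# Load the HTML captured by `get html` and collect absolute URLs.
with open('/tmp/page_html.json') as f:
    html = json.load(f).get('data', {}).get('html', '')
urls = set(re.findall(r"https?://[^\s\"'<>]+", html))

# Keep URLs that look like demo/login/QR pages; print at most 200.
keywords = ['demo', 'login', 'console', 'qr', 'qrcode']
for u in sorted(u for u in urls if any(k in u for k in keywords))[:200]:
    print(u)
PY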
```
5) Lightweight DOM queries via JS (useful when `state` is empty)
```bash
browser-use --session demo --json eval "location.href"
browser-use --session demo --json eval "document.title"
```
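The inspection in step 2 and the fallbacks in steps 3 to 5 can also be sketched in Python. This assumes the `--json state` output has the shape implied by the `jq` filter (a `data` object with `url`, `title`, and `elements`). `summarize_state` and `next_steps` are hypothetical names.

```python
import json


def summarize_state(payload):
    """Mirror the jq filter: url, title, and element count from `data`."""
    data = payload.get("data") or {}
    return {
        "url": data.get("url"),
        "title": data.get("title"),
        "elements": len(data.get("elements") or []),
    }


def next_steps(summary):
    """Suggest fallback subcommands when `state` reports no elements."""
    if summary["elements"] > 0:
        return ["state"]
    # Empty state: fall back to the primitives from steps 3 to 5.
    return ["screenshot", "get html", "eval"]


if __name__ == "__main__":
    sample = json.loads(
        '{"data": {"url": "https://example.com", "title": "Example", "elements": []}}'
    )
    summary = summarize_state(sample)
    print(summary, next_steps(summary))
```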
## Agent workflow with OpenAI-compatible LLM (Moonshot/Kimi)
Use Python for Agent runs when the CLI `run` path requires Browser-Use cloud keys or when you need strict control over LLM parameters.
### Minimal working Kimi example
Create `.env` (or export env vars) with:
- `OPENAI_API_KEY=...`
- `OPENAI_BASE_URL=https://api.moonshot.cn/v1`
Then run the bundled script:
```bash
source /home/node/.openclaw/workspace/.venv-browser-use/bin/activate
python /home/node/.openclaw/workspace/skills/browser-use-local/scripts/run_agent_kimi.py
```
**Kimi/Moonshot quirks observed in practice** (fixes):
- `temperature` must be `1` for `kimi-k2.5`.
- `frequency_penalty` must be `0` for `kimi-k2.5`.
- Moonshot can reject strict JSON Schema used for structured output. Enable:
  - `remove_defaults_from_schema=True`
  - `remove_min_items_from_schema=True`
If you get a 400 error mentioning `response_format.json_schema ... keyword 'default' is not allowed` or `min_items unsupported`, those two flags are the first thing to set.
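The quirks above can be collected into one set of keyword arguments. This is a sketch, not the contents of the bundled `run_agent_kimi.py`, which is not shown here. It assumes the arguments are passed to browser-use's OpenAI-compatible chat model class; the import path and class name in the trailing comment are assumptions to verify against your installed version. `kimi_llm_kwargs` and `needs_schema_flags` are hypothetical helpers.

```python
import os


def kimi_llm_kwargs(model="kimi-k2.5"):
    """Collect LLM settings that reflect the Kimi/Moonshot quirks above."""
    return {
        "model": model,
        "api_key": os.environ.get("OPENAI_API_KEY", ""),
        "base_url": os.environ.get("OPENAI_BASE_URL", "https://api.moonshot.cn/v1"),
        "temperature": 1,        # value required for kimi-k2.5
        "frequency_penalty": 0,  # value required for kimi-k2.5
        "remove_defaults_from_schema": True,
        "remove_min_items_from_schema": True,
    }


def needs_schema_flags(error_text):
    """Return True if an error looks like the schema rejection described above."""
    text = error_text.lower()
    return "keyword 'default' is not allowed" in text or "min_items unsupported" in text


# Assumed usage (verify names against your browser-use version):
#   from browser_use import Agent, ChatOpenAI
#   llm = ChatOpenAI(**kimi_llm_kwargs())
#   agent = Agent(task="Open https://example.com and report the title", llm=llm)
```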
## QR code extraction (login/demo pages)
### Preferred order
1) **Screenshot the page** and crop candidate regions (fast, robust).
2) If HTML contains `data:image/png;base64,...`, extract and decode it.
### Crop candidates
Use `scripts/crop_candidates.py` to generate multiple likely QR crops from a screenshot.
```bash
source /home/node/.openclaw/workspace/.venv-browser-use/bin/activate
python skills/browser-use-local/scripts/crop_candidates.py \
  --in /home/node/.openclaw/workspace/login.png \
  --outdir /home/node/.openclaw/workspace/qr_crops
```
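The bundled script's cropping strategy is not shown here. As a rough illustration of the idea, the sketch below computes centered square crop boxes at several scales from an image's width and height. The scales and the `candidate_boxes` name are assumptions, and the boxes use the `(left, top, right, bottom)` convention that Pillow's `Image.crop` accepts.

```python
def candidate_boxes(width, height, scales=(0.3, 0.5, 0.7)):
    """Return centered square boxes (left, top, right, bottom) inside the image."""
    boxes = []
    short_side = min(width, height)
    for scale in scales:
        # Side length is a fraction of the shorter image dimension.
        side = max(1, round(short_side * scale))
        left = (width - side) // 2
        top = (height - side) // 2
        boxes.append((left, top, left + side, top + side))
    return boxes


if __name__ == "__main__":
    for box in candidate_boxes(1280, 720):
        print(box)
```

QR codes on login pages are often not centered, so a real heuristic would likely add off-center regions as well.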
### Extract base64-embedded images from HTML
```bash
source /home/node/.openclaw/workspace/.venv-browser-use/bin/activate
browser-use --session demo --json get html > /tmp/page_html.json
python skills/browser-use-local/scripts/extract_data_images.py \
  --in /tmp/page_html.json \
  --outdir /home/node/.openclaw/workspace/data_imgs
```
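Decoding embedded images needs only the standard library. The sketch below is not the bundled `extract_data_images.py`, whose contents are not shown here. It takes the HTML string (for example the `data.html` field used in the link-discovery step), and `extract_data_images` is a hypothetical function name.

```python
import base64
import binascii
import re

# Matches data URIs such as data:image/png;base64,AAAA...
DATA_URI = re.compile(r"data:image/(png|jpeg|gif|webp);base64,([A-Za-z0-9+/=]+)")


def extract_data_images(html):
    """Return a list of (extension, raw_bytes) for base64 images embedded in HTML."""
    images = []
    for ext, payload in DATA_URI.findall(html):
        try:
            images.append((ext, base64.b64decode(payload, validate=True)))
        except binascii.Error:
            # Skip truncated or malformed payloads.
            continue
    return images


if __name__ == "__main__":
    sample = base64.b64encode(b"\x89PNG\r\n\x1a\n").decode()
    html = f'<img src="data:image/png;base64,{sample}">'
    for ext, raw in extract_data_images(html):
        print(ext, len(raw))
```

Each decoded payload can then be written to a file with the matching extension and checked for a QR code.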
## Troubleshooting
- **`state` shows `elements: 0`**: use `get html` + regex discovery, plus screenshots; use `eval` to query DOM.
- **Page readiness timeout warnings**: usually harmless; rely on screenshot + HTML.
- **CLI flags order**: global flags go *before* the subcommand:
  - ✅ `browser-use --browser chromium --json open https://...`
  - ❌ `browser-use open https://... --browser chromium`
Related Skills
stealth-browser
Anti-bot browser automation using Camoufox and Nodriver. Bypasses Cloudflare Turnstile, Datadome, and aggressive anti-bot on sites like Airbnb and Yelp. Use when standard Playwright/Selenium gets blocked.
camoufox-stealth-browser
C++ level anti-bot browser automation using Camoufox (patched Firefox) in isolated containers. Bypasses Cloudflare Turnstile, Datadome, Airbnb, Yelp. Superior to Chrome-based solutions (undetected-chromedriver, puppeteer-stealth) which only patch at JS level. Use when standard Playwright/Selenium gets blocked.
local-first-llm
Routes LLM requests to a local model (Ollama, LM Studio, llamafile) before falling back to cloud APIs.
qwen3-tts-local-inference
Generate speech from text using Qwen3-TTS via direct Python inference — no server required.
browser-use
Cloud browser automation via Browser Use API. Use when you need AI-driven web browsing, scraping, form filling, or multi-step web tasks without local browser control. Triggers on "browser use", "cloud browser", "scrape website", "automate web task", or when local browser isn't available/suitable.
local-system-info
Return system metrics (CPU, RAM, disk, processes) using psutil.
iyeque-local-system-info
Return system metrics (CPU, RAM, disk, processes) using psutil.
whisper-mlx-local
Free local speech-to-text for Telegram and WhatsApp using MLX Whisper on Apple Silicon. Private, no API costs.
next-browser
Use Nextbrowser cloud API to spin up cloud browsers for Openclaw to run autonomous browser tasks. Primary use is creating browser sessions with profiles (persisted logins/cookies) that Openclaw can control to manage social media and other online accounts. Secondary use is running task subagents for fast autonomous browser automation under residential proxy, browser stealth, and CAPTCHA solving capability. Docs at docs.nextbrowser.com.
super-browser
**The ultimate browser automation framework.** Combines the best of 8 top-rated browser skills.
parakeet-local-asr
Install and operate local NVIDIA Parakeet ASR for OpenClaw with an OpenAI-compatible transcription API.
Agent Browser
A fast Rust-based headless browser automation CLI with Node.js fallback that enables AI agents to navigate, click, type, and snapshot pages via structured commands.