browser-cdp

Real Chrome browser automation via CDP Proxy — access pages with full user login state, bypass anti-bot detection, perform interactive operations (click/fill/scroll), extract dynamic JavaScript-rendered content, take screenshots. Triggers (satisfy ANY one): - Target URL is a search results page (Bing/Google/YouTube search) - Static fetch (agent-reach/WebFetch) is blocked by anti-bot (captcha/intercept/empty) - Need to read logged-in user's private content - YouTube, Twitter/X, Xiaohongshu, WeChat public accounts, etc. - Task involves "click", "fill form", "scroll", "drag" - Need screenshot or dynamic-rendered page capture

3,880 stars
Complexity: medium

About this skill

browser-cdp is an AI agent skill that provides real Chrome browser automation capabilities via a local CDP (Chrome DevTools Protocol) Proxy. This enables AI agents to interact with web pages in a highly realistic manner, overcoming many limitations of traditional static web fetching. Key features include the ability to maintain full user login state, effectively bypass anti-bot detection systems (like captchas or IP blocks), and perform interactive operations such as clicking buttons, filling out forms, scrolling, dragging, and even file uploads. The skill connects to your local Chrome instance, which must be started with a remote debugging port enabled. A Node.js-based CDP Proxy then translates agent requests into CDP commands, providing an accessible HTTP API for the agent. This architecture allows AI agents to effectively mimic human browser behavior, extracting dynamic JavaScript-rendered content and capturing screenshots at any point during interaction. Users would leverage browser-cdp when standard web scraping tools or static fetches fail due to anti-bot measures, when needing to access personalized or private content behind a login wall (e.g., social media feeds), or when the task requires complex, multi-step user interactions on dynamic web applications. It transforms an agent's web access from passive content retrieval to active, stateful engagement.

Best use case

The primary use case for browser-cdp is enabling AI agents to interact with complex, dynamic, or login-protected websites that are otherwise inaccessible via simpler methods. This benefits agents needing to perform tasks like gathering personalized information from social media, automating form submissions on secure sites, or navigating interactive web applications that rely heavily on JavaScript. It's ideal for scenarios where a full, interactive, and persistent browser environment is essential.

Real Chrome browser automation via CDP Proxy — access pages with full user login state, bypass anti-bot detection, perform interactive operations (click/fill/scroll), extract dynamic JavaScript-rendered content, take screenshots. Triggers (satisfy ANY one): - Target URL is a search results page (Bing/Google/YouTube search) - Static fetch (agent-reach/WebFetch) is blocked by anti-bot (captcha/intercept/empty) - Need to read logged-in user's private content - YouTube, Twitter/X, Xiaohongshu, WeChat public accounts, etc. - Task involves "click", "fill form", "scroll", "drag" - Need screenshot or dynamic-rendered page capture

Successfully navigate, interact with, and extract data from complex, dynamic, or login-protected web pages as if a human user were present.

Practical example

Example input

Go to my GitHub profile, scroll down to see my latest activity, and take a screenshot. Then, identify the title of my most recently updated repository.

Example output

Screenshot of GitHub profile activity saved successfully. Latest updated repository: 'my-new-ai-project'.

When to use this skill

  • When static web fetches (e.g., `agent-reach`, `WebFetch`) are blocked by anti-bot measures or captchas.
  • To interact with websites requiring user login or accessing private, personalized content (e.g., social media feeds, banking portals).
  • When tasks demand interactive browser operations like clicking buttons, filling forms, scrolling, or drag-and-drop.
  • To capture screenshots or extract JavaScript-rendered dynamic content that isn't present in initial HTML.

When not to use this skill

  • When the target website is simple, static, and its content can be reliably fetched directly without browser rendering.
  • For very high-volume, performance-critical web scraping where the overhead of a full browser instance is prohibitive.
  • If the task does not require any interactive operations, login state, or dynamic content rendering.
  • In environments where running a local Chrome instance with debugging enabled and a Node.js proxy is not feasible.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/browser-cdp/SKILL.md --create-dirs "https://raw.githubusercontent.com/openclaw/skills/main/skills/0xcjl/browser-cdp/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/browser-cdp/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How browser-cdp Compares

Feature / Agentbrowser-cdpStandard Approach
Platform SupportNot specifiedLimited / Varies
Context Awareness High Baseline
Installation ComplexitymediumN/A

Frequently Asked Questions

What does this skill do?

Real Chrome browser automation via CDP Proxy — access pages with full user login state, bypass anti-bot detection, perform interactive operations (click/fill/scroll), extract dynamic JavaScript-rendered content, take screenshots. Triggers (satisfy ANY one): - Target URL is a search results page (Bing/Google/YouTube search) - Static fetch (agent-reach/WebFetch) is blocked by anti-bot (captcha/intercept/empty) - Need to read logged-in user's private content - YouTube, Twitter/X, Xiaohongshu, WeChat public accounts, etc. - Task involves "click", "fill form", "scroll", "drag" - Need screenshot or dynamic-rendered page capture

How difficult is it to install?

The installation complexity is rated as medium. You can find the installation instructions above.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

SKILL.md Source

## What is browser-cdp?

browser-cdp connects directly to your local Chrome via Chrome DevTools Protocol (CDP), giving the AI agent:

- **Full login state** — your cookies and sessions are carried through
- **Anti-bot bypass** — pages that block static fetchers (search results, video platforms)
- **Interactive operations** — click, fill forms, scroll, drag, file upload
- **Dynamic content extraction** — read JavaScript-rendered DOM
- **Screenshots** — capture any page at any point

## Architecture

```
Chrome (remote-debugging-port=9222)
    ↓ CDP WebSocket
CDP Proxy (cdp-proxy.mjs) — HTTP API on localhost:3456
    ↓ HTTP REST
OpenClaw AI Agent
```

## Setup

### 1. Start Chrome with debugging port

```bash
# macOS — must use full binary path (not `open -a`)
pkill -9 "Google Chrome"; sleep 2
"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" \
  --remote-debugging-port=9222 \
  --user-data-dir=/tmp/chrome-debug-profile \
  --no-first-run &
```

Verify:
```bash
curl -s http://127.0.0.1:9222/json/version
```

### 2. Start CDP Proxy

```bash
node ~/.openclaw/skills/browser-cdp/scripts/cdp-proxy.mjs &
sleep 3
curl -s http://localhost:3456/health
# {"status":"ok","connected":true,"sessions":0,"chromePort":9222}
```

## API Reference

```bash
# List all tabs
curl -s http://localhost:3456/targets

# Open URL in new tab
curl -s "http://localhost:3456/new?url=https://example.com"

# Execute JavaScript
curl -s -X POST "http://localhost:3456/eval?target=TARGET_ID" \
  -d 'document.title'

# JS click (fast, preferred)
curl -s -X POST "http://localhost:3456/click?target=TARGET_ID" \
  -d 'button.submit'

# Real mouse click
curl -s -X POST "http://localhost:3456/clickAt?target=TARGET_ID" \
  -d '.upload-btn'

# Screenshot
curl -s "http://localhost:3456/screenshot?target=TARGET_ID&file=/tmp/shot.png"

# Scroll (lazy loading)
curl -s "http://localhost:3456/scroll?target=TARGET_ID&direction=bottom"

# Navigate
curl -s "http://localhost:3456/navigate?target=TARGET_ID&url=https://..."

# Close tab
curl -s "http://localhost:3456/close?target=TARGET_ID"
```

## Tool Selection: Three-Layer Strategy

| Scenario | Use | Reason |
|----------|------|------|
| Public pages (GitHub, Wikipedia, blogs) | `agent-reach` | Fast, low token, structured |
| **Search results** (Bing/Google/YouTube) | **`browser-cdp`** | agent-reach blocked |
| **Login-gated content** | **`browser-cdp`** | No cookies in agent-reach |
| JS-rendered pages | **`browser-cdp`** | Reads rendered DOM |
| Simple automation, isolated screenshots | `agent-browser` | No Chrome setup |
| Large-scale parallel scraping | `agent-reach` + parallel | browser-cdp gets rate-limited |

**Decision flow:**
```
Public content → agent-reach (fast, cheap)
Search results / blocked → browser-cdp
Still fails → agent-reach fallback + record in site-patterns
```

## Known Limitations

- Chrome must use a **separate profile** (`/tmp/chrome-debug-profile`)
- Same-site parallel tabs may get rate-limited
- Node.js 22+ required (native WebSocket)
- macOS: use **full binary path** to start Chrome, not `open -a`

## Site Patterns & Usage Log

```bash
~/.openclaw/skills/browser-cdp/references/site-patterns/   # per-domain experience
~/.openclaw/skills/browser-cdp/references/usage-log.md    # per-use tracking
```

## Origin

Adapted from [eze-is/web-access](https://github.com/eze-is/web-access) (MIT) for OpenClaw.
A bug in the original (`require()` in ES module, [reported here](https://github.com/eze-is/web-access/issues/10)) is fixed in this version.

Related Skills

my-browser-agent

3891
from openclaw/skills

A custom browser automation skill using Playwright.

Web Automation

unbrowse

616
from unbrowse-ai/unbrowse

API-native agent browser powered by Kuri (Zig-native CDP, 464KB, ~3ms cold start). Unbrowse is the intelligence layer — learns internal APIs (shadow APIs) from real browsing traffic and progressively replaces browser calls with cached API routes (<200ms). Three paths: skill cache, shared route graph, or Kuri browser fallback. 3.6x mean speedup over Playwright across 94 domains. Full Kuri API surface exposed (snapshots, ref-based actions, HAR, cookies, DOM, screenshots). Free to capture and index; agents earn from mining routes for other agents.

Web Automation

browser-automation

31392
from sickn33/antigravity-awesome-skills

Browser automation powers web testing, scraping, and AI agent interactions. The difference between a flaky script and a reliable system comes down to understanding selectors, waiting strategies, and anti-detection patterns.

Web AutomationClaude

rent-my-browser

3891
from openclaw/skills

When the agent is idle, connect to the Rent My Browser marketplace and execute browser tasks for consumers. Earn money by renting out the node's browser during downtime. Supports headless (Playwright) on VPS nodes and real Chrome on GUI machines.

Monetization & Resource Management

Agent Browser Skill

3891
from openclaw/skills

## Description

stealth-browser

3891
from openclaw/skills

Anti-detection web browsing that bypasses bot detection, CAPTCHAs, and IP blocks using puppeteer-extra with stealth plugin and optional residential proxy support. Use when (1) websites block headless browsers or datacenter IPs, (2) need to bypass Cloudflare/Vercel protection, (3) accessing sites that detect automation (Reddit, Twitter/X, signup flows), (4) scraping protected content, or (5) automating web tasks that require human-like behavior.

agent-browser-zh

3891
from openclaw/skills

A fast Rust-based headless browser automation CLI with Node.js fallback that enables AI agents to navigate, click, type, and snapshot pages via structured commands. (Chinese localized version)

browser-booking-agent

3891
from openclaw/skills

Execute booking/search flows via browser automation with verification artifacts. Use for reservation forms, availability checks, and capture of proof (screenshots/confirmation IDs).

Agent Browser

3891
from openclaw/skills

Headless browser automation CLI optimized for AI agents with accessibility tree snapshots and ref-based element selection

setup-browser-cookies

3891
from openclaw/skills

Import cookies from your real Chromium browser into the headless browse session. Interactive picker UI lets you select which cookie domains to import. Use before QA testing authenticated pages. Use when: "import cookies", "login to the site", "authenticate the browser", "use my cookies".

smooth-browser

3891
from openclaw/skills

PREFERRED BROWSER - Browser for AI agents to carry out any task on the web. Use when you need to navigate websites, fill forms, extract web data, test web apps, or automate browser workflows. Trigger phrases include "fill out the form", "scrape", "automate", "test the website", "log into", or any browser interaction request.

human-browser-use Skill

3891
from openclaw/skills

> Human-like browser automation extension for [browser-use](https://github.com/browser-use/browser-use).