captcha-relay

Human-in-the-loop CAPTCHA solving with two modes: screenshot (default, zero infrastructure) and token relay (requires network access). Screenshot mode captures the page with a grid overlay, sends it to the human, and injects clicks based on their reply. Token relay mode detects CAPTCHA type + sitekey, serves the real widget on a relay page for native solving, and injects the token via CDP.

3,891 stars
Complexity: medium

About this skill

The captcha-relay skill lets an AI agent hand a CAPTCHA to a human and continue once it is solved. It has two modes. Screenshot mode (the default) is self-contained: it captures the current page with a grid overlay, sends the image to a human (e.g., via Telegram), and injects clicks on the page based on the cell numbers in the human's reply. It needs no additional infrastructure and works with any CAPTCHA type. Token relay mode detects the CAPTCHA type and sitekey, serves the real widget on a temporary relay page so the human can solve it natively, and injects the resulting token into the original page via the Chrome DevTools Protocol (CDP). It suits reCAPTCHA v2, hCaptcha, and Turnstile, but the human's device must be able to reach the relay server (e.g., via Tailscale or a tunnel).

Best use case

This skill is primarily used by developers, QA testers, and automation engineers who need to navigate or scrape websites protected by CAPTCHAs. It's ideal for scenarios where fully automated CAPTCHA-solving services are insufficient, too expensive, or where a human touch is preferred for accuracy and reliability in bypassing these security measures.

Expected outcome: the CAPTCHA on the target page is solved, either by injecting the human's clicks or by injecting a valid CAPTCHA token, and the agent can proceed with its web automation task.

Practical example

Example input

Solve the CAPTCHA on the current page using screenshot mode. If it's a reCAPTCHA, try token relay mode if Tailscale is configured.

Example output

Screenshot captured and sent to human. Awaiting response (e.g., '1,4,7').

Alternatively:
CAPTCHA token successfully retrieved and injected via relay mode.

When to use this skill

  • When encountering CAPTCHAs during web scraping or automated browsing tasks.
  • To integrate human verification into an AI agent's workflow for accessing web resources.
  • When an accurate and reliable CAPTCHA solution is needed, even if it requires human involvement.
  • For testing automation scripts against pages protected by various CAPTCHA types.

When not to use this skill

  • When a fully automated solution with no human involved is strictly required (this skill is human-in-the-loop).
  • If no human is available to solve the CAPTCHA in a timely manner.
  • For extremely high-volume, real-time CAPTCHA solving where human latency is unacceptable.
  • When you need to bypass CAPTCHAs without any external interaction or setup.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/captcha-relay/SKILL.md --create-dirs "https://raw.githubusercontent.com/openclaw/skills/main/skills/0xclanky/captcha-relay/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/captcha-relay/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How captcha-relay Compares

| Feature / Agent | captcha-relay | Standard Approach |
|-----------------|---------------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | medium | N/A |

Frequently Asked Questions

What does this skill do?

Human-in-the-loop CAPTCHA solving with two modes: screenshot (default, zero infrastructure) and token relay (requires network access). Screenshot mode captures the page with a grid overlay, sends it to the human, and injects clicks based on their reply. Token relay mode detects CAPTCHA type + sitekey, serves the real widget on a relay page for native solving, and injects the token via CDP.

How difficult is it to install?

The installation complexity is rated as medium. You can find the installation instructions above.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# CAPTCHA Relay v2

Solve CAPTCHAs by relaying them to a human. Two modes available.

## Modes

### Screenshot Mode (default) — No infrastructure needed

Grid overlay screenshot → send image to human via Telegram → human replies with cell numbers → inject clicks.

- **Zero setup** beyond the skill itself. No Tailscale, no tunnels, no relay server.
- Works for **any** CAPTCHA type (reCAPTCHA, hCaptcha, sliders, text, etc.)
- Uses `sharp` for image processing + CDP for screenshots and click injection.

```bash
node index.js                       # screenshot mode (default)
node index.js --mode screenshot     # explicit
node index.js --screenshot          # legacy alias
```

```js
const { solveCaptchaScreenshot } = require('./index');

// CommonJS has no top-level await, so the call is wrapped in an async function.
(async () => {
  const capture = await solveCaptchaScreenshot({ cdpPort: 18800 });
  // capture.imagePath: annotated screenshot to send to human
  // capture.prompt: text prompt for the human
})();
```
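How a cell number in the human's reply maps back to a click position is not spelled out here. A minimal sketch, assuming cells are numbered from 1 in row-major order over a rectangular region; the `cellCenter` helper and the shape of the `grid` object are hypothetical and not part of the skill's API:

```js
// Hypothetical helper: convert a 1-based, row-major cell number into the
// pixel coordinates of that cell's center. The grid shape is an assumption.
function cellCenter(cell, grid) {
  const { x, y, width, height, cols, rows } = grid;
  if (!Number.isInteger(cell) || cell < 1 || cell > cols * rows) {
    throw new RangeError(`cell ${cell} is outside a ${cols}x${rows} grid`);
  }
  const col = (cell - 1) % cols;
  const row = Math.floor((cell - 1) / cols);
  const cellW = width / cols;
  const cellH = height / rows;
  return { x: x + (col + 0.5) * cellW, y: y + (row + 0.5) * cellH };
}

// Example: a 3x3 grid covering a 300x300 region whose top-left is (100, 200).
const grid = { x: 100, y: 200, width: 300, height: 300, cols: 3, rows: 3 };
console.log(cellCenter(1, grid)); // { x: 150, y: 250 }
console.log(cellCenter(5, grid)); // { x: 250, y: 350 }
```

Coordinates like these could then be sent to the browser with CDP's `Input.dispatchMouseEvent` (a `mousePressed` followed by a `mouseReleased`), which is one plausible way to implement click injection.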

### Token Relay Mode — Requires network access

Detects CAPTCHA type + sitekey → serves real widget on relay page → human solves natively → token injected via CDP.

- Requires **Tailscale** or a **tunnel** (localtunnel/cloudflared) so the human's device can reach the relay server.
- Produces a proper CAPTCHA token — more reliable for reCAPTCHA v2, hCaptcha, Turnstile.
- Best when you have Tailscale already set up.
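The detection step can be pictured with the container markup each provider documents: `.g-recaptcha`, `.h-captcha`, and `.cf-turnstile` elements carrying a `data-sitekey` attribute. The sketch below runs over an HTML string; `detectCaptcha` is a hypothetical name, and the skill's real detection logic may differ (for example, it may inspect the live DOM or iframes through CDP):

```js
// Hypothetical detector: find the first known widget container in an HTML
// string and pull out its data-sitekey. Not the skill's actual implementation.
const WIDGETS = [
  { type: 'recaptcha', className: 'g-recaptcha' },
  { type: 'hcaptcha', className: 'h-captcha' },
  { type: 'turnstile', className: 'cf-turnstile' },
];

function detectCaptcha(html) {
  for (const { type, className } of WIDGETS) {
    // Match an opening tag whose class attribute contains the widget class.
    const tag = new RegExp(`<[^>]*class="[^"]*\\b${className}\\b[^"]*"[^>]*>`, 'i');
    const match = html.match(tag);
    if (!match) continue;
    const key = match[0].match(/data-sitekey="([^"]+)"/i);
    return { type, sitekey: key ? key[1] : null };
  }
  return null; // no sitekey-based widget found
}

const sample = '<form><div class="h-captcha" data-sitekey="abc-123"></div></form>';
console.log(detectCaptcha(sample)); // { type: 'hcaptcha', sitekey: 'abc-123' }
```

A `null` result is a reasonable signal to fall back to screenshot mode, since relay mode depends on having a sitekey.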

```bash
node index.js --mode relay              # with localtunnel
node index.js --mode relay --no-tunnel  # with Tailscale/LAN
```

```js
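const { solveCaptcha } = require('./index');

// CommonJS has no top-level await, so the call is wrapped in an async function.
(async () => {
  const result = await solveCaptcha({ cdpPort: 18800, useTunnel: false });
  // result.relayUrl: URL to send to human
  // result.token: solved CAPTCHA token
})();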
```
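Token injection over CDP usually comes down to evaluating a small script in the page that writes the token into the provider's response field. A minimal sketch, assuming the conventional field names (`g-recaptcha-response`, `h-captcha-response`, `cf-turnstile-response`); `buildInjectCommand` is a hypothetical helper, and pages that rely on a JavaScript callback need extra handling that is not shown:

```js
// Response field names used by each provider's widget.
const RESPONSE_FIELDS = {
  recaptcha: 'g-recaptcha-response',
  hcaptcha: 'h-captcha-response',
  turnstile: 'cf-turnstile-response',
};

// Hypothetical helper: build a CDP Runtime.evaluate command that writes the
// token into every matching response field on the page.
function buildInjectCommand(type, token, id = 1) {
  const field = RESPONSE_FIELDS[type];
  if (!field) throw new Error(`unsupported CAPTCHA type: ${type}`);
  // JSON.stringify turns the token into a safely quoted JavaScript string.
  const expression =
    `document.querySelectorAll('[name="${field}"]')` +
    `.forEach((el) => { el.value = ${JSON.stringify(token)}; });`;
  return { id, method: 'Runtime.evaluate', params: { expression } };
}

// The command would be serialized and sent over the page's CDP WebSocket.
console.log(JSON.stringify(buildInjectCommand('hcaptcha', 'tok-123')));
```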

## When to Use Each

| Scenario | Mode |
|----------|------|
| Quick & easy, no setup | `screenshot` |
| Any CAPTCHA type (sliders, text, etc.) | `screenshot` |
| Known CAPTCHA with sitekey (reCAPTCHA, hCaptcha, Turnstile) | `relay` |
| Tailscale already configured | `relay` |
| No network access to host | `screenshot` |
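The table can be read as a small decision rule. A sketch with a hypothetical `chooseMode` function and input shape; the skill itself takes the mode from `--mode` and does not necessarily choose automatically:

```js
// CAPTCHA types that expose a sitekey and so can be served on a relay page.
const RELAY_TYPES = new Set(['recaptcha', 'hcaptcha', 'turnstile']);

// Hypothetical chooser: relay needs both a sitekey-based widget and a network
// route from the human's device to the host; otherwise use screenshot mode.
function chooseMode({ captchaType, hostReachable }) {
  if (hostReachable && RELAY_TYPES.has(captchaType)) return 'relay';
  return 'screenshot';
}

console.log(chooseMode({ captchaType: 'turnstile', hostReachable: true })); // relay
console.log(chooseMode({ captchaType: 'slider', hostReachable: true }));    // screenshot
```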

## CLI Flags

| Flag | Default | Description |
|------|---------|-------------|
| `--mode screenshot\|relay` | `screenshot` | Select solving mode |
| `--screenshot` | — | Alias for `--mode screenshot` |
| `--no-inject` | inject | Return token without injecting into browser |
| `--no-tunnel` | tunnel | Skip tunnel, use local/Tailscale IP (relay mode) |
| `--timeout N` | 120 | Timeout in seconds |
| `--cdp-port N` | 18800 | Chrome DevTools Protocol port |
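For readers wrapping the CLI, the flag table translates into a parser along these lines. This is a sketch built on Node's `util.parseArgs` (available in Node.js 18.3 and later); `parseFlags` and the returned field names are hypothetical, the skill's own argument handling may differ, and letting `--screenshot` win over `--mode` is an assumption:

```js
const { parseArgs } = require('node:util');

// Hypothetical parser mirroring the flag table above.
function parseFlags(argv) {
  const { values } = parseArgs({
    args: argv,
    options: {
      mode: { type: 'string' },
      screenshot: { type: 'boolean' },
      'no-inject': { type: 'boolean' },
      'no-tunnel': { type: 'boolean' },
      timeout: { type: 'string' },
      'cdp-port': { type: 'string' },
    },
  });
  // Assumption: the legacy --screenshot alias takes precedence over --mode.
  const mode = values.screenshot ? 'screenshot' : values.mode ?? 'screenshot';
  if (mode !== 'screenshot' && mode !== 'relay') {
    throw new Error(`unknown mode: ${mode}`);
  }
  return {
    mode,
    inject: !values['no-inject'],
    useTunnel: !values['no-tunnel'],
    timeoutSeconds: Number(values.timeout ?? 120),
    cdpPort: Number(values['cdp-port'] ?? 18800),
  };
}

console.log(parseFlags(['--mode', 'relay', '--no-tunnel', '--timeout', '60']));
```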

## Agent Workflow

### Screenshot mode (simplest)

1. Call `solveCaptchaScreenshot({ cdpPort })` 
2. Send `capture.imagePath` to human via `message` tool with `capture.prompt`
3. Human replies with cell numbers (e.g. "1,3,5,7")
4. Call `injectGridClicks(cdpPort, capture, selectedCells)` to click those cells
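Between steps 3 and 4 the human's free-text reply has to become a list of cell numbers. A minimal sketch, assuming `selectedCells` is an array of integers; `parseCellReply` is a hypothetical helper, not something the skill exports:

```js
// Hypothetical parser for the human's reply. Accepts commas and/or spaces,
// drops duplicates, and rejects anything outside 1..cellCount.
function parseCellReply(reply, cellCount) {
  const parts = reply.split(/[\s,]+/).filter(Boolean);
  const cells = [];
  for (const part of parts) {
    if (!/^\d+$/.test(part)) throw new Error(`not a cell number: "${part}"`);
    const n = Number(part);
    if (n < 1 || n > cellCount) throw new RangeError(`cell ${n} out of range`);
    if (!cells.includes(n)) cells.push(n);
  }
  return cells;
}

console.log(parseCellReply('1,3,5,7', 9)); // [ 1, 3, 5, 7 ]
```

Validating before injecting clicks keeps a typo in the reply from producing a click outside the grid.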

### Relay mode

1. Call `solveCaptcha({ useTunnel: false })` (Tailscale) or `solveCaptcha()` (tunnel)
2. Send `result.relayUrl` to human via `message` tool
3. Wait — resolves when human completes the CAPTCHA
4. Token is auto-injected; continue automation

## Requirements

- Chrome/Chromium with `--remote-debugging-port=18800`
- Node.js 18+ and `npm install` (deps: ws, sharp)
- **Relay mode only:** Tailscale or internet for tunnel
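These requirements can be checked before a run. The sketch below uses Chrome's `/json/version` debugging endpoint and the global `fetch` that ships with Node.js 18; the helper names are hypothetical and the skill does not necessarily perform such a check:

```js
// Hypothetical preflight helpers; not part of the skill.
function nodeVersionOk(version = process.versions.node, minMajor = 18) {
  return Number(version.split('.')[0]) >= minMajor;
}

function cdpVersionUrl(port = 18800, host = '127.0.0.1') {
  return `http://${host}:${port}/json/version`;
}

// Resolves to true when a browser answers on the debugging port.
async function cdpReachable(port = 18800) {
  try {
    const res = await fetch(cdpVersionUrl(port));
    const info = await res.json();
    return typeof info.webSocketDebuggerUrl === 'string';
  } catch {
    return false; // connection refused, bad JSON, and so on
  }
}
```

A caller might run `cdpReachable(18800).then(console.log)` and, on `false`, remind the user to start Chrome with `--remote-debugging-port=18800`.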
