captcha-relay
Human-in-the-loop CAPTCHA solving with two modes: screenshot (default, zero infrastructure) and token relay (requires network access). Screenshot mode captures the page with a grid overlay, sends it to the human, and injects clicks based on their reply. Token relay mode detects CAPTCHA type + sitekey, serves the real widget on a relay page for native solving, and injects the token via CDP.
About this skill
The captcha-relay skill lets AI agents get past CAPTCHA challenges by handing them to a human. It provides two modes: Screenshot Mode (the default) and Token Relay Mode. Screenshot mode is self-contained. It captures the current webpage with a grid overlay, sends the image to a human (e.g., via Telegram), and then injects clicks on the page based on the human's reply. It requires no additional infrastructure and can handle any CAPTCHA type. Token relay mode detects the CAPTCHA type and sitekey, serves the real CAPTCHA widget on a temporary relay page, and lets the human solve it natively. The resulting token is then injected into the original page via the Chrome DevTools Protocol (CDP). Because it yields a genuine CAPTCHA token, this mode is the more reliable choice for reCAPTCHA v2, hCaptcha, and Turnstile, but the human's device must be able to reach the relay server (e.g., via Tailscale or a tunnel). In both modes the automated workflow waits for the human instead of stopping at the CAPTCHA.
Best use case
This skill is aimed at developers, QA testers, and automation engineers who need to navigate or scrape websites protected by CAPTCHAs. It fits cases where fully automated CAPTCHA-solving services are insufficient or too expensive, or where a human solver is preferred for accuracy and reliability.
Expected outcome: the CAPTCHA on the target page is solved, either through injected clicks or an injected CAPTCHA token, and the agent can continue its web automation task.
Practical example
Example input
Solve the CAPTCHA on the current page using screenshot mode. If it's a reCAPTCHA, try token relay mode if Tailscale is configured.
Example output
Screenshot captured and sent to human. Awaiting response (e.g., '1,4,7'). Alternatively: CAPTCHA token successfully retrieved and injected via relay mode.
When to use this skill
- When encountering CAPTCHAs during web scraping or automated browsing tasks.
- To integrate human verification into an AI agent's workflow for accessing web resources.
- When an accurate and reliable CAPTCHA solution is needed, even if it requires human involvement.
- For testing automation scripts against pages protected by various CAPTCHA types.
When not to use this skill
- When a fully automated CAPTCHA solution with no human involvement is strictly required (this skill is human-in-the-loop).
- If no human is available to solve the CAPTCHA in a timely manner.
- For extremely high-volume, real-time CAPTCHA solving where human latency is unacceptable.
- When you need to bypass CAPTCHAs without any external interaction or setup.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/captcha-relay/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
Frequently Asked Questions
How difficult is it to install?
The installation complexity is rated as medium. You can find the installation instructions above.
SKILL.md Source
# CAPTCHA Relay v2
Solve CAPTCHAs by relaying them to a human. Two modes available.
## Modes
### Screenshot Mode (default) — No infrastructure needed
Grid overlay screenshot → send image to human via Telegram → human replies with cell numbers → inject clicks.
- **Zero setup** beyond the skill itself. No Tailscale, no tunnels, no relay server.
- Works for **any** CAPTCHA type (reCAPTCHA, hCaptcha, sliders, text, etc.)
- Uses `sharp` for image processing + CDP for screenshots and click injection.
```bash
node index.js # screenshot mode (default)
node index.js --mode screenshot # explicit
node index.js --screenshot # legacy alias
```
```js
const { solveCaptchaScreenshot } = require('./index');

// Top-level await is not allowed in CommonJS, so wrap the call in an async IIFE.
(async () => {
  const capture = await solveCaptchaScreenshot({ cdpPort: 18800 });
  // capture.imagePath: annotated screenshot to send to the human
  // capture.prompt: text prompt for the human
})();
```
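Turning the human's cell numbers into page clicks comes down to mapping a cell index onto pixel coordinates. The sketch below assumes a grid of `rows` by `cols` cells drawn over a rectangle given in CSS pixels, with cells numbered from 1, left to right and top to bottom. The skill's actual grid layout and numbering are not documented here, so `cellCenter` and `clickEvents` are hypothetical helpers. The event objects use the parameters of the CDP `Input.dispatchMouseEvent` command. Sending them over a CDP connection is left out.

```js
// Hypothetical helper: the skill's real grid layout and numbering may differ.
// Assumes cells are numbered 1..rows*cols, left to right, top to bottom,
// over a rectangular region given in CSS pixels.
function cellCenter(cell, { rows, cols, x, y, width, height }) {
  if (!Number.isInteger(cell) || cell < 1 || cell > rows * cols) {
    throw new RangeError(`cell ${cell} is outside 1..${rows * cols}`);
  }
  const index = cell - 1;
  const row = Math.floor(index / cols);
  const col = index % cols;
  const cellW = width / cols;
  const cellH = height / rows;
  // Click the middle of the cell to stay clear of grid lines.
  return { x: x + (col + 0.5) * cellW, y: y + (row + 0.5) * cellH };
}

// Build the pair of CDP Input.dispatchMouseEvent parameter objects
// (press, then release) that together make one left click at a point.
function clickEvents({ x, y }) {
  const base = { x, y, button: 'left', clickCount: 1 };
  return [
    { type: 'mousePressed', ...base },
    { type: 'mouseReleased', ...base },
  ];
}

const grid = { rows: 3, cols: 3, x: 100, y: 200, width: 300, height: 300 };
const point = cellCenter(5, grid); // middle cell of a 3x3 grid
console.log(point); // { x: 250, y: 350 }
console.log(clickEvents(point));
```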
### Token Relay Mode — Requires network access
Detects CAPTCHA type + sitekey → serves real widget on relay page → human solves natively → token injected via CDP.
- Requires **Tailscale** or a **tunnel** (localtunnel/cloudflared) so the human's device can reach the relay server.
- Produces a proper CAPTCHA token — more reliable for reCAPTCHA v2, hCaptcha, Turnstile.
- Best when you have Tailscale already set up.
```bash
node index.js --mode relay # with localtunnel
node index.js --mode relay --no-tunnel # with Tailscale/LAN
```
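```js
const { solveCaptcha } = require('./index');

// Top-level await is not allowed in CommonJS, so wrap the call in an async IIFE.
(async () => {
  const result = await solveCaptcha({ cdpPort: 18800, useTunnel: false });
  // result.relayUrl: URL to send to the human
  // result.token: solved CAPTCHA token
})();
```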
## When to Use Each
| Scenario | Mode |
|----------|------|
| Quick & easy, no setup | `screenshot` |
| Any CAPTCHA type (sliders, text, etc.) | `screenshot` |
| Known CAPTCHA with sitekey (reCAPTCHA, hCaptcha, Turnstile) | `relay` |
| Tailscale already configured | `relay` |
| No network access to host | `screenshot` |
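The table above can be encoded as a small decision function. This is a sketch under assumptions: `chooseMode`, its input fields, and type labels such as `'recaptcha-v2'` are hypothetical and not part of the skill's exports. The "quick & easy" row is treated as the fallback rather than as an input.

```js
// Hypothetical helper that encodes the "When to Use Each" table.
// Not part of the skill's API; the type labels are illustrative.
const RELAY_TYPES = new Set(['recaptcha-v2', 'hcaptcha', 'turnstile']);

// hostReachable: true when Tailscale, the LAN, or a tunnel lets the
// human's device open a page served from this host.
function chooseMode({ captchaType, hasSitekey, hostReachable }) {
  // "No network access to host": the human cannot open a relay page.
  if (!hostReachable) return 'screenshot';
  // "Known CAPTCHA with sitekey": relay mode can re-render the real widget.
  if (hasSitekey && RELAY_TYPES.has(captchaType)) return 'relay';
  // Sliders, text CAPTCHAs, and anything unrecognized: use screenshots.
  return 'screenshot';
}

console.log(chooseMode({ captchaType: 'hcaptcha', hasSitekey: true, hostReachable: true }));
// prints relay
```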
## CLI Flags
| Flag | Default | Description |
|------|---------|-------------|
| `--mode screenshot\|relay` | `screenshot` | Select solving mode |
| `--screenshot` | — | Alias for `--mode screenshot` |
| `--no-inject` | inject | Return token without injecting into browser |
| `--no-tunnel` | tunnel | Skip tunnel, use local/Tailscale IP (relay mode) |
| `--timeout N` | 120 | Timeout in seconds |
| `--cdp-port N` | 18800 | Chrome DevTools Protocol port |
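For agents that wrap the CLI, the flag table above maps onto Node's built-in `util.parseArgs` (Node 18.3 or later). The sketch below is a hypothetical re-implementation for illustration only: `parseCliFlags` is not exported by the skill, and `index.js` may parse its arguments differently. It applies the documented defaults by hand rather than through parseArgs' `default` option, which only arrived in Node 18.11.

```js
const { parseArgs } = require('node:util');

// Hypothetical re-implementation of the CLI flag table; illustration only.
function parseCliFlags(argv) {
  const { values } = parseArgs({
    args: argv,
    options: {
      mode: { type: 'string' },
      screenshot: { type: 'boolean' },
      'no-inject': { type: 'boolean' },
      'no-tunnel': { type: 'boolean' },
      timeout: { type: 'string' },
      'cdp-port': { type: 'string' },
    },
  });

  // --screenshot is a legacy alias for --mode screenshot.
  const mode = values.screenshot ? 'screenshot' : (values.mode ?? 'screenshot');
  if (mode !== 'screenshot' && mode !== 'relay') {
    throw new Error(`unknown --mode: ${mode}`);
  }

  const timeoutSec = Number(values.timeout ?? 120);
  if (!Number.isInteger(timeoutSec) || timeoutSec <= 0) {
    throw new Error(`--timeout must be a positive integer, got ${values.timeout}`);
  }

  return {
    mode,
    inject: !values['no-inject'], // default: inject the token into the browser
    tunnel: !values['no-tunnel'], // default: open a tunnel (relay mode only)
    timeoutSec,
    cdpPort: Number(values['cdp-port'] ?? 18800),
  };
}

console.log(parseCliFlags(['--mode', 'relay', '--no-tunnel', '--timeout', '60']));
```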
## Agent Workflow
### Screenshot mode (simplest)
1. Call `solveCaptchaScreenshot({ cdpPort })`
2. Send `capture.imagePath` to human via `message` tool with `capture.prompt`
3. Human replies with cell numbers (e.g. "1,3,5,7")
4. Call `injectGridClicks(cdpPort, capture, selectedCells)` to click those cells
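Step 3 depends on turning the human's free-text reply into cell numbers. A minimal sketch, assuming the reply lists 1-based cell numbers separated by commas, spaces, or semicolons, and that the caller knows the total number of grid cells. `parseCellReply` is a hypothetical helper, and passing its result as `selectedCells` assumes `injectGridClicks` accepts an array of numbers.

```js
// Hypothetical helper: turn a free-text reply such as "1,3,5,7" or "1 3 5 7"
// into a list of cell numbers to pass as selectedCells.
function parseCellReply(reply, totalCells) {
  const tokens = reply.split(/[\s,;]+/).filter(Boolean);
  const cells = [];
  for (const token of tokens) {
    if (!/^\d+$/.test(token)) {
      throw new Error(`not a cell number: "${token}"`);
    }
    const cell = Number(token);
    if (cell < 1 || cell > totalCells) {
      throw new RangeError(`cell ${cell} is outside 1..${totalCells}`);
    }
    if (!cells.includes(cell)) cells.push(cell); // drop duplicates, keep order
  }
  return cells;
}

console.log(parseCellReply('1, 3,5 7,3', 9)); // [ 1, 3, 5, 7 ]
```

Rejecting malformed replies, rather than guessing, lets the agent ask the human again instead of clicking the wrong cells.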
### Relay mode
1. Call `solveCaptcha({ useTunnel: false })` (Tailscale) or `solveCaptcha()` (tunnel)
2. Send `result.relayUrl` to human via `message` tool
3. Wait — resolves when human completes the CAPTCHA
4. Token is auto-injected; continue automation
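On the CLI, the wait is bounded by `--timeout` (default 120 seconds). When calling `solveCaptcha()` from code, an agent may also want its own upper bound, and a generic `Promise.race` wrapper is one way to get it. `withTimeout` is a hypothetical helper. Rejecting the outer promise does not by itself cancel the underlying solve or anything it started.

```js
// Hypothetical wrapper: bound how long the agent waits for the human.
// `task` is any promise, e.g. the one returned by solveCaptcha().
function withTimeout(task, seconds) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`no answer from human within ${seconds}s`)),
      seconds * 1000,
    );
  });
  // Whichever settles first wins; always clear the timer so Node can exit.
  return Promise.race([task, timeout]).finally(() => clearTimeout(timer));
}

// Usage sketch (inside an async function):
// const result = await withTimeout(solveCaptcha({ useTunnel: false }), 120);
```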
## Requirements
- Chrome/Chromium with `--remote-debugging-port=18800`
- Node.js 18+ and `npm install` (deps: ws, sharp)
- **Relay mode only:** Tailscale, or internet access for the tunnel