macpilot-screenshot-ocr

Capture screenshots and extract text via OCR using MacPilot. Take full-screen, region, or window screenshots, and recognize text in images or screen areas with multi-language support.

3,807 stars

Best use case

macpilot-screenshot-ocr is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Capture screenshots and extract text via OCR using MacPilot. Take full-screen, region, or window screenshots, and recognize text in images or screen areas with multi-language support.

Teams using macpilot-screenshot-ocr should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/macpilot-screenshot-ocr/SKILL.md --create-dirs "https://raw.githubusercontent.com/openclaw/skills/main/skills/adhikjoshi/macpilot/skills/macpilot-screenshot-ocr/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/macpilot-screenshot-ocr/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How macpilot-screenshot-ocr Compares

Feature / Agentmacpilot-screenshot-ocrStandard Approach
Platform SupportNot specifiedLimited / Varies
Context Awareness High Baseline
Installation ComplexityUnknownN/A

Frequently Asked Questions

What does this skill do?

Capture screenshots and extract text via OCR using MacPilot. Take full-screen, region, or window screenshots, and recognize text in images or screen areas with multi-language support.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

SKILL.md Source

# MacPilot Screenshot & OCR

Use MacPilot to capture screenshots of the screen, specific regions, or application windows, and extract text from images or screen regions using Apple's built-in Vision OCR.

## When to Use

Use this skill when:
- You need to capture what's currently on screen
- You need to extract text from an image file
- You need to read text from a specific area of the screen
- You need to capture a specific app window
- You need to verify visual state of an application
- You need to capture screen recordings

## Screenshot Commands

### Full Screen
```bash
macpilot screenshot --json                           # Capture to temp file
macpilot screenshot ~/Desktop/screen.png --json      # Capture to specific path
macpilot screenshot --with-permissions --json        # Use CGWindowListCreateImage directly
```

### Specific Region
```bash
macpilot screenshot --region 100,200,800,600 --json
# Region format: x,y,width,height (from top-left corner)
```

### Specific Window
```bash
macpilot screenshot --window "Safari" --json         # Capture Safari window
macpilot screenshot --window "Finder" --json         # Capture Finder window
```

### All Windows
```bash
macpilot screenshot --all-windows --json             # Each window separately
```

### Specific Display
```bash
macpilot screenshot --display 1 --json               # Second display (0-indexed)
```

### Format Options
```bash
macpilot screenshot --format png ~/Desktop/shot.png  # PNG (default, lossless)
macpilot screenshot --format jpg ~/Desktop/shot.jpg  # JPEG (smaller files)
```

## OCR Commands

### Extract Text from Image File
```bash
macpilot ocr scan /path/to/image.png --json
macpilot ocr scan ~/Desktop/screenshot.png --json
```

### Extract Text from Screen Region
```bash
macpilot ocr scan 100 200 800 600 --json
# Arguments: x y width height (captures region then OCRs it)
```

### Multi-Language OCR
```bash
macpilot ocr scan image.png --language en-US --json       # English
macpilot ocr scan image.png --language ja --json           # Japanese
macpilot ocr scan image.png --language zh-Hans --json      # Simplified Chinese
macpilot ocr scan image.png --language de --json           # German
macpilot ocr scan image.png --language fr --json           # French
```

### OCR Click (Find and Click Text on Screen)
```bash
macpilot ocr click "Submit" --json                    # Find text on screen and click it
macpilot ocr click "OK" --app Finder --json           # Click text in specific app
macpilot ocr click "Accept" --timeout 10 --json       # Retry until text appears (10s)
```

OCR click takes a screenshot, runs OCR, finds the matching text (case-insensitive), and clicks at its center coordinates. Use `--timeout` to poll and retry when waiting for text to appear.

## Screen Recording (ScreenCaptureKit)

### Start Recording
```bash
macpilot screen record start --output ~/Desktop/recording.mov --json
macpilot screen record start --output rec.mov --region 0,0,1920,1080 --json  # Region
macpilot screen record start --output rec.mov --window Safari --json          # Window
macpilot screen record start --output rec.mov --display 1 --json              # Display
macpilot screen record start --output rec.mov --audio --json                  # With audio
macpilot screen record start --output rec.mov --quality high --fps 60 --json  # Quality
```

### Control Recording
```bash
macpilot screen record stop --json         # Stop and save
macpilot screen record status --json       # Check if recording
macpilot screen record pause --json        # Pause recording
macpilot screen record resume --json       # Resume recording
```

Quality options: `low` (1 Mbps), `medium` (5 Mbps, default), `high` (10 Mbps). FPS default: 30.

## Display Information

```bash
macpilot display-info --json
# Returns: all displays with resolution, position, scale factor
```

## Workflow Patterns

### Capture and OCR in One Flow
```bash
# Take screenshot of specific region
macpilot screenshot --region 0,0,1920,1080 ~/tmp/capture.png --json
# Extract text from it
macpilot ocr scan ~/tmp/capture.png --json
```

### Quick Screen Region OCR
```bash
# Directly OCR a screen region without saving
macpilot ocr scan 200 100 600 400 --json
```

### Find and Click Text (No Coordinate Math)
```bash
# Instead of screenshot > OCR > parse > click, just:
macpilot ocr click "Submit" --json
macpilot ocr click "Next" --timeout 5 --json   # Wait up to 5s for text to appear
```

### Verify UI State
```bash
# Screenshot a window to see its current state
macpilot screenshot --window "Safari" ~/tmp/safari.png --json
# Read the image to verify content
macpilot ocr scan ~/tmp/safari.png --json
```

### Record an Automation
```bash
macpilot screen record start --output ~/Desktop/demo.mov
macpilot app open Safari
macpilot wait seconds 2
macpilot keyboard key cmd+l
macpilot keyboard type "https://example.com"
macpilot keyboard key enter
macpilot wait seconds 3
macpilot screen record stop
```

## Tips

- Screen Recording permission must be granted to MacPilot.app in System Settings
- PNG format is best for screenshots with text (lossless); JPEG for photos
- OCR works best on high-contrast text; increase screenshot region size if text is small
- Use `display-info` to get screen dimensions before capturing specific regions
- The coordinate system starts at top-left (0,0) with x increasing right and y increasing down
- On Retina displays, coordinates are in logical points (not physical pixels)

Related Skills

web-screenshot

3807
from openclaw/skills

Capture screenshots of web pages running on local or remote servers using Puppeteer in headless Chromium. Use when user asks to screenshot web pages, capture web UI, take website screenshots, or document web application interfaces. Supports login-required SPAs (Vue/React/Angular) by performing form-based authentication before navigating. Generates screenshots and an optional result.json with per-page descriptions.

app-store-screenshots-generator

3807
from openclaw/skills

Generate production-ready App Store screenshots for iOS apps using AI agents, Next.js, and html-to-image

macpilot-window-manager

3807
from openclaw/skills

Manage macOS windows with MacPilot. List, move, resize, snap, minimize, fullscreen, and arrange application windows. Supports multi-display and Spaces.

macpilot-ui-inspector

3807
from openclaw/skills

Inspect and interact with macOS UI elements using MacPilot accessibility APIs. Find buttons, text fields, labels, and other elements by role, label, or position, then click, read, or modify them.

macpilot-dialog-handler

3807
from openclaw/skills

Handle macOS file dialogs (Open, Save, Print) with MacPilot. Navigate folders, select files, set filenames, and dismiss dialogs programmatically in any application.

macpilot-automation

3807
from openclaw/skills

Core macOS automation skill using MacPilot CLI. Enables Claude Code to control apps, type text, click elements, run shell commands, and automate workflows on macOS via the `macpilot` command.

MacPilot Skills

3807
from openclaw/skills

Agent skills for [MacPilot](https://github.com/adhikjoshi/macpilot) — a CLI tool for macOS automation via Accessibility APIs.

screenshot-tool

3807
from openclaw/skills

网页截图 + 文档截图工具。支持网页全页截图、PPT/Word/Excel/PDF 转高清图片。保留原始样式,300 DPI 高清输出。

screenshot-ux-auditor

3807
from openclaw/skills

Turn app screenshots into structured UX, copywriting, and conversion audits with issue severity and recommended fixes.

screenshot-to-task

3807
from openclaw/skills

把截图里的待办或灵感整理成任务、备注和优先级。;use for screenshots, tasks, capture workflows;do not use for 伪造截图内容, 替代 OCR 系统.

---

3807
from openclaw/skills

## 概述

Content & Documentation

self-improvement

3807
from openclaw/skills

Captures learnings, errors, and corrections to enable continuous improvement. Use when: (1) A command or operation fails unexpectedly, (2) User corrects Claude ('No, that's wrong...', 'Actually...'), (3) User requests a capability that doesn't exist, (4) An external API or tool fails, (5) Claude realizes its knowledge is outdated or incorrect, (6) A better approach is discovered for a recurring task. Also review learnings before major tasks.

Agent Intelligence & Learning