macpilot-screenshot-ocr

Capture screenshots and extract text via OCR using MacPilot. Take full-screen, region, or window screenshots, and recognize text in images or screen areas with multi-language support.

3,891 stars

by openclaw

Best use case

macpilot-screenshot-ocr is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using macpilot-screenshot-ocr should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/macpilot-screenshot-ocr/SKILL.md --create-dirs "https://raw.githubusercontent.com/openclaw/skills/main/skills/adhikjoshi/macpilot/skills/macpilot-screenshot-ocr/SKILL.md"

Manual Installation

1. Download SKILL.md from GitHub.
2. Place it at .claude/skills/macpilot-screenshot-ocr/SKILL.md inside your project.
3. Restart your AI agent; it will auto-discover the skill.

How macpilot-screenshot-ocr Compares

Feature / Agent	macpilot-screenshot-ocr	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Capture screenshots and extract text via OCR using MacPilot. Take full-screen, region, or window screenshots, and recognize text in images or screen areas with multi-language support.

Where can I find the source code?

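You can find the source code on GitHub in the openclaw/skills repository, under skills/adhikjoshi/macpilot/skills/macpilot-screenshot-ocr/ (the same path the install command downloads from).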

SKILL.md Source

# MacPilot Screenshot & OCR

Use MacPilot to capture screenshots of the screen, specific regions, or application windows, and extract text from images or screen regions using Apple's built-in Vision OCR.

## When to Use

Use this skill when:
- You need to capture what's currently on screen
- You need to extract text from an image file
- You need to read text from a specific area of the screen
- You need to capture a specific app window
- You need to verify visual state of an application
- You need to capture screen recordings

## Screenshot Commands

### Full Screen
```bash
macpilot screenshot --json                           # Capture to temp file
macpilot screenshot ~/Desktop/screen.png --json      # Capture to specific path
macpilot screenshot --with-permissions --json        # Use CGWindowListCreateImage directly
```

### Specific Region
```bash
macpilot screenshot --region 100,200,800,600 --json
# Region format: x,y,width,height (from top-left corner)
```
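Because the region is passed as a single comma-separated string, a malformed value is easy to produce when the numbers come from variables. The sketch below is a hypothetical helper (`region_arg` is not part of MacPilot) that checks four integers and prints them in the `x,y,width,height` form described above. It assumes width and height must be positive; whether MacPilot accepts negative origins is not stated in this document, so the helper merely allows them.

```bash
# Hypothetical helper, not a MacPilot command: build the string for --region.
region_arg() {
  if [ "$#" -ne 4 ]; then
    echo "usage: region_arg X Y WIDTH HEIGHT" >&2
    return 2
  fi
  for _v in "$@"; do
    # Strip one optional leading minus, then require digits only.
    case "${_v#-}" in
      ''|*[!0-9]*)
        echo "region_arg: not an integer: $_v" >&2
        return 1 ;;
    esac
  done
  # Assumption: a region needs a positive width and height.
  if [ "$3" -le 0 ] || [ "$4" -le 0 ]; then
    echo "region_arg: width and height must be positive" >&2
    return 1
  fi
  printf '%s,%s,%s,%s\n' "$1" "$2" "$3" "$4"
}

# Example use (requires macpilot to be installed):
#   macpilot screenshot --region "$(region_arg 100 200 800 600)" --json
```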

### Specific Window
```bash
macpilot screenshot --window "Safari" --json         # Capture Safari window
macpilot screenshot --window "Finder" --json         # Capture Finder window
```

### All Windows
```bash
macpilot screenshot --all-windows --json             # Each window separately
```

### Specific Display
```bash
macpilot screenshot --display 1 --json               # Second display (0-indexed)
```

### Format Options
```bash
macpilot screenshot --format png ~/Desktop/shot.png  # PNG (default, lossless)
macpilot screenshot --format jpg ~/Desktop/shot.jpg  # JPEG (smaller files)
```

## OCR Commands

### Extract Text from Image File
```bash
macpilot ocr scan /path/to/image.png --json
macpilot ocr scan ~/Desktop/screenshot.png --json
```

### Extract Text from Screen Region
```bash
macpilot ocr scan 100 200 800 600 --json
# Arguments: x y width height (captures region then OCRs it)
```
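Note that `ocr scan` takes the region as four separate arguments, while `screenshot --region` takes one comma-separated string. If a script keeps regions in the comma form, a small conversion avoids maintaining both. The helper below is a hypothetical sketch (`region_to_ocr_args` is not part of MacPilot); it only reshapes the string and does not validate that the fields are numbers.

```bash
# Hypothetical helper, not a MacPilot command:
# turn "x,y,width,height" into "x y width height".
region_to_ocr_args() {
  case "$1" in
    ,*|*,|*,,*|*' '*|*,*,*,*,*)
      echo "region_to_ocr_args: expected x,y,width,height" >&2
      return 1 ;;
    *,*,*,*)
      ;;  # exactly four non-empty fields
    *)
      echo "region_to_ocr_args: expected x,y,width,height" >&2
      return 1 ;;
  esac
  printf '%s\n' "$1" | tr ',' ' '
}

# Example use (requires macpilot; the substitution is left unquoted on
# purpose so the shell splits it into four arguments):
#   macpilot ocr scan $(region_to_ocr_args "100,200,800,600") --json
```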

### Multi-Language OCR
```bash
macpilot ocr scan image.png --language en-US --json       # English
macpilot ocr scan image.png --language ja --json           # Japanese
macpilot ocr scan image.png --language zh-Hans --json      # Simplified Chinese
macpilot ocr scan image.png --language de --json           # German
macpilot ocr scan image.png --language fr --json           # French
```

### OCR Click (Find and Click Text on Screen)
```bash
macpilot ocr click "Submit" --json                    # Find text on screen and click it
macpilot ocr click "OK" --app Finder --json           # Click text in specific app
macpilot ocr click "Accept" --timeout 10 --json       # Retry until text appears (10s)
```

OCR click takes a screenshot, runs OCR, finds the matching text (case-insensitive), and clicks at its center coordinates. Use `--timeout` to poll and retry when waiting for text to appear.
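The poll-and-retry idea behind `--timeout` can also be applied to commands that have no such flag. The following is a generic sketch of a retry loop, not MacPilot's actual implementation; `retry_until` and its arguments (timeout in seconds, interval in seconds, then the command) are names made up for this example.

```bash
# Generic poll-until-success loop (illustrative only).
# Usage: retry_until TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
retry_until() {
  _timeout="$1"
  _interval="$2"
  shift 2
  _deadline=$(( $(date +%s) + _timeout ))
  while :; do
    # Succeed as soon as the command succeeds.
    if "$@"; then
      return 0
    fi
    # Give up once the deadline has passed.
    if [ "$(date +%s)" -ge "$_deadline" ]; then
      return 1
    fi
    sleep "$_interval"
  done
}

# Example use (requires macpilot). This assumes the recognized text appears
# verbatim somewhere in the command's output, which this document does not
# specify:
#   retry_until 10 1 sh -c 'macpilot ocr scan 100 200 800 600 --json | grep -q "Ready"'
```

The command is always tried at least once, so a timeout of 0 means "try once, do not wait".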

## Screen Recording (ScreenCaptureKit)

### Start Recording
```bash
macpilot screen record start --output ~/Desktop/recording.mov --json
macpilot screen record start --output rec.mov --region 0,0,1920,1080 --json  # Region
macpilot screen record start --output rec.mov --window Safari --json          # Window
macpilot screen record start --output rec.mov --display 1 --json              # Display
macpilot screen record start --output rec.mov --audio --json                  # With audio
macpilot screen record start --output rec.mov --quality high --fps 60 --json  # Quality
```

### Control Recording
```bash
macpilot screen record stop --json         # Stop and save
macpilot screen record status --json       # Check if recording
macpilot screen record pause --json        # Pause recording
macpilot screen record resume --json       # Resume recording
```

Quality options: `low` (1 Mbps), `medium` (5 Mbps, default), `high` (10 Mbps). FPS default: 30.
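The bitrates above give a rough way to estimate file size before recording: megabits per second times seconds, divided by 8, yields megabytes. The sketch below applies that arithmetic. It assumes the listed figures are average video bitrates and ignores audio and container overhead, so actual files may differ; `estimate_recording_mb` is a hypothetical helper, not a MacPilot command.

```bash
# Hypothetical helper: estimate recording size in megabytes.
# Usage: estimate_recording_mb QUALITY DURATION_SECONDS
estimate_recording_mb() {
  case "$1" in
    low)    _mbps=1 ;;
    medium) _mbps=5 ;;
    high)   _mbps=10 ;;
    *)
      echo "estimate_recording_mb: quality must be low, medium, or high" >&2
      return 1 ;;
  esac
  # Mbps * seconds = megabits; divide by 8 bits per byte for megabytes.
  awk -v r="$_mbps" -v s="$2" 'BEGIN { printf "%.1f\n", r * s / 8 }'
}

estimate_recording_mb medium 60   # prints 37.5
estimate_recording_mb high 120    # prints 150.0
```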

## Display Information

```bash
macpilot display-info --json
# Returns: all displays with resolution, position, scale factor
```

## Workflow Patterns

### Capture and OCR in One Flow
```bash
# Take screenshot of specific region
macpilot screenshot --region 0,0,1920,1080 ~/tmp/capture.png --json
# Extract text from it
macpilot ocr scan ~/tmp/capture.png --json
```

### Quick Screen Region OCR
```bash
# Directly OCR a screen region without saving
macpilot ocr scan 200 100 600 400 --json
```

### Find and Click Text (No Coordinate Math)
```bash
# Instead of screenshot > OCR > parse > click, just:
macpilot ocr click "Submit" --json
macpilot ocr click "Next" --timeout 5 --json   # Wait up to 5s for text to appear
```

### Verify UI State
```bash
# Screenshot a window to see its current state
macpilot screenshot --window "Safari" ~/tmp/safari.png --json
# Read the image to verify content
macpilot ocr scan ~/tmp/safari.png --json
```

### Record an Automation
```bash
macpilot screen record start --output ~/Desktop/demo.mov
macpilot app open Safari
macpilot wait seconds 2
macpilot keyboard key cmd+l
macpilot keyboard type "https://example.com"
macpilot keyboard key enter
macpilot wait seconds 3
macpilot screen record stop
```

## Tips

- Screen Recording permission must be granted to MacPilot.app in System Settings
- PNG format is best for screenshots with text (lossless); JPEG for photos
- OCR works best on high-contrast text; increase screenshot region size if text is small
- Use `display-info` to get screen dimensions before capturing specific regions
- The coordinate system starts at top-left (0,0) with x increasing right and y increasing down
- On Retina displays, coordinates are in logical points (not physical pixels)
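
The distinction between logical points and physical pixels matters when comparing a region you requested with the dimensions of the saved image. As a minimal sketch, the conversion is a multiplication by the display's scale factor. The helper name `points_to_pixels` is hypothetical, and the scale factor is passed by hand because the field names in the `display-info` output are not documented here. Whether a saved image is actually written at pixel resolution is also an assumption worth checking on your machine.

```bash
# Hypothetical helper: convert logical points to physical pixels.
# Usage: points_to_pixels POINTS SCALE_FACTOR
points_to_pixels() {
  awk -v p="$1" -v s="$2" 'BEGIN { printf "%d\n", p * s }'
}

# An 800x600-point region on a display with scale factor 2:
points_to_pixels 800 2   # prints 1600
points_to_pixels 600 2   # prints 1200
```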

Related Skills

web-screenshot

3891

from openclaw/skills

Capture screenshots of web pages running on local or remote servers using Puppeteer in headless Chromium. Use when user asks to screenshot web pages, capture web UI, take website screenshots, or document web application interfaces. Supports login-required SPAs (Vue/React/Angular) by performing form-based authentication before navigating. Generates screenshots and an optional result.json with per-page descriptions.

macpilot-window-manager

3891

from openclaw/skills

Manage macOS windows with MacPilot. List, move, resize, snap, minimize, fullscreen, and arrange application windows. Supports multi-display and Spaces.

macpilot-ui-inspector

3891

from openclaw/skills

Inspect and interact with macOS UI elements using MacPilot accessibility APIs. Find buttons, text fields, labels, and other elements by role, label, or position, then click, read, or modify them.

macpilot-dialog-handler

3891

from openclaw/skills

Handle macOS file dialogs (Open, Save, Print) with MacPilot. Navigate folders, select files, set filenames, and dismiss dialogs programmatically in any application.

macpilot-automation

3891

from openclaw/skills

Core macOS automation skill using MacPilot CLI. Enables Claude Code to control apps, type text, click elements, run shell commands, and automate workflows on macOS via the `macpilot` command.

MacPilot Skills

3891

from openclaw/skills

Agent skills for [MacPilot](https://github.com/adhikjoshi/macpilot) — a CLI tool for macOS automation via Accessibility APIs.

screenshot-tool

3891

from openclaw/skills

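Web page and document screenshot tool. Supports full-page web screenshots and converts PPT/Word/Excel/PDF files to high-resolution images. Preserves original styling, with 300 DPI high-resolution output.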

screenshot-ux-auditor

3891

from openclaw/skills

Turn app screenshots into structured UX, copywriting, and conversion audits with issue severity and recommended fixes.

screenshot-to-task

3891

from openclaw/skills

Organize to-dos or ideas found in screenshots into tasks, notes, and priorities. Use for screenshots, tasks, and capture workflows; do not use for faking screenshot content or replacing an OCR system.

app-store-screenshots-generator

3817

from openclaw/skills

Generate production-ready App Store screenshots for iOS apps using AI agents, Next.js, and html-to-image

article-factory-wechat

3891

from openclaw/skills

Content & Documentation

humanizer

3891

from openclaw/skills

Remove signs of AI-generated writing from text. Use when editing or reviewing text to make it sound more natural and human-written. Based on Wikipedia's comprehensive "Signs of AI writing" guide. Detects and fixes patterns including: inflated symbolism, promotional language, superficial -ing analyses, vague attributions, em dash overuse, rule of three, AI vocabulary words, negative parallelisms, and excessive conjunctive phrases.

Content & Documentation
