macpilot-screenshot-ocr
Capture screenshots and extract text via OCR using MacPilot. Take full-screen, region, or window screenshots, and recognize text in images or screen areas with multi-language support.
Best use case
macpilot-screenshot-ocr is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using macpilot-screenshot-ocr should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/macpilot-screenshot-ocr/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
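The manual steps can be scripted. This is a minimal sketch that assumes SKILL.md has already been downloaded. `install_skill` is a hypothetical helper name, and the target path is the one given in the steps.

```bash
# Hypothetical helper: copy a downloaded SKILL.md into the project's
# .claude/skills directory, where the agent discovers skills.
install_skill() {
  local src project dest
  src="$1"            # path to the downloaded SKILL.md
  project="${2:-.}"   # project root (defaults to the current directory)
  dest="$project/.claude/skills/macpilot-screenshot-ocr"
  if [ ! -f "$src" ]; then
    echo "SKILL.md not found at: $src" >&2
    return 1
  fi
  # Create the skill directory if needed, then copy the file into place.
  mkdir -p "$dest" && cp "$src" "$dest/SKILL.md"
}
```

For example, `install_skill ~/Downloads/SKILL.md .` installs into the current project. Restart the agent afterwards.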
How macpilot-screenshot-ocr Compares
| Feature / Agent | macpilot-screenshot-ocr | Standard Approach |
|---|---|---|
| Platform Support | macOS only (requires MacPilot) | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Capture screenshots and extract text via OCR using MacPilot. Take full-screen, region, or window screenshots, and recognize text in images or screen areas with multi-language support.
Where can I find the source code?
The source code is hosted on GitHub and linked from the skill's listing page. The full SKILL.md is reproduced below.
Related Guides
AI Agents for Marketing
Discover AI agents for marketing workflows, from SEO and content production to campaign research, outreach, and analytics.
AI Agents for Startups
Explore AI agent skills for startup validation, product research, growth experiments, documentation, and fast execution with small teams.
AI Agents for Coding
Browse AI agent skills for coding, debugging, testing, refactoring, code review, and developer workflows across Claude, Cursor, and Codex.
SKILL.md Source
# MacPilot Screenshot & OCR

Use MacPilot to capture screenshots of the screen, specific regions, or application windows, and extract text from images or screen regions using Apple's built-in Vision OCR.

## When to Use

Use this skill when:

- You need to capture what's currently on screen
- You need to extract text from an image file
- You need to read text from a specific area of the screen
- You need to capture a specific app window
- You need to verify visual state of an application
- You need to capture screen recordings

## Screenshot Commands

### Full Screen

```bash
macpilot screenshot --json                        # Capture to temp file
macpilot screenshot ~/Desktop/screen.png --json   # Capture to specific path
macpilot screenshot --with-permissions --json     # Use CGWindowListCreateImage directly
```

### Specific Region

```bash
macpilot screenshot --region 100,200,800,600 --json
# Region format: x,y,width,height (from top-left corner)
```

### Specific Window

```bash
macpilot screenshot --window "Safari" --json   # Capture Safari window
macpilot screenshot --window "Finder" --json   # Capture Finder window
```

### All Windows

```bash
macpilot screenshot --all-windows --json   # Each window separately
```

### Specific Display

```bash
macpilot screenshot --display 1 --json   # Second display (0-indexed)
```

### Format Options

```bash
macpilot screenshot --format png ~/Desktop/shot.png   # PNG (default, lossless)
macpilot screenshot --format jpg ~/Desktop/shot.jpg   # JPEG (smaller files)
```

## OCR Commands

### Extract Text from Image File

```bash
macpilot ocr scan /path/to/image.png --json
macpilot ocr scan ~/Desktop/screenshot.png --json
```

### Extract Text from Screen Region

```bash
macpilot ocr scan 100 200 800 600 --json
# Arguments: x y width height (captures region then OCRs it)
```

### Multi-Language OCR

```bash
macpilot ocr scan image.png --language en-US --json     # English
macpilot ocr scan image.png --language ja --json        # Japanese
macpilot ocr scan image.png --language zh-Hans --json   # Simplified Chinese
macpilot ocr scan image.png --language de --json        # German
macpilot ocr scan image.png --language fr --json        # French
```

### OCR Click (Find and Click Text on Screen)

```bash
macpilot ocr click "Submit" --json                # Find text on screen and click it
macpilot ocr click "OK" --app Finder --json       # Click text in specific app
macpilot ocr click "Accept" --timeout 10 --json   # Retry until text appears (10s)
```

OCR click takes a screenshot, runs OCR, finds the matching text (case-insensitive), and clicks at its center coordinates. Use `--timeout` to poll and retry when waiting for text to appear.

## Screen Recording (ScreenCaptureKit)

### Start Recording

```bash
macpilot screen record start --output ~/Desktop/recording.mov --json
macpilot screen record start --output rec.mov --region 0,0,1920,1080 --json    # Region
macpilot screen record start --output rec.mov --window Safari --json           # Window
macpilot screen record start --output rec.mov --display 1 --json               # Display
macpilot screen record start --output rec.mov --audio --json                   # With audio
macpilot screen record start --output rec.mov --quality high --fps 60 --json   # Quality
```

### Control Recording

```bash
macpilot screen record stop --json     # Stop and save
macpilot screen record status --json   # Check if recording
macpilot screen record pause --json    # Pause recording
macpilot screen record resume --json   # Resume recording
```

Quality options: `low` (1 Mbps), `medium` (5 Mbps, default), `high` (10 Mbps). FPS default: 30.
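Two hedged shell sketches for the recording commands above. `record_while` (a hypothetical helper, not a MacPilot command) starts a recording, runs whatever command or shell function you pass, and always issues `macpilot screen record stop`, even when a step fails, so a failed automation does not leave a recording running. It does not handle Ctrl-C; add a `trap` if you need that. `estimate_recording_mb` (also hypothetical) turns the listed quality bitrates into a rough file size. It assumes the video stays at that bitrate and ignores audio and container overhead, so real files will differ.

```bash
# Hypothetical helper: start a recording, run the given command or shell
# function, then always stop the recording and return the command's status.
record_while() {
  local output status
  output="$1"
  shift
  macpilot screen record start --output "$output" --json || return 1
  "$@"
  status=$?
  # Runs whether or not the steps succeeded, so no recording is left open.
  macpilot screen record stop --json
  return "$status"
}

# Rough size estimate in megabytes from the documented quality bitrates.
# Assumes a constant video bitrate; ignores audio and container overhead.
estimate_recording_mb() {
  local quality mbps
  quality="$1"
  case "$quality" in
    low) mbps=1 ;;
    medium) mbps=5 ;;
    high) mbps=10 ;;
    *) echo "unknown quality: $quality" >&2; return 1 ;;
  esac
  # megabits per second * seconds / 8 bits per byte = megabytes
  awk -v r="$mbps" -v s="$2" 'BEGIN { printf "%.1f\n", r * s / 8 }'
}
```

For example, put the steps from "Record an Automation" below in a function such as `demo_steps` (hypothetical) and run `record_while ~/Desktop/demo.mov demo_steps`. Running `estimate_recording_mb medium 60` prints `37.5`, which is about 37.5 MB for one minute at the default quality.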
## Display Information

```bash
macpilot display-info --json
# Returns: all displays with resolution, position, scale factor
```

## Workflow Patterns

### Capture and OCR in One Flow

```bash
# Make sure the output directory exists
mkdir -p ~/tmp

# Take screenshot of specific region
macpilot screenshot --region 0,0,1920,1080 ~/tmp/capture.png --json

# Extract text from it
macpilot ocr scan ~/tmp/capture.png --json
```

### Quick Screen Region OCR

```bash
# Directly OCR a screen region without saving
macpilot ocr scan 200 100 600 400 --json
```

### Find and Click Text (No Coordinate Math)

```bash
# Instead of screenshot > OCR > parse > click, just:
macpilot ocr click "Submit" --json
macpilot ocr click "Next" --timeout 5 --json   # Wait up to 5s for text to appear
```

### Verify UI State

```bash
# Make sure the output directory exists
mkdir -p ~/tmp

# Screenshot a window to see its current state
macpilot screenshot --window "Safari" ~/tmp/safari.png --json

# OCR the image to verify its content
macpilot ocr scan ~/tmp/safari.png --json
```

### Record an Automation

```bash
macpilot screen record start --output ~/Desktop/demo.mov
macpilot app open Safari
macpilot wait seconds 2
macpilot keyboard key cmd+l
macpilot keyboard type "https://example.com"
macpilot keyboard key enter
macpilot wait seconds 3
macpilot screen record stop
```

## Tips

- Screen Recording permission must be granted to MacPilot.app in System Settings
- PNG format is best for screenshots with text (lossless); JPEG is better for photos
- OCR works best on high-contrast text; increase screenshot region size if text is small
- Use `display-info` to get screen dimensions before capturing specific regions
- The coordinate system starts at top-left (0,0) with x increasing right and y increasing down
- On Retina displays, coordinates are in logical points (not physical pixels)
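A few pure-shell helpers can make the coordinate tips concrete. This is a sketch under stated assumptions. `region_to_ocr_args`, `clamp_region`, and `px_to_points` are hypothetical names, not part of MacPilot. `clamp_region` assumes a single display with its origin at (0,0). You supply that display's size in logical points from `macpilot display-info --json`; the JSON field names are not documented here, so the numbers are passed in by hand. `px_to_points` assumes you measured a position in a screenshot captured at the display's full pixel resolution.

```bash
# Hypothetical helpers (not MacPilot commands) for the coordinate tips above.

# Convert the `screenshot --region` format "x,y,width,height" into the
# four space-separated arguments that `macpilot ocr scan` takes.
region_to_ocr_args() {
  local IFS=','
  # Split the comma-separated string into positional parameters.
  set -- $1
  if [ "$#" -ne 4 ]; then
    echo "expected x,y,width,height" >&2
    return 1
  fi
  printf '%s %s %s %s\n' "$1" "$2" "$3" "$4"
}

# Clamp a region (x y width height, logical points) to a display of the
# given width and height whose origin is assumed to be (0,0).
# Prints the clamped region in x,y,width,height form.
clamp_region() {
  local x y w h dw dh
  x=$1; y=$2; w=$3; h=$4; dw=$5; dh=$6
  # Trim any part that starts left of or above the origin.
  if [ "$x" -lt 0 ]; then w=$((w + x)); x=0; fi
  if [ "$y" -lt 0 ]; then h=$((h + y)); y=0; fi
  # Trim any part that runs past the right or bottom edge.
  if [ $((x + w)) -gt "$dw" ]; then w=$((dw - x)); fi
  if [ $((y + h)) -gt "$dh" ]; then h=$((dh - y)); fi
  if [ "$w" -le 0 ] || [ "$h" -le 0 ]; then
    echo "region lies outside the display" >&2
    return 1
  fi
  printf '%s,%s,%s,%s\n' "$x" "$y" "$w" "$h"
}

# Convert a physical-pixel value to logical points using the display's
# scale factor (e.g. 2 on a typical Retina display); truncates toward zero.
px_to_points() {
  awk -v p="$1" -v s="$2" 'BEGIN { printf "%d\n", p / s }'
}
```

For example, `macpilot ocr scan $(region_to_ocr_args 100,200,800,600) --json` reuses a `--region` string with `ocr scan`. The substitution is deliberately unquoted so it splits into four arguments. `clamp_region 1800 1000 400 200 1920 1080` prints `1800,1000,120,80` for a 1920x1080 display.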
Related Skills
web-screenshot
Capture screenshots of web pages running on local or remote servers using Puppeteer in headless Chromium. Use when user asks to screenshot web pages, capture web UI, take website screenshots, or document web application interfaces. Supports login-required SPAs (Vue/React/Angular) by performing form-based authentication before navigating. Generates screenshots and an optional result.json with per-page descriptions.
app-store-screenshots-generator
Generate production-ready App Store screenshots for iOS apps using AI agents, Next.js, and html-to-image
macpilot-window-manager
Manage macOS windows with MacPilot. List, move, resize, snap, minimize, fullscreen, and arrange application windows. Supports multi-display and Spaces.
macpilot-ui-inspector
Inspect and interact with macOS UI elements using MacPilot accessibility APIs. Find buttons, text fields, labels, and other elements by role, label, or position, then click, read, or modify them.
macpilot-dialog-handler
Handle macOS file dialogs (Open, Save, Print) with MacPilot. Navigate folders, select files, set filenames, and dismiss dialogs programmatically in any application.
macpilot-automation
Core macOS automation skill using MacPilot CLI. Enables Claude Code to control apps, type text, click elements, run shell commands, and automate workflows on macOS via the `macpilot` command.
MacPilot Skills
Agent skills for [MacPilot](https://github.com/adhikjoshi/macpilot) — a CLI tool for macOS automation via Accessibility APIs.
screenshot-tool
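Web page and document screenshot tool. Supports full-page web screenshots and converting PPT/Word/Excel/PDF files into high-resolution images, preserving the original styling with 300 DPI output.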
screenshot-ux-auditor
Turn app screenshots into structured UX, copywriting, and conversion audits with issue severity and recommended fixes.
screenshot-to-task
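Organizes to-dos or ideas captured in screenshots into tasks, notes, and priorities. Use for screenshots, tasks, and capture workflows; do not use for fabricating screenshot content or as a replacement for an OCR system.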
self-improvement
Captures learnings, errors, and corrections to enable continuous improvement. Use when: (1) A command or operation fails unexpectedly, (2) User corrects Claude ('No, that's wrong...', 'Actually...'), (3) User requests a capability that doesn't exist, (4) An external API or tool fails, (5) Claude realizes its knowledge is outdated or incorrect, (6) A better approach is discovered for a recurring task. Also review learnings before major tasks.