ocr-local
Extract text from images using Tesseract.js OCR (100% local, no API key required). Supports Chinese (simplified/traditional) and English.
About this skill
This AI agent skill provides robust Optical Character Recognition (OCR) directly on your local machine using the Tesseract.js library. It processes images to extract textual content, making it ideal for converting scanned documents, screenshots, or any image containing text into editable, searchable digital text. A key advantage is its 100% local operation: no API keys, no internet connection for processing, and none of the data-privacy concerns associated with cloud-based OCR services.

Users can specify the recognition language (simplified Chinese, traditional Chinese, or English) or combine languages for mixed-language documents. The skill downloads and caches language data on the first run, so subsequent operations are fast, and it can output results as plain text or structured JSON to suit different integration needs.

This skill is particularly valuable for developers, researchers, or anyone handling sensitive documents who needs a reliable, offline OCR solution. It lets AI agents automate tasks like data entry from images, text extraction for analysis, or converting legacy paper documents into digital formats, all while keeping control over the processing environment.
Best use case
The primary use case is converting images containing text into machine-readable digital text, especially for users who prioritize privacy, offline capability, and control over their data. This benefits developers building local automation tools, researchers processing historical documents, or any user needing to extract text from screenshots, scanned documents, or photos without relying on external APIs or cloud services.
The user should expect to receive the extracted text content from the specified image, optionally formatted as JSON, with high accuracy for clear, high-contrast images.
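For agents that invoke the skill programmatically, the call can be assembled from structured options. The sketch below is hypothetical (the helper and script location are illustrative, not part of the skill); only the flag names follow the documented usage:

```javascript
// Hypothetical helper: build the argv array for the bundled ocr.js script
// from structured options. Flag names (--lang, --json) match the SKILL.md
// usage; buildOcrArgs itself is illustrative, not part of the skill.
function buildOcrArgs(imagePath, { lang = "chi_sim+eng", json = false } = {}) {
  const args = ["scripts/ocr.js", imagePath, "--lang", lang];
  if (json) args.push("--json"); // request structured JSON output
  return args;
}

// Example: recognize a mixed-language report and ask for JSON output
console.log(buildOcrArgs("/home/user/documents/report.png", { json: true }));
// returns ["scripts/ocr.js", "/home/user/documents/report.png", "--lang", "chi_sim+eng", "--json"]
```

An agent would then pass this array to something like `child_process.spawn("node", args)` and read the extracted text from stdout.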
Practical example
Example input
Can you extract the text from the image located at `/home/user/documents/report.png`? It contains both English and Simplified Chinese text, so please use both languages for recognition.
Example output
Report Title: Project Alpha
日期: 2023年10月26日
Status: Completed
摘要: 这是一份关于项目阿尔法的报告。
When to use this skill
- You need to extract text from an image (screenshot, scanned document, photo).
- You require a 100% local OCR solution with no API keys or internet connection for processing.
- You are working with Chinese (simplified/traditional) or English text.
- Data privacy is a concern, and you prefer not to send images to cloud services.
When not to use this skill
- You need to process very high volumes of images where a highly optimized cloud service might be faster.
- The images contain heavily distorted, very low-resolution, or extremely complex handwritten text.
- You need OCR for languages other than Chinese (simplified/traditional) or English.
- You don't have Node.js installed or cannot install npm packages.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/ocr-local-v2/SKILL.md` inside your project
- Restart your AI agent — it will auto-discover the skill
How ocr-local Compares
| Feature / Agent | ocr-local | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Easy | N/A |
Frequently Asked Questions
What does this skill do?
Extract text from images using Tesseract.js OCR (100% local, no API key required). Supports Chinese (simplified/traditional) and English.
How difficult is it to install?
The installation complexity is rated as easy. You can find the installation instructions above.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
Related Guides
Best AI Skills for ChatGPT
Find the best AI skills to adapt into ChatGPT workflows for research, writing, summarization, planning, and repeatable assistant tasks.
Top AI Agents for Productivity
See the top AI agent skills for productivity, workflow automation, operational systems, documentation, and everyday task execution.
AI Agents for Freelancers
Browse AI agent skills for freelancers handling client research, proposals, outreach, delivery systems, documentation, and repeatable admin work.
SKILL.md Source
# OCR - Image Text Recognition (Local)
Extract text from images using Tesseract.js. **Runs 100% locally, no API key required.** Supports Chinese and English.
## Quick start
```bash
node {baseDir}/scripts/ocr.js /path/to/image.jpg
node {baseDir}/scripts/ocr.js /path/to/image.png --lang chi_sim
node {baseDir}/scripts/ocr.js /path/to/image.jpg --lang chi_tra+eng
```
## Options
- `--lang <langs>`: Language codes (default: chi_sim+eng)
- `chi_sim` - Simplified Chinese
- `chi_tra` - Traditional Chinese
- `eng` - English
- Combine with `+`: `chi_sim+eng`
- `--json`: Output as JSON instead of plain text
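The option handling above can be sketched as a small parser. This is a hypothetical reconstruction (the actual `ocr.js` source is not shown here); defaults match the documented behavior:

```javascript
// Hypothetical sketch of how ocr.js might parse its arguments.
// Defaults match the documented behavior: chi_sim+eng, plain-text output.
function parseOcrArgs(argv) {
  const opts = { image: null, lang: "chi_sim+eng", json: false };
  for (let i = 0; i < argv.length; i++) {
    if (argv[i] === "--lang") {
      opts.lang = argv[++i]; // e.g. "chi_tra+eng"
    } else if (argv[i] === "--json") {
      opts.json = true;
    } else if (!opts.image) {
      opts.image = argv[i]; // first positional argument is the image path
    }
  }
  return opts;
}

console.log(parseOcrArgs(["photo.jpg", "--lang", "eng", "--json"]));
// { image: 'photo.jpg', lang: 'eng', json: true }
```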
## Examples
```bash
# Recognize Chinese screenshot
node {baseDir}/scripts/ocr.js screenshot.png
# Recognize English document
node {baseDir}/scripts/ocr.js document.jpg --lang eng
# Mixed Chinese + English
node {baseDir}/scripts/ocr.js mixed.png --lang chi_sim+eng
```
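The `--json` flag switches between the two output modes. A minimal sketch of such a formatter is shown below; the JSON field names (`text`, `confidence`, `lang`) are assumptions for illustration, and the real script's schema may differ:

```javascript
// Hypothetical formatter for the two output modes. The JSON field names
// (text, confidence, lang) are illustrative; the real script's schema
// may differ.
function formatResult(result, asJson) {
  if (asJson) {
    return JSON.stringify(
      { text: result.text, confidence: result.confidence, lang: result.lang },
      null,
      2
    );
  }
  return result.text; // plain-text mode prints only the recognized text
}

const demo = { text: "Hello 世界", confidence: 93.4, lang: "chi_sim+eng" };
console.log(formatResult(demo, false)); // Hello 世界
```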
## Notes
- First run downloads language data (~20MB per language)
- Subsequent runs are cached locally
- Works best with clear, high-contrast images
- For handwritten text, accuracy may vary
Related Skills
find-skills
Helps users discover and install agent skills when they ask questions like "how do I do X", "find a skill for X", "is there a skill that can...", or express interest in extending capabilities. This skill should be used when the user is looking for functionality that might exist as an installable skill.
filesystem
Advanced filesystem operations for listing files, searching content, batch processing, and directory analysis. Supports recursive search, file type filtering, size analysis, and batch operations like copy/move/delete. Use when you need to: list directory contents, search for files by name or content, analyze directory structures, perform batch file operations, or analyze file sizes and distribution.
Budget & Expense Tracker — AI Agent Financial Command Center
Track every dollar, enforce budgets, spot spending patterns, and build wealth — all through natural conversation with your AI agent.
yt-dlp
A robust CLI wrapper for yt-dlp to download videos, playlists, and audio from YouTube and thousands of other sites. Supports format selection, quality control, metadata embedding, and cookie authentication.
time-checker
Check accurate current time, date, and timezone information for any location worldwide using time.is. Use when the user asks "what time is it in X", "current time in Y", or needs to verify timezone offsets.
pihole-ctl
Manage and monitor local Pi-hole instance. Query FTL database for statistics (blocked ads, top clients) and control service via CLI. Use when user asks "how many ads blocked", "pihole status", or "update gravity".
mermaid-architect
Generate beautiful, hand-drawn Mermaid diagrams with robust syntax (quoted labels, ELK layout). Use this skill when the user asks for "diagram", "flowchart", "sequence diagram", or "visualize this process".
memory-cache
High-performance temporary storage system using Redis. Supports namespaced keys (mema:*), TTL management, and session context caching. Use for: (1) Saving agent state, (2) Caching API results, (3) Sharing data between sub-agents.
mema
Mema's personal brain - SQLite metadata index for documents and Redis short-term context buffer. Use for organizing workspace knowledge paths and managing ephemeral session state.
file-organizer-skill
Organize files in directories by grouping them into folders based on their extensions or date. Includes Dry-Run, Recursive, and Undo capabilities.
media-compress
Compress and convert images and videos using ffmpeg. Use when the user wants to reduce file size, change format, resize, or optimize media files. Handles common formats like JPG, PNG, WebP, MP4, MOV, WebM. Triggers on phrases like "compress image", "compress video", "reduce file size", "convert to webp/mp4", "resize image", "make image smaller", "batch compress", "optimize media".
edge-tts
Text-to-speech conversion using node-edge-tts npm package for generating audio from text. Supports multiple voices, languages, speed adjustment, pitch control, and subtitle generation. Use when: (1) User requests audio/voice output with the "tts" trigger or keyword. (2) Content needs to be spoken rather than read (multitasking, accessibility, driving, cooking). (3) User wants a specific voice, speed, pitch, or format for TTS output.