Tesseract — Open-Source OCR Engine

You are an expert in Tesseract OCR, the most popular open-source optical character recognition engine. You help developers extract text from images, PDFs, and scanned documents using Tesseract's LSTM neural network engine, multi-language support (100+ languages), page segmentation modes, and integration with image preprocessing for maximum accuracy.

25 stars

Best use case

Tesseract — Open-Source OCR Engine is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

You are an expert in Tesseract OCR, the most popular open-source optical character recognition engine. You help developers extract text from images, PDFs, and scanned documents using Tesseract's LSTM neural network engine, multi-language support (100+ languages), page segmentation modes, and integration with image preprocessing for maximum accuracy.

Teams using Tesseract — Open-Source OCR Engine should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/tesseract/SKILL.md --create-dirs "https://raw.githubusercontent.com/ComeOnOliver/skillshub/main/skills/TerminalSkills/skills/tesseract/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/tesseract/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How Tesseract — Open-Source OCR Engine Compares

Feature / AgentTesseract — Open-Source OCR EngineStandard Approach
Platform SupportNot specifiedLimited / Varies
Context Awareness High Baseline
Installation ComplexityUnknownN/A

Frequently Asked Questions

What does this skill do?

You are an expert in Tesseract OCR, the most popular open-source optical character recognition engine. You help developers extract text from images, PDFs, and scanned documents using Tesseract's LSTM neural network engine, multi-language support (100+ languages), page segmentation modes, and integration with image preprocessing for maximum accuracy.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Tesseract — Open-Source OCR Engine

You are an expert in Tesseract OCR, the most popular open-source optical character recognition engine. You help developers extract text from images, PDFs, and scanned documents using Tesseract's LSTM neural network engine, multi-language support (100+ languages), page segmentation modes, and integration with image preprocessing for maximum accuracy.

## Core Capabilities

### Basic Usage

```python
# pip install pytesseract Pillow
import pytesseract
from PIL import Image
import cv2

# Simple text extraction
text = pytesseract.image_to_string(Image.open("document.png"))
print(text)

# With language specification
text_de = pytesseract.image_to_string(Image.open("german_doc.png"), lang="deu")

# Multiple languages
text_multi = pytesseract.image_to_string(Image.open("mixed.png"), lang="eng+fra+deu")

# Get bounding boxes for each word
data = pytesseract.image_to_data(Image.open("invoice.png"), output_type=pytesseract.Output.DICT)
for i, word in enumerate(data["text"]):
    if word.strip():
        x, y, w, h = data["left"][i], data["top"][i], data["width"][i], data["height"][i]
        conf = int(data["conf"][i])
        print(f"'{word}' at ({x},{y},{w},{h}) confidence: {conf}%")

# PDF to text
from pdf2image import convert_from_path
pages = convert_from_path("document.pdf", dpi=300)
full_text = ""
for page in pages:
    full_text += pytesseract.image_to_string(page) + "\n\n"
```

### Image Preprocessing for Better Accuracy

```python
import cv2
import numpy as np

def preprocess_for_ocr(image_path: str) -> np.ndarray:
    """Preprocess image for optimal OCR accuracy.

    Steps: grayscale → denoise → threshold → deskew → resize
    """
    img = cv2.imread(image_path)

    # Grayscale
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Denoise
    denoised = cv2.fastNlMeansDenoising(gray, h=10)

    # Adaptive threshold (handles uneven lighting)
    thresh = cv2.adaptiveThreshold(
        denoised, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
        cv2.THRESH_BINARY, 11, 2,
    )

    # Deskew (fix rotated scans)
    coords = np.column_stack(np.where(thresh > 0))
    angle = cv2.minAreaRect(coords)[-1]
    if angle < -45:
        angle = -(90 + angle)
    else:
        angle = -angle

    (h, w) = thresh.shape
    center = (w // 2, h // 2)
    M = cv2.getRotationMatrix2D(center, angle, 1.0)
    rotated = cv2.warpAffine(thresh, M, (w, h), flags=cv2.INTER_CUBIC,
                              borderMode=cv2.BORDER_REPLICATE)

    # Scale up if too small (Tesseract works best at 300+ DPI)
    if w < 1000:
        scale = 2.0
        rotated = cv2.resize(rotated, None, fx=scale, fy=scale, interpolation=cv2.INTER_CUBIC)

    return rotated

# Use preprocessed image
processed = preprocess_for_ocr("scan.jpg")
text = pytesseract.image_to_string(processed, config="--psm 6")
```

### Page Segmentation Modes

```python
# PSM modes control how Tesseract analyzes page layout
# --psm 0: Orientation and script detection only
# --psm 1: Automatic with OSD
# --psm 3: Fully automatic (default)
# --psm 4: Assume single column
# --psm 6: Assume single uniform block of text
# --psm 7: Treat image as single text line
# --psm 8: Treat image as single word
# --psm 11: Sparse text, no order
# --psm 13: Raw line (no layout analysis)

# For receipts/invoices (structured single column)
text = pytesseract.image_to_string(img, config="--psm 4")

# For single line (serial numbers, license plates)
text = pytesseract.image_to_string(img, config="--psm 7")

# For scattered text (signs in a photo)
text = pytesseract.image_to_string(img, config="--psm 11")

# Whitelist specific characters
text = pytesseract.image_to_string(img, config="--psm 7 -c tessedit_char_whitelist=0123456789ABCDEF")
```

## Installation

```bash
# System package
brew install tesseract                     # macOS
apt install tesseract-ocr                 # Ubuntu/Debian
apt install tesseract-ocr-deu tesseract-ocr-fra  # Additional languages

# Python binding
pip install pytesseract Pillow

# Node.js
npm install tesseract.js                  # Pure JS (runs in browser too)
```

## Best Practices

1. **Preprocess images** — Grayscale → denoise → threshold → deskew before OCR; preprocessing improves accuracy 30-50%
2. **300 DPI minimum** — Tesseract works best at 300+ DPI; scale up small images before processing
3. **PSM selection** — Choose the right page segmentation mode; PSM 6 for documents, PSM 7 for single lines, PSM 11 for sparse
4. **Language data** — Install language-specific traineddata files; use `lang="eng+deu"` for multilingual documents
5. **Whitelist characters** — For known formats (serial numbers, dates), restrict character set for higher accuracy
6. **Confidence filtering** — Use `image_to_data` to get per-word confidence; filter out low-confidence results
7. **Tesseract.js for browser** — Use the JavaScript version for client-side OCR; no server needed, runs in Web Workers
8. **LSTM engine** — Tesseract 4+ uses LSTM neural networks by default; much more accurate than Tesseract 3's pattern matching

Related Skills

gpu-resource-optimizer

25
from ComeOnOliver/skillshub

Gpu Resource Optimizer - Auto-activating skill for ML Deployment. Triggers on: gpu resource optimizer, gpu resource optimizer Part of the ML Deployment skill category.

engineering-features-for-machine-learning

25
from ComeOnOliver/skillshub

This skill empowers Claude to perform feature engineering tasks for machine learning. It creates, selects, and transforms features to improve model performance. Use this skill when the user requests feature creation, feature selection, feature transformation, or any request that involves improving the features used in a machine learning model. Trigger terms include "feature engineering", "feature selection", "feature transformation", "create features", "select features", "transform features", "improve model performance", and similar phrases related to feature manipulation.

feature-engineering-helper

25
from ComeOnOliver/skillshub

Feature Engineering Helper - Auto-activating skill for ML Training. Triggers on: feature engineering helper, feature engineering helper Part of the ML Training skill category.

conducting-chaos-engineering

25
from ComeOnOliver/skillshub

This skill enables Claude to design and execute chaos engineering experiments to test system resilience. It is used when the user requests help with failure injection, latency simulation, resource exhaustion testing, or resilience validation. The skill is triggered by discussions of chaos experiments (GameDays), failure injection strategies, resilience testing, and validation of recovery mechanisms like circuit breakers and retry logic. It leverages tools like Chaos Mesh, Gremlin, Toxiproxy, and AWS FIS to simulate real-world failures and assess system behavior.

adk-engineer

25
from ComeOnOliver/skillshub

Execute software engineer specializing in creating production-ready ADK agents with best practices, code structure, testing, and deployment automation. Use when asked to "build ADK agent", "create agent code", or "engineer ADK application". Trigger with relevant phrases based on skill purpose.

provider-resources

25
from ComeOnOliver/skillshub

Implement Terraform Provider resources and data sources using the Plugin Framework. Use when developing CRUD operations, schema design, state management, and acceptance testing for provider resources.

openapi-to-application-code

25
from ComeOnOliver/skillshub

Generate a complete, production-ready application from an OpenAPI specification

game-engine

25
from ComeOnOliver/skillshub

Expert skill for building web-based game engines and games using HTML5, Canvas, WebGL, and JavaScript. Use when asked to create games, build game engines, implement game physics, handle collision detection, set up game loops, manage sprites, add game controls, or work with 2D/3D rendering. Covers techniques for platformers, breakout-style games, maze games, tilemaps, audio, multiplayer via WebRTC, and publishing games.

azure-resource-health-diagnose

25
from ComeOnOliver/skillshub

Analyze Azure resource health, diagnose issues from logs and telemetry, and create a remediation plan for identified problems.

aspnet-minimal-api-openapi

25
from ComeOnOliver/skillshub

Create ASP.NET Minimal API endpoints with proper OpenAPI documentation

opencode-learn

25
from ComeOnOliver/skillshub

Extracts actionable knowledge from external sources and enhances existing skills using a 4-tier novelty framework. Use PROACTIVELY when a user says "/learn <source>", provides documentation URLs, code examples, or explicitly asks to extract patterns from a repository or marketplace.

OpenAI Whisper API (curl)

25
from ComeOnOliver/skillshub

Transcribe an audio file via OpenAI’s `/v1/audio/transcriptions` endpoint.