multiAI Summary Pending

ocrmypdf-optimize

OCRmyPDF optimization skill — compress PDFs, configure PDF/A output, JBIG2 encoding, and lossless optimization. Use when the user needs to reduce PDF file size, create archival PDF/A files, or optimize OCR output.

223 stars

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/ocrmypdf-optimize/SKILL.md --create-dirs "https://raw.githubusercontent.com/partme-ai/full-stack-skills/main/skills/ocrmypdf-skills/ocrmypdf-optimize/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/ocrmypdf-optimize/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How ocrmypdf-optimize Compares

Feature / Agentocrmypdf-optimizeStandard Approach
Platform SupportmultiLimited / Varies
Context Awareness High Baseline
Installation ComplexityUnknownN/A

Frequently Asked Questions

What does this skill do?

OCRmyPDF optimization skill — compress PDFs, configure PDF/A output, JBIG2 encoding, and lossless optimization. Use when the user needs to reduce PDF file size, create archival PDF/A files, or optimize OCR output.

Which AI agents support this skill?

This skill is compatible with multi.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# OCRmyPDF — Optimization Guide

## Overview

[OCRmyPDF](https://github.com/ocrmypdf/OCRmyPDF) provides extensive optimization options to reduce file size, create PDF/A archival documents, and configure output quality.

For core OCR functionality, see the **ocrmypdf** skill. For image processing (deskew, rotate, clean), see **ocrmypdf-image**. For batch/Docker/scripting, see **ocrmypdf-batch**.

## Compression Levels

```bash
# Level 0 — no optimization (fastest)
ocrmypdf --optimize 0 input.pdf output.pdf

# Level 1 — lossless (default)
ocrmypdf --optimize 1 input.pdf output.pdf

# Level 2 — lossy (aggressive)
ocrmypdf --optimize 2 input.pdf output.pdf

# Level 3 — lossless, aggressive JPEG recompression
ocrmypdf --optimize 3 input.pdf output.pdf
```

## PDF/A Output

PDF/A is an archival format with embedded fonts and colorspaces:

```bash
# PDF/A-1b (basic, default)
ocrmypdf --output-type pdfa input.pdf output.pdf

# PDF/A-2b (includes transparency)
ocrmypdf --output-type pdfa2b input.pdf output.pdf

# PDF/A-2u (Unicode)
ocrmypdf --output-type pdfa2u input.pdf output.pdf

# Standard PDF (no archival)
ocrmypdf --output-type pdf input.pdf output.pdf
```

## JBIG2 Encoding

JBIG2 provides excellent compression for monochrome (1-bit) images:

```bash
# Enable JBIG2 (requires jbig2enc)
ocrmypdf --jbig2-lossy input.pdf output.pdf  # Lossy

ocrmypdf --jbib2-lossless input.pdf output.pdf  # Lossless (v17+)
```

**Requirements**:

```bash
# Debian/Ubuntu
apt install jbig2enc

# macOS
brew install jbig2enc
```

## PNG Optimization

Optimize embedded PNG images:

```bash
# Use pngquant for lossy compression
ocrmypdf --png-lossy input.pdf output.pdf

# Lossless PNG optimization
ocrmypdf --png-lossless input.pdf output.pdf
```

## Ghostscript Options

Fine-tune PDF processing with Ghostscript:

```bash
# Set PDF minor version
ocrmypdf --pdf-renderer hatch input.pdf output.pdf

# Use pdfimages for better image extraction
ocrmypdf --pdf-renderer img2pdf input.pdf output.pdf
```

## Sidecar Text

Generate text file alongside PDF without modifying PDF:

```bash
# Generate sidecar only
ocrmypdf --output-type none --sidecar text.txt input.pdf output.pdf

# Typical sidecar workflow
ocrmypdf --sidecar text.txt --force-ocr input.pdf output.pdf
```

## Combined Recipes

### Maximum compression

```bash
ocrmypdf --optimize 3 --jbig2-lossy --png-lossy input.pdf small.pdf
```

### Archival PDF/A with compression

```bash
ocrmypdf --output-type pdfa --optimize 2 input.pdf archival.pdf
```

### Lossless output

```bash
ocrmypdf --output-type pdf --optimize 1 --png-lossless input.pdf lossless.pdf
```

## Quick Reference

| Task | Command |
|------|---------|
| No optimization | `--optimize 0` |
| Lossless default | `--optimize 1` |
| Aggressive lossy | `--optimize 2` |
| Max quality | `--optimize 3` |
| PDF/A-1b (default) | `--output-type pdfa` |
| PDF/A-2b | `--output-type pdfa2b` |
| JBIG2 lossy | `--jbig2-lossy` |
| PNG lossy | `--png-lossy` |
| Sidecar text | `--sidecar text.txt` |

## Troubleshooting

- **Large file size**: Try `--optimize 2` or `--png-lossy`.
- **PDF/A validation fails**: Use `--output-type pdfa2b` for better compatibility.
- **Font issues**: PDF/A-2u ensures full Unicode support.