latex-ocr-guide
Extract and convert mathematical formulas from images and PDFs to LaTeX code
Best use case
latex-ocr-guide is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using latex-ocr-guide can expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/latex-ocr-guide/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
How latex-ocr-guide Compares
| Feature / Agent | latex-ocr-guide | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
SKILL.md Source
# LaTeX OCR Guide
A skill for extracting mathematical formulas from images, PDFs, and handwritten notes and converting them to LaTeX code. Covers tool selection, batch processing workflows, and quality verification techniques.
## Tool Landscape
### Available Math OCR Tools
| Tool | Type | Accuracy | Best For | License |
|------|------|----------|----------|---------|
| Mathpix | Cloud API | Very high | All math, diagrams | Commercial ($) |
| LaTeX-OCR / pix2tex (Lukas Blecher) | Local model | High | Printed formulas, single equations | MIT |
| Nougat (Meta) | Local model | High | Full papers with math | MIT (code), CC-BY-NC (model weights) |
| InftyReader | Desktop | High | Printed math, Japanese | Commercial |
| img2latex | Local model | Moderate | Simple equations | MIT |
### Quick Start with LaTeX-OCR
```bash
# Install the open-source LaTeX-OCR package
pip install "pix2tex[gui]"
# Or install from GitHub for latest version
pip install git+https://github.com/lukas-blecher/LaTeX-OCR.git
```
```python
from pix2tex.cli import LatexOCR
from PIL import Image


def recognize_formula(image_path: str) -> str:
    """
    Convert a formula image to LaTeX code.

    Args:
        image_path: Path to image containing a mathematical formula
    Returns:
        LaTeX string representation of the formula
    """
    # Loading the model is slow (weights are downloaded on first use),
    # so reuse one LatexOCR instance when processing many images.
    model = LatexOCR()
    img = Image.open(image_path)
    latex_code = model(img)
    return latex_code


# Single image
result = recognize_formula('formula.png')
print(result)
# Example output for an image of Einstein's formula: E = mc^{2}
```
## Batch Processing Workflow
### Processing Multiple Formulas from a PDF
```python
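import io
import os

import fitz  # PyMuPDF
from PIL import Image
from pix2tex.cli import LatexOCR


def extract_formulas_from_pdf(pdf_path: str, output_dir: str,
                              min_height: int = 30) -> list[dict]:
    """
    Extract embedded images from a PDF and convert them to LaTeX.

    Only raster images embedded in the PDF are examined; formulas typeset
    as text or vector graphics are not found by page.get_images().

    Args:
        pdf_path: Path to the PDF file
        output_dir: Directory to save extracted formula images
        min_height: Minimum height (px) to consider as formula region
    """
    os.makedirs(output_dir, exist_ok=True)
    doc = fitz.open(pdf_path)
    model = LatexOCR()
    results = []
    for page_num in range(len(doc)):
        page = doc[page_num]
        # Extract images from page
        image_list = page.get_images(full=True)
        for img_idx, img_info in enumerate(image_list):
            xref = img_info[0]
            pix = fitz.Pixmap(doc, xref)
            if pix.height < min_height:
                continue
            # PNG output needs grayscale or RGB, so convert CMYK and similar
            if pix.n - pix.alpha >= 4:
                pix = fitz.Pixmap(fitz.csRGB, pix)
            img_data = pix.tobytes("png")
            # Save the extracted image so the result can be checked by eye
            img_path = os.path.join(
                output_dir, f"page{page_num + 1}_img{img_idx}.png")
            with open(img_path, "wb") as f:
                f.write(img_data)
            img = Image.open(io.BytesIO(img_data))
            try:
                latex = model(img)
                results.append({
                    'page': page_num + 1,
                    'image_index': img_idx,
                    'latex': latex,
                    # Crude length heuristic, not a model confidence score
                    'confidence': 'high' if len(latex) > 3 else 'low'
                })
            except Exception as e:
                results.append({
                    'page': page_num + 1,
                    'image_index': img_idx,
                    'latex': None,
                    'error': str(e)
                })
    doc.close()
    return results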
```
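The result list is easier to proofread when every formula is typeset next to its page reference. The following is a minimal sketch of that idea, not part of the original workflow: `results_to_review_tex` is a hypothetical helper name, and it assumes the dictionaries have the `page`, `image_index`, and `latex` keys produced by `extract_formulas_from_pdf` above.

```python
def results_to_review_tex(results: list[dict]) -> str:
    """Build a small LaTeX document listing each recognized formula.

    Assumes each item has 'page', 'image_index', and 'latex' keys,
    with 'latex' set to None when recognition failed.
    """
    lines = [
        r"\documentclass{article}",
        r"\usepackage{amsmath}",
        r"\begin{document}",
    ]
    for item in results:
        label = f"Page {item['page']}, image {item['image_index']}"
        if item.get("latex"):
            lines.append(rf"\paragraph{{{label}}}")
            lines.append(r"\[")
            lines.append(item["latex"])
            lines.append(r"\]")
        else:
            # Keep failures visible so they can be retyped by hand.
            lines.append(rf"\paragraph{{{label}}} recognition failed.")
    lines.append(r"\end{document}")
    return "\n".join(lines) + "\n"


# Usage with hand-written sample data (not real OCR output)
sample = [
    {"page": 1, "image_index": 0, "latex": "E = mc^{2}", "confidence": "high"},
    {"page": 2, "image_index": 1, "latex": None, "error": "decode failed"},
]
print(results_to_review_tex(sample))
```

Compiling the generated file with any LaTeX engine gives a side-by-side check against the source PDF. A formula that fails to compile is itself a sign that the OCR output needs correction.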
### Processing Handwritten Notes
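For handwritten mathematics, preprocessing the image (denoising, binarization, stroke thickening, and cropping to the content) usually improves accuracy. The steps below assume dark ink on light paper: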
```python
import cv2
import numpy as np
from PIL import Image


def preprocess_handwritten(image_path: str) -> Image.Image:
    """
    Preprocess a handwritten formula image for better OCR accuracy.
    """
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    if img is None:
        raise FileNotFoundError(f"Could not read image: {image_path}")
    # 1. Denoise
    img = cv2.fastNlMeansDenoising(img, h=10)
    # 2. Adaptive thresholding for varying illumination
    #    (THRESH_BINARY leaves dark ink on a white background)
    img = cv2.adaptiveThreshold(
        img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
        cv2.THRESH_BINARY, 15, 8
    )
    # 3. Thicken strokes to connect broken ones. The ink is dark, so this
    #    means eroding the white background; cv2.dilate would thin the ink.
    kernel = np.ones((2, 2), np.uint8)
    img = cv2.erode(img, kernel, iterations=1)
    # 4. Crop to content with padding
    coords = cv2.findNonZero(255 - img)
    if coords is None:  # blank image: nothing to crop
        return Image.fromarray(img)
    x, y, w, h = cv2.boundingRect(coords)
    pad = 20
    img = img[max(0, y-pad):y+h+pad, max(0, x-pad):x+w+pad]
    return Image.fromarray(img)
```
## Using Mathpix API
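**Pricing note:** Mathpix is a paid service (starting at $5/month). For free, open-source alternatives that run locally, use pix2tex/LaTeX-OCR or Nougat (Meta). Both publish their code under the MIT license; Nougat's pretrained model weights are released under CC-BY-NC, which restricts commercial use.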
For production-quality results, the Mathpix API provides the highest accuracy:
```python
import base64

import requests


def mathpix_ocr(image_path: str, app_id: str, app_key: str) -> dict:
    """
    Use Mathpix API for high-accuracy math OCR.
    """
    with open(image_path, 'rb') as f:
        image_data = base64.b64encode(f.read()).decode()
    response = requests.post(
        'https://api.mathpix.com/v3/text',
        headers={
            'app_id': app_id,
            'app_key': app_key,
            'Content-type': 'application/json'
        },
        json={
            # The data URI assumes a PNG input; adjust the MIME type for JPEG
            'src': f'data:image/png;base64,{image_data}',
            'formats': ['latex_styled', 'text'],
            'data_options': {'include_asciimath': True}
        },
        timeout=30,  # avoid hanging indefinitely on network problems
    )
    # The parsed JSON is returned as-is; callers should check it for errors
    return response.json()
```
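The dictionary returned by `mathpix_ocr` still has to be unpacked before use. The sketch below shows one way to do that. It assumes the response carries the `latex_styled`, `text`, `confidence`, and `error` fields as described in the Mathpix `v3/text` documentation, so check the names against the current API reference. The helper name `extract_mathpix_latex` and the 0.8 review threshold are illustrative choices, not Mathpix recommendations.

```python
def extract_mathpix_latex(response: dict,
                          min_confidence: float = 0.8) -> tuple[str, bool]:
    """Return (latex, needs_review) from a parsed Mathpix response.

    Field names are assumed from the v3/text documentation. A missing
    confidence value is treated as needing manual review.
    """
    if "error" in response:
        raise ValueError(f"Mathpix error: {response['error']}")
    # Prefer the styled LaTeX, fall back to the plain text field
    latex = response.get("latex_styled") or response.get("text")
    if not latex:
        raise ValueError("Response contains no LaTeX or text field")
    confidence = response.get("confidence")
    needs_review = confidence is None or confidence < min_confidence
    return latex, needs_review


# Usage with a hand-written sample response (not real API output)
latex, needs_review = extract_mathpix_latex(
    {"latex_styled": "x^{2}", "confidence": 0.99}
)
print(latex, needs_review)
```

Returning a review flag instead of discarding low-confidence results keeps every formula in the pipeline while marking the ones that deserve a closer look.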
## Verification and Correction
Always verify OCR output by rendering the LaTeX:
```python
import matplotlib.pyplot as plt


def verify_latex(latex_string: str, output_path: str = 'verify.png'):
    """Render LaTeX formula and save as image for visual verification."""
    # Matplotlib's built-in mathtext supports only a subset of LaTeX, so a
    # rendering error here does not always mean the OCR output is wrong.
    fig, ax = plt.subplots(figsize=(8, 2))
    ax.text(0.5, 0.5, f'${latex_string}$', fontsize=20,
            ha='center', va='center', transform=ax.transAxes)
    ax.axis('off')
    fig.savefig(output_path, dpi=150, bbox_inches='tight')
    plt.close(fig)  # close this figure explicitly to free memory
    print(f"Verification image saved to {output_path}")
```
Common OCR errors to watch for: confusing `l` with `1`, `O` with `0`, missing superscripts/subscripts, incorrect fraction nesting, and misrecognized Greek letters. Always proofread critical equations before submission.
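Some of these errors leave traces that can be caught mechanically before proofreading. The sketch below is a rough heuristic filter, not a LaTeX parser: `lint_latex` is a hypothetical helper, its checks are assumptions about what typical OCR mistakes look like, and a clean result does not mean the formula is correct.

```python
import re


def lint_latex(latex: str) -> list[str]:
    """Return heuristic warnings for an OCR-produced LaTeX string."""
    warnings = []

    # Brace balance, ignoring escaped braces (\{ and \})
    unescaped = re.sub(r"\\[{}]", "", latex)
    depth = 0
    for ch in unescaped:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                break
    if depth != 0:
        warnings.append("unbalanced braces")

    # \left and \right must pair up (\b keeps \leftarrow from matching)
    n_left = len(re.findall(r"\\left\b", latex))
    n_right = len(re.findall(r"\\right\b", latex))
    if n_left != n_right:
        warnings.append("mismatched left/right delimiters")

    # An empty ^{} or _{} often means a dropped superscript or subscript
    if re.search(r"[\^_]\{\s*\}", latex):
        warnings.append("empty superscript or subscript")

    # 'l' or 'O' touching a digit may be a misread 1 or 0;
    # command names such as \log are removed first
    no_commands = re.sub(r"\\[A-Za-z]+", " ", latex)
    if re.search(r"\d[lO]|[lO]\d", no_commands):
        warnings.append("possible l/1 or O/0 confusion")

    return warnings


for formula in ["E = mc^{2}", r"\frac{a}{b", "x^{} + 1O"]:
    print(formula, "->", lint_latex(formula))
```

Formulas that produce warnings can be routed to manual review first. Legitimate expressions such as `2l` will also trigger the confusion check, so treat each warning as a prompt to look, not as a verdict.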