pdf-math-translate-guide
Translate scientific PDFs with preserved math formatting via PDFMathTranslate
Best use case
pdf-math-translate-guide is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using pdf-math-translate-guide should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/pdf-math-translate-guide/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
How pdf-math-translate-guide Compares
| Feature / Agent | pdf-math-translate-guide | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Translate scientific PDFs with preserved math formatting via PDFMathTranslate
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# PDFMathTranslate Guide
## Overview
PDFMathTranslate is an open-source tool designed specifically for translating scientific and technical PDF documents while preserving mathematical formulas, tables, figures, and the overall layout structure. Traditional PDF translators often mangle equations and destroy formatting, making translated papers difficult to read. PDFMathTranslate solves this problem by intelligently detecting and preserving mathematical content during the translation process.
The tool leverages large language models for high-quality translation while maintaining the integrity of LaTeX-rendered equations, chemical formulas, and complex table structures commonly found in academic publications. It supports translation between dozens of language pairs, making it invaluable for researchers who need to read papers published in languages outside their expertise.
PDFMathTranslate has gained significant traction in the academic community with over 32,000 GitHub stars, reflecting the widespread need for reliable scientific document translation that respects the specialized formatting requirements of research papers.
## Installation and Setup
Install PDFMathTranslate with pip in a Python environment (the upstream project lists Python 3.10 to 3.12 as supported):
```bash
pip install pdf2zh
```
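For GPU-accelerated processing, install the CUDA extra if your release provides one. Quote the argument so that shells such as zsh do not expand the brackets:
```bash
pip install "pdf2zh[cuda]"
```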
Configure your translation backend by setting the environment variables it needs, then select it with `-s`. The tool supports multiple providers:
```bash
# OpenAI-compatible APIs: replace the placeholders with your own values
export OPENAI_API_KEY="<your-api-key>"
export OPENAI_BASE_URL="<your-base-url>"
pdf2zh input.pdf -s openai

# Google Translate (free, no key required)
pdf2zh input.pdf -s google

# DeepL
export DEEPL_AUTH_KEY="<your-deepl-key>"
pdf2zh input.pdf -s deepl
```
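When the same script runs on machines with different credentials, the backend choice can be derived from the environment. The sketch below is illustrative only: `pick_service` and `build_command` are hypothetical helpers, not part of pdf2zh, and the priority order (OpenAI, then DeepL, then Google) is an assumption you may want to change.

```python
import os


def pick_service(env=None):
    """Return a value for pdf2zh's -s flag based on which credentials are set.

    Hypothetical helper: the priority order is an assumption, not pdf2zh behavior.
    """
    env = os.environ if env is None else env
    if env.get("OPENAI_API_KEY"):
        return "openai"
    if env.get("DEEPL_AUTH_KEY"):
        return "deepl"
    return "google"  # needs no key, so it serves as the fallback


def build_command(pdf_path, env=None):
    """Build the argument list for one pdf2zh invocation without running it."""
    return ["pdf2zh", pdf_path, "-s", pick_service(env)]


if __name__ == "__main__":
    # Print the command that would be run for the current environment.
    print(" ".join(build_command("input.pdf")))
```

Returning an argument list rather than a shell string keeps the command safe to pass to `subprocess.run` without quoting file names by hand.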
You can also launch the web-based GUI for interactive use:
```bash
pdf2zh -i
```
This starts a Gradio interface where you can upload PDFs and configure translation settings through a browser.
## Core Features
PDFMathTranslate provides several capabilities critical for academic workflows:
**Formula Preservation**: The tool detects and preserves inline and display math environments. Equations rendered in LaTeX, MathML, or image-based formats are identified and left untouched during translation, ensuring mathematical accuracy is maintained.
**Layout Retention**: The translated output maintains the original page layout including headers, footers, figure positions, table structures, and column formatting. This produces a readable document that mirrors the source paper structure.
**Batch Processing**: Translate multiple papers at once using wildcard patterns or directory inputs:
```bash
# Translate all PDFs in a directory
pdf2zh ./papers/*.pdf -lo zh -li en
# Specify output directory
pdf2zh input.pdf -o ./translated/
```
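For larger batches it can help to generate one command per file so that a single failure does not stop the whole run. The following is a minimal sketch, assuming the `-li`, `-lo`, and `-o` flags shown above; `batch_commands` is a hypothetical helper name, and the directory paths are placeholders.

```python
from pathlib import Path


def batch_commands(paper_dir, out_dir, lang_in="en", lang_out="zh"):
    """Return one pdf2zh argument list per PDF found directly in paper_dir."""
    pdfs = sorted(Path(paper_dir).glob("*.pdf"))  # sorted for a stable order
    return [
        ["pdf2zh", str(pdf), "-li", lang_in, "-lo", lang_out, "-o", str(out_dir)]
        for pdf in pdfs
    ]


if __name__ == "__main__":
    # Only prints the commands; nothing is executed here.
    for cmd in batch_commands("./papers", "./translated"):
        print(" ".join(cmd))
```

Each list can then be passed to `subprocess.run(cmd, check=False)` and the return code inspected, so one unreadable PDF is logged rather than aborting the batch.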
**Dual-Document Output**: Each translation generates two files: a fully translated version and a side-by-side bilingual version where original and translated text appear together, useful for verifying translation quality.
**Selective Translation**: Target specific page ranges to avoid translating references, appendices, or supplementary material:
```bash
# Translate only pages 1 through 15
pdf2zh input.pdf -p 1-15
```
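If the pages worth translating are not one contiguous range, a page specification can be built from a list of page numbers. This sketch assumes that `-p` also accepts comma-separated ranges such as `1-3,5`; confirm that against `pdf2zh --help` for your release. `page_spec` is a hypothetical helper.

```python
def page_spec(pages):
    """Collapse 1-based page numbers into a spec string such as '1-3,5'."""
    ordered = sorted(set(pages))  # drop duplicates and sort ascending
    if not ordered:
        return ""
    runs = []
    start = prev = ordered[0]
    for page in ordered[1:]:
        if page == prev + 1:
            prev = page  # extend the current consecutive run
            continue
        runs.append((start, prev))  # close the run and start a new one
        start = prev = page
    runs.append((start, prev))
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in runs)


if __name__ == "__main__":
    print(page_spec([1, 2, 3, 5]))
```

The resulting string can be passed as the value of `-p`, for example to skip a block of reference pages in the middle of a proceedings volume.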
## Research Workflow Integration
PDFMathTranslate fits naturally into several academic research scenarios:
**Literature Review Acceleration**: When conducting systematic reviews across multilingual sources, batch-translate candidate papers to quickly assess relevance before investing time in detailed reading. The preserved formatting means figures and data tables remain interpretable.
**Collaboration Across Languages**: Research teams spanning multiple countries can share translated versions of key papers while maintaining the mathematical rigor of the original. The bilingual output mode is particularly useful for group discussions where team members have different language proficiencies.
**Conference Preparation**: When presenting at international conferences, translate your own papers or related works to prepare for questions and discussions in the host language.
**Integration with Reference Managers**: Combine with Zotero or Mendeley workflows by translating downloaded papers and storing both versions in your reference library:
```bash
# Example workflow with a Zotero storage directory
mkdir -p ~/translated_papers
for pdf in ~/Zotero/storage/*/*.pdf; do
  pdf2zh "$pdf" -lo zh -li en -o ~/translated_papers/
done
```
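Re-running the loop above translates every paper again, which wastes time and API quota. A minimal sketch of skipping papers that already have output follows. It assumes translated files are named `<stem>-mono.pdf` in the output directory, as described in the upstream README; verify the naming for your release. `pending_pdfs` is a hypothetical helper.

```python
from pathlib import Path


def pending_pdfs(storage_dir, out_dir):
    """List PDFs under storage_dir/*/ that have no translated copy in out_dir.

    Assumes output files are named '<stem>-mono.pdf' (an assumption to verify).
    """
    out = Path(out_dir)
    pending = []
    for pdf in sorted(Path(storage_dir).glob("*/*.pdf")):
        mono = out / f"{pdf.stem}-mono.pdf"
        if not mono.exists():
            pending.append(pdf)
    return pending


if __name__ == "__main__":
    # Print the papers that still need translation; nothing is executed.
    storage = Path.home() / "Zotero" / "storage"
    for pdf in pending_pdfs(storage, Path.home() / "translated_papers"):
        print(pdf)
```

Matching on the file stem means two papers with the same file name in different Zotero folders are treated as one, so rename collisions before relying on this check.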
**Quality Verification**: Always use the bilingual output mode when translation accuracy is critical. Cross-reference translated equations with the original to catch any edge cases where formula detection may have been imprecise.
## Configuration and Customization
Fine-tune translation behavior through configuration options:
```bash
# Use a specific model for translation (service:model syntax)
pdf2zh input.pdf -s openai:gpt-4o
# Set custom font for translated text
pdf2zh input.pdf --font "Noto Sans CJK SC"
# Adjust thread count for parallel processing
pdf2zh input.pdf -t 4
```
For programmatic use, PDFMathTranslate can be integrated into Python scripts:
```python
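from pdf2zh import translate

# Basic translation: translate is imported as a function and called directly.
# Keyword arguments follow the upstream README and may differ between releases.
translate(
    files=["paper.pdf"],
    lang_out="zh",
    lang_in="en",
    service="google",
    thread=4,  # worker threads, equivalent to -t 4 on the command line
)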
```
## References
- PDFMathTranslate repository: https://github.com/Byaidu/PDFMathTranslate
- PyPI package: https://pypi.org/project/pdf2zh/
- Supported translation services documentation in the project wiki