pdf-process-mineru
PDF document parsing tool based on local MinerU, supports converting PDF to Markdown, JSON, and other machine-readable formats.
Best use case
pdf-process-mineru is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using pdf-process-mineru should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/pdf-parser-mineru/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
How pdf-process-mineru Compares
| Feature / Agent | pdf-process-mineru | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
SKILL.md Source
## Tool List
### 1. pdf_to_markdown
Convert PDF documents to Markdown format, preserving document structure, formulas, tables, and images.
**Description**: Use MinerU to parse PDF documents and output in Markdown format, supporting OCR, formula recognition, table extraction, and other features.
**Parameters**:
- `file_path` (string, required): Absolute path to the PDF file
- `output_dir` (string, required): Absolute path to the output directory
- `backend` (string, optional): Parsing backend, options: `hybrid-auto-engine` (default), `pipeline`, `vlm-auto-engine`
- `language` (string, optional): OCR language code, such as `en` (English), `ch` (Chinese), `ja` (Japanese), etc., defaults to auto-detection
- `enable_formula` (boolean, optional): Whether to enable formula recognition, defaults to true
- `enable_table` (boolean, optional): Whether to enable table extraction, defaults to true
- `start_page` (integer, optional): Start page number (starting from 0), defaults to 0
- `end_page` (integer, optional): End page number (starting from 0), defaults to -1 meaning parse all pages
**Return Value**:
```json
{
"success": true,
"output_path": "/path/to/output",
"markdown_content": "Converted Markdown content...",
"images": ["List of image paths"],
"tables": ["List of table information"],
"formula_count": 10
}
```
**Examples**:
```bash
# Basic conversion with default settings
python .claude/skills/pdf-process/script/pdf_parser.py \
'{"name": "pdf_to_markdown", "arguments": {"file_path": "/path/to/document.pdf", "output_dir": "/path/to/output"}}'
# Use specific backend
python .claude/skills/pdf-process/script/pdf_parser.py \
'{"name": "pdf_to_markdown", "arguments": {"file_path": "/path/to/document.pdf", "output_dir": "/path/to/output", "backend": "pipeline"}}'
# Parse specific pages
python .claude/skills/pdf-process/script/pdf_parser.py \
'{"name": "pdf_to_markdown", "arguments": {"file_path": "/path/to/document.pdf", "output_dir": "/path/to/output", "start_page": 0, "end_page": 5}}'
```
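When an agent or script needs to assemble these calls programmatically, the JSON argument can be built from Python instead of being typed by hand. The sketch below assumes only what the examples above show: the script path and a single JSON argument with `name` and `arguments` keys. The helper name `build_command` is hypothetical, and the command is only constructed here, not executed.

```python
import json
import sys

# Script path copied from the examples above; adjust it to where the skill is installed.
SCRIPT = ".claude/skills/pdf-process/script/pdf_parser.py"


def build_command(tool, file_path, output_dir, **options):
    """Return the argv list for one tool call (hypothetical helper)."""
    arguments = {"file_path": file_path, "output_dir": output_dir}
    # Skip options left as None so the tool's documented defaults apply.
    arguments.update({k: v for k, v in options.items() if v is not None})
    payload = json.dumps({"name": tool, "arguments": arguments})
    return [sys.executable, SCRIPT, payload]


cmd = build_command(
    "pdf_to_markdown",
    "/path/to/document.pdf",
    "/path/to/output",
    backend="pipeline",
    start_page=0,
    end_page=5,
)
print(cmd[2])  # the JSON string passed as the script's single argument
```

The resulting list can be passed to `subprocess.run(cmd, capture_output=True, text=True)`. How the script reports its result (for example, as JSON on stdout) is not stated here, so check the script before parsing its output.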
---
### 2. pdf_to_json
Convert PDF documents to JSON format, including detailed layout and structural information.
**Description**: Use MinerU to parse PDF documents and output in JSON format, containing structured information such as text blocks, images, tables, formulas, etc.
**Parameters**:
- `file_path` (string, required): Absolute path to the PDF file
- `output_dir` (string, required): Absolute path to the output directory
- `backend` (string, optional): Parsing backend, options: `hybrid-auto-engine` (default), `pipeline`, `vlm-auto-engine`
- `language` (string, optional): OCR language code, such as `en` (English), `ch` (Chinese), `ja` (Japanese), etc., defaults to auto-detection
- `enable_formula` (boolean, optional): Whether to enable formula recognition, defaults to true
- `enable_table` (boolean, optional): Whether to enable table extraction, defaults to true
- `start_page` (integer, optional): Start page number (starting from 0), defaults to 0
- `end_page` (integer, optional): End page number (starting from 0), defaults to -1 meaning parse all pages
**Return Value**:
```json
{
"success": true,
"output_path": "/path/to/output.json",
"pages": [
{
"page_no": 0,
"page_size": [595, 842],
"blocks": [
{
"type": "text",
"text": "Text content",
"bbox": [x, y, x, y]
}
],
"images": [],
"tables": [],
"formulas": []
}
],
"metadata": {
"total_pages": 10,
"author": "Author",
"title": "Title"
}
}
```
**Examples**:
```bash
# Basic conversion with default settings
python .claude/skills/pdf-process/script/pdf_parser.py \
'{"name": "pdf_to_json", "arguments": {"file_path": "/path/to/document.pdf", "output_dir": "/path/to/output"}}'
# Use specific backend and language
python .claude/skills/pdf-process/script/pdf_parser.py \
'{"name": "pdf_to_json", "arguments": {"file_path": "/path/to/document.pdf", "output_dir": "/path/to/output", "backend": "hybrid-auto-engine", "language": "ch"}}'
```
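A consumer of the `pdf_to_json` result will usually walk `pages` and then `blocks`. The sketch below shows one way to pull the plain text out of a result, assuming it matches the return-value shape documented above. The function name `collect_text` is hypothetical, and the sample dictionary is hand-written for illustration, not real tool output.

```python
def collect_text(result):
    """Return (page_no, text) pairs built from the `text` blocks of each page."""
    if not result.get("success"):
        raise ValueError("parsing did not succeed")
    pages = []
    for page in result.get("pages", []):
        # Keep only blocks whose type is "text"; other block types are skipped.
        lines = [
            block["text"]
            for block in page.get("blocks", [])
            if block.get("type") == "text"
        ]
        pages.append((page["page_no"], "\n".join(lines)))
    return pages


# Hand-written sample in the documented shape (illustrative values only).
sample = {
    "success": True,
    "pages": [
        {
            "page_no": 0,
            "blocks": [
                {"type": "text", "text": "Text content", "bbox": [0, 0, 100, 20]},
            ],
        },
    ],
}
print(collect_text(sample))  # [(0, 'Text content')]
```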
---
## Installation Instructions
### 1. Install MinerU
```bash
# Update pip and install uv
pip install --upgrade pip
pip install uv
# Install MinerU (including all features)
uv pip install -U "mineru[all]"
```
### 2. Verify Installation
```bash
# Check if MinerU is installed successfully
mineru --version
# Test basic functionality
mineru --help
```
### 3. System Requirements
- **Python Version**: 3.10-3.13
- **Operating System**: Linux / Windows / macOS 14.0+
- **Memory**:
  - Using `pipeline` backend: minimum 16GB, recommended 32GB+
  - Using `hybrid/vlm` backend: minimum 16GB, recommended 32GB+
- **Disk Space**: minimum 20GB (SSD recommended)
- **GPU** (optional):
  - `pipeline` backend: supports CPU-only
  - `hybrid/vlm` backend: requires NVIDIA GPU (Volta architecture and above) or Apple Silicon
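The Python version requirement can be checked before installing anything. This is a minimal sketch, assuming the 3.10-3.13 range listed above together with the Windows restriction (3.10-3.12) mentioned under Troubleshooting. The function name `python_supported` is hypothetical and is not part of MinerU.

```python
import platform
import sys


def python_supported(version=None, system=None):
    """Check a (major, minor) version against the documented range."""
    version = tuple(version or sys.version_info[:2])
    system = system or platform.system()
    # 3.10-3.13 in general; 3.10-3.12 on Windows, per the troubleshooting notes.
    upper = (3, 12) if system == "Windows" else (3, 13)
    return (3, 10) <= version <= upper


# With no arguments, the running interpreter and operating system are checked.
print(python_supported())
```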
## Use Cases
1. **Academic Paper Parsing**: Extract structured content such as formulas, tables, and images
2. **Technical Document Conversion**: Convert PDF documents to Markdown for version control and online publishing
3. **OCR Processing**: Process scanned PDFs and garbled PDFs
4. **Multilingual Documents**: Supports OCR recognition for 109 languages
5. **Batch Processing**: Batch convert multiple PDF documents
## Backend Selection Recommendations
- **hybrid-auto-engine** (default): Balanced accuracy and speed, suitable for most scenarios
- **pipeline**: Suitable for CPU-only environments, best compatibility
- **vlm-auto-engine**: Highest accuracy, requires GPU acceleration
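These recommendations can be written down as a small decision function. The sketch below is one possible reading of the list above, not logic taken from the skill itself. `choose_backend` and its parameters are hypothetical, and `has_gpu` is taken to mean a GPU that meets the system requirements (NVIDIA Volta or newer, or Apple Silicon).

```python
def choose_backend(has_gpu, prefer_accuracy=False):
    """Pick a backend name following the recommendations above."""
    if not has_gpu:
        # CPU-only environments: the pipeline backend has the best compatibility.
        return "pipeline"
    if prefer_accuracy:
        # Highest accuracy, but needs GPU acceleration.
        return "vlm-auto-engine"
    # Default: balanced accuracy and speed.
    return "hybrid-auto-engine"


print(choose_backend(has_gpu=False))                       # pipeline
print(choose_backend(has_gpu=True))                        # hybrid-auto-engine
print(choose_backend(has_gpu=True, prefer_accuracy=True))  # vlm-auto-engine
```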
## Notes
1. **File Paths**: All paths must be absolute paths
2. **Output Directory**: Non-existent directories will be created automatically
3. **Performance**: Using GPU can significantly improve parsing speed
4. **Page Numbers**: Page numbers start counting from 0
5. **Memory**: Processing large documents may consume more memory
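Several of these notes can be checked before a call is made. The following sketch validates an `arguments` dictionary against the path and page-number rules above. `check_arguments` is a hypothetical helper, and the rule that `end_page` must not be smaller than `start_page` is an assumption, not something the notes state.

```python
import os


def check_arguments(arguments):
    """Return a list of problems; an empty list means the arguments look valid."""
    problems = []
    for key in ("file_path", "output_dir"):
        value = arguments.get(key)
        if not value:
            problems.append(f"{key} is required")
        elif not os.path.isabs(value):
            problems.append(f"{key} must be an absolute path")
    start = arguments.get("start_page", 0)
    end = arguments.get("end_page", -1)
    if start < 0:
        problems.append("start_page must be 0 or greater")
    # -1 means "parse all pages"; any other value is assumed to be >= start_page.
    if end != -1 and end < start:
        problems.append("end_page must be -1 or at least start_page")
    return problems


print(check_arguments({"file_path": "document.pdf", "output_dir": "/path/to/output"}))
```

The output directory is not checked for existence because, per the notes, missing directories are created automatically.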
## Troubleshooting
### Common Issues
1. **Installation Failure**:
   - Ensure you are using Python 3.10-3.13
   - On Windows, only Python 3.10-3.12 is supported (ray does not support 3.13)
   - Using `uv pip install` can resolve most dependency conflicts
2. **Insufficient Memory**:
   - Use `pipeline` backend
   - Limit parsing pages: `start_page` and `end_page`
   - Reduce virtual memory allocation
3. **Slow Parsing Speed**:
   - Enable GPU acceleration
   - Use `hybrid-auto-engine` backend
   - Disable unnecessary features (formulas, tables)
4. **Low OCR Accuracy**:
   - Specify the correct document language
   - Ensure the backend supports OCR (use `pipeline` or `hybrid-*`)
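For the insufficient-memory case, limiting pages with `start_page` and `end_page` can be done in fixed-size chunks so that a large document is parsed a few pages at a time. The sketch below only computes the page ranges. `page_ranges` is a hypothetical helper, the total page count has to come from elsewhere (for example `metadata.total_pages` in a `pdf_to_json` result), and treating `end_page` as inclusive is an assumption that this document does not confirm.

```python
def page_ranges(total_pages, chunk_size):
    """Yield (start_page, end_page) pairs covering pages 0..total_pages-1."""
    if total_pages < 0 or chunk_size < 1:
        raise ValueError("total_pages must be >= 0 and chunk_size >= 1")
    for start in range(0, total_pages, chunk_size):
        # ASSUMPTION: end_page is inclusive and 0-based, like start_page.
        yield start, min(start + chunk_size, total_pages) - 1


print(list(page_ranges(10, 4)))  # [(0, 3), (4, 7), (8, 9)]
```

Each pair can then be passed as `start_page` and `end_page` in a separate tool call, writing to a separate output directory if the outputs would otherwise overwrite each other.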
## Related Resources
- MinerU Official Documentation: https://opendatalab.github.io/MinerU/
- MinerU GitHub: https://github.com/opendatalab/MinerU
- Online Demo: https://mineru.net/