pdf-process-mineru

PDF document parsing tool based on local MinerU, supports converting PDF to Markdown, JSON, and other machine-readable formats.

3,891 stars

Best use case

pdf-process-mineru is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using pdf-process-mineru can expect more consistent output, faster repeated runs, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

curl -o ~/.claude/skills/pdf-parser-mineru/SKILL.md --create-dirs "https://raw.githubusercontent.com/openclaw/skills/main/skills/baokui/pdf-parser-mineru/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/pdf-parser-mineru/SKILL.md inside your project
  3. Restart your AI agent; it will auto-discover the skill

How pdf-process-mineru Compares

Feature | pdf-process-mineru | Standard Approach
Platform Support | Not specified | Limited / Varies
Context Awareness | High | Baseline
Installation Complexity | Unknown | N/A

Frequently Asked Questions

What does this skill do?

PDF document parsing tool based on local MinerU, supports converting PDF to Markdown, JSON, and other machine-readable formats.

Where can I find the source code?

The source is in the openclaw/skills repository on GitHub, under skills/baokui/pdf-parser-mineru/ (the same location the installation command downloads SKILL.md from).

SKILL.md Source

## Tool List

### 1. pdf_to_markdown

Convert PDF documents to Markdown format, preserving document structure, formulas, tables, and images.

**Description**: Uses MinerU to parse a PDF and write the result as Markdown, with support for OCR, formula recognition, and table extraction.

**Parameters**:
- `file_path` (string, required): Absolute path to the PDF file
- `output_dir` (string, required): Absolute path to the output directory
- `backend` (string, optional): Parsing backend, options: `hybrid-auto-engine` (default), `pipeline`, `vlm-auto-engine`
- `language` (string, optional): OCR language code, such as `en` (English), `ch` (Chinese), or `ja` (Japanese); defaults to auto-detection
- `enable_formula` (boolean, optional): Whether to enable formula recognition; defaults to true
- `enable_table` (boolean, optional): Whether to enable table extraction; defaults to true
- `start_page` (integer, optional): Start page (0-based); defaults to 0
- `end_page` (integer, optional): End page (0-based); defaults to -1, which parses through the end of the document

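Before calling the tool, it can help to catch bad option values locally. The following is a minimal sketch of a hypothetical helper (`validate_options` is not part of `pdf_parser.py`); it only encodes the backend names and the page-range rules listed above, including the `-1` "all pages" sentinel.

```python
# Hypothetical helper (not part of pdf_parser.py): checks the optional
# backend and page-range arguments documented above before a call is made.
ALLOWED_BACKENDS = {"hybrid-auto-engine", "pipeline", "vlm-auto-engine"}


def validate_options(arguments: dict) -> list[str]:
    """Return a list of problems; an empty list means the options look valid."""
    problems = []
    backend = arguments.get("backend", "hybrid-auto-engine")
    if backend not in ALLOWED_BACKENDS:
        problems.append(f"unknown backend: {backend}")
    start = arguments.get("start_page", 0)
    end = arguments.get("end_page", -1)
    if start < 0:
        problems.append("start_page must be >= 0")
    # -1 is the documented "all pages" sentinel; any other end must not precede start.
    if end != -1 and end < start:
        problems.append("end_page must be -1 or >= start_page")
    return problems


print(validate_options({"backend": "pipeline", "start_page": 0, "end_page": 5}))
```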
**Return Value**:
```json
{
  "success": true,
  "output_path": "/path/to/output",
  "markdown_content": "Converted Markdown content...",
  "images": ["List of image paths"],
  "tables": ["List of table information"],
  "formula_count": 10
}
```
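A caller usually checks `success` before using the other fields. This sketch assumes the script prints the object above as JSON text (the source shows only its shape, not how it is emitted); `summarize_markdown_result` is a hypothetical name.

```python
import json


def summarize_markdown_result(raw: str) -> str:
    """Parse a pdf_to_markdown result (assumed to be JSON text) and summarize it."""
    result = json.loads(raw)
    if not result.get("success"):
        raise RuntimeError(f"conversion failed: {result}")
    # Field names follow the return value documented above.
    return (
        f"{result['output_path']}: "
        f"{len(result.get('images', []))} images, "
        f"{len(result.get('tables', []))} tables, "
        f"{result.get('formula_count', 0)} formulas"
    )
```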

**Examples**:
```bash
# Basic conversion
python .claude/skills/pdf-process/script/pdf_parser.py \
  '{"name": "pdf_to_markdown", "arguments": {"file_path": "/path/to/document.pdf", "output_dir": "/path/to/output"}}'

# Use specific backend
python .claude/skills/pdf-process/script/pdf_parser.py \
  '{"name": "pdf_to_markdown", "arguments": {"file_path": "/path/to/document.pdf", "output_dir": "/path/to/output", "backend": "pipeline"}}'

# Parse specific pages
python .claude/skills/pdf-process/script/pdf_parser.py \
  '{"name": "pdf_to_markdown", "arguments": {"file_path": "/path/to/document.pdf", "output_dir": "/path/to/output", "start_page": 0, "end_page": 5}}'
```
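Hand-writing the single-quoted JSON breaks as soon as a path contains an apostrophe. A sketch of one way around this, assuming the script path used in the examples: build the payload with `json.dumps` and pass an argument list to `subprocess.run`, so no shell quoting is involved. `build_command` is a hypothetical helper.

```python
import json
import shlex

# Script path as used in the examples above.
SCRIPT = ".claude/skills/pdf-process/script/pdf_parser.py"


def build_command(tool: str, **arguments) -> list[str]:
    """Build an argv list for subprocess.run(); json.dumps produces the payload."""
    payload = json.dumps({"name": tool, "arguments": arguments})
    return ["python", SCRIPT, payload]


cmd = build_command(
    "pdf_to_markdown",
    file_path="/path/to/document.pdf",
    output_dir="/path/to/output",
    start_page=0,
    end_page=5,
)
# Shell-safe rendering of the same command, equivalent to the page-range example above.
print(shlex.join(cmd))
```

To actually run it, pass `cmd` to `subprocess.run(cmd, check=True)`; the list form never goes through a shell.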

---

### 2. pdf_to_json

Convert PDF documents to JSON format, including detailed layout and structural information.

**Description**: Uses MinerU to parse a PDF and write the result as JSON containing structured information such as text blocks, images, tables, and formulas.

**Parameters**:
- `file_path` (string, required): Absolute path to the PDF file
- `output_dir` (string, required): Absolute path to the output directory
- `backend` (string, optional): Parsing backend, options: `hybrid-auto-engine` (default), `pipeline`, `vlm-auto-engine`
- `language` (string, optional): OCR language code, such as `en` (English), `ch` (Chinese), or `ja` (Japanese); defaults to auto-detection
- `enable_formula` (boolean, optional): Whether to enable formula recognition; defaults to true
- `enable_table` (boolean, optional): Whether to enable table extraction; defaults to true
- `start_page` (integer, optional): Start page (0-based); defaults to 0
- `end_page` (integer, optional): End page (0-based); defaults to -1, which parses through the end of the document

**Return Value**:
```json
{
  "success": true,
  "output_path": "/path/to/output.json",
  "pages": [
    {
      "page_no": 0,
      "page_size": [595, 842],
      "blocks": [
        {
          "type": "text",
          "text": "Text content",
          "bbox": [x0, y0, x1, y1]
        }
      ],
      "images": [],
      "tables": [],
      "formulas": []
    }
  ],
  "metadata": {
    "total_pages": 10,
    "author": "Author",
    "title": "Title"
  }
}
```
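To recover plain text per page from this structure, the `blocks` lists can be filtered by `type`. This is a sketch against the shape shown above only. It keeps blocks in list order and does not assume that order is the reading order. `page_text` is a hypothetical helper.

```python
def page_text(result: dict) -> dict[int, str]:
    """Join the text blocks of each page, keeping the order of the "blocks" list."""
    pages = {}
    for page in result.get("pages", []):
        texts = [
            block["text"]
            for block in page.get("blocks", [])
            if block.get("type") == "text"
        ]
        pages[page["page_no"]] = "\n".join(texts)
    return pages
```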

**Examples**:
```bash
# Basic conversion
python .claude/skills/pdf-process/script/pdf_parser.py \
  '{"name": "pdf_to_json", "arguments": {"file_path": "/path/to/document.pdf", "output_dir": "/path/to/output"}}'

# Use specific backend and language
python .claude/skills/pdf-process/script/pdf_parser.py \
  '{"name": "pdf_to_json", "arguments": {"file_path": "/path/to/document.pdf", "output_dir": "/path/to/output", "backend": "hybrid-auto-engine", "language": "ch"}}'
```

---

## Installation Instructions

### 1. Install MinerU

```bash
# Update pip and install uv
pip install --upgrade pip
pip install uv

# Install MinerU (including all features)
uv pip install -U "mineru[all]"
```

### 2. Verify Installation

```bash
# Check if MinerU is installed successfully
mineru --version

# Test basic functionality
mineru --help
```

### 3. System Requirements

- **Python Version**: 3.10-3.13
- **Operating System**: Linux / Windows / macOS 14.0+
- **Memory**: minimum 16GB, recommended 32GB+ (same for the `pipeline` and `hybrid/vlm` backends)
- **Disk Space**: minimum 20GB (SSD recommended)
- **GPU** (optional):
  - `pipeline` backend: can run on CPU only
  - `hybrid/vlm` backend: requires an NVIDIA GPU (Volta architecture or newer) or Apple Silicon

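The Python range above, together with the Windows limit from the Troubleshooting section (3.10-3.12, because ray does not support 3.13), can be checked in a few lines. A sketch; `python_supported` is a hypothetical helper.

```python
import platform
import sys


def python_supported(version: tuple[int, int], system: str) -> bool:
    """Check a (major, minor) Python version against the supported range.

    Windows is capped at 3.12 per the Troubleshooting notes; other systems at 3.13.
    """
    upper = (3, 12) if system == "Windows" else (3, 13)
    return (3, 10) <= version <= upper


print(python_supported(sys.version_info[:2], platform.system()))
```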
## Use Cases

1. **Academic Paper Parsing**: Extract structured content such as formulas, tables, and images
2. **Technical Document Conversion**: Convert PDF documents to Markdown for version control and online publishing
3. **OCR Processing**: Process scanned PDFs and PDFs whose embedded text is garbled
4. **Multilingual Documents**: OCR supports 109 languages
5. **Batch Processing**: Batch convert multiple PDF documents

## Backend Selection Recommendations

- **hybrid-auto-engine** (default): Balanced accuracy and speed, suitable for most scenarios
- **pipeline**: Suitable for CPU-only environments, best compatibility
- **vlm-auto-engine**: Highest accuracy, requires GPU acceleration

## Notes

1. **File Paths**: All paths must be absolute
2. **Output Directory**: Non-existent directories are created automatically
3. **Performance**: A GPU can significantly speed up parsing
4. **Page Numbers**: Page numbers are 0-based
5. **Memory**: Memory use grows with document size; for large documents, limit the page range with `start_page` and `end_page`

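Since the tool requires absolute paths, relative paths from a user or a config file need resolving first. A sketch using `pathlib`. It also creates the output directory up front, which is harmless if the tool creates it as well. `prepare_paths` is a hypothetical helper.

```python
from pathlib import Path


def prepare_paths(file_path: str, output_dir: str) -> tuple[str, str]:
    """Resolve both paths to absolute form and make sure the output directory exists."""
    pdf = Path(file_path).expanduser().resolve()
    if not pdf.is_file():
        raise FileNotFoundError(pdf)
    out = Path(output_dir).expanduser().resolve()
    out.mkdir(parents=True, exist_ok=True)
    return str(pdf), str(out)
```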
## Troubleshooting

### Common Issues

1. **Installation Failure**:
   - Ensure using Python 3.10-3.13
   - Windows only supports Python 3.10-3.12 (ray does not support 3.13)
   - Using `uv pip install` can resolve most dependency conflicts

2. **Insufficient Memory**:
   - Use `pipeline` backend
   - Limit parsing pages: `start_page` and `end_page`
   - Reduce virtual memory allocation

3. **Slow Parsing Speed**:
   - Enable GPU acceleration
   - Use `hybrid-auto-engine` backend
   - Disable unnecessary features (formulas, tables)

4. **Low OCR Accuracy**:
   - Specify the correct document language
   - Ensure the backend supports OCR (use `pipeline` or `hybrid-*`)
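For the "Insufficient Memory" case, a long document can be split into page ranges and each range parsed in a separate call. This sketch assumes `end_page` is inclusive, which the source does not state; check against your results. The page count must come from elsewhere, for example any PDF library. Giving each chunk its own `output_dir` avoids relying on how the tool handles repeated writes to one folder. `page_chunks` is a hypothetical helper.

```python
def page_chunks(total_pages: int, chunk_size: int) -> list[tuple[int, int]]:
    """Split 0-based pages into (start_page, end_page) pairs, treating end_page as inclusive."""
    if total_pages < 1 or chunk_size < 1:
        raise ValueError("total_pages and chunk_size must be positive")
    return [
        (start, min(start + chunk_size, total_pages) - 1)
        for start in range(0, total_pages, chunk_size)
    ]


print(page_chunks(10, 4))
```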

## Related Resources

- MinerU Official Documentation: https://opendatalab.github.io/MinerU/
- MinerU GitHub: https://github.com/opendatalab/MinerU
- Online Demo: https://mineru.net/
