pdf-process-mineru

A PDF parsing tool built on a local MinerU installation. It converts PDFs to Markdown, JSON, and other machine-readable formats.

3,891 stars

by openclaw

Best use case

pdf-process-mineru is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using pdf-process-mineru should expect more consistent output, faster repeated runs, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/pdf-parser-mineru/SKILL.md --create-dirs "https://raw.githubusercontent.com/openclaw/skills/main/skills/baokui/pdf-parser-mineru/SKILL.md"

Manual Installation

1. Download SKILL.md from GitHub.
2. Place it in .claude/skills/pdf-parser-mineru/SKILL.md inside your project.
3. Restart your AI agent; it will auto-discover the skill.

How pdf-process-mineru Compares

| Feature / Agent | pdf-process-mineru | Standard Approach |
| --- | --- | --- |
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

It parses PDF documents with a local MinerU installation and converts them to Markdown, JSON, and other machine-readable formats.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

## Tool List

### 1. pdf_to_markdown

Convert PDF documents to Markdown format, preserving document structure, formulas, tables, and images.

**Description**: Uses MinerU to parse PDF documents and emit Markdown, with support for OCR, formula recognition, table extraction, and other features.

**Parameters**:
- `file_path` (string, required): Absolute path to the PDF file
- `output_dir` (string, required): Absolute path to the output directory
- `backend` (string, optional): Parsing backend, options: `hybrid-auto-engine` (default), `pipeline`, `vlm-auto-engine`
- `language` (string, optional): OCR language code, such as `en` (English), `ch` (Chinese), `ja` (Japanese), etc., defaults to auto-detection
- `enable_formula` (boolean, optional): Whether to enable formula recognition, defaults to true
- `enable_table` (boolean, optional): Whether to enable table extraction, defaults to true
- `start_page` (integer, optional): First page to parse (0-based), defaults to 0
- `end_page` (integer, optional): Last page to parse (0-based), defaults to -1, which means parse all pages

**Return Value**:
```json
{
  "success": true,
  "output_path": "/path/to/output",
  "markdown_content": "Converted Markdown content...",
  "images": ["List of image paths"],
  "tables": ["List of table information"],
  "formula_count": 10
}
```

**Examples**:
```bash
python .claude/skills/pdf-process/script/pdf_parser.py \
  '{"name": "pdf_to_markdown", "arguments": {"file_path": "/path/to/document.pdf", "output_dir": "/path/to/output"}}'

# Use specific backend
python .claude/skills/pdf-process/script/pdf_parser.py \
  '{"name": "pdf_to_markdown", "arguments": {"file_path": "/path/to/document.pdf", "output_dir": "/path/to/output", "backend": "pipeline"}}'

# Parse specific pages
python .claude/skills/pdf-process/script/pdf_parser.py \
  '{"name": "pdf_to_markdown", "arguments": {"file_path": "/path/to/document.pdf", "output_dir": "/path/to/output", "start_page": 0, "end_page": 5}}'
```
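
The invocations above pass one JSON string as the only argument, which is easy to mis-quote by hand. The following is a minimal Python sketch of building that argument with `json.dumps`. `build_tool_call`, `SCRIPT`, and `BACKENDS` are hypothetical names that are not part of the skill; the script path is copied from the examples above, and the checks only mirror the documented parameter rules.

```python
import json
import os

# Script path as written in the examples above; adjust it to your install location.
SCRIPT = ".claude/skills/pdf-process/script/pdf_parser.py"
# Backend names taken from the parameter list above.
BACKENDS = {"hybrid-auto-engine", "pipeline", "vlm-auto-engine"}


def build_tool_call(name, file_path, output_dir, **options):
    """Return an argv list for the parser script (hypothetical helper)."""
    # The skill documents both paths as absolute, so reject relative ones early.
    for label, path in (("file_path", file_path), ("output_dir", output_dir)):
        if not os.path.isabs(path):
            raise ValueError(f"{label} must be an absolute path: {path!r}")
    backend = options.get("backend")
    if backend is not None and backend not in BACKENDS:
        raise ValueError(f"unknown backend: {backend!r}")
    arguments = {"file_path": file_path, "output_dir": output_dir, **options}
    payload = json.dumps({"name": name, "arguments": arguments})
    return ["python", SCRIPT, payload]


argv = build_tool_call(
    "pdf_to_markdown",
    "/path/to/document.pdf",
    "/path/to/output",
    backend="pipeline",
    start_page=0,
    end_page=5,
)
print(argv[-1])
```

The returned list can be passed to `subprocess.run(argv, check=True)`, which avoids shell quoting entirely.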

---

### 2. pdf_to_json

Convert PDF documents to JSON format, including detailed layout and structural information.

**Description**: Uses MinerU to parse PDF documents and emit JSON containing structured information such as text blocks, images, tables, and formulas.

**Parameters**:
- `file_path` (string, required): Absolute path to the PDF file
- `output_dir` (string, required): Absolute path to the output directory
- `backend` (string, optional): Parsing backend, options: `hybrid-auto-engine` (default), `pipeline`, `vlm-auto-engine`
- `language` (string, optional): OCR language code, such as `en` (English), `ch` (Chinese), `ja` (Japanese), etc., defaults to auto-detection
- `enable_formula` (boolean, optional): Whether to enable formula recognition, defaults to true
- `enable_table` (boolean, optional): Whether to enable table extraction, defaults to true
- `start_page` (integer, optional): First page to parse (0-based), defaults to 0
- `end_page` (integer, optional): Last page to parse (0-based), defaults to -1, which means parse all pages

**Return Value**:
```json
{
  "success": true,
  "output_path": "/path/to/output.json",
  "pages": [
    {
      "page_no": 0,
      "page_size": [595, 842],
      "blocks": [
        {
          "type": "text",
          "text": "Text content",
          "bbox": [x, y, x, y]
        }
      ],
      "images": [],
      "tables": [],
      "formulas": []
    }
  ],
  "metadata": {
    "total_pages": 10,
    "author": "Author",
    "title": "Title"
  }
}
```

**Examples**:
```bash
python .claude/skills/pdf-process/script/pdf_parser.py \
  '{"name": "pdf_to_json", "arguments": {"file_path": "/path/to/document.pdf", "output_dir": "/path/to/output"}}'

# Use specific backend and language
python .claude/skills/pdf-process/script/pdf_parser.py \
  '{"name": "pdf_to_json", "arguments": {"file_path": "/path/to/document.pdf", "output_dir": "/path/to/output", "backend": "hybrid-auto-engine", "language": "ch"}}'
```
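
Once the `pdf_to_json` result is loaded into a Python dict, the page and block structure shown in the return value above can be walked directly. The sketch below gathers the text blocks for each page. `collect_text` is a hypothetical helper and the `sample` data is made up for illustration. The source does not say whether the script prints this JSON or only writes it to `output_path`, so the sketch starts from an already-loaded dict.

```python
def collect_text(result):
    """Return a list of (page_no, text) pairs built from "text" blocks."""
    if not result.get("success"):
        raise RuntimeError("pdf_to_json reported failure")
    pages = []
    for page in result.get("pages", []):
        lines = [
            block["text"]
            for block in page.get("blocks", [])
            if block.get("type") == "text"
        ]
        pages.append((page["page_no"], "\n".join(lines)))
    return pages


# Illustrative data shaped like the documented return value.
sample = {
    "success": True,
    "pages": [
        {
            "page_no": 0,
            "blocks": [
                {"type": "text", "text": "Text content", "bbox": [0, 0, 10, 10]},
                {"type": "text", "text": "More text", "bbox": [0, 12, 10, 22]},
            ],
        }
    ],
}
print(collect_text(sample))
```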

---

## Installation Instructions

### 1. Install MinerU

```bash
# Update pip and install uv
pip install --upgrade pip
pip install uv

# Install MinerU (including all features)
uv pip install -U "mineru[all]"
```

### 2. Verify Installation

```bash
# Check if MinerU is installed successfully
mineru --version

# Test basic functionality
mineru --help
```

### 3. System Requirements

- **Python Version**: 3.10-3.13 (3.10-3.12 on Windows)
- **Operating System**: Linux / Windows / macOS 14.0+
- **Memory**:
  - Using `pipeline` backend: minimum 16GB, recommended 32GB+
  - Using `hybrid/vlm` backend: minimum 16GB, recommended 32GB+
- **Disk Space**: minimum 20GB (SSD recommended)
- **GPU** (optional):
  - `pipeline` backend: supports CPU-only
  - `hybrid/vlm` backend: requires NVIDIA GPU (Volta architecture and above) or Apple Silicon
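
Some of these requirements can be checked before a long install or parse run. The sketch below covers only the Python version range, the Windows limitation documented in the Troubleshooting section, and whether a `mineru` executable is on `PATH`. `python_version_problems` is a hypothetical helper; memory, disk, and GPU checks are left out.

```python
import shutil
import sys


def python_version_problems(version, platform):
    """Return a list of version problems; an empty list means none found."""
    major_minor = tuple(version[:2])
    problems = []
    if not (3, 10) <= major_minor <= (3, 13):
        problems.append("Python 3.10-3.13 is required")
    elif platform.startswith("win") and major_minor > (3, 12):
        problems.append("Windows supports only Python 3.10-3.12")
    return problems


problems = python_version_problems(sys.version_info, sys.platform)
# shutil.which returns None when the executable is not on PATH.
if shutil.which("mineru") is None:
    problems.append("mineru executable not found on PATH")
print(problems or "environment looks OK")
```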

## Use Cases

1. **Academic Paper Parsing**: Extract structured content such as formulas, tables, and images
2. **Technical Document Conversion**: Convert PDF documents to Markdown for version control and online publishing
3. **OCR Processing**: Process scanned PDFs and PDFs with garbled text
4. **Multilingual Documents**: OCR support for 109 languages
5. **Batch Processing**: Batch convert multiple PDF documents
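
The tool list documents single-file calls only, so batch conversion presumably means one call per PDF. The sketch below generates one JSON payload per file, assuming each document gets its own output sub-directory. `batch_payloads` is a hypothetical helper, and the per-document directory layout is an assumption rather than something the skill prescribes.

```python
import json
from pathlib import Path


def batch_payloads(input_dir, output_root, tool="pdf_to_markdown"):
    """Yield one JSON payload per PDF found directly under input_dir."""
    # resolve() makes both paths absolute, as the skill requires.
    input_dir = Path(input_dir).resolve()
    output_root = Path(output_root).resolve()
    for pdf in sorted(input_dir.glob("*.pdf")):
        arguments = {
            "file_path": str(pdf),
            # One sub-directory per document keeps outputs from colliding.
            "output_dir": str(output_root / pdf.stem),
        }
        yield json.dumps({"name": tool, "arguments": arguments})


# Prints one payload per PDF in the current directory (possibly none).
for payload in batch_payloads(".", "output"):
    print(payload)
```

Each payload can then be passed as the single argument to the parser script, one run per document.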

## Backend Selection Recommendations

- **hybrid-auto-engine** (default): Balanced accuracy and speed, suitable for most scenarios
- **pipeline**: Suitable for CPU-only environments, best compatibility
- **vlm-auto-engine**: Highest accuracy, requires GPU acceleration
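
These recommendations reduce to a small decision rule. The sketch below is one way to encode it. `choose_backend` is a hypothetical helper, and `has_gpu` is assumed to mean a supported accelerator as listed under System Requirements (an NVIDIA GPU of Volta architecture or newer, or Apple Silicon).

```python
def choose_backend(has_gpu, prefer_accuracy=False):
    """Pick a backend name following the recommendations above."""
    if not has_gpu:
        # CPU-only environments: pipeline has the best compatibility.
        return "pipeline"
    if prefer_accuracy:
        # Highest accuracy, but requires GPU acceleration.
        return "vlm-auto-engine"
    # Default: balanced accuracy and speed.
    return "hybrid-auto-engine"


print(choose_backend(has_gpu=False))
print(choose_backend(has_gpu=True, prefer_accuracy=True))
```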

## Notes

1. **File Paths**: All paths must be absolute paths
2. **Output Directory**: Non-existent directories will be created automatically
3. **Performance**: Using a GPU can significantly improve parsing speed
4. **Page Numbers**: Page numbers start counting from 0
5. **Memory**: Large documents need more memory; limit the page range with `start_page` and `end_page` if memory is tight

## Troubleshooting

### Common Issues

1. **Installation Failure**:
   - Make sure you are using Python 3.10-3.13
   - On Windows, only Python 3.10-3.12 is supported (ray does not support 3.13)
   - Using `uv pip install` can resolve most dependency conflicts

2. **Insufficient Memory**:
   - Use `pipeline` backend
   - Limit the page range with `start_page` and `end_page`
   - Reduce virtual memory allocation

3. **Slow Parsing Speed**:
   - Enable GPU acceleration
   - Use `hybrid-auto-engine` backend
   - Disable features you do not need by setting `enable_formula` or `enable_table` to false

4. **Low OCR Accuracy**:
   - Specify the correct document language
   - Ensure the backend supports OCR (use `pipeline` or `hybrid-*`)
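
For the memory issue above, limiting the page range usually means splitting a long document into several calls. The sketch below yields `start_page` / `end_page` pairs for fixed-size chunks. `page_ranges` is a hypothetical helper, and it assumes `end_page` is inclusive, which the parameter description does not state; check this against a short document before relying on it.

```python
def page_ranges(total_pages, chunk_size):
    """Yield (start_page, end_page) pairs: 0-based, end_page inclusive (assumed)."""
    if total_pages < 0 or chunk_size < 1:
        raise ValueError("total_pages must be >= 0 and chunk_size >= 1")
    for start in range(0, total_pages, chunk_size):
        # Clamp the last chunk to the final page index.
        yield start, min(start + chunk_size, total_pages) - 1


# A 23-page document in chunks of 10 pages.
print(list(page_ranges(23, 10)))
```

Each pair can be supplied as `start_page` and `end_page` in a separate tool call, with the outputs concatenated afterwards.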

## Related Resources

- MinerU Official Documentation: https://opendatalab.github.io/MinerU/
- MinerU GitHub: https://github.com/opendatalab/MinerU
- Online Demo: https://mineru.net/
