paddleocr-doc-parsing

Parse documents using PaddleOCR's API.

Best use case

paddleocr-doc-parsing is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using paddleocr-doc-parsing should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/paddleocr-doc-parsing/SKILL.md --create-dirs "https://raw.githubusercontent.com/Demerzels-lab/elsamultiskillagent/main/public/skills/bobholamovic/paddleocr-doc-parsing/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/paddleocr-doc-parsing/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How paddleocr-doc-parsing Compares

| Feature / Agent | paddleocr-doc-parsing | Standard Approach |
| --- | --- | --- |
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

Parse documents using PaddleOCR's API.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# PaddleOCR Document Parsing

Parse images and PDF files using PaddleOCR's API. Supports multiple document parsing algorithms with structured output.

## Key Features

- **Multi-format support**: PDF and image files (JPG, PNG, BMP, TIFF)
- **Layout analysis**: Automatic detection of text blocks, tables, formulas
- **Multi-language**: Support for 110+ languages
- **Structured output**: Markdown format with preserved document structure

## Setup

1. Obtain credentials from the [PaddleOCR official website](https://www.paddleocr.com). Click the “API” button, choose the desired algorithm (e.g., PP-Structure, PaddleOCR-VL-1.5), and copy the API URL and the access token.
2. Set environment variables:

```bash
export PADDLEOCR_API_URL="https://your-endpoint-here"
export PADDLEOCR_ACCESS_TOKEN="your_access_token"
```
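
Before running the parser, it can help to confirm that both variables are actually set in the current shell. The following is a minimal sketch, not part of the skill itself; the function name `check_paddleocr_env` is hypothetical.

```bash
# Sketch: fail early if the credentials from the Setup step are missing.
# check_paddleocr_env is a hypothetical helper, not shipped with the skill.
check_paddleocr_env() {
  missing=0
  if [ -z "${PADDLEOCR_API_URL:-}" ]; then
    echo "error: PADDLEOCR_API_URL is not set" >&2
    missing=1
  fi
  if [ -z "${PADDLEOCR_ACCESS_TOKEN:-}" ]; then
    echo "error: PADDLEOCR_ACCESS_TOKEN is not set" >&2
    missing=1
  fi
  # Returns 0 when both variables are non-empty, 1 otherwise.
  return "$missing"
}
```

Calling `check_paddleocr_env || exit 1` at the top of a wrapper script stops it before any request is attempted.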

## Usage Examples

### Run Script

```bash
# Parse local image
{baseDir}/paddleocr_parse.sh document.jpg

# Parse local PDF file
{baseDir}/paddleocr_parse.sh -t pdf document.pdf

# Parse document from URL
{baseDir}/paddleocr_parse.sh -t pdf https://example.com/document.pdf

# Output to stdout (default)
{baseDir}/paddleocr_parse.sh document.jpg

# Save output to file
{baseDir}/paddleocr_parse.sh -o result.json document.jpg
```
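
The commands above handle one file at a time. To process a whole directory, a loop along the following lines may work. This is a sketch under assumptions: `PARSE_CMD` is a hypothetical variable that you set to the path of `paddleocr_parse.sh`, `parse_dir` is a hypothetical helper, and only the `-t pdf` and `-o` flags shown above are used.

```bash
# Sketch: parse every supported file in a directory, one JSON file per input.
# PARSE_CMD (hypothetical) must hold the path to paddleocr_parse.sh.
# The body runs in a subshell so its variables do not leak into the caller.
parse_dir() (
  in_dir="$1"
  out_dir="$2"
  mkdir -p "$out_dir" || exit 1
  for f in "$in_dir"/*; do
    [ -f "$f" ] || continue
    base="$(basename "$f")"
    out="$out_dir/${base%.*}.json"
    case "${f##*.}" in
      pdf|PDF)
        # PDFs are passed with the explicit type flag, as in the examples above.
        "$PARSE_CMD" -t pdf -o "$out" "$f" ;;
      jpg|JPG|jpeg|JPEG|png|PNG|bmp|BMP|tif|TIF|tiff|TIFF)
        "$PARSE_CMD" -o "$out" "$f" ;;
      *)
        echo "skipping unsupported file: $f" >&2 ;;
    esac
  done
)
```

For example, `PARSE_CMD=/path/to/paddleocr_parse.sh parse_dir ./scans ./results` would write `./results/<name>.json` for each supported input in `./scans`.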

### Response Structure

```json
{
  "logId": "unique_request_id",
  "errorCode": 0,
  "errorMsg": "Success",
  "result": {
    "layoutParsingResults": [
      {
        "prunedResult": [...],
        "markdown": {
          "text": "# Document Title\n\nParagraph content...",
          "images": {}
        },
        "outputImages": [...],
        "inputImage": "http://input-image"
      }
    ],
    "dataInfo": {...}
  }
}
```

**Important Fields:**

- **`prunedResult`** - Contains detailed layout element information including positions, categories, etc.
- **`markdown`** - Stores the document content converted to Markdown format with preserved structure and formatting.
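
To pull the Markdown text out of a saved response, a small filter can be applied to the JSON. The sketch below assumes that `jq` is installed, that the response was saved with `-o result.json`, and that a non-zero `errorCode` signals failure (the source only shows the success case). `extract_markdown` is a hypothetical helper name.

```bash
# Sketch: print markdown.text for every entry in layoutParsingResults.
# Assumes jq is available; extract_markdown is a hypothetical helper.
extract_markdown() {
  # jq -e exits non-zero when the expression evaluates to false or null.
  if ! jq -e '.errorCode == 0' "$1" >/dev/null; then
    echo "API error: $(jq -r '.errorMsg' "$1")" >&2
    return 1
  fi
  # -r prints raw strings, so the "\n" escapes become real newlines.
  jq -r '.result.layoutParsingResults[].markdown.text' "$1"
}
```

Running `extract_markdown result.json > document.md` would then give a Markdown file containing the text of each parsed result in order.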

## Quota Information

See official documentation: https://ai.baidu.com/ai-doc/AISTUDIO/Xmjclapam
