markdrop

Professional AI skill and usage instructions for the Markdrop package, a Python tool for converting PDFs to Markdown/HTML with AI-powered image/table descriptions.

198 stars

by shoryasethia

Complexity: easy

About this skill

Markdrop converts static PDF content into structured Markdown and HTML. It retains document formatting, extracts visual elements such as images and tables, and augments them with AI-generated descriptions. Users can choose among several AI vision providers (GEMINI, OPENAI, ANTHROPIC, GROQ, OPENROUTER, and LITELLM) to describe visual content. The skill suits automated document-processing workflows, preparing content for web publishing or knowledge bases, and extracting information from visual data embedded in PDFs. Batch processing and configuration options for model selection, prompts, and output features give agents and developers programmatic control over the conversion, so the resulting documents are easier to consume for both downstream AI processing and human readers.

Best use case

Markdrop is best suited to the automated conversion of PDF documents into enriched Markdown or HTML where images and tables must be extracted and described. AI agents, developers, data analysts, and content management systems benefit most when they process large volumes of PDFs to extract structured data, generate summaries, or prepare content for indexing and search, and the visual context matters.

A structured Markdown or HTML file derived from a PDF, complete with extracted images, tables, and AI-generated textual descriptions for all visual elements.

Practical example

Example input

Convert the financial report PDF located at 'https://example.com/quarterly_report.pdf' into Markdown, ensuring all charts and tables are described using the Gemini model. Save the output to 'output/report.md'.

Example output

```markdown
# Q4 2023 Financial Report

## Executive Summary
...

### Revenue Breakdown

[Image: A bar chart showing revenue distribution across product lines. Products A and B contribute significantly, while Product C shows moderate growth.]

| Product Line | Q4 Revenue ($M) | YoY Growth |
|--------------|-----------------|------------|
| Product A    | 120             | 15%        |
| Product B    | 85              | 10%        |
| Product C    | 40              | 20%        |
[Table: Quarterly revenue figures and year-over-year growth for key product lines, indicating strong performance across the board.]
```

When to use this skill

When converting complex PDF documents into structured Markdown or HTML formats.
When you need AI-generated descriptions for images and tables extracted from PDFs.
For batch processing multiple PDF files or directories of images for analysis.
When integrating robust PDF content extraction and AI description capabilities into other applications or agent workflows.

When not to use this skill

If you only require basic text extraction without any visual content analysis.
When working with document types other than PDF.
For interactive PDF editing, form filling, or digital signing tasks.
If strict offline processing is required, as AI features rely on external API calls.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/markdrop/SKILL.md --create-dirs "https://raw.githubusercontent.com/shoryasethia/markdrop/main/.agent/skills/markdrop/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/markdrop/SKILL.md inside your project
Restart your AI agent; it will auto-discover the skill

How markdrop Compares

| Feature / Agent | markdrop | Standard Approach |
|-----------------|----------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | easy | N/A |

Frequently Asked Questions

What does this skill do?

It provides usage instructions for the Markdrop package, a Python tool for converting PDFs to Markdown/HTML with AI-powered image and table descriptions.

How difficult is it to install?

The installation complexity is rated as easy. You can find the installation instructions above.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Markdrop Skill

Welcome to the `markdrop` skill. `markdrop` is a powerful Python package and CLI tool used to convert PDF documents into structured Markdown and interactive HTML, while natively leveraging AI vision models to interpret and describe extracted images and tables.

If you are an AI agent or a user aiming to process PDFs and augment them with text or image descriptions, this document serves as your complete guide on utilizing `markdrop` efficiently and accurately.

## 1. Capabilities

- **PDF to Markdown/HTML**: Retains formatting, extracts images, and detects tables via Microsoft Table Transformer and Docling. Supports processing both local file paths and direct PDF URLs.
- **AI Vision Descriptions**: Query GEMINI, OPENAI, ANTHROPIC, GROQ, OPENROUTER, or LITELLM to generate rich descriptions of images and tables.
- **Batch Processing**: Describe entire directories of images in a single command, using multiple LLM backends simultaneously.
- **Extensible Configuration**: Override which text model and which vision model are used, along with prompts, resolution scales, and output features.

## 2. Installation

Core installation (includes the `gemini`, `openai`, and `openrouter` integrations):
```bash
pip install markdrop
```

To install specific provider integrations natively:
```bash
pip install "markdrop[anthropic]"
pip install "markdrop[groq]"
pip install "markdrop[litellm]"
pip install "markdrop[all]"       # Installs everything natively
```
*(Note: `gemini`, `openai`, and `openrouter` functionalities are present in the core install out-of-the-box).*

## 3. API Keys Setup

Before using AI features, API keys must be available in a `.env` file at the project root or as environment variables.

To set the keys, run the built-in CLI command for each provider, or inject the keys into `os.environ` when deploying programmatically:
```bash
markdrop setup gemini     # -> GEMINI_API_KEY
markdrop setup openai     # -> OPENAI_API_KEY
markdrop setup anthropic  # -> ANTHROPIC_API_KEY
markdrop setup groq       # -> GROQ_API_KEY
markdrop setup openrouter # -> OPENROUTER_API_KEY
markdrop setup litellm    # -> LITELLM_API_KEY
```
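For the `os.environ` route, a minimal sketch might look like the following. The variable names are taken from the setup commands above; the helpers `inject_api_keys` and `missing_keys` are hypothetical and are not part of `markdrop`.

```python
import os

# Environment variable names as listed next to the setup commands above.
PROVIDER_ENV_VARS = {
    "gemini": "GEMINI_API_KEY",
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "groq": "GROQ_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
    "litellm": "LITELLM_API_KEY",
}


def inject_api_keys(keys: dict[str, str]) -> None:
    """Hypothetical helper: copy provider -> key pairs into os.environ."""
    for provider, value in keys.items():
        os.environ[PROVIDER_ENV_VARS[provider.lower()]] = value


def missing_keys(providers: list[str]) -> list[str]:
    """Return the env var names that are unset or empty for the given providers."""
    names = [PROVIDER_ENV_VARS[p.lower()] for p in providers]
    return [name for name in names if not os.environ.get(name)]
```

Calling `missing_keys(["gemini", "anthropic"])` before any AI step gives an early, readable failure instead of an API error partway through a document.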

## 4. Python API Integration

The Python API is the recommended way to embed `markdrop` into applications.

### 4.1 PDF Conversion to Interactive HTML
Use the `markdrop` function combined with `add_downloadable_tables`:

```python
from markdrop import markdrop, MarkDropConfig, add_downloadable_tables
from pathlib import Path
import logging

# Configuration block
config = MarkDropConfig(
    image_resolution_scale=2.0,
    download_button_color='#444444',
    log_level=logging.INFO,
    log_dir='logs',
    excel_dir='markdrop-excel-tables',
)

# 1. Convert PDF to base HTML (and markdown locally). URL supported here too: "https://url.to/pdf"
html_path = markdrop("path/to/document.pdf", "output_directory", config)

# 2. Enrich HTML to allow downloading tables as Excel sheets
enhanced_html_path = add_downloadable_tables(html_path, config)
```
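To apply the same conversion to a folder of PDFs, one option is a small wrapper that takes the converter as a parameter. This is a sketch, not a `markdrop` feature: `convert_all` is a hypothetical helper, and the one-sub-directory-per-PDF layout is an assumption.

```python
from pathlib import Path
from typing import Callable


def convert_all(pdf_dir: str, output_root: str,
                convert: Callable[[str, str], object]) -> dict[str, object]:
    """Run convert(pdf_path, output_dir) for every *.pdf directly inside pdf_dir.

    Each PDF gets its own output sub-directory named after the file stem.
    Returns a mapping of PDF path -> whatever convert returned.
    """
    results: dict[str, object] = {}
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        out_dir = Path(output_root) / pdf.stem
        out_dir.mkdir(parents=True, exist_ok=True)
        results[str(pdf)] = convert(str(pdf), str(out_dir))
    return results
```

With `markdrop` installed, the converter could be `lambda src, dst: markdrop(src, dst, config)`, matching the call shown above.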

### 4.2 Injecting AI Descriptions into Markdown
If you have a Markdown file containing image/table links, `process_markdown` automatically routes vision requests to the chosen provider and inserts contextual descriptions.

```python
from markdrop import process_markdown, ProcessorConfig, AIProvider

config = ProcessorConfig(
    input_path="output_directory/document.md",
    output_dir="output_directory",
    ai_provider=AIProvider.GEMINI,  # Available: GEMINI, OPENAI, ANTHROPIC, GROQ, OPENROUTER, LITELLM
    
    # Target configurations
    remove_images=False,
    remove_tables=False,
    table_descriptions=True,
    image_descriptions=True,
    
    # Provider-Specific overrides (Optional)
    # Allows granular decoupling of vision parsing vs table text-parsing
    model_name_override="gemini-2.0-flash",           # Primary vision analysis model
    text_model_name_override="gemini-2.0-flash"       # Lean text-only model for generic parsing
)

# Executes AI processing and saves the enriched document
output_path = process_markdown(config)
```

### 4.3 Batch Image Description
For standalone image directories or files:
```python
from markdrop import generate_descriptions

generate_descriptions(
    input_path='images_folder/',
    output_dir='descriptions_output/',
    prompt='Analyze this image and describe all textual and structural elements.',
    llm_client=['gemini', 'openai'], 
)
```

## 5. CLI Execution Best Practices

As an agent, you can also trigger `markdrop` workflows via Bash.

1. **Convert PDF to MD/HTML (including tables)**:
   ```bash
   markdrop convert <input_path_or_url> --output_dir <dir> --add_tables
   ```

2. **Run AI Provider over the Markdown Output with exact models**:
   ```bash
   markdrop describe <markdown_file> \
       --ai_provider anthropic \
       --model claude-opus-4-6 \
       --text-model claude-sonnet-4-5 \
       --remove_images
   ```

3. **Only Analyze / Extract Images**:
   ```bash
   # Also accepts URLs directly
   markdrop analyze https://domain.com/report.pdf --output_dir pdf_analysis --save_images
   ```

4. **Batch Image Description**:
   ```bash
   markdrop generate images/ --output_dir descriptions/ \
       --prompt "Describe in detail." \
       --llm_client gemini openai
   ```
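When an agent drives these commands from Python rather than a shell, building the argument list explicitly avoids quoting problems with paths that contain spaces. The sketch below uses only the subcommands and flags shown above; the function names are hypothetical.

```python
import shlex


def convert_command(source: str, output_dir: str, add_tables: bool = True) -> list[str]:
    """Build the argv for `markdrop convert` using the flags shown above."""
    argv = ["markdrop", "convert", source, "--output_dir", output_dir]
    if add_tables:
        argv.append("--add_tables")
    return argv


def describe_command(markdown_file: str, provider: str, model: str = "",
                     text_model: str = "", remove_images: bool = False) -> list[str]:
    """Build the argv for `markdrop describe`; optional flags are added only when set."""
    argv = ["markdrop", "describe", markdown_file, "--ai_provider", provider]
    if model:
        argv += ["--model", model]
    if text_model:
        argv += ["--text-model", text_model]
    if remove_images:
        argv.append("--remove_images")
    return argv


def as_shell(argv: list[str]) -> str:
    """Render an argv list as a safely quoted shell string, e.g. for logging."""
    return shlex.join(argv)
```

The resulting list can be passed to `subprocess.run(argv, check=True)`, which raises if the command exits with a non-zero status.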

## 6. Typical Model Fallbacks & Suggestions

- **Default / Cost-Effective**: `gemini` (Gemini 2.0 Flash) is frequently the fastest and cheapest option for large-scale document evaluation.
- **High Complexity / Intricate Tables**: `anthropic` with the latest Claude models (`claude-opus-4-6` or `claude-sonnet-4-5`) excels at reasoning and formatting.
- **Maximum Speed**: `groq` using LLaMA models.
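
This document does not describe a built-in fallback mechanism, so a caller that wants one has to implement it. A minimal sketch, assuming the caller supplies a `run(provider)` callable that raises on failure (both `first_successful` and `run` are hypothetical names):

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_successful(providers: Sequence[str],
                     run: Callable[[str], T]) -> tuple[str, T]:
    """Try each provider in order and return (provider, result) for the first success.

    Raises RuntimeError listing every failure if no provider succeeds.
    """
    errors: dict[str, Exception] = {}
    for provider in providers:
        try:
            return provider, run(provider)
        except Exception as exc:  # keep going; record why this provider failed
            errors[provider] = exc
    details = "; ".join(f"{name}: {err}" for name, err in errors.items())
    raise RuntimeError(f"all providers failed ({details})")
```

An ordering such as `["gemini", "anthropic", "groq"]` follows the suggestions above: the cheaper default first, then the stronger model.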

Whenever instantiating `ProcessorConfig`, be exact about paths: use absolute paths if the current working directory may change during the run.
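
One way to do this is to resolve paths once, up front, with `pathlib` and pass the resulting strings to the config. The helper name `to_absolute` is hypothetical.

```python
from pathlib import Path
from typing import Optional


def to_absolute(path: str, base: Optional[str] = None) -> str:
    """Return an absolute, normalized version of path.

    Relative paths are anchored at base if given, otherwise at the
    current working directory at the time of the call.
    """
    p = Path(path).expanduser()
    if not p.is_absolute():
        p = (Path(base) if base else Path.cwd()) / p
    return str(p.resolve())
```

For example, `input_path=to_absolute("output_directory/document.md")` fixes the location at the moment the config is built, so a later `os.chdir` does not change which file is read.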
