markdrop
Professional AI skill and usage instructions for the Markdrop package, a Python tool for converting PDFs to Markdown/HTML with AI-powered image/table descriptions.
About this skill
Markdrop converts PDF content into structured Markdown and HTML. It preserves document formatting, extracts visual elements such as images and tables, and augments them with AI-generated descriptions. Users can configure Markdrop to use any of several AI vision providers (GEMINI, OPENAI, ANTHROPIC, GROQ, OPENROUTER, and LITELLM) to interpret and describe complex visual content. This makes the skill useful for automating document-processing workflows, preparing content for web publishing or knowledge bases, and extracting information from charts and tables embedded in PDFs. Batch processing and configuration options for model selection, prompts, and output features give AI agents and developers programmatic control over the conversion. Documents are not only converted but also described, which makes their content more accessible and machine-readable for downstream AI processing or human readers.
Best use case
The primary use case for Markdrop is the automated conversion of PDF documents into enriched Markdown or HTML, specifically for tasks that require intelligent extraction and description of visual elements such as images and tables. AI agents, developers, data analysts, and content management systems benefit most when they need to process large volumes of PDFs to extract structured data, generate summaries, or prepare content for indexing and search, and the visual context is crucial.
A structured Markdown or HTML file derived from a PDF, complete with extracted images, tables, and AI-generated textual descriptions for all visual elements.
Practical example
Example input
Convert the financial report PDF located at 'https://example.com/quarterly_report.pdf' into Markdown, ensuring all charts and tables are described using the Gemini model. Save the output to 'output/report.md'.
Example output
```markdown
# Q4 2023 Financial Report

## Executive Summary

...

### Revenue Breakdown

[Image: A bar chart showing revenue distribution across product lines. Products A and B contribute significantly, while Product C contributes a smaller share.]

| Product Line | Q4 Revenue ($M) | YoY Growth |
|--------------|-----------------|------------|
| Product A    | 120             | 15%        |
| Product B    | 85              | 10%        |
| Product C    | 40              | 20%        |

[Table: Quarterly revenue figures and year-over-year growth for key product lines, indicating strong performance across the board.]
```
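Because the descriptions in this example are plain bracketed annotations, they are easy to pull out for indexing or search. The sketch below assumes the `[Image: ...]` / `[Table: ...]` format shown in the example output; markdrop's real output may format descriptions differently, so treat the pattern as an assumption to adjust. `extract_descriptions` is a hypothetical helper, not part of markdrop.

```python
import re
from typing import List, Tuple

# Matches single-line annotations such as "[Image: ...]" or "[Table: ...]".
# The bracket format is assumed from the example output, not from markdrop's docs.
ANNOTATION = re.compile(r"\[(Image|Table): ([^\]]+)\]")


def extract_descriptions(markdown: str) -> List[Tuple[str, str]]:
    """Return (kind, description) pairs in document order."""
    return [(m.group(1), m.group(2).strip()) for m in ANNOTATION.finditer(markdown)]


sample = """### Revenue Breakdown
[Image: A bar chart showing revenue distribution across product lines.]
| Product Line | Q4 Revenue ($M) |
[Table: Quarterly revenue figures for key product lines.]
"""

for kind, text in extract_descriptions(sample):
    print(f"{kind}: {text}")
```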
When to use this skill
- When converting complex PDF documents into structured Markdown or HTML formats.
- When you need AI-generated descriptions for images and tables extracted from PDFs.
- For batch processing multiple PDF files or directories of images for analysis.
- When integrating robust PDF content extraction and AI description capabilities into other applications or agent workflows.
When not to use this skill
- If you only require basic text extraction without any visual content analysis.
- When working with document types other than PDF.
- For interactive PDF editing, form filling, or digital signing tasks.
- If strict offline processing is required, as AI features rely on external API calls.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/markdrop/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
How markdrop Compares
| Feature / Agent | markdrop | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | easy | N/A |
Frequently Asked Questions
What does this skill do?
It gives AI agents usage instructions for Markdrop, a Python package and CLI that converts PDFs (local files or URLs) into Markdown/HTML and uses AI vision models to describe the extracted images and tables.
How difficult is it to install?
The installation complexity is rated as easy. You can find the installation instructions above.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
Related Guides
Best AI Skills for Claude
Explore the best AI skills for Claude and Claude Code across coding, research, workflow automation, documentation, and agent operations.
ChatGPT vs Claude for Agent Skills
Compare ChatGPT and Claude for AI agent skills across coding, writing, research, and reusable workflow execution.
Best AI Skills for ChatGPT
Find the best AI skills to adapt into ChatGPT workflows for research, writing, summarization, planning, and repeatable assistant tasks.
SKILL.md Source
# Markdrop Skill
Welcome to the `markdrop` skill. `markdrop` is a powerful Python package and CLI tool used to convert PDF documents into structured Markdown and interactive HTML, while natively leveraging AI vision models to interpret and describe extracted images and tables.
If you are an AI agent or a user aiming to process PDFs and augment them with text or image descriptions, this document is your guide to using `markdrop` efficiently and accurately.
## 1. Capabilities
- **PDF to Markdown/HTML**: Retains formatting, extracts images, and detects tables via Microsoft Table Transformer and Docling. Supports processing both local file paths and direct PDF URLs.
- **AI Vision Descriptions**: Query GEMINI, OPENAI, ANTHROPIC, GROQ, OPENROUTER, or LITELLM to generate rich descriptions of images and tables.
- **Batch Processing**: Describe entire directories of images with a single command, using multiple LLM backends in one run.
- **Extensible Configuration**: Fine-grained overrides for which text model and which vision model are used, as well as prompts, image resolution scale, and output features.
## 2. Installation
Core installation (includes standard `gemini` and `openai` integrations):
```bash
pip install markdrop
```
To install optional provider integrations as extras:
```bash
pip install "markdrop[anthropic]"
pip install "markdrop[groq]"
pip install "markdrop[litellm]"
pip install "markdrop[all]"  # Installs all optional provider extras
```
*(Note: `gemini`, `openai`, and `openrouter` functionalities are present in the core install out-of-the-box).*
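To check which optional provider SDKs are importable in the current environment, you can probe for their top-level modules. This is a sketch that assumes each extra installs the provider's official SDK under its usual import name (`anthropic`, `groq`, `litellm`); it inspects the Python environment only and does not query markdrop itself.

```python
import importlib.util
from typing import Dict

# Assumed mapping from markdrop extra to the SDK module that extra pulls in.
OPTIONAL_EXTRAS = {
    "anthropic": "anthropic",
    "groq": "groq",
    "litellm": "litellm",
}


def available_extras() -> Dict[str, bool]:
    """Report, per extra, whether its SDK module can be found on sys.path."""
    return {
        extra: importlib.util.find_spec(module) is not None
        for extra, module in OPTIONAL_EXTRAS.items()
    }


if __name__ == "__main__":
    for extra, installed in available_extras().items():
        hint = "installed" if installed else f'missing: pip install "markdrop[{extra}]"'
        print(f"{extra}: {hint}")
```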
## 3. API Keys Setup
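Before using AI features, make the API keys available either in a `.env` file at the project root or as environment variables.
When deploying programmatically, you can inject the keys into `os.environ`; otherwise, run the built-in `setup` command for each provider (the comment after each command names the variable it configures):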
```bash
markdrop setup gemini # -> GEMINI_API_KEY
markdrop setup openai # -> OPENAI_API_KEY
markdrop setup anthropic # -> ANTHROPIC_API_KEY
markdrop setup groq # -> GROQ_API_KEY
markdrop setup openrouter # -> OPENROUTER_API_KEY
markdrop setup litellm # -> LITELLM_API_KEY
```
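For the `os.environ` route, a minimal sketch follows. The variable names come from the list of `setup` commands; the placeholder values are hypothetical, and the assumption that markdrop reads keys from the environment when it creates a provider client is not confirmed by this document.

```python
import os

# Placeholder values for illustration only: load real keys from a secrets
# manager or CI secret store instead of hard-coding them in source.
PROVIDER_KEYS = {
    "GEMINI_API_KEY": "your-gemini-key",
    "ANTHROPIC_API_KEY": "your-anthropic-key",
}

# Populate the environment before importing or calling markdrop.
# setdefault keeps any key that was already exported in the shell.
for name, value in PROVIDER_KEYS.items():
    os.environ.setdefault(name, value)

missing = [name for name in PROVIDER_KEYS if not os.environ.get(name)]
print("missing keys:", missing or "none")
```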
## 4. Python API Integration
The Python API is the recommended way to embed `markdrop` into applications.
### 4.1 PDF Conversion to Interactive HTML
Use the `markdrop` function together with `add_downloadable_tables`:
```python
from markdrop import markdrop, MarkDropConfig, add_downloadable_tables
from pathlib import Path
import logging
# Configuration block
config = MarkDropConfig(
image_resolution_scale=2.0,
download_button_color='#444444',
log_level=logging.INFO,
log_dir='logs',
excel_dir='markdrop-excel-tables',
)
# 1. Convert PDF to base HTML (and markdown locally). URL supported here too: "https://url.to/pdf"
html_path = markdrop("path/to/document.pdf", "output_directory", config)
# 2. Enrich HTML to allow downloading tables as Excel sheets
enhanced_html_path = add_downloadable_tables(html_path, config)
```
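To convert a folder of PDFs, one option is to loop over the files and give each document its own output directory. The sketch takes the converter as a parameter so the loop can be exercised without API keys; `convert_all` is a hypothetical helper, and the assumption that the converter returns the written path as a string mirrors how `markdrop(...)` returns `html_path` in the example.

```python
from pathlib import Path
from typing import Callable, Dict

# A converter takes (input_pdf, output_dir) and returns the path it wrote,
# mirroring how markdrop(...) is called in the example above.
Converter = Callable[[str, str], str]


def convert_all(pdf_dir: str, output_root: str, convert: Converter) -> Dict[str, str]:
    """Convert each *.pdf in pdf_dir into its own subfolder of output_root."""
    results: Dict[str, str] = {}
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        out_dir = Path(output_root) / pdf.stem  # report.pdf -> <output_root>/report
        out_dir.mkdir(parents=True, exist_ok=True)
        results[pdf.name] = convert(str(pdf), str(out_dir))
    return results
```

With markdrop installed, a call might look like `convert_all("pdfs", "converted", lambda src, out: markdrop(src, out, config))`. Separate folders per document are a design choice here, made on the assumption that outputs from different PDFs could otherwise collide in one directory.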
### 4.2 Injecting AI Descriptions into Markdown
If you have a Markdown file containing image/table links, `process_markdown` automatically routes vision requests to the chosen provider and inserts contextual descriptions.
```python
from markdrop import process_markdown, ProcessorConfig, AIProvider
config = ProcessorConfig(
input_path="output_directory/document.md",
output_dir="output_directory",
ai_provider=AIProvider.GEMINI, # Available: GEMINI, OPENAI, ANTHROPIC, GROQ, OPENROUTER, LITELLM
# Target configurations
remove_images=False,
remove_tables=False,
table_descriptions=True,
image_descriptions=True,
# Provider-Specific overrides (Optional)
# Allows granular decoupling of vision parsing vs table text-parsing
model_name_override="gemini-2.0-flash", # Primary vision analysis model
text_model_name_override="gemini-2.0-flash" # Lean text-only model for generic parsing
)
# Executes AI processing and saves the enriched document
output_path = process_markdown(config)
```
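Relative paths resolve against whatever the working directory is at call time, so one option is to resolve them once, up front, and pass the results to `ProcessorConfig`. The sketch only computes the paths; `resolve_paths` is a hypothetical helper, and the `output_directory/document.md` names simply mirror the example above.

```python
from pathlib import Path
from typing import Dict, Optional


def resolve_paths(input_path: str, output_dir: str, base: Optional[Path] = None) -> Dict[str, str]:
    """Return absolute input/output paths anchored at `base` (default: current directory)."""
    root = (base or Path.cwd()).resolve()
    return {
        # Joining an absolute path onto root leaves it unchanged, so absolute inputs pass through.
        "input_path": str((root / input_path).resolve()),
        "output_dir": str((root / output_dir).resolve()),
    }


# Resolve once, while the working directory is still the one you expect.
paths = resolve_paths("output_directory/document.md", "output_directory")
print(paths["input_path"])
# Then, for example: ProcessorConfig(**paths, ai_provider=AIProvider.GEMINI, ...)
```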
### 4.3 Batch Image Description
For standalone image directories or files:
```python
from markdrop import generate_descriptions
generate_descriptions(
input_path='images_folder/',
output_dir='descriptions_output/',
prompt='Analyze this image and describe all textual and structural elements.',
llm_client=['gemini', 'openai'],
)
```
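Before a large batch run, it can help to see exactly which files would be sent to the providers. This sketch assumes a set of common raster extensions; the formats markdrop actually accepts are not documented here, so adjust `IMAGE_EXTENSIONS` as needed. `list_images` is a hypothetical helper.

```python
from pathlib import Path
from typing import List

# Assumed set of image extensions; markdrop's accepted formats may differ.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp", ".gif"}


def list_images(input_path: str) -> List[Path]:
    """Return the image files a batch run over input_path would cover."""
    path = Path(input_path)
    if path.is_file():
        return [path] if path.suffix.lower() in IMAGE_EXTENSIONS else []
    # Only the top level of the folder is scanned; subdirectories are skipped.
    return sorted(
        p for p in path.iterdir() if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )
```

For example, `len(list_images("images_folder/")) * 2` gives a rough request count for the two-client call above, assuming one vision request per image per client.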
## 5. CLI Execution Best Practices
As an agent, you can also trigger `markdrop` workflows via Bash.
1. **Convert PDF to MD/HTML (including tables)**:
```bash
markdrop convert <input_path_or_url> --output_dir <dir> --add_tables
```
2. **Run an AI provider over the Markdown output with specific models**:
```bash
markdrop describe <markdown_file> \
--ai_provider anthropic \
--model claude-opus-4-6 \
--text-model claude-sonnet-4-5 \
--remove_images
```
3. **Only Analyze / Extract Images**:
```bash
# Also accepts URLs directly
markdrop analyze https://domain.com/report.pdf --output_dir pdf_analysis --save_images
```
4. **Batch Image Description**:
```bash
markdrop generate images/ --output_dir descriptions/ \
--prompt "Describe in detail." \
--llm_client gemini openai
```
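When an agent drives these commands from Python rather than a shell, passing an argument list to `subprocess.run` avoids shell-quoting problems with paths and prompts. The sketch only assembles the `describe` invocation shown in step 2, using the flags exactly as documented there; `build_describe_command` and `run` are hypothetical helpers.

```python
import subprocess
from typing import List, Optional


def build_describe_command(
    markdown_file: str,
    provider: str,
    model: Optional[str] = None,
    text_model: Optional[str] = None,
    remove_images: bool = False,
) -> List[str]:
    """Assemble `markdrop describe` arguments using the flags documented above."""
    cmd = ["markdrop", "describe", markdown_file, "--ai_provider", provider]
    if model:
        cmd += ["--model", model]
    if text_model:
        cmd += ["--text-model", text_model]
    if remove_images:
        cmd.append("--remove_images")
    return cmd


def run(cmd: List[str]) -> str:
    """Run the command without a shell; raises CalledProcessError on a non-zero exit."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
```

Calling `run(build_describe_command("output_directory/document.md", "anthropic", model="claude-opus-4-6"))` requires the `markdrop` CLI to be on `PATH`.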
## 6. Typical Model Fallbacks & Suggestions
- **Default / Cost-Effective**: `gemini` (Gemini 2.0 Flash) is typically fast and inexpensive, which suits large-scale document processing.
- **High Complexity / Intricate Tables**: `anthropic`; recent Claude models (`claude-opus-4-6` or `claude-sonnet-4-5`) excel at reasoning and formatting.
- **Maximum Speed**: `groq` using Llama models.
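If an agent needs to pick a backend automatically, these suggestions can be encoded as a small rule. This is one possible reading of the guidance, not logic built into markdrop; the workload flags and `choose_backend` are hypothetical, and the model names are the ones quoted in this section.

```python
from typing import Optional, Tuple

# (provider, model override) pairs drawn from the suggestions above. None means
# "no override", leaving the model choice to markdrop's default for that provider.
CHEAP_DEFAULT: Tuple[str, Optional[str]] = ("gemini", "gemini-2.0-flash")
INTRICATE_TABLES: Tuple[str, Optional[str]] = ("anthropic", "claude-opus-4-6")
MAX_SPEED: Tuple[str, Optional[str]] = ("groq", None)  # no specific Llama model is named above


def choose_backend(intricate_tables: bool = False, latency_critical: bool = False) -> Tuple[str, Optional[str]]:
    """Prefer accuracy on hard tables, then raw speed, else the cost-effective default."""
    if intricate_tables:
        return INTRICATE_TABLES
    if latency_critical:
        return MAX_SPEED
    return CHEAP_DEFAULT
```

The result could feed `ProcessorConfig(ai_provider=AIProvider[provider.upper()], model_name_override=model, ...)`, assuming `AIProvider` is a standard Python `Enum` whose member names match the uppercase provider names listed earlier.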
When instantiating `ProcessorConfig`, be exact about paths: use absolute paths if the current working directory may change at runtime.
Related Skills
tavily-search
Use Tavily API for real-time web search and content extraction. Use when: user needs real-time web search results, research, or current information from the web. Requires Tavily API key.
baidu-search
Search the web using Baidu AI Search Engine (BDSE). Use for live information, documentation, or research topics.
notebooklm
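OpenClaw Skill for the unofficial Google NotebookLM Python API. Supports content generation (podcasts, videos, slides, quizzes, mind maps, etc.), document management, and research automation. Triggered when the user needs NotebookLM to generate audio overviews, videos, or study materials, or to manage a knowledge base.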
openclaw-search
Intelligent search for agents. Multi-source retrieval with confidence scoring - web, academic, and Tavily in one unified API.
aisa-tavily
AI-optimized web search via AIsa's Tavily API proxy. Returns concise, relevant results for AI agents through AIsa's unified API gateway.
Market Sizing — TAM/SAM/SOM Calculator
Build defensible market sizing for any product, pitch deck, or business case. Top-down and bottom-up methodologies combined.
Data Analyst — AfrexAI ⚡📊
**Transform raw data into decisions. Not just charts — answers.**
Competitor Monitor
Tracks and analyzes competitor moves — pricing changes, feature launches, hiring, and positioning shifts
afrexai-competitive-intel
Complete competitive intelligence system — market mapping, product teardowns, pricing intel, win/loss analysis, battlecards, and strategic monitoring. Goes far beyond SEO to cover the full business landscape.
trending-news-aggregator
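Smart trending-news aggregator: automatically collects trending news from multiple platforms, analyzes trends with AI, and supports scheduled delivery and popularity scoring. Core features: - Daily automatic aggregation of trending topics across platforms (Weibo, Zhihu, Baidu, etc.) - Smart categorization (technology, finance, society, international, etc.) - Popularity-scoring algorithm - Incremental detection (flags newly trending topics) - AI trend analysis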
search-cluster
Search aggregator combining Google CSE, GNews RSS, Wikipedia, Reddit, and Scrapling.
data-analysis-partner
Smart data-analysis Skill: given CSV/Excel files and an analysis request, it produces a self-contained HTML analysis report with interactive ECharts charts.