markitdown
Convert various file formats (PDF, Office documents, images, audio, web content, structured data) to Markdown optimized for LLM processing. Use when converting documents to markdown, extracting text from PDFs/Office files, transcribing audio, performing OCR on images, extracting YouTube transcripts, or processing batches of files. Supports 20+ formats including DOCX, XLSX, PPTX, PDF, HTML, EPUB, CSV, JSON, images with OCR, and audio with transcription.
Best use case
markitdown is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Convert various file formats (PDF, Office documents, images, audio, web content, structured data) to Markdown optimized for LLM processing. Use when converting documents to markdown, extracting text from PDFs/Office files, transcribing audio, performing OCR on images, extracting YouTube transcripts, or processing batches of files. Supports 20+ formats including DOCX, XLSX, PPTX, PDF, HTML, EPUB, CSV, JSON, images with OCR, and audio with transcription.
Teams using markitdown should expect a more consistent output, faster repeated execution, less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in
.claude/skills/markitdown/SKILL.mdinside your project - Restart your AI agent — it will auto-discover the skill
How markitdown Compares
| Feature / Agent | markitdown | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Convert various file formats (PDF, Office documents, images, audio, web content, structured data) to Markdown optimized for LLM processing. Use when converting documents to markdown, extracting text from PDFs/Office files, transcribing audio, performing OCR on images, extracting YouTube transcripts, or processing batches of files. Supports 20+ formats including DOCX, XLSX, PPTX, PDF, HTML, EPUB, CSV, JSON, images with OCR, and audio with transcription.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
Related Guides
SKILL.md Source
# MarkItDown
## Overview
MarkItDown is a Python utility that converts various file formats into Markdown format, optimized for use with large language models and text analysis pipelines. It preserves document structure (headings, lists, tables, hyperlinks) while producing clean, token-efficient Markdown output.
## When to Use This Skill
Use this skill when users request:
- Converting documents to Markdown format
- Extracting text from PDF, Word, PowerPoint, or Excel files
- Performing OCR on images to extract text
- Transcribing audio files to text
- Extracting YouTube video transcripts
- Processing HTML, EPUB, or web content to Markdown
- Converting structured data (CSV, JSON, XML) to readable Markdown
- Batch converting multiple files or ZIP archives
- Preparing documents for LLM analysis or RAG systems
## Core Capabilities
### 1. Document Conversion
Convert Office documents and PDFs to Markdown while preserving structure.
**Supported formats:**
- PDF files (with optional Azure Document Intelligence integration)
- Word documents (DOCX)
- PowerPoint presentations (PPTX)
- Excel spreadsheets (XLSX, XLS)
**Basic usage:**
```python
from markitdown import MarkItDown
md = MarkItDown()
result = md.convert("document.pdf")
print(result.text_content)
```
**Command-line:**
```bash
markitdown document.pdf -o output.md
```
See `references/document_conversion.md` for detailed documentation on document-specific features.
### 2. Media Processing
Extract text from images using OCR and transcribe audio files to text.
**Supported formats:**
- Images (JPEG, PNG, GIF, etc.) with EXIF metadata extraction
- Audio files with speech transcription (requires speech_recognition)
**Image with OCR:**
```python
from markitdown import MarkItDown
md = MarkItDown()
result = md.convert("image.jpg")
print(result.text_content) # Includes EXIF metadata and OCR text
```
**Audio transcription:**
```python
result = md.convert("audio.wav")
print(result.text_content) # Transcribed speech
```
See `references/media_processing.md` for advanced media handling options.
### 3. Web Content Extraction
Convert web-based content and e-books to Markdown.
**Supported formats:**
- HTML files and web pages
- YouTube video transcripts (via URL)
- EPUB books
- RSS feeds
**YouTube transcript:**
```python
from markitdown import MarkItDown
md = MarkItDown()
result = md.convert("https://youtube.com/watch?v=VIDEO_ID")
print(result.text_content)
```
See `references/web_content.md` for web extraction details.
### 4. Structured Data Handling
Convert structured data formats to readable Markdown tables.
**Supported formats:**
- CSV files
- JSON files
- XML files
**CSV to Markdown table:**
```python
from markitdown import MarkItDown
md = MarkItDown()
result = md.convert("data.csv")
print(result.text_content) # Formatted as Markdown table
```
See `references/structured_data.md` for format-specific options.
### 5. Advanced Integrations
Enhance conversion quality with AI-powered features.
**Azure Document Intelligence:**
For enhanced PDF processing with better table extraction and layout analysis:
```python
from markitdown import MarkItDown
md = MarkItDown(docintel_endpoint="<endpoint>", docintel_key="<key>")
result = md.convert("complex.pdf")
```
**LLM-Powered Image Descriptions:**
Generate detailed image descriptions using GPT-4o:
```python
from markitdown import MarkItDown
from openai import OpenAI
client = OpenAI()
md = MarkItDown(llm_client=client, llm_model="gpt-4o")
result = md.convert("presentation.pptx") # Images described with LLM
```
See `references/advanced_integrations.md` for integration details.
### 6. Batch Processing
Process multiple files or entire ZIP archives at once.
**ZIP file processing:**
```python
from markitdown import MarkItDown
md = MarkItDown()
result = md.convert("archive.zip")
print(result.text_content) # All files converted and concatenated
```
**Batch script:**
Use the provided batch processing script for directory conversion:
```bash
python scripts/batch_convert.py /path/to/documents /path/to/output
```
See `scripts/batch_convert.py` for implementation details.
## Installation
**Full installation (all features):**
```bash
uv pip install 'markitdown[all]'
```
**Modular installation (specific features):**
```bash
uv pip install 'markitdown[pdf]' # PDF support
uv pip install 'markitdown[docx]' # Word support
uv pip install 'markitdown[pptx]' # PowerPoint support
uv pip install 'markitdown[xlsx]' # Excel support
uv pip install 'markitdown[audio]' # Audio transcription
uv pip install 'markitdown[youtube]' # YouTube transcripts
```
**Requirements:**
- Python 3.10 or higher
## Output Format
MarkItDown produces clean, token-efficient Markdown optimized for LLM consumption:
- Preserves headings, lists, and tables
- Maintains hyperlinks and formatting
- Includes metadata where relevant (EXIF, document properties)
- No temporary files created (streaming approach)
## Common Workflows
**Preparing documents for RAG:**
```python
from markitdown import MarkItDown
md = MarkItDown()
# Convert knowledge base documents
docs = ["manual.pdf", "guide.docx", "faq.html"]
markdown_content = []
for doc in docs:
result = md.convert(doc)
markdown_content.append(result.text_content)
# Now ready for embedding and indexing
```
**Document analysis pipeline:**
```bash
# Convert all PDFs in directory
for file in documents/*.pdf; do
markitdown "$file" -o "markdown/$(basename "$file" .pdf).md"
done
```
## Plugin System
MarkItDown supports extensible plugins for custom conversion logic. Plugins are disabled by default for security:
```python
from markitdown import MarkItDown
# Enable plugins if needed
md = MarkItDown(enable_plugins=True)
```
## Resources
This skill includes comprehensive reference documentation for each capability:
- **references/document_conversion.md** - Detailed PDF, DOCX, PPTX, XLSX conversion options
- **references/media_processing.md** - Image OCR and audio transcription details
- **references/web_content.md** - HTML, YouTube, and EPUB extraction
- **references/structured_data.md** - CSV, JSON, XML conversion formats
- **references/advanced_integrations.md** - Azure Document Intelligence and LLM integration
- **scripts/batch_convert.py** - Batch processing utility for directoriesRelated Skills
zapier-workflows
Manage and trigger pre-built Zapier workflows and MCP tool orchestration. Use when user mentions workflows, Zaps, automations, daily digest, research, search, lead tracking, expenses, or asks to "run" any process. Also handles Perplexity-based research and Google Sheets data tracking.
writing-skills
Create and manage Claude Code skills in HASH repository following Anthropic best practices. Use when creating new skills, modifying skill-rules.json, understanding trigger patterns, working with hooks, debugging skill activation, or implementing progressive disclosure. Covers skill structure, YAML frontmatter, trigger types (keywords, intent patterns), UserPromptSubmit hook, and the 500-line rule. Includes validation and debugging with SKILL_DEBUG. Examples include rust-error-stack, cargo-dependencies, and rust-documentation skills.
writing-plans
Use when design is complete and you need detailed implementation tasks for engineers with zero codebase context - creates comprehensive implementation plans with exact file paths, complete code examples, and verification steps assuming engineer has minimal domain knowledge
workflow-orchestration-patterns
Design durable workflows with Temporal for distributed systems. Covers workflow vs activity separation, saga patterns, state management, and determinism constraints. Use when building long-running processes, distributed transactions, or microservice orchestration.
workflow-management
Create, debug, or modify QStash workflows for data updates and social media posting in the API service. Use when adding new automated jobs, fixing workflow errors, or updating scheduling logic.
workflow-interactive-dev
用于开发 FastGPT 工作流中的交互响应。详细说明了交互节点的架构、开发流程和需要修改的文件。
woocommerce-dev-cycle
Run tests, linting, and quality checks for WooCommerce development. Use when running tests, fixing code style, or following the development workflow.
woocommerce-code-review
Review WooCommerce code changes for coding standards compliance. Use when reviewing code locally, performing automated PR reviews, or checking code quality.
Wheels Migration Generator
Generate database-agnostic Wheels migrations for creating tables, altering schemas, and managing database changes. Use when creating or modifying database schema, adding tables, columns, indexes, or foreign keys. Prevents database-specific SQL and ensures cross-database compatibility.
webapp-testing
Toolkit for interacting with and testing local web applications using Playwright. Supports verifying frontend functionality, debugging UI behavior, capturing browser screenshots, and viewing browser logs.
web3-testing
Test smart contracts comprehensively using Hardhat and Foundry with unit tests, integration tests, and mainnet forking. Use when testing Solidity contracts, setting up blockchain test suites, or validating DeFi protocols.
web-research
Use this skill for requests related to web research; it provides a structured approach to conducting comprehensive web research