pdf-smart-split
Intelligent PDF splitting skill. Split multi-page scanned PDF documents into separate files based on content analysis. Features: (1) Convert PDF to high-quality images (2) Analyze and intelligently name images by content (3) Auto-detect and remove duplicate pages (4) Identify and reorder scrambled pages (5) Merge related images into separate PDFs by content (6) Generate summary reports. Use for: bond filing documents, application materials, contracts and agreements that need intelligent content-based splitting.
Best use case
pdf-smart-split is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using pdf-smart-split should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/pdf-smart-split/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
How pdf-smart-split Compares
| Feature / Agent | pdf-smart-split | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
It splits multi-page scanned PDF documents into separate files based on content analysis: it converts pages to images, names them by content, removes duplicate pages, reorders scrambled pages, merges related pages into separate PDFs, and generates a summary report.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# PDF Smart Split Skill
Split multi-page scanned PDF documents into separate files based on content analysis. Supports duplicate detection, page reordering, and intelligent naming.
## Workflow
### 1. PDF to Images
```python
from pdf2image import convert_from_path

# pdf_path is the scanned source PDF; the result is a list of PIL images, one per page
images = convert_from_path(pdf_path, dpi=200, thread_count=4)
```
- `dpi=200` for image quality
- `thread_count=4` for multi-threaded conversion speed
### 2. Duplicate Detection
```python
import hashlib

def get_image_hash(path):
    """Return the MD5 hex digest of the file's bytes."""
    with open(path, 'rb') as f:
        return hashlib.md5(f.read()).hexdigest()
```
- Use MD5 hash comparison
- Record duplicates for summary
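The hash comparison can be applied across a whole page set. The following is a minimal sketch, assuming the page images are already saved to disk; `find_duplicates` and its return shape are hypothetical names for illustration, not part of the skill's scripts.

```python
import hashlib
from pathlib import Path


def find_duplicates(image_paths):
    """Split paths into first-seen files and byte-identical repeats."""
    seen = {}        # MD5 digest -> first path that produced it
    unique = []      # paths to keep, in original order
    duplicates = []  # (duplicate_path, original_path) pairs for the summary
    for path in image_paths:
        digest = hashlib.md5(Path(path).read_bytes()).hexdigest()
        if digest in seen:
            duplicates.append((path, seen[digest]))
        else:
            seen[digest] = path
            unique.append(path)
    return unique, duplicates
```

Note that hashing file bytes only matches byte-identical images; the same sheet scanned twice will usually produce different bytes and would need content-level comparison.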
### 3. Content Analysis and Naming
Use the `view_file` tool to analyze each image and determine:
- File type (cover, content, signature page)
- Document category (application, agreement, statement)
- Page sequence number
Naming format: `sequence_filetype_description.jpg`
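One way to produce names in this format is sketched below. The helper `build_image_name`, the zero-padded sequence, and the replacement of spaces and unsafe characters with hyphens are all assumptions for illustration; the skill does not specify them.

```python
import re


def build_image_name(sequence, file_type, description, width=3):
    """Return a name in the form sequence_filetype_description.jpg."""
    def clean(text):
        # Collapse whitespace, underscores, and filesystem-unsafe characters
        # into "-" so that "_" stays reserved as the field separator.
        return re.sub(r'[\s\\/:*?"<>|_]+', "-", text.strip()).strip("-")

    # Zero-pad the sequence so names sort correctly (assumed convention).
    return f"{sequence:0{width}d}_{clean(file_type)}_{clean(description)}.jpg"
```

For example, `build_image_name(1, "cover", "bond application")` gives `001_cover_bond-application.jpg`.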
### 4. Page Reordering
Analyze content continuity:
- Check table sequence numbers
- Check chapter numbering progression
- Ensure signature pages follow content
Reorder by logical content, not physical order.
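Once each page has been analyzed, the reordering rules above could be applied as a sort. This is a sketch under stated assumptions: the per-page dictionary keys (`physical`, `document`, `page_no`, `signature`) and the function `reorder_pages` are hypothetical, and real documents may need more signals than a page number.

```python
def reorder_pages(pages):
    """Sort analyzed pages into logical order.

    Each page is a dict with hypothetical keys:
      physical  - 1-based position in the scanned PDF
      document  - label of the document the page belongs to
      page_no   - page or section number read from the content
      signature - True for a signature page
    """
    # Keep documents in the order they first appear in the scan.
    first_seen = {}
    for page in pages:
        doc = page["document"]
        first_seen[doc] = min(first_seen.get(doc, page["physical"]), page["physical"])

    # Within a document: content pages by number, then signature pages.
    ordered = sorted(
        pages,
        key=lambda p: (first_seen[p["document"]], p["signature"], p["page_no"]),
    )
    # Record every page whose position changed, for the summary report.
    moves = [
        (p["physical"], new_pos)
        for new_pos, p in enumerate(ordered, start=1)
        if p["physical"] != new_pos
    ]
    return ordered, moves
```

Sorting on `signature` before `page_no` places signature pages after the content of their document, even if the number read from them is wrong or missing a counterpart.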
### 5. Merge Images to PDF
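```python
from PIL import Image

def images_to_pdf(image_paths, output_pdf):
    """Write the images, in order, as pages of a single PDF."""
    # PDF output needs RGB; convert in case a page is in another mode
    images = [Image.open(p).convert('RGB') for p in image_paths]
    images[0].save(output_pdf, "PDF", save_all=True, append_images=images[1:])
```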
- Merge signature pages with their content
- Combine continuous content into single files
### 6. Generate Summary
Summary should include:
- Source file info (name, original pages)
- Duplicate removal records (if any)
- Page reorder records (if any)
- Split file list (name, original pages, page count, description)
- Output directory paths
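The summary fields listed above can be assembled into a plain-text report. The sketch below is illustrative only: `build_summary`, its parameter names, and the report layout are assumptions, since the skill does not define a report format.

```python
def build_summary(source_name, original_pages, duplicates, moves, split_files, output_dirs):
    """Return a plain-text summary report.

    duplicates:  (duplicate_page, original_page) pairs
    moves:       (old_position, new_position) pairs
    split_files: dicts with "name", "pages" (original page numbers), "description"
    """
    lines = [f"Source file: {source_name} ({original_pages} pages)"]

    lines.append("Duplicates removed:")
    lines += [f"  page {dup} duplicates page {orig}" for dup, orig in duplicates] or ["  none"]

    lines.append("Page reorders:")
    lines += [f"  page {old} moved to position {new}" for old, new in moves] or ["  none"]

    lines.append("Split files:")
    for item in split_files:
        pages = ", ".join(str(p) for p in item["pages"])
        lines.append(
            f"  {item['name']}: original pages {pages} "
            f"({len(item['pages'])} pages), {item['description']}"
        )

    lines.append("Output directories:")
    lines += [f"  {d}" for d in output_dirs]
    return "\n".join(lines)
```

Sections with no records print `none`, which matches the "(if any)" wording for duplicate and reorder records.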
## Script Usage
### Full Processing
```bash
python scripts/convert_dedup.py <pdf_path> [output_dir]
python scripts/merge_images.py <image1> <image2> ... <output.pdf>
```
## Output Structure
```
filename_images/ # Intelligently named images
filename_split_PDF/ # Content-grouped PDFs
```
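The two directory names can be derived from the source PDF's name. This sketch assumes that `filename` means the PDF name without its extension and that the directories sit next to the PDF unless an output root is given; `output_dirs` is a hypothetical helper, not one of the skill's scripts.

```python
from pathlib import Path


def output_dirs(pdf_path, output_root=None):
    """Return (images_dir, split_dir) for a source PDF without creating them."""
    pdf = Path(pdf_path)
    # Default to the PDF's own directory when no output root is given (assumption).
    root = Path(output_root) if output_root else pdf.parent
    images_dir = root / f"{pdf.stem}_images"
    split_dir = root / f"{pdf.stem}_split_PDF"
    return images_dir, split_dir
```

Call `mkdir(parents=True, exist_ok=True)` on each returned path before writing files into it.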
## Requirements
- Install: `pdf2image`, `Pillow`
- Windows needs `poppler`: `conda install -c conda-forge poppler`