pdf

Use this skill for PDF generation, conversion, inspection, extraction, editing, form filling, OCR, redaction, or render comparison. Triggers include requests to create a PDF; convert Markdown, HTML, LaTeX, DOCX, or PPTX to PDF; extract text, tables, or images; fill or inspect forms; OCR scans; compare revisions; or redact content.

465 stars

by phodal


Best use case

pdf is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using pdf should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/pdf/SKILL.md --create-dirs "https://raw.githubusercontent.com/phodal/routa/main/tools/office-skills/.agents/skills/pdf/SKILL.md"

Manual Installation

1. Download SKILL.md from GitHub.
2. Place it in .claude/skills/pdf/SKILL.md inside your project.
3. Restart your AI agent so that it can auto-discover the skill.

How pdf Compares

| Feature / Agent | pdf | Standard Approach |
| --- | --- | --- |
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

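What does this skill do?

It covers PDF generation, conversion, inspection, extraction, editing, form filling, OCR, redaction, and render comparison, using the repo-local toolkit under tools/pdfs/.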

Where can I find the source code?

The source lives in the phodal/routa repository on GitHub, at tools/office-skills/.agents/skills/pdf/SKILL.md.

SKILL.md Source

# PDF Skill

Use the repo-local toolkit under `tools/pdfs/`. The default operating loop is:

1. Render to images.
2. Inspect layout visually.
3. Perform the edit, extraction, or generation.
4. Re-render and verify.
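
The loop above can be sketched as a small command planner. This is only an illustration: the script paths and the `--out_dir` / `--dpi` flags are taken from the commands in this document, while the function names, the `work_dir` default, and the idea of planning commands as argv lists are assumptions made for the example.

```python
from __future__ import annotations

SCRIPTS = "tools/pdfs/scripts"  # toolkit location named in this skill


def render_cmd(pdf: str, out_dir: str, dpi: int = 200) -> list[str]:
    """Build the argv for render_pdf.py (used in steps 1 and 4)."""
    return ["python3", f"{SCRIPTS}/render_pdf.py", pdf,
            "--out_dir", out_dir, "--dpi", str(dpi)]


def compare_cmd(before: str, after: str, out_dir: str, dpi: int = 200) -> list[str]:
    """Build the argv for compare_renders.py (used in step 4)."""
    return ["python3", f"{SCRIPTS}/compare_renders.py", before, after,
            "--out_dir", out_dir, "--dpi", str(dpi)]


def operating_loop(src: str, edited: str, edit_cmd: list[str],
                   work_dir: str = "/tmp/pdf-work") -> list[list[str]]:
    """Return the commands for one pass of the loop, in order.

    Step 2 (visual inspection of the PNGs) is manual, so it has no command.
    """
    return [
        render_cmd(src, f"{work_dir}/before"),          # 1. render to images
        edit_cmd,                                       # 3. edit, extract, or generate
        render_cmd(edited, f"{work_dir}/after"),        # 4. re-render ...
        compare_cmd(src, edited, f"{work_dir}/diff"),   #    ... and verify
    ]
```

Each entry can be passed to `subprocess.run(cmd, check=True)`. Keeping planning separate from execution makes the sequence easy to log or dry-run.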

## Choose the right authoring path first

Even if the user wants a PDF deliverable, PDF is not always the right authoring format.

- Text-heavy business docs: author in DOCX first, then convert with `python3 tools/pdfs/scripts/lo_convert_to_pdf.py ...`
- Slide-like visual layouts: author in PPTX first, then export to PDF
- Direct PDF generation or low-level edits: use this toolkit

If you are hand-tuning line breaks in a programmatically generated PDF, stop and reconsider whether DOCX or PPTX is the better source format.

## Core loop

Render before and after any meaningful change:

```bash
# Render the input PDF to PNGs at 200 DPI.
python3 tools/pdfs/scripts/render_pdf.py input.pdf --out_dir /tmp/pdf-renders-in --dpi 200
# Render both versions and diff the results.
python3 tools/pdfs/scripts/compare_renders.py before.pdf after.pdf --out_dir /tmp/pdf-diff --dpi 200
```

Rendered PNGs are the source of truth for layout QA. Do not trust extracted text alone for tables, forms, spacing, or clipping.
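
As a rough triage step before looking at the images, a sketch like the following can list which rendered pages differ between two render directories. It assumes that `render_pdf.py` writes one `.png` per page with the same file name on every run, which this document does not confirm. `changed_pages` is a hypothetical helper, not part of the toolkit. Byte-level comparison is stricter than a visual diff, so it only narrows down which pages to inspect.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def changed_pages(before_dir: str, after_dir: str) -> dict[str, str]:
    """Map PNG file name -> 'changed', 'added', or 'removed'.

    Pages whose bytes are identical in both directories are omitted.
    """
    def digests(folder: str) -> dict[str, str]:
        # Hash every PNG so large renders are compared cheaply.
        return {p.name: hashlib.sha256(p.read_bytes()).hexdigest()
                for p in Path(folder).glob("*.png")}

    before, after = digests(before_dir), digests(after_dir)
    report: dict[str, str] = {}
    for name in sorted(before.keys() | after.keys()):
        if name not in after:
            report[name] = "removed"
        elif name not in before:
            report[name] = "added"
        elif before[name] != after[name]:
            report[name] = "changed"
    return report
```

An empty result means the renders are byte-identical. Any listed page still needs a visual check.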

## Common workflows

### Inspect / extract

```bash
python3 tools/pdfs/scripts/pdf_inspect.py input.pdf
python3 tools/pdfs/scripts/pdf_extract.py text input.pdf --method pdfplumber
python3 tools/pdfs/scripts/pdf_extract.py tables input.pdf
python3 tools/pdfs/scripts/pdf_extract.py forms input.pdf --include_widgets
```

### Edit / normalize

```bash
python3 tools/pdfs/scripts/pdf_edit.py paginate input.pdf -o output.pdf
python3 tools/pdfs/scripts/pdf_edit.py merge a.pdf b.pdf -o merged.pdf
python3 tools/pdfs/scripts/pdf_edit.py rotate input.pdf -o rotated.pdf --pages 1 --degrees 90
python3 tools/pdfs/scripts/pdf_preflight.py input.pdf
```

### Redact / OCR

```bash
python3 tools/pdfs/scripts/pdf_redact.py text input.pdf redacted.pdf --text "secret" --ignore_case
python3 tools/pdfs/scripts/ocr_pdf.py scan.pdf -o searchable.pdf --force
```
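
After a redaction, one way to apply the render-and-verify loop is to re-extract the text of `redacted.pdf` and confirm that the redacted terms are gone from the text layer. The sketch below assumes the extracted text is available as a plain Python string. `leaked_terms` is a hypothetical helper, not part of the toolkit. It complements the visual check of the rendered pages and does not replace it.

```python
from __future__ import annotations


def leaked_terms(extracted_text: str, terms: list[str],
                 ignore_case: bool = True) -> list[str]:
    """Return the terms that still appear in the extracted text."""
    haystack = extracted_text.casefold() if ignore_case else extracted_text
    leaked = []
    for term in terms:
        needle = term.casefold() if ignore_case else term
        if needle in haystack:
            leaked.append(term)
    return leaked
```

A non-empty result means the redaction left searchable text behind.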

### Create / convert

```bash
python3 tools/pdfs/scripts/md_to_pdf.py input.md -o output.pdf
python3 tools/pdfs/scripts/html_to_pdf.py input.html -o output.pdf
python3 tools/pdfs/scripts/latex_to_pdf.py input.tex -o output.pdf
python3 tools/pdfs/scripts/lo_convert_to_pdf.py input.docx -o output.pdf
```

### Forms

Best-effort Python path:

```bash
python3 tools/pdfs/scripts/pdf_edit.py fill-form in.pdf --values values.json -o out.pdf
```

If the Python path fails or leaves fields unfilled, use the Node helpers:

```bash
bash tools/pdfs/js/install_deps.sh
node tools/pdfs/js/extract_form_fields.mjs --input in.pdf
node tools/pdfs/js/fill_form.mjs --input in.pdf --values values.json --output out.pdf --flatten
```
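
The fallback order above can be sketched as a small wrapper. It assumes both tools signal failure through a non-zero exit code and that `install_deps.sh` has already been run; neither is confirmed by this document. `fill_form` and `Runner` are hypothetical names. A fill that exits cleanly but renders incorrectly will not be caught here, so the rendered output still needs checking.

```python
from __future__ import annotations

import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]  # takes an argv, returns an exit code


def run(argv: Sequence[str]) -> int:
    """Default runner: execute the command and return its exit code."""
    return subprocess.run(list(argv)).returncode


def fill_form(src: str, values: str, out: str, runner: Runner = run) -> str:
    """Try the Python path first, then the Node helper. Return which one worked."""
    python_cmd = ["python3", "tools/pdfs/scripts/pdf_edit.py", "fill-form",
                  src, "--values", values, "-o", out]
    if runner(python_cmd) == 0:
        return "python"
    node_cmd = ["node", "tools/pdfs/js/fill_form.mjs", "--input", src,
                "--values", values, "--output", out, "--flatten"]
    if runner(node_cmd) == 0:
        return "node"
    raise RuntimeError("both form-filling paths failed")
```

Passing the runner as a parameter keeps the fallback logic testable without the toolkit installed.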

## Quality bar for generated PDFs

- No clipped text, overlaps, broken glyphs, or boundary-hugging table content
- Verify visually after each material change
- Prefer generous spacing and intentional column widths over dense layouts
- Keep captions, tables, and figures visually paired
- For tricky forms, verify in two renderers when possible

## Load extra references only when needed

- `tools/pdfs/tasks/js_tools.md`: Node helpers for forms and PDF.js extraction
- `tools/pdfs/tasks/forms_debugging.md`: widget-level debugging workflow
- `tools/pdfs/troubleshooting/common.md`: renderer and OCR troubleshooting
- `tools/pdfs/examples/smoke_test.md`: runnable smoke flows

## Toolkit map

- `tools/pdfs/scripts/render_pdf.py`: render PDF pages to PNGs
- `tools/pdfs/scripts/compare_renders.py`: render and diff two PDFs
- `tools/pdfs/scripts/pdf_inspect.py`: metadata and structure overview
- `tools/pdfs/scripts/pdf_extract.py`: text, tables, images, attachments, annotations, forms
- `tools/pdfs/scripts/pdf_edit.py`: merge, split, rotate, crop, paginate, encrypt, optimize, fill-form
- `tools/pdfs/scripts/pdf_preflight.py`: warnings and normalization hints
- `tools/pdfs/scripts/pdf_redact.py`: true redaction
- `tools/pdfs/scripts/ocr_pdf.py`: OCR wrapper
- `tools/pdfs/scripts/md_to_pdf.py`: Markdown to PDF
- `tools/pdfs/scripts/html_to_pdf.py`: HTML to PDF
- `tools/pdfs/scripts/latex_to_pdf.py`: LaTeX to PDF
- `tools/pdfs/scripts/lo_convert_to_pdf.py`: LibreOffice-based conversion
- `tools/pdfs/js/*.mjs`: PDF.js and pdf-lib helpers

## Final deliverable expectations

- Keep only the final PDF in the requested output location unless the user asked for intermediates.
- When the task is layout-sensitive, include a quick render verification pass before stopping.
- Prefer ASCII `-` over typographic dashes in generated content when renderer compatibility is uncertain.
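
The dash guideline can be applied mechanically before content is handed to a renderer. A minimal sketch follows. The set of characters to replace is a choice made for this example (it also folds the minus sign U+2212), and `ascii_dashes` is a hypothetical helper, not part of the toolkit.

```python
# Typographic dash-like characters mapped to the ASCII hyphen-minus.
DASHES = {
    "\u2010": "-",  # hyphen
    "\u2011": "-",  # non-breaking hyphen
    "\u2012": "-",  # figure dash
    "\u2013": "-",  # en dash
    "\u2014": "-",  # em dash
    "\u2015": "-",  # horizontal bar
    "\u2212": "-",  # minus sign
}
_TABLE = str.maketrans(DASHES)


def ascii_dashes(text: str) -> str:
    """Replace typographic dashes with ASCII '-' and leave other text untouched."""
    return text.translate(_TABLE)
```

For example, `ascii_dashes("2019\u20132024")` returns `"2019-2024"`.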

Related Skills

spreadsheets

465

from phodal/routa

Use this skill for spreadsheet creation, editing, analysis, formatting, formula modeling, charting, or workbook review. Triggers include requests to create or modify an .xlsx file, build a model or tracker, format a workbook, add formulas or charts, or prepare a shareable spreadsheet deliverable.

slide

465

from phodal/routa

Use this skill as reference material when creating or editing presentation slide decks.

docx

465

from phodal/routa

Use this skill for creating, editing, and reviewing DOCX files, including generation, formatting, content controls, tracked changes, comments, accessibility checks, redaction, rendering, and diff-based QA workflows.

pr-verify

465

from phodal/routa

Comprehensive PR verification skill. Analyzes PR body requirements, reviews comments, checks CI status, and performs E2E testing. Use when a PR is ready for final verification before merge.

playwright-cli

465

from phodal/routa

Automates browser interactions for web testing, form filling, screenshots, and data extraction. Use when the user needs to navigate websites, interact with web pages, fill forms, take screenshots, test web applications, or extract information from web pages.

issue-garbage-collector

465

from phodal/routa

Two-phase cleanup of duplicate and outdated issue files in docs/issues/. Phase 1 uses a Python script for fast pattern matching. Phase 2 uses `claude -p` for semantic analysis on suspects only.

issue-enricher

465

from phodal/routa

Transforms rough requirements into well-structured GitHub issues. Use when the user provides a vague idea, feature request, or problem description and wants to create a GitHub issue. Analyzes codebase, explores solution approaches, researches relevant libraries, and generates actionable issues using `gh` CLI.

evolution-architecture-review

465

from phodal/routa

Multi-agent architecture evolvability review for this repository. Use when the user wants to analyze current architecture quality, evolvability, fitness functions, coupling, boundary clarity, delivery flow, or phased evolution strategy. Designed to be invoked from Claude Code with prompts like `/evolution-architecture-review analyze the current architecture evolvability`.

slack

465

from phodal/routa

Interact with Slack workspaces using browser automation. Use when the user needs to check unread channels, navigate Slack, send messages, extract data, find information, search conversations, or automate any Slack task. Triggers include "check my Slack", "what channels have unreads", "send a message to", "search Slack for", "extract from Slack", "find who said", or any task requiring programmatic Slack interaction.

electron

465

from phodal/routa

Automate Electron desktop apps (VS Code, Slack, Discord, Figma, Notion, Spotify, etc.) using agent-browser via Chrome DevTools Protocol. Use when the user needs to interact with an Electron app, automate a desktop app, connect to a running app, control a native app, or test an Electron application. Triggers include "automate Slack app", "control VS Code", "interact with Discord app", "test this Electron app", "connect to desktop app", or any task requiring automation of a native Electron application.

dogfood

465

from phodal/routa

Systematically explore and test a web application to find bugs, UX issues, and other problems. Use when asked to "dogfood", "QA", "exploratory test", "find issues", "bug hunt", "test this app/site/platform", or review the quality of a web application. Produces a structured report with full reproduction evidence -- step-by-step screenshots, repro videos, and detailed repro steps for every issue -- so findings can be handed directly to the responsible teams.

agent-browser

465

from phodal/routa

Browser automation CLI for AI agents. Use when the user needs to interact with websites, including navigating pages, filling forms, clicking buttons, taking screenshots, extracting data, testing web apps, or automating any browser task. Triggers include requests to "open a website", "fill out a form", "click a button", "take a screenshot", "scrape data from a page", "test this web app", "login to a site", "automate browser actions", or any task requiring programmatic web interaction.