working-with-documents

Creates and edits Office documents: Word (.docx), PDF, and PowerPoint (.pptx). Use when working with document creation, PDF manipulation, presentation generation, tracked changes, or converting between formats.

242 stars

byaiskillstore

View on GitHub Installation ↓

Best use case

working-with-documents is best used when you need a repeatable AI agent workflow instead of a one-off prompt. It is especially useful for teams working in multi. Creates and edits Office documents: Word (.docx), PDF, and PowerPoint (.pptx). Use when working with document creation, PDF manipulation, presentation generation, tracked changes, or converting between formats.

Users should expect a more consistent workflow output, faster repeated execution, and less time spent rewriting prompts from scratch.

Practical example

Example input

Use the "working-with-documents" skill to help with this workflow task. Context: Creates and edits Office documents: Word (.docx), PDF, and PowerPoint (.pptx).
Use when working with document creation, PDF manipulation, presentation generation,
tracked changes, or converting between formats.

Example output

A structured workflow result with clearer steps, more consistent formatting, and an output that is easier to reuse in the next run.

When to use this skill

Use this skill when you want a reusable workflow rather than writing the same prompt again and again.

When not to use this skill

Do not use this when you only need a one-off answer and do not need a reusable workflow.
Do not use it if you cannot install or maintain the related files, repository context, or supporting tools.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/working-with-documents/SKILL.md --create-dirs "https://raw.githubusercontent.com/aiskillstore/marketplace/main/skills/asmayaseen/working-with-documents/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/working-with-documents/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How working-with-documents Compares

Feature / Agent	working-with-documents	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Working with Documents

## Quick Reference

| Format | Read | Create | Edit |
|--------|------|--------|------|
| DOCX | pandoc, python-docx | docx-js | OOXML (unpack/edit/pack) |
| PDF | pdfplumber, pypdf | reportlab | pypdf (merge/split) |
| PPTX | markitdown | html2pptx | OOXML (unpack/edit/pack) |

## Word Documents (.docx)

### Reading Content

```bash
# Convert to markdown (preserves structure)
pandoc document.docx -o output.md

# With tracked changes visible
pandoc --track-changes=all document.docx -o output.md
```

### Creating New Documents

Use **docx-js** (JavaScript):

```javascript
const { Document, Packer, Paragraph, TextRun } = require('docx');

const doc = new Document({
  sections: [{
    children: [
      new Paragraph({
        children: [
          new TextRun({ text: "Hello World", bold: true }),
        ],
      }),
    ],
  }],
});

Packer.toBuffer(doc).then(buffer => {
  fs.writeFileSync("output.docx", buffer);
});
```

### Editing Existing Documents (Tracked Changes)

```bash
# 1. Unpack
python ooxml/scripts/unpack.py document.docx unpacked/

# 2. Edit XML files in unpacked/word/document.xml
# Key files:
#   - word/document.xml (main content)
#   - word/comments.xml (comments)
#   - word/media/ (images)

# 3. Pack
python ooxml/scripts/pack.py unpacked/ edited.docx
```

**Tracked changes XML pattern:**
```xml
<!-- Deletion -->
<w:del><w:r><w:delText>old text</w:delText></w:r></w:del>

<!-- Insertion -->
<w:ins><w:r><w:t>new text</w:t></w:r></w:ins>
```

## PDF Documents

### Reading PDFs

```python
import pdfplumber

# Extract text
with pdfplumber.open("document.pdf") as pdf:
    for page in pdf.pages:
        print(page.extract_text())

# Extract tables
with pdfplumber.open("document.pdf") as pdf:
    for page in pdf.pages:
        tables = page.extract_tables()
        for table in tables:
            for row in table:
                print(row)
```

### Creating PDFs

```python
from reportlab.lib.pagesizes import letter
from reportlab.platypus import SimpleDocTemplate, Paragraph
from reportlab.lib.styles import getSampleStyleSheet

doc = SimpleDocTemplate("output.pdf", pagesize=letter)
styles = getSampleStyleSheet()
story = [
    Paragraph("Report Title", styles['Title']),
    Paragraph("Body text goes here.", styles['Normal']),
]
doc.build(story)
```

### Merging/Splitting PDFs

```python
from pypdf import PdfReader, PdfWriter

# Merge
writer = PdfWriter()
for pdf_file in ["doc1.pdf", "doc2.pdf"]:
    reader = PdfReader(pdf_file)
    for page in reader.pages:
        writer.add_page(page)
writer.write(open("merged.pdf", "wb"))

# Split
reader = PdfReader("input.pdf")
for i, page in enumerate(reader.pages):
    writer = PdfWriter()
    writer.add_page(page)
    writer.write(open(f"page_{i+1}.pdf", "wb"))
```

### Command-Line Tools

```bash
# Extract text
pdftotext input.pdf output.txt
pdftotext -layout input.pdf output.txt  # Preserve layout

# Merge with qpdf
qpdf --empty --pages file1.pdf file2.pdf -- merged.pdf

# Split pages
qpdf input.pdf --pages . 1-5 -- pages1-5.pdf
```

## PowerPoint Presentations (.pptx)

### Reading Content

```bash
# Convert to markdown
python -m markitdown presentation.pptx
```

### Creating New Presentations

Use **html2pptx** workflow:

1. Create HTML slides (720pt × 405pt for 16:9)
2. Convert with html2pptx.js library
3. Validate with thumbnail grid

```bash
# Create thumbnails for validation
python scripts/thumbnail.py output.pptx --cols 4
```

### Editing Existing Presentations

```bash
# 1. Unpack
python ooxml/scripts/unpack.py presentation.pptx unpacked/

# Key files:
#   - ppt/slides/slide1.xml, slide2.xml, etc.
#   - ppt/notesSlides/ (speaker notes)
#   - ppt/media/ (images)

# 2. Edit XML

# 3. Validate
python ooxml/scripts/validate.py unpacked/ --original presentation.pptx

# 4. Pack
python ooxml/scripts/pack.py unpacked/ edited.pptx
```

### Rearranging Slides

```bash
# Duplicate, reorder, delete slides
python scripts/rearrange.py template.pptx output.pptx 0,3,3,5,7
# Creates: slide 0, slide 3 (twice), slide 5, slide 7
```

## Converting Between Formats

```bash
# DOCX/PPTX to PDF
soffice --headless --convert-to pdf document.docx

# PDF to images
pdftoppm -jpeg -r 150 document.pdf page
# Creates: page-1.jpg, page-2.jpg, etc.

# DOCX to Markdown
pandoc document.docx -o output.md
```

## OCR for Scanned Documents

```python
import pytesseract
from pdf2image import convert_from_path

images = convert_from_path('scanned.pdf')
text = ""
for image in images:
    text += pytesseract.image_to_string(image)
```

## Design Guidelines (Presentations)

### Color Palettes

Pick 3-5 colors that work together:

| Palette | Colors |
|---------|--------|
| Classic Blue | Navy #1C2833, Slate #2E4053, Silver #AAB7B8 |
| Teal & Coral | Teal #5EA8A7, Coral #FE4447, White #FFFFFF |
| Black & Gold | Gold #BF9A4A, Black #000000, Cream #F4F6F6 |

### Web-Safe Fonts Only

Arial, Helvetica, Times New Roman, Georgia, Verdana, Tahoma, Trebuchet MS, Courier New, Impact

### Layout Rules

- Two-column: Use for exactly 2 distinct items
- Three-column: Use for exactly 3 items
- Never vertically stack charts below text
- Full-bleed images with text overlays work well

## Dependencies

```bash
# Python
pip install pypdf pdfplumber reportlab python-docx openpyxl

# System tools
apt-get install pandoc poppler-utils libreoffice

# Node.js (for docx-js)
npm install docx
```

## Verification

Run: `python scripts/verify.py`

## Related Skills

- `working-with-spreadsheets` - Excel file handling
- `building-nextjs-apps` - Frontend for document uploads

Related Skills

hybrid-cloud-networking

242

from aiskillstore/marketplace

Configure secure, high-performance connectivity between on-premises infrastructure and cloud platforms using VPN and dedicated connections. Use when building hybrid cloud architectures, connecting data centers to cloud, or implementing secure cross-premises networking.

azure-search-documents-ts

242

from aiskillstore/marketplace

Build search applications using Azure AI Search SDK for JavaScript (@azure/search-documents). Use when creating/managing indexes, implementing vector/hybrid search, semantic ranking, or building agentic retrieval with knowledge bases.

azure-search-documents-py

242

from aiskillstore/marketplace

Azure AI Search SDK for Python. Use for vector search, hybrid search, semantic ranking, indexing, and skillsets. Triggers: "azure-search-documents", "SearchClient", "SearchIndexClient", "vector search", "hybrid search", "semantic search".

azure-search-documents-dotnet

242

from aiskillstore/marketplace

Azure AI Search SDK for .NET (Azure.Search.Documents). Use for building search applications with full-text, vector, semantic, and hybrid search. Covers SearchClient (queries, document CRUD), SearchIndexClient (index management), and SearchIndexerClient (indexers, skillsets). Triggers: "Azure Search .NET", "SearchClient", "SearchIndexClient", "vector search C#", "semantic search .NET", "hybrid search", "Azure.Search.Documents".

working-with-spreadsheets

242

from aiskillstore/marketplace

Creates and edits Excel spreadsheets with formulas, formatting, and financial modeling standards. Use when working with .xlsx files, financial models, data analysis, or formula-heavy spreadsheets. Covers formula recalculation, color coding standards, and common pitfalls.

working-on-ancplua-plugins

242

from aiskillstore/marketplace

Primary instruction manual for working within the ancplua-claude-plugins monorepo. Use when creating, modifying, or debugging plugins in this repository.

p2p-networking

240

from aiskillstore/marketplace

Peer-to-peer networking patterns using commonware for building decentralized Guts network

azure-quotas

242

from aiskillstore/marketplace

Check/manage Azure quotas and usage across providers. For deployment planning, capacity validation, region selection. WHEN: "check quotas", "service limits", "current usage", "request quota increase", "quota exceeded", "validate capacity", "regional availability", "provisioning limits", "vCPU limit", "how many vCPUs available in my subscription".

DevOps & Infrastructure

raindrop-io

242

from aiskillstore/marketplace

Manage Raindrop.io bookmarks with AI assistance. Save and organize bookmarks, search your collection, manage reading lists, and organize research materials. Use when working with bookmarks, web research, reading lists, or when user mentions Raindrop.io.

Data & Research

zlibrary-to-notebooklm

242

from aiskillstore/marketplace

自动从 Z-Library 下载书籍并上传到 Google NotebookLM。支持 PDF/EPUB 格式，自动转换，一键创建知识库。

discover-skills

242

from aiskillstore/marketplace

当你发现当前可用的技能都不够合适（或用户明确要求你寻找技能）时使用。本技能会基于任务目标和约束，给出一份精简的候选技能清单，帮助你选出最适配当前任务的技能。

web-performance-seo

242

from aiskillstore/marketplace

Fix PageSpeed Insights/Lighthouse accessibility "!" errors caused by contrast audit failures (CSS filters, OKLCH/OKLAB, low opacity, gradient text, image backgrounds). Use for accessibility-driven SEO/performance debugging and remediation.