office-docs

Comprehensive document processing for Microsoft Word (.docx) and WPS Office files. Use when Codex needs to work with professional documents for: (1) Creating new documents, (2) Modifying or editing content, (3) Converting between formats, (4) Extracting text and metadata, (5) Troubleshooting document issues, (6) Batch processing documents, or any other Office document tasks.

3,891 stars

Best use case

office-docs is best used when you need a repeatable AI agent workflow instead of a one-off prompt.


Teams using office-docs can expect more consistent output, faster repeated runs, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

curl -o ~/.claude/skills/office-docs/SKILL.md --create-dirs "https://raw.githubusercontent.com/openclaw/skills/main/skills/baiyunrei2025/office-docs/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/office-docs/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How office-docs Compares

| Feature / Agent | office-docs | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

It provides workflows for Microsoft Word (.docx) and WPS Office files: creating and editing documents, converting between formats, extracting text and metadata, troubleshooting problems, and batch processing.

Where can I find the source code?

The source is in the openclaw/skills repository on GitHub, at skills/baiyunrei2025/office-docs/SKILL.md (the same path used in the installation command).


SKILL.md Source

# Office Documents Skill

This skill provides comprehensive tools and workflows for working with Microsoft Word (.docx) and WPS Office documents. It covers creation, editing, conversion, analysis, and troubleshooting of professional documents.

## Quick Start

### Basic Operations

**Read document content:**
```python
# Use python-docx for .docx files
from docx import Document
doc = Document('document.docx')
text = '\n'.join([paragraph.text for paragraph in doc.paragraphs])
```

**Create new document:**
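```python
from docx import Document
from docx.shared import Inches

doc = Document()
doc.add_heading('Document Title', 0)      # level 0 uses the "Title" style
doc.add_paragraph('This is a new paragraph.')
doc.sections[0].left_margin = Inches(1)   # Inches() converts to Word's length units
doc.save('new_document.docx')
```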

### Common Tasks

1. **Text extraction** - See [TEXT_EXTRACTION.md](references/TEXT_EXTRACTION.md)
2. **Format conversion** - See [CONVERSION.md](references/CONVERSION.md)
3. **Document analysis** - See [ANALYSIS.md](references/ANALYSIS.md)
4. **Troubleshooting** - See [TROUBLESHOOTING.md](references/TROUBLESHOOTING.md)

## Core Tools and Libraries

### Python Libraries

**For .docx files:**
- `python-docx` - Primary library for reading/writing .docx
- `docx2txt` - Simple text extraction
- `docxcompose` - Advanced document composition
- `docx-mailmerge` - Mail merge functionality

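**For WPS files:**
- No mainstream Python library reads WPS Office formats directly (`pywps` on PyPI implements the OGC Web Processing Service standard and is unrelated to WPS Office)
- Convert to .docx first, for example with LibreOffice or by re-saving from WPS Office, which supports .docx natively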
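Before choosing a library or converter, it helps to know which container a file really uses, since extensions are often wrong. The heuristic below checks well-known leading bytes: ZIP (`PK\x03\x04`, used by .docx), OLE2 compound files (used by legacy .doc and, usually, .wps), and RTF (`{\rtf`). The function name and return labels are illustrative.

```python
def sniff_container(path):
    """Guess a document's container from its first bytes (heuristic sketch)."""
    with open(path, 'rb') as f:
        head = f.read(8)
    if head.startswith(b'PK\x03\x04'):
        return 'zip'   # OOXML (.docx) or ODF (.odt)
    if head.startswith(b'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1'):
        return 'ole2'  # legacy binary formats such as .doc, usually .wps too
    if head.startswith(b'{\\rtf'):
        return 'rtf'
    return 'unknown'
```

A `.docx` that sniffs as `ole2` is usually a renamed legacy file and needs conversion before `python-docx` can open it.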

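**For format conversion** (command-line tools, called from Python via `subprocess`):
- `pandoc` - Universal document converter
- `libreoffice` - Office suite with a headless conversion mode
- `unoconv` - LibreOffice-based converter (deprecated upstream in favor of `unoserver`)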

### Command Line Tools

**Document conversion:**
```bash
# Convert .docx to PDF
libreoffice --headless --convert-to pdf document.docx

# Convert .docx to plain text (-t plain sets the output format explicitly)
pandoc document.docx -t plain -o document.txt

# Batch convert WPS to .docx
for file in *.wps; do libreoffice --headless --convert-to docx "$file"; done
```

**Document analysis:**
```bash
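# Extract metadata
exiftool document.docx

# Identify file type (a healthy .docx reports as a Word 2007+ document or ZIP archive)
file document.docx

# Check integrity of the ZIP container
unzip -t document.docx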
```

## Workflows

### 1. Document Creation Workflow

When creating new documents:

1. **Choose template** - Start from template or create from scratch
2. **Add structure** - Headings, paragraphs, lists
3. **Apply formatting** - Styles, fonts, spacing
4. **Add elements** - Tables, images, hyperlinks
5. **Finalize** - Page setup, headers/footers, save

See [CREATION.md](references/CREATION.md) for detailed patterns.

### 2. Document Editing Workflow

When modifying existing documents:

1. **Backup original** - Always create backup first
2. **Analyze structure** - Understand document layout
3. **Make changes** - Edit content, update formatting
4. **Preserve formatting** - Maintain original styles
5. **Validate** - Check for corruption, save new version

See [EDITING.md](references/EDITING.md) for detailed patterns.

### 3. Conversion Workflow

When converting between formats:

1. **Identify source format** - .docx, .wps, .doc, .rtf, etc.
2. **Choose conversion tool** - Based on format and requirements
3. **Convert** - With appropriate options
4. **Verify** - Check content preservation
5. **Clean up** - Remove temporary files

See [CONVERSION.md](references/CONVERSION.md) for detailed patterns.
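The convert-and-verify steps can be wrapped around LibreOffice's headless mode. This sketch assumes the binary is on `PATH` as `soffice` or `libreoffice` and that `target_ext` is a plain extension such as `pdf` or `docx` (not a filter string). It checks for the output file explicitly because LibreOffice can exit successfully without producing one. The function names are illustrative.

```python
import shutil
import subprocess
from pathlib import Path

def build_convert_command(src, target_ext, outdir, binary=None):
    """Build a headless LibreOffice conversion command (sketch)."""
    # The executable is named `soffice` or `libreoffice` depending on the platform
    binary = binary or shutil.which('soffice') or shutil.which('libreoffice') or 'soffice'
    return [binary, '--headless', '--convert-to', target_ext,
            '--outdir', str(outdir), str(src)]

def convert(src, target_ext, outdir):
    """Run the conversion and verify that the output file appeared."""
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_convert_command(src, target_ext, outdir),
                   check=True, capture_output=True, timeout=300)
    result = outdir / (Path(src).stem + '.' + target_ext)
    if not result.exists():
        raise RuntimeError(f'LibreOffice produced no {target_ext} output for {src}')
    return result
```

Pointing `outdir` at a `tempfile.TemporaryDirectory()` keeps intermediate files out of the source folder and handles the clean-up step automatically.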

## Common Issues and Solutions

### 1. Corrupted Documents

**Symptoms:** Won't open, error messages, missing content

**Solutions:**
- Try opening in different application
- Use recovery mode in Word/WPS
- If `python-docx` refuses to open the file, read `word/document.xml` from the ZIP container directly (python-docx has no error-tolerant mode)
- Convert to different format and back

See [TROUBLESHOOTING.md](references/TROUBLESHOOTING.md#corruption) for detailed recovery procedures.
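When `python-docx` fails because of a broken relationship or a damaged auxiliary part, the main text is often still intact in `word/document.xml`. The fallback below uses only the standard library to join the `<w:t>` text nodes of each `<w:p>` paragraph. It is a salvage sketch, not a repair: it raises `zipfile.BadZipFile` if the ZIP itself is damaged and `ET.ParseError` if the main XML is malformed, and text inside text boxes may appear twice.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace
W_NS = '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}'

def salvage_text(path):
    """Pull paragraph text straight from word/document.xml (fallback sketch)."""
    with zipfile.ZipFile(path) as zf:
        xml_bytes = zf.read('word/document.xml')
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for p in root.iter(W_NS + 'p'):
        # Concatenate every text node (<w:t>) in the paragraph
        paragraphs.append(''.join(t.text or '' for t in p.iter(W_NS + 't')))
    return '\n'.join(paragraphs)
```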

### 2. Formatting Issues

**Symptoms:** Wrong fonts, broken layout, missing styles

**Solutions:**
- Check style definitions
- Verify font availability
- Use template-based approach
- Simplify complex formatting

### 3. Compatibility Problems

**Symptoms:** Different appearance in Word vs WPS, missing features

**Solutions:**
- Stick to common features
- Test in both applications
- Use standard formats
- Provide alternative versions

## Advanced Features

### Document Automation

**Batch processing:**
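```python
import os
from docx import Document

def process_documents(folder_path, process_single_document):
    """Call process_single_document(doc, path) for each .docx in folder_path."""
    results = {}
    for filename in sorted(os.listdir(folder_path)):
        # Skip Word's temporary lock files such as "~$report.docx"
        if filename.endswith('.docx') and not filename.startswith('~$'):
            doc_path = os.path.join(folder_path, filename)
            results[filename] = process_single_document(Document(doc_path), doc_path)
    return results
```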

**Template-based generation:**
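```python
from docx import Document

def iter_paragraphs(doc):
    """Yield body paragraphs, then paragraphs inside table cells."""
    yield from doc.paragraphs
    for table in doc.tables:
        for row in table.rows:
            for cell in row.cells:
                yield from cell.paragraphs

def generate_from_template(template_path, data):
    doc = Document(template_path)
    for paragraph in iter_paragraphs(doc):
        for key, value in data.items():
            placeholder = f'{{{{ {key} }}}}'  # renders as "{{ key }}"
            if placeholder in paragraph.text:
                # Assigning .text rebuilds the paragraph as a single run,
                # so character formatting inside that paragraph is lost
                paragraph.text = paragraph.text.replace(placeholder, str(value))
    return doc
```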

### Document Analysis

**Extract statistics:**
```python
from docx import Document

def analyze_document(doc_path):
    doc = Document(doc_path)
    stats = {
        'paragraphs': len(doc.paragraphs),   # body only; excludes table cells
        'tables': len(doc.tables),
        'images': len(doc.inline_shapes),    # inline pictures only; floating shapes are not counted
        'sections': len(doc.sections),
        'styles': len(doc.styles)
    }
    return stats
```

**Check formatting consistency:**
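```python
def check_formatting(doc):
    """Flag Normal-style paragraphs whose runs carry direct font overrides."""
    issues = []
    for i, para in enumerate(doc.paragraphs):
        if para.style.name == 'Normal' and para.text.strip():
            for run in para.runs:
                # Font properties are None when inherited from the style;
                # a value here means manual (direct) formatting
                if run.font.name is not None or run.font.size is not None:
                    issues.append(f"Paragraph {i}: direct font formatting overrides Normal style")
                    break
    return issues
```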

## Best Practices

### 1. Always Backup
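```python
import shutil
from datetime import datetime

def backup_document(filepath):
    # Timestamped name so repeated backups do not overwrite each other
    stamp = datetime.now().strftime('%Y%m%d-%H%M%S')
    backup_path = f'{filepath}.{stamp}.backup'
    shutil.copy2(filepath, backup_path)  # copy2 also preserves file timestamps
    return backup_path
```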

### 2. Use Version Control
- Save incremental versions
- Use descriptive filenames
- Document changes made
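Incremental versions are easier to manage with a predictable naming scheme. The helper below assumes a hypothetical `_vN` suffix convention (`report.docx` becomes `report_v1.docx`, `report_v1.docx` becomes `report_v2.docx`) and skips any number that already exists so nothing is overwritten.

```python
import re
from pathlib import Path

def next_version_path(path):
    """Return the next unused `_vN` filename for path."""
    p = Path(path)
    match = re.fullmatch(r'(.*)_v(\d+)', p.stem)
    if match:
        base, num = match.group(1), int(match.group(2)) + 1
    else:
        base, num = p.stem, 1
    candidate = p.with_name(f'{base}_v{num}{p.suffix}')
    # Skip numbers that are already taken
    while candidate.exists():
        num += 1
        candidate = p.with_name(f'{base}_v{num}{p.suffix}')
    return candidate
```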

### 3. Test Thoroughly
- Test in target application
- Verify all content preserved
- Check formatting integrity

### 4. Handle Errors Gracefully
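```python
from docx import Document

def read_text(filepath):
    try:
        doc = Document(filepath)
    except Exception as e:  # broad on purpose: missing file, bad ZIP, malformed XML
        print(f"Error opening {filepath}: {e}")
        # Try alternative methods; extract_text_fallback is a placeholder you supply
        # (e.g. parse word/document.xml directly, see TROUBLESHOOTING.md)
        return extract_text_fallback(filepath)
    return '\n'.join(p.text for p in doc.paragraphs)
```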

## Reference Files

For detailed information on specific topics, consult these reference files:

- [TEXT_EXTRACTION.md](references/TEXT_EXTRACTION.md) - Text extraction methods and patterns
- [CONVERSION.md](references/CONVERSION.md) - Format conversion guides
- [ANALYSIS.md](references/ANALYSIS.md) - Document analysis techniques
- [TROUBLESHOOTING.md](references/TROUBLESHOOTING.md) - Common issues and solutions
- [CREATION.md](references/CREATION.md) - Document creation patterns
- [EDITING.md](references/EDITING.md) - Document editing workflows
- [AUTOMATION.md](references/AUTOMATION.md) - Automation scripts and templates

## Scripts

Available scripts in the `scripts/` directory:

- `extract_text.py` - Extract text from .docx files
- `convert_format.py` - Convert between document formats
- `batch_process.py` - Process multiple documents
- `document_stats.py` - Generate document statistics
- `repair_document.py` - Attempt to repair corrupted documents

Run scripts with appropriate parameters:
```bash
python scripts/extract_text.py input.docx output.txt
```

## Getting Help

If you encounter issues not covered in this skill:

1. Check the relevant reference file
2. Search for specific error messages
3. Try alternative approaches
4. Consider converting to a simpler format

Remember: When in doubt, create a backup and work on a copy.

Related Skills

star-office-ui

3891
from openclaw/skills

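Star Office UI one-click Skill: helps the owner quickly deploy a pixel-art office dashboard, with support for multiple agents joining, status visualization, mobile viewing, and public internet access.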

Workflow & Productivity

docs-pipeline-automation

3891
from openclaw/skills

Build repeatable data-to-Docs pipelines from Sheets and Drive sources. Use for automated status reports, template-based document assembly, and scheduled publishing workflows.

Workflow & Productivity

office-hours

3891
from openclaw/skills

Structured brainstorming session — two modes. Startup mode: six forcing questions that expose demand reality, status quo, desperate specificity, narrowest wedge, observation, and future-fit. Builder mode: design thinking for side projects, hackathons, learning, and open source. Produces a design doc. Use when: "brainstorm this", "I have an idea", "help me think through this", "office hours", "is this worth building". Use before plan-ceo-review or plan-eng-review.

compliance-officer

3891
from openclaw/skills

Reviews marketing content against FTC, HIPAA, GDPR, SEC 482, SEC Marketing, CCPA, COPPA, and CAN-SPAM — 208 specific laws with URLs.

clawd-docs-v2

3891
from openclaw/skills

Smart ClawdBot documentation access with local search index, cached snippets, and on-demand fetch. Token-efficient and freshness-aware.

diataxis-docs-framework

3891
from openclaw/skills

Enterprise technical documentation best practices, patterns, and frameworks for developer and partner adoption. Covers content architecture (Diataxis four quadrants), 14 content types (tutorials, how-to guides, API reference, SDK docs, migration guides, changelogs, runbooks, integration guides, troubleshooting, architecture docs), pluggable writing styles (Diataxis, Google, Microsoft, Stripe, Canonical, Minimal), information architecture, docs-as-code workflows, documentation audit, anti-patterns checklist, and developer experience (DX) strategy. 27 rules, 5 references, 6 style guides. Baseline: Diataxis + Google OpenDocs + Good Docs Project. Triggers on: "write docs", "document this", "API docs", "developer docs", "migration guide", "changelog", "tutorial", "how-to guide", "reference docs", "documentation strategy", "docs audit", "information architecture", "developer experience", "partner docs", "SDK documentation", "runbook", "troubleshooting guide", "integration guide", "quickstart", "getting started", "technical writing", "docs-as-code", "DX", mentions of "Diataxis", "Good Docs Project", or "Google OpenDocs".

tutorial-docs

3891
from openclaw/skills

Tutorial patterns for documentation - learning-oriented guides that teach through guided doing

explanation-docs

3891
from openclaw/skills

Explanation documentation patterns for understanding-oriented content - conceptual guides that explain why things work the way they do

elixir-writing-docs

3891
from openclaw/skills

Guides writing Elixir documentation with @moduledoc, @doc, @typedoc, doctests, cross-references, and metadata. Use when adding or improving documentation in .ex files.

elixir-docs-review

3891
from openclaw/skills

Reviews Elixir documentation for completeness, quality, and ExDoc best practices. Use when auditing @moduledoc, @doc, @spec coverage, doctest correctness, and cross-reference usage in .ex files.

docs-style

3891
from openclaw/skills

Core technical documentation writing principles for voice, tone, structure, and LLM-friendly patterns. Use when writing or reviewing any documentation.

openai-docs-skill

3891
from openclaw/skills

Query the OpenAI developer documentation via the OpenAI Docs MCP server using CLI (curl/jq). Use whenever a task involves the OpenAI API (Responses, Chat Completions, Realtime, etc.), OpenAI SDKs, ChatGPT Apps SDK, Codex, MCP integrations, endpoint schemas, parameters, limits, or migrations and you need up-to-date official guidance.