Paper Slide Deck Generator

Transform academic papers and content into professional slide deck images with automatic figure extraction.

Best use case

Paper Slide Deck Generator is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using Paper Slide Deck Generator should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

curl -o ~/.claude/skills/paper-slide-deck/SKILL.md --create-dirs "https://raw.githubusercontent.com/ComeOnOliver/skillshub/main/skills/luwill/research-skills/paper-slide-deck/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/paper-slide-deck/SKILL.md inside your project
Restart your AI agent so that it auto-discovers the skill

How Paper Slide Deck Generator Compares

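| Feature / Agent | Paper Slide Deck Generator | Standard Approach |
|-----------------|----------------------------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |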

SKILL.md Source

# Paper Slide Deck Generator

Transform academic papers and content into professional slide deck images with automatic figure extraction.

## Usage

```bash
/paper-slide-deck path/to/paper.pdf
/paper-slide-deck path/to/paper.pdf --style academic-paper
/paper-slide-deck path/to/content.md --style sketch-notes
/paper-slide-deck path/to/content.md --audience executives
/paper-slide-deck path/to/content.md --lang zh
/paper-slide-deck path/to/content.md --slides 10
/paper-slide-deck path/to/content.md --outline-only
/paper-slide-deck  # Then paste content
```

## Script Directory

**Important**: All scripts are located in the `scripts/` subdirectory of this skill.

**Agent Execution Instructions**:
1. Determine this SKILL.md file's directory path as `SKILL_DIR`
2. Script path = `${SKILL_DIR}/scripts/<script-name>` (most scripts are `.ts`; `generate-slides.py` is Python)
3. Replace all `${SKILL_DIR}` in this document with the actual path
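
The substitution in step 3 can be sketched as follows. This is a minimal illustration, not part of the skill: the helper name `resolve_skill_paths` and the example install path are assumptions.

```python
from pathlib import Path
from string import Template


def resolve_skill_paths(text: str, skill_md: Path) -> str:
    """Replace ${SKILL_DIR} in `text` with the directory that holds SKILL.md."""
    skill_dir = skill_md.parent.as_posix()
    # safe_substitute leaves other placeholders (e.g. ${GEMINI_WEB_SKILL_DIR}) untouched.
    return Template(text).safe_substitute(SKILL_DIR=skill_dir)


# Example install location (an assumption for illustration only).
skill_md = Path("/home/user/.claude/skills/paper-slide-deck/SKILL.md")
command = "npx -y bun ${SKILL_DIR}/scripts/merge-to-pdf.ts <slide-deck-dir>"
print(resolve_skill_paths(command, skill_md))
```

`safe_substitute` is used so that placeholders other than `${SKILL_DIR}` pass through unchanged instead of raising an error.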

**Script Reference**:
| Script | Purpose |
|--------|---------|
| `scripts/generate-slides.py` | Generate AI slides via Gemini API (Python) |
| `scripts/merge-to-pptx.ts` | Merge slides into PowerPoint |
| `scripts/merge-to-pdf.ts` | Merge slides into PDF |
| `scripts/detect-figures.ts` | Auto-detect figures/tables in PDF |
| `scripts/extract-figure.ts` | Extract figure from PDF page (uses PyMuPDF fallback) |
| `scripts/apply-template.ts` | Apply figure container template |

## Options

| Option | Description |
|--------|-------------|
| `--style <name>` | Visual style (see Style Gallery) |
| `--audience <type>` | Target audience: beginners, intermediate, experts, executives, general |
| `--lang <code>` | Output language (en, zh, ja, etc.) |
| `--slides <number>` | Target slide count |
| `--outline-only` | Generate outline only, skip image generation |

## Style Gallery

| Style | Description | Best For |
|-------|-------------|----------|
| `academic-paper` | Clean professional, precise charts | Conference talks, thesis defense |
| `blueprint` (Default) | Technical schematics, grid texture | Architecture, system design |
| `chalkboard` | Black chalkboard, colorful chalk | Education, tutorials, classroom |
| `notion` | SaaS dashboard, card-based layouts | Product demos, SaaS, B2B |
| `bold-editorial` | Magazine cover, bold typography, dark | Product launches, keynotes |
| `corporate` | Navy/gold, structured layouts | Investor decks, proposals |
| `dark-atmospheric` | Cinematic dark mode, glowing accents | Entertainment, gaming |
| `editorial-infographic` | Magazine explainers, flat illustrations | Tech explainers, research |
| `fantasy-animation` | Ghibli/Disney style, hand-drawn | Educational, storytelling |
| `intuition-machine` | Technical briefing, bilingual labels | Technical docs, academic |
| `minimal` | Ultra-clean, maximum whitespace | Executive briefings, premium |
| `pixel-art` | Retro 8-bit, chunky pixels | Gaming, developer talks |
| `scientific` | Academic diagrams, precise labeling | Biology, chemistry, medical |
| `sketch-notes` | Hand-drawn, warm & friendly | Educational, tutorials |
| `vector-illustration` | Flat vector, retro & cute | Creative, children's content |
| `vintage` | Aged-paper, historical styling | Historical, heritage, biography |
| `watercolor` | Hand-painted textures, natural warmth | Lifestyle, wellness, travel |

## Auto Style Selection

| Content Signals | Selected Style |
|-----------------|----------------|
| paper, thesis, defense, conference, ieee, acm, icml, neurips, cvpr, acl, aaai, iclr | `academic-paper` |
| tutorial, learn, education, guide, intro, beginner | `sketch-notes` |
| classroom, teaching, school, chalkboard, blackboard | `chalkboard` |
| architecture, system, data, analysis, technical | `blueprint` |
| creative, children, kids, cute, illustration | `vector-illustration` |
| briefing, bilingual, infographic, concept | `intuition-machine` |
| executive, minimal, clean, simple, elegant | `minimal` |
| saas, product, dashboard, metrics, productivity | `notion` |
| investor, quarterly, business, corporate, proposal | `corporate` |
| launch, marketing, keynote, bold, impact, magazine | `bold-editorial` |
| entertainment, music, gaming, creative, atmospheric | `dark-atmospheric` |
| explainer, journalism, science communication | `editorial-infographic` |
| story, fantasy, animation, magical, whimsical | `fantasy-animation` |
| gaming, retro, pixel, developer, nostalgia | `pixel-art` |
| biology, chemistry, medical, pathway, scientific | `scientific` |
| history, heritage, vintage, expedition, historical | `vintage` |
| lifestyle, wellness, travel, artistic, natural | `watercolor` |
| Default | `blueprint` |
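
One way to read the table above is as a keyword lookup with a default. The sketch below is an assumption about how matching could work (whole-word matches, most hits wins, ties go to the earlier row); the skill itself does not specify the matching rule, and only a few rows are reproduced here.

```python
import re

# Signal keywords copied from a few rows of the table above, in table order.
STYLE_SIGNALS = [
    ("academic-paper", ["paper", "thesis", "defense", "conference", "ieee", "acm",
                        "icml", "neurips", "cvpr", "acl", "aaai", "iclr"]),
    ("sketch-notes", ["tutorial", "learn", "education", "guide", "intro", "beginner"]),
    ("chalkboard", ["classroom", "teaching", "school", "chalkboard", "blackboard"]),
    ("blueprint", ["architecture", "system", "data", "analysis", "technical"]),
    ("minimal", ["executive", "minimal", "clean", "simple", "elegant"]),
]
DEFAULT_STYLE = "blueprint"


def select_style(content: str) -> str:
    """Return the style with the most whole-word signal hits; ties go to the earlier row."""
    words = re.findall(r"[a-z]+", content.lower())
    best_style, best_hits = DEFAULT_STYLE, 0
    for style, signals in STYLE_SIGNALS:
        hits = sum(words.count(signal) for signal in signals)
        if hits > best_hits:  # strict '>' keeps the earlier row on a tie
            best_style, best_hits = style, hits
    return best_style


print(select_style("A NeurIPS conference paper on graph learning"))
```

Some signals appear in more than one row (`creative`, `gaming`), so any real implementation needs a tie-breaking rule; table order is used here only as a placeholder.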

## Layout Gallery

Optional layout hints for individual slides. Specify in outline's `// LAYOUT` section.

### Slide-Specific Layouts

| Layout | Description | Best For |
|--------|-------------|----------|
| `title-hero` | Large centered title + subtitle | Cover slides, section breaks |
| `quote-callout` | Featured quote with attribution | Testimonials, key insights |
| `key-stat` | Single large number as focal point | Impact statistics, metrics |
| `split-screen` | Half image, half text | Feature highlights, comparisons |
| `icon-grid` | Grid of icons with labels | Features, capabilities, benefits |
| `two-columns` | Content in balanced columns | Paired information, dual points |
| `three-columns` | Content in three columns | Triple comparisons, categories |
| `image-caption` | Full-bleed image + text overlay | Visual storytelling, emotional |
| `agenda` | Numbered list with highlights | Session overview, roadmap |
| `bullet-list` | Structured bullet points | Simple content, lists |

### Infographic-Derived Layouts

| Layout | Description | Best For |
|--------|-------------|----------|
| `linear-progression` | Sequential flow left-to-right | Timelines, step-by-step |
| `binary-comparison` | Side-by-side A vs B | Before/after, pros-cons |
| `comparison-matrix` | Multi-factor grid | Feature comparisons |
| `hierarchical-layers` | Pyramid or stacked levels | Priority, importance |
| `hub-spoke` | Central node with radiating items | Concept maps, ecosystems |
| `bento-grid` | Varied-size tiles | Overview, summary |
| `funnel` | Narrowing stages | Conversion, filtering |
| `dashboard` | Metrics with charts/numbers | KPIs, data display |
| `venn-diagram` | Overlapping circles | Relationships, intersections |
| `circular-flow` | Continuous cycle | Recurring processes |
| `winding-roadmap` | Curved path with milestones | Journey, timeline |
| `tree-branching` | Parent-child hierarchy | Org charts, taxonomies |
| `iceberg` | Visible vs hidden layers | Surface vs depth |
| `bridge` | Gap with connection | Problem-solution |

### Academic-Specific Layouts

| Layout | Description | Best For |
|--------|-------------|----------|
| `paper-title` | Title, authors, affiliations, venue | Conference paper cover |
| `outline-agenda` | Numbered section list with highlights | Talk structure overview |
| `methods-diagram` | Central architecture/pipeline diagram | Methods, system design |
| `results-chart` | Chart area + data annotations | Quantitative results |
| `equation-focus` | Centered equation + variable definitions | Mathematical derivations |
| `qualitative-grid` | 2x2 or 3x2 image comparison grid | Visual results, ablations |
| `references-list` | Numbered citation list | Key references slide |
| `contributions` | Numbered contribution points | Contributions summary |

**Usage**: Add `Layout: <name>` in slide's `// LAYOUT` section to guide visual composition.

## Design Philosophy

This deck is designed for **reading and sharing**, not live presentation:
- Each slide must be **self-explanatory** without verbal commentary
- Structure content for **logical flow** when scrolling
- Include **all necessary context** within each slide
- Optimize for **social media sharing** and offline reading

## File Management

### Output Directory

Each session creates an independent directory named by content slug:

```
slide-deck/{topic-slug}/
├── source-{slug}.{ext}    # Source files (text, images, etc.)
├── outline.md
├── outline-{style}.md     # Style variant outlines
├── prompts/
│   └── 01-slide-cover.md, 02-slide-{slug}.md, ...
├── 01-slide-cover.png, 02-slide-{slug}.png, ...
├── {topic-slug}.pptx
└── {topic-slug}.pdf
```

**Slug Generation**:
1. Extract main topic from content (2-4 words, kebab-case)
2. Example: "Introduction to Machine Learning" → `intro-machine-learning`

### Conflict Resolution

If `slide-deck/{topic-slug}/` already exists:
- Append timestamp: `{topic-slug}-YYYYMMDD-HHMMSS`
- Example: `intro-ml` exists → `intro-ml-20260118-143052`
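
The naming and conflict rules above can be sketched as below. Choosing the 2-4 topic words is left to the agent; the helper names `to_slug` and `resolve_output_dir` are hypothetical and only cover the kebab-casing and the timestamp suffix.

```python
import re
from datetime import datetime
from pathlib import Path


def to_slug(topic: str) -> str:
    """Kebab-case a short topic phrase that has already been reduced to 2-4 words."""
    return "-".join(re.findall(r"[a-z0-9]+", topic.lower()))


def resolve_output_dir(root: Path, slug: str, now: datetime) -> Path:
    """Return slide-deck/{slug}, or a timestamped sibling if that directory exists."""
    target = root / "slide-deck" / slug
    if target.exists():
        # Append YYYYMMDD-HHMMSS, as in the conflict-resolution rule above.
        target = target.with_name(f"{slug}-{now:%Y%m%d-%H%M%S}")
    return target


print(to_slug("Intro Machine Learning"))
print(resolve_output_dir(Path("."), "intro-ml", datetime.now()))
```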

### Source Files

Copy all sources with naming `source-{slug}.{ext}`:
- `source-article.md` (main text content)
- `source-diagram.png` (image from conversation)
- `source-data.xlsx` (additional file)

Multiple sources supported: text, images, files from conversation.

## Workflow

### Step 1: Analyze Content

1. Save source content (if pasted, save as `source.md`)
2. Follow `references/analysis-framework.md` for deep content analysis
3. Determine style (use `--style` or auto-select from signals)
4. Detect languages (source vs. user preference)
5. Plan slide count (`--slides` or dynamic)
6. **For academic papers (PDF with figures)**: Run automatic figure detection:
   ```bash
   npx -y bun ${SKILL_DIR}/scripts/detect-figures.ts --pdf source-paper.pdf --output figures.json
   ```
   This outputs a JSON file with all detected figures/tables, their page numbers, and captions.

### Step 2: Generate Outline Variants

1. Generate 3 style variant outlines based on content analysis
2. Follow `references/outline-template.md` for structure
3. **Auto-populate IMAGE_SOURCE** for academic papers:
   - Read `figures.json` from Step 1
   - Map figures to slides using rules in `references/analysis-framework.md` Section 8
   - Automatically add `// IMAGE_SOURCE` blocks to appropriate slides:
     - Architecture/pipeline figures → Methods slides (`Source: extract`)
     - Results tables → Quantitative results slides (`Source: extract`)
     - Comparison images → Qualitative results slides (`Source: extract`)
     - Conceptual/simple diagrams → Leave for AI generation (`Source: generate` or omit)
4. Save as `outline-{style}.md` for each variant
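
The figure-to-slide mapping in item 3 might look like the sketch below. Everything specific here is an assumption: the JSON field names (`number`, `page`, `caption`), the keyword lists, and the slide-type labels. The actual rules live in `references/analysis-framework.md` Section 8, which is not reproduced in this document.

```python
import json
from typing import Tuple

# Hypothetical keyword rules standing in for the real mapping rules.
RULES = [
    ("methods", ("architecture", "pipeline", "framework", "overview")),
    ("quantitative-results", ("table", "results", "accuracy", "benchmark")),
    ("qualitative-results", ("comparison", "qualitative", "visualization")),
]


def map_figure(caption: str) -> Tuple[str, str]:
    """Return (slide_type, source), where source is 'extract' or 'generate'."""
    text = caption.lower()
    for slide_type, keywords in RULES:
        if any(keyword in text for keyword in keywords):
            return slide_type, "extract"
    # Conceptual or simple diagrams are left for AI generation.
    return "unassigned", "generate"


# Assumed shape of figures.json: a list of objects with number, page, and caption.
SAMPLE = '[{"number": 1, "page": 3, "caption": "Overall architecture of the model"}]'

for fig in json.loads(SAMPLE):
    slide_type, source = map_figure(fig["caption"])
    print(f"Figure {fig['number']} (page {fig['page']}) -> {slide_type}, Source: {source}")
```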

### Step 3: User Confirmation

**Single AskUserQuestion with all applicable options:**

| Question | When to Ask |
|----------|-------------|
| Style variant | Always (3 options + custom) |
| Language | Only if source ≠ user language |

After selection:
- Copy selected `outline-{style}.md` to `outline.md`
- Regenerate in different language if requested
- User may edit `outline.md` for fine-tuning

If `--outline-only`, stop here.

### Step 4: Generate Prompts

1. Read `references/base-prompt.md`
2. Combine with style instructions from outline
3. Add slide-specific content
4. If `Layout:` specified in outline, include layout guidance in prompt:
   - Reference layout characteristics for image composition
   - Example: `Layout: hub-spoke` → "Central concept in middle with related items radiating outward"
5. Save to `prompts/` directory
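
As a rough sketch of the assembly described in this step, the function below concatenates the three inputs and appends a layout hint when one is known. The function names and the exact prompt wording are assumptions; only the `hub-spoke` hint text comes from the step above.

```python
from pathlib import Path

# Only the hub-spoke wording is taken from this document; other layouts are omitted.
LAYOUT_HINTS = {
    "hub-spoke": "Central concept in middle with related items radiating outward",
}


def build_prompt(base_prompt: str, style_instructions: str,
                 slide_content: str, layout: str = "") -> str:
    """Join base prompt, style instructions, and slide content, plus an optional layout hint."""
    parts = [base_prompt.strip(), style_instructions.strip(), slide_content.strip()]
    hint = LAYOUT_HINTS.get(layout)
    if hint:
        parts.append(f"Layout guidance ({layout}): {hint}")
    return "\n\n".join(parts) + "\n"


def save_prompt(deck_dir: Path, filename: str, prompt: str) -> Path:
    """Write the prompt under {deck_dir}/prompts/, creating the directory if needed."""
    prompts_dir = deck_dir / "prompts"
    prompts_dir.mkdir(parents=True, exist_ok=True)
    path = prompts_dir / filename
    path.write_text(prompt, encoding="utf-8")
    return path


print(build_prompt("BASE PROMPT", "STYLE INSTRUCTIONS", "SLIDE CONTENT", "hub-spoke"))
```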

### Step 5: Image Generation Method Selection

**Before generating images**, ask user to choose generation method:

**Use AskUserQuestion** with options:

| Option | Label | Description |
|--------|-------|-------------|
| 1 | **Gemini API (Recommended)** | Official Google API via Python. Requires the `GOOGLE_API_KEY` or `GEMINI_API_KEY` env var. |
| 2 | **Gemini Web (Browser-based)** | ⚠️ Uses reverse-engineered web API. No API key needed but may break. |

**Based on selection:**

#### Option 1: Gemini API (Python)

1. **Verify API key**: Check `GOOGLE_API_KEY` or `GEMINI_API_KEY` environment variable
2. **Run generation script**:
   ```bash
   python ${SKILL_DIR}/scripts/generate-slides.py <slide-deck-dir> --model gemini-3-pro-image-preview
   ```

**Script Features**:
- Auto-installs `google-genai` package if missing
- Retry logic with exponential backoff (3 retries)
- Skips already-generated slides (> 10KB)
- Supports custom model via `--model` flag
- Outputs to `slides/` subdirectory
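
The retry and skip behaviour listed above can be illustrated as follows. This is not the code of `generate-slides.py`; it is a sketch under assumptions: "3 retries" is read as three retries after the first attempt, the base delay of one second is invented, and the function names are hypothetical.

```python
import time
from pathlib import Path
from typing import Callable

MIN_VALID_BYTES = 10 * 1024  # slides larger than 10KB count as already generated


def already_generated(path: Path) -> bool:
    """True if the slide image exists and is larger than the 10KB threshold."""
    return path.exists() and path.stat().st_size > MIN_VALID_BYTES


def with_retries(task: Callable[[], bytes], retries: int = 3, base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> bytes:
    """Run task; after each failure wait base_delay * 2**attempt, then retry."""
    for attempt in range(retries + 1):
        try:
            return task()
        except Exception:
            if attempt == retries:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError("unreachable")
```

Because completed slides are skipped by size, re-running after a failure only repeats the missing slides.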

**Troubleshooting**:
- If server disconnection errors occur, script auto-retries
- For persistent failures, re-run the script (it skips completed slides)
- Check API quota if many failures occur

#### Option 2: Gemini Web Skill

1. **Consent Check**: Read consent file at:
   - Windows: `$APPDATA/baoyu-skills/gemini-web/consent.json`
   - macOS: `~/Library/Application Support/baoyu-skills/gemini-web/consent.json`
   - Linux: `~/.local/share/baoyu-skills/gemini-web/consent.json`

2. **If no consent or version mismatch**, display disclaimer and ask:
   ```
   ⚠️ DISCLAIMER: This uses a reverse-engineered Gemini Web API (NOT official).
   Risks: May break anytime, no support, possible account risk.
   ```

3. **For each slide**, run:
   ```bash
   npx -y bun ${GEMINI_WEB_SKILL_DIR}/scripts/main.ts \
     --promptfiles prompts/01-slide-cover.md \
     --image 01-slide-cover.png \
     --sessionId slides-{topic-slug}-{timestamp}
   ```

   Where `GEMINI_WEB_SKILL_DIR` = path to `baoyu-danger-gemini-web` skill directory.

4. **Proxy support**: If user is in restricted network, prepend:
   ```bash
   HTTP_PROXY=http://127.0.0.1:7890 HTTPS_PROXY=http://127.0.0.1:7890
   ```
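
The consent-file lookup from step 1 of this option could be resolved per platform as sketched below. The helper name `consent_file` is hypothetical, and the contents of `consent.json` (including how its version is stored) are not described here, so only the path lookup is shown.

```python
import os
import sys


def consent_file(platform: str, home: str, appdata: str = "") -> str:
    """Return the consent.json location for the given platform string."""
    tail = "baoyu-skills/gemini-web/consent.json"
    if platform.startswith("win"):
        return f"{appdata}/{tail}"  # APPDATA on Windows
    if platform == "darwin":
        return f"{home}/Library/Application Support/{tail}"  # macOS
    return f"{home}/.local/share/{tail}"  # Linux and other platforms


path = consent_file(sys.platform, os.path.expanduser("~"), os.environ.get("APPDATA", ""))
print(path, "exists" if os.path.isfile(path) else "missing")
```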

### Step 5.5: Process IMAGE_SOURCE (Automatic Figure Extraction)

For academic presentations, IMAGE_SOURCE metadata was auto-populated in Step 2 based on figure detection from Step 1.

**Automatic Execution**:

1. **Parse outline** to identify slides with `Source: extract`
2. **Create figures directory**: `mkdir -p figures`
3. **For each extract slide**, automatically:
   - Read the Figure number, Page, and Caption from metadata
   - Run figure extraction script:
     ```bash
     npx -y bun ${SKILL_DIR}/scripts/extract-figure.ts \
       --pdf source-paper.pdf \
       --page <page-number> \
       --output figures/figure-<N>.png
     ```
   - Run template application script:
     ```bash
     npx -y bun ${SKILL_DIR}/scripts/apply-template.ts \
       --figure figures/figure-<N>.png \
       --title "<slide-headline>" \
       --caption "Figure <N>: <caption-text>" \
       --output <NN>-slide-<slug>.png
     ```
   - Report: "Extracted: Figure N → slide NN"

4. **For slides with `Source: generate`** (or no IMAGE_SOURCE):
   - Proceed to Step 6 for AI generation

**Note**: Source PDF must be saved as `source-paper.pdf` in output directory.
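
For one slide marked `Source: extract`, the two commands above can be assembled as argument lists, which avoids shell-quoting problems when a headline or caption contains quotes or spaces. The function name `extraction_commands` and the example values are assumptions; the flags are the ones shown in this step.

```python
import shlex
from typing import List


def extraction_commands(skill_dir: str, figure: int, page: int, caption: str,
                        headline: str, slide_file: str) -> List[List[str]]:
    """Build the extract-figure and apply-template commands for one slide."""
    figure_png = f"figures/figure-{figure}.png"
    extract = ["npx", "-y", "bun", f"{skill_dir}/scripts/extract-figure.ts",
               "--pdf", "source-paper.pdf", "--page", str(page), "--output", figure_png]
    template = ["npx", "-y", "bun", f"{skill_dir}/scripts/apply-template.ts",
                "--figure", figure_png, "--title", headline,
                "--caption", f"Figure {figure}: {caption}", "--output", slide_file]
    return [extract, template]


# Example values are made up for illustration.
for cmd in extraction_commands("/path/to/skill", 2, 5, "Model overview",
                               "Method", "04-slide-method.png"):
    print(shlex.join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` from the output directory.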

**Troubleshooting**:
- If figure detection missed a figure: manually add `// IMAGE_SOURCE` block to outline
- If wrong figure mapped: edit the `Figure:` and `Page:` values in outline
- If extraction fails: check PDF page number (1-indexed)

**PyMuPDF Fallback for Page Extraction**:
If `extract-figure.ts` fails with "Image or Canvas expected" error (common with complex PDFs), use PyMuPDF:
```python
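from pathlib import Path

import fitz  # PyMuPDF

page_num = 3  # 1-indexed page number of the figure (example value)
Path("extracted").mkdir(exist_ok=True)  # pix.save does not create directories

doc = fitz.open("source-paper.pdf")
page = doc[page_num - 1]  # PyMuPDF pages are 0-indexed
mat = fitz.Matrix(3, 3)  # 3x scale for high resolution
pix = page.get_pixmap(matrix=mat)
pix.save(f"extracted/page-{page_num}.png")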
```
Then apply template using `apply-template.ts`.

### Step 6: Generate Images

1. Use selected method from Step 5
2. **Skip slides already processed in Step 5.5** (those with `Source: extract`)
3. Generate session ID: `slides-{topic-slug}-{timestamp}`
4. Generate each remaining slide with same session ID
5. Report progress: "Generated X/N"
6. Auto-retry once on generation failure
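
The loop described in this step can be sketched as below. The slide dictionaries, the `generate` callback, and the timestamp format in the session ID (borrowed from the directory conflict rule) are assumptions for illustration.

```python
from datetime import datetime
from typing import Callable, Dict, List


def session_id(slug: str, now: datetime) -> str:
    """Build slides-{topic-slug}-{timestamp}; the timestamp format is an assumption."""
    return f"slides-{slug}-{now:%Y%m%d-%H%M%S}"


def generate_remaining(slides: List[Dict[str, str]],
                       generate: Callable[[str, str], None], sid: str) -> List[str]:
    """Generate every slide not handled by extraction; return the files that failed."""
    todo = [s for s in slides if s.get("source") != "extract"]
    failed: List[str] = []
    done = 0
    for slide in todo:
        for attempt in range(2):  # first try plus one automatic retry
            try:
                generate(slide["file"], sid)  # same session ID for every slide
            except Exception:
                if attempt == 1:
                    failed.append(slide["file"])
                continue
            done += 1
            print(f"Generated {done}/{len(todo)}")
            break
    return failed
```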

### Step 7: Merge to PPTX and PDF

```bash
npx -y bun ${SKILL_DIR}/scripts/merge-to-pptx.ts <slide-deck-dir>
npx -y bun ${SKILL_DIR}/scripts/merge-to-pdf.ts <slide-deck-dir>
```

### Step 8: Output Summary

```
Slide Deck Complete!

Topic: [topic]
Style: [style name]
Location: [directory path]
Slides: N total

- 01-slide-cover.png ✓ Cover
- 02-slide-intro.png ✓ Content
- ...
- {NN}-slide-back-cover.png ✓ Back Cover

Outline: outline.md
PPTX: {topic-slug}.pptx
PDF: {topic-slug}.pdf
```

## Slide Modification

See `references/modification-guide.md` for:
- Edit single slide workflow
- Add new slide (with renumbering)
- Delete slide (with renumbering)
- File naming conventions

## Image Generation Dependencies

### Gemini API (Option 1 - Recommended)

Requires:
- `GOOGLE_API_KEY` or `GEMINI_API_KEY` environment variable
- Python 3.8+ with pip
- `google-genai` package (auto-installed by script)

Model: `gemini-3-pro-image-preview` (default)

### Gemini Web Skill (Option 2)

Requires:
- `baoyu-danger-gemini-web` skill installed at `.claude/skills/baoyu-danger-gemini-web`
- Google Chrome browser with logged-in Google account
- User consent for reverse-engineered API disclaimer

### PDF Figure Extraction

Requires:
- **Primary**: `pdfjs-dist` npm package (use legacy build for Node.js)
- **Fallback**: `pymupdf` Python package (more reliable for complex PDFs)
- `canvas` npm package for apply-template.ts

## References

| File | Content |
|------|---------|
| `references/analysis-framework.md` | Deep content analysis for presentations |
| `references/outline-template.md` | Outline structure and STYLE_INSTRUCTIONS format |
| `references/modification-guide.md` | Edit, add, delete slide workflows |
| `references/content-rules.md` | Content and style guidelines |
| `references/base-prompt.md` | Base prompt for image generation |
| `references/figure-container-template.md` | Visual specs for extracted figure containers |
| `references/styles/<style>.md` | Full style specifications |

## Notes

### Image Generation

- **Gemini API** (model `gemini-3-pro-image-preview`, also known as Nano Banana Pro): Recommended. Stable, reliable, requires API key
- **Gemini Web**: No API key needed, but uses reverse-engineered API with account risk
- Generation time: 10-30 seconds per slide
- Auto-retry once on generation failure
- Maintain style consistency via session ID

### Content Guidelines

- Use stylized alternatives for sensitive public figures
- Both methods use the same underlying Gemini model for image generation

## Extension Support

Custom styles and configurations via EXTEND.md.

**Check paths** (priority order):
1. `.paper-skills/paper-slide-deck/EXTEND.md` (project)
2. `~/.paper-skills/paper-slide-deck/EXTEND.md` (user)

If found, load before Step 1. Extension content overrides defaults.
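
The priority-order lookup can be sketched as a short search over the two paths. The function name `find_extension` is hypothetical, and how extension content is merged with the defaults is not specified here, so only the lookup is shown.

```python
from pathlib import Path
from typing import Optional


def find_extension(project_root: Path, home: Path) -> Optional[Path]:
    """Return the first EXTEND.md found, checking the project before the user directory."""
    candidates = [
        project_root / ".paper-skills" / "paper-slide-deck" / "EXTEND.md",  # project
        home / ".paper-skills" / "paper-slide-deck" / "EXTEND.md",  # user
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None


print(find_extension(Path.cwd(), Path.home()))
```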
