pdf-figure-extractor

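Precisely extracts figures from academic PDF papers: it analyzes the PDF structure, locates each caption, crops a clean figure image, and verifies image quality. Supports automated figure handling for scenarios such as academic press releases and paper writing.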


Best use case

pdf-figure-extractor is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using pdf-figure-extractor should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/pdf-figure-extractor/SKILL.md --create-dirs "https://raw.githubusercontent.com/openclaw/skills/main/skills/438061781/pdf-figure-extractor/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/pdf-figure-extractor/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How pdf-figure-extractor Compares

Feature / Agent	pdf-figure-extractor	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

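What does this skill do?

It extracts figures from academic PDF papers as clean images: it analyzes the text blocks on a page, locates the figure caption, crops the figure region without the caption or body text, and checks the result against a quality checklist.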

Where can I find the source code?

The source is in the openclaw/skills repository on GitHub, at the raw URL used in the installation command above. The SKILL.md content is also reproduced below.


SKILL.md Source

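# PDF Figure Extraction Skill

## Use Cases

- Extracting figures from academic paper PDFs to insert into Word documents
- Producing clean, figure-only images with no caption and no body text
- Extracting multiple figures in batch

## Standard Workflow

### Step 1: Analyze the PDF structure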
```python
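import fitz  # PyMuPDF

pdf_path = "paper.pdf"  # input PDF (placeholder)
page_num = 0            # 0-based page index (placeholder)

doc = fitz.open(pdf_path)
page = doc[page_num]

# Get all blocks: (x0, y0, x1, y1, text, block_no, block_type)
blocks = page.get_text("blocks")
for block in blocks:
    x0, y0, x1, y1, text, block_no, block_type = block
    if block_type != 0:  # 1 = image block; only text blocks matter here
        continue
    if "Fig." in text or "Figure" in text:
        print(f"Figure-related: y={y0:.0f}-{y1:.0f}, {text[:50]}...")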
```
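The loop above prints every block that mentions a figure, including body-text references such as "(see Fig. 2)". A minimal sketch for narrowing this to actual captions, assuming a caption is a text block that *begins* with "Fig. N" or "Figure N" (true for many journals, not all); `find_caption_blocks` is a hypothetical helper, not part of PyMuPDF:

```python
import re


def find_caption_blocks(blocks, fig_num):
    """Return text blocks whose text starts with "Fig. N" or "Figure N".

    `blocks` uses the tuple layout of page.get_text("blocks"):
    (x0, y0, x1, y1, text, block_no, block_type).
    """
    # (?!\d) keeps "Fig. 1" from matching "Fig. 10"; IGNORECASE also accepts "FIGURE 1"
    pattern = re.compile(rf"^\s*(?:Fig\.|Figure)\s*{fig_num}(?!\d)", re.IGNORECASE)
    return [
        b for b in blocks
        if b[6] == 0 and pattern.match(b[4])  # block_type 0 = text block
    ]
```

Usage: `find_caption_blocks(page.get_text("blocks"), 1)`; the `y0` of the returned block is the caption's top edge.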

### Step 2: Locate the caption
```python
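# Find every occurrence of "Fig. X"; page.search_for returns a list of fitz.Rect.
# Hits include in-text references such as "(Fig. 2)", not only the caption.
fig_num = 1  # figure number to locate (placeholder)
text_instances = page.search_for(f"Fig. {fig_num}")
for inst in text_instances:
    print(f"Fig. {fig_num} position: y={inst.y0:.0f}-{inst.y1:.0f}")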
```
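Because `search_for` returns a rectangle for every occurrence, the caption hit still has to be picked out. One way to sketch this, assuming the caption is the hit whose top-left corner coincides with the top-left corner of a text block (within a hypothetical tolerance of 2 pt); `caption_top` is an illustrative helper. Note that "Fig. 1" can also match the start of a "Fig. 10" caption, so check the block text as well when both figures share a page.

```python
def caption_top(hits, blocks, tol=2.0):
    """Return y0 of the first search hit that starts a text block, else None.

    hits:   rectangles from page.search_for(), as (x0, y0, x1, y1) sequences
    blocks: tuples from page.get_text("blocks")
    tol:    allowed distance in points between the two corners (assumed value)
    """
    # Top-left corners of text blocks only (block_type 0)
    starts = [(b[0], b[1]) for b in blocks if b[6] == 0]
    for x0, y0, *_ in hits:
        if any(abs(x0 - bx) <= tol and abs(y0 - by) <= tol for bx, by in starts):
            return y0
    return None
```

With PyMuPDF: `caption_top(page.search_for(f"Fig. {fig_num}"), page.get_text("blocks"))`.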

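### Step 3: Determine the crop region
Infer the figure region from the caption position (PyMuPDF page coordinates in points, with y increasing downward; the rows below are examples, not fixed rules):

| Caption position | Figure region |
|------------|---------|
| y=400 (mid-page) | y=100-395 (above the caption) |
| y=666 (bottom of page) | y=350-660 (above the caption) |
| y=326 (bottom of page) | y=100-320 (above the caption) |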
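In every row of the table, the figure region ends 5 to 6 pt above the caption and starts below the preceding text. A minimal sketch of that rule, assuming the caption sits below the figure and that body paragraphs can be told apart from in-figure labels by length; `crop_bounds`, `gap`, `min_chars`, and `top_margin` are hypothetical names and defaults, not values from the skill:

```python
def crop_bounds(caption_y0, blocks, gap=5.0, min_chars=200, top_margin=50.0):
    """Estimate (y_start, y_end) for a figure whose caption starts at caption_y0.

    Assumes the caption is below the figure.
    blocks: tuples from page.get_text("blocks").
    """
    y_end = caption_y0 - gap  # stop just above the caption
    # Bottoms of long text blocks (treated as body paragraphs) that end above the caption;
    # short blocks such as axis labels are ignored
    para_bottoms = [
        b[3] for b in blocks
        if b[6] == 0 and len(b[4]) >= min_chars and b[3] <= caption_y0
    ]
    y_start = max(para_bottoms) + gap if para_bottoms else top_margin
    if y_start >= y_end:
        raise ValueError("empty crop region; inspect the page layout manually")
    return y_start, y_end
```

Combined with a caption position from Step 2: `y_start, y_end = crop_bounds(caption_y0, page.get_text("blocks"))`.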

### Step 4: Crop precisely
```python
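y_start, y_end = 100, 395  # from Step 3; example values for a caption at y=400
rect = fitz.Rect(50, y_start, page.rect.width - 50, y_end)  # 50 pt side margins
# Render at 2x zoom (144 dpi instead of the default 72 dpi), clipped to rect
pix = page.get_pixmap(matrix=fitz.Matrix(2, 2), clip=rect)
pix.save(f"fig{fig_num}.png")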
```
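PDF coordinates are in points (1/72 inch), so a zoom factor of `dpi / 72` sets the output resolution, and `Matrix(2, 2)` renders at 144 dpi. A small sketch that derives the clip rectangle and zoom from a target dpi and rejects an empty region before rendering; `crop_params` and its defaults are hypothetical, with the 50-pt side margin taken from the snippet above:

```python
def crop_params(page_width, page_height, y_start, y_end, margin=50.0, dpi=144):
    """Return ((x0, y0, x1, y1), zoom) for page.get_pixmap(clip=..., matrix=...)."""
    # Keep the vertical bounds on the page
    y_start = max(0.0, y_start)
    y_end = min(page_height, y_end)
    if y_end <= y_start:
        raise ValueError(f"invalid crop: y_start={y_start}, y_end={y_end}")
    rect = (margin, y_start, page_width - margin, y_end)
    zoom = dpi / 72  # PDF user space: 72 points per inch
    return rect, zoom
```

With PyMuPDF this plugs into the existing call as `page.get_pixmap(matrix=fitz.Matrix(zoom, zoom), clip=fitz.Rect(rect))`; for print-quality output, `dpi=300` gives a zoom of about 4.17.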

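### Step 5: Verify image quality
Checklist:
- [ ] All subfigures (a, b, c, d, ...) are included
- [ ] No caption text beginning with "Fig. X" has crept in
- [ ] No body-text paragraphs have crept in
- [ ] Axes and labels are complete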
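The caption and body-text items of this checklist can be pre-screened automatically before the visual check. A minimal sketch, assuming the same block tuples as Step 1 and the same hypothetical length threshold for "body paragraph"; `audit_crop` only flags suspicious text and cannot confirm that subfigures or axis labels are complete:

```python
import re

# A block that starts like a caption: "Fig. 3", "Figure 3", "FIG. 3", ...
CAPTION_RE = re.compile(r"^\s*(?:Fig\.|Figure)\s*\d", re.IGNORECASE)


def audit_crop(rect, blocks, min_chars=200):
    """List text blocks overlapping the crop that look like captions or body text."""
    x0, y0, x1, y1 = rect
    problems = []
    for bx0, by0, bx1, by1, text, _, btype in blocks:
        if btype != 0:
            continue  # skip image blocks
        overlaps = bx0 < x1 and bx1 > x0 and by0 < y1 and by1 > y0
        if not overlaps:
            continue
        if CAPTION_RE.match(text):
            problems.append(("caption", round(by0)))
        elif len(text) >= min_chars:
            problems.append(("body text", round(by0)))
    return problems
```

An empty list means no caption or paragraph text overlaps the crop; the remaining items still need a look at the rendered PNG.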

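## Common PDF Layout Templates

### Nature/Science papers
- Fig. 1: caption usually at the bottom; figure at y=350-660
- Fig. 2 and later: caption position varies, so analyze the page first

### Conference papers
- Single-column layout: caption usually below the figure
- Two-column layout: caption may be above or below the figure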
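In two-column layouts, the full-width crop from Step 4 (50 pt side margins) can pull in text from the neighbouring column. A sketch of one heuristic, assuming a figure is about as wide as its caption, so a caption confined to one half of the page implies a single-column figure; `column_x_bounds` and the 50-pt margin are illustrative assumptions:

```python
def column_x_bounds(caption_x0, caption_x1, page_width, margin=50.0):
    """Guess the horizontal crop (x0, x1) from the caption's horizontal extent."""
    mid = page_width / 2
    if caption_x1 <= mid:  # caption entirely in the left column
        return margin, mid
    if caption_x0 >= mid:  # caption entirely in the right column
        return mid, page_width - margin
    return margin, page_width - margin  # spans both columns: full-width figure
```

The result replaces the fixed `50` and `page.rect.width - 50` in `fitz.Rect(...)`; figures wider than their captions will still need a manual check.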

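## Error Handling

### Problem: body text appears in the image
**Cause**: the crop region is too large
**Fix**: reduce y_end so the crop ends before the caption

### Problem: subfigures are missing
**Cause**: the crop region is too small
**Fix**: widen the bounds (decrease y_start and/or increase y_end) to include the whole figure

### Problem: the caption was not removed
**Cause**: the crop region includes the caption area
**Fix**: set the crop boundary precisely from the caption's y-coordinate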
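For missing subfigures, the bounds can be widened from the page's graphics instead of by trial and error. A sketch, assuming the graphic bounding boxes have already been collected (for instance from the `"rect"` entries of `page.get_drawings()` and the `"bbox"` entries of `page.get_image_info()` in PyMuPDF) and that `limit_bottom` is set just above the caption so the caption is never pulled back in; `expand_to_graphics` is a hypothetical helper:

```python
def expand_to_graphics(y_start, y_end, graphics, limit_top, limit_bottom):
    """Grow [y_start, y_end] to cover every graphic box that overlaps it.

    graphics: (x0, y0, x1, y1) boxes of drawings/images on the page.
    The result never extends past [limit_top, limit_bottom].
    """
    changed = True
    while changed:  # repeat: a newly covered box may overlap further boxes
        changed = False
        for _, gy0, _, gy1 in graphics:
            if gy0 < y_end and gy1 > y_start:  # vertical overlap with current band
                new_start = max(limit_top, min(y_start, gy0))
                new_end = min(limit_bottom, max(y_end, gy1))
                if (new_start, new_end) != (y_start, y_end):
                    y_start, y_end = new_start, new_end
                    changed = True
    return y_start, y_end
```

The loop terminates because the bounds only grow and are capped by the two limits.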

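## Best Practices

1. **Never** estimate coordinates by eye
2. **Always** analyze the PDF's text-block structure first
3. **Render at high resolution**: use `matrix=fitz.Matrix(2, 2)`
4. **Verify every image**: make sure it is clean, with nothing extraneous
5. **Record coordinates**: build coordinate templates for common PDF types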
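For practice 5, coordinate templates can live in a small JSON file keyed by layout name and figure. A minimal sketch; the function names, key names, and stored fields are all hypothetical:

```python
import json
from pathlib import Path


def save_template(path, layout, fig_label, y_start, y_end):
    """Record measured crop bounds for one figure of one layout type."""
    p = Path(path)
    data = json.loads(p.read_text(encoding="utf-8")) if p.exists() else {}
    data.setdefault(layout, {})[fig_label] = {"y_start": y_start, "y_end": y_end}
    p.write_text(json.dumps(data, indent=2, ensure_ascii=False), encoding="utf-8")


def load_template(path, layout, fig_label):
    """Return the (y_start, y_end) recorded for this layout and figure, or None."""
    p = Path(path)
    if not p.exists():
        return None
    entry = json.loads(p.read_text(encoding="utf-8")).get(layout, {}).get(fig_label)
    return (entry["y_start"], entry["y_end"]) if entry else None
```

For example, `save_template("figure_templates.json", "nature", "fig1", 350, 660)` stores the Nature/Science Fig. 1 range listed above. A stored template is a starting point; the text-block analysis should still confirm it for each new paper.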

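## Trigger Keywords

"提取PDF图片" (extract PDF images), "从PDF提取Figure" (extract figures from PDF), "PDF图片裁剪" (crop PDF images), "学术论文图片提取" (extract figures from academic papers)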

