image-text-extractor

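Batch-recognizes the text in images and outputs it as a structured document, with one section per image. Use it when the user needs to extract text from multiple images, organize the text content of images, or convert text in images into an editable document.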

3,891 stars

Best use case

image-text-extractor is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using image-text-extractor should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You repeatedly need to pull text out of batches of images and want the result in the same per-image structure every time.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

curl -o ~/.claude/skills/test20260402/SKILL.md --create-dirs "https://raw.githubusercontent.com/openclaw/skills/main/skills/asiangiantduck/test20260402/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/test20260402/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How image-text-extractor Compares

Feature / Agent	image-text-extractor	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

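What does this skill do?

It recognizes the text in one or more uploaded images and outputs it as a structured Markdown document with one section per image, so the extracted text can be edited.

Where can I find the source code?

The SKILL.md file is in the openclaw/skills repository on GitHub, under skills/asiangiantduck/test20260402/. The installation command above downloads it from that location.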

SKILL.md Source

# Image Text Extractor

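## Objective
- Purpose: batch-process images uploaded by the user, recognizing and extracting the text in each image
- Capabilities: image OCR, text organization, segmented output, document generation
- Trigger: the user uploads one or more images and asks to extract the text, recognize the content, or convert it into a document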

## Steps

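### Step 1: Receive the images
- Prompt the user to upload the images (batch upload is supported)
- Accepted formats: PNG, JPG, JPEG, GIF, WebP, and other common formats
- Confirm the number and order of the images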
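
A minimal Python sketch of the intake check in Step 1, assuming the agent can see each uploaded file's name and extension. The helper `split_by_format` is hypothetical, and the extension set covers only the formats the step names explicitly; "other common formats" are left out because the skill does not list them.

```python
from pathlib import Path

# Only the formats named explicitly in Step 1 (an assumption: the skill's
# "other common formats" are unspecified, so they are not included here).
SUPPORTED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".webp"}


def split_by_format(paths: list[str]) -> tuple[list[str], list[str]]:
    """Partition uploaded paths into supported and unsupported files.

    Both lists keep the original upload order, so the image count and
    sequence can be confirmed with the user before recognition starts.
    """
    supported: list[str] = []
    unsupported: list[str] = []
    for path in paths:
        # Compare extensions case-insensitively ("SCAN.PNG" counts as PNG).
        if Path(path).suffix.lower() in SUPPORTED_EXTENSIONS:
            supported.append(path)
        else:
            unsupported.append(path)
    return supported, unsupported


if __name__ == "__main__":
    ok, rejected = split_by_format(["page1.PNG", "notes.pdf", "page2.webp"])
    print(f"{len(ok)} image(s) in upload order: {ok}")
    print(f"Unsupported: {rejected}")
```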

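### Step 2: Recognize the text
- Call the `read_image` tool on each image to recognize its text
- Recognition parameters:
  - `prompt`: "Extract all text in the image, preserving the original paragraphs and formatting"
- Process the images in the order they were uploaded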
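
The skill relies on the agent's `read_image` tool, which is not a Python library. The sketch below models it as a hypothetical callable that takes an image path and a prompt, so the ordering rule and the fixed prompt from Step 2 can be shown without assuming any real OCR API. `recognize_in_order` and `OCR_PROMPT` are illustrative names, and the prompt is an English rendering of the one given in this step.

```python
from typing import Callable

# English rendering of the Step 2 prompt; sent unchanged with every image.
OCR_PROMPT = (
    "Extract all text in the image, preserving the original "
    "paragraphs and formatting"
)

# Hypothetical stand-in for the agent's read_image tool:
# (image_path, prompt) -> recognized text.
Recognizer = Callable[[str, str], str]


def recognize_in_order(images: list[str], recognize: Recognizer) -> list[str]:
    """Call the recognizer once per image, strictly in upload order."""
    return [recognize(path, OCR_PROMPT) for path in images]


if __name__ == "__main__":
    def fake_read_image(path: str, prompt: str) -> str:
        # Demo only: a real agent would invoke its read_image tool here.
        return f"<text recognized in {path}>"

    for text in recognize_in_order(["slide1.png", "slide2.png"], fake_read_image):
        print(text)
```

Passing the recognizer in as a parameter keeps the ordering logic independent of how the agent actually performs OCR.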

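### Step 3: Organize the recognition results
- Give each image's text a clear label (e.g. "Image 1", "Image 2")
- Preserve the original paragraph structure and formatting
- If titles, body text, lists, or other structures are recognized, keep their original hierarchy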
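
One way to apply the labeling rule from Step 3, sketched in Python under the assumption that the recognized text already carries its own line structure (blank lines between paragraphs, indentation for nested lists). The hypothetical `label_sections` function only numbers the texts and normalizes line endings, leaving the hierarchy itself untouched.

```python
def label_sections(texts: list[str]) -> list[tuple[str, str]]:
    """Pair each recognized text with an "Image N" label (1-based, upload order).

    Only Windows line endings and whitespace at the very end of each text
    are normalized; internal blank lines, indentation and list markers are
    kept so headings, paragraphs and lists retain their original hierarchy.
    """
    labeled: list[tuple[str, str]] = []
    for number, text in enumerate(texts, start=1):
        normalized = text.replace("\r\n", "\n").rstrip()
        labeled.append((f"Image {number}", normalized))
    return labeled


if __name__ == "__main__":
    for label, body in label_sections(["Heading\n\n- point one\n  - detail\n\n"]):
        print(label)
        print(body)
```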

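### Step 4: Generate the document
- Render the organized content as a Markdown document in the standard format
- Format reference: [references/output-format.md](references/output-format.md)
- Document structure:
  1. Document title
  2. Extraction time
  3. Total number of images
  4. Text of each image (one section per image)
- Output the document content directly to the user, or generate a .md file for the user to download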
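
A sketch of the four-part document structure from Step 4. The exact template lives in references/output-format.md, which is not reproduced here, so the heading levels and the field labels (`Extraction time`, `Total images`) are assumptions; `build_markdown` is a hypothetical helper that takes `(label, text)` pairs already in upload order.

```python
from datetime import datetime


def build_markdown(
    title: str,
    sections: list[tuple[str, str]],
    extracted_at: datetime,
) -> str:
    """Assemble title, extraction time, image count and per-image sections.

    Heading levels and field wording are illustrative; the authoritative
    layout is defined in references/output-format.md.
    """
    lines = [
        f"# {title}",  # 1. document title
        "",
        f"- Extraction time: {extracted_at:%Y-%m-%d %H:%M}",  # 2. extraction time
        f"- Total images: {len(sections)}",  # 3. counts images that have a section
    ]
    for label, text in sections:  # 4. one section per image, in order
        lines += ["", f"## {label}", "", text]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    doc = build_markdown(
        "Extracted Text",
        [("Image 1", "Hello"), ("Image 2", "World")],
        datetime(2026, 4, 2, 9, 30),
    )
    print(doc)
```

Returning a plain string keeps both delivery options open: print it into the conversation, or write the same string to a `.md` file for download.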

## Resource Index
- Output format reference: see [references/output-format.md](references/output-format.md) (contains the document template and formatting conventions)

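## Notes
- Image quality: clear, well-lit images with legible text are recommended to improve recognition accuracy
- Languages: mixed Chinese and English text is supported; other languages are detected automatically based on the image content
- Processing order: process and output the images strictly in the order the user uploaded them
- Format preservation: keep the original paragraphs, headings, lists, and other structures as far as possible
- Error handling: if recognition fails for an image, skip that image, tell the user, and continue with the remaining images
- Privacy: image content is used only within the current session and is not stored or disclosed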
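
The error-handling rule (skip a failed image, tell the user, keep going) can be sketched as follows. The recognizer is again a hypothetical stand-in for `read_image`, and treating any raised exception as a recognition failure is an assumption, since the skill does not say how the tool reports errors. Keeping each image's original position lets the report name the skipped image precisely.

```python
from typing import Callable


def extract_all(
    images: list[str],
    recognize: Callable[[str], str],
) -> tuple[list[tuple[int, str]], list[tuple[int, str]]]:
    """Recognize each image in upload order, skipping any that fail.

    Returns (successes, failures) as (1-based position, payload) pairs:
    the payload is the recognized text for a success and the error
    message for a failure.
    """
    successes: list[tuple[int, str]] = []
    failures: list[tuple[int, str]] = []
    for position, path in enumerate(images, start=1):
        try:
            text = recognize(path)
        except Exception as exc:  # assumption: any error means "recognition failed"
            failures.append((position, str(exc)))
            continue  # keep processing the remaining images
        successes.append((position, text))
    return successes, failures


if __name__ == "__main__":
    def flaky_recognizer(path: str) -> str:
        # Hypothetical stand-in for read_image that fails on one file.
        if path == "blurry.jpg":
            raise RuntimeError("no text recognized")
        return f"text of {path}"

    ok, failed = extract_all(["a.png", "blurry.jpg", "c.png"], flaky_recognizer)
    for position, reason in failed:
        print(f"Image {position} was skipped: {reason}")
    print(f"Recognized {len(ok)} of 3 images")
```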

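## Usage Examples

### Example 1: Batch-extract text from document images
**User uploads**: 3 document screenshots
**Execution flow**:
1. Receive the 3 images
2. Call `read_image` on each image in turn to recognize its text
3. Organize the results into three sections: "Image 1", "Image 2", "Image 3"
4. Generate a Markdown document containing all the content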

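### Example 2: Extract presentation content
**User uploads**: multiple PPT slide screenshots
**Execution flow**:
1. Receive the images and confirm how many there are
2. Recognize the title and body text of each slide
3. Output one section per slide, in slide order
4. Preserve the original heading hierarchy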
