general-ocr-struct

General-purpose offline OCR and post-processing for Chinese/English screenshots, scanned images, receipts, tables, chat screenshots, statement screenshots, and other text-heavy images. Use when you need to: (1) extract text from an image locally, (2) return raw OCR text before interpretation, (3) clean broken OCR lines into structured content, (4) reorganize recognized text into rows/fields for downstream use, or (5) separate recognition from later table entry, summarization, or document drafting.

3,891 stars

Best use case

general-ocr-struct is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using general-ocr-struct should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/general-ocr-struct/SKILL.md --create-dirs "https://raw.githubusercontent.com/openclaw/skills/main/skills/9penny/general-ocr-struct/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/general-ocr-struct/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How general-ocr-struct Compares

| Feature / Agent | general-ocr-struct | Standard Approach |
| --- | --- | --- |
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# General OCR Struct

Use this skill to separate OCR recognition from downstream content organization.

## Workflow

1. Run the local OCR script on the image first.
2. Return the raw OCR text before making business interpretations when accuracy matters.
3. If the image is a transaction-detail screenshot, run structuring mode to group rows into fields.
4. Mark uncertain fields explicitly as `待确认` ("to be confirmed"); do not guess missing content.
5. Only after the user confirms recognition quality, use the result for tables, summaries, or documents.

## Commands

### Raw OCR

```bash
python3 scripts/general_ocr.py raw /path/to/image.jpg
```

### Structured transaction extraction

```bash
python3 scripts/general_ocr.py transactions /path/to/image.jpg
```

### JSON output

```bash
python3 scripts/general_ocr.py transactions /path/to/image.jpg --json
```
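If you need to drive these commands from another program, a thin wrapper like the one below may help. It is a minimal sketch, not part of the skill: `build_command` and `run_ocr` are hypothetical helper names, and it assumes the script prints its result to stdout and that `--json` emits a single JSON document. The source only shows `--json` together with `transactions`; whether `raw` accepts it is not documented.

```python
import json
import subprocess

# Script path exactly as it appears in the commands above.
SCRIPT = "scripts/general_ocr.py"
MODES = {"raw", "transactions"}


def build_command(mode: str, image: str, as_json: bool = False) -> list:
    """Build the argv list for one of the documented invocations."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    cmd = ["python3", SCRIPT, mode, image]
    if as_json:
        # Only documented for `transactions`; passed through unchanged here.
        cmd.append("--json")
    return cmd


def run_ocr(mode: str, image: str, as_json: bool = False):
    """Run the script and return its stdout (parsed when as_json is set).

    Assumption: the result goes to stdout; this is not confirmed by the source.
    """
    proc = subprocess.run(
        build_command(mode, image, as_json),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(proc.stdout) if as_json else proc.stdout
```

Calling `run_ocr("raw", "/path/to/image.jpg")` first and `run_ocr("transactions", "/path/to/image.jpg", as_json=True)` second mirrors the order the workflow recommends: raw text before structure.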

## Output rules

- Prefer showing the recognition result first, then the cleaned structure.
- Preserve source wording where possible.
- For uncertain content, use `待确认` instead of inferring.
- Adapt the structure to the source image type. For statement-like screenshots, common fields are: `card_last4`, `date`, `time`, `currency`, `merchant`, `amount`.
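The "do not infer" rule for statement-like screenshots can be sketched as a small normalization step. This is an illustration under assumptions: `normalize_row` is a hypothetical helper, and the input is assumed to be a dict keyed by the field names listed above; the actual shape of the script's output is not documented here.

```python
# Common fields for statement-like screenshots, as listed in the output rules.
FIELDS = ("card_last4", "date", "time", "currency", "merchant", "amount")
UNCERTAIN = "待确认"  # marker the skill uses for unconfirmed content


def normalize_row(row: dict) -> dict:
    """Return a row with every common field present.

    Missing or blank values become the uncertainty marker; existing
    values keep their source wording apart from surrounding whitespace.
    """
    out = {}
    for field in FIELDS:
        value = row.get(field)
        text = "" if value is None else str(value).strip()
        out[field] = text if text else UNCERTAIN
    return out
```

For example, a row that only carries a date and an amount comes back with the other four fields set to `待确认` rather than guessed values.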

## Notes

- This skill uses RapidOCR locally.
- The first install may require Python packages; after setup, the skill runs offline.
- If OCR quality is weak, request a higher-resolution original screenshot before doing deeper cleanup.
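One way to decide when OCR quality is "weak" is to gate on recognition confidence. The sketch below is hypothetical: it assumes you have each recognized line as a `(text, score)` pair with scores between 0 and 1, and the `0.8` threshold is an arbitrary placeholder to tune, not a value from the skill. Whether `general_ocr.py` exposes scores at all is not stated in the source.

```python
def needs_better_image(lines, min_mean_score=0.8):
    """Return True when the OCR result looks too weak to structure.

    `lines` is a list of (text, score) pairs. An empty result is
    treated as weak, since there is nothing to structure.
    """
    if not lines:
        return True
    mean_score = sum(score for _, score in lines) / len(lines)
    return mean_score < min_mean_score
```

When this returns `True`, the safer path is to ask for a higher-resolution original rather than to start cleaning the text.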
