general-ocr-struct
General-purpose offline OCR and post-processing for Chinese/English screenshots, scanned images, receipts, tables, chat screenshots, statement screenshots, and other text-heavy images. Use when you need to: (1) extract text from an image locally, (2) return raw OCR text before interpretation, (3) clean broken OCR lines into structured content, (4) reorganize recognized text into rows/fields for downstream use, or (5) separate recognition from later table entry, summarization, or document drafting.
Best use case
general-ocr-struct is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using general-ocr-struct should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/general-ocr-struct/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
How general-ocr-struct Compares
| Feature / Agent | general-ocr-struct | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
It runs OCR locally on Chinese/English text-heavy images (screenshots, scans, receipts, tables, chat and statement screenshots), returns the raw recognized text first, and can then clean broken OCR lines and reorganize them into rows and fields for later table entry, summarization, or document drafting.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# General OCR Struct

Use this skill to separate OCR recognition from downstream content cleanup.

## Workflow

1. Run the local OCR script on the image first.
2. Return the raw OCR text before making business interpretations when accuracy matters.
3. If the image is a transaction-detail screenshot, run structuring mode to group rows into fields.
4. Mark uncertain fields explicitly as `待确认` ("to be confirmed"); do not guess missing content.
5. Only after the user confirms recognition quality, use the result for tables, summaries, or documents.

## Commands

### Raw OCR

```bash
python3 scripts/general_ocr.py raw /path/to/image.jpg
```

### Structured transaction extraction

```bash
python3 scripts/general_ocr.py transactions /path/to/image.jpg
```

### JSON output

```bash
python3 scripts/general_ocr.py transactions /path/to/image.jpg --json
```

## Output rules

- Prefer showing the recognition result first, then the cleaned structure.
- Preserve source wording where possible.
- For uncertain content, use `待确认` instead of inferring.
- Adapt the structure to the source image type. For statement-like screenshots, common fields are: `card_last4`, `date`, `time`, `currency`, `merchant`, `amount`.

## Notes

- This skill uses RapidOCR locally.
- First install may need Python packages; after setup it runs offline.
- If OCR quality is weak, request a higher-resolution original screenshot before doing deeper cleanup.
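
The commands and output rules above could be wrapped from Python roughly as follows. This is a minimal sketch, not part of the skill: the helper names (`build_command`, `run_transactions`, `fill_uncertain`) are hypothetical, and the shape of the `--json` output is not documented in SKILL.md, so the sketch only assumes that stdout is valid JSON. The field names and the `待确认` marker are taken from the output rules.

```python
import json
import subprocess
from typing import Any

# Field names listed under "Output rules" for statement-like screenshots.
STATEMENT_FIELDS = ("card_last4", "date", "time", "currency", "merchant", "amount")
UNCERTAIN = "待确认"  # marker SKILL.md prescribes for content that must not be guessed


def build_command(mode: str, image_path: str, as_json: bool = False) -> list[str]:
    """Build the argv for scripts/general_ocr.py (hypothetical helper)."""
    if mode not in ("raw", "transactions"):
        raise ValueError(f"unsupported mode: {mode}")
    cmd = ["python3", "scripts/general_ocr.py", mode, image_path]
    if as_json:
        # SKILL.md only shows --json together with the transactions mode.
        cmd.append("--json")
    return cmd


def run_transactions(image_path: str) -> Any:
    """Run structuring mode and parse stdout as JSON.

    Assumption: the script prints one JSON document to stdout; its exact
    structure is not documented, so the parsed value is returned unchanged.
    """
    result = subprocess.run(
        build_command("transactions", image_path, as_json=True),
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


def fill_uncertain(row: dict[str, Any]) -> dict[str, Any]:
    """Return a row with every statement field present.

    Missing or blank values become the 待确认 marker instead of being inferred.
    """
    cleaned: dict[str, Any] = {}
    for field in STATEMENT_FIELDS:
        value = row.get(field)
        if value is None or str(value).strip() == "":
            cleaned[field] = UNCERTAIN
        else:
            cleaned[field] = value
    return cleaned
```

Keeping command construction separate from execution lets the argument list be checked without an image or the OCR dependencies installed. `fill_uncertain` applies the "do not guess" rule mechanically; rows with `待确认` values would still go back to the user for confirmation before being used in tables or summaries.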