scanlume-ocr-api

Use when calling the Scanlume OCR API (https://www.scanlume.com/) for screenshots, JPG, PNG, or image-based tables, especially when a task needs base64 data URLs, a choice between simple and formatted OCR modes, or table-aware structured output.

242 stars

by aiskillstore

Best use case

scanlume-ocr-api is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using scanlume-ocr-api should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/scanlume-ocr-api/SKILL.md --create-dirs "https://raw.githubusercontent.com/aiskillstore/marketplace/main/skills/daanaagua/scanlume-ocr-api/SKILL.md"

Manual Installation

1. Download SKILL.md from GitHub.
2. Place it in .claude/skills/scanlume-ocr-api/SKILL.md inside your project.
3. Restart your AI agent; it will auto-discover the skill.

How scanlume-ocr-api Compares

| Feature / Agent | scanlume-ocr-api | Standard Approach |
| --- | --- | --- |
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

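What does this skill do?

It guides an AI agent through calling the Scanlume OCR API for screenshots, JPG, PNG, and image-based tables, including building base64 data URLs and choosing between the simple and formatted modes.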

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Scanlume OCR API

Use this skill when the task is specifically about calling the public OCR API behind [https://www.scanlume.com/](https://www.scanlume.com/), not when the user only wants the website UI.

## English

### Workflow

1. Confirm the input is an image, not a PDF.
2. Read `references/api-contract.md` before building the request.
3. Choose `simple` only when the priority is raw text, speed, and lower cost.
4. Choose `formatted` for headings, multi-block layouts, Markdown, HTML, and tables.
5. If the user gives a local file path, prefer `scripts/scanlume_ocr.py` to build the data URL and call the API.
6. Read `references/output-shapes.md` before consuming `formatted` responses, especially table blocks.
7. State clearly when a request is blocked by public API limits, such as PDF OCR beta access.
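
Steps 1 and 7 can be sketched as a small pre-flight check. This is a minimal illustration, not part of the skill's shipped script: the function names are hypothetical, and limiting detection to PNG and JPEG is an assumption based on the formats named in this document (check `references/api-contract.md` for the full accepted list).

```python
# Leading "magic bytes" for the formats named in this skill, plus PDF.
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"
JPEG_MAGIC = b"\xff\xd8\xff"
PDF_MAGIC = b"%PDF-"


def classify_input(data: bytes) -> str:
    """Return 'png', 'jpeg', 'pdf', or 'unknown' from the leading bytes."""
    if data.startswith(PNG_MAGIC):
        return "png"
    if data.startswith(JPEG_MAGIC):
        return "jpeg"
    if data.startswith(PDF_MAGIC):
        return "pdf"
    return "unknown"


def check_input(data: bytes) -> str:
    """Return the image kind, or raise with a clear message for blocked input."""
    kind = classify_input(data)
    if kind == "pdf":
        # Step 7: say plainly that the public API does not cover this.
        raise ValueError(
            "PDF input: the public v1 API covers image OCR only; "
            "the PDF route is still beta-gated."
        )
    if kind == "unknown":
        raise ValueError(
            "Input is not a recognized PNG or JPEG image; "
            "see references/api-contract.md for accepted formats."
        )
    return kind
```

Checking the file's leading bytes rather than its extension catches a PDF that was renamed to `.png` before any credits are spent.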

### Quick Rules

- Public image OCR endpoint: `POST /v1/api/ocr`
- Auth: `Authorization: Bearer <SCANLUME_API_KEY>`
- Content type: `application/json`
- Payload keys: `mode` and `base64`
- `base64` must be a full data URL such as `data:image/png;base64,...`
- Do not claim multipart upload support
- Do not claim remote file URL support
- Do not claim public PDF OCR API availability
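
The rules above translate into a request roughly like the following sketch. It only builds the request and does not send it. `BASE_URL` is a hypothetical placeholder, because this document gives the route but not the host, and the function names are illustrative rather than taken from `scripts/scanlume_ocr.py`.

```python
import base64
import json
import urllib.request

# Hypothetical placeholder: only the path is documented here.
# Take the real host from references/api-contract.md.
BASE_URL = "https://api.example.invalid"


def to_data_url(image_bytes: bytes, mime: str = "image/png") -> str:
    """Encode raw image bytes as a full data URL, as the API requires."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def build_ocr_request(image_bytes: bytes, mode: str, api_key: str,
                      mime: str = "image/png") -> urllib.request.Request:
    """Build (but do not send) a JSON POST to /v1/api/ocr."""
    if mode not in ("simple", "formatted"):
        raise ValueError("mode must be 'simple' or 'formatted'")
    payload = {"mode": mode, "base64": to_data_url(image_bytes, mime)}
    return urllib.request.Request(
        BASE_URL + "/v1/api/ocr",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

Once `BASE_URL` points at the real host, the request could be sent with `urllib.request.urlopen(req)`. The body is plain JSON, consistent with the rule against claiming multipart or remote-URL support.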

### Mode Selection

- Use `simple` for:
  - quick raw text extraction
  - lower cost image OCR
  - tasks that only need plain text

- Use `formatted` for:
  - screenshots with multiple text blocks
  - image-based tables
  - output needed in Markdown or HTML
  - tasks that benefit from `blocks` or `tableSummary`
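
As a rough sketch, the selection rule above reduces to one decision: any need for structure selects `formatted`, otherwise `simple`. The function and its parameter names are hypothetical and exist only to illustrate the rule.

```python
def choose_mode(needs_tables: bool = False,
                needs_layout: bool = False,
                needs_markup: bool = False) -> str:
    """Return 'formatted' when structure matters, otherwise 'simple'.

    needs_tables: the image contains a table to be extracted as a table
    needs_layout: multiple text blocks whose grouping must be preserved
    needs_markup: the caller wants Markdown or HTML output
    """
    if needs_tables or needs_layout or needs_markup:
        return "formatted"
    return "simple"
```

For example, `choose_mode(needs_tables=True)` selects `formatted`, while a plain-text request with no flags set falls back to the cheaper `simple` mode.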

### Helpers

- Read `references/api-contract.md` before first use.
- Read `references/output-shapes.md` before parsing formatted OCR results.
- Use `python scripts/scanlume_ocr.py <path> --mode formatted --output md` for a local table image.
- Use `python scripts/scanlume_ocr.py <path> --mode simple --output txt` for plain text extraction.
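
When the helper has to be driven from another Python program, the two documented invocations can be assembled as argument lists. This sketch uses only the flags and mode/output pairings shown above; whether the script accepts other `--output` values, and how it reads the API key, is not stated here, so neither is assumed.

```python
import sys

# Mode/output pairings documented in this skill.
DOCUMENTED_OUTPUT = {"formatted": "md", "simple": "txt"}


def helper_command(image_path: str, mode: str) -> list[str]:
    """Build the argv list for scripts/scanlume_ocr.py with a documented pairing."""
    if mode not in DOCUMENTED_OUTPUT:
        raise ValueError("mode must be 'simple' or 'formatted'")
    return [
        sys.executable,            # the current Python interpreter
        "scripts/scanlume_ocr.py",
        image_path,
        "--mode", mode,
        "--output", DOCUMENTED_OUTPUT[mode],
    ]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the skill's root directory. Passing a list rather than a shell string avoids quoting problems with paths that contain spaces.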

### Constraints

- The public v1 API currently covers image OCR only.
- The website supports PDF OCR in its web interface, but the public PDF API route is still beta-gated.
- `simple` costs 1 credit per image.
- `formatted` costs 2 credits per image.
- Favor precise claims over marketing claims. If the API cannot do something publicly today, say so.
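
The per-image costs above make it easy to estimate a batch before running it. A minimal sketch, with a hypothetical function name:

```python
from collections.abc import Iterable

# Credit costs per image, as stated in the constraints above.
CREDITS_PER_IMAGE = {"simple": 1, "formatted": 2}


def estimate_credits(modes: Iterable[str]) -> int:
    """Sum the credit cost for one planned OCR call per entry in `modes`."""
    return sum(CREDITS_PER_IMAGE[mode] for mode in modes)
```

One `simple` call plus two `formatted` calls cost 1 + 2 + 2 = 5 credits, so `estimate_credits(["simple", "formatted", "formatted"])` returns 5.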
