scanlume-ocr-api
Use when calling the Scanlume OCR API (https://www.scanlume.com/) for screenshots, JPG, PNG, or image-based tables, especially when a task needs base64 data URLs, a choice between simple and formatted OCR modes, or table-aware structured output.
Best use case
scanlume-ocr-api is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using scanlume-ocr-api should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/scanlume-ocr-api/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
How scanlume-ocr-api Compares
| Feature / Agent | scanlume-ocr-api | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
It guides an agent through calling the Scanlume OCR API (https://www.scanlume.com/) for screenshots, JPG, PNG, or image-based tables: building the base64 data URL, choosing between simple and formatted OCR, and consuming table-aware structured output.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Scanlume OCR API

Use this skill when the task is specifically about calling the public OCR API behind [https://www.scanlume.com/](https://www.scanlume.com/), not when the user only wants the website UI.

## Workflow

1. Confirm the input is an image, not a PDF.
2. Read `references/api-contract.md` before building the request.
3. Choose `simple` only for raw text, speed, and lower cost.
4. Choose `formatted` for headings, multi-block layouts, Markdown, HTML, and tables.
5. If the user gives a local file path, prefer `scripts/scanlume_ocr.py` to build the data URL and call the API.
6. Read `references/output-shapes.md` before consuming `formatted` responses, especially table blocks.
7. State clearly when a request is blocked by public API limits, such as PDF OCR beta access.

## Quick Rules

- Public image OCR endpoint: `POST /v1/api/ocr`
- Auth: `Authorization: Bearer <SCANLUME_API_KEY>`
- Content type: `application/json`
- Payload keys: `mode` and `base64`
- `base64` must be a full data URL such as `data:image/png;base64,...`
- Do not claim multipart upload support
- Do not claim remote file URL support
- Do not claim public PDF OCR API availability

## Mode Selection

- Use `simple` for:
  - quick raw text extraction
  - lower cost image OCR
  - tasks that only need plain text
- Use `formatted` for:
  - screenshots with multiple text blocks
  - image-based tables
  - output needed in Markdown or HTML
  - tasks that benefit from `blocks` or `tableSummary`

## Helpers

- Read `references/api-contract.md` before first use.
- Read `references/output-shapes.md` before parsing formatted OCR results.
- Use `python scripts/scanlume_ocr.py <path> --mode formatted --output md` for a local table image.
- Use `python scripts/scanlume_ocr.py <path> --mode simple --output txt` for plain text extraction.

## Constraints

- The public v1 API currently covers image OCR only.
- The website supports PDF OCR in its web interface, but the public PDF API route is still beta-gated.
- `simple` costs 1 credit per image.
- `formatted` costs 2 credits per image.
- Favor precise claims over marketing claims. If the API cannot do something publicly today, say so.
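The Quick Rules and Constraints above can be sketched in Python roughly as follows. This is a minimal, unofficial sketch and not the contents of `scripts/scanlume_ocr.py`: the function names `build_data_url`, `build_ocr_request`, and `estimate_credits` are hypothetical, and the skill does not state the API host, so `base_url` is an assumption that must be taken from `references/api-contract.md`. The sketch only builds the request; it does not send it or assume any response shape.

```python
import base64
import json
import mimetypes
import urllib.request
from pathlib import Path

OCR_PATH = "/v1/api/ocr"  # public image OCR endpoint named in the skill
CREDITS_PER_IMAGE = {"simple": 1, "formatted": 2}  # costs stated in the skill


def build_data_url(image_path: str) -> str:
    """Encode a local image as a full data URL (data:image/...;base64,...)."""
    mime, _ = mimetypes.guess_type(image_path)
    # The public API covers image OCR only, so reject PDFs and other types early.
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"not an image file: {image_path}")
    encoded = base64.b64encode(Path(image_path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def build_ocr_request(
    base_url: str,  # ASSUMPTION: the host is not given in the skill
    api_key: str,
    image_path: str,
    mode: str = "simple",
) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request with the two payload keys."""
    if mode not in CREDITS_PER_IMAGE:
        raise ValueError(f"mode must be 'simple' or 'formatted', got {mode!r}")
    payload = {"mode": mode, "base64": build_data_url(image_path)}
    return urllib.request.Request(
        base_url.rstrip("/") + OCR_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def estimate_credits(mode: str, image_count: int) -> int:
    """Credits needed for image_count images at the stated per-image cost."""
    return CREDITS_PER_IMAGE[mode] * image_count
```

The request object can then be passed to `urllib.request.urlopen`. How to read the response, especially `blocks` and `tableSummary` in `formatted` mode, is described in `references/output-shapes.md` and is deliberately left out here.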