retab

Build apps and integrations on top of Retab's core APIs. Use when Codex needs to add document parsing, structured extraction, bundle splitting, form filling or document editing, document classification, or workflow run integration to a codebase through the Retab Python SDK, Node SDK, or direct REST calls. Covers starting workflow runs, waiting for completion, inspecting step outputs, and handling human-review pauses.

41 stars

Best use case

retab is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using retab should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/retab/SKILL.md --create-dirs "https://raw.githubusercontent.com/retab-dev/retab/main/skills/retab/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/retab/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How retab Compares

Feature / Agent          | retab         | Standard Approach
Platform Support         | Not specified | Limited / Varies
Context Awareness        | High          | Baseline
Installation Complexity  | Unknown       | N/A

Frequently Asked Questions

Where can I find the source code?

The source code is on GitHub in the retab-dev/retab repository, at skills/retab/SKILL.md.

SKILL.md Source

# Retab

Use this skill to implement Retab document API calls and workflow runs without relying on external docs.

It is especially useful for two kinds of tasks:

- Direct document operations: `parse`, `extract`, `split`, `edit`, `classify`
- Existing workflow execution: start a run, pass inputs to start blocks, wait for completion, inspect step outputs, and handle `waiting_for_human`

## Quick Start

1. Install a client when needed:
   - Python: `pip install retab`
   - Node: `npm install @retab/node`
2. Load `RETAB_API_KEY`.
3. Pick the smallest operation that solves the task:
   - Need text or page content: read `references/parse.md`
   - Need structured JSON from a schema: read `references/extract.md`
   - Need to break a file into labeled sections: read `references/split.md`
   - Need to fill or update a form-like document: read `references/edit.md`
   - Need to choose one label from known categories: read `references/classify.md`
   - Need to run an existing multi-step pipeline, wait for completion, inspect block outputs, or handle human review: read `references/workflows.md`
4. Read `references/common.md` before writing code if authentication, input format, or model defaults are still unclear.
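
Steps 2 and 3 above can be sketched in a few lines of Python. This is only an illustration: the helper names `load_api_key` and `reference_for` and the short task keys are hypothetical, and nothing here calls the Retab SDK.

```python
import os

# Reference file for each kind of task, copied from the Quick Start list.
# The short task keys ("parse", "workflow", ...) are hypothetical labels.
REFERENCE_FOR_TASK = {
    "parse": "references/parse.md",
    "extract": "references/extract.md",
    "split": "references/split.md",
    "edit": "references/edit.md",
    "classify": "references/classify.md",
    "workflow": "references/workflows.md",
}


def load_api_key(env=None):
    """Return RETAB_API_KEY, or raise a clear error if it is missing or blank."""
    env = os.environ if env is None else env
    key = env.get("RETAB_API_KEY", "").strip()
    if not key:
        raise RuntimeError("RETAB_API_KEY is not set")
    return key


def reference_for(task):
    """Return the reference file to read for a given task key."""
    try:
        return REFERENCE_FOR_TASK[task]
    except KeyError:
        known = ", ".join(sorted(REFERENCE_FOR_TASK))
        raise ValueError(f"unknown task {task!r}; expected one of: {known}") from None
```

Failing fast on a missing key keeps the error close to its cause instead of surfacing later as an authentication failure.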

## Workflow-first cases

Prefer the workflow reference over direct document routes when the user already has a workflow and any of these are true:

- They mention a workflow ID such as `wf_...`
- They need multiple steps chained together
- They need outputs from specific workflow blocks
- They mention `final_outputs`, `steps`, `waiting_for_human`, or HIL/human review
- They want to reuse an existing dashboard workflow from code instead of rebuilding the logic inline
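
The checklist above could be approximated with a small text heuristic. The sketch below is an assumption-heavy illustration, not part of the skill: the function name `prefers_workflow_reference` is hypothetical, and keyword matching only covers the signals that show up as literal terms in a request.

```python
import re

# A workflow ID such as "wf_...".
WORKFLOW_ID = re.compile(r"\bwf_\w+")

# Terms named in the checklist; word boundaries stop "hil" matching "while".
WORKFLOW_TERMS = re.compile(
    r"\b(?:final_outputs|steps|waiting_for_human|hil|human review)\b",
    re.IGNORECASE,
)


def prefers_workflow_reference(request_text, has_existing_workflow):
    """Return True when the workflow reference is the better starting point.

    The rule only applies when the user already has a workflow, so that
    condition is passed in explicitly rather than guessed from the text.
    """
    if not has_existing_workflow:
        return False
    return bool(
        WORKFLOW_ID.search(request_text) or WORKFLOW_TERMS.search(request_text)
    )
```

A heuristic like this is only a first pass; signals such as "reuse an existing dashboard workflow" still need a human or model reading of the request.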

## Working Rules

- Prefer the SDK unless the codebase is already built around raw HTTP.
- Pass `model="retab-small"` explicitly unless the user asks for a different tradeoff.
- Keep request bodies minimal. Add optional fields only when they solve a real problem.
- For direct document REST calls, send the `Api-Key` header and a `document` object with `filename` and `url`.
- For workflow-run REST calls, send `documents` keyed by start block ID, with `filename`, `content`, and `mime_type`.
- For workflow-run SDK calls, map inputs by start block ID exactly. Do not invent friendly aliases for block keys.
- For SDK calls, prefer passing a local file path when possible.
- Add retries for transient network or 5xx failures. Do not blindly retry validation errors.
- If a workflow run must finish before downstream code proceeds, use the SDK waiting helpers instead of hand-writing ad hoc polling when the SDK already provides one.
- If a workflow stops at `waiting_for_human`, do not treat that as a generic failure. Surface it explicitly and inspect the relevant step or HIL decision state.
- When debugging workflow outputs, use `workflows.runs.steps.list(run_id)` as the batch primitive (one HTTP call for the whole run). Use `steps.get(run_id, block_id)` for a single step. Avoid looping `run.steps` with per-step `steps.get()` calls — that creates an N+1 anti-pattern.
- To retrieve the typed resource produced by an inference step (extract, split, classifier, parse, edit, for-each partition), use `step.artifact` and the matching resource client (`client.extractions.get(step.artifact.id)`, `client.splits.get(...)`, `client.classifications.get(...)`, `client.parses.get(...)`, `client.edits.get(...)`, `client.partitions.get(...)`).
- To retrieve source provenance for an extraction, use `client.extractions.sources(extraction.id)` or `GET /v1/extractions/{extraction_id}/sources`; the returned `sources` tree mirrors the extraction and wraps leaves as `{ value, source }`.
- Stay within this skill's scope. It covers the direct document routes plus running existing workflows. If the user asks for workflow design, widgets, projects, or MCP setup, give the simplest useful answer and note that those areas are outside this skill's main coverage.
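
The retry and human-review rules above can be sketched without the SDK. Everything named here is hypothetical: `TransientError` and `ValidationError` stand in for whatever exceptions the client in use actually raises, and the backoff values are arbitrary.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for network failures and 5xx responses."""


class ValidationError(Exception):
    """Hypothetical marker for validation failures; never retried."""


def is_retryable_status(status_code):
    """Only 5xx responses count as transient."""
    return 500 <= status_code <= 599


def call_with_retries(operation, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Run operation(), retrying transient failures with exponential backoff.

    Any other exception, including ValidationError, propagates immediately.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except (TransientError, ConnectionError, TimeoutError):
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))


def needs_human_review(status):
    """A run paused at waiting_for_human is a pause to surface, not a failure."""
    return status == "waiting_for_human"
```

Passing `sleep` as a parameter keeps the helper testable without real delays.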

## References

- `references/common.md`: auth, SDK setup, shared request conventions, operation chooser
- `references/parse.md`: `POST /v1/parses`
- `references/extract.md`: `POST /v1/extractions`
- `references/split.md`: `POST /v1/splits`
- `references/edit.md`: `POST /v1/edits`
- `references/classify.md`: `POST /v1/classifications`
- `references/workflows.md`: workflow runs with `client.workflows.runs.create()` and `client.workflows.runs.get()`
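
As a rough sketch, the routes above combine with the request conventions from Working Rules as follows. Several details are assumptions rather than documented behavior: that `model` sits at the top level of the JSON body, that workflow `content` is base64-encoded, and the helper names themselves. Operation-specific fields (for example a schema for extraction) are left out, and no HTTP call is made.

```python
import base64

# Direct document routes, copied from the References list.
DIRECT_ROUTES = {
    "parse": "/v1/parses",
    "extract": "/v1/extractions",
    "split": "/v1/splits",
    "edit": "/v1/edits",
    "classify": "/v1/classifications",
}


def direct_request(operation, filename, url, api_key, model="retab-small"):
    """Describe a minimal direct document request (hypothetical helper)."""
    return {
        "method": "POST",
        "path": DIRECT_ROUTES[operation],
        "headers": {"Api-Key": api_key, "Content-Type": "application/json"},
        "body": {
            "document": {"filename": filename, "url": url},
            "model": model,  # assumption: model is a top-level body field
        },
    }


def workflow_documents(files_by_block_id):
    """Build the `documents` mapping for a workflow-run REST call.

    files_by_block_id maps each start block ID, used exactly as given,
    to a (filename, raw_bytes, mime_type) tuple.
    """
    return {
        block_id: {
            "filename": filename,
            # assumption: file content is sent base64-encoded
            "content": base64.b64encode(data).decode("ascii"),
            "mime_type": mime_type,
        }
        for block_id, (filename, data, mime_type) in files_by_block_id.items()
    }
```

Keeping the builders separate from the transport makes it easy to check the body shape against the matching reference file before sending anything.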
