receipt-ocr

Implement and extend the Tesseract.js OCR pipeline for receipt field extraction.

7 stars

byheldernoid

View on GitHub Installation ↓

Best use case

receipt-ocr is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Implement and extend the Tesseract.js OCR pipeline for receipt field extraction.

Teams using receipt-ocr should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/receipt-ocr/SKILL.md --create-dirs "https://raw.githubusercontent.com/heldernoid/agentic-build-templates/main/projects/automation-productivity/expense-report-generator/skills/receipt-ocr/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/receipt-ocr/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How receipt-ocr Compares

Feature / Agent	receipt-ocr	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Implement and extend the Tesseract.js OCR pipeline for receipt field extraction.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# receipt-ocr skill

## OCR pipeline overview

Receipt OCR uses Tesseract.js (WASM, runs in Node.js) to extract three fields from uploaded images: vendor name, transaction date, and total amount. Each extracted field includes a confidence score.

### Server-side implementation

`server/src/ocr/extractFields.ts`:
```typescript
import Tesseract from 'tesseract.js';
import path from 'node:path';

export interface OcrResult {
  vendor:     string | null;
  date:       string | null;   // ISO 8601 or null
  amount:     number | null;
  confidence: number;          // 0-100, average of matched fields
  rawText:    string;
}

export async function extractReceiptFields(imagePath: string): Promise<OcrResult> {
  const { data } = await Tesseract.recognize(imagePath, 'eng', {
    logger: () => {},   // suppress progress logs
  });

  const raw = data.text;
  const confidence = Math.round(data.confidence);

  return {
    vendor:     extractVendor(raw),
    date:       extractDate(raw),
    amount:     extractAmount(raw),
    confidence,
    rawText:    raw,
  };
}
```

### Vendor extraction

Heuristic: the vendor name is typically on the first 1-3 non-empty lines, in all-caps or title case.

```typescript
function extractVendor(text: string): string | null {
  const lines = text.split('\n').map(l => l.trim()).filter(Boolean);
  // Take the first line that looks like a business name (>3 chars, not purely numeric)
  for (const line of lines.slice(0, 4)) {
    if (line.length > 3 && !/^\d+$/.test(line)) {
      return titleCase(line);
    }
  }
  return null;
}

function titleCase(s: string): string {
  return s.toLowerCase().replace(/\b\w/g, c => c.toUpperCase());
}
```

### Date extraction

Matches common date formats: MM/DD/YYYY, DD/MM/YYYY, YYYY-MM-DD, Month DD YYYY, DD Month YYYY.

```typescript
const DATE_PATTERNS: Array<{ re: RegExp; parse: (m: RegExpMatchArray) => string }> = [
  {
    re: /\b(\d{1,2})[\/\-\.](\d{1,2})[\/\-\.](\d{4})\b/,
    parse: ([, m, d, y]) => `${y}-${m.padStart(2,'0')}-${d.padStart(2,'0')}`,
  },
  {
    re: /\b(\d{4})[\/\-\.](\d{1,2})[\/\-\.](\d{1,2})\b/,
    parse: ([, y, m, d]) => `${y}-${m.padStart(2,'0')}-${d.padStart(2,'0')}`,
  },
  {
    re: /\b(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*\.?\s+(\d{1,2}),?\s+(\d{4})\b/i,
    parse: ([, mon, d, y]) => `${y}-${monthNum(mon)}-${d.padStart(2,'0')}`,
  },
];

function extractDate(text: string): string | null {
  for (const { re, parse } of DATE_PATTERNS) {
    const m = text.match(re);
    if (m) return parse(m);
  }
  return null;
}
```

### Amount extraction

Finds the largest dollar value on a line containing "TOTAL", "AMOUNT DUE", "GRAND TOTAL", or "DUE". Falls back to the largest dollar value in the document.

```typescript
const AMOUNT_RE = /\$?\s?(\d{1,6}[.,]\d{2})/g;

function extractAmount(text: string): number | null {
  const lines = text.split('\n');

  // Priority: line contains total keyword
  const totalKeywords = /\b(total|amount\s*due|grand\s*total|due)\b/i;
  for (const line of lines) {
    if (totalKeywords.test(line)) {
      const amounts = [...line.matchAll(AMOUNT_RE)].map(m => parseAmount(m[1]));
      if (amounts.length > 0) return Math.max(...amounts);
    }
  }

  // Fallback: largest amount in document
  const all = [...text.matchAll(AMOUNT_RE)].map(m => parseAmount(m[1]));
  return all.length > 0 ? Math.max(...all) : null;
}

function parseAmount(s: string): number {
  return parseFloat(s.replace(',', ''));
}
```

## Multer upload configuration

`server/src/middleware/upload.ts`:
```typescript
import multer from 'multer';
import path from 'node:path';
import { randomUUID } from 'node:crypto';
import fs from 'node:fs';

const UPLOAD_DIR = process.env.UPLOAD_DIR ?? './tmp/receipts';

// Ensure upload directory exists
fs.mkdirSync(UPLOAD_DIR, { recursive: true });

const ALLOWED_MIME = new Set(['image/jpeg', 'image/png', 'image/webp']);
const MAX_BYTES = parseInt(process.env.MAX_FILE_BYTES ?? '10485760', 10);

export const receiptUpload = multer({
  storage: multer.diskStorage({
    destination: UPLOAD_DIR,
    filename: (_req, _file, cb) => cb(null, `${randomUUID()}.jpg`),
  }),
  limits: { fileSize: MAX_BYTES },
  fileFilter: (_req, file, cb) => {
    if (ALLOWED_MIME.has(file.mimetype)) cb(null, true);
    else cb(new Error(`Unsupported file type: ${file.mimetype}`));
  },
});
```

## OCR route

`server/src/routes/receipts.ts`:
```typescript
import { Router } from 'express';
import { receiptUpload } from '../middleware/upload.js';
import { extractReceiptFields } from '../ocr/extractFields.js';

const router = Router();

router.post('/ocr', receiptUpload.single('receipt'), async (req, res) => {
  if (!req.file) {
    return res.status(400).json({ error: 'No file uploaded' });
  }
  try {
    const result = await extractReceiptFields(req.file.path);
    res.json({
      vendor:     result.vendor,
      date:       result.date,
      amount:     result.amount,
      confidence: result.confidence,
      rawText:    result.rawText,
      filePath:   req.file.filename,   // relative path for storage
    });
  } catch (err) {
    res.status(500).json({ error: 'OCR failed', detail: String(err) });
  }
});

export default router;
```

## Confidence thresholds

| Confidence | Meaning | UI treatment |
|------------|---------|--------------|
| >= 85 | High | Green badge, field auto-filled, no warning |
| 65-84 | Medium | Yellow badge, field pre-filled, user prompted to verify |
| < 65 | Low | Red badge, field left blank, user must fill manually |

The `OCR_CONFIDENCE_THRESHOLD` env var (default 75) controls which fields are flagged in the UI. Fields below the threshold receive a warning indicator.

## Improving OCR accuracy

1. Ensure the uploaded image is at least 800px wide. Narrow images lose character detail.
2. Receipt images that are skewed more than 15 degrees may reduce accuracy. Consider adding a deskew step using `sharp` before passing to Tesseract.
3. For images with dark backgrounds (thermal receipts), use `sharp` to invert and increase contrast before OCR.
4. Set the Tesseract page segmentation mode: `psm 6` (assume a single uniform block of text) works well for receipts. Pass via `config.tessedit_pageseg_mode = '6'` in the `recognize` call options.

## Testing OCR extraction

```typescript
// server/src/ocr/__tests__/extractFields.test.ts
import { extractReceiptFields } from '../extractFields.js';
import path from 'node:path';

test('extracts vendor, date, and amount from sample receipt', async () => {
  const result = await extractReceiptFields(
    path.resolve(__dirname, 'fixtures/sample_receipt.jpg')
  );
  expect(result.vendor).toBeTruthy();
  expect(result.date).toMatch(/^\d{4}-\d{2}-\d{2}$/);
  expect(result.amount).toBeGreaterThan(0);
  expect(result.confidence).toBeGreaterThan(60);
}, 30_000);  // OCR can take up to 30s on first run (WASM init)
```

Place test receipt images in `server/src/ocr/__tests__/fixtures/`.

## Known OCR edge cases

- **"T0TAL" (digit zero):** Tesseract sometimes confuses uppercase O with 0. The amount extractor accepts both in the total keyword regex.
- **Comma as decimal separator:** European receipts use commas (e.g. "52,00"). `parseAmount` normalizes commas to periods.
- **Multi-currency symbols:** The amount regex matches `$`, euro, and pound signs. The currency is not extracted from OCR; it defaults to the app default and must be corrected by the user.
- **Tip vs subtotal:** The extractor uses the largest amount on a "TOTAL" line, which should be the grand total including tip. If the receipt only shows a subtotal line, the tip is not included.

Related Skills

Skill: Uptime Monitoring

from heldernoid/agentic-build-templates

## Overview

Skill: Status Page

from heldernoid/agentic-build-templates

## Overview

Skill: unit-conversion

from heldernoid/agentic-build-templates

## Overview

Skill: recipe-scaler

from heldernoid/agentic-build-templates

## Overview

reading-list

from heldernoid/agentic-build-templates

Operate the reading-list API to save, manage, tag, search, and export articles.

email-digest

from heldernoid/agentic-build-templates

Configure, test, and troubleshoot the reading-list daily email digest delivered via nodemailer.

websocket-realtime

from heldernoid/agentic-build-templates

Use the WebSocket connection in poll-builder to receive live vote updates. Use when you need to stream real-time poll results, monitor a poll for new votes, or build a live dashboard. Triggers include "live results", "real-time updates", "stream votes", "watch poll", or "WebSocket".

poll-builder

from heldernoid/agentic-build-templates

Self-hosted poll creation tool with real-time results. Use when you need to create a poll, check vote counts, close a poll, export results, or get the shareable link for a poll. Triggers include "create poll", "vote", "poll results", "survey", "collect votes", "share poll", or any task involving polling or voting.

Skill: personal-finance

from heldernoid/agentic-build-templates

## Overview

Skill: csv-import

from heldernoid/agentic-build-templates

## Overview

Skill: Syntax Highlighting

from heldernoid/agentic-build-templates

## Purpose

Skill: Pastebin Core

from heldernoid/agentic-build-templates

## Purpose