orthogonal-pdf-processor

Process PDFs - extract text, tables, and structured data from documents

380 stars

Best use case

orthogonal-pdf-processor is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Process PDFs - extract text, tables, and structured data from documents

Teams using orthogonal-pdf-processor should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/orthogonal-pdf-processor/SKILL.md --create-dirs "https://raw.githubusercontent.com/gooseworks-ai/goose-skills/main/skills/capabilities/orthogonal-pdf-processor/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/orthogonal-pdf-processor/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How orthogonal-pdf-processor Compares

Feature / Agentorthogonal-pdf-processorStandard Approach
Platform SupportNot specifiedLimited / Varies
Context Awareness High Baseline
Installation ComplexityUnknownN/A

Frequently Asked Questions

What does this skill do?

Process PDFs - extract text, tables, and structured data from documents

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# PDF Processor - Extract Data from PDFs

## Setup

Read your credentials from ~/.gooseworks/credentials.json:
```bash
export GOOSEWORKS_API_KEY=$(python3 -c "import json;print(json.load(open('$HOME/.gooseworks/credentials.json'))['api_key'])")
export GOOSEWORKS_API_BASE=$(python3 -c "import json;print(json.load(open('$HOME/.gooseworks/credentials.json')).get('api_base','https://api.gooseworks.ai'))")
```

If ~/.gooseworks/credentials.json does not exist, tell the user to run: `npx gooseworks login`

All endpoints use Bearer auth: `-H "Authorization: Bearer $GOOSEWORKS_API_KEY"`


Extract text, tables, and structured data from PDF documents.

## Workflow

### Step 1: Fetch PDF Content
Use Linkup to fetch PDF URLs:

```bash
curl -s -X POST $GOOSEWORKS_API_BASE/v1/proxy/orthogonal/run \
  -H "Authorization: Bearer $GOOSEWORKS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"api":"linkup","path":"/fetch","body":{"url":"https://example.com/document.pdf"}}'
```

### Step 2: Extract with AI
Use ScrapeGraph to extract specific content:

```bash
curl -s -X POST $GOOSEWORKS_API_BASE/v1/proxy/orthogonal/run \
  -H "Authorization: Bearer $GOOSEWORKS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"api":"scrapegraph","path":"/v1/smartscraper"}'
  "website_url": "https://example.com/report.pdf",
  "user_prompt": "Extract all financial figures, tables, and key metrics from this document"
}'
```

### Step 3: Extract Tables
Get structured table data:

```bash
curl -s -X POST $GOOSEWORKS_API_BASE/v1/proxy/orthogonal/run \
  -H "Authorization: Bearer $GOOSEWORKS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"api":"riveter","path":"/v1/run"}'
  "input": {
    "urls": ["https://example.com/report.pdf"]
  },
  "output": {
    "tables": {"prompt": "Extract all tables with titles, headers, and rows", "contexts": ["urls"]}
  }
}'
```

### Step 4: Convert to Markdown
Get readable markdown output:

```bash
curl -s -X POST $GOOSEWORKS_API_BASE/v1/proxy/orthogonal/run \
  -H "Authorization: Bearer $GOOSEWORKS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"api":"scrapegraph","path":"/v1/markdownify","body":{"website_url":"https://example.com/document.pdf"}}'
```

## Example Usage

```bash
# Extract data from financial report
curl -s -X POST $GOOSEWORKS_API_BASE/v1/proxy/orthogonal/run \
  -H "Authorization: Bearer $GOOSEWORKS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"api":"scrapegraph","path":"/v1/smartscraper"}'
  "website_url": "https://example.com/annual-report.pdf",
  "user_prompt": "Extract revenue, profit, and key business metrics with their values"
}'

# Extract invoice data
curl -s -X POST $GOOSEWORKS_API_BASE/v1/proxy/orthogonal/run \
  -H "Authorization: Bearer $GOOSEWORKS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"api":"riveter","path":"/v1/run"}'
  "input": {"urls": ["https://example.com/invoice.pdf"]},
  "output": {
    "vendor": {"prompt": "Vendor name", "contexts": ["urls"]},
    "amount": {"prompt": "Total amount", "contexts": ["urls"]},
    "date": {"prompt": "Invoice date", "contexts": ["urls"]}
  }
}'
```

## Tips

- Specify exact data you need for better extraction
- Use schemas for consistent structured output
- Handle multi-page documents in chunks
- Verify extracted numbers against source

## Discover More

List all endpoints, or add a path for parameter details:

```bash
curl -s -X POST $GOOSEWORKS_API_BASE/v1/proxy/orthogonal/search \
  -H "Authorization: Bearer $GOOSEWORKS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"prompt":"linkup API endpoints"}' api show riveter
curl -s -X POST $GOOSEWORKS_API_BASE/v1/proxy/orthogonal/search \
  -H "Authorization: Bearer $GOOSEWORKS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"prompt":"scrapegraph API endpoints"}'

Example: `curl -s -X POST $GOOSEWORKS_API_BASE/v1/proxy/orthogonal/details \
  -H "Authorization: Bearer $GOOSEWORKS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"api":"olostep","path":"/v1/scrapes`"}' for endpoint parameters.

Related Skills

orthogonal-yc-batch-evaluator

380
from gooseworks-ai/goose-skills

Evaluate YC batch companies for investment — scrapes the YC directory, researches each company and its founders (work history, LinkedIn, website), assesses founder-company fit, and exports to Google Sheets with priority rankings. Use when asked to evaluate YC companies, research a YC batch, screen startups, or do due diligence on YC companies.

orthogonal-website-screenshot

380
from gooseworks-ai/goose-skills

Take screenshots of websites and web pages

orthogonal-weather

380
from gooseworks-ai/goose-skills

Get current weather and forecasts using free APIs (no API key required). Use when asked about weather, temperature, forecasts, or climate conditions for any location.

orthogonal-weather-forecast

380
from gooseworks-ai/goose-skills

Get weather forecasts - temperature, precipitation, wind, and conditions

orthogonal-vhs-terminal-recordings

380
from gooseworks-ai/goose-skills

Create polished terminal GIF recordings using VHS (Video Hardware Software) by Charmbracelet. Use when asked to create terminal demos, CLI gifs, command-line recordings, or animated terminal screenshots for documentation, READMEs, or marketing.

orthogonal-verify-email

380
from gooseworks-ai/goose-skills

Verify if an email address is valid and deliverable

orthogonal-valyu

380
from gooseworks-ai/goose-skills

Web search, AI answers, content extraction, and async deep research

orthogonal-uptime-monitor

380
from gooseworks-ai/goose-skills

Monitor website uptime - check availability, response times, and status

orthogonal-twitter-profile-lookup

380
from gooseworks-ai/goose-skills

Look up Twitter/X profiles - get bio, followers, tweets, and engagement

orthogonal-tomba

380
from gooseworks-ai/goose-skills

Email finder and verifier - find emails from domains, LinkedIn, or company search

orthogonal-tiktok-search

380
from gooseworks-ai/goose-skills

Search TikTok - find profiles, videos, hashtags, and trending content

orthogonal-textbelt

380
from gooseworks-ai/goose-skills

Send SMS messages programmatically - simple HTTP API for text messaging