orthogonal-riveter

Web scraping with structured data extraction - define your output schema

380 stars

Best use case

orthogonal-riveter is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Web scraping with structured data extraction - define your output schema

Teams using orthogonal-riveter should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/orthogonal-riveter/SKILL.md --create-dirs "https://raw.githubusercontent.com/gooseworks-ai/goose-skills/main/skills/capabilities/orthogonal-riveter/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/orthogonal-riveter/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How orthogonal-riveter Compares

Feature / Agentorthogonal-riveterStandard Approach
Platform SupportNot specifiedLimited / Varies
Context Awareness High Baseline
Installation ComplexityUnknownN/A

Frequently Asked Questions

What does this skill do?

Web scraping with structured data extraction - define your output schema

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Riveter - Structured Web Scraping

## Setup

Read your credentials from ~/.gooseworks/credentials.json:
```bash
export GOOSEWORKS_API_KEY=$(python3 -c "import json;print(json.load(open('$HOME/.gooseworks/credentials.json'))['api_key'])")
export GOOSEWORKS_API_BASE=$(python3 -c "import json;print(json.load(open('$HOME/.gooseworks/credentials.json')).get('api_base','https://api.gooseworks.ai'))")
```

If ~/.gooseworks/credentials.json does not exist, tell the user to run: `npx gooseworks login`

All endpoints use Bearer auth: `-H "Authorization: Bearer $GOOSEWORKS_API_KEY"`


Scrape web pages and extract data into your defined structure.

## Capabilities

- **Scrape**: Scrape a webpage and return the text content
- **Run**: Copy link Define the structure of your output directly in the API request
- **Run data**: Retrieve the processed data from a completed project run (free)
- **Run status**: Check the current status of a project run (free)
- **Stop run**: Stop a currently running project (free)

## Usage

### Scrape
Scrape a webpage and return the text content. This endpoint allows you to extract text content from any public webpage.

Parameters:
- url* (string) - Example: "https://example.com"
- proxy_country_code (string) - Optional two-character country code for proxy (e.g., 'us', 'gb', 'de')
- skip_cache (boolean) - Default: false. Set to true to bypass cache and always fetch fresh content

```bash
curl -s -X POST $GOOSEWORKS_API_BASE/v1/proxy/orthogonal/run \
  -H "Authorization: Bearer $GOOSEWORKS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"api":"riveter","path":"/v1/scrape","body":{"url":"https://example.com/article"}}'
```

### Run
Copy link Define the structure of your output directly in the API request. This endpoint allows you to define both your input data and output configuration in a single request.

Parameters:
- input* (object) - The input object contains your source data:  Keys are column/attribute names Values are arrays of strings (all arrays must be the same length) Maximum 1000 rows per request
- output* (object) - The output object defines what data you want to extract:  Keys are the names of attributes you want to extract Each attribute requires: prompt: Instructions for finding/extracting this data contexts: Array of input or other output attribute names this depends on. Optional Output Configuration Each output attribute can optionally include:  format: Data type ('number', 'json', 'url', 'text', 'email', 'tag', 'date', 'boolean') format_details: Format-specific configuration (varies by format type). For json format, you can provide either a description (string) or a schema (JSON Schema object) or both. tools: Array of tools to use (['web_search', 'web_scrape', 'query_pdf', 'query_image']) max_tool_calls: Number of tool calls allowed (0-10) run_when: When to run this extraction ('always', 'any_filled', 'all_filled')
- run_key (string) - Custom identifier for this run (optional, will be generated if not provided)

```bash
curl -s -X POST $GOOSEWORKS_API_BASE/v1/proxy/orthogonal/run \
  -H "Authorization: Bearer $GOOSEWORKS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"api":"riveter","path":"/v1/run"}'
  "input": {
    "urls": ["https://example.com/products"]
  },
  "output": {
    "name": {"prompt": "Product name", "contexts": ["urls"]},
    "price": {"prompt": "Product price", "contexts": ["urls"], "format": "number"}
  }
}'
```

### Run data (free)
Retrieve the processed data from a completed project run

Parameters:
- run_key* (string) - The run key (UUID) of the project run to retrieve data for

```bash
curl -s -X POST $GOOSEWORKS_API_BASE/v1/proxy/orthogonal/run \
  -H "Authorization: Bearer $GOOSEWORKS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"api":"riveter","path":"/v1/run_data","query":{"run_key":"abc123"}}'
```

### Run status (free)
Check the current status of a project run

Parameters:
- run_key* (string) - The run key (UUID) of the project run to check

```bash
curl -s -X POST $GOOSEWORKS_API_BASE/v1/proxy/orthogonal/run \
  -H "Authorization: Bearer $GOOSEWORKS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"api":"riveter","path":"/v1/run_status","query":{"run_key":"abc123"}}'
```

### Stop run (free)
Stop a currently running project. This will halt all processing and mark the run as stopped. Behavior:  If the run is already stopped or success, returns success with current status. If the run is in progress, stops all pending cells and marks the run as stopped.  Stopped runs cannot be resumed

Parameters:
- run_key* (string) - The run key (UUID) of the project run to stop

```bash
curl -s -X POST $GOOSEWORKS_API_BASE/v1/proxy/orthogonal/run \
  -H "Authorization: Bearer $GOOSEWORKS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"api":"riveter","path":"/v1/stop_run","query":{"run_key":"abc123"}}'
```

## Use Cases

1. **E-commerce Scraping**: Extract product data in consistent format
2. **Job Listings**: Gather job postings with structured fields
3. **News Aggregation**: Extract articles with title, date, content
4. **Price Monitoring**: Track prices across competitor sites

## Discover More

For full endpoint details and parameters:

```bash
curl -s -X POST $GOOSEWORKS_API_BASE/v1/proxy/orthogonal/search \
  -H "Authorization: Bearer $GOOSEWORKS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"prompt":"riveter API endpoints"}' List all endpoints
curl -s -X POST $GOOSEWORKS_API_BASE/v1/proxy/orthogonal/details \
  -H "Authorization: Bearer $GOOSEWORKS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"api":"riveter","path":"/v1/scrape"}'   # Get endpoint details
```

Related Skills

orthogonal-yc-batch-evaluator

380
from gooseworks-ai/goose-skills

Evaluate YC batch companies for investment — scrapes the YC directory, researches each company and its founders (work history, LinkedIn, website), assesses founder-company fit, and exports to Google Sheets with priority rankings. Use when asked to evaluate YC companies, research a YC batch, screen startups, or do due diligence on YC companies.

orthogonal-website-screenshot

380
from gooseworks-ai/goose-skills

Take screenshots of websites and web pages

orthogonal-weather

380
from gooseworks-ai/goose-skills

Get current weather and forecasts using free APIs (no API key required). Use when asked about weather, temperature, forecasts, or climate conditions for any location.

orthogonal-weather-forecast

380
from gooseworks-ai/goose-skills

Get weather forecasts - temperature, precipitation, wind, and conditions

orthogonal-vhs-terminal-recordings

380
from gooseworks-ai/goose-skills

Create polished terminal GIF recordings using VHS (Video Hardware Software) by Charmbracelet. Use when asked to create terminal demos, CLI gifs, command-line recordings, or animated terminal screenshots for documentation, READMEs, or marketing.

orthogonal-verify-email

380
from gooseworks-ai/goose-skills

Verify if an email address is valid and deliverable

orthogonal-valyu

380
from gooseworks-ai/goose-skills

Web search, AI answers, content extraction, and async deep research

orthogonal-uptime-monitor

380
from gooseworks-ai/goose-skills

Monitor website uptime - check availability, response times, and status

orthogonal-twitter-profile-lookup

380
from gooseworks-ai/goose-skills

Look up Twitter/X profiles - get bio, followers, tweets, and engagement

orthogonal-tomba

380
from gooseworks-ai/goose-skills

Email finder and verifier - find emails from domains, LinkedIn, or company search

orthogonal-tiktok-search

380
from gooseworks-ai/goose-skills

Search TikTok - find profiles, videos, hashtags, and trending content

orthogonal-textbelt

380
from gooseworks-ai/goose-skills

Send SMS messages programmatically - simple HTTP API for text messaging