harvest-structured

Structured data extraction - tables, pricing, products, API endpoints with schema

422 stars

Best use case

harvest-structured is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using harvest-structured should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/harvest-structured/SKILL.md --create-dirs "https://raw.githubusercontent.com/vibeeval/vibecosystem/main/skills/harvest-structured/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/harvest-structured/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How harvest-structured Compares

| Feature / Agent | harvest-structured | Standard Approach |
| --- | --- | --- |
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

Structured data extraction - tables, pricing, products, API endpoints with schema

Where can I find the source code?

The source code is on GitHub in the vibeeval/vibecosystem repository, under skills/harvest-structured/.

SKILL.md Source

# Harvest Structured

Extract structured data from web pages using user-defined schemas. The skill turns messy HTML into clean JSON or CSV for pricing tables, product listings, API endpoint docs, and comparison matrices.

## Usage

```
/scrape <url> --schema "<field descriptions>"
```

## Examples

```bash
# Extract pricing data
/scrape https://example.com/pricing --schema "plan_name, price, features[], cta_text"

# Extract product listings
/scrape https://store.example.com/products --schema "name, price, rating, reviews_count, image_url"

# Extract API endpoints
/scrape https://docs.api.com/reference --schema "method, path, description, parameters[], response_code"
```

## Schema Definition

Define fields as comma-separated names. Use `[]` for arrays:

```
name            → Single text value
price           → Single value (auto-detects currency)
features[]      → Array of items
description     → Long text
url             → Auto-detects links
image_url       → Auto-detects image sources
```
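
As a rough illustration of how a schema string in this notation could be parsed, here is a minimal Python sketch. The `Field` class and `parse_schema` function are hypothetical names chosen for this example and are not part of the skill. The sketch also assumes field names are plain identifiers and that duplicates should be rejected, which the skill itself does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """One schema entry (hypothetical type, for illustration only)."""
    name: str
    is_array: bool


def parse_schema(schema: str) -> list[Field]:
    """Split a comma-separated schema string into Field entries.

    A trailing "[]" marks an array field, as in "features[]".
    """
    fields = []
    seen = set()
    for raw in schema.split(","):
        token = raw.strip()
        is_array = token.endswith("[]")
        name = token[:-2] if is_array else token
        # Assumption: names must be identifier-like (letters, digits, "_").
        if not name.isidentifier():
            raise ValueError(f"invalid field name: {token!r}")
        # Assumption: a name may appear only once per schema.
        if name in seen:
            raise ValueError(f"duplicate field name: {name!r}")
        seen.add(name)
        fields.append(Field(name, is_array))
    return fields


fields = parse_schema("plan_name, price, features[], cta_text")
print([f.name for f in fields if f.is_array])  # names of the array fields
```

Rejecting malformed names early keeps later extraction steps from silently producing records with unusable keys.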

## How It Works

1. Fetch page content
2. Parse schema definition
3. Use CSS selectors or LLM extraction to match fields
4. Validate extracted data against schema
5. Output as JSON (default) or CSV
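
Step 4 could look something like the following Python sketch. It is only an assumption about what "validate" means here: the hypothetical `validate_record` function checks that every schema field is present and that `[]` fields hold lists, and it ignores extra keys such as `source_url`.

```python
def validate_record(record: dict, schema: str) -> list[str]:
    """Return a list of problems; an empty list means the record matches.

    Hypothetical helper: checks presence and array-vs-single shape only.
    """
    # Map each field name to whether the schema declares it as an array.
    expected = {}
    for token in schema.split(","):
        token = token.strip()
        is_array = token.endswith("[]")
        expected[token[:-2] if is_array else token] = is_array

    problems = []
    for name, is_array in expected.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif is_array and not isinstance(record[name], list):
            problems.append(f"expected array for: {name}")
        elif not is_array and isinstance(record[name], list):
            problems.append(f"expected single value for: {name}")
    return problems


record = {
    "plan_name": "Pro",
    "price": "$29/mo",
    "features": ["Unlimited projects", "Priority support", "API access"],
    "source_url": "https://example.com/pricing",
}
print(validate_record(record, "plan_name, price, features[], cta_text"))
```

Returning a list of problems instead of raising on the first one makes it possible to report every mismatch for a record in a single pass.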

## Output Format

### JSON (default)
```json
[
  {
    "plan_name": "Pro",
    "price": "$29/mo",
    "features": ["Unlimited projects", "Priority support", "API access"],
    "source_url": "https://example.com/pricing"
  }
]
```

### CSV
```csv
plan_name,price,features,source_url
Pro,"$29/mo","Unlimited projects; Priority support; API access",https://example.com/pricing
```
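
The two formats differ mainly in how arrays are represented: the CSV sample joins array items with `; `. A minimal Python sketch of that conversion follows, using the standard `csv` module. The `records_to_csv` name is hypothetical, and the sketch assumes all records share the keys of the first record.

```python
import csv
import io


def records_to_csv(records: list[dict]) -> str:
    """Flatten JSON-style records into CSV text.

    List values are joined with "; " to fit in a single cell.
    Assumption: every record has the same keys as the first one.
    """
    if not records:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(records[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    for record in records:
        row = {
            key: "; ".join(str(item) for item in value)
            if isinstance(value, list)
            else value
            for key, value in record.items()
        }
        writer.writerow(row)
    return buffer.getvalue()


records = [
    {
        "plan_name": "Pro",
        "price": "$29/mo",
        "features": ["Unlimited projects", "Priority support", "API access"],
        "source_url": "https://example.com/pricing",
    }
]
print(records_to_csv(records))
```

Python's `csv` writer quotes a cell only when it contains a delimiter, quote character, or newline, so its quoting may differ from the sample above while still parsing to the same values.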

## Integration

- **growth**: Competitor pricing extraction
- **migrator**: Changelog/breaking changes extraction
- **tech-radar**: Feature comparison across tools
- **data-analyst**: Structured data for analysis

## Rules

- Only extract publicly visible data
- Respect rate limits (1 req/sec)
- Validate schema before extraction
- Report confidence per field (high/medium/low)
- Output includes source URL for every record
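
The 1 req/sec rule could be enforced with a small limiter like the Python sketch below. The `RateLimiter` class is a hypothetical helper, not something the skill ships. The clock and sleep functions are injectable so the behavior can be checked without real waiting.

```python
import time


class RateLimiter:
    """Enforce a minimum interval between calls (hypothetical helper)."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous permitted call

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


# Usage: call limiter.wait() immediately before each page fetch.
limiter = RateLimiter(min_interval=1.0)
```

Using `time.monotonic` as the default clock avoids surprises when the system wall clock is adjusted during a long run.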

Related Skills

harvest-single

422
from vibeeval/vibecosystem

Single page smart extraction - articles, docs, blog posts to clean markdown

harvest-monitor

422
from vibeeval/vibecosystem

Web change monitoring - track changes on pages, detect updates, changelog diffs

harvest-deep-crawl

422
from vibeeval/vibecosystem

Multi-page deep crawling - documentation sites, wikis, knowledge bases

harvest-competitive

422
from vibeeval/vibecosystem

Competitive intelligence - extract features, pricing, tech stack from competitor sites

harvest-adaptive

422
from vibeeval/vibecosystem

Adaptive content summarization - auto-detect content type and produce relevant summary

workflow-router

422
from vibeeval/vibecosystem

Goal-based workflow orchestration - routes tasks to specialist agents based on user goals

wiring

422
from vibeeval/vibecosystem

Wiring Verification

websocket-patterns

422
from vibeeval/vibecosystem

Connection management, room patterns, reconnection strategies, message buffering, and binary protocol design.

visual-verdict

422
from vibeeval/vibecosystem

Screenshot comparison QA for frontend development. Takes a screenshot of the current implementation, scores it across multiple visual dimensions, and returns a structured PASS/REVISE/FAIL verdict with concrete fixes. Use when implementing UI from a design reference or verifying visual correctness.

verification-loop

422
from vibeeval/vibecosystem

Comprehensive verification system covering build, types, lint, tests, security, and diff review before a PR.

vector-db-patterns

422
from vibeeval/vibecosystem

Embedding strategies, ANN algorithms, hybrid search, RAG chunking strategies, and reranking for semantic search and retrieval.

variant-analysis

422
from vibeeval/vibecosystem

Find similar vulnerabilities across a codebase after discovering one instance. Uses pattern matching, AST search, Semgrep/CodeQL queries, and manual tracing to propagate findings. Adapted from Trail of Bits. Use after finding a bug to check if the same pattern exists elsewhere.