harvest-structured
Structured data extraction - tables, pricing, products, API endpoints with schema
Best use case
harvest-structured is best used when you need a repeatable AI agent workflow instead of a one-off prompt: structured data extraction of tables, pricing, products, and API endpoints against a user-defined schema.
Teams using harvest-structured should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it at `.claude/skills/harvest-structured/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
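The manual steps above can be sketched as a couple of shell commands. The download URL is a placeholder, since the actual repository link is only given at the top of the page:

```shell
# Create the skill directory inside your project
mkdir -p .claude/skills/harvest-structured

# Download SKILL.md into it (replace the placeholder URL with the
# GitHub link from the top of the page):
# curl -fsSL -o .claude/skills/harvest-structured/SKILL.md \
#   https://raw.githubusercontent.com/<owner>/<repo>/main/SKILL.md
```

Then restart your agent so it rediscovers the skills directory.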
How harvest-structured Compares
| Feature / Agent | harvest-structured | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Structured data extraction - tables, pricing, products, API endpoints with schema
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Harvest Structured
Extract structured data from web pages using user-defined schemas. Turns messy HTML into clean JSON/CSV - pricing tables, product listings, API endpoint docs, comparison matrices.
## Usage
```
/scrape <url> --schema "<field descriptions>"
```
## Examples
```bash
# Extract pricing data
/scrape https://example.com/pricing --schema "plan_name, price, features[], cta_text"
# Extract product listings
/scrape https://store.example.com/products --schema "name, price, rating, reviews_count, image_url"
# Extract API endpoints
/scrape https://docs.api.com/reference --schema "method, path, description, parameters[], response_code"
```
## Schema Definition
Define fields as comma-separated names. Use `[]` for arrays:
```
name → Single text value
price → Single value (auto-detects currency)
features[] → Array of items
description → Long text
url → Auto-detects links
image_url → Auto-detects image sources
```
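The SKILL.md does not publish the parser itself, but a minimal Python sketch of how such a comma-separated schema string could be interpreted (assumed behavior: trailing `[]` marks an array field, everything else is a scalar) looks like this:

```python
def parse_schema(schema: str) -> dict[str, str]:
    """Parse a comma-separated field list into {field_name: kind}.

    A trailing "[]" marks an array field; everything else is scalar.
    """
    fields = {}
    for raw in schema.split(","):
        name = raw.strip()
        if not name:
            continue
        if name.endswith("[]"):
            fields[name[:-2]] = "array"
        else:
            fields[name] = "scalar"
    return fields

print(parse_schema("plan_name, price, features[], cta_text"))
# {'plan_name': 'scalar', 'price': 'scalar', 'features': 'array', 'cta_text': 'scalar'}
```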
## How It Works
1. Fetch page content
2. Parse schema definition
3. Use CSS selectors or LLM extraction to match fields
4. Validate extracted data against schema
5. Output as JSON (default) or CSV
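Steps 1 and 4 of the pipeline above can be sketched in Python; the extraction step itself (CSS selectors or LLM) is not published by the skill, so it is left to the caller. The field coercion and `source_url` stamping below are assumptions based on the output examples and rules in this document:

```python
import time
from urllib.request import urlopen

def fetch(url: str) -> str:
    """Step 1: fetch page content, pausing to honor the 1 req/sec rule."""
    time.sleep(1)
    return urlopen(url).read().decode("utf-8", errors="replace")

def validate(records: list[dict], schema: dict[str, str], url: str) -> list[dict]:
    """Step 4: coerce each extracted record to the schema.

    Array fields are forced to lists, missing scalars stay None, and each
    record is stamped with its source URL; step 5 is then json.dumps().
    """
    out = []
    for rec in records:
        row = {}
        for field, kind in schema.items():
            value = rec.get(field)
            if kind == "array" and not isinstance(value, list):
                value = [value] if value is not None else []
            row[field] = value
        row["source_url"] = url
        out.append(row)
    return out
```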
## Output Format
### JSON (default)
```json
[
{
"plan_name": "Pro",
"price": "$29/mo",
"features": ["Unlimited projects", "Priority support", "API access"],
"source_url": "https://example.com/pricing"
}
]
```
### CSV
```csv
plan_name,price,features,source_url
Pro,"$29/mo","Unlimited projects; Priority support; API access",https://example.com/pricing
```
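The CSV example above joins array fields with semicolons inside a single cell. That flattening can be sketched with Python's standard `csv` module (a sketch of the assumed behavior, not the skill's actual code):

```python
import csv
import io

def to_csv(records: list[dict]) -> str:
    """Flatten JSON records to CSV; array fields are joined with '; '."""
    if not records:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(records[0]))
    writer.writeheader()
    for rec in records:
        writer.writerow({k: "; ".join(v) if isinstance(v, list) else v
                         for k, v in rec.items()})
    return buf.getvalue()
```

`csv.DictWriter` handles quoting automatically, so values containing commas (like `"$29/mo"` would not, but a quoted feature list might) stay intact in the output.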
## Integration
- **growth**: Competitor pricing extraction
- **migrator**: Changelog/breaking changes extraction
- **tech-radar**: Feature comparison across tools
- **data-analyst**: Structured data for analysis
## Rules
- Only extract publicly visible data
- Respect rate limits (1 req/sec)
- Validate schema before extraction
- Report confidence per field (high/medium/low)
- Output includes source URL for every record

Related Skills
harvest-single
Single page smart extraction - articles, docs, blog posts to clean markdown
harvest-monitor
Web change monitoring - track changes on pages, detect updates, changelog diffs
harvest-deep-crawl
Multi-page deep crawling - documentation sites, wikis, knowledge bases
harvest-competitive
Competitive intelligence - extract features, pricing, tech stack from competitor sites
harvest-adaptive
Adaptive content summarization - auto-detect content type and produce relevant summary