harvest-single

Single page smart extraction - articles, docs, blog posts to clean markdown

422 stars

by vibeeval


Best use case

harvest-single is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using harvest-single can expect more consistent output, faster repeated runs, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/harvest-single/SKILL.md --create-dirs "https://raw.githubusercontent.com/vibeeval/vibecosystem/main/skills/harvest-single/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/harvest-single/SKILL.md inside your project
Restart your AI agent so it auto-discovers the skill

How harvest-single Compares

Feature / Agent	harvest-single	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Single page smart extraction - articles, docs, blog posts to clean markdown

Where can I find the source code?

The source code is in the vibeeval/vibecosystem repository on GitHub, at skills/harvest-single/SKILL.md.

SKILL.md Source

# Harvest Single Page

Extract and clean content from a single web page. Auto-detects the content type (article, documentation, API reference, blog post, wiki) and produces clean, structured markdown.
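
The skill does not document how it classifies a page. As a rough illustration only, here is a URL-based heuristic in Python; `detect_content_type` and all of its rules are hypothetical, and a real classifier would likely also inspect the page structure.

```python
from urllib.parse import urlparse


# Hypothetical heuristics; the skill's real detection logic is not documented.
def detect_content_type(url: str, html: str = "") -> str:
    """Guess one of article/docs/api/blog/wiki from the URL and a simple text cue."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    path = parsed.path.lower()
    text = html.lower()

    # Checked from most to least specific, so an API page on a docs host is labeled "api".
    if "wiki" in host or "/wiki/" in path:
        return "wiki"
    if "/api/" in path or "api reference" in text:
        return "api"
    if host.startswith("docs.") or "/docs/" in path:
        return "docs"
    if host.startswith("blog.") or "/blog/" in path:
        return "blog"
    return "article"  # default when no cue matches
```

Rule order matters here: `https://docs.stripe.com/api/charges` matches both the docs and the API rules, and checking API paths first labels it `api`.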

## Usage

```
/harvest <url>
```

## Examples

```
# Extract a blog post
/harvest https://blog.example.com/best-practices-2024

# Extract API documentation page
/harvest https://docs.stripe.com/api/charges

# Extract a GitHub README
/harvest https://github.com/owner/repo
```

## How It Works

1. Fetch URL content, trying crawl4ai first and falling back to WebFetch or curl (see Fallback Chain)
2. Detect content type (article, docs, API ref, blog, wiki)
3. Extract main content, strip navigation/ads/footers
4. Preserve code blocks, tables, images
5. Add a metadata header (source URL, extraction timestamp, content type, word count)
6. Save to `.claude/cache/agents/harvest/`
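
Steps 3 and 4 can be sketched with Python's standard-library `html.parser`. This is a minimal sketch under assumptions, not the skill's implementation: the set of "chrome" tags to drop (`SKIP_TAGS`) is a guess, and the hypothetical `MainContentExtractor` only emits headings, paragraphs, list items, and `<pre>` code blocks (tables and images are not handled).

```python
from html.parser import HTMLParser

FENCE = "`" * 3  # markdown code fence, built programmatically to avoid literal backticks here
# Assumed set of "page chrome" tags whose whole subtree is dropped.
SKIP_TAGS = {"nav", "header", "footer", "aside", "script", "style", "form"}
HEADINGS = {"h1": "#", "h2": "##", "h3": "###", "h4": "####"}


class MainContentExtractor(HTMLParser):
    """Drop chrome subtrees; emit headings, paragraphs, list items and <pre> blocks as markdown."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a SKIP_TAGS subtree
        self.in_pre = False  # True while inside <pre>, where whitespace is kept
        self.prefix = ""     # markdown prefix for the block being built ("# ", "- ", ...)
        self.buf = []        # raw text pieces of the current block
        self.blocks = []     # finished markdown blocks

    def _flush(self):
        # Collapse whitespace in ordinary text and emit it as one block.
        text = " ".join("".join(self.buf).split())
        if text:
            self.blocks.append(self.prefix + text)
        self.buf, self.prefix = [], ""

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif self.skip_depth:
            return
        elif tag in HEADINGS:
            self._flush()
            self.prefix = HEADINGS[tag] + " "
        elif tag in ("p", "li"):
            self._flush()
            self.prefix = "- " if tag == "li" else ""
        elif tag == "pre":
            self._flush()
            self.in_pre = True

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth:
            return
        elif tag == "pre" and self.in_pre:
            # Keep code whitespace intact and wrap it in a fenced block.
            code = "".join(self.buf).strip("\n")
            self.blocks.append(f"{FENCE}\n{code}\n{FENCE}")
            self.buf, self.in_pre = [], False
        elif tag in HEADINGS or tag in ("p", "li"):
            self._flush()

    def handle_data(self, data):
        if not self.skip_depth:
            self.buf.append(data)

    def markdown(self):
        self._flush()
        return "\n\n".join(self.blocks)


def extract_main_content(html: str) -> str:
    parser = MainContentExtractor()
    parser.feed(html)
    parser.close()
    return parser.markdown()
```

Dropping whole subtrees (rather than individual tags) means a `<nav>` full of links disappears together with its link text, which is the behavior step 3 describes.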

## Output Format

```markdown
# [Page Title]
> Source: [URL]
> Extracted: [timestamp]
> Type: [article|docs|api|blog|wiki]
> Words: [count]

[Clean extracted content in markdown]

## Links Found
- [Link text](URL)
```
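
A minimal sketch of assembling that layout in Python. The function name `render_harvest_output` is hypothetical, and three details are assumptions not stated by the skill: the timestamp is ISO 8601 in UTC, "Words" counts whitespace-separated tokens in the body only, and the "Links Found" section is omitted when there are no links.

```python
from datetime import datetime, timezone
from typing import Optional, Sequence, Tuple

CONTENT_TYPES = {"article", "docs", "api", "blog", "wiki"}


def render_harvest_output(
    title: str,
    url: str,
    content_type: str,
    body_md: str,
    links: Sequence[Tuple[str, str]] = (),
    extracted_at: Optional[datetime] = None,
) -> str:
    """Assemble the output layout: title, metadata quote block, body, links."""
    if content_type not in CONTENT_TYPES:
        raise ValueError(f"unknown content type: {content_type!r}")
    extracted_at = extracted_at or datetime.now(timezone.utc)
    word_count = len(body_md.split())  # whitespace-separated tokens in the body only
    lines = [
        f"# {title}",
        f"> Source: {url}",
        f"> Extracted: {extracted_at.isoformat(timespec='seconds')}",
        f"> Type: {content_type}",
        f"> Words: {word_count}",
        "",
        body_md.strip(),
    ]
    if links:
        lines += ["", "## Links Found"]
        lines += [f"- [{text}]({href})" for text, href in links]
    return "\n".join(lines) + "\n"
```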

## Fallback Chain

1. crawl4ai Docker (port 11235) - preferred
2. WebFetch tool - built-in fallback
3. curl + html2text - last resort
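
The chain above is an ordered try-next-on-failure loop. Here is a minimal Python sketch of that control flow; `fetch_with_fallback` is a hypothetical name, and the three backends in the demo are stubs (calling the real crawl4ai service, the agent's WebFetch tool, or html2text is not shown).

```python
from typing import Callable, Optional, Sequence, Tuple

# A fetcher takes a URL and returns page content (or None/empty on a soft failure).
Fetcher = Callable[[str], Optional[str]]


def fetch_with_fallback(url: str, fetchers: Sequence[Tuple[str, Fetcher]]) -> Tuple[str, str]:
    """Try each (name, fetcher) in priority order; return (name, content) from the first success."""
    errors = []
    for name, fetch in fetchers:
        try:
            content = fetch(url)
        except Exception as exc:  # e.g. the crawl4ai container is not running
            errors.append(f"{name}: {exc}")
            continue
        if content and content.strip():
            return name, content
        errors.append(f"{name}: empty response")
    raise RuntimeError("all fetchers failed: " + "; ".join(errors))


if __name__ == "__main__":
    # Hypothetical stand-ins for the three backends.
    def crawl4ai_docker(url):
        raise ConnectionError("nothing listening on localhost:11235")

    def webfetch(url):
        return ""  # simulate an empty result

    def curl_html2text(url):
        return "# Example\n\nFetched via the last-resort path."

    chain = [("crawl4ai", crawl4ai_docker), ("webfetch", webfetch), ("curl+html2text", curl_html2text)]
    print(fetch_with_fallback("https://example.com", chain)[0])  # prints curl+html2text
```

Treating an empty body as a failure, not just an exception, keeps a backend that silently returns nothing from ending the chain early.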

## When to Use

- Quick grab of a single page's content
- Extracting a specific doc page for reference
- Saving an article for later analysis
- Getting clean markdown from messy HTML

Related Skills (from vibeeval/vibecosystem)

harvest-structured: Structured data extraction - tables, pricing, products, API endpoints with schema

harvest-monitor: Web change monitoring - track changes on pages, detect updates, changelog diffs

harvest-deep-crawl: Multi-page deep crawling - documentation sites, wikis, knowledge bases

harvest-competitive: Competitive intelligence - extract features, pricing, tech stack from competitor sites

harvest-adaptive: Adaptive content summarization - auto-detect content type and produce relevant summary

workflow-router: Goal-based workflow orchestration - routes tasks to specialist agents based on user goals

wiring: Wiring Verification

websocket-patterns: Connection management, room patterns, reconnection strategies, message buffering, and binary protocol design.

visual-verdict: Screenshot comparison QA for frontend development. Takes a screenshot of the current implementation, scores it across multiple visual dimensions, and returns a structured PASS/REVISE/FAIL verdict with concrete fixes. Use when implementing UI from a design reference or verifying visual correctness.

verification-loop: Comprehensive verification system covering build, types, lint, tests, security, and diff review before a PR.

vector-db-patterns: Embedding strategies, ANN algorithms, hybrid search, RAG chunking strategies, and reranking for semantic search and retrieval.

variant-analysis: Find similar vulnerabilities across a codebase after discovering one instance. Uses pattern matching, AST search, Semgrep/CodeQL queries, and manual tracing to propagate findings. Adapted from Trail of Bits. Use after finding a bug to check if the same pattern exists elsewhere.