extract

Extract content from specific URLs using Tavily's extraction API. Returns clean markdown/text from web pages. Use when you have specific URLs and need their content without writing code.

7 stars

byDemerzels-lab

View on GitHub Installation ↓

Best use case

extract is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Extract content from specific URLs using Tavily's extraction API. Returns clean markdown/text from web pages. Use when you have specific URLs and need their content without writing code.

Teams using extract should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/extract/SKILL.md --create-dirs "https://raw.githubusercontent.com/Demerzels-lab/elsamultiskillagent/main/public/skills/barneyjm/extract/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/extract/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How extract Compares

Feature / Agent	extract	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Extract content from specific URLs using Tavily's extraction API. Returns clean markdown/text from web pages. Use when you have specific URLs and need their content without writing code.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Extract Skill

Extract clean content from specific URLs. Ideal when you know which pages you want content from.

## Prerequisites

**Tavily API Key Required** - Get your key at https://tavily.com

Add to `~/.claude/settings.json`:
```json
{
  "env": {
    "TAVILY_API_KEY": "tvly-your-api-key-here"
  }
}
```

## Quick Start

### Using the Script

```bash
./scripts/extract.sh '<json>'
```

**Examples:**
```bash
# Single URL
./scripts/extract.sh '{"urls": ["https://example.com/article"]}'

# Multiple URLs
./scripts/extract.sh '{"urls": ["https://example.com/page1", "https://example.com/page2"]}'

# With query focus and chunks
./scripts/extract.sh '{"urls": ["https://example.com/docs"], "query": "authentication API", "chunks_per_source": 3}'

# Advanced extraction for JS pages
./scripts/extract.sh '{"urls": ["https://app.example.com"], "extract_depth": "advanced", "timeout": 60}'
```

### Basic Extraction

```bash
curl --request POST \
  --url https://api.tavily.com/extract \
  --header "Authorization: Bearer $TAVILY_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "urls": ["https://example.com/article"]
  }'
```

### Multiple URLs with Query Focus

```bash
curl --request POST \
  --url https://api.tavily.com/extract \
  --header "Authorization: Bearer $TAVILY_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "urls": [
      "https://example.com/ml-healthcare",
      "https://example.com/ai-diagnostics"
    ],
    "query": "AI diagnostic tools accuracy",
    "chunks_per_source": 3
  }'
```

## API Reference

### Endpoint

```
POST https://api.tavily.com/extract
```

### Headers

| Header | Value |
|--------|-------|
| `Authorization` | `Bearer <TAVILY_API_KEY>` |
| `Content-Type` | `application/json` |

### Request Body

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `urls` | array | Required | URLs to extract (max 20) |
| `query` | string | null | Reranks chunks by relevance |
| `chunks_per_source` | integer | 3 | Chunks per URL (1-5, requires query) |
| `extract_depth` | string | `"basic"` | `basic` or `advanced` (for JS pages) |
| `format` | string | `"markdown"` | `markdown` or `text` |
| `include_images` | boolean | false | Include image URLs |
| `timeout` | float | varies | Max wait (1-60 seconds) |

### Response Format

```json
{
  "results": [
    {
      "url": "https://example.com/article",
      "raw_content": "# Article Title\n\nContent..."
    }
  ],
  "failed_results": [],
  "response_time": 2.3
}
```

## Extract Depth

| Depth | When to Use |
|-------|-------------|
| `basic` | Simple text extraction, faster |
| `advanced` | Dynamic/JS-rendered pages, tables, structured data |

## Examples

### Single URL Extraction

```bash
curl --request POST \
  --url https://api.tavily.com/extract \
  --header "Authorization: Bearer $TAVILY_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "urls": ["https://docs.python.org/3/tutorial/classes.html"],
    "extract_depth": "basic"
  }'
```

### Targeted Extraction with Query

```bash
curl --request POST \
  --url https://api.tavily.com/extract \
  --header "Authorization: Bearer $TAVILY_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "urls": [
      "https://example.com/react-hooks",
      "https://example.com/react-state"
    ],
    "query": "useState and useEffect patterns",
    "chunks_per_source": 2
  }'
```

### JavaScript-Heavy Pages

```bash
curl --request POST \
  --url https://api.tavily.com/extract \
  --header "Authorization: Bearer $TAVILY_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "urls": ["https://app.example.com/dashboard"],
    "extract_depth": "advanced",
    "timeout": 60
  }'
```

### Batch Extraction

```bash
curl --request POST \
  --url https://api.tavily.com/extract \
  --header "Authorization: Bearer $TAVILY_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "urls": [
      "https://example.com/page1",
      "https://example.com/page2",
      "https://example.com/page3",
      "https://example.com/page4",
      "https://example.com/page5"
    ],
    "extract_depth": "basic"
  }'
```

## Tips

- **Max 20 URLs per request** - batch larger lists
- **Use `query` + `chunks_per_source`** to get only relevant content
- **Try `basic` first**, fall back to `advanced` if content is missing
- **Set longer `timeout`** for slow pages (up to 60s)
- **Check `failed_results`** for URLs that couldn't be extracted

Related Skills

xiaohongshu-extract

from Demerzels-lab/elsamultiskillagent

Extract metadata from Xiaohongshu (XHS) share or discovery URLs by parsing window.__INITIAL_STATE__ and returning.

hn-extract

from Demerzels-lab/elsamultiskillagent

Extract a HackerNews post (article + comments) into single clean Markdown for quick reading or LLM input.

gh-extract

from Demerzels-lab/elsamultiskillagent

Extract content from a GitHub url.

social-media-extractor

from Demerzels-lab/elsamultiskillagent

This skill enables Claude to extract public data from **Instagram**, **TikTok**, and **Reddit**.

google-maps-b2b-extractor

from Demerzels-lab/elsamultiskillagent

EXTRACT UNLIMITED LEADS (Emails, Phones, Websites) from Google Maps.

wechat-article-extractor-skill

from Demerzels-lab/elsamultiskillagent

Extract metadata and content from WeChat Official Account articles.

solo-you2idea-extract

from Demerzels-lab/elsamultiskillagent

Extract startup ideas from YouTube videos via solograph MCP — index, search, and analyze video transcripts.

x-extract

from Demerzels-lab/elsamultiskillagent

Extract tweet content from x.com URLs without credentials using browser automation.

brw-voice-extractor

from Demerzels-lab/elsamultiskillagent

Extract and document someone's authentic writing voice from samples.

brw-brand-voice-extractor

from Demerzels-lab/elsamultiskillagent

Extract or build a distinct brand voice profile that AI agents can use to produce on-brand content every time.

expanso-keyword-extract

from Demerzels-lab/elsamultiskillagent

Extract keywords and key phrases from text for SEO, tagging, and indexing".

cut-your-tokens-97percent-savings-on-session-transcripts-via-observation-extraction

from Demerzels-lab/elsamultiskillagent

Claw Compactor v6.0 — 50%+ savings through rule-based compression, dictionary encoding, session observation.