extract
Extract content from specific URLs using Tavily's extraction API. Returns clean markdown/text from web pages. Use when you have specific URLs and need their content without writing code.
Best use case
extract is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Extract content from specific URLs using Tavily's extraction API. Returns clean markdown/text from web pages. Use when you have specific URLs and need their content without writing code.
Teams using extract should expect a more consistent output, faster repeated execution, less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in
.claude/skills/extract/SKILL.mdinside your project - Restart your AI agent — it will auto-discover the skill
How extract Compares
| Feature / Agent | extract | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Extract content from specific URLs using Tavily's extraction API. Returns clean markdown/text from web pages. Use when you have specific URLs and need their content without writing code.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Extract Skill
Extract clean content from specific URLs. Ideal when you know which pages you want content from.
## Prerequisites
**Tavily API Key Required** - Get your key at https://tavily.com
Add to `~/.claude/settings.json`:
```json
{
"env": {
"TAVILY_API_KEY": "tvly-your-api-key-here"
}
}
```
## Quick Start
### Using the Script
```bash
./scripts/extract.sh '<json>'
```
**Examples:**
```bash
# Single URL
./scripts/extract.sh '{"urls": ["https://example.com/article"]}'
# Multiple URLs
./scripts/extract.sh '{"urls": ["https://example.com/page1", "https://example.com/page2"]}'
# With query focus and chunks
./scripts/extract.sh '{"urls": ["https://example.com/docs"], "query": "authentication API", "chunks_per_source": 3}'
# Advanced extraction for JS pages
./scripts/extract.sh '{"urls": ["https://app.example.com"], "extract_depth": "advanced", "timeout": 60}'
```
### Basic Extraction
```bash
curl --request POST \
--url https://api.tavily.com/extract \
--header "Authorization: Bearer $TAVILY_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
"urls": ["https://example.com/article"]
}'
```
### Multiple URLs with Query Focus
```bash
curl --request POST \
--url https://api.tavily.com/extract \
--header "Authorization: Bearer $TAVILY_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
"urls": [
"https://example.com/ml-healthcare",
"https://example.com/ai-diagnostics"
],
"query": "AI diagnostic tools accuracy",
"chunks_per_source": 3
}'
```
## API Reference
### Endpoint
```
POST https://api.tavily.com/extract
```
### Headers
| Header | Value |
|--------|-------|
| `Authorization` | `Bearer <TAVILY_API_KEY>` |
| `Content-Type` | `application/json` |
### Request Body
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `urls` | array | Required | URLs to extract (max 20) |
| `query` | string | null | Reranks chunks by relevance |
| `chunks_per_source` | integer | 3 | Chunks per URL (1-5, requires query) |
| `extract_depth` | string | `"basic"` | `basic` or `advanced` (for JS pages) |
| `format` | string | `"markdown"` | `markdown` or `text` |
| `include_images` | boolean | false | Include image URLs |
| `timeout` | float | varies | Max wait (1-60 seconds) |
### Response Format
```json
{
"results": [
{
"url": "https://example.com/article",
"raw_content": "# Article Title\n\nContent..."
}
],
"failed_results": [],
"response_time": 2.3
}
```
## Extract Depth
| Depth | When to Use |
|-------|-------------|
| `basic` | Simple text extraction, faster |
| `advanced` | Dynamic/JS-rendered pages, tables, structured data |
## Examples
### Single URL Extraction
```bash
curl --request POST \
--url https://api.tavily.com/extract \
--header "Authorization: Bearer $TAVILY_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
"urls": ["https://docs.python.org/3/tutorial/classes.html"],
"extract_depth": "basic"
}'
```
### Targeted Extraction with Query
```bash
curl --request POST \
--url https://api.tavily.com/extract \
--header "Authorization: Bearer $TAVILY_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
"urls": [
"https://example.com/react-hooks",
"https://example.com/react-state"
],
"query": "useState and useEffect patterns",
"chunks_per_source": 2
}'
```
### JavaScript-Heavy Pages
```bash
curl --request POST \
--url https://api.tavily.com/extract \
--header "Authorization: Bearer $TAVILY_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
"urls": ["https://app.example.com/dashboard"],
"extract_depth": "advanced",
"timeout": 60
}'
```
### Batch Extraction
```bash
curl --request POST \
--url https://api.tavily.com/extract \
--header "Authorization: Bearer $TAVILY_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
"urls": [
"https://example.com/page1",
"https://example.com/page2",
"https://example.com/page3",
"https://example.com/page4",
"https://example.com/page5"
],
"extract_depth": "basic"
}'
```
## Tips
- **Max 20 URLs per request** - batch larger lists
- **Use `query` + `chunks_per_source`** to get only relevant content
- **Try `basic` first**, fall back to `advanced` if content is missing
- **Set longer `timeout`** for slow pages (up to 60s)
- **Check `failed_results`** for URLs that couldn't be extractedRelated Skills
xiaohongshu-extract
Extract metadata from Xiaohongshu (XHS) share or discovery URLs by parsing window.__INITIAL_STATE__ and returning.
hn-extract
Extract a HackerNews post (article + comments) into single clean Markdown for quick reading or LLM input.
gh-extract
Extract content from a GitHub url.
social-media-extractor
This skill enables Claude to extract public data from **Instagram**, **TikTok**, and **Reddit**.
google-maps-b2b-extractor
EXTRACT UNLIMITED LEADS (Emails, Phones, Websites) from Google Maps.
wechat-article-extractor-skill
Extract metadata and content from WeChat Official Account articles.
solo-you2idea-extract
Extract startup ideas from YouTube videos via solograph MCP — index, search, and analyze video transcripts.
x-extract
Extract tweet content from x.com URLs without credentials using browser automation.
brw-voice-extractor
Extract and document someone's authentic writing voice from samples.
brw-brand-voice-extractor
Extract or build a distinct brand voice profile that AI agents can use to produce on-brand content every time.
expanso-keyword-extract
Extract keywords and key phrases from text for SEO, tagging, and indexing".
cut-your-tokens-97percent-savings-on-session-transcripts-via-observation-extraction
Claw Compactor v6.0 — 50%+ savings through rule-based compression, dictionary encoding, session observation.