content-parser
Extract and parse content from URLs. Triggers on: user provides a URL to extract content from, another skill needs to parse source material, "parse this URL", "extract content", "解析链接", "提取内容".
About this skill
This skill lets AI agents extract and parse textual content from any given URL. It acts as a preprocessing step: the agent ingests web content through a standardized, structured output containing the main content body, metadata (title, author, publish date), and associated references, ready for analysis, summarization, or content generation. It hides the complexities of web scraping so agents can work from clean, parsed data. The skill triggers when a user provides a URL for content extraction, or when another skill needs source material parsed from a web link; it supports commands like "parse this URL" or "extract content", including their Chinese equivalents. Its core purpose is to normalize content from various platforms into a consistent data format, making web information readily usable by AI agents.
Best use case
The primary use case is giving AI agents a robust, reliable way to access, understand, and use information from the public web. This benefits developers building agents for research, summarization, content creation, or data analysis by simplifying data acquisition and preparation from online sources.
Output
A structured JSON object containing the extracted content body, metadata (title, author, publish date), and any references found at the provided URL.
Practical example
Example input
Please extract the content from this article: https://www.theverge.com/2023/10/26/23933010/google-pixel-8-pro-review
Example output
```json
{
"url": "https://www.theverge.com/2023/10/26/23933010/google-pixel-8-pro-review",
"title": "Google Pixel 8 Pro review: smarter than ever",
"author": "Allison Johnson",
"publish_date": "2023-10-26T13:00:00Z",
"body": "The Google Pixel 8 Pro is a phone with a lot going for it... [excerpt of parsed article body]",
"keywords": ["Google", "Pixel", "Smartphone", "Review", "AI"],
"references": []
}
```
When to use this skill
- User provides a URL and wants its content extracted or read.
- Another skill requires parsing source material from a URL before generation.
- User explicitly commands to "parse this URL" or "extract content from this link."
- When raw text content, metadata, and references are needed from a web page.
When not to use this skill
- User already possesses the text content and doesn't need URL parsing.
- User's request involves generating audio or video content.
- User wishes to read a local file (standard file reading tools are more appropriate).
- The URL provided is invalid or points to a non-HTTP(S) resource.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/content-parser/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
How content-parser Compares
| Feature / Agent | content-parser | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Medium | N/A |
Frequently Asked Questions
What does this skill do?
It extracts and parses content from a given URL, returning the main body, metadata (title, author, publish date), and references as structured JSON. It triggers when a user provides a URL, when another skill needs source material parsed, or on commands like "parse this URL", "extract content", "解析链接", "提取内容".
How difficult is it to install?
The installation complexity is rated as medium. You can find the installation instructions above.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
## When to Use
- User provides a URL and wants to extract/read its content
- Another skill needs to parse source material from a URL before generation
- User says "parse this URL", "extract content from this link"
- User says "解析链接", "提取内容"
## When NOT to Use
- User already has text content and doesn't need URL parsing
- User wants to generate audio/video content (not content extraction)
- User wants to read a local file (use standard file reading tools)
## Purpose
Extract and normalize content from URLs across supported platforms. Returns structured data including content body, metadata, and references. Useful as a preprocessing step for content generation skills or standalone content extraction.
## Hard Constraints
- No shell scripts. Construct curl commands from the API reference files listed in Resources
- Always read `shared/authentication.md` for API key and headers
- Follow `shared/common-patterns.md` for polling, errors, and interaction patterns
- URL must be a valid HTTP(S) URL
- Always read config following `shared/config-pattern.md` before any interaction
- Never save files to `~/Downloads/` or `.listenhub/` — save to the current working directory
<HARD-GATE>
Use the AskUserQuestion tool for every multiple-choice step — do NOT print options as plain text. Ask one question at a time. Wait for the user's answer before proceeding to the next step. After collecting URL and options, confirm with the user before calling the extraction API.
</HARD-GATE>
## Step -1: API Key Check
Follow `shared/config-pattern.md` § API Key Check. If the key is missing, stop immediately.
## Step 0: Config Setup
Follow `shared/config-pattern.md` Step 0.
**If file doesn't exist** — ask location, then create immediately:
```bash
mkdir -p ".listenhub/content-parser"
echo '{"autoDownload":true}' > ".listenhub/content-parser/config.json"
CONFIG_PATH=".listenhub/content-parser/config.json"
# (or $HOME/.listenhub/content-parser/config.json for global)
```
Then run **Setup Flow** below.
**If file exists** — read config, display summary, and confirm:
```
Current config (content-parser):
Auto download: {yes / no}
```
Ask: "Use the saved configuration?" → **Confirm and continue** / **Reconfigure**
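Reading the saved config back can be sketched as below. This is a minimal sketch; the `jq` fallback default for a missing key is an assumption, not part of the spec:

```shell
# Load the config if present, defaulting autoDownload to true when missing.
CONFIG_PATH=".listenhub/content-parser/config.json"
CONFIG=$(cat "$CONFIG_PATH" 2>/dev/null || echo '{}')
AUTO_DL=$(echo "$CONFIG" | jq -r '.autoDownload // true')
echo "autoDownload: $AUTO_DL"
```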
### Setup Flow (first run or reconfigure)
1. **autoDownload**: "Automatically save extracted content to the current directory?"
- "Yes (recommended)" → `autoDownload: true`
- "No" → `autoDownload: false`
Save immediately:
```bash
NEW_CONFIG=$(echo "$CONFIG" | jq --argjson dl {true/false} '. + {"autoDownload": $dl}')
echo "$NEW_CONFIG" > "$CONFIG_PATH"
CONFIG=$(cat "$CONFIG_PATH")
```
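For example, with the user choosing "yes", the `{true/false}` placeholder above expands to a concrete call (starting here from an empty config for illustration):

```shell
# Concrete instance of the save step with dl=true.
CONFIG='{}'
NEW_CONFIG=$(echo "$CONFIG" | jq --argjson dl true '. + {"autoDownload": $dl}')
echo "$NEW_CONFIG"
```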
## Interaction Flow
### Step 1: URL Input
Free text input. Ask the user:
> What URL would you like to extract content from?
### Step 2: Options (optional)
Ask if the user wants to configure extraction options:
```
Question: "Do you want to configure extraction options?"
Options:
- "No, use defaults" — Extract with default settings
- "Yes, configure options" — Set summarize, maxLength, or Twitter tweet count
```
If "Yes", ask follow-up questions:
- **Summarize**: "Generate a summary of the content?" (Yes/No)
- **Max Length**: "Set maximum content length?" (Free text, e.g., "5000")
- **Twitter count** (only if URL is Twitter/X profile): "How many tweets to fetch?" (1-100, default 20)
### Step 3: Confirm & Extract
Summarize:
```
Ready to extract content:
URL: {url}
Options: {summarize: true, maxLength: 5000, twitter.count: 50} / default
Proceed?
```
Wait for explicit confirmation before calling the API.
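Once confirmed, the request body used in the next section can be assembled with `jq`. The option values below are illustrative user choices; skip the merge step entirely when the user chose defaults:

```shell
# Build the base request body, then merge in options only if the user set any.
URL="https://en.wikipedia.org/wiki/Topology"
BODY=$(jq -n --arg uri "$URL" '{source: {type: "url", uri: $uri}}')
BODY=$(echo "$BODY" | jq '. + {options: {summarize: false, maxLength: 5000}}')
echo "$BODY"
```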
## Workflow
1. **Validate URL**: Must be HTTP(S). Normalize if needed (see `references/supported-platforms.md`)
2. **Build request body**:
```json
{
"source": {
"type": "url",
"uri": "{url}"
},
"options": {
"summarize": true/false,
"maxLength": 5000,
"twitter": {
"count": 50
}
}
}
```
Omit `options` if user chose defaults.
3. **Submit (foreground)**: `POST /v1/content/extract` → extract `taskId`
4. Tell the user extraction is in progress
5. **Poll (background)**: Run the following **exact** bash command with `run_in_background: true` and `timeout: 300000`. Note: status field is `.data.status` (not `processStatus`), interval is 5s, values are `processing`/`completed`/`failed`:
```bash
TASK_ID="<id-from-step-3>"
for i in $(seq 1 60); do
RESULT=$(curl -sS "https://api.marswave.ai/openapi/v1/content/extract/$TASK_ID" \
-H "Authorization: Bearer $LISTENHUB_API_KEY" 2>/dev/null)
STATUS=$(echo "$RESULT" | tr -d '\000-\037\177' | jq -r '.data.status // "processing"')
case "$STATUS" in
completed) echo "$RESULT"; exit 0 ;;
failed) echo "FAILED: $RESULT" >&2; exit 1 ;;
*) sleep 5 ;;
esac
done
echo "TIMEOUT" >&2; exit 2
```
6. When notified, **download and present result**:
If `autoDownload` is `true`:
- Write `{taskId}-extracted.md` to the **current directory** — full extracted content in markdown
- Write `{taskId}-extracted.json` to the **current directory** — full raw API response data
```bash
echo "$CONTENT_MD" > "${TASK_ID}-extracted.md"
echo "$RESULT" > "${TASK_ID}-extracted.json"
```
Present:
```
Content extraction complete!
Source: {url}
Title: {metadata.title}
Length: ~{character count} characters
Credits used: {credits}
Saved to the current directory:
{taskId}-extracted.md
{taskId}-extracted.json
```
7. Show a preview of the extracted content (first ~500 chars)
8. Offer to use content in another skill (e.g. `/podcast`, `/tts`)
**Estimated time**: 10-30 seconds depending on content size and platform.
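The HTTP(S) check in step 1 above can be sketched as a shell function. This is a minimal sketch only; the full platform normalization rules live in `references/supported-platforms.md`:

```shell
# Accept only http:// or https:// URLs; reject everything else.
validate_url() {
  case "$1" in
    http://*|https://*) return 0 ;;
    *) echo "URL must be HTTP(S): $1" >&2; return 1 ;;
  esac
}

validate_url "https://en.wikipedia.org/wiki/Topology" && echo "OK"
validate_url "ftp://example.com/file" 2>/dev/null || echo "rejected"
```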
## API Reference
- Content extract: `shared/api-content-extract.md`
- Supported platforms: `references/supported-platforms.md`
- Polling: `shared/common-patterns.md` § Async Polling
- Error handling: `shared/common-patterns.md` § Error Handling
- Config pattern: `shared/config-pattern.md`
## Example
**User**: "Parse this article: https://en.wikipedia.org/wiki/Topology"
**Agent workflow**:
1. URL: `https://en.wikipedia.org/wiki/Topology`
2. Options: defaults (omit options)
3. Submit extraction
```bash
curl -sS -X POST "https://api.marswave.ai/openapi/v1/content/extract" \
-H "Authorization: Bearer $LISTENHUB_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"source": {
"type": "url",
"uri": "https://en.wikipedia.org/wiki/Topology"
}
}'
```
4. Poll until complete:
```bash
curl -sS "https://api.marswave.ai/openapi/v1/content/extract/69a7dac700cf95938f86d9bb" \
-H "Authorization: Bearer $LISTENHUB_API_KEY"
```
5. Present extracted content preview and offer next actions.
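Parsing the completed poll response can be sketched as below. Only `.data.status` is confirmed earlier in this document; `.data.metadata.title` and `.data.content` are assumed field names used for illustration, so check `shared/api-content-extract.md` for the real schema:

```shell
# Simulated completed response; a real one comes from the poll loop above.
RESULT='{"data":{"status":"completed","metadata":{"title":"Topology"},"content":"Topology is..."}}'
STATUS=$(echo "$RESULT" | jq -r '.data.status')
TITLE=$(echo "$RESULT" | jq -r '.data.metadata.title // "(untitled)"')
CONTENT_MD=$(echo "$RESULT" | jq -r '.data.content // empty')
echo "[$STATUS] $TITLE (${#CONTENT_MD} chars)"
```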
---
**User**: "Extract recent tweets from @elonmusk, get 50 tweets"
**Agent workflow**:
1. URL: `https://x.com/elonmusk`
2. Options: `{"twitter": {"count": 50}}`
3. Submit extraction
```bash
curl -sS -X POST "https://api.marswave.ai/openapi/v1/content/extract" \
-H "Authorization: Bearer $LISTENHUB_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"source": {
"type": "url",
"uri": "https://x.com/elonmusk"
},
"options": {
"twitter": {
"count": 50
}
}
}'
```
4. Poll until complete, then present results.