clean-content-fetch

获取干净、可读的网页正文内容,适合现代网页、博客、新闻、公告和微信公众号文章抓取;支持网页正文提取、内容清洗、去噪、Markdown 输出,适用于普通 fetch 效果不佳、页面噪音较多或动态渲染干扰的场景。Clean content fetch for modern web pages, article extraction, WeChat article capture, content cleanup, noise reduction, and markdown output when ordinary fetch is not clean enough.

1,864 stars

Best use case

clean-content-fetch is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

获取干净、可读的网页正文内容,适合现代网页、博客、新闻、公告和微信公众号文章抓取;支持网页正文提取、内容清洗、去噪、Markdown 输出,适用于普通 fetch 效果不佳、页面噪音较多或动态渲染干扰的场景。Clean content fetch for modern web pages, article extraction, WeChat article capture, content cleanup, noise reduction, and markdown output when ordinary fetch is not clean enough.

Teams using clean-content-fetch should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/clean-content-fetch/SKILL.md --create-dirs "https://raw.githubusercontent.com/LeoYeAI/openclaw-master-skills/main/skills/clean-content-fetch/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/clean-content-fetch/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How clean-content-fetch Compares

Feature / Agentclean-content-fetchStandard Approach
Platform SupportNot specifiedLimited / Varies
Context Awareness High Baseline
Installation ComplexityUnknownN/A

Frequently Asked Questions

What does this skill do?

获取干净、可读的网页正文内容,适合现代网页、博客、新闻、公告和微信公众号文章抓取;支持网页正文提取、内容清洗、去噪、Markdown 输出,适用于普通 fetch 效果不佳、页面噪音较多或动态渲染干扰的场景。Clean content fetch for modern web pages, article extraction, WeChat article capture, content cleanup, noise reduction, and markdown output when ordinary fetch is not clean enough.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

SKILL.md Source

# Scrapling Web Fetch

当用户要获取网页内容、正文提取、把网页转成 markdown/text、抓取文章主体时,优先使用此技能。

## 默认流程
1. 使用 `python3 scripts/scrapling_fetch.py <url> <max_chars>`
2. 默认正文选择器优先级:
   - `article`
   - `main`
   - `.post-content`
   - `[class*="body"]`
3. 命中正文后,使用 `html2text` 转 Markdown
4. 若都未命中,回退到 `body`
5. 最终按 `max_chars` 截断输出

## 用法
```bash
python3 /Users/zzd/.openclaw/workspace/skills/scrapling-web-fetch/scripts/scrapling_fetch.py <url> 30000
```

## 依赖
优先检查:
- `scrapling`
- `html2text`
- `curl_cffi`
- `playwright`
- `browserforge`

推荐使用独立虚拟环境,避免系统 Python 的 PEP 668 限制:
```bash
python3 -m venv /Users/zzd/.openclaw/workspace/.venvs/clean-content-fetch
/Users/zzd/.openclaw/workspace/.venvs/clean-content-fetch/bin/pip install scrapling html2text curl_cffi playwright browserforge
/Users/zzd/.openclaw/workspace/.venvs/clean-content-fetch/bin/python -m playwright install chromium
```

如直接运行脚本,优先使用该虚拟环境中的 Python:
```bash
/Users/zzd/.openclaw/workspace/.venvs/clean-content-fetch/bin/python /Users/zzd/.openclaw/workspace/skills/scrapling-web-fetch/scripts/scrapling_fetch.py <url> 30000
```

## 输出约定
脚本默认输出 Markdown 正文内容。
如需结构化输出,可追加 `--json`。
如需调试提取命中了哪个 selector,可查看 stderr 输出。

## 附加资源
- 用法参考:`/Users/zzd/.openclaw/workspace/skills/scrapling-web-fetch/references/usage.md`
- 选择器策略:`/Users/zzd/.openclaw/workspace/skills/scrapling-web-fetch/references/selectors.md`
- 统一入口:`/Users/zzd/.openclaw/workspace/skills/scrapling-web-fetch/scripts/fetch-web-content`

## 何时用这个技能
- 获取文章正文
- 抓博客/新闻/公告正文
- 将网页转成 Markdown 供后续总结
- 常规 fetch 效果差,希望提升现代网页抓取稳定性
- 抓小红书分享短链或笔记落地页正文

## 小红书抓取方法
对于 `xhslink.com` 短链或小红书笔记页,推荐直接使用虚拟环境中的脚本运行:
```bash
/Users/zzd/.openclaw/workspace/.venvs/clean-content-fetch/bin/python /Users/zzd/.openclaw/workspace/skills/scrapling-web-fetch/scripts/scrapling_fetch.py 'http://xhslink.com/o/9745hugimlD' 30000
```

说明:
- 脚本会先解析短链并抓取落地页正文
- 适合提取小红书笔记文案、标题和主体内容
- 若页面需要更复杂交互,再切到浏览器自动化

## 何时不用
- 需要完整浏览器交互、点击、登录、翻页时:改用浏览器自动化
- 只是简单获取 API JSON:直接请求 API 更合适

Related Skills

social-content

1864
from LeoYeAI/openclaw-master-skills

When the user wants help creating, scheduling, or optimizing social media content for LinkedIn, Twitter/X, Instagram, TikTok, Facebook, or other platforms. Also use when the user mentions 'LinkedIn post,' 'Twitter thread,' 'social media,' 'content calendar,' 'social scheduling,' 'engagement,' or 'viral content.' This skill covers content creation, repurposing, and platform-specific strategies.

native-data-fetching

1864
from LeoYeAI/openclaw-master-skills

Use when implementing or debugging ANY network request, API call, or data fetching. Covers fetch API, React Query, SWR, error handling, caching, offline support, and Expo Router data loaders (useLoaderData).

content-strategy

1864
from LeoYeAI/openclaw-master-skills

When the user wants to plan a content strategy, decide what content to create, or figure out what topics to cover. Also use when the user mentions "content strategy," "what should I write about," "content ideas," "blog strategy," "topic clusters," or "content planning." For writing individual pieces, see copywriting. For SEO-specific audits, see seo-audit.

content-production

1864
from LeoYeAI/openclaw-master-skills

Full content production pipeline — takes a topic from blank page to published-ready piece. Use when you need to execute content: write a blog post, article, or guide end-to-end. Triggers: 'write a post about', 'draft an article', 'create content for', 'help me write', 'I need a blog post'. NOT for content strategy or calendar planning (use content-strategy). NOT for repurposing existing content (use content-repurposing). NOT for social captions only.

content-humanizer

1864
from LeoYeAI/openclaw-master-skills

Makes AI-generated content sound genuinely human — not just cleaned up, but alive. Use when content feels robotic, uses too many AI clichés, lacks personality, or reads like it was written by committee. Triggers: 'this sounds like AI', 'make it more human', 'add personality', 'it feels generic', 'sounds robotic', 'fix AI writing', 'inject our voice'. NOT for initial content creation (use content-production). NOT for SEO optimization (use content-production Mode 3).

content-creator

1864
from LeoYeAI/openclaw-master-skills

Deprecated redirect skill that routes legacy 'content creator' requests to the correct specialist. Use when a user invokes 'content creator', asks to write a blog post, article, guide, or brand voice analysis (routes to content-production), or asks to plan content, build a topic cluster, or create a content calendar (routes to content-strategy). Does not handle requests directly — identifies user intent and redirects to content-production for writing/SEO/brand-voice tasks or content-strategy for planning tasks.

citedy-content-writer

1864
from LeoYeAI/openclaw-master-skills

From topic to published blog post in one conversation — generate SEO- and GEO-optimized articles with AI illustrations and voice-over in 55 languages, create social media adaptations for 9 platforms, set up automated content sessions, and manage product knowledge base. End-to-end blog autopilot. Powered by Citedy.

citedy-content-ingestion

1864
from LeoYeAI/openclaw-master-skills

Turn any URL into structured content — YouTube videos (via Gemini Video API), web articles, PDFs, and audio files. Extract transcripts, summaries, and metadata for use in any LLM pipeline. Powered by Citedy.

youtube-watcher

1864
from LeoYeAI/openclaw-master-skills

Fetch and read transcripts from YouTube videos. Use when you need to summarize a video, answer questions about its content, or extract information from it.

youtube-transcript

1864
from LeoYeAI/openclaw-master-skills

Fetch and summarize YouTube video transcripts. Use when asked to summarize, transcribe, or extract content from YouTube videos. Handles transcript fetching via residential IP proxy to bypass YouTube's cloud IP blocks.

youtube-auto-captions - YouTube 自动字幕

1864
from LeoYeAI/openclaw-master-skills

## 描述

youtube

1864
from LeoYeAI/openclaw-master-skills

YouTube Data API integration with managed OAuth. Search videos, manage playlists, access channel data, and interact with comments. Use this skill when users want to interact with YouTube. For other third party apps, use the api-gateway skill (https://clawhub.ai/byungkyu/api-gateway).