clean-content-fetch
Fetch clean, readable main content from web pages. It suits modern web pages, blogs, news, announcements, and WeChat Official Account articles. It supports main-content extraction, content cleanup, noise reduction, and Markdown output, for cases where an ordinary fetch gives poor results, the page is noisy, or dynamic rendering gets in the way.
Best use case
clean-content-fetch is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using clean-content-fetch should expect more consistent output, faster repeated runs, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/clean-content-fetch/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
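The manual steps above can be scripted. Below is a minimal Python sketch that assumes you have already downloaded SKILL.md to a local path; the helper name `install_skill` is hypothetical and is not part of the skill itself. It only automates the copy step.

```python
import shutil
from pathlib import Path


def install_skill(skill_md: Path, project_root: Path, name: str = "clean-content-fetch") -> Path:
    """Copy a downloaded SKILL.md to <project_root>/.claude/skills/<name>/SKILL.md.

    Hypothetical helper: it automates only the manual copy step.
    """
    target_dir = project_root / ".claude" / "skills" / name
    # Create the skill directory (and any missing parents) if needed.
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / "SKILL.md"
    # Overwrites an existing SKILL.md, which is convenient when updating the skill.
    shutil.copyfile(skill_md, target)
    return target
```

For example, `install_skill(Path("SKILL.md"), Path.cwd())` would create `.claude/skills/clean-content-fetch/SKILL.md` in the current project. Restarting the agent is still a manual step.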
How clean-content-fetch Compares
| Feature | clean-content-fetch | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
It fetches a web page, extracts the main body (trying selectors such as `article` and `main` before falling back to `body`), and returns cleaned Markdown. It is aimed at blogs, news, announcements, and WeChat Official Account articles where an ordinary fetch is too noisy.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
Related Guides
AI Agents for Marketing
Discover AI agents for marketing workflows, from SEO and content production to campaign research, outreach, and analytics.
Best AI Agents for Marketing
A curated list of the best AI agents and skills for marketing teams focused on SEO, content systems, outreach, and campaign execution.
Best AI Skills for ChatGPT
Find the best AI skills to adapt into ChatGPT workflows for research, writing, summarization, planning, and repeatable assistant tasks.
SKILL.md Source
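# Scrapling Web Fetch

Prefer this skill when the user wants to fetch web page content, extract the main body text, convert a page to markdown/text, or capture the main body of an article.

## Default workflow

1. Run `python3 scripts/scrapling_fetch.py <url> <max_chars>`
2. Default main-content selector priority:
   - `article`
   - `main`
   - `.post-content`
   - `[class*="body"]`
3. Once a selector matches, convert the content to Markdown with `html2text`
4. If none match, fall back to `body`
5. Truncate the final output to `max_chars`

## Usage

```bash
python3 /Users/zzd/.openclaw/workspace/skills/scrapling-web-fetch/scripts/scrapling_fetch.py <url> 30000
```

## Dependencies

Check for these first:

- `scrapling`
- `html2text`
- `curl_cffi`
- `playwright`
- `browserforge`

A dedicated virtual environment is recommended, to avoid the system Python's PEP 668 restrictions:

```bash
python3 -m venv /Users/zzd/.openclaw/workspace/.venvs/clean-content-fetch
/Users/zzd/.openclaw/workspace/.venvs/clean-content-fetch/bin/pip install scrapling html2text curl_cffi playwright browserforge
/Users/zzd/.openclaw/workspace/.venvs/clean-content-fetch/bin/python -m playwright install chromium
```

When running the script directly, prefer the Python from that virtual environment:

```bash
/Users/zzd/.openclaw/workspace/.venvs/clean-content-fetch/bin/python /Users/zzd/.openclaw/workspace/skills/scrapling-web-fetch/scripts/scrapling_fetch.py <url> 30000
```

## Output conventions

By default the script prints the main content as Markdown.
For structured output, append `--json`.
To debug which selector the extraction matched, check the stderr output.

## Additional resources

- Usage reference: `/Users/zzd/.openclaw/workspace/skills/scrapling-web-fetch/references/usage.md`
- Selector strategy: `/Users/zzd/.openclaw/workspace/skills/scrapling-web-fetch/references/selectors.md`
- Unified entry point: `/Users/zzd/.openclaw/workspace/skills/scrapling-web-fetch/scripts/fetch-web-content`

## When to use this skill

- Fetching the main text of an article
- Capturing the body of blog, news, or announcement pages
- Converting a web page to Markdown for later summarization
- An ordinary fetch gives poor results and you want more reliable scraping of modern web pages
- Capturing the body of Xiaohongshu share short links or note landing pages

## Scraping Xiaohongshu

For `xhslink.com` short links or Xiaohongshu note pages, run the script directly with the virtual environment's Python:

```bash
/Users/zzd/.openclaw/workspace/.venvs/clean-content-fetch/bin/python /Users/zzd/.openclaw/workspace/skills/scrapling-web-fetch/scripts/scrapling_fetch.py 'http://xhslink.com/o/9745hugimlD' 30000
```

Notes:

- The script first resolves the short link, then fetches the body of the landing page
- Suitable for extracting a Xiaohongshu note's copy, title, and main content
- If the page needs more complex interaction, switch to browser automation

## When not to use

- You need full browser interaction (clicking, logging in, paging through results): use browser automation instead
- You only need to fetch JSON from an API: calling the API directly is a better fit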
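The SKILL.md default workflow (try `article`, `main`, `.post-content`, and `[class*="body"]` in order, convert the first hit to Markdown, fall back to `body`, then truncate to `max_chars`) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the actual `scrapling_fetch.py`. The names `extract_main_content`, `select`, and `to_markdown` are hypothetical. HTML selection and Markdown conversion are passed in as functions, so the control flow can be shown without relying on Scrapling's API.

```python
from typing import Callable, Optional, Sequence, Tuple

# Selector priority from the SKILL.md default workflow; "body" is the last-resort fallback.
DEFAULT_SELECTORS: Sequence[str] = ("article", "main", ".post-content", '[class*="body"]')


def extract_main_content(
    select: Callable[[str], Optional[str]],
    to_markdown: Callable[[str], str],
    max_chars: int,
    selectors: Sequence[str] = DEFAULT_SELECTORS,
) -> Tuple[str, str]:
    """Return (truncated_markdown, matched_selector).

    `select(css)` is a hypothetical adapter returning the HTML of the first element
    matching `css`, or None. `to_markdown` converts HTML to Markdown.
    """
    for selector in selectors:
        html = select(selector)
        # Assumption: any non-empty HTML string counts as a hit.
        if html:
            break
    else:
        # No preferred selector matched: fall back to the whole <body>.
        selector = "body"
        html = select("body") or ""
    # Truncation is applied last, to the converted Markdown.
    return to_markdown(html)[:max_chars], selector


if __name__ == "__main__":
    import sys

    # Toy page: a dict lookup stands in for a real HTML parser.
    fake_page = {"main": "Hello from <main>", "body": "nav + main + footer"}
    markdown, hit = extract_main_content(fake_page.get, lambda h: h, max_chars=30000)
    # Illustrative only: reports the matched selector on stderr, similar in spirit to the script.
    print(f"matched selector: {hit}", file=sys.stderr)
    print(markdown)
```

With the `html2text` package installed, `html2text.html2text` can be passed as `to_markdown`. How `select` is backed (Scrapling or another HTML parser) is deliberately left open here, since this sketch does not depend on any particular fetching library.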
Related Skills
social-content
When the user wants help creating, scheduling, or optimizing social media content for LinkedIn, Twitter/X, Instagram, TikTok, Facebook, or other platforms. Also use when the user mentions 'LinkedIn post,' 'Twitter thread,' 'social media,' 'content calendar,' 'social scheduling,' 'engagement,' or 'viral content.' This skill covers content creation, repurposing, and platform-specific strategies.
native-data-fetching
Use when implementing or debugging ANY network request, API call, or data fetching. Covers fetch API, React Query, SWR, error handling, caching, offline support, and Expo Router data loaders (useLoaderData).
content-strategy
When the user wants to plan a content strategy, decide what content to create, or figure out what topics to cover. Also use when the user mentions "content strategy," "what should I write about," "content ideas," "blog strategy," "topic clusters," or "content planning." For writing individual pieces, see copywriting. For SEO-specific audits, see seo-audit.
content-production
Full content production pipeline — takes a topic from blank page to published-ready piece. Use when you need to execute content: write a blog post, article, or guide end-to-end. Triggers: 'write a post about', 'draft an article', 'create content for', 'help me write', 'I need a blog post'. NOT for content strategy or calendar planning (use content-strategy). NOT for repurposing existing content (use content-repurposing). NOT for social captions only.
content-humanizer
Makes AI-generated content sound genuinely human — not just cleaned up, but alive. Use when content feels robotic, uses too many AI clichés, lacks personality, or reads like it was written by committee. Triggers: 'this sounds like AI', 'make it more human', 'add personality', 'it feels generic', 'sounds robotic', 'fix AI writing', 'inject our voice'. NOT for initial content creation (use content-production). NOT for SEO optimization (use content-production Mode 3).
content-creator
Deprecated redirect skill that routes legacy 'content creator' requests to the correct specialist. Use when a user invokes 'content creator', asks to write a blog post, article, guide, or brand voice analysis (routes to content-production), or asks to plan content, build a topic cluster, or create a content calendar (routes to content-strategy). Does not handle requests directly — identifies user intent and redirects to content-production for writing/SEO/brand-voice tasks or content-strategy for planning tasks.
citedy-content-writer
From topic to published blog post in one conversation — generate SEO- and GEO-optimized articles with AI illustrations and voice-over in 55 languages, create social media adaptations for 9 platforms, set up automated content sessions, and manage product knowledge base. End-to-end blog autopilot. Powered by Citedy.
citedy-content-ingestion
Turn any URL into structured content — YouTube videos (via Gemini Video API), web articles, PDFs, and audio files. Extract transcripts, summaries, and metadata for use in any LLM pipeline. Powered by Citedy.
youtube-watcher
Fetch and read transcripts from YouTube videos. Use when you need to summarize a video, answer questions about its content, or extract information from it.
youtube-transcript
Fetch and summarize YouTube video transcripts. Use when asked to summarize, transcribe, or extract content from YouTube videos. Handles transcript fetching via residential IP proxy to bypass YouTube's cloud IP blocks.
youtube-auto-captions
YouTube auto-captions.
youtube
YouTube Data API integration with managed OAuth. Search videos, manage playlists, access channel data, and interact with comments. Use this skill when users want to interact with YouTube. For other third party apps, use the api-gateway skill (https://clawhub.ai/byungkyu/api-gateway).