web-content-fetcher
Extracts clean Markdown content from any given URL, intelligently prioritizing a robust Scrapling script with stealth fallback, or using Jina Reader as an alternative.
About this skill
`web-content-fetcher` is an AI agent skill that extracts the main article content of a web page and returns it as structured Markdown. Its primary path is a Scrapling-based script that tries a fast HTTP fetch first and falls back to a stealth mode (a headless browser) when the fast result has too little content, which covers JavaScript-rendered and anti-scraping sites. Jina Reader serves as a fallback when the script fails or its dependencies are not installed. The output preserves headings, links, images, lists, and code blocks. Typical sources include news articles, blog posts, documentation, and WeChat articles (微信公众号); typical downstream tasks are summarization, data extraction, and analysis.
Best use case
The primary use case is providing AI agents with the ability to reliably fetch and process web content from any URL into a clean, structured Markdown format. This benefits content analysis, summarization, research, and data extraction workflows, particularly for agents that need to consume web pages as input for further processing without dealing with raw HTML or inconsistent formatting.
Expected output: a clean, well-formatted Markdown string containing the main textual and structural content (headings, links, images, lists, code blocks) of the provided URL.
Practical example
Example input
Read this article for me: https://www.nytimes.com/2023/10/26/technology/ai-advances.html
Example output
````markdown
# AI Advances Spark New Debates
## Ethical Considerations
Recent breakthroughs in artificial intelligence have intensified discussions around [ethics and societal impact](https://example.com/ethics-report). Experts are urging regulations to ensure responsible development.
### Key Takeaways
* Rapid pace of innovation.
* Growing concerns about bias.
* Need for global collaboration.
```python
# Example AI code snippet
def hello_ai():
    print("Hello from AI!")
```
````
When to use this skill
- When you need to extract the main article content from a URL as clean Markdown.
- When summarizing, analyzing, or processing text from news articles, blog posts, or documentation.
- When traditional fetching methods fail due to JavaScript rendering or anti-scraping measures.
- When you need to handle content from platforms like WeChat articles (微信公众号).
When not to use this skill
- When you only need raw HTML or a full screenshot of a page.
- When you need to interact with dynamic elements on a page (e.g., clicking buttons, filling forms).
- For high-volume, continuous scraping that exceeds Jina Reader's free tier (200 requests per day) or requires custom proxy rotation.
How web-content-fetcher Compares
| Feature / Agent | web-content-fetcher | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Medium | N/A |
Frequently Asked Questions
What does this skill do?
Extracts clean Markdown content from any given URL, intelligently prioritizing a robust Scrapling script with stealth fallback, or using Jina Reader as an alternative.
How difficult is it to install?
The installation complexity is rated as medium. Installation instructions are in the "Install Dependencies" section of the SKILL.md source below.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Web Content Fetcher
Given a URL, return its main content as clean Markdown — headings, links, images, lists, code blocks all preserved.
## Extraction Strategy
Always try **one method per URL** — don't cascade blindly. Pick the right one upfront.
```
URL
│
├─ 1. Scrapling script (preferred)
│ Run fetch.py — check the domain routing table to decide fast vs --stealth.
│ Works for most sites. Returns clean Markdown directly.
│
└─ 2. Jina Reader (fallback — only if Scrapling fails or dependencies not installed)
web_fetch("https://r.jina.ai/<url>")
Free tier: 200 req/day. Fast (~1-2s), good Markdown output.
Does NOT work for: WeChat (403), some Chinese platforms.
```
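The Jina Reader fallback in the diagram amounts to prefixing the target URL. A minimal sketch, assuming the `https://r.jina.ai/` prefix shown above; the helper name `jina_reader_url` and the unsupported-host set are illustrative, not part of the skill:

```python
from urllib.parse import urlparse

JINA_READER_PREFIX = "https://r.jina.ai/"

# Hosts noted above as not working through Jina Reader (WeChat returns 403).
# Illustrative only; extend it as other unsupported platforms are found.
JINA_UNSUPPORTED_HOSTS = {"mp.weixin.qq.com"}


def jina_reader_url(url):
    """Return the Jina Reader address for `url`, or None if the host is known not to work."""
    parsed = urlparse(url.strip())
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        raise ValueError(f"expected an absolute http(s) URL, got: {url!r}")
    if parsed.hostname.lower() in JINA_UNSUPPORTED_HOSTS:
        return None
    return JINA_READER_PREFIX + url.strip()


print(jina_reader_url("https://example.com/article"))
```

Returning `None` for WeChat keeps the caller from spending a request on a host that is expected to answer 403.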
### Scrapling script
```bash
python3 <SKILL_DIR>/scripts/fetch.py "<url>" [max_chars] [--stealth]
```
`<SKILL_DIR>` is the directory where this SKILL.md lives. Resolve it before calling the script.
The script has two modes built in:
- **Default (fast):** HTTP fetch, ~1-3s, works for most sites
- **`--stealth`:** Headless browser, ~5-15s, for JS-rendered or anti-scraping sites
When run without `--stealth`, the script automatically falls back to stealth if the fast result has too little content. So you rarely need to specify `--stealth` manually — the only reason to force it is when you already know the site needs it (see routing table), which saves the initial fast attempt.
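The automatic fallback can be pictured roughly as follows. This is a sketch of the described behaviour, not the code in `fetch.py`: the function name, the injected fetch callables, and the 200-character threshold are all assumptions made for illustration.

```python
# Hypothetical cutoff; the script's real "too little content" threshold is not documented here.
MIN_CONTENT_CHARS = 200


def fetch_with_fallback(url, fast_fetch, stealth_fetch,
                        force_stealth=False, min_chars=MIN_CONTENT_CHARS):
    """Return (mode, content), using stealth only when forced or when fast yields too little."""
    if force_stealth:
        # Known JS-heavy site: skip the fast attempt entirely.
        return "stealth", stealth_fetch(url)
    content = fast_fetch(url)
    if len(content.strip()) >= min_chars:
        return "fast", content
    # Fast result too thin, so retry the same URL with the headless browser.
    return "stealth", stealth_fetch(url)


# Stub fetchers stand in for the real HTTP and headless-browser fetches.
mode, text = fetch_with_fallback(
    "https://example.com",
    fast_fetch=lambda u: "",
    stealth_fetch=lambda u: "x" * 500,
)
print(mode)  # stealth
```

Forcing stealth saves the first fast attempt, which is why the routing table is worth consulting before the call.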
## Domain Routing
Use this table to pick the right mode on the first call:
| Domain | Command | Why |
|--------|---------|-----|
| `mp.weixin.qq.com` | `fetch.py <url> --stealth` | JS-rendered content |
| `zhuanlan.zhihu.com` | `fetch.py <url> --stealth` | Anti-scraping + JS |
| `juejin.cn` | `fetch.py <url> --stealth` | JS-rendered SPA |
| `sspai.com` | `fetch.py <url>` | Static HTML |
| `blog.csdn.net` | `fetch.py <url>` | Static HTML |
| `ruanyifeng.com` | `fetch.py <url>` | Static blog |
| `openai.com` | `fetch.py <url>` | Static HTML |
| `blog.google` | `fetch.py <url>` | Static HTML |
| Everything else | `fetch.py <url>` | Auto-fallback handles it |
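The routing table can be encoded as a small lookup. A minimal sketch: the stealth domains come from the table above, while the function name and the choice to also match subdomains are assumptions.

```python
from urllib.parse import urlparse

# Domains the table routes straight to --stealth.
STEALTH_DOMAINS = {"mp.weixin.qq.com", "zhuanlan.zhihu.com", "juejin.cn"}


def needs_stealth(url):
    """Return True when the URL's host is a listed stealth domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in STEALTH_DOMAINS)


for example in ("https://mp.weixin.qq.com/s/xxx", "https://sspai.com/post/73145"):
    flag = " --stealth" if needs_stealth(example) else ""
    print(f"fetch.py {example}{flag}")
```

Every host not in the set takes the default command, matching the "Everything else" row.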
## Script Options
```bash
# Basic — auto-selects fast or stealth
python3 <SKILL_DIR>/scripts/fetch.py "https://sspai.com/post/73145"
# Force stealth for known JS-heavy sites
python3 <SKILL_DIR>/scripts/fetch.py "https://mp.weixin.qq.com/s/xxx" --stealth
# Limit output to 15000 characters (default: 30000)
python3 <SKILL_DIR>/scripts/fetch.py "https://example.com/article" 15000
# JSON output with metadata (url, mode, selector, content_length)
python3 <SKILL_DIR>/scripts/fetch.py "https://example.com" --json
```
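When calling the script from Python rather than a shell, the same options map onto an argument list. A sketch under assumptions: the helper names are hypothetical, the flags are placed after the positional arguments as in the usage line, and the script is assumed to write its result to stdout.

```python
import subprocess
import sys
from pathlib import Path


def build_fetch_command(skill_dir, url, max_chars=None, stealth=False, json_output=False):
    """Assemble the argument list for fetch.py from the documented options."""
    cmd = [sys.executable, str(Path(skill_dir) / "scripts" / "fetch.py"), url]
    if max_chars is not None:
        cmd.append(str(max_chars))  # optional positional character limit
    if stealth:
        cmd.append("--stealth")
    if json_output:
        cmd.append("--json")
    return cmd


def run_fetch(skill_dir, url, **options):
    """Run fetch.py once and return its stdout; raises CalledProcessError on a non-zero exit."""
    result = subprocess.run(
        build_fetch_command(skill_dir, url, **options),
        capture_output=True, text=True, check=True,
    )
    return result.stdout


print(build_fetch_command("/path/to/skill", "https://example.com/article", max_chars=15000))
```

Passing a list instead of a shell string avoids quoting problems with URLs that contain `&` or `?`.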
## Install Dependencies
First use only — the script checks and tells you if anything is missing:
```bash
pip install scrapling html2text
```
If on system-managed Python (macOS/Linux), add `--break-system-packages` or use a venv.
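To check for the dependencies before the first call, something like the following works. It is a sketch rather than the script's own check, and it assumes the import names match the package names `scrapling` and `html2text`.

```python
import importlib.util

REQUIRED_MODULES = ("scrapling", "html2text")


def missing_modules(modules=REQUIRED_MODULES):
    """Return the names from `modules` that cannot be found in the current environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = missing_modules()
if missing:
    print("Missing: " + ", ".join(missing))
    print("Install with: pip install " + " ".join(missing))
else:
    print("All dependencies present")
```

`find_spec` only locates the module without importing it, so the check stays fast even for the browser-backed package.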
## Failure Rules
- Same URL fails once → give up, tell the user "unable to extract content from this URL"
- Do not retry; each failed call wastes context tokens
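The failure rules above could be enforced with a small wrapper that remembers failed URLs. This is an illustrative sketch; the class name and the injected `fetch` callable are assumptions, and only the failure message comes from the rules.

```python
FAILURE_MESSAGE = "unable to extract content from this URL"


class OneShotFetcher:
    """Wrap a fetch callable so a URL that failed once is never attempted again."""

    def __init__(self, fetch):
        self._fetch = fetch
        self._failed = set()

    def get(self, url):
        if url in self._failed:
            # Already failed once: report it without calling the fetcher again.
            return FAILURE_MESSAGE
        try:
            content = self._fetch(url)
        except Exception:
            content = ""
        if not content or not content.strip():
            self._failed.add(url)
            return FAILURE_MESSAGE
        return content


def always_fails(url):
    raise RuntimeError("simulated fetch failure")


fetcher = OneShotFetcher(always_fails)
print(fetcher.get("https://example.com/a"))
```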