web-reader
Smart web reader: fetches articles and downloads videos, archives them, and supports analysis, summarization, and follow-up exploration. Triggers: '下载这篇文章', '抓取文章', '保存文章', 'fetch URL', '分析这篇文章', '摘要', '总结文章', '下载视频', '抓取微信文章', '抓取飞书文档', '把这个链接保存下来', '下载B站视频', 'download article', 'analyze article', 'summarize'.
Best use case
web-reader is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using web-reader should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/web-reader/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
How web-reader Compares
| Feature / Agent | web-reader | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
It is a smart web reader: it fetches articles and downloads videos, archives them, and supports analysis, summarization, and follow-up exploration of the archived content.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
Related Guides
AI Agents for Marketing
Discover AI agents for marketing workflows, from SEO and content production to campaign research, outreach, and analytics.
AI Agents for Startups
Explore AI agent skills for startup validation, product research, growth experiments, documentation, and fast execution with small teams.
AI Agents for Coding
Browse AI agent skills for coding, debugging, testing, refactoring, code review, and developer workflows across Claude, Cursor, and Codex.
SKILL.md Source
# Web Reader
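Smart web reader: fetch articles (text and images) → archive to a specified directory → analyze / summarize / extend.
## Workflow
### Phase 1: Download the original (runs first)
When the user provides a URL and asks to download, save, fetch, or analyze it, **download the original first**.
**Step 1: Determine the archive path**
Check whether the user has configured an archive directory by reading `~/.claude/web-reader.json`: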
```json
{
  "archive_dir": "/Volumes/Keybase (m4)/private/biggerbear/文档/微信摘录",
  "categories": ["AI漫剧短剧", "AI工具与技术", "OpenClaw", "开发技术", "商业与产品", "飞书文档"]
}
```
If the config file does not exist, use `~/Documents/docs/` as the default archive directory.
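The lookup described above can be sketched in a few lines of Python. This is a minimal illustration, not the skill's actual code: `load_archive_config` is a hypothetical helper name, and falling back to the default when the file exists but has no `archive_dir` key is an assumption.

```python
import json
from pathlib import Path

# Both paths come from the text above.
CONFIG_PATH = Path("~/.claude/web-reader.json").expanduser()
DEFAULT_ARCHIVE_DIR = Path("~/Documents/docs/").expanduser()


def load_archive_config(config_path: Path = CONFIG_PATH) -> dict:
    """Return archive settings, using the default directory when unconfigured.

    Hypothetical helper: the name and the returned dict shape are illustrative.
    """
    if not config_path.is_file():
        return {"archive_dir": DEFAULT_ARCHIVE_DIR, "categories": []}
    with config_path.open(encoding="utf-8") as f:
        raw = json.load(f)
    # Assumption: a missing "archive_dir" key also falls back to the default.
    archive_dir = Path(raw.get("archive_dir", str(DEFAULT_ARCHIVE_DIR))).expanduser()
    return {"archive_dir": archive_dir, "categories": list(raw.get("categories", []))}
```

Calling `expanduser()` on the configured value matters because a config may store a path that starts with `~`.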
**Step 2: Fetch the article**
```bash
python3 {SKILL_DIR}/fetcher.py "URL" -o "ARCHIVE_DIR" --category "CATEGORY"
```
For pages that require JavaScript rendering (the fetch returns a "JavaScript enabled" error), escalate to browser mode:
```bash
# First try scrapling fetch (browser)
scrapling extract fetch "URL" /tmp/article.md --network-idle
# If that still fails, use camoufox
python3 {SKILL_DIR}/fetcher.py "URL" -o "ARCHIVE_DIR" --method camoufox --category "CATEGORY"
```
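The escalation above (plain fetch, then browser fetch, then camoufox) is a try-in-order fallback chain. The sketch below shows that control flow only. The fetchers are passed in as plain callables, so nothing here calls scrapling or camoufox directly; `fetch_with_fallback` and the string check for the "JavaScript enabled" marker are assumptions made for illustration.

```python
from typing import Callable, Optional, Sequence

# A fetcher takes a URL and returns markdown text, or None / raises on failure.
Fetcher = Callable[[str], Optional[str]]

# Assumption: a blocked page is recognized by this marker in the returned text.
JS_REQUIRED_MARKER = "JavaScript enabled"


def fetch_with_fallback(url: str, fetchers: Sequence[Fetcher]) -> Optional[str]:
    """Try each fetcher in order and return the first usable result."""
    for fetch in fetchers:
        try:
            text = fetch(url)
        except Exception:
            continue  # this tier failed; fall through to the next one
        if text and JS_REQUIRED_MARKER not in text:
            return text
    return None  # every tier failed
```

Ordering the list from cheapest to most expensive keeps the heavier browser-based tiers for pages that actually need them.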
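**Step 3: Handle WeChat images**
For WeChat Official Account articles (mp.weixin.qq.com), fetcher.py handles `data-src` images automatically.
If automatic handling fails (images come out as empty `![]()` placeholders), handle them manually:
1. Fetch the HTML with scrapling: `scrapling extract get "URL" /tmp/wx.html -s "#js_content"`
2. Extract the `data-src="https://mmbiz.qpic.cn..."` URLs from the HTML
3. Download the images into the `ARCHIVE_DIR/CATEGORY/SLUG/` directory
4. Replace the empty placeholders in the markdown with the local paths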
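Manual steps 2 and 4 can be sketched with two small regex helpers. Both function names are hypothetical, and the sketch assumes the empty placeholders appear in the markdown in the same order as the `data-src` images appear in the HTML. Downloading the files (step 3) is left out.

```python
import re
from typing import List

# Matches the data-src attribute form quoted in step 2.
DATA_SRC_RE = re.compile(r'data-src="(https://mmbiz\.qpic\.cn[^"]*)"')
# Matches an empty markdown image placeholder: ![]()
EMPTY_IMAGE_RE = re.compile(r"!\[\]\(\)")


def extract_data_src(html: str) -> List[str]:
    """Return mmbiz.qpic.cn image URLs in document order."""
    return DATA_SRC_RE.findall(html)


def fill_placeholders(markdown: str, local_paths: List[str]) -> str:
    """Replace each empty ![]() with the next local path, in order.

    Placeholders beyond the number of supplied paths are left untouched.
    """
    paths = iter(local_paths)

    def substitute(match):
        path = next(paths, None)
        return match.group(0) if path is None else f"![]({path})"

    return EMPTY_IMAGE_RE.sub(substitute, markdown)
```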
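**Step 4: Classify automatically**
Choose a category based on the article content. Reference keywords per category:
- **AI漫剧短剧** (AI comics and short dramas): 漫剧 (comic drama), 短剧 (short drama), 动漫 (anime), seedance, AI视频创作 (AI video creation)
- **AI工具与技术** (AI tools and technology): AI工具 (AI tools), 大模型 (large models), scrapling, agent, API, 技术方案 (technical solutions)
- **OpenClaw**: openclaw, claude code, agent team
- **开发技术** (development): flutter, 编程 (programming), 开发 (development), 逆向 (reverse engineering), IDE, 部署 (deployment)
- **商业与产品** (business and product): 产品分析 (product analysis), 商业模式 (business model), 市场 (market), 付费 (paid), 创业 (startups)
- **飞书文档** (Feishu documents): the feishu.cn domain
If the category is unclear, **ask the user** to pick one or create a new category.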
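One way to turn the keyword list above into a first-pass classifier is a simple occurrence count. This is a sketch under assumptions: the source does not say how keywords are weighed, so the scoring rule, the tie handling, and the `classify` name are all illustrative. Returning `None` stands for the "ask the user" case.

```python
from typing import Optional
from urllib.parse import urlparse

# Category names and keywords are copied from the list above.
CATEGORY_KEYWORDS = {
    "AI漫剧短剧": ["漫剧", "短剧", "动漫", "seedance", "AI视频创作"],
    "AI工具与技术": ["AI工具", "大模型", "scrapling", "agent", "API", "技术方案"],
    "OpenClaw": ["openclaw", "claude code", "agent team"],
    "开发技术": ["flutter", "编程", "开发", "逆向", "IDE", "部署"],
    "商业与产品": ["产品分析", "商业模式", "市场", "付费", "创业"],
}


def classify(url: str, text: str) -> Optional[str]:
    """Return a category name, or None when the user should be asked."""
    host = urlparse(url).hostname or ""
    if host == "feishu.cn" or host.endswith(".feishu.cn"):
        return "飞书文档"  # decided by domain, not by keywords
    lowered = text.lower()
    # Assumed scoring: total case-insensitive keyword occurrences per category.
    scores = {
        category: sum(lowered.count(keyword.lower()) for keyword in keywords)
        for category, keywords in CATEGORY_KEYWORDS.items()
    }
    ranked = sorted(scores.values(), reverse=True)
    if ranked[0] == 0 or ranked[0] == ranked[1]:
        return None  # no match or a tie: defer to the user
    return max(scores, key=scores.get)
```

Plain substring counting is crude (short keywords such as `API` can match inside longer words), which is one more reason to defer to the user on ties.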
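**Step 5: Confirm the result**
Report the archive details:
- Article title
- Save path
- Number of images
- Category
### Phase 2: Follow-up exploration (user-triggered)
Once the article has been downloaded, wait for the user's instruction:
| Command | Action |
|------|------|
| **Analyze** (分析) | Read the full article, extract the core ideas, supporting arguments, and methodology, and produce a structured analysis |
| **Abstract** (摘要) | Generate a concise 3-5 sentence summary that preserves key data and conclusions |
| **Summarize** (总结) | List the key points section by section, suitable for quick review |
| **Extend** (衍生) | Based on the article, propose directions for further exploration, related topics, and practical suggestions |
| **Extract** (提炼) | Pull out directly reusable methods, tools, configurations, and code snippets |
| **Compare** (对比) | Compare against other previously downloaded articles (a comparison target must be specified) |
| **Rewrite** (洗稿) | Rewrite based on the original, keeping the core information but completely changing the wording (used together with /wxpub) |
When running an analysis, read the archived markdown file directly; there is no need to re-fetch.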
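## Smart routing
| Platform | Method | Notes |
|------|------|------|
| mp.weixin.qq.com | scrapling | Extracts `data-src` images, handles SVG placeholders |
| *.feishu.cn | Virtual scrolling | Collects content blocks while scrolling; downloads images inside the browser |
| zhuanlan.zhihu.com | scrapling | `.Post-RichText` selector |
| www.zhihu.com | scrapling | `.RichContent` selector |
| www.toutiao.com | scrapling | Handles toutiaoimg base64 placeholders |
| www.xiaohongshu.com | camoufox | Anti-scraping measures require a stealth browser |
| www.weibo.com | camoufox | Anti-scraping measures require a stealth browser |
| bilibili.com / b23.tv | yt-dlp | Video download |
| youtube.com / youtu.be | yt-dlp | Video download |
| douyin.com | yt-dlp | Video download |
| Any other URL | scrapling | Generic fetch with a three-tier fallback strategy |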
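The routing table can be read as a host-to-method lookup with a default. The sketch below is one possible encoding, not the actual logic in fetcher.py. Method names follow the documented `--method` values (`scrapling`, `camoufox`, `ytdlp`, `feishu`), and treating subdomains as matches for a listed host is an assumption.

```python
from urllib.parse import urlparse

# (host, method) pairs taken from the table above.
ROUTES = [
    ("mp.weixin.qq.com", "scrapling"),
    ("feishu.cn", "feishu"),
    ("zhuanlan.zhihu.com", "scrapling"),
    ("www.zhihu.com", "scrapling"),
    ("www.toutiao.com", "scrapling"),
    ("www.xiaohongshu.com", "camoufox"),
    ("www.weibo.com", "camoufox"),
    ("bilibili.com", "ytdlp"),
    ("b23.tv", "ytdlp"),
    ("youtube.com", "ytdlp"),
    ("youtu.be", "ytdlp"),
    ("douyin.com", "ytdlp"),
]
DEFAULT_METHOD = "scrapling"  # the "any other URL" row


def choose_method(url: str) -> str:
    """Pick a fetch method for a URL by its hostname."""
    host = (urlparse(url).hostname or "").lower()
    for domain, method in ROUTES:
        # Assumption: a listed host also covers its subdomains.
        if host == domain or host.endswith("." + domain):
            return method
    return DEFAULT_METHOD
```

A result from such a lookup could be passed to `--method` to force the same choice explicitly.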
## Installing dependencies
Install as needed:
| Dependency | Purpose | Install |
|------|------|------|
| scrapling | Article fetching | `pip install scrapling` |
| yt-dlp | Video download | `pip install yt-dlp` |
| camoufox | Anti-detection browser | `pip install camoufox && python3 -m camoufox fetch` |
| html2text | HTML to Markdown conversion | `pip install html2text` |
## CLI reference
```
python3 {SKILL_DIR}/fetcher.py [URL] [OPTIONS]
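Arguments:
url URL to fetch
Options:
-o, --output DIR Output directory (default: current directory)
-q, --quality N Video quality (default: 1080)
--method METHOD Force a method: scrapling, camoufox, ytdlp, feishu
--selector CSS Force a CSS selector
--urls-file FILE File with a list of URLs (one per line, # for comments)
--audio-only Extract audio only
--no-images Skip image downloads
--cookies-browser NAME Browser to take cookies from (e.g. chrome, firefox)
--category NAME Archive subdirectory name
--json-output Output as JSON (for programmatic use)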
```
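For programmatic use, the flags above can be assembled into an argument list before handing it to something like `subprocess.run`. The sketch below uses only flags listed in the reference. The helper names are hypothetical, `fetcher` stands in for `{SKILL_DIR}/fetcher.py`, and `parse_urls_file` reflects one reading of the `--urls-file` format (only whole-line `#` comments are skipped), which may differ from what fetcher.py does.

```python
from typing import List, Optional

ALLOWED_METHODS = {"scrapling", "camoufox", "ytdlp", "feishu"}


def parse_urls_file(text: str) -> List[str]:
    """One URL per line; blank lines and lines starting with # are skipped."""
    urls = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            urls.append(line)
    return urls


def build_command(
    url: str,
    output_dir: str,
    *,
    category: Optional[str] = None,
    method: Optional[str] = None,
    no_images: bool = False,
    json_output: bool = False,
    fetcher: str = "fetcher.py",  # placeholder for {SKILL_DIR}/fetcher.py
) -> List[str]:
    """Build the argv list for one fetcher.py invocation."""
    if method is not None and method not in ALLOWED_METHODS:
        raise ValueError(f"unknown method: {method}")
    cmd = ["python3", fetcher, url, "-o", output_dir]
    if method:
        cmd += ["--method", method]
    if category:
        cmd += ["--category", category]
    if no_images:
        cmd.append("--no-images")
    if json_output:
        cmd.append("--json-output")
    return cmd
```

Passing a list rather than a shell string avoids quoting problems with archive paths that contain spaces or parentheses.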
## Configuration
Create `~/.claude/web-reader.json` to configure the archive directory:
```json
{
  "archive_dir": "~/Documents/docs/articles",
  "categories": ["技术", "产品", "行业"]
}
```
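## Platform-specific handling
### WeChat Official Accounts
- Images use the `data-src` attribute and are served from `mmbiz.qpic.cn`
- The visible `<img>` is an SVG lazy-loading placeholder
- Image downloads require the `Referer: https://mp.weixin.qq.com/` request header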
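The Referer requirement can be met with the standard library alone. This is a minimal sketch with hypothetical function names; it shows how the header is attached and is not a copy of what fetcher.py does.

```python
import urllib.request

# Header value taken from the bullet above.
WECHAT_REFERER = "https://mp.weixin.qq.com/"


def build_image_request(image_url: str) -> urllib.request.Request:
    """Create a request that carries the Referer header WeChat expects."""
    return urllib.request.Request(image_url, headers={"Referer": WECHAT_REFERER})


def download_image(image_url: str, dest_path: str, timeout: float = 30.0) -> int:
    """Download one image to dest_path and return the number of bytes written."""
    request = build_image_request(image_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        data = response.read()
    with open(dest_path, "wb") as out:
        out.write(data)
    return len(data)
```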
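### Feishu documents
- Virtual scrolling: content that has scrolled out of view is removed from the DOM
- An injected JS collector captures `[data-block-id]` content blocks while scrolling
- Image 401 errors: images must be downloaded inside the browser context with `fetch(url, {credentials: 'include'})`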
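Because scrolled-past blocks leave the DOM, each scroll position yields only a partial view, and the views overlap. The collector itself runs as JavaScript in the page. The sketch below covers only the merge step on the Python side, assuming each snapshot arrives as a list of `(block_id, text)` pairs. That data shape, the `merge_snapshots` name, and the keep-the-longest-text rule are assumptions, not documented behavior.

```python
from typing import Dict, Iterable, List, Tuple

# (value of data-block-id, text extracted from that block)
Block = Tuple[str, str]


def merge_snapshots(snapshots: Iterable[Iterable[Block]]) -> List[Block]:
    """Merge overlapping per-scroll snapshots into one ordered block list.

    Blocks keep the order in which they were first seen, which matches
    document order when scrolling top to bottom. If a block shows up again
    with more text (for example after lazy rendering), the longer text wins.
    """
    merged: Dict[str, str] = {}
    for snapshot in snapshots:
        for block_id, text in snapshot:
            if block_id not in merged or len(text) > len(merged[block_id]):
                merged[block_id] = text
    return list(merged.items())
```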
### Bilibili
- Short links (b23.tv) are resolved automatically
- For premium-member (大会员) content, use `--cookies-browser chrome`
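## Troubleshooting
| Problem | Solution |
|------|----------|
| Page requires JavaScript | Use `--method camoufox`, or try `scrapling extract fetch` first |
| WeChat images are empty | Extract the `data-src` URLs from the HTML manually |
| Feishu returns a login page | The document may require authentication |
| Bilibili returns 403 | Use `--cookies-browser chrome` |
| Content is too short | Try `--method camoufox` |
Related Skills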
web-reader-pro
Advanced web content extraction skill for OpenClaw using multi-tier fallback strategy (Jina → Scrapling → WebFetch) with intelligent routing, caching, quality scoring, and domain learning. Use when: reading article content, extracting web page text, scraping dynamic JS-heavy pages, or fetching WeChat official account articles.
wechat-mp-reader
Read WeChat official account articles. Use the built-in browser tool to open the page and extract body text. Always append ?scene=1 to the URL.
rss-ai-reader
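📰 RSS AI reader: automatically fetches subscriptions, generates summaries with an LLM, and pushes them to multiple channels. Supports Claude/OpenAI for generating Chinese summaries and pushes to Feishu/Telegram/Email. Triggers: the user asks to subscribe to RSS, monitor a blog, fetch news, generate summaries, or set up scheduled fetching; "帮我订阅", "监控这个网站", "每天推送新闻"; anything related to RSS/Atom feeds.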
DeepReader
The default web content reader for OpenClaw. Reads X (Twitter), Reddit, YouTube, and any webpage into clean Markdown — zero API keys required. Use when you need to ingest social media posts, articles, or video transcripts into agent memory.
feishu-doc-reader
Read and extract content from all Feishu (Lark) document types using the official Feishu Open API
medical-research-literature-reader-pro
A medical-research-native literature reading skill for users with clinical, bioinformatics, translational, and basic experimental backgrounds. Use this skill whenever a user wants to read, analyze, critique, or interpret a medical or scientific paper — whether they provide a PDF, abstract, DOI, PMID, or just a title. Triggers include requests like "analyze this paper", "critique this study", "is this a strong paper?", "give me similar studies", "prepare me for journal club", "help me understand this bioinformatics paper", "what are the weaknesses here?", or "turn this into a mind map". Also activate for any downstream deliverables such as journal club kits, comparison tables, PI decision briefs, replication starters, or follow-up experiment designs. Do NOT treat as a generic summarizer — this skill performs structured evidence-type classification, track-specific critical appraisal, interpretation-boundary judgment, and research-grade follow-up generation.
4chan-reader
Browse 4chan boards and extract thread discussions into structured text files. Use when you need to fetch catalog information or specific thread content (including post text and file metadata) from 4chan boards like /a/, /vg/, /v/, etc.
mockplus-reader
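Reads and analyzes MockPlus online design pages. Used to: (1) open and parse MockPlus web links, (2) extract design information, structure, and components from the page, (3) analyze prototype content and interaction notes. Use this skill when the user sends a MockPlus link or asks for a prototype to be analyzed.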
WeChat-article-reader
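Exports WeChat Official Account articles to Markdown. Triggered when the user provides a WeChat Official Account link (mp.weixin.qq.com) or asks to download, export, or save a WeChat article. Saves to the workspace's source directory by default.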
Arxiv Paper Reader
Uses Python to classify and deep-read the arXiv paper identified by a given arxiv_id or URL with an LLM agent, then prints the reading notes directly.
article-factory-wechat
humanizer
Remove signs of AI-generated writing from text. Use when editing or reviewing text to make it sound more natural and human-written. Based on Wikipedia's comprehensive "Signs of AI writing" guide. Detects and fixes patterns including: inflated symbolism, promotional language, superficial -ing analyses, vague attributions, em dash overuse, rule of three, AI vocabulary words, negative parallelisms, and excessive conjunctive phrases.