wechat-article-extractor

Extract metadata and content from WeChat Official Account articles. Use when user needs to parse WeChat article URLs (mp.weixin.qq.com), extract article info (title, author, content, publish time, cover image), or convert WeChat articles to structured data. Supports various article types including posts, videos, images, voice messages, and reposts.

3,891 stars

by openclaw

Best use case

wechat-article-extractor is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using wechat-article-extractor should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/content-system-wechat-article-extractor-skill/SKILL.md --create-dirs "https://raw.githubusercontent.com/openclaw/skills/main/skills/abigale-cyber/content-system-wechat-article-extractor-skill/SKILL.md"

Manual Installation

1. Download SKILL.md from GitHub.
2. Place it in .claude/skills/content-system-wechat-article-extractor-skill/SKILL.md inside your project.
3. Restart your AI agent; it should auto-discover the skill.

How wechat-article-extractor Compares

| Feature / Agent | wechat-article-extractor | Standard Approach |
|-----------------|--------------------------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

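What does this skill do?

It extracts metadata and content from WeChat Official Account articles: given an mp.weixin.qq.com URL, it returns the title, author, content, publish time, and cover image as structured data.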

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# WeChat Article Extractor

Extract metadata and content from WeChat Official Account (微信公众号) articles.

## Capabilities

- Parse WeChat article URLs (`mp.weixin.qq.com`)
- Extract article metadata: title, author, description, publish time
- Extract account info: name, avatar, alias, description
- Get article content (HTML)
- Get cover image URL
- Support multiple article types: post, video, image, voice, text, repost
- Handle various error cases: deleted content, expired links, access limits
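
As a rough illustration of the first capability, the sketch below checks that a URL points at `mp.weixin.qq.com` and reads its link parameters. `parseArticleUrl` is a hypothetical helper, not part of the skill's scripts, and the query parameter names `mid`, `idx`, and `sn` are assumed from the `msg_mid`, `msg_idx`, and `msg_sn` fields in the response format; short-form article links that carry no query string would simply yield `null` values.

```javascript
// Hypothetical helper (not part of the skill): validate the host and
// read the link parameters from a long-form WeChat article URL.
function parseArticleUrl(input) {
  let url;
  try {
    url = new URL(input);
  } catch (err) {
    return null; // not a valid absolute URL
  }
  if (url.hostname !== 'mp.weixin.qq.com') {
    return null; // only Official Account article hosts are accepted here
  }
  const params = url.searchParams;
  return {
    biz: params.get('__biz'), // null when the parameter is absent
    mid: params.get('mid'),
    idx: params.get('idx'),
    sn: params.get('sn'),
  };
}

// Placeholder values, not a real article
console.log(
  parseArticleUrl('https://mp.weixin.qq.com/s?__biz=EXAMPLE&mid=1234567890&idx=1&sn=abc')
);
// → { biz: 'EXAMPLE', mid: '1234567890', idx: '1', sn: 'abc' }
```

Values come back as strings; the skill itself reports `msg_mid` and `msg_idx` as numbers, so a caller comparing the two would need to convert.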

## Usage

### Basic Extraction from URL

```javascript
const { extract } = require('./scripts/extract.js');

// Top-level `await` is unavailable in CommonJS, so wrap the call
(async () => {
  const result = await extract('https://mp.weixin.qq.com/s?__biz=...');
  // Returns: { done: true, code: 0, data: {...} }
})();
```

### Extraction from HTML

```javascript
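// Inside an async function, with `extract` imported as above
const sourceUrl = 'https://mp.weixin.qq.com/s?__biz=...';
const html = await fetch(sourceUrl).then(r => r.text());
const result = await extract(html, { url: sourceUrl });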
```

### Options

```javascript
const result = await extract(url, {
  shouldReturnContent: true,      // Return HTML content (default: true)
  shouldReturnRawMeta: false,     // Return raw metadata (default: false)
  shouldFollowTransferLink: true, // Follow migrated account links (default: true)
  shouldExtractMpLinks: false,    // Extract embedded mp.weixin links (default: false)
  shouldExtractTags: false,       // Extract article tags (default: false)
  shouldExtractRepostMeta: false  // Extract repost source info (default: false)
});
```
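
One way to keep option sets readable is to merge overrides onto the documented defaults before calling `extract`. The sketch below is an assumption about how a caller might do that: `buildOptions` and `DEFAULT_OPTIONS` are hypothetical names, the default values are copied from the comments above, and `url` is allowed through because the HTML example passes it.

```javascript
// Defaults copied from the Options listing above (caller-side copy, not the skill's code)
const DEFAULT_OPTIONS = {
  shouldReturnContent: true,
  shouldReturnRawMeta: false,
  shouldFollowTransferLink: true,
  shouldExtractMpLinks: false,
  shouldExtractTags: false,
  shouldExtractRepostMeta: false,
};

// Hypothetical helper: reject misspelled option names, then merge over defaults
function buildOptions(overrides = {}) {
  for (const key of Object.keys(overrides)) {
    if (key !== 'url' && !(key in DEFAULT_OPTIONS)) {
      throw new TypeError(`Unknown option: ${key}`);
    }
  }
  return { ...DEFAULT_OPTIONS, ...overrides };
}

// Metadata-only run: skip the HTML body but collect tags
const metadataOnly = buildOptions({ shouldReturnContent: false, shouldExtractTags: true });
console.log(metadataOnly.shouldReturnContent, metadataOnly.shouldExtractTags);
// → false true
```

Validating keys up front catches typos such as `shouldReturnContents`, which would otherwise be silently ignored.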

## Response Format

### Success Response

```javascript
{
  done: true,
  code: 0,
  data: {
    // Account info
    account_name: "Official Account name",
    account_alias: "WeChat ID",
    account_avatar: "avatar URL",
    account_description: "account description",
    account_id: "original ID",
    account_biz: "biz parameter",
    account_biz_number: 1234567890,
    account_qr_code: "QR code URL",

    // Article info
    msg_title: "article title",
    msg_desc: "article summary",
    msg_content: "HTML content",
    msg_cover: "cover image URL",
    msg_author: "author",
    msg_type: "post", // post|video|image|voice|text|repost
    msg_has_copyright: true,
    msg_publish_time: Date, // a JavaScript Date object
    msg_publish_time_str: "2024/01/15 10:30:00",

    // Link params
    msg_link: "article link",
    msg_source_url: "'Read the original' link",
    msg_sn: "sn parameter",
    msg_mid: 1234567890,
    msg_idx: 1
  }
}
```
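
Downstream code often needs only a few of these fields. The sketch below reduces `data` to a compact, JSON-safe record and tolerates missing fields. `toArticleSummary` and its output keys are hypothetical names chosen for illustration; only the `msg_*` and `account_*` input fields come from the format above.

```javascript
// Hypothetical helper: pick a few fields from `data` and normalise them.
function toArticleSummary(data) {
  const time = data.msg_publish_time;
  // Guard against a missing or invalid Date before serialising it
  const publishedAt =
    time instanceof Date && !Number.isNaN(time.getTime()) ? time.toISOString() : null;
  return {
    title: data.msg_title ?? null,
    author: data.msg_author ?? null,
    account: data.account_name ?? null,
    type: data.msg_type ?? null,
    publishedAt,
    link: data.msg_link ?? null,
  };
}

// Example with a partial `data` object (author and link missing)
const summary = toArticleSummary({
  msg_title: 'Example title',
  account_name: 'Example account',
  msg_type: 'post',
  msg_publish_time: new Date(Date.UTC(2024, 0, 15, 2, 30, 0)),
});
console.log(summary.publishedAt); // → 2024-01-15T02:30:00.000Z
console.log(summary.author);      // → null
```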

### Error Response

```javascript
{
  done: false,
  code: 1001,
  msg: "无法获取文章信息" // "Unable to retrieve article information"
}
```
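
Because both response shapes share the `done` and `code` fields, a caller can branch on them in one place. The following is a minimal sketch under that assumption; `unwrapResult` is a hypothetical helper, not something the skill exports.

```javascript
// Hypothetical helper: return `data` on success, otherwise throw an Error
// that carries the numeric code from the error response.
function unwrapResult(result) {
  if (result && result.done === true && result.code === 0) {
    return result.data;
  }
  const err = new Error((result && result.msg) || 'Unknown extraction failure');
  err.code = result ? result.code : undefined;
  throw err;
}

// Usage with hand-written sample responses
const data = unwrapResult({ done: true, code: 0, data: { msg_title: 'Example title' } });
console.log(data.msg_title); // → Example title

try {
  unwrapResult({ done: false, code: 1001, msg: '无法获取文章信息' });
} catch (err) {
  console.log(err.code); // → 1001
}
```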

## Error Codes

| Code | Message | Description |
|------|---------|-------------|
| 1000 | 文章获取失败 | General failure |
| 1001 | 无法获取文章信息 | Missing title or publish time |
| 1002 | 请求失败 | HTTP request failed |
| 1003 | 响应为空 | Empty response |
| 1004 | 访问过于频繁 | Rate limited |
| 1005 | 脚本解析失败 | Script parsing error |
| 1006 | 公众号已迁移 | Account migrated |
| 2001 | 请提供文章内容或链接 | Missing input |
| 2002 | 链接已过期 | Link expired |
| 2003 | 内容涉嫌侵权 | Content removed (copyright) |
| 2004 | 无法获取迁移后的链接 | Migration link failed |
| 2005 | 内容已被发布者删除 | Content deleted by author |
| 2006 | 内容因违规无法查看 | Content blocked |
| 2007 | 内容发送失败 | Failed to send |
| 2008 | 系统出错 | System error |
| 2009 | 不支持的链接 | Unsupported URL |
| 2010 | 内容获取失败 | Content fetch failed |
| 2011 | 涉嫌过度营销 | Suspected excessive marketing |
| 2012 | 账号已被屏蔽 | Account blocked |
| 2013 | 账号已自主注销 | Account deregistered by its owner |
| 2014 | 内容被投诉 | Content reported |
| 2015 | 账号处于迁移流程中 | Account migrating |
| 2016 | 冒名侵权 | Impersonation |
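
Some of these codes look transient and others permanent, so a caller may want to retry only the former. The sketch below treats 1002, 1003, and 1004 as retryable; that grouping is an assumption, not something the skill documents. `isRetryable`, `backoffDelayMs`, and `extractWithRetry` are hypothetical names, and the extractor is passed in as `extractFn` so the wrapper does not depend on the real module.

```javascript
// Assumed-transient codes: request failed, empty response, rate limited
const RETRYABLE_CODES = new Set([1002, 1003, 1004]);

function isRetryable(code) {
  return RETRYABLE_CODES.has(code);
}

// Exponential backoff: 1s, 2s, 4s, ... capped at maxMs
function backoffDelayMs(attempt, baseMs = 1000, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Call extractFn up to maxAttempts times, waiting between retryable failures
async function extractWithRetry(extractFn, input, maxAttempts = 3) {
  let result;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    result = await extractFn(input);
    if (result.done || !isRetryable(result.code)) {
      return result; // success, or a failure that retrying will not fix
    }
    if (attempt < maxAttempts - 1) {
      await new Promise((resolve) => setTimeout(resolve, backoffDelayMs(attempt)));
    }
  }
  return result; // last retryable failure after exhausting attempts
}

console.log(isRetryable(1004), isRetryable(2005)); // → true false
console.log(backoffDelayMs(0), backoffDelayMs(3)); // → 1000 8000
```

Codes such as 2005 (deleted by author) or 2002 (link expired) are returned immediately, since repeating the request would only add load.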

## Dependencies

Required npm packages:
- `cheerio` - HTML parsing
- `dayjs` - Date formatting
- `request-promise` - HTTP requests
- `qs` - Query string parsing
- `lodash.unescape` - HTML entities

## Notes

- Handles various WeChat page structures and anti-scraping measures
- Automatically detects article type from page content
- Supports extracting from Sogou WeChat search results (`weixin.sogou.com`)
- Some fields may be null depending on article type and page structure
