Chinese NLP Toolkit

Specialized natural language processing for Chinese text. Covers segmentation (jieba), sentiment analysis, keyword extraction, text summarization, tone detection, readability scoring, and format conversion (simplified/traditional, pinyin annotation). Use when processing, analyzing, or transforming Chinese text content.

3,891 stars

Best use case

Chinese NLP Toolkit is best used when you need a repeatable AI agent workflow instead of a one-off prompt.


Teams using Chinese NLP Toolkit can expect more consistent output, faster repeated runs, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/chinese-nlp-toolkit/SKILL.md --create-dirs "https://raw.githubusercontent.com/openclaw/skills/main/skills/371166758-qq/chinese-nlp-toolkit/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/chinese-nlp-toolkit/SKILL.md inside your project
  3. Restart your AI agent; it will auto-discover the skill

How Chinese NLP Toolkit Compares

| Feature / Agent | Chinese NLP Toolkit | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

Specialized natural language processing for Chinese text. Covers segmentation (jieba), sentiment analysis, keyword extraction, text summarization, tone detection, readability scoring, and format conversion (simplified/traditional, pinyin annotation). Use when processing, analyzing, or transforming Chinese text content.

Where can I find the source code?

The source lives in the openclaw/skills repository on GitHub; the raw SKILL.md URL in the installation command above points to it.


SKILL.md Source

# Chinese NLP Toolkit

Process and analyze Chinese text with specialized NLP capabilities.

## Core Capabilities

### 1. Text Segmentation (分词)
Written Chinese does not mark word boundaries with spaces, so segmentation is the foundation of all Chinese NLP.

**Approach**: Use rule-based heuristics when no library is available:
- Dictionary matching (maximum forward/backward matching)
- Context-aware: "南京市长江大桥" → ["南京市", "长江大桥"] not ["南京", "市长", "江大桥"]
- Domain-specific terms should be added as custom dictionary entries

**Common Ambiguities**:
| Text | Wrong Split | Correct Split |
|------|-------------|---------------|
| 雨伞 | 雨/伞 | 雨伞 (compound) |
| 结婚的和尚未结婚的 | 结婚/的/和尚/未/结婚/的 | 结婚/的/和/尚未/结婚/的 |
| 项目部 | 项目/部 | 项目部 (compound) |
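A minimal sketch of the dictionary-matching approach, assuming a small hand-written vocabulary (`VOCAB` is hypothetical; a real system would load a full lexicon plus custom domain entries). It implements forward and backward maximum matching, then a common bidirectional heuristic for choosing between them: fewer words wins, then fewer single-character words, with ties going to the backward result. On 结婚的和尚未结婚的, the forward pass reproduces the wrong split from the table and the backward pass produces the correct one.

```python
from typing import List, Set, Tuple

# Hypothetical toy dictionary; domain terms such as 项目部 are added as custom entries.
VOCAB: Set[str] = {
    "南京", "南京市", "市长", "长江", "大桥", "长江大桥",
    "结婚", "和尚", "尚未", "雨伞", "项目", "项目部",
}


def forward_max_match(text: str, vocab: Set[str], max_len: int = 4) -> List[str]:
    """Greedily take the longest dictionary word starting from the left."""
    words: List[str] = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:  # fall back to a single character
                words.append(piece)
                i += size
                break
    return words


def backward_max_match(text: str, vocab: Set[str], max_len: int = 4) -> List[str]:
    """Greedily take the longest dictionary word ending at the right."""
    words: List[str] = []
    j = len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in vocab:
                words.append(piece)
                j -= size
                break
    words.reverse()
    return words


def bidirectional_max_match(text: str, vocab: Set[str], max_len: int = 4) -> List[str]:
    fwd = forward_max_match(text, vocab, max_len)
    bwd = backward_max_match(text, vocab, max_len)

    def cost(ws: List[str]) -> Tuple[int, int]:
        # Fewer words first, then fewer single-character words.
        return len(ws), sum(1 for w in ws if len(w) == 1)

    return fwd if cost(fwd) < cost(bwd) else bwd  # ties go to backward matching
```

For example, `bidirectional_max_match("南京市长江大桥", VOCAB)` yields `["南京市", "长江大桥"]`. Greedy matching cannot resolve every ambiguity, which is why libraries such as jieba combine a dictionary with statistical models.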

### 2. Sentiment Analysis (情感分析)
Chinese sentiment is more nuanced than a simple positive/negative split:

**Intensity levels**: 强烈负面 (strongly negative) < 偏负面 (leaning negative) < 中性 (neutral) < 偏正面 (leaning positive) < 强烈正面 (strongly positive)

**Chinese-specific signals**:
- Rhetorical questions often indicate negative sentiment: "这也算好?" 
- Sarcasm markers: "呵呵", "厉害了", "也是醉了", "你开心就好"
- Intensifiers: "非常", "特别", "简直了", "超级"
- Diminishers: "还行吧", "马马虎虎", "凑合"

**Emoji contribution** (critical for social media):
- 😊👍❤️ = positive amplification
- 😤👎💔 = negative amplification
- 🙄🙄🙄 = sarcasm/disdain (intensity scales with repetition)
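These signals can be combined into a lexicon-based scorer. This is a hedged sketch: the marker lists come from this section, but the `POSITIVE`/`NEGATIVE` word lists, every weight, and the thresholds for the five intensity levels are hypothetical. Plain substring counting also ignores negation (不好) and word boundaries, so it is only a starting point.

```python
import re

# Signal lists from the section above; POSITIVE/NEGATIVE words and all weights are hypothetical.
POSITIVE = ("好", "喜欢", "满意", "推荐")
NEGATIVE = ("差", "失望", "垃圾", "难用")
INTENSIFIERS = ("非常", "特别", "简直了", "超级")
DIMINISHERS = ("还行吧", "马马虎虎", "凑合")
SARCASM = ("呵呵", "厉害了", "也是醉了", "你开心就好")
POS_EMOJI = ("😊", "👍", "❤")  # bare U+2764 also matches "❤️" (heart + variation selector)
NEG_EMOJI = ("😤", "👎", "💔")
# strongly negative < leaning negative < neutral < leaning positive < strongly positive
LEVELS = ("强烈负面", "偏负面", "中性", "偏正面", "强烈正面")


def sentiment(text: str) -> str:
    score = 0.0
    # Sarcasm markers push toward negative; strip them so words inside them
    # (e.g. 好 in 你开心就好) are not counted as praise.
    for marker in SARCASM:
        score -= 1.5 * text.count(marker)
        text = text.replace(marker, "")
    polarity = sum(text.count(w) for w in POSITIVE) - sum(text.count(w) for w in NEGATIVE)
    # Praise phrased as a question ("这也算好?") is treated as rhetorical, i.e. negative.
    if polarity > 0 and re.search(r"[??]\s*$", text):
        polarity = -polarity
    if any(w in text for w in INTENSIFIERS):
        polarity *= 2
    if any(w in text for w in DIMINISHERS):
        polarity *= 0.5
    score += polarity
    score += sum(text.count(e) for e in POS_EMOJI) - sum(text.count(e) for e in NEG_EMOJI)
    # Eye-rolls signal disdain; the penalty grows with each repetition.
    score -= 0.5 * text.count("🙄")
    if score <= -2:
        return LEVELS[0]
    if score < 0:
        return LEVELS[1]
    if score == 0:
        return LEVELS[2]
    if score < 2:
        return LEVELS[3]
    return LEVELS[4]
```

With these placeholder weights, `sentiment("这也算好?")` returns 偏负面 and `sentiment("呵呵,你开心就好")` returns 强烈负面.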

### 3. Keyword Extraction (关键词提取)
For Chinese text, prioritize:
- Noun phrases (名词短语)
- Domain-specific terminology
- Named entities (人名、地名、机构名)

**Method**: TF-IDF adapted for Chinese + positional weighting (first/last sentences carry more weight in Chinese writing).
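A sketch of TF-IDF with positional weighting, assuming the text is already segmented into sentences of words and that a reference corpus is available for document frequencies. The 1.5× boost for the first and last sentences and the smoothed IDF formula are illustrative choices, not values specified by this skill.

```python
import math
from collections import Counter
from typing import List, Tuple

Sentence = List[str]  # one sentence, already segmented into words


def keywords(doc: List[Sentence], corpus: List[List[Sentence]],
             top_k: int = 10, edge_boost: float = 1.5) -> List[Tuple[str, float]]:
    # Positional weighting: words in the first or last sentence count edge_boost times.
    tf: Counter = Counter()
    last = len(doc) - 1
    for i, sentence in enumerate(doc):
        weight = edge_boost if i in (0, last) else 1.0
        for word in sentence:
            tf[word] += weight
    total = sum(tf.values()) or 1.0
    # Document frequency: number of corpus documents that contain each word.
    df: Counter = Counter()
    for other in corpus:
        df.update({w for s in other for w in s})
    n = len(corpus)
    scores = {
        w: (f / total) * (math.log((1 + n) / (1 + df[w])) + 1)  # smoothed IDF
        for w, f in tf.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
```

Function words such as 的 are only down-weighted by this IDF, never removed, so in practice a stopword list would be applied before scoring.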

### 4. Text Summarization (文本摘要)
**Chinese-specific rules**:
- Summarize to 20-30% of original length
- Preserve key numbers, names, and claims
- Chinese articles often "bury the lede": the conclusion may matter more than the introduction
- Extract key sentences using positional + keyword scoring
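An extractive summarizer following these rules might look like the sketch below. Assumptions: keyword weights are supplied by the caller (for example from a keyword-extraction step), sentences are split on 。!? and their ASCII equivalents, the length budget counts characters, and the bonus for the opening and closing sentences is an arbitrary illustrative value.

```python
import re
from typing import Dict, List


def split_sentences(text: str) -> List[str]:
    # Keep each sentence together with its closing punctuation.
    return [s for s in re.findall(r"[^。!?!?]+[。!?!?]?", text) if s.strip()]


def summarize(text: str, keyword_weights: Dict[str, float],
              ratio: float = 0.25, edge_bonus: float = 1.0) -> str:
    sentences = split_sentences(text)
    if not sentences:
        return ""
    last = len(sentences) - 1

    def score(i: int) -> float:
        s = sentences[i]
        keyword_score = sum(w * s.count(k) for k, w in keyword_weights.items())
        # Opening and closing sentences get a bonus; the closing one may hold the conclusion.
        return keyword_score + (edge_bonus if i in (0, last) else 0.0)

    budget = max(1, int(len(text) * ratio))  # target length in characters (20-30% range)
    chosen: List[int] = []
    used = 0
    for i in sorted(range(len(sentences)), key=score, reverse=True):
        # Always keep the best sentence; add others only while they fit the budget.
        if chosen and used + len(sentences[i]) > budget:
            continue
        chosen.append(i)
        used += len(sentences[i])
    # Restore the original order so the summary reads naturally.
    return "".join(sentences[i] for i in sorted(chosen))
```

Because whole sentences are kept, numbers and names inside them (such as 100万 below) survive verbatim.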

### 5. Readability Scoring (可读性评分)
Rate Chinese text on a 1-10 scale considering:
- Average sentence length (characters per sentence)
- Vocabulary difficulty (HSK level estimate)
- Clause density (commas per sentence)
- Use of classical Chinese elements
- Technical jargon density

| Score | Level | Target Audience |
|-------|-------|-----------------|
| 1-3 | Easy | General public |
| 4-6 | Moderate | Educated readers |
| 7-8 | Hard | Domain experts |
| 9-10 | Very Hard | Academic specialists |
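The skill lists readability factors but no formula, so the sketch below invents one for illustration: the score grows with characters per sentence, commas per sentence, and the share of text covered by a caller-supplied set of difficult words (standing in for an HSK-level estimate). Classical Chinese and jargon density are left out, and every weight is a hypothetical placeholder that would need calibrating against labelled texts. `level` maps the score onto the bands in the table above.

```python
import re

SENTENCE_END = r"[。!?!?]"
COMMAS = (",", ",")


def readability(text: str, hard_words: frozenset = frozenset()) -> int:
    sentences = [s for s in re.split(SENTENCE_END, text) if s.strip()]
    if not sentences:
        return 1
    n = len(sentences)
    # Characters per sentence, ignoring whitespace and clause punctuation.
    chars = sum(len(re.sub(r"[\s,,、;;::]", "", s)) for s in sentences)
    avg_len = chars / n
    clause_density = sum(s.count(c) for s in sentences for c in COMMAS) / n
    # Share of characters covered by hard words (hypothetical proxy for HSK level).
    hard_ratio = sum(len(w) * text.count(w) for w in hard_words) / max(chars, 1)
    # Hypothetical linear weights, clamped to the 1-10 scale.
    raw = 1 + avg_len / 10 + clause_density + 10 * hard_ratio
    return max(1, min(10, round(raw)))


def level(score: int) -> str:
    if score <= 3:
        return "Easy"
    if score <= 6:
        return "Moderate"
    if score <= 8:
        return "Hard"
    return "Very Hard"
```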

### 6. Format Conversion

| Conversion | Example |
|---|---|
| Simplified → Traditional | 体验 → 體驗 |
| Traditional → Simplified | 體驗 → 体验 |
| Chinese → Pinyin | 你好 → nǐ hǎo |
| Chinese → Zhuyin | 你好 → ㄋㄧˇ ㄏㄠˇ |
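Full hanzi-to-pinyin conversion needs a pronunciation dictionary and context for polyphonic characters (行 is xíng or háng), which is beyond a short sketch. One self-contained piece of pinyin annotation is rendering tone numbers as tone marks, shown below under the assumption that input syllables are lowercase and use `v` for ü. The placement rule is the standard one: a or e takes the mark, o takes it in "ou", otherwise the last vowel does.

```python
# Tone-marked vowels, indexed by tone 1-4.
TONE_MARKS = {
    "a": "āáǎà", "e": "ēéěè", "i": "īíǐì",
    "o": "ōóǒò", "u": "ūúǔù", "ü": "ǖǘǚǜ",
}


def mark_syllable(syllable: str) -> str:
    """Convert one numbered syllable, e.g. 'hao3' -> 'hǎo'."""
    if not syllable or not syllable[-1].isdigit():
        return syllable
    body, tone = syllable[:-1].replace("v", "ü"), int(syllable[-1])
    if not 1 <= tone <= 4:  # 5 (or 0) marks the neutral tone: no diacritic
        return body
    if "a" in body:
        target = body.index("a")
    elif "e" in body:
        target = body.index("e")
    elif "ou" in body:
        target = body.index("o")
    else:
        vowels = [i for i, ch in enumerate(body) if ch in TONE_MARKS]
        if not vowels:
            return body
        target = vowels[-1]
    return body[:target] + TONE_MARKS[body[target]][tone - 1] + body[target + 1:]


def numbered_to_marked(pinyin: str) -> str:
    return " ".join(mark_syllable(s) for s in pinyin.split())
```

For example, `numbered_to_marked("ni3 hao3")` gives `nǐ hǎo`, matching the table.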

## Workflow

### When Processing Chinese Text:
1. **Detect variant**: Simplified (简体) or Traditional (繁体)?
2. **Segment**: Break into meaningful units
3. **Analyze**: Apply the requested analysis type(s)
4. **Report**: Present results with Chinese annotations
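Step 1 can be approximated by counting characters that exist in only one script. The two character sets below are a tiny hypothetical sample (each simplified character is listed alongside its traditional form); a production check would use a full simplified/traditional mapping table.

```python
# Hypothetical sample of characters whose simplified and traditional forms differ.
SIMPLIFIED_ONLY = set("们这来说个时会为国学对经体验")
TRADITIONAL_ONLY = set("們這來說個時會為國學對經體驗")


def detect_variant(text: str) -> str:
    s = sum(ch in SIMPLIFIED_ONLY for ch in text)
    t = sum(ch in TRADITIONAL_ONLY for ch in text)
    if s == t == 0:
        return "unknown"  # no distinguishing characters (e.g. 你好 is identical in both)
    if s > t:
        return "simplified"
    if t > s:
        return "traditional"
    return "mixed"
```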

### Output Format
```
原文:[original text]
分词:[segmented text with / separators]
关键词:[top 5-10 keywords with relevance scores]
情感:[sentiment label + confidence + key signals]
摘要:[summarized text]
可读性:[score/10 + brief explanation]
```
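A small renderer for this template, assuming the analysis results have already been computed. The `Report` fields and the way confidence (置信度) and signals (信号) are packed into the 情感 line are illustrative choices, not part of the skill.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Report:
    original: str
    tokens: List[str]
    keywords: List[Tuple[str, float]]
    sentiment: str
    confidence: float  # 0.0-1.0
    signals: List[str]
    summary: str
    readability: int  # 1-10
    readability_note: str


def render(r: Report) -> str:
    kw = "、".join(f"{w}({s:.2f})" for w, s in r.keywords[:10])
    lines = [
        f"原文:{r.original}",
        f"分词:{'/'.join(r.tokens)}",
        f"关键词:{kw}",
        f"情感:{r.sentiment}(置信度 {r.confidence:.0%};信号:{'、'.join(r.signals)})",
        f"摘要:{r.summary}",
        f"可读性:{r.readability}/10({r.readability_note})",
    ]
    return "\n".join(lines)
```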

## Edge Cases

- **Mixed-language text**: Handle code-switching naturally ("这个bug太坑了"); don't force Chinese segmentation on English words
- **Internet slang**: Recognize common abbreviations (yyds, xswl, nbcs, awsl) and expand them for formal analysis
- **Poetry/classical Chinese**: Flag as a special case; modern NLP rules don't apply, so use classical grammar patterns
- **Dialectal text**: Flag non-Mandarin text (written Cantonese, Shanghainese); analysis may be unreliable
- **Zero-width characters**: Chinese text sometimes contains invisible characters (U+200B, U+FEFF) that affect processing
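A preprocessing sketch for three of these cases: it deletes U+200B and U+FEFF, splits the text into runs of CJK ideographs, runs of Latin letters/digits, and single other symbols so that only the CJK runs go on to segmentation, and expands slang from a toy dictionary (`SLANG` is a hypothetical four-entry sample). The ideograph range covers only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF), not the extension blocks.

```python
import re
from typing import List

# Translation table that deletes the zero-width characters named above.
ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\ufeff"))
# Hypothetical toy slang dictionary.
SLANG = {"yyds": "永远的神", "xswl": "笑死我了", "nbcs": "nobody cares", "awsl": "啊我死了"}
# CJK ideograph runs | Latin letter/digit runs | any other single non-space symbol.
CHUNK = re.compile(r"[\u4e00-\u9fff]+|[A-Za-z0-9]+|\S")


def preprocess(text: str) -> List[str]:
    text = text.translate(ZERO_WIDTH)
    return [SLANG.get(m.group().lower(), m.group()) for m in CHUNK.finditer(text)]
```

For example, `preprocess("这个\u200bbug太坑了")` returns `["这个", "bug", "太坑了"]`, leaving `bug` intact while the Chinese runs can be passed to the segmenter.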

## Common Tasks & Prompts

- "Analyze the sentiment of this Chinese review"
- "Extract keywords from this article"
- "Summarize this Chinese news article in 100 characters"
- "Rate the readability of this document"
- "Convert this to Traditional Chinese with pinyin annotation"
- "Segment this Chinese text and identify named entities"
