cjk-viz

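CJK (Chinese, Japanese, Korean) font detection and matplotlib configuration. Before starting any visualization task that involves Chinese labels, titles, or legends, run this skill's font detection workflow first so that text does not render as empty boxes. Applies to matplotlib, seaborn, and plotly static export.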

42 stars

Best use case

cjk-viz is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using cjk-viz should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/medge-cjk-viz/SKILL.md --create-dirs "https://raw.githubusercontent.com/Zaoqu-Liu/ScienceClaw/main/skills/medge-cjk-viz/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/medge-cjk-viz/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How cjk-viz Compares

| Feature / Agent | cjk-viz | Standard Approach |
|-----------------|---------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

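CJK (Chinese, Japanese, Korean) font detection and matplotlib configuration. Before starting any visualization task that involves Chinese labels, titles, or legends, run this skill's font detection workflow first so that text does not render as empty boxes. Applies to matplotlib, seaborn, and plotly static export.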

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

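# CJK Visualization Font Configuration

## When to use

Whenever plotting code contains Chinese text (titles, axis labels, legends, annotations), font detection **must run before plotting**. Do not assume that any particular font is installed.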
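
One way to decide whether a plot needs the detection step is to scan its labels for CJK characters. The sketch below is only an illustration: the helper names `contains_cjk` and `needs_cjk_font` are hypothetical (they are not part of this skill), and the Unicode ranges are an assumed, non-exhaustive selection.

```python
import re

# Assumed ranges (not exhaustive): CJK symbols and punctuation, kana,
# CJK Unified Ideographs plus Extension A, Hangul syllables, full-width forms.
_CJK_PATTERN = re.compile(
    "[\u3000-\u303f\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uac00-\ud7af\uff00-\uffef]"
)


def contains_cjk(text: str) -> bool:
    """Return True if the string has at least one character in the ranges above."""
    return bool(_CJK_PATTERN.search(text))


def needs_cjk_font(labels) -> bool:
    """Return True if any label in an iterable contains CJK text."""
    return any(contains_cjk(str(label)) for label in labels)
```

If `needs_cjk_font([...])` returns True for the titles, axis labels, and legend entries you plan to draw, run the font setup before creating the figure.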

## Quick start

### Option 1: import the helper (recommended)

Copy `scripts/setup_cjk_font.py` into the working directory, or reference it directly:

```python
import sys, os
sys.path.insert(0, os.path.join(os.path.dirname(__file__), 'skills/cjk-viz/scripts'))
from setup_cjk_font import setup_cjk_font

font_name = setup_cjk_font()  # detects a font, configures matplotlib, returns the font name
# A return value of None means no usable CJK font was found; a warning is printed
```

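After the call, `plt.rcParams` is already configured, so you can plot right away.

### Option 2: inline code snippet

If you would rather not pull in an external file, add this at the top of the script: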

```python
import os  # needed below for os.path and os.walk

import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
import matplotlib.font_manager as fm

def _setup_cjk():
    candidates = [
        'Noto Sans CJK SC', 'Noto Sans SC', 'Source Han Sans SC',
        'WenQuanYi Micro Hei', 'WenQuanYi Zen Hei',
        'Droid Sans Fallback', 'AR PL UMing CN',
        'SimHei', 'Microsoft YaHei', 'PingFang SC',
    ]
    available = {f.name for f in fm.fontManager.ttflist}
    for name in candidates:
        if name in available:
            plt.rcParams['font.sans-serif'] = [name, 'DejaVu Sans']
            plt.rcParams['axes.unicode_minus'] = False
            return name
    # No installed font matched by name: try loading a .ttf from common font directories
    search_paths = [
        '/usr/share/fonts', '/usr/local/share/fonts',
        os.path.expanduser('~/.local/share/fonts'),
        os.path.join(os.path.dirname(__file__), 'fonts'),
    ]
    for base in search_paths:
        for root, _, files in os.walk(base):
            for f in files:
                if f.lower().endswith('.ttf') and any(
                    k in f.lower() for k in ['noto', 'cjk', 'hei', 'han', 'wenquan', 'droid']
                ):
                    path = os.path.join(root, f)
                    fm.fontManager.addfont(path)
                    prop = fm.FontProperties(fname=path)
                    name = prop.get_name()
                    plt.rcParams['font.sans-serif'] = [name, 'DejaVu Sans']
                    plt.rcParams['axes.unicode_minus'] = False
                    return name
    print("⚠️  No CJK font found; Chinese text may render as empty boxes.")
    print("   Suggested fix: apt install fonts-noto-cjk, or pip install matplotlib-cjk-fonts")
    return None

_cjk_font = _setup_cjk()
```

### Option 3: install a font, then plot

If detection fails, install a font inside the Docker container:

```bash
apt-get update && apt-get install -y fonts-noto-cjk
# Or install a font packaged for pip
pip install matplotlib-cjk-fonts
```

After installing, clear the matplotlib font cache so that matplotlib rebuilds its font list the next time it starts:

```python
import matplotlib
import shutil, os
cache_dir = matplotlib.get_cachedir()
if os.path.exists(cache_dir):
    shutil.rmtree(cache_dir)
    print(f"Cache cleared: {cache_dir}")
```

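## Key pitfall: `.ttc` files and matplotlib

**This is the most common trap.** In many Linux/Docker environments the installed CJK fonts are in `.ttc` (TrueType Collection) format (for example `NotoSansCJK-Regular.ttc`). matplotlib can detect them, but setting `rcParams` has no effect.

### Symptoms
- `setup_cjk_font()` reports success, but Chinese text in the figure still renders as boxes □□□
- `findfont()` locates the font file, but the font is not used at render time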
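
To see which file matplotlib actually maps a family name to, you can call `findfont` with the default-font fallback disabled. This is a diagnostic sketch only; `resolve_font` is a hypothetical helper name, not something this skill provides.

```python
from typing import Optional

import matplotlib
matplotlib.use("Agg")
import matplotlib.font_manager as fm


def resolve_font(family: str) -> Optional[str]:
    """Return the font file matplotlib maps `family` to, or None if nothing matches."""
    try:
        return fm.findfont(
            fm.FontProperties(family=family),
            fallback_to_default=False,  # raise instead of silently using the default font
            rebuild_if_missing=False,   # do not trigger a font cache rebuild here
        )
    except ValueError:
        return None
```

A path ending in `.ttc` for the family you configured suggests you are in the situation described above; `None` means matplotlib does not know the family at all.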

### Fix: the FontProperties pattern

For `.ttc` files, pass `FontProperties(fname=path)` explicitly to every text element:

```python
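import matplotlib.pyplot as plt
from matplotlib.font_manager import FontProperties

# Module-level FontProperties object (adjust the path to your system)
CJK_FP = FontProperties(fname='/usr/share/fonts/opentype/noto/NotoSansCJK-Regular.ttc')

fig, ax = plt.subplots()
chinese_labels = ['甲', '乙', '丙']  # example data, for illustration only
ax.barh(range(len(chinese_labels)), [3, 5, 2], label='示例')

# Usage: pass fontproperties=CJK_FP to every text element that contains Chinese
ax.set_xlabel('中文标签', fontproperties=CJK_FP)
ax.set_ylabel('中文标签', fontproperties=CJK_FP)
ax.set_title('中文标题', fontproperties=CJK_FP)
ax.set_yticks(range(len(chinese_labels)))  # fix tick positions before labelling them
ax.set_yticklabels(chinese_labels, fontproperties=CJK_FP)

# legend needs special handling: prop= sets the entry font; the title is set separately
ax.legend(title='中文图例标题', prop=CJK_FP)
ax.get_legend().get_title().set_fontproperties(CJK_FP)

# The same applies to suptitle
plt.suptitle('中文总标题', fontproperties=CJK_FP)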
```

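### Priority strategy

1. **Prefer a `.ttf` file** → it can be set globally through `rcParams`, which is the least work
2. **Only `.ttc` files available** → pass `FontProperties(fname=)` to each element
3. **Neither available** → install a font or ship an embedded `.ttf`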
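
The three-step strategy above can be sketched as a pure function over a list of font file paths. This is not the helper script's code: `choose_strategy` and its return labels are hypothetical, and the filename keywords are borrowed from the inline snippet earlier in this document.

```python
import os
from typing import Iterable, Optional, Tuple

# Filename keywords taken from the inline snippet; an assumption, not a complete list.
KEYWORDS = ("noto", "cjk", "hei", "han", "wenquan", "droid")


def choose_strategy(font_files: Iterable[str]) -> Tuple[str, Optional[str]]:
    """Pick a configuration strategy from candidate font file paths.

    Returns ("rcparams", path) for a .ttf, ("fontproperties", path) for a .ttc,
    or ("install", None) when no candidate file matches.
    """
    first_ttf, first_ttc = None, None
    for path in font_files:
        name = os.path.basename(path).lower()
        if not any(keyword in name for keyword in KEYWORDS):
            continue  # filename does not look like a CJK font
        if name.endswith(".ttf") and first_ttf is None:
            first_ttf = path
        elif name.endswith(".ttc") and first_ttc is None:
            first_ttc = path
    if first_ttf is not None:
        return ("rcparams", first_ttf)
    if first_ttc is not None:
        return ("fontproperties", first_ttc)
    return ("install", None)
```

Keeping the decision separate from the filesystem walk makes it easy to test with plain lists of paths. Note that substring keywords such as `han` can match unrelated fonts, so treat the result as a hint and verify the rendering.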

### The helper script already implements this logic

`setup_cjk_font()` in `scripts/setup_cjk_font.py` looks for a `.ttf` first and returns the `.ttc` path if none is found. Call `get_cjk_fp()` to get the `FontProperties` object.

## Font priority

Fonts are tried in the following order (covers most Linux / Docker / macOS environments):

| Priority | Font name | Common source |
|----------|-----------|---------------|
| 1 | Noto Sans CJK SC | `fonts-noto-cjk` (Debian/Ubuntu) |
| 2 | Noto Sans SC | Google Fonts |
| 3 | Source Han Sans SC | Adobe Source Han Sans (思源黑体) |
| 4 | WenQuanYi Micro Hei | `fonts-wqy-microhei` |
| 5 | WenQuanYi Zen Hei | `fonts-wqy-zenhei` |
| 6 | Droid Sans Fallback | Android / older Docker images |
| 7 | AR PL UMing CN | `fonts-arphic-uming` |
| 8 | SimHei | Windows |
| 9 | Microsoft YaHei | Windows |
| 10 | PingFang SC | macOS |
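
The priority order in the table amounts to a first-match search over the installed family names. The sketch below isolates that step; `pick_font` is a hypothetical name, and the candidate list simply mirrors the table.

```python
from typing import Iterable, Optional, Sequence

# Same order as the priority table above.
CANDIDATES = [
    "Noto Sans CJK SC", "Noto Sans SC", "Source Han Sans SC",
    "WenQuanYi Micro Hei", "WenQuanYi Zen Hei",
    "Droid Sans Fallback", "AR PL UMing CN",
    "SimHei", "Microsoft YaHei", "PingFang SC",
]


def pick_font(available: Iterable[str],
              candidates: Sequence[str] = CANDIDATES) -> Optional[str]:
    """Return the highest-priority candidate found in `available`, or None."""
    installed = set(available)
    for name in candidates:
        if name in installed:
            return name
    return None
```

With matplotlib, the installed names could come from `{f.name for f in matplotlib.font_manager.fontManager.ttflist}`, as in the inline snippet.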

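## Working with other skills

- When using the `scientific-visualization` or `matplotlib` skill, run this skill's font configuration first
- Fonts must also be configured when `plotly` generates static images (`write_image`)
- In a `biomed-dispatch` prompt you can add: "Run cjk-viz font detection before plotting"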
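
For plotly, font selection goes through the layout's `font.family` setting rather than matplotlib's `rcParams`. A minimal sketch, assuming the chosen font is installed on the machine that renders the static image; `plotly_font_layout` and the fallback names are hypothetical and not part of this skill.

```python
from typing import List, Optional


def plotly_font_layout(cjk_font: Optional[str],
                       fallbacks: Optional[List[str]] = None) -> dict:
    """Build layout keyword arguments that put the CJK font first in font.family."""
    families = [cjk_font] if cjk_font else []
    # Assumed generic fallbacks; replace with whatever suits your environment.
    families += fallbacks if fallbacks is not None else ["Arial", "sans-serif"]
    return {"font": {"family": ", ".join(families)}}
```

The result could then be applied with `fig.update_layout(**plotly_font_layout("Noto Sans CJK SC"))` before calling `fig.write_image(...)`.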

## Verification

After plotting, use the following code to quickly check that Chinese text renders correctly:

```python
fig, ax = plt.subplots(figsize=(4, 2))
ax.text(0.5, 0.5, '中文测试 Chinese Test 123',
        ha='center', va='center', fontsize=16, transform=ax.transAxes)
ax.set_title('字体验证')
fig.savefig('/workspace/outputs/cjk_font_test.png', dpi=100, bbox_inches='tight')
print("✅ Verification image saved; check that the Chinese text displays correctly")
```
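
The visual check can be complemented by a programmatic one: matplotlib emits a warning for each glyph the selected font lacks, and those warnings can be captured while the figure is drawn. This is a sketch under that assumption; `missing_glyph_warnings` is a hypothetical helper, and the exact warning wording varies between matplotlib versions, so the filter only looks for the words "Glyph" and "missing".

```python
import warnings

import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt


def missing_glyph_warnings(text, **text_kwargs):
    """Draw `text` off-screen and return any missing-glyph warning messages."""
    fig, ax = plt.subplots(figsize=(4, 2))
    try:
        ax.text(0.5, 0.5, text, ha="center", va="center", **text_kwargs)
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter("always")  # make sure repeated warnings are recorded
            fig.canvas.draw()                # rendering is what triggers glyph lookup
    finally:
        plt.close(fig)
    messages = [str(w.message) for w in caught]
    return [m for m in messages if "Glyph" in m and "missing" in m]
```

An empty list for `missing_glyph_warnings('中文测试')` after font setup suggests the configured font covers the text. When using the `.ttc` workaround, pass `fontproperties=CJK_FP` as a keyword argument.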
