chaoxing-download

Download PDF documents from Chaoxing (超星) contest/platform viewer URLs and convert them to TXT. Use when the user wants to download files from contestyd.chaoxing.com or 超星, or provides Chaoxing WPS viewer URLs with objectid parameters. Supports single or batch downloads with page-count validation and automatic PDF-to-TXT conversion.

3,891 stars

by openclaw

Best use case

chaoxing-download is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using chaoxing-download can expect more consistent output, faster repeated runs, and less prompt rewriting.

When to use this skill

You need to download one or more documents from Chaoxing viewer URLs and want each one saved as a PDF plus a TXT extraction, with the same structure every time.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/chaoxing-download/SKILL.md --create-dirs "https://raw.githubusercontent.com/openclaw/skills/main/skills/artminding/chaoxing-download/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/chaoxing-download/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

Frequently Asked Questions

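What does this skill do?

It downloads PDF documents from Chaoxing (超星) WPS viewer URLs using each URL's objectid, checks each file's page count against the count you expected, and converts every PDF to a TXT file, using OCR for image-only pages.

Where can I find the source code?

The source is the SKILL.md file in the openclaw/skills repository on GitHub, at skills/artminding/chaoxing-download/SKILL.md (the same file the installation command fetches).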

SKILL.md Source

# Chaoxing Document Downloader (超星文档下载)

Download PDFs from Chaoxing WPS viewer URLs using the `getYunFiles` API.

## Core Principle

Every Chaoxing viewer URL contains an `objectid` (32-char hex). Call the `getYunFiles` API to get the direct PDF link — no cookies or auth tokens needed.

## Arguments

`$ARGUMENTS` contains the user's download request — typically one or more entries with page count, name, and viewer URL. Parse them to extract the data.
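The skill does not fix an input format. As a hypothetical sketch, assuming each entry sits on its own line and contains one viewer URL, a page count written as digits followed by `页` or `p`, and a free-text name, the entries could be parsed like this (`DownloadRequest` and `parse_requests` are illustrative names, not part of the skill):

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DownloadRequest:
    """One parsed entry (hypothetical structure, not defined by the skill)."""
    name: str
    url: str
    pages: Optional[int]


URL_RE = re.compile(r"https?://\S+")
# Assumed page-count notation: digits followed by 页, p, or P, e.g. "22页" or "22p".
PAGES_RE = re.compile(r"(\d+)\s*(?:页|[pP])(?![A-Za-z])")
# Separators trimmed from the leftover name text (ASCII and full-width).
SEPARATORS = " ,;:|-，；：、"


def parse_requests(text: str) -> List[DownloadRequest]:
    """Heuristically parse one entry per line; lines without a URL are skipped."""
    requests = []
    for line in text.splitlines():
        url_match = URL_RE.search(line)
        if not url_match:
            continue
        # Remove the URL first so digits inside it are never taken as a page count.
        rest = line[: url_match.start()] + line[url_match.end():]
        pages = None
        pages_match = PAGES_RE.search(rest)
        if pages_match:
            pages = int(pages_match.group(1))
            rest = rest[: pages_match.start()] + rest[pages_match.end():]
        # Whatever remains, with whitespace collapsed and separators trimmed, is the name.
        name = " ".join(rest.split()).strip(SEPARATORS)
        requests.append(DownloadRequest(name=name, url=url_match.group(0), pages=pages))
    return requests
```

Because the parsing is heuristic, it is worth echoing the parsed name, URL, and page count back to the user before downloading.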

## Download Method

### Step 1: Extract objectid from each URL

Find the `objectid=([a-f0-9]{32})` parameter in each viewer URL.
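A minimal sketch of this step in Python, using exactly the pattern above (it therefore assumes the objectid is lowercase hex):

```python
import re
from typing import Optional

# Pattern from Step 1: a 32-character lowercase hex objectid query value.
OBJECTID_RE = re.compile(r"objectid=([a-f0-9]{32})")


def extract_objectid(viewer_url: str) -> Optional[str]:
    """Return the objectid from a Chaoxing viewer URL, or None if it has none."""
    match = OBJECTID_RE.search(viewer_url)
    return match.group(1) if match else None
```

Returning `None` instead of raising lets a batch run report the bad URL and continue with the other entries.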

### Step 2: Call getYunFiles API

For each objectid, call:
```
https://contestyd.chaoxing.com/app/files/{objectid}/getYunFiles?key=allData
```

Response JSON contains:
- `data.pdf` — direct PDF URL on `s3.cldisk.com` or `s3.ananas.chaoxing.com` (preferred)
- `data.download` — alternative download URL with auth tokens (fallback)
- `data.filename` — original filename
- `data.pagenum` — page count
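A sketch of Steps 2 and 3's URL selection with the standard library. It assumes, as the field list implies, that the response is a JSON object with a top-level `data` object, and that the endpoint accepts a plain request with default headers; the helper names are illustrative:

```python
import json
import urllib.request
from typing import Optional

API_TEMPLATE = "https://contestyd.chaoxing.com/app/files/{objectid}/getYunFiles?key=allData"


def api_url(objectid: str) -> str:
    """Build the getYunFiles URL for one objectid (endpoint from Step 2)."""
    return API_TEMPLATE.format(objectid=objectid)


def pick_download_url(response: dict) -> Optional[str]:
    """Prefer data.pdf; fall back to data.download when pdf is missing or empty."""
    data = response.get("data") or {}
    return data.get("pdf") or data.get("download") or None


def fetch_file_info(objectid: str, timeout: float = 30.0) -> dict:
    """GET the getYunFiles endpoint and decode its JSON body."""
    with urllib.request.urlopen(api_url(objectid), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Typical use: `info = fetch_file_info(objectid)`, then `pick_download_url(info)` for the link and `info["data"].get("pagenum")` for Step 4.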

### Step 3: Download the PDF

Download the file directly from the `data.pdf` URL; no authentication headers are needed.

Save to: `~/Downloads/chaoxing_pdfs/{name}.pdf`, where `{name}` is the name the user gave for the file.
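A sketch of the download step that also applies the "skip existing files unless overwrite" rule from Key Notes. The temporary `.part` file and the replacement of path separators in the name are additions of this sketch, not steps the skill prescribes:

```python
import os
import shutil
import urllib.request

# Output directory from Step 3 (expanduser resolves "~", which open() does not).
OUTPUT_DIR = os.path.expanduser("~/Downloads/chaoxing_pdfs")


def download_pdf(pdf_url: str, name: str, out_dir: str = OUTPUT_DIR,
                 overwrite: bool = False, timeout: float = 60.0) -> str:
    """Save pdf_url as {out_dir}/{name}.pdf and return the file path.

    An existing file is left untouched unless overwrite=True.
    """
    os.makedirs(out_dir, exist_ok=True)
    # Assumed safety step: keep path separators in the name from creating subdirectories.
    safe_name = name.replace("/", "_").replace("\\", "_")
    path = os.path.join(out_dir, f"{safe_name}.pdf")
    if os.path.exists(path) and not overwrite:
        return path
    tmp_path = path + ".part"
    # Write to a temporary file first so an interrupted download never
    # leaves a truncated PDF under the final name.
    with urllib.request.urlopen(pdf_url, timeout=timeout) as resp, open(tmp_path, "wb") as f:
        shutil.copyfileobj(resp, f)
    os.replace(tmp_path, path)
    return path
```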

### Step 4: Validate page count

Compare `data.pagenum` with the user's expected page count. Report any mismatch.
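A small sketch of the comparison. Whether `pagenum` arrives as a number or a string is not documented, so this version accepts either and reports a missing value instead of failing:

```python
from typing import Optional


def check_page_count(name: str, expected: Optional[int], actual) -> str:
    """Compare the user's expected page count with data.pagenum and describe the result."""
    if expected is None:
        return f"{name}: {actual} pages (no expected count given)"
    try:
        actual_int = int(actual)
    except (TypeError, ValueError):
        return f"{name}: page count unavailable (pagenum={actual!r}), expected {expected}"
    if actual_int == expected:
        return f"{name}: OK, {actual_int} pages"
    return f"{name}: MISMATCH, expected {expected} pages, got {actual_int}"
```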

### Step 5: Convert PDF to TXT (with OCR fallback)

After downloading each PDF, automatically extract text to a plain text file. Use a two-stage approach: native text extraction first, then OCR fallback for image-based pages.

**Prerequisites:**

```bash
pip install pymupdf rapidocr-onnxruntime
```

**Conversion method (Python):**

```python
import os
import sys

import fitz  # PyMuPDF
from rapidocr_onnxruntime import RapidOCR

if sys.platform == "win32":
    sys.stdout.reconfigure(encoding="utf-8")

MIN_NATIVE_CHARS = 50  # pages with this many characters or fewer go to OCR

ocr = RapidOCR()
name = "example"  # placeholder: the name used when saving the PDF
# expanduser is required: fitz.open() and open() do not expand "~".
pdf_path = os.path.expanduser(f"~/Downloads/chaoxing_pdfs/{name}.pdf")
txt_path = os.path.splitext(pdf_path)[0] + ".txt"

all_text = []
native_pages = ocr_pages = empty_pages = 0

with fitz.open(pdf_path) as doc:
    for i, page in enumerate(doc):
        # Stage 1: try native text extraction from the PDF text layer
        native = page.get_text().strip()
        if len(native) > MIN_NATIVE_CHARS:
            all_text.append(f"--- 第{i+1}页 ---\n{native}")
            native_pages += 1
            continue
        # Stage 2: OCR fallback for image-based pages
        pix = page.get_pixmap(dpi=200)
        result, _ = ocr(pix.tobytes("png"))
        ocr_text = "\n".join(item[1] for item in result) if result else ""
        if ocr_text:
            label = "OCR"
            ocr_pages += 1
        else:
            label = "empty"
            empty_pages += 1
        all_text.append(f"--- 第{i+1}页 [{label}] ---\n{ocr_text}")

full_text = "\n".join(all_text)
with open(txt_path, "w", encoding="utf-8") as f:
    f.write(full_text)

# Summary: counters are tracked per page instead of re-scanning the output text
print(f"Native: {native_pages}p, OCR: {ocr_pages}p, Empty: {empty_pages}p, "
      f"Total: {len(full_text)} chars")
```

**Output files per download:**
- `{name}.pdf` — original PDF
- `{name}.txt` — plain text extraction (native + OCR pages marked with `[OCR]`)

**How it works:**
1. Each page is first checked for native text (text layer PDF)
2. If the native text is 50 characters or fewer, the page is rendered to an image at 200 DPI and processed by RapidOCR
3. OCR pages are labeled `[OCR]` in the output for easy identification
4. Empty pages (no native text and no OCR result) are labeled `[empty]`

## CLI Tool (Alternative)

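If you have the companion script `chaoxing_dl.py` (it is not included in this SKILL.md; the examples below assume it is saved as `~/Downloads/chaoxing_dl.py`), it can be used instead: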

```bash
# Single download
python ~/Downloads/chaoxing_dl.py "VIEWER_URL" -n "文件名"

# Batch from JSON file
python ~/Downloads/chaoxing_dl.py --batch tasks.json

# With page validation
python ~/Downloads/chaoxing_dl.py "URL" -n "name" --json

# Force overwrite
python ~/Downloads/chaoxing_dl.py "URL" -n "name" -f
```

Batch JSON format:
```json
[
  {"name": "文件名", "url": "viewer_url_or_objectid", "pages": 22},
  ...
]
```
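The `...` above is only a placeholder; a real batch file must be a valid JSON array. The sketch below shows one way to read such a file when working without the CLI, accepting either a viewer URL or a bare objectid in `url`. It is an illustration of the format, not how `chaoxing_dl.py` itself parses it:

```python
import json
import re
from typing import List, Optional, Tuple

OBJECTID_RE = re.compile(r"objectid=([a-f0-9]{32})")
BARE_OBJECTID_RE = re.compile(r"[a-f0-9]{32}")


def resolve_objectid(url_or_objectid: str) -> Optional[str]:
    """Accept either a full viewer URL or a bare 32-char objectid."""
    value = url_or_objectid.strip()
    if BARE_OBJECTID_RE.fullmatch(value):
        return value
    match = OBJECTID_RE.search(value)
    return match.group(1) if match else None


def load_tasks(path: str) -> List[Tuple[str, str, Optional[int]]]:
    """Read a batch file and return (name, objectid, expected_pages) tuples.

    Entries whose url field yields no objectid raise ValueError.
    """
    with open(path, encoding="utf-8") as f:
        entries = json.load(f)
    tasks = []
    for entry in entries:
        objectid = resolve_objectid(entry["url"])
        if objectid is None:
            raise ValueError(f"no objectid in {entry['url']!r}")
        tasks.append((entry["name"], objectid, entry.get("pages")))
    return tasks
```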

## Batch Processing (Without CLI Tool)

For multiple downloads without the CLI, use a bash loop:

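```bash
out_dir="$HOME/Downloads/chaoxing_pdfs"
mkdir -p "$out_dir"
for oid_name in "OBJECTID1:名称1" "OBJECTID2:名称2"; do
  # Split at the first colon so names may themselves contain colons
  oid="${oid_name%%:*}"; name="${oid_name#*:}"
  out="$out_dir/${name}.pdf"
  # Skip files that already exist (see Key Notes)
  if [ -e "$out" ]; then echo "$name: exists, skipped"; continue; fi
  info=$(curl -s -L "https://contestyd.chaoxing.com/app/files/$oid/getYunFiles?key=allData")
  pagenum=$(echo "$info" | grep -o '"pagenum":[0-9]*' | cut -d: -f2)
  # First "pdf" field; also unescape "\/" in case the JSON escapes slashes
  pdf_url=$(echo "$info" | grep -o '"pdf":"[^"]*"' | head -1 | sed -e 's/^"pdf":"//' -e 's/"$//' -e 's#\\/#/#g')
  if [ -z "$pdf_url" ]; then echo "$name: data.pdf is empty, use data.download instead"; continue; fi
  echo "$name: ${pagenum}p"
  curl -s -f -L -o "$out" "$pdf_url" || { echo "$name: download failed"; rm -f "$out"; }
done
```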

## Key Notes

- Only `objectid` is needed — no `resid`, `tk`, `addPointInfo`, or cookies
- Always validate page count against user expectation
- The PDF URLs on `s3.cldisk.com` are direct links, publicly accessible
- If `data.pdf` is empty, fall back to `data.download`
- Skip files that already exist unless user specifies overwrite
