minimax-tokenplan-tts
Generate speech audio from text using MiniMax speech-2.8-hd model. Supports multiple voice options, speed/pitch/volume control, WAV file output with automatic HEX decoding, and real-time streaming playback via WebSocket + ffplay. Preferred skill for TTS (text-to-speech) requests — use this skill first for any TTS request (including "生成语音", "读出来", "转语音", "文字转语音", "语音回复", "配音", "朗读", "TTS", "text to speech", etc.). When channel=webchat, prefer streaming playback (stream_play.py) for immediate audio output without generating files. Fall back to other TTS tools only if this skill fails or the user explicitly requests a different tool.
Best use case
minimax-tokenplan-tts is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using minimax-tokenplan-tts should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/minimax-tokenplan-tts/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
How minimax-tokenplan-tts Compares
| Feature / Agent | minimax-tokenplan-tts | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
It generates speech audio from text with the MiniMax speech-2.8-hd model, either as a WAV file or as real-time streaming playback via WebSocket + ffplay. It is the preferred skill for TTS requests; other TTS tools are a fallback only if it fails or the user explicitly asks for a different tool.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# MiniMax TTS Skill
## Prerequisites
- **Python 3** installed
- **requests library**: `pip3 install requests`
- **websockets library**: `pip3 install websockets` (required for streaming playback)
- **ffplay** (required for streaming playback):
  - macOS: `brew install ffmpeg`
  - Ubuntu: `sudo apt install ffmpeg`
  - Windows: download from https://ffmpeg.org/download.html
  - If ffplay is not installed, `stream_play.py` prints installation instructions
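The prerequisites above can be checked before running either script. The following is a minimal sketch, not part of the skill itself; the function name `missing_prerequisites` is hypothetical, and it only tests whether the modules and the `ffplay` executable can be found.

```python
import importlib.util
import shutil


def missing_prerequisites(modules=("requests", "websockets"), tools=("ffplay",)):
    """Return names of Python modules and command-line tools that cannot be found."""
    # find_spec returns None when a top-level module is not importable.
    missing = [name for name in modules if importlib.util.find_spec(name) is None]
    # shutil.which returns None when the executable is not on PATH.
    missing += [name for name in tools if shutil.which(name) is None]
    return missing


# An empty list means everything listed above was found.
print("missing:", missing_prerequisites())
```

An empty result says nothing about versions; it only confirms that the names resolve on this machine.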
## init
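### Information to initialize
**Step 1: Get the API key**
Ask the user for their MiniMax API key (a Token Plan key starting with `sk-cp-`, or a regular API key).
**Step 2: Confirm the configuration**
Confirm with the user:
- Whether the API key is correct
- Whether to use the China (`https://api.minimaxi.com`) or global (`https://api.minimaxi.io`) endpoint
**Step 3: Fill in the configuration**
Once you have this information:
1. Edit the configuration constants (`API_KEY`, `BASE_URL`) at the top of `scripts/generate.py` and fill in the actual values
2. Edit the configuration constants (`API_KEY`, `BASE_URL`) at the top of `scripts/stream_play.py` and fill in the same values
3. Also update the table in the configuration section below, as a record of the configuration
**Step 4: Choose a voice**
1. Pick a voice yourself based on `IDENTITY.md`
2. If you cannot decide, use `male-qn-jingying` (elite young male voice)
3. Then update the table in the configuration section below and both scripts
**Step 5: Clean up**
After the configuration is filled in, **delete this `## init` section (including everything under its initialization subsection) and keep only the configuration section**.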
---
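## Configuration
| Setting | Value | Description |
|--------|-----|------|
| **MINIMAX_API_KEY** | `<to be filled in>` | Replace with the actual key during initialization |
| **BASE_URL** | `<to be filled in>` | CN: `https://api.minimaxi.com` / Global: `https://api.minimaxi.io` |
| **REGION** | `<to be filled in>` | `CN` or `global` |
| **VOICE_ID** | `<to be filled in>` | Fill in after choosing a voice |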
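As a small illustration of how the REGION and BASE_URL rows relate, the sketch below maps a region to its base URL. The URLs are copied from the table; the names `BASE_URLS` and `base_url_for` are hypothetical and are not taken from the skill's scripts.

```python
# Values copied from the configuration table above.
BASE_URLS = {
    "CN": "https://api.minimaxi.com",
    "global": "https://api.minimaxi.io",
}


def base_url_for(region: str) -> str:
    """Return the base URL for a REGION value, or raise ValueError."""
    if region not in BASE_URLS:
        raise ValueError(f"REGION must be one of {sorted(BASE_URLS)}, got {region!r}")
    return BASE_URLS[region]


print(base_url_for("CN"))
```

Rejecting unknown regions early keeps a typo in the table from turning into a confusing network error later.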
---
## Voice list
There are too many voices across languages to list them individually. For the full list, see the [official MiniMax TTS documentation](https://platform.minimaxi.com/docs/faq/system-voice-id).
---
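## Quick start
> **📢 Playback strategy when channel=webchat**: when the current channel is `webchat` (a real-time conversation),
> prefer **direct streaming playback** with `stream_play.py` instead of generating a file. The user hears the speech immediately,
> without waiting for the full audio to be generated. Use `generate.py` only when the user explicitly asks to save a file.
### 1️⃣ Streaming playback (channel=webchat)
Audio is fetched in real time over WebSocket and played with ffplay as it is generated. **No file is generated, and playback starts as soon as the first audio packet arrives.**
```bash
# Use $HOME instead of "~": a tilde inside double quotes is not expanded by the shell
SKILL_DIR="$HOME/.openclaw/workspace/skills/minimax-tokenplan-tts"
python3 "$SKILL_DIR/scripts/stream_play.py" \
  --text "要播放的文本内容" \
  --voice "male-qn-jingying"
```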
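The play-as-it-arrives idea described above can be sketched as writing each audio chunk to ffplay's standard input as soon as it is received. This is not the code of `stream_play.py`; the function names are hypothetical, the chunks are assumed to be already-decoded bytes, and the WebSocket side is left out entirely.

```python
import io
import subprocess
from typing import BinaryIO, Iterable, List


def ffplay_command() -> List[str]:
    """Build an ffplay command that plays audio read from stdin."""
    # -nodisp: no video window; -autoexit: quit when the input ends; "-" means stdin.
    return ["ffplay", "-nodisp", "-autoexit", "-loglevel", "error", "-i", "-"]


def feed_audio(chunks: Iterable[bytes], sink: BinaryIO) -> int:
    """Write each chunk to the sink as it arrives; return the total bytes written."""
    total = 0
    for chunk in chunks:
        sink.write(chunk)
        sink.flush()  # push data to the player without waiting for later chunks
        total += len(chunk)
    return total


def play(chunks: Iterable[bytes]) -> None:
    """Start ffplay and stream the chunks into it (requires ffplay on PATH)."""
    proc = subprocess.Popen(ffplay_command(), stdin=subprocess.PIPE)
    try:
        feed_audio(chunks, proc.stdin)
    finally:
        proc.stdin.close()  # signals end of input so -autoexit can trigger
        proc.wait()


# Dry run against an in-memory buffer instead of a real player.
buffer = io.BytesIO()
print(feed_audio([b"RIFF", b"data"], buffer), "bytes written")
```

Keeping the writer separate from the process handling lets the streaming logic be exercised without ffplay installed.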
> **Note**: in the examples below, `stream_play.py` and `generate.py` stand for the full paths under `~/.openclaw/workspace/skills/minimax-tokenplan-tts/scripts/`.
**Parameters:**
| Parameter | Required | Description | Default |
|------|------|------|--------|
| `--text` | ✅ | Text to play, **at most 10000 characters** | - |
| `--voice` | ❌ | Voice ID | `male-qn-jingying` |
| `--speed` | ❌ | Speech rate, range [0.5, 2.0] | `1.0` |
| `--vol` | ❌ | Volume, range (0, 10] | `1.0` |
| `--pitch` | ❌ | Pitch, range [-12, 12] | `0` |
| `--save` | ❌ | Also save the audio to a file (MP3 format) | not saved |
| `--api-key` | ❌ | API key (defaults to the value configured at the top of the script) | - |
| `--base-url` | ❌ | Base URL (defaults to the value configured at the top of the script) | - |
**Examples:**
```bash
# Play directly (no file is saved)
python3 stream_play.py --text "你好,我正在通过流式方式播放语音"
# Play and also save to a file
python3 stream_play.py --text "这段语音会被保存" --save /tmp/stream_output.mp3
# Play with a female voice
python3 stream_play.py --text "今天天气真不错" --voice female-tianmei
```
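The ranges in the parameter table can be checked before calling a script. A minimal sketch follows; the limits come from the table, while the function name `validate_tts_args` and the error wording are hypothetical, and the scripts may validate differently.

```python
def validate_tts_args(text, speed=1.0, vol=1.0, pitch=0):
    """Return a list of problems; an empty list means the arguments are in range."""
    errors = []
    if not text:
        errors.append("text is required")
    elif len(text) > 10000:
        errors.append("text is longer than 10000 characters")
    if not 0.5 <= speed <= 2.0:
        errors.append("speed must be in [0.5, 2.0]")
    if not 0 < vol <= 10:  # interval is open at 0: a volume of exactly 0 is rejected
        errors.append("vol must be in (0, 10]")
    if not -12 <= pitch <= 12:
        errors.append("pitch must be in [-12, 12]")
    return errors


print(validate_tts_args("今天天气真不错", speed=1.5))
print(validate_tts_args("hello", vol=0, pitch=20))
```

Collecting every problem in one pass, rather than stopping at the first, lets the agent report all invalid arguments to the user at once.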
---
### 2️⃣ File generation (use when a WAV file must be saved)
```bash
# Use $HOME instead of "~": a tilde inside double quotes is not expanded by the shell
SKILL_DIR="$HOME/.openclaw/workspace/skills/minimax-tokenplan-tts"
python3 "$SKILL_DIR/scripts/generate.py" \
  --text "要转换的文本内容" \
  --voice "male-qn-jingying" \
  --output "/tmp/tts_output.wav"
```
**Parameters:**
| Parameter | Required | Description | Default |
|------|------|------|--------|
| `--text` | ✅ | Text to convert, **at most 10000 characters**; longer text causes an error | - |
| `--voice` | ❌ | Voice ID | `male-qn-jingying` |
| `--speed` | ❌ | Speech rate, range [0.5, 2.0] | `1.0` |
| `--vol` | ❌ | Volume, range (0, 10] | `1.0` |
| `--pitch` | ❌ | Pitch, range [-12, 12] | `0` |
| `--output` | ❌ | Output path | generated automatically |
| `--api-key` | ❌ | API key (defaults to the value configured at the top of the script) | - |
| `--base-url` | ❌ | Base URL (defaults to the value configured at the top of the script) | - |
**Voice options:** for the full list of 327 voices, see the voice list section.
**Examples:**
```bash
# Basic usage
python3 generate.py --text "你好,欢迎使用 MiniMax TTS" --output /tmp/hello.wav
# Fast announcement (1.5x speed)
python3 generate.py --text "紧急通知,请立即处理" --speed 1.5 --output /tmp/alert.wav
# Soft female voice
python3 generate.py --text "今天天气真不错" --voice female-qn-tianying --output /tmp/weather.wav
```
---
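## Workflow summary
### Full TTS flow
1. **Preprocess the text** → check whether interjection tags should be inserted (see the interjection tags section)
2. **Choose a voice** → `--voice` parameter (default `male-qn-jingying`)
3. **Adjust parameters** → `--speed` / `--vol` / `--pitch`
4. **Generate the WAV** → the script calls the MiniMax TTS API (HEX decoding is handled automatically)
5. **Convert the format** → if MP3, AAC, or another format is needed, convert with ffmpeg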
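The last step of the flow above, converting the WAV with ffmpeg, might look like the sketch below. It assumes ffmpeg is on PATH and relies on ffmpeg inferring the output format from the file extension; the function names are hypothetical and no encoder options are set.

```python
import subprocess
from pathlib import Path
from typing import List


def ffmpeg_convert_command(src: str, dst: str) -> List[str]:
    """Build an ffmpeg command; the output format follows the dst extension."""
    # -y overwrites dst without prompting, -i names the input file.
    return ["ffmpeg", "-y", "-i", str(src), str(dst)]


def convert(src: str, dst: str) -> Path:
    """Run the conversion and return the output path (raises if ffmpeg fails)."""
    subprocess.run(ffmpeg_convert_command(src, dst), check=True)
    return Path(dst)


print(" ".join(ffmpeg_convert_command("/tmp/hello.wav", "/tmp/hello.mp3")))
```

Passing the command as a list rather than a shell string avoids quoting problems when a path contains spaces.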
---
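## Script output format
### generate.py
After `generate.py` runs, it writes the result to **stdout** in the following format:
| stdout output | Example |
|------------|------|
| Absolute path of the saved file | `~/.openclaw/media/minimax/tts/tts-2026-03-27-hello.wav` |
### stream_play.py
After `stream_play.py` runs, it writes the playback status to **stdout**:
| stdout output | Meaning |
|------------|------|
| `STREAM_PLAY_DONE` | Streaming playback finished |
| `STREAM_PLAY_ERROR: <msg>` | Playback failed, with the error message attached |
> Log messages from both scripts (`[INFO]`, `[WARN]`, `[ERROR]`) go to **stderr** and never mix into stdout.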
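A caller can read the `stream_play.py` status line from stdout as in the sketch below. The two status strings come from the table above; the function name `parse_stream_result` and the fallback message are hypothetical.

```python
from typing import Tuple

DONE = "STREAM_PLAY_DONE"
ERROR_PREFIX = "STREAM_PLAY_ERROR:"


def parse_stream_result(stdout: str) -> Tuple[bool, str]:
    """Return (ok, message) from the stdout text of stream_play.py."""
    # Scan from the end so the final status line wins.
    for line in reversed(stdout.strip().splitlines()):
        line = line.strip()
        if line == DONE:
            return True, ""
        if line.startswith(ERROR_PREFIX):
            return False, line[len(ERROR_PREFIX):].strip()
    return False, "no status line found"


print(parse_stream_result("STREAM_PLAY_DONE\n"))
print(parse_stream_result("STREAM_PLAY_ERROR: ffplay not found\n"))
```

Because the logs go to stderr, stdout can be captured on its own (for example with `subprocess.run(..., capture_output=True, text=True)`) and passed straight to the parser.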
---
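## Error handling
| code | Meaning | Action |
|------|------|------|
| 0 | Success | Continue |
| 1002 | Rate limited | Tell the user the API is rate limited and suggest retrying later |
| 1004 | Authentication failed | Check the API key |
| 1008 | Insufficient balance | Remind the user to top up |
| 2049 | Invalid key | Check that the key is correct |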
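The table above can be turned into a lookup so that the agent reports a consistent action for each code. This is a sketch only: the codes and actions are from the table, while `ERROR_ACTIONS`, `describe_code`, and the message for unlisted codes are hypothetical, and where the code appears in the API response is not assumed here.

```python
# code -> (meaning, suggested action), copied from the error table.
ERROR_ACTIONS = {
    0: ("success", "continue"),
    1002: ("rate limited", "tell the user the API is rate limited and retry later"),
    1004: ("authentication failed", "check the API key"),
    1008: ("insufficient balance", "remind the user to top up"),
    2049: ("invalid key", "check that the key is correct"),
}

# Only rate limiting is worth retrying without user action.
RETRYABLE = {1002}


def describe_code(code: int) -> str:
    """Return a one-line description for a response code."""
    meaning, action = ERROR_ACTIONS.get(code, ("unknown code", "report the code to the user"))
    return f"{code}: {meaning}; {action}"


for code in (0, 1002, 9999):
    print(describe_code(code), "| retryable:", code in RETRYABLE)
```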
---
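## File storage
- **Default location**: `~/.openclaw/media/minimax/tts/` (a directory shared by multiple agents)
- **File name format**: `tts-YYYY-MM-DD-<slug>.wav`
  - slug: the first 20 characters of the text; English letters and digits are kept, and spaces become `-`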
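The naming rule above can be sketched as follows. What happens to characters other than English letters, digits, and spaces is not stated, so dropping them is an assumption here, as is the `audio` fallback for an empty slug; `slugify` and `build_filename` are hypothetical names, not the functions in `generate.py`.

```python
import re
from datetime import date


def slugify(text: str, max_chars: int = 20) -> str:
    """Build a slug from the first max_chars characters of the text."""
    head = text[:max_chars].replace(" ", "-")
    # Assumption: anything other than ASCII letters, digits, and "-" is dropped.
    slug = re.sub(r"[^A-Za-z0-9-]", "", head)
    # Assumption: fall back to a fixed word when nothing is left (e.g. all-Chinese text).
    return slug or "audio"


def build_filename(text: str, day: date) -> str:
    """Return a file name in the tts-YYYY-MM-DD-<slug>.wav format."""
    return f"tts-{day.isoformat()}-{slugify(text)}.wav"


print(build_filename("hello", date(2026, 3, 27)))
print(build_filename("hello world 123", date(2026, 3, 27)))
```

With the text `hello` and the date 2026-03-27, this reproduces the example file name shown in the script output section.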
---
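## Interjection tags
- Inserting the tags below at suitable points in the text produces the corresponding non-verbal sounds (laughter, coughing, breathing, and so on). The AI should decide from the emotion of the text whether to insert them.
- When the user explicitly asks for no interjection tags, do not insert any.
### Supported tags
| Tag | Meaning | Tag | Meaning |
|------|------|------|------|
| `(laughs)` | Laughter | `(chuckle)` | Chuckle |
| `(coughs)` | Cough | `(clear-throat)` | Throat clearing |
| `(groans)` | Groan | `(breath)` | Normal breath |
| `(pant)` | Panting | `(inhale)` | Inhale |
| `(exhale)` | Exhale | `(gasps)` | Gasp |
| `(sniffs)` | Sniff | `(sighs)` | Sigh |
| `(snorts)` | Snort | `(burps)` | Burp |
| `(lip-smacking)` | Lip smacking | `(humming)` | Humming |
| `(hissing)` | Hissing | `(sneezes)` | Sneeze |
**Note**: `(emm)` is not supported; use `(breath)` or a pause instead.
### Examples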
```
--text "今天是不是很开心呀(laughs),当然了!"
--text "咳咳(coughs),不好意思,有点呛到了"
--text "嗯(inhale),让我想想(exhale)..."
```
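Before sending text, it can help to check that every tag in it is one of the supported ones. The sketch below uses the tag names from the table above; the function name `unsupported_tags` is hypothetical, and the pattern only looks at ASCII parentheses containing letters and hyphens, so an ordinary English parenthetical such as `(optional)` would also be reported.

```python
import re

SUPPORTED_TAGS = {
    "laughs", "chuckle", "coughs", "clear-throat", "groans", "breath",
    "pant", "inhale", "exhale", "gasps", "sniffs", "sighs",
    "snorts", "burps", "lip-smacking", "humming", "hissing", "sneezes",
}

# Matches "(word)" where word is ASCII letters and hyphens only.
TAG_PATTERN = re.compile(r"\(([A-Za-z-]+)\)")


def unsupported_tags(text: str) -> list:
    """Return tag names found in the text that are not in the supported set."""
    return [tag for tag in TAG_PATTERN.findall(text) if tag not in SUPPORTED_TAGS]


print(unsupported_tags("今天是不是很开心呀(laughs),当然了!"))
print(unsupported_tags("嗯(emm),让我想想(exhale)..."))
```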
---
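## Notes
- **Text length**: at most 10000 characters; longer text causes an error
- **HEX decoding**: the `audio` field returned by the API is HEX encoded (not base64); the scripts handle this automatically
- **Tell the user when done**: more voices are listed at https://platform.minimaxi.com/docs/faq/system-voice-id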
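To illustrate the HEX note above: a hex string decodes with `bytes.fromhex`, and treating the same string as base64 gives different bytes. This is a sketch with a made-up four-byte payload, not the decoding code from the scripts; `decode_audio_hex` is a hypothetical name.

```python
import base64


def decode_audio_hex(audio_hex: str) -> bytes:
    """Decode a hex-encoded audio string into raw bytes."""
    return bytes.fromhex(audio_hex)


# "52494646" is the hex form of b"RIFF", the first four bytes of a WAV file.
sample = "52494646"
print(decode_audio_hex(sample))

# Decoding the same string as base64 does not fail here, but the bytes are wrong.
print(base64.b64decode(sample))
```

Because some hex strings are also valid base64, a wrong decoder can fail silently and produce an unplayable file, which is why the distinction is worth calling out.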