IMA TTS Generator

Convert text, scripts, and captions into natural voiceovers for videos, explainers, product demos, and social posts.

Best use case

IMA TTS Generator is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using IMA TTS Generator should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/ima-tts-ai/SKILL.md --create-dirs "https://raw.githubusercontent.com/openclaw/skills/main/skills/allenfancy-gan/ima-tts-ai/SKILL.md"

Manual Installation

1. Download SKILL.md from GitHub.
2. Place it in .claude/skills/ima-tts-ai/SKILL.md inside your project.
3. Restart your AI agent; it will auto-discover the skill.

How IMA TTS Generator Compares

| Feature / Agent | IMA TTS Generator | Standard Approach |
|-----------------|-------------------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

Convert text, scripts, and captions into natural voiceovers for videos, explainers, product demos, and social posts.

Where can I find the source code?

The source code is on GitHub in the openclaw/skills repository, under skills/allenfancy-gan/ima-tts-ai.

SKILL.md Source

# IMA TTS AI — Text-to-Speech Generator

**For complete API documentation, security details, all parameters, speaker list, and Python examples, read `SKILL-DETAIL.md`.**

## Model ID Reference (CRITICAL)

| Friendly Name | model_id | Notes |
|---------------|----------|-------|
| Seed TTS 2.0 | `seed-tts-2.0` | ✅ Default and only supported model |

**Sub-models (via extra-params):**
- `seed-tts-2.0-expressive` — More expressive, emotional (default)
- `seed-tts-2.0-standard` — More stable, neutral

## When User Says "帮我制作旁白/配音" ("Help me make a narration/voiceover")

**Must ask first:**
| Question | Parameter | Required |
|----------|-----------|----------|
| Content or copy to read aloud (要朗读的内容/文案) | `prompt` | ✅ Yes |

**Recommend asking:**
| Question | Parameter | Options |
|----------|-----------|---------|
| Voice/speaker (音色/发音人) | `speaker` | 魅力苏菲, Vivi, 云舟, 大壹, etc. (see SKILL-DETAIL.md) |

**Optional:**
| Question | Parameter | Range |
|----------|-----------|-------|
| Emotion/mood (情感/情绪) | `audio_params.emotion` | neutral, sad, angry |
| Speech rate (语速) | `audio_params.speech_rate` | [-50, 100], 0=normal |
| Volume (音量) | `audio_params.loudness_rate` | [-50, 100], 0=normal |
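
The optional parameters above can be checked before a request is built. The following is a minimal sketch, not the skill's own code: the helper name `build_audio_params` is hypothetical, clamping out-of-range rates (rather than rejecting them) is an assumption, and the emotion list is limited to the three values shown in the table, although SKILL-DETAIL.md may define more.

```python
# Values taken from the "Optional" table; SKILL-DETAIL.md may list more emotions.
ALLOWED_EMOTIONS = {"neutral", "sad", "angry"}
RATE_MIN, RATE_MAX = -50, 100  # documented range for speech_rate and loudness_rate


def build_audio_params(emotion=None, speech_rate=None, loudness_rate=None):
    """Return an audio_params dict containing only the options that were set.

    Assumption: out-of-range rates are clamped into [-50, 100]. The real
    service might reject them instead.
    """
    params = {}
    if emotion is not None:
        if emotion not in ALLOWED_EMOTIONS:
            raise ValueError(f"unsupported emotion: {emotion!r}")
        params["emotion"] = emotion
    for key, value in (("speech_rate", speech_rate), ("loudness_rate", loudness_rate)):
        if value is not None:
            params[key] = max(RATE_MIN, min(RATE_MAX, int(value)))
    return params
```

Omitting an option leaves it out of the dict entirely, so the service default (0 = normal for both rates) applies.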

## User Input Parsing

| User says | Parameter | Value |
|-----------|-----------|-------|
| 旁白/配音/朗读 (narration/voiceover/read aloud) | prompt + speaker | Ask for content first |
| 女声 (female voice) / female | speaker | e.g. `zh_female_vv_uranus_bigtts` |
| 男声 (male voice) / male | speaker | e.g. `zh_male_sophie_uranus_bigtts` |
| 语速快 (faster) / slow | audio_params.speech_rate | Positive value for faster, negative for slower |
| expressive/standard | model | Sub-model selection |
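
One way to picture the mapping table is a small keyword matcher. This is a rough sketch only: the function name `parse_request` is hypothetical, and substring matching is an assumption, since the skill as described relies on the agent's own language understanding. It covers the speaker and sub-model rows only, because the table does not give a magnitude for speech-rate changes.

```python
def parse_request(text):
    """Map keywords in a user request to parameter hints (illustrative only)."""
    lowered = text.lower()
    hints = {}

    # Check "female" before "male": the string "female" contains "male".
    if "女声" in lowered or "female" in lowered:
        hints["speaker"] = "zh_female_vv_uranus_bigtts"
    elif "男声" in lowered or "male" in lowered:
        hints["speaker"] = "zh_male_sophie_uranus_bigtts"

    # Sub-model selection, using the sub-model names from the Model ID Reference.
    if "standard" in lowered:
        hints["model"] = "seed-tts-2.0-standard"
    elif "expressive" in lowered:
        hints["model"] = "seed-tts-2.0-expressive"

    return hints
```

The speaker IDs are the "e.g." values from the table, so they act as defaults here rather than as the only valid choices.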

## Script Usage

```bash
# List available TTS models
python3 {baseDir}/scripts/ima_tts_create.py --api-key $IMA_API_KEY --list-models

# Generate speech (default model: seed-tts-2.0)
python3 {baseDir}/scripts/ima_tts_create.py \
  --api-key $IMA_API_KEY \
  --model-id seed-tts-2.0 \
  --prompt "Text to be spoken here." \
  --user-id {user_id} \
  --output-json

# With speaker and emotion (the Chinese prompt reads: "Sunny-youth voice test, hello world.")
python3 {baseDir}/scripts/ima_tts_create.py \
  --api-key $IMA_API_KEY \
  --model-id seed-tts-2.0 \
  --prompt "阳光青年音色测试，你好世界。" \
  --extra-params '{"model":"seed-tts-2.0-expressive","speaker":"zh_male_sophie_uranus_bigtts","audio_params":{"emotion":"neutral"}}' \
  --user-id {user_id} \
  --output-json
```
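
When the script is called from Python rather than a shell, building the argument list directly avoids hand-quoting the `--extra-params` JSON. The sketch below uses only the flags shown in the commands above. The helper name `build_tts_command` and its parameters are hypothetical, and `base_dir` stands in for the `{baseDir}` placeholder.

```python
import json
import os


def build_tts_command(base_dir, prompt, user_id, extra_params=None, api_key=None):
    """Build an argv list for ima_tts_create.py (a sketch, not the skill's code)."""
    key = api_key or os.environ.get("IMA_API_KEY", "")
    cmd = [
        "python3",
        os.path.join(base_dir, "scripts", "ima_tts_create.py"),
        "--api-key", key,
        "--model-id", "seed-tts-2.0",
        "--prompt", prompt,
        "--user-id", user_id,
    ]
    if extra_params:
        # ensure_ascii=False keeps non-ASCII text readable in the JSON argument.
        cmd += ["--extra-params", json.dumps(extra_params, ensure_ascii=False)]
    cmd.append("--output-json")
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, capture_output=True, text=True)`. Because no shell is involved, quotes inside the prompt or the JSON need no escaping.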

## Sending Results to User

```python
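# ✅ CORRECT: pass the remote URL directly as `media`.
# Caption in English: "Speech synthesis succeeded! • Model: [Name] • Time: [X]s • Credits: [N pts] / Original link: [url]"
message(action="send", media=audio_url, caption="✅ 语音合成成功！\n• 模型：[Name]\n• 耗时：[X]s\n• 积分：[N pts]\n\n🔗 原始链接：[url]")

# ❌ WRONG: never download the audio to a local file first.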
```

## UX Protocol (Brief)

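1. **Pre-generation:** "🔊 开始语音合成… 模型：[Name]，预计[X~Y]秒，消耗[N]积分" ("Starting speech synthesis… Model: [Name], estimated [X~Y] s, costs [N] credits")
2. **Progress:** Every 10-15s: "⏳ 语音合成中… [P]%" ("Synthesizing speech… [P]%")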
3. **Success:** Send audio via `media=audio_url` + include link in caption
4. **Failure:** Natural language error + suggest retry. See SKILL-DETAIL.md for error translation.

**Never expose to users:** script names, API endpoints, `attribute_id`, or technical parameter names.

## Environment

Base URL: `https://api.imastudio.com`
Headers: `Authorization: Bearer $IMA_API_KEY` · `x-app-source: ima_skills` · `x_app_language: en`
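
As a small illustration, the headers above can be assembled from the environment like this. The function name `build_headers` is hypothetical; the header names and values are copied exactly as written in this section, including the underscore in `x_app_language`.

```python
import os

BASE_URL = "https://api.imastudio.com"


def build_headers(api_key=None, language="en"):
    """Return the request headers listed above, reading IMA_API_KEY if no key is given."""
    key = api_key or os.environ.get("IMA_API_KEY")
    if not key:
        raise RuntimeError("IMA_API_KEY is not set")
    return {
        "Authorization": f"Bearer {key}",
        "x-app-source": "ima_skills",
        "x_app_language": language,
    }
```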

## Core Flow

1. `GET /open/v1/product/list?app=ima&platform=web&category=text_to_speech` → get `attribute_id`, `credit`
2. `POST /open/v1/tasks/create` → get `task_id`
3. `POST /open/v1/tasks/detail` → poll every 2-5s until `resource_status==1`

**MANDATORY:** Always query product list first. `attribute_id` is required.
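
The polling step of this flow can be sketched independently of any HTTP client. In the example below, `fetch_detail` is a hypothetical callable that performs the `POST /open/v1/tasks/detail` request and returns the parsed response. Treating that response as a flat dict with a `resource_status` key is an assumption, and failure statuses are not handled because this section does not document them; the actual schema is in SKILL-DETAIL.md.

```python
import time


def poll_until_ready(fetch_detail, interval=3.0, timeout=60.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Call fetch_detail() until resource_status == 1 or the timeout elapses.

    `sleep` and `clock` are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout
    while True:
        detail = fetch_detail()
        if detail.get("resource_status") == 1:
            return detail
        if clock() >= deadline:
            raise TimeoutError("task did not finish before the timeout")
        sleep(interval)
```

The default interval of 3 seconds sits inside the 2-5s range given in step 3. The timeout is an arbitrary illustrative value and should be tuned to the expected generation time.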

## Estimated Generation Time

| Model | Estimated Time | Poll Every |
|-------|---------------|------------|
| seed-tts-2.0 | 5~30s | 3s |

## User Preference Memory

Storage: `~/.openclaw/memory/ima_prefs.json`
- **Save** when user explicitly says "用XXX音色" ("use the XXX voice") / "默认用XXX" ("default to XXX")
- **Clear** when user says "换个音色" ("switch to another voice") / "推荐一个" ("recommend one")
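
The save and clear rules could be implemented along these lines. This is a sketch under stated assumptions: the file's schema is not given here, so the single `"speaker"` key and the function names are hypothetical. Only the storage path comes from this section.

```python
import json
from pathlib import Path

# Storage location stated above; the JSON layout below is an assumption.
DEFAULT_PREFS_PATH = Path.home() / ".openclaw" / "memory" / "ima_prefs.json"


def load_prefs(path=DEFAULT_PREFS_PATH):
    """Return saved preferences, or an empty dict if nothing has been saved yet."""
    path = Path(path)
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def save_speaker(speaker_id, path=DEFAULT_PREFS_PATH):
    """Persist the speaker the user explicitly asked to use by default."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    prefs = load_prefs(path)
    prefs["speaker"] = speaker_id
    path.write_text(json.dumps(prefs, ensure_ascii=False, indent=2), encoding="utf-8")


def clear_speaker(path=DEFAULT_PREFS_PATH):
    """Forget the saved speaker, e.g. when the user asks for a different voice."""
    path = Path(path)
    prefs = load_prefs(path)
    if prefs.pop("speaker", None) is not None:
        path.write_text(json.dumps(prefs, ensure_ascii=False, indent=2), encoding="utf-8")
```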

## Popular Speakers (Quick Reference)

| Category | Speaker Name | speaker ID |
|----------|-------------|------------|
| General | 魅力苏菲 | `zh_male_sophie_uranus_bigtts` |
| General | Vivi | `zh_female_vv_uranus_bigtts` |
| General | 云舟 | `zh_male_m191_uranus_bigtts` |
| Video dubbing | 大壹 | `zh_male_dayi_uranus_bigtts` |
| Role-play | 知性灿灿 | `zh_female_cancan_uranus_bigtts` |

**Full speaker list:** See `volcengine_tts_timbre_list.json` in project or SKILL-DETAIL.md.

**⚠️ Important:** Use native format (`*_uranus_bigtts`), NOT `BV*_streaming` format.
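
A quick guard for this rule might look like the following. The function name is hypothetical, and the check is deliberately loose: it tests only the suffix and the `BV` prefix named above, not whether the ID actually exists in the speaker list.

```python
def is_native_speaker_id(speaker_id):
    """True for IDs in the native *_uranus_bigtts format, False for BV*_streaming ones."""
    return speaker_id.endswith("_uranus_bigtts") and not speaker_id.startswith("BV")


# "BV001_streaming" is used here only as an example of the BV*_streaming pattern.
for candidate in ("zh_female_vv_uranus_bigtts", "BV001_streaming"):
    print(candidate, is_native_speaker_id(candidate))
```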
