Best use case
pocket-tts is best used when you need local, offline text-to-speech as part of a repeatable AI agent workflow rather than a one-off prompt.
Teams using pocket-tts should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/pocket-tts/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
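The manual steps can also be scripted. This is a minimal sketch using only the Python standard library. It assumes SKILL.md has already been downloaded to a local path. `install_skill` is a hypothetical helper, and the target directory is the one given in the steps.

```python
from pathlib import Path
import shutil


def install_skill(downloaded_skill_md: Path, project_root: Path) -> Path:
    """Copy a downloaded SKILL.md into <project>/.claude/skills/pocket-tts/ (hypothetical helper)."""
    if not downloaded_skill_md.is_file():
        raise FileNotFoundError(f"SKILL.md not found: {downloaded_skill_md}")
    target_dir = project_root / ".claude" / "skills" / "pocket-tts"
    # Create the skill directory and any missing parents.
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / "SKILL.md"
    shutil.copyfile(downloaded_skill_md, target)
    return target
```

For example, `install_skill(Path("SKILL.md"), Path.cwd())` installs into the current project. Restart the agent afterwards so it picks up the skill.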
How pocket-tts Compares
| Feature / Agent | pocket-tts | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
This skill provides specific capabilities for your AI agent. See the About section for full details.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Pocket TTS Skill
Fully local, offline text-to-speech using Kyutai's Pocket TTS model. Generate high-quality audio from text without any API calls or internet connection. Features 8 built-in voices, voice cloning support, and runs entirely on CPU.
## Features
- 🎯 **Fully local** - No API calls, runs completely offline
- 🚀 **CPU-only** - No GPU required, works on any computer
- ⚡ **Fast generation** - ~2-6x real-time on CPU
- 🎤 **8 built-in voices** - alba, marius, javert, jean, fantine, cosette, eponine, azelma
- 🎭 **Voice cloning** - Clone any voice from a WAV sample
- 🔊 **Low latency** - ~200ms first audio chunk
- 📚 **Simple Python API** - Easy integration into any project
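You can turn the quoted generation speed into a rough wall-clock estimate. This is a minimal sketch. It assumes "N× real-time" means N seconds of audio are produced per second of compute. The function name is illustrative and is not part of pocket-tts.

```python
def estimate_generation_seconds(audio_seconds: float, realtime_factor: float) -> float:
    """Estimate compute time needed to synthesize `audio_seconds` of speech.

    Assumes a real-time factor of N means N seconds of audio per second of compute.
    """
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    return audio_seconds / realtime_factor


# Bounds for a 30-second clip, using the ~2-6x range quoted in the feature list.
slowest = estimate_generation_seconds(30.0, 2.0)
fastest = estimate_generation_seconds(30.0, 6.0)
print(f"30 s of audio: roughly {fastest:.0f}-{slowest:.0f} s to generate")
```

Under these assumptions, a 30-second clip takes about 5 to 15 seconds to generate on CPU.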
## Installation
```bash
# 1. Accept the model license on Hugging Face
# https://huggingface.co/kyutai/pocket-tts
# 2. Install the package
pip install pocket-tts
# Or use uv for automatic dependency management
uvx pocket-tts generate "Hello world"
```
## Usage
### CLI
```bash
# Basic usage
pocket-tts "Hello, I am your AI assistant"
# With specific voice
pocket-tts "Hello" --voice alba --output hello.wav
# With custom voice file (voice cloning)
pocket-tts "Hello" --voice-file myvoice.wav --output output.wav
# Adjust speed
pocket-tts "Hello" --speed 1.2
# Start local server
pocket-tts --serve
# List available voices
pocket-tts --list-voices
```
### Python API
```python
from pocket_tts import TTSModel
import scipy.io.wavfile
# Load the model (downloaded from Hugging Face on first run)
tts_model = TTSModel.load_model()

# Build a voice state from an audio prompt (here a sample of the alba voice)
voice_state = tts_model.get_state_for_audio_prompt(
    "hf://kyutai/tts-voices/alba-mackenna/casual.wav"
)

# Generate audio; the result is a 1D torch tensor of PCM samples
audio = tts_model.generate_audio(voice_state, "Hello world!")

# Save to WAV at the model's native sample rate
scipy.io.wavfile.write("output.wav", tts_model.sample_rate, audio.numpy())
print(f"Sample rate: {tts_model.sample_rate} Hz")
```
## Available Voices
| Voice | Description |
|-------|-------------|
| alba | Casual female voice |
| marius | Male voice |
| javert | Clear male voice |
| jean | Natural male voice |
| fantine | Female voice |
| cosette | Female voice |
| eponine | Female voice |
| azelma | Female voice |
Or use `--voice-file /path/to/voice.wav` to clone a custom voice.
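Before you use a file for voice cloning, check that it is a readable WAV. This document does not state length or format requirements for cloning samples, so this sketch only reports the file's properties. `describe_voice_sample` and `WavInfo` are hypothetical names. The standard-library `wave` module used here reads integer PCM WAV files only.

```python
import wave
from dataclasses import dataclass


@dataclass
class WavInfo:
    channels: int
    sample_rate: int
    sample_width_bytes: int
    duration_seconds: float


def describe_voice_sample(path: str) -> WavInfo:
    """Read the header of a PCM WAV file (hypothetical pre-check for voice cloning)."""
    # wave.open raises wave.Error for malformed or unsupported (e.g. float) WAV files.
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return WavInfo(
            channels=wav.getnchannels(),
            sample_rate=rate,
            sample_width_bytes=wav.getsampwidth(),
            duration_seconds=frames / rate,
        )
```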
## Options
| Option | Description | Default |
|--------|-------------|---------|
| `text` | Text to convert | Required |
| `-o, --output` | Output WAV file | `output.wav` |
| `-v, --voice` | Voice preset | `alba` |
| `-s, --speed` | Speech speed (0.5-2.0) | `1.0` |
| `--voice-file` | Custom WAV for cloning | None |
| `--serve` | Start HTTP server | False |
| `--list-voices` | List all voices | False |
## Requirements
- Python 3.10-3.14
- PyTorch 2.5+ (CPU version works)
- Works on 2 CPU cores
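Here is a quick way to check an environment against these requirements. This sketch assumes the Python range is inclusive of 3.14. It reads the installed PyTorch version through `importlib.metadata` (distribution name `torch`) without importing torch. The version comparison is deliberately simple: it drops local suffixes such as `+cpu` and compares only major and minor numbers.

```python
import sys
from importlib import metadata


def python_supported(version: tuple[int, int]) -> bool:
    """True if (major, minor) is within Python 3.10-3.14 inclusive."""
    return (3, 10) <= version <= (3, 14)


def torch_supported(torch_version: str) -> bool:
    """True if a version string such as "2.5.1+cpu" is at least 2.5."""
    release = torch_version.split("+")[0]
    major, minor = (int(part) for part in release.split(".")[:2])
    return (major, minor) >= (2, 5)


def check_environment() -> list[str]:
    """Return the problems found in the current interpreter (empty list if none)."""
    problems = []
    if not python_supported(tuple(sys.version_info[:2])):
        problems.append(f"Python {sys.version_info.major}.{sys.version_info.minor} is outside 3.10-3.14")
    try:
        if not torch_supported(metadata.version("torch")):
            problems.append("PyTorch 2.5+ is required")
    except metadata.PackageNotFoundError:
        problems.append("PyTorch is not installed")
    return problems
```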
## Notes
- ⚠️ Model is gated - accept license on Hugging Face first
- 🌍 English language only (v1)
- 💾 First run downloads model (~100M parameters)
- 🔊 Audio is returned as 1D torch tensor (PCM data)
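`scipy.io.wavfile.write` keeps the dtype it is given, so floating-point samples are written as a 32-bit float WAV. Some players handle that format poorly. If you need 16-bit PCM instead, a stdlib-only sketch follows. It assumes the tensor holds mono float samples nominally in [-1.0, 1.0]. That range is an assumption, not something this document states. `write_pcm16_wav` is a hypothetical helper, not part of pocket-tts.

```python
import struct
import wave
from typing import Sequence


def write_pcm16_wav(path: str, samples: Sequence[float], sample_rate: int) -> None:
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file."""
    # Clip out-of-range values, then scale to the signed 16-bit range.
    ints = [int(round(max(-1.0, min(1.0, s)) * 32767)) for s in samples]
    frames = struct.pack(f"<{len(ints)}h", *ints)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)  # mono, matching a 1D tensor
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(frames)
```

With the Python API example, you would call `write_pcm16_wav("output.wav", audio.tolist(), tts_model.sample_rate)`.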
## Links
- [Demo](https://kyutai.org/tts)
- [GitHub](https://github.com/kyutai-labs/pocket-tts)
- [Hugging Face](https://huggingface.co/kyutai/pocket-tts)
- [Paper](https://arxiv.org/abs/2509.06926)