pocket-tts

533 stars

by sundial-org


Best use case

pocket-tts is best used when an AI agent needs a repeatable, local text-to-speech workflow instead of a one-off prompt.

Teams using pocket-tts should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/pocket-tts/SKILL.md --create-dirs "https://raw.githubusercontent.com/sundial-org/awesome-openclaw-skills/main/skills/pocket-tts/SKILL.md"

Manual Installation

1. Download SKILL.md from GitHub.
2. Place it in .claude/skills/pocket-tts/SKILL.md inside your project.
3. Restart your AI agent; it will auto-discover the skill.

How pocket-tts Compares

| Feature / Agent | pocket-tts | Standard Approach |
|-----------------|------------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

This skill lets your AI agent generate speech from text locally, using Kyutai's Pocket TTS model. See the SKILL.md Source section for full details.

Where can I find the source code?

The skill's source is on GitHub in the sundial-org/awesome-openclaw-skills repository, under skills/pocket-tts.

SKILL.md Source

# Pocket TTS Skill

Fully local, offline text-to-speech using Kyutai's Pocket TTS model. Generate high-quality audio from text without any API calls or internet connection. It includes 8 built-in voices, supports voice cloning, and runs entirely on CPU.

## Features

- 🎯 **Fully local** - No API calls, runs completely offline
- 🚀 **CPU-only** - No GPU required, works on any computer
- ⚡ **Fast generation** - ~2-6x real-time on CPU
- 🎤 **8 built-in voices** - alba, marius, javert, jean, fantine, cosette, eponine, azelma
- 🎭 **Voice cloning** - Clone any voice from a WAV sample
- 🔊 **Low latency** - ~200ms first audio chunk
- 📚 **Simple Python API** - Easy integration into any project

## Installation

```bash
# 1. Accept the model license on Hugging Face
# https://huggingface.co/kyutai/pocket-tts

# 2. Install the package
pip install pocket-tts

# Or use uv for automatic dependency management
uvx pocket-tts generate "Hello world"
```

## Usage

### CLI

```bash
# Basic usage
pocket-tts "Hello, I am your AI assistant"

# With specific voice
pocket-tts "Hello" --voice alba --output hello.wav

# With custom voice file (voice cloning)
pocket-tts "Hello" --voice-file myvoice.wav --output output.wav

# Adjust speed
pocket-tts "Hello" --speed 1.2

# Start local server
pocket-tts --serve

# List available voices
pocket-tts --list-voices
```

### Python API

```python
from pocket_tts import TTSModel
import scipy.io.wavfile

# Load model
tts_model = TTSModel.load_model()

# Get voice state
voice_state = tts_model.get_state_for_audio_prompt(
    "hf://kyutai/tts-voices/alba-mackenna/casual.wav"
)

# Generate audio
audio = tts_model.generate_audio(voice_state, "Hello world!")

# Save to WAV
scipy.io.wavfile.write("output.wav", tts_model.sample_rate, audio.numpy())

# Check sample rate
print(f"Sample rate: {tts_model.sample_rate} Hz")
```
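
If scipy is not installed, the same audio can be written with Python's standard `wave` module. The sketch below assumes the samples are floats in the range [-1.0, 1.0] (an assumption; the source only describes the audio as a 1D tensor of PCM data) and that they have been converted to a plain list first, for example with `audio.tolist()`. The helper name `write_wav_16bit` is hypothetical and not part of pocket-tts.

```python
import struct
import wave


def write_wav_16bit(path, samples, sample_rate):
    """Write mono float samples (assumed in [-1.0, 1.0]) as 16-bit PCM WAV."""
    frames = bytearray()
    for sample in samples:
        # Clip to the assumed valid range, then scale to signed 16-bit.
        clipped = max(-1.0, min(1.0, float(sample)))
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))
    return len(samples)
```

With the objects from the example above, a call might look like `write_wav_16bit("output.wav", audio.tolist(), tts_model.sample_rate)`.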

## Available Voices

| Voice | Description |
|-------|-------------|
| alba | Casual female voice |
| marius | Male voice |
| javert | Clear male voice |
| jean | Natural male voice |
| fantine | Female voice |
| cosette | Female voice |
| eponine | Female voice |
| azelma | Female voice |

Or use `--voice-file /path/to/wav.wav` for custom voice cloning.

## Options

| Option | Description | Default |
|--------|-------------|---------|
| `text` | Text to convert | Required |
| `-o, --output` | Output WAV file | `output.wav` |
| `-v, --voice` | Voice preset | `alba` |
| `-s, --speed` | Speech speed (0.5-2.0) | `1.0` |
| `--voice-file` | Custom WAV for cloning | None |
| `--serve` | Start HTTP server | False |
| `--list-voices` | List all voices | False |
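
When the CLI is driven from another program, the options table maps onto an argument list. The sketch below only builds and validates that list, using the flags, defaults, and speed range from the table. `build_command` and `BUILT_IN_VOICES` are hypothetical names, and the sketch assumes `--voice-file` is passed instead of `--voice` rather than alongside it. Nothing is executed.

```python
BUILT_IN_VOICES = (
    "alba", "marius", "javert", "jean",
    "fantine", "cosette", "eponine", "azelma",
)


def build_command(text, output="output.wav", voice="alba",
                  speed=1.0, voice_file=None):
    """Return an argv list for the CLI options listed in the table."""
    if not text:
        raise ValueError("text is required")
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be between 0.5 and 2.0")
    cmd = ["pocket-tts", text, "--output", output, "--speed", str(speed)]
    if voice_file is not None:
        # Assumption: a custom WAV sample replaces the preset voice.
        cmd += ["--voice-file", voice_file]
    else:
        if voice not in BUILT_IN_VOICES:
            raise ValueError(f"unknown voice: {voice}")
        cmd += ["--voice", voice]
    return cmd
```

The returned list can be handed to `subprocess.run(build_command("Hello", voice="marius"), check=True)` once the CLI is installed.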

## Requirements

- Python 3.10-3.14
- PyTorch 2.5+ (CPU version works)
- Works on 2 CPU cores

## Notes

- ⚠️ Model is gated - accept license on Hugging Face first
- 🌍 English language only (v1)
- 💾 First run downloads model (~100M parameters)
- 🔊 Audio is returned as 1D torch tensor (PCM data)
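
Because the audio is a 1D sequence of samples, its duration is the sample count divided by the sample rate. Dividing that duration by the wall-clock generation time gives the real-time factor quoted under Features. The helpers below are a hypothetical sketch for measuring this on your own machine, not part of pocket-tts.

```python
def audio_duration_seconds(num_samples, sample_rate):
    """Length of a mono clip in seconds."""
    return num_samples / sample_rate


def realtime_factor(num_samples, sample_rate, elapsed_seconds):
    """Seconds of audio produced per second of wall-clock time."""
    return audio_duration_seconds(num_samples, sample_rate) / elapsed_seconds
```

For example, with made-up numbers, 48,000 samples at 24,000 Hz generated in 0.5 seconds gives a factor of 4.0. In practice the elapsed time could come from wrapping the `generate_audio` call in `time.perf_counter()`, with the sample count taken from `audio.shape[0]`.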

## Links

- [Demo](https://kyutai.org/tts)
- [GitHub](https://github.com/kyutai-labs/pocket-tts)
- [Hugging Face](https://huggingface.co/kyutai/pocket-tts)
- [Paper](https://arxiv.org/abs/2509.06926)
