google-tts

Convert documents and text to audio using Google Cloud Text-to-Speech. Use this skill when the user wants to: narrate a document, read aloud text, generate audio from a file, convert text to speech, create a recording of documentation or analysis, create a podcast from a document, or use Google TTS/text-to-speech. Trigger phrases: "read this aloud", "narrate this", "create a recording", "text to speech", "TTS", "convert to audio", "audio from document", "listen to this", "generate audio", "google tts", "create a podcast".

23 stars

Best use case

google-tts is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Convert documents and text to audio using Google Cloud Text-to-Speech. Use this skill when the user wants to: narrate a document, read aloud text, generate audio from a file, convert text to speech, create a recording of documentation or analysis, create a podcast from a document, or use Google TTS/text-to-speech. Trigger phrases: "read this aloud", "narrate this", "create a recording", "text to speech", "TTS", "convert to audio", "audio from document", "listen to this", "generate audio", "google tts", "create a podcast".

Teams using google-tts should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/google-tts/SKILL.md --create-dirs "https://raw.githubusercontent.com/christophacham/agent-skills-library/main/skills/media-production/google-tts/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/google-tts/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How google-tts Compares

Feature / Agentgoogle-ttsStandard Approach
Platform SupportNot specifiedLimited / Varies
Context Awareness High Baseline
Installation ComplexityUnknownN/A

Frequently Asked Questions

What does this skill do?

Convert documents and text to audio using Google Cloud Text-to-Speech. Use this skill when the user wants to: narrate a document, read aloud text, generate audio from a file, convert text to speech, create a recording of documentation or analysis, create a podcast from a document, or use Google TTS/text-to-speech. Trigger phrases: "read this aloud", "narrate this", "create a recording", "text to speech", "TTS", "convert to audio", "audio from document", "listen to this", "generate audio", "google tts", "create a podcast".

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Google Cloud Text-to-Speech

Converts text and documents into audio using Google Cloud TTS API. Supports Neural2, WaveNet, Studio, and Standard voices across 40+ languages.

## Setup

API key via `GOOGLE_TTS_API_KEY` env var or `skills/google-tts/config.json` with `{"api_key": "..."}`.
Requires `ffmpeg` for multi-chunk documents. Optional: `pip install PyPDF2 python-docx` for PDF/DOCX.

## Commands

### List Voices

```bash
python skills/google-tts/scripts/google_tts.py voices --language en-US --type Neural2
python skills/google-tts/scripts/google_tts.py voices --json
```

### Text-to-Speech

```bash
# From text or document (PDF, DOCX, MD, TXT)
python skills/google-tts/scripts/google_tts.py tts --text "Hello world" --output ~/Downloads/hello.mp3
python skills/google-tts/scripts/google_tts.py tts --file /path/to/doc.pdf --output ~/Downloads/narration.mp3

# With voice, rate, pitch, encoding options
python skills/google-tts/scripts/google_tts.py tts --file doc.md --voice en-US-Neural2-F --rate 0.9 --encoding MP3 --output ~/Downloads/out.mp3
```

### Podcast Generation

Takes a JSON script with alternating speakers, synthesizes each with a different voice.

```json
[
  {"speaker": "host1", "text": "Welcome to our podcast!"},
  {"speaker": "host2", "text": "Thanks for having me..."}
]
```

```bash
python skills/google-tts/scripts/google_tts.py podcast --script /tmp/script.json --output ~/Downloads/podcast.mp3
python skills/google-tts/scripts/google_tts.py podcast --script /tmp/script.json --voice1 en-US-Neural2-J --voice2 en-US-Neural2-H --rate 0.9 --output ~/Downloads/podcast.mp3
```

## Workflow

### Single-Voice Narration

1. If user provides a file path, use `--file`. For generated content, write clean prose to `/tmp/tts_input.md` first.
2. Default voice: `en-US-Neural2-D` (male) or `en-US-Neural2-F` (female). Use Neural2 for best quality/cost balance.
3. Generate: `python skills/google-tts/scripts/google_tts.py tts --file /tmp/tts_input.md --output ~/Downloads/recording.mp3`
4. Report file location and size. Default output to `~/Downloads/`.

### Podcast from Document

1. Extract text: `python skills/google-tts/scripts/extract.py /path/to/document.pdf`
2. Generate a two-host conversation script as JSON:
   - Natural discussion, not verbatim reading. Host 1 leads, Host 2 reacts/analyzes.
   - Include intro and outro. Vary turn lengths. Keep turns under 4000 chars.
3. Write script to `/tmp/podcast_script.json`
4. Generate: `python skills/google-tts/scripts/google_tts.py podcast --script /tmp/podcast_script.json --output ~/Downloads/podcast.mp3`
5. Clean up temp files.

## Reference

- **Recommended voice type**: Neural2 (~$4/1M chars, high quality)
- **Speaking rate**: 0.25-4.0 (0.85-0.95 good for technical content)
- **Pitch**: -20.0 to 20.0 semitones
- **Encodings**: MP3 (default), LINEAR16 (.wav), OGG_OPUS (.ogg)
- API limit: 5000 bytes/request. Script auto-chunks at sentence boundaries.

Related Skills

google-calendar

23
from christophacham/agent-skills-library

Interact with Google Calendar - list calendars, view events, create/update/delete events, and find free time. Use when user asks to: check calendar, schedule a meeting, create an event, find available time, list upcoming events, delete or update a calendar event, or respond to meeting invitations. Lightweight alternative to full Google Workspace MCP server with standalone OAuth authentication.

googletasks-automation

23
from christophacham/agent-skills-library

Automate Google Tasks via Rube MCP (Composio): create, list, update, delete, move, and bulk-insert tasks and task lists. Always search tools first for current schemas.

googlesuper-automation

23
from christophacham/agent-skills-library

Automate Google Super tasks via Rube MCP (Composio). Always search tools first for current schemas.

googleslides-automation

23
from christophacham/agent-skills-library

Automate Google Slides tasks via Rube MCP (Composio): create presentations, add slides from Markdown, batch update, copy from templates, get thumbnails. Always search tools first for current schemas.

googlesheets-automation

23
from christophacham/agent-skills-library

Automate Google Sheets operations (read, write, format, filter, manage spreadsheets) via Rube MCP (Composio). Read/write data, manage tabs, apply formatting, and search rows programmatically.

googlephotos-automation

23
from christophacham/agent-skills-library

Automate Google Photos tasks via Rube MCP (Composio): upload media, manage albums, search photos, batch add items, create and update albums. Always search tools first for current schemas.

googlemeet-automation

23
from christophacham/agent-skills-library

Automate Google Meet tasks via Rube MCP (Composio): create Meet spaces, schedule video conferences via Calendar events, manage meeting access. Always search tools first for current schemas.

googledrive-automation

23
from christophacham/agent-skills-library

Automate Google Drive tasks via Rube MCP (Composio). Always search tools first for current schemas.

googledocs-automation

23
from christophacham/agent-skills-library

Automate Google Docs tasks via Rube MCP (Composio): create, edit, search, export, copy, and update documents. Always search tools first for current schemas.

googlecalendar-automation

23
from christophacham/agent-skills-library

Automate Google Calendar tasks via Rube MCP (Composio). Always search tools first for current schemas.

googlebigquery-automation

23
from christophacham/agent-skills-library

Automate Google BigQuery tasks via Rube MCP (Composio): run SQL queries, explore datasets and metadata, execute MBQL queries via Metabase integration. Always search tools first for current schemas.

googleads-automation

23
from christophacham/agent-skills-library

Automate Google Ads analytics tasks via Rube MCP (Composio): list Google Ads links, run GA4 reports, check compatibility, list properties and accounts. Always search tools first for current schemas.