edge-tts

Text-to-speech conversion using node-edge-tts npm package for generating audio from text. Supports multiple voices, languages, speed adjustment, pitch control, and subtitle generation. Use when: (1) User requests audio/voice output with the "tts" trigger or keyword. (2) Content needs to be spoken rather than read (multitasking, accessibility, driving, cooking). (3) User wants a specific voice, speed, pitch, or format for TTS output.

533 stars

bysundial-org

View on GitHub Installation ↓

Best use case

edge-tts is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using edge-tts should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/edge-tts/SKILL.md --create-dirs "https://raw.githubusercontent.com/sundial-org/awesome-openclaw-skills/main/skills/edge-tts/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/edge-tts/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How edge-tts Compares

Feature / Agent	edge-tts	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Edge-TTS Skill

## Overview

Generate high-quality text-to-speech audio using Microsoft Edge's neural TTS service via the node-edge-tts npm package. Supports multiple languages, voices, adjustable speed/pitch, and subtitle generation.

## Quick Start

When you detect TTS intent from triggers or user request:

1. **Call the tts tool** (Clawdbot built-in) to convert text to speech
2. The tool returns a MEDIA: path
3. Clawdbot routes the audio to the current channel

```javascript
// Example: Built-in tts tool usage
tts("Your text to convert to speech")
// Returns: MEDIA: /path/to/audio.mp3
```

## Trigger Detection

Recognize "tts" keyword as TTS requests. The skill automatically filters out TTS-related keywords from text before conversion to avoid converting the trigger words themselves to audio.

## Advanced Customization

### Using the Node.js Scripts

For more control, use the bundled scripts directly:

#### TTS Converter
```bash
cd scripts
npm install
node tts-converter.js "Your text" --voice en-US-AriaNeural --rate +10% --output output.mp3
```

**Options:**
- `--voice, -v`: Voice name (default: en-US-AriaNeural)
- `--lang, -l`: Language code (e.g., en-US, es-ES)
- `--format, -o`: Output format (default: audio-24khz-48kbitrate-mono-mp3)
- `--pitch`: Pitch adjustment (e.g., +10%, -20%, default)
- `--rate, -r`: Rate adjustment (e.g., +10%, -20%, default)
- `--volume`: Volume adjustment (e.g., +0%, -10%, default)
- `--save-subtitles, -s`: Save subtitles as JSON file
- `--output, -f`: Output file path (default: tts_output.mp3)
- `--proxy, -p`: Proxy URL (e.g., http://localhost:7890)
- `--timeout`: Request timeout in milliseconds (default: 10000)
- `--list-voices, -L`: List available voices

#### Configuration Manager
```bash
cd scripts
npm install
node config-manager.js --set-voice en-US-AriaNeural

node config-manager.js --set-rate +10%

node config-manager.js --get

node config-manager.js --reset
```

### Voice Selection

Common voices (use `--list-voices` for full list):

**English:**
- `en-US-MichelleNeural` (female, natural, **default**)
- `en-US-AriaNeural` (female, natural)
- `en-US-GuyNeural` (male, natural)
- `en-GB-SoniaNeural` (female, British)
- `en-GB-RyanNeural` (male, British)

**Other Languages:**
- `es-ES-ElviraNeural` (Spanish, Spain)
- `fr-FR-DeniseNeural` (French)
- `de-DE-KatjaNeural` (German)
- `ja-JP-NanamiNeural` (Japanese)
- `zh-CN-XiaoxiaoNeural` (Chinese)
- `ar-SA-ZariyahNeural` (Arabic)

### Rate Guidelines

Rate values use percentage format:
- `"default"`: Normal speed
- `"-20%"` to `"-10%"`: Slow, clear (tutorials, stories, accessibility)
- `"+10%"` to `"+20%"`: Slightly fast (summaries)
- `"+30%"` to `"+50%"`: Fast (news, efficiency)

### Output Formats

Choose audio quality based on use case:
- `audio-24khz-48kbitrate-mono-mp3`: Standard quality (voice notes, messages)
- `audio-24khz-96kbitrate-mono-mp3`: High quality (presentations, content)
- `audio-48khz-96kbitrate-stereo-mp3`: Highest quality (professional audio, music)

## Resources

### scripts/tts-converter.js
Main TTS conversion script using node-edge-tts. Generates audio files with customizable voice, rate, volume, pitch, and format. Supports subtitle generation and voice listing.

### scripts/config-manager.js
Manages persistent user preferences for TTS settings (voice, language, format, pitch, rate, volume). Stores config in `~/.tts-config.json`.

### scripts/package.json
NPM package configuration with node-edge-tts dependency.

### references/node_edge_tts_guide.md
Complete documentation for node-edge-tts npm package including:
- Full voice list by language
- Prosody options (rate, pitch, volume)
- Usage examples (CLI and Module)
- Subtitle generation
- Output formats
- Best practices and limitations

### Voice Testing
Test different voices and preview audio quality at: https://tts.travisvn.com/

Refer to this when you need specific voice details or advanced features.

## Installation

To use the bundled scripts:

```bash
cd /home/user/clawd/skills/public/tts-skill/scripts
npm install
```

This installs:
- `node-edge-tts` - TTS library
- `commander` - CLI argument parsing

## Workflow

1. **Detect intent**: Check for "tts" trigger or keyword in user message
2. **Choose method**: Use built-in `tts` tool for simple requests, or `scripts/tts-converter.js` for customization
3. **Generate audio**: Convert the target text (message, search results, summary)
4. **Return to user**: The tts tool returns a MEDIA: path; Clawdbot handles delivery

## Testing

### Basic Test
Run the test script to verify TTS functionality:
```bash
cd /home/user/clawd/skills/public/edge-tts/scripts
npm test
```
This generates a test audio file and verifies the TTS service is working.

### Voice Testing
Test different voices and preview audio quality at: https://tts.travisvn.com/

### Integration Test
Use the built-in `tts` tool for quick testing:
```javascript
// Example: Test TTS with default settings
tts("This is a test of the TTS functionality.")
```

### Configuration Test
Verify configuration persistence:
```bash
cd /home/user/clawd/skills/public/edge-tts/scripts
node config-manager.js --get
node config-manager.js --set-voice en-US-GuyNeural
node config-manager.js --get
```

## Troubleshooting

- **Test connectivity**: Run `npm test` to check if TTS service is accessible
- **Check voice availability**: Use `node tts-converter.js --list-voices` to see available voices
- **Verify proxy settings**: If using proxy, test with `node tts-converter.js "test" --proxy http://localhost:7890`
- **Check audio output**: The test should generate `test-output.mp3` in the scripts directory

## Notes

- node-edge-tts uses Microsoft Edge's online TTS service (updated, working authentication)
- No API key needed (free service)
- Output is MP3 format by default
- Requires internet connection
- Supports subtitle generation (JSON format with word-level timing)
- **Temporary File Handling**: By default, audio files are saved to the system's temporary directory (`/tmp/edge-tts-temp/` on Unix, `C:\Users\<user>\AppData\Local\Temp\edge-tts-temp\` on Windows) with unique filenames (e.g., `tts_1234567890_abc123.mp3`). Files are not automatically deleted - the calling application (Clawdbot) should handle cleanup after use. You can specify a custom output path with the `--output` option if permanent storage is needed.
- **TTS keyword filtering**: The skill automatically filters out TTS-related keywords (tts, TTS, text-to-speech) from text before conversion to avoid converting the trigger words themselves to audio
- For repeated preferences, use `config-manager.js` to set defaults
- **Default voice**: `en-US-MichelleNeural` (female, natural)
- Neural voices (ending in `Neural`) provide higher quality than Standard voices

Related Skills

portfolio-watcher

533

from sundial-org/awesome-openclaw-skills

Monitor stock/crypto holdings, get price alerts, track portfolio performance

portainer

533

from sundial-org/awesome-openclaw-skills

Control Docker containers and stacks via Portainer API. List containers, start/stop/restart, view logs, and redeploy stacks from git.

portable-tools

533

from sundial-org/awesome-openclaw-skills

Build cross-device tools without hardcoding paths or account names

polymarket

533

from sundial-org/awesome-openclaw-skills

Trade prediction markets on Polymarket. Analyze odds, place bets, track positions, automate alerts, and maximize returns from event outcomes. Covers sports, politics, entertainment, and more.

polymarket-traiding-bot

533

from sundial-org/awesome-openclaw-skills

No description provided.

polymarket-analysis

533

from sundial-org/awesome-openclaw-skills

Analyze Polymarket prediction markets for trading edges. Pair Cost arbitrage, whale tracking, sentiment analysis, momentum signals, user profile tracking. No execution.

polymarket-agent

533

from sundial-org/awesome-openclaw-skills

Autonomous prediction market agent - analyzes markets, researches news, and identifies trading opportunities

polymarket-5

533

from sundial-org/awesome-openclaw-skills

Query Polymarket prediction markets. Use for questions about prediction markets, betting odds, market prices, event probabilities, or when user asks about Polymarket data.

polymarket-4

533

from sundial-org/awesome-openclaw-skills

Query Polymarket prediction markets. Use for questions about prediction markets, betting odds, market prices, event probabilities, or when user asks about Polymarket data.

polymarket-3

533

from sundial-org/awesome-openclaw-skills

Query Polymarket prediction market odds and events via CLI. Search for markets, get current prices, list events by category. Supports sports betting (NFL, NBA, soccer/EPL, Champions League), politics, crypto, elections, geopolitics. Real money markets = more accurate than polls. No API key required. Use when asked about odds, probabilities, predictions, or "what are the chances of X".

polymarket-2

533

from sundial-org/awesome-openclaw-skills

Query Polymarket prediction markets - check odds, trending markets, search events, track prices.

pollinations

533

from sundial-org/awesome-openclaw-skills

Pollinations.ai API for AI generation - text, images, videos, audio, and analysis. Use when user requests AI-powered generation (text completion, images, videos, audio, vision/analysis, transcription) or mentions Pollinations. Supports 25+ models (OpenAI, Claude, Gemini, Flux, Veo, etc.) with OpenAI-compatible chat endpoint and specialized generation endpoints.