edge-tts
Text-to-speech conversion using node-edge-tts npm package for generating audio from text. Supports multiple voices, languages, speed adjustment, pitch control, and subtitle generation. Use when: (1) User requests audio/voice output with the "tts" trigger or keyword. (2) Content needs to be spoken rather than read (multitasking, accessibility, driving, cooking). (3) User wants a specific voice, speed, pitch, or format for TTS output.
Best use case
edge-tts is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Text-to-speech conversion using node-edge-tts npm package for generating audio from text. Supports multiple voices, languages, speed adjustment, pitch control, and subtitle generation. Use when: (1) User requests audio/voice output with the "tts" trigger or keyword. (2) Content needs to be spoken rather than read (multitasking, accessibility, driving, cooking). (3) User wants a specific voice, speed, pitch, or format for TTS output.
Teams using edge-tts should expect a more consistent output, faster repeated execution, less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in
.claude/skills/edge-tts/SKILL.mdinside your project - Restart your AI agent — it will auto-discover the skill
How edge-tts Compares
| Feature / Agent | edge-tts | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Text-to-speech conversion using node-edge-tts npm package for generating audio from text. Supports multiple voices, languages, speed adjustment, pitch control, and subtitle generation. Use when: (1) User requests audio/voice output with the "tts" trigger or keyword. (2) Content needs to be spoken rather than read (multitasking, accessibility, driving, cooking). (3) User wants a specific voice, speed, pitch, or format for TTS output.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Edge-TTS Skill
## Overview
Generate high-quality text-to-speech audio using Microsoft Edge's neural TTS service via the node-edge-tts npm package. Supports multiple languages, voices, adjustable speed/pitch, and subtitle generation.
## Quick Start
When you detect TTS intent from triggers or user request:
1. **Call the tts tool** (Clawdbot built-in) to convert text to speech
2. The tool returns a MEDIA: path
3. Clawdbot routes the audio to the current channel
```javascript
// Example: Built-in tts tool usage
tts("Your text to convert to speech")
// Returns: MEDIA: /path/to/audio.mp3
```
## Trigger Detection
Recognize "tts" keyword as TTS requests. The skill automatically filters out TTS-related keywords from text before conversion to avoid converting the trigger words themselves to audio.
## Advanced Customization
### Using the Node.js Scripts
For more control, use the bundled scripts directly:
#### TTS Converter
```bash
cd scripts
npm install
node tts-converter.js "Your text" --voice en-US-AriaNeural --rate +10% --output output.mp3
```
**Options:**
- `--voice, -v`: Voice name (default: en-US-AriaNeural)
- `--lang, -l`: Language code (e.g., en-US, es-ES)
- `--format, -o`: Output format (default: audio-24khz-48kbitrate-mono-mp3)
- `--pitch`: Pitch adjustment (e.g., +10%, -20%, default)
- `--rate, -r`: Rate adjustment (e.g., +10%, -20%, default)
- `--volume`: Volume adjustment (e.g., +0%, -10%, default)
- `--save-subtitles, -s`: Save subtitles as JSON file
- `--output, -f`: Output file path (default: tts_output.mp3)
- `--proxy, -p`: Proxy URL (e.g., http://localhost:7890)
- `--timeout`: Request timeout in milliseconds (default: 10000)
- `--list-voices, -L`: List available voices
#### Configuration Manager
```bash
cd scripts
npm install
node config-manager.js --set-voice en-US-AriaNeural
node config-manager.js --set-rate +10%
node config-manager.js --get
node config-manager.js --reset
```
### Voice Selection
Common voices (use `--list-voices` for full list):
**English:**
- `en-US-MichelleNeural` (female, natural, **default**)
- `en-US-AriaNeural` (female, natural)
- `en-US-GuyNeural` (male, natural)
- `en-GB-SoniaNeural` (female, British)
- `en-GB-RyanNeural` (male, British)
**Other Languages:**
- `es-ES-ElviraNeural` (Spanish, Spain)
- `fr-FR-DeniseNeural` (French)
- `de-DE-KatjaNeural` (German)
- `ja-JP-NanamiNeural` (Japanese)
- `zh-CN-XiaoxiaoNeural` (Chinese)
- `ar-SA-ZariyahNeural` (Arabic)
### Rate Guidelines
Rate values use percentage format:
- `"default"`: Normal speed
- `"-20%"` to `"-10%"`: Slow, clear (tutorials, stories, accessibility)
- `"+10%"` to `"+20%"`: Slightly fast (summaries)
- `"+30%"` to `"+50%"`: Fast (news, efficiency)
### Output Formats
Choose audio quality based on use case:
- `audio-24khz-48kbitrate-mono-mp3`: Standard quality (voice notes, messages)
- `audio-24khz-96kbitrate-mono-mp3`: High quality (presentations, content)
- `audio-48khz-96kbitrate-stereo-mp3`: Highest quality (professional audio, music)
## Resources
### scripts/tts-converter.js
Main TTS conversion script using node-edge-tts. Generates audio files with customizable voice, rate, volume, pitch, and format. Supports subtitle generation and voice listing.
### scripts/config-manager.js
Manages persistent user preferences for TTS settings (voice, language, format, pitch, rate, volume). Stores config in `~/.tts-config.json`.
### scripts/package.json
NPM package configuration with node-edge-tts dependency.
### references/node_edge_tts_guide.md
Complete documentation for node-edge-tts npm package including:
- Full voice list by language
- Prosody options (rate, pitch, volume)
- Usage examples (CLI and Module)
- Subtitle generation
- Output formats
- Best practices and limitations
### Voice Testing
Test different voices and preview audio quality at: https://tts.travisvn.com/
Refer to this when you need specific voice details or advanced features.
## Installation
To use the bundled scripts:
```bash
cd /home/user/clawd/skills/public/tts-skill/scripts
npm install
```
This installs:
- `node-edge-tts` - TTS library
- `commander` - CLI argument parsing
## Workflow
1. **Detect intent**: Check for "tts" trigger or keyword in user message
2. **Choose method**: Use built-in `tts` tool for simple requests, or `scripts/tts-converter.js` for customization
3. **Generate audio**: Convert the target text (message, search results, summary)
4. **Return to user**: The tts tool returns a MEDIA: path; Clawdbot handles delivery
## Testing
### Basic Test
Run the test script to verify TTS functionality:
```bash
cd /home/user/clawd/skills/public/edge-tts/scripts
npm test
```
This generates a test audio file and verifies the TTS service is working.
### Voice Testing
Test different voices and preview audio quality at: https://tts.travisvn.com/
### Integration Test
Use the built-in `tts` tool for quick testing:
```javascript
// Example: Test TTS with default settings
tts("This is a test of the TTS functionality.")
```
### Configuration Test
Verify configuration persistence:
```bash
cd /home/user/clawd/skills/public/edge-tts/scripts
node config-manager.js --get
node config-manager.js --set-voice en-US-GuyNeural
node config-manager.js --get
```
## Troubleshooting
- **Test connectivity**: Run `npm test` to check if TTS service is accessible
- **Check voice availability**: Use `node tts-converter.js --list-voices` to see available voices
- **Verify proxy settings**: If using proxy, test with `node tts-converter.js "test" --proxy http://localhost:7890`
- **Check audio output**: The test should generate `test-output.mp3` in the scripts directory
## Notes
- node-edge-tts uses Microsoft Edge's online TTS service (updated, working authentication)
- No API key needed (free service)
- Output is MP3 format by default
- Requires internet connection
- Supports subtitle generation (JSON format with word-level timing)
- **Temporary File Handling**: By default, audio files are saved to the system's temporary directory (`/tmp/edge-tts-temp/` on Unix, `C:\Users\<user>\AppData\Local\Temp\edge-tts-temp\` on Windows) with unique filenames (e.g., `tts_1234567890_abc123.mp3`). Files are not automatically deleted - the calling application (Clawdbot) should handle cleanup after use. You can specify a custom output path with the `--output` option if permanent storage is needed.
- **TTS keyword filtering**: The skill automatically filters out TTS-related keywords (tts, TTS, text-to-speech) from text before conversion to avoid converting the trigger words themselves to audio
- For repeated preferences, use `config-manager.js` to set defaults
- **Default voice**: `en-US-MichelleNeural` (female, natural)
- Neural voices (ending in `Neural`) provide higher quality than Standard voicesRelated Skills
portfolio-watcher
Monitor stock/crypto holdings, get price alerts, track portfolio performance
portainer
Control Docker containers and stacks via Portainer API. List containers, start/stop/restart, view logs, and redeploy stacks from git.
portable-tools
Build cross-device tools without hardcoding paths or account names
polymarket
Trade prediction markets on Polymarket. Analyze odds, place bets, track positions, automate alerts, and maximize returns from event outcomes. Covers sports, politics, entertainment, and more.
polymarket-traiding-bot
No description provided.
polymarket-analysis
Analyze Polymarket prediction markets for trading edges. Pair Cost arbitrage, whale tracking, sentiment analysis, momentum signals, user profile tracking. No execution.
polymarket-agent
Autonomous prediction market agent - analyzes markets, researches news, and identifies trading opportunities
polymarket-5
Query Polymarket prediction markets. Use for questions about prediction markets, betting odds, market prices, event probabilities, or when user asks about Polymarket data.
polymarket-4
Query Polymarket prediction markets. Use for questions about prediction markets, betting odds, market prices, event probabilities, or when user asks about Polymarket data.
polymarket-3
Query Polymarket prediction market odds and events via CLI. Search for markets, get current prices, list events by category. Supports sports betting (NFL, NBA, soccer/EPL, Champions League), politics, crypto, elections, geopolitics. Real money markets = more accurate than polls. No API key required. Use when asked about odds, probabilities, predictions, or "what are the chances of X".
polymarket-2
Query Polymarket prediction markets - check odds, trending markets, search events, track prices.
pollinations
Pollinations.ai API for AI generation - text, images, videos, audio, and analysis. Use when user requests AI-powered generation (text completion, images, videos, audio, vision/analysis, transcription) or mentions Pollinations. Supports 25+ models (OpenAI, Claude, Gemini, Flux, Veo, etc.) with OpenAI-compatible chat endpoint and specialized generation endpoints.