audio-reply

Generate audio replies using TTS. Trigger with "read it to me [URL]" to fetch and read content aloud, or "talk to me [topic]" to generate a spoken response. Also responds to "speak", "say it", "voice reply".

by sundial-org

Best use case

audio-reply is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using audio-reply can expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/audio-reply/SKILL.md --create-dirs "https://raw.githubusercontent.com/sundial-org/awesome-openclaw-skills/main/skills/audio-reply/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/audio-reply/SKILL.md inside your project
Restart your AI agent so that it auto-discovers the skill

How audio-reply Compares

Feature / Agent	audio-reply	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

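What does this skill do?

It generates spoken audio replies using MLX Audio TTS (chatterbox-turbo model). It can fetch a URL and read its content aloud ("read it to me [URL]"), or speak a conversational response on a topic ("talk to me [topic]").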

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Audio Reply Skill

Generate spoken audio responses using MLX Audio TTS (chatterbox-turbo model).

## Trigger Phrases

- **"read it to me [URL]"** - Fetch content from URL and read it aloud
- **"talk to me [topic/question]"** - Generate a conversational response as audio
- **"speak"**, **"say it"**, **"voice reply"** - Convert your response to audio
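The routing implied by these triggers can be sketched in shell. This is only an illustration: in practice the agent matches the phrases itself, and the function name `detect_mode` and the mode labels it prints are hypothetical, not part of the skill.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a user message to one of the skill's modes.
detect_mode() {
  local msg
  # Lowercase the message so matching is case-insensitive.
  msg="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  case "$msg" in
    "read it to me "*)                  echo "read-url" ;;
    "talk to me"*)                      echo "talk" ;;
    *speak*|*"say it"*|*"voice reply"*) echo "voice-reply" ;;
    *)                                  echo "none" ;;
  esac
}

detect_mode "Read it to me https://example.com/article"   # prints: read-url
detect_mode "talk to me about the weather today"          # prints: talk
detect_mode "say it"                                      # prints: voice-reply
```

The substring patterns are deliberately loose, so a message that merely contains "speak" would also match; a stricter matcher may be preferable.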

## How to Use

### Mode 1: Read URL Content
```
User: read it to me https://example.com/article
```
1. Fetch the URL content using WebFetch
2. Extract readable text (strip HTML, focus on main content)
3. Generate audio using TTS
4. Play the audio and delete the file afterward

### Mode 2: Conversational Audio Response
```
User: talk to me about the weather today
```
1. Generate a natural, conversational response
2. Keep it concise (TTS works best with shorter segments)
3. Convert to audio, play it, then delete the file

## Implementation

### TTS Command
```bash
uv run mlx_audio.tts.generate \
  --model mlx-community/chatterbox-turbo-fp16 \
  --text "Your text here" \
  --play \
  --file_prefix /tmp/audio_reply
```

### Key Parameters
- `--model mlx-community/chatterbox-turbo-fp16` - Fast, natural voice
- `--play` - Auto-play the generated audio
- `--file_prefix` - Save to temp location for cleanup
- `--exaggeration 0.3` - Optional: add expressiveness (0.0-1.0)
- `--speed 1.0` - Adjust speech rate if needed

### Text Preparation Guidelines

**For "read it to me" mode:**
1. Fetch URL with WebFetch tool
2. Extract main content, strip navigation/ads/boilerplate
3. Summarize if very long (>500 words) - keep key points
4. Add natural pauses with periods and commas

**For "talk to me" mode:**
1. Write conversationally, as if speaking
2. Use contractions (I'm, you're, it's)
3. Add filler words sparingly for naturalness ([chuckle], um, anyway)
4. Keep responses under 200 words for best quality
5. Avoid technical jargon unless explaining it
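The word limits in these guidelines (500 words before summarizing, 200 words for conversational replies) can be checked before calling TTS. The following is a minimal sketch; the helper names `word_count` and `exceeds_limit` are hypothetical and not part of the skill.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count whitespace-separated words in a string.
word_count() {
  # wc -w pads its output on some systems, so strip whitespace with tr.
  printf '%s' "$1" | wc -w | tr -d '[:space:]'
}

# Hypothetical helper: succeed (exit 0) when the text has more words than the limit.
exceeds_limit() {
  local text="$1" limit="$2"
  [ "$(word_count "$text")" -gt "$limit" ]
}

article="Some fetched article text goes here."
if exceeds_limit "$article" 500; then
  echo "summarize before speaking"
else
  echo "short enough to read as-is"
fi
```

For "talk to me" mode the same check could be run with a limit of 200.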

### Audio Generation & Cleanup (IMPORTANT)

Always delete the audio file after playing - it's already in the chat history.

```bash
# Generate with unique filename and play
OUTPUT_FILE="/tmp/audio_reply_$(date +%s)"
uv run mlx_audio.tts.generate \
  --model mlx-community/chatterbox-turbo-fp16 \
  --text "Your response text" \
  --play \
  --file_prefix "$OUTPUT_FILE"

# ALWAYS clean up after playing
rm -f "${OUTPUT_FILE}"*.wav 2>/dev/null
```
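If the TTS command fails partway through, a leftover file can remain. One way to make cleanup unconditional is an `EXIT` trap inside a subshell. This is a sketch under assumptions: the wrapper name `with_cleanup` is hypothetical, and the `touch` call is only a stand-in for the real TTS command.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: run a command that writes "$prefix"*.wav, then always clean up.
with_cleanup() {
  local prefix="$1"
  shift
  (
    # The EXIT trap fires when this subshell ends, whether the
    # wrapped command succeeded or failed.
    trap 'rm -f "${prefix}"*.wav' EXIT
    "$@"
  )
}

prefix="/tmp/audio_reply_$(date +%s)"
# Stand-in for the real TTS call: it just creates a .wav file.
with_cleanup "$prefix" touch "${prefix}_000.wav"
ls "${prefix}"*.wav 2>/dev/null || echo "cleaned up"
```

In real use the wrapped command would be the `uv run mlx_audio.tts.generate ...` invocation shown above, with `--file_prefix "$prefix"`. The wrapper returns the wrapped command's exit status, so failures can still be detected.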

### Error Handling

If TTS fails:
1. Check if model is downloaded (first run downloads ~500MB)
2. Ensure `uv` is installed and in PATH
3. Fall back to text response with apology

## Example Workflows

### Example 1: Read URL
```
User: read it to me https://blog.example.com/new-feature

Assistant actions:
1. WebFetch the URL
2. Extract article content
3. Generate TTS:
   uv run mlx_audio.tts.generate \
     --model mlx-community/chatterbox-turbo-fp16 \
     --text "Here's what I found... [article summary]" \
     --play --file_prefix /tmp/audio_reply_1706123456
4. Delete: rm -f /tmp/audio_reply_1706123456*.wav
5. Confirm: "Done reading the article to you."
```

### Example 2: Talk to Me
```
User: talk to me about what you can help with

Assistant actions:
1. Generate conversational response text
2. Generate TTS:
   uv run mlx_audio.tts.generate \
     --model mlx-community/chatterbox-turbo-fp16 \
     --text "Hey! So I can help you with all kinds of things..." \
     --play --file_prefix /tmp/audio_reply_1706123789
3. Delete: rm -f /tmp/audio_reply_1706123789*.wav
4. (No text output needed - audio IS the response)
```

## Notes

- First run may take longer as the model downloads (~500MB)
- Audio quality is best for English; other languages may vary
- For long content, consider chunking into multiple audio segments
- The `--play` flag uses system audio - ensure volume is up
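The chunking suggested in the notes can be sketched as follows. This is a minimal illustration that splits on word count only and ignores sentence boundaries; the helper name `chunk_words` is hypothetical, and any chunk size (such as 200 words) is an assumption rather than a documented limit.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the text as chunks of at most $2 words, one chunk per line.
chunk_words() {
  local text="$1" max="$2"
  # Put each word on its own line, then regroup them max at a time.
  printf '%s\n' "$text" | tr -s '[:space:]' '\n' | awk -v max="$max" '
    NF == 0 { next }                           # skip empty lines
    { line = (n ? line " " : "") $0; n++ }     # append the word to the current chunk
    n == max { print line; line = ""; n = 0 }  # chunk is full: emit it and reset
    END { if (n) print line }                  # emit the final partial chunk
  '
}

chunk_words "one two three four five" 2
# prints:
# one two
# three four
# five

# Each chunk could then be passed to the TTS command in turn, for example:
#   chunk_words "$long_text" 200 | while IFS= read -r chunk; do ...; done
```

Splitting at sentence ends instead would likely sound more natural, at the cost of a more involved splitter.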
