clawvox

ClawVox - ElevenLabs voice studio for OpenClaw. Generate speech, transcribe audio, clone voices, create sound effects, and more.

3,891 stars

by openclaw

Best use case

clawvox is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using clawvox should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

curl -o ~/.claude/skills/clawvox/SKILL.md --create-dirs "https://raw.githubusercontent.com/openclaw/skills/main/skills/abhishek-official1/clawvox/SKILL.md"

Manual Installation

1. Download SKILL.md from GitHub.
2. Place it in .claude/skills/clawvox/SKILL.md inside your project.
3. Restart your AI agent; it will auto-discover the skill.

How clawvox Compares

| Feature / Agent | clawvox | Standard Approach |
|-----------------|---------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

ClawVox is an ElevenLabs voice studio for OpenClaw. It generates speech, transcribes audio, clones voices, creates sound effects, dubs audio into other languages, and removes background noise.

Where can I find the source code?

The source code is on GitHub in the openclaw/skills repository, under skills/abhishek-official1/clawvox.

SKILL.md Source

# ClawVox

Transform your OpenClaw assistant into a professional voice production studio with ClawVox - powered by ElevenLabs.

## Quick Reference

| Action | Command | Description |
|--------|---------|-------------|
| Speak | `{baseDir}/scripts/speak.sh 'text'` | Convert text to speech |
| Transcribe | `{baseDir}/scripts/transcribe.sh audio.mp3` | Speech to text |
| Clone | `{baseDir}/scripts/clone.sh --name "Voice" sample.mp3` | Clone a voice |
| SFX | `{baseDir}/scripts/sfx.sh "thunder storm"` | Generate sound effects |
| Voices | `{baseDir}/scripts/voices.sh list` | List available voices |
| Dub | `{baseDir}/scripts/dub.sh --target es audio.mp3` | Translate audio |
| Isolate | `{baseDir}/scripts/isolate.sh audio.mp3` | Remove background noise |

## Setup

1. Get your API key from [elevenlabs.io/app/settings/api-keys](https://elevenlabs.io/app/settings/api-keys)
2. Configure in `~/.openclaw/openclaw.json`:

```json5
{
  skills: {
    entries: {
      "clawvox": {
        apiKey: "YOUR_ELEVENLABS_API_KEY",
        config: {
          defaultVoice: "Rachel",
          defaultModel: "eleven_turbo_v2_5",
          outputDir: "~/.openclaw/audio"
        }
      }
    }
  }
}
```

Or set the environment variable:
```bash
export ELEVENLABS_API_KEY="your_api_key_here"
```
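
Before running any script, it can help to confirm that the key is actually visible to the shell. The sketch below is a hypothetical helper, not part of ClawVox; the 20-character minimum is taken from the Troubleshooting section, and the function name `check_api_key` is an assumption made for illustration.

```bash
# Hypothetical helper (not shipped with ClawVox): fail early if the key looks unusable.
check_api_key() {
  local key="${ELEVENLABS_API_KEY:-}"
  if [ -z "$key" ]; then
    echo "ELEVENLABS_API_KEY not set" >&2
    return 1
  fi
  # Troubleshooting suggests that a valid key is at least 20 characters long.
  if [ "${#key}" -lt 20 ]; then
    echo "ELEVENLABS_API_KEY looks too short (${#key} characters)" >&2
    return 1
  fi
  return 0
}
```

Calling `check_api_key` at the top of a wrapper script returns non-zero, with a message on stderr, when the variable is missing or too short.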

## Voice Generation (TTS)

### Basic Text-to-Speech
```bash
# Quick speak with default voice (Rachel)
{baseDir}/scripts/speak.sh 'Hello, I am your personal AI assistant.'

# Specify voice by name
{baseDir}/scripts/speak.sh --voice Adam 'Hello from Adam'

# Save to file
{baseDir}/scripts/speak.sh --out ~/audio/greeting.mp3 'Welcome to the show'

# Use specific model
{baseDir}/scripts/speak.sh --model eleven_multilingual_v2 'Bonjour'

# Adjust voice settings
{baseDir}/scripts/speak.sh --stability 0.5 --similarity 0.8 'Expressive speech'

# Adjust speed
{baseDir}/scripts/speak.sh --speed 1.2 'Faster speech'

# Use multilingual model for other languages
{baseDir}/scripts/speak.sh --model eleven_multilingual_v2 --voice Rachel 'Hola, que tal'
{baseDir}/scripts/speak.sh --model eleven_multilingual_v2 --voice Adam 'Guten Tag'
```

### Voice Models

| Model | Latency | Languages | Best For |
|-------|---------|-----------|----------|
| `eleven_flash_v2_5` | ~75ms | 32 | Real-time, streaming |
| `eleven_turbo_v2_5` | ~250ms | 32 | Balanced quality/speed |
| `eleven_multilingual_v2` | ~500ms | 29 | Long-form, highest quality |

### Available Voices

Premade voices: Rachel, Adam, Antoni, Bella, Domi, Elli, Josh, Sam, Callum, Charlie, George, Liam, Matilda, Alice, Bill, Brian, Chris, Daniel, Eric, Jessica, Laura, Lily, River, Roger, Sarah, Will

### Long-Form Content
```bash
# Generate audio from text file
{baseDir}/scripts/speak.sh --input chapter.txt --voice "George" --out audiobook.mp3
```

## Speech-to-Text (Transcription)

### Basic Transcription
```bash
# Transcribe audio file
{baseDir}/scripts/transcribe.sh recording.mp3

# Save to file
{baseDir}/scripts/transcribe.sh --out transcript.txt audio.mp3

# Transcribe with language hint
{baseDir}/scripts/transcribe.sh --language es spanish_audio.mp3

# Include timestamps
{baseDir}/scripts/transcribe.sh --timestamps podcast.mp3
```

### Supported Formats
- MP3, MP4, MPEG, MPGA, M4A, WAV, WebM
- Maximum file size: 100MB
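
The limits above can be checked locally before uploading. This is a minimal sketch under stated assumptions: `can_transcribe` is a hypothetical helper, the format is judged from the file extension only, and 100MB is taken as 100 × 1024 × 1024 bytes, which may not match how the service counts.

```bash
# Hypothetical pre-flight check (not part of ClawVox) for transcription input.
can_transcribe() {
  local file="$1" ext size max
  [ -f "$file" ] || { echo "not a file: $file" >&2; return 1; }

  # Compare the lowercased extension against the formats listed above.
  ext=$(printf '%s' "${file##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    mp3|mp4|mpeg|mpga|m4a|wav|webm) ;;
    *) echo "unsupported format: .$ext" >&2; return 1 ;;
  esac

  # wc -c reports the size in bytes; the arithmetic expansion strips any padding.
  size=$(( $(wc -c < "$file") ))
  max=$((100 * 1024 * 1024))   # assumption: 100MB counted as 100 MiB
  if [ "$size" -gt "$max" ]; then
    echo "file too large: $size bytes" >&2
    return 1
  fi
}
```

A wrapper could run `can_transcribe recording.mp3` and only then call the transcription script.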

## Voice Cloning

### Instant Voice Clone
```bash
# Clone from single sample (minimum 30 seconds recommended)
{baseDir}/scripts/clone.sh --name MyVoice recording.mp3

# Clone with description
{baseDir}/scripts/clone.sh --name BusinessVoice \
  --description 'Professional male voice' \
  sample.mp3

# Clone with labels
{baseDir}/scripts/clone.sh --name MyVoice \
  --labels '{"gender":"male","age":"adult"}' \
  sample.mp3

# Remove background noise during cloning
{baseDir}/scripts/clone.sh --name CleanVoice \
  --remove-bg-noise \
  sample.mp3

# Test cloned voice
{baseDir}/scripts/speak.sh --voice MyVoice 'Testing my cloned voice'
```

## Voice Library Management

```bash
# List all available voices
{baseDir}/scripts/voices.sh list

# Get voice details
{baseDir}/scripts/voices.sh info --name Rachel
{baseDir}/scripts/voices.sh info --id 21m00Tcm4TlvDq8ikWAM

# Search voices (filter output with grep)
{baseDir}/scripts/voices.sh list | grep -i "female"

# Filter by category
{baseDir}/scripts/voices.sh list --category premade
{baseDir}/scripts/voices.sh list --category cloned

# Download voice preview
{baseDir}/scripts/voices.sh preview --name Rachel -o preview.mp3

# Delete custom voice
{baseDir}/scripts/voices.sh delete --id "voice_id"
```

## Sound Effects

```bash
# Generate sound effect
{baseDir}/scripts/sfx.sh 'Heavy rain on a tin roof'

# With duration
{baseDir}/scripts/sfx.sh --duration 5 'Forest ambiance with birds'

# With prompt influence (higher = more accurate)
{baseDir}/scripts/sfx.sh --influence 0.8 'Sci-fi laser gun firing'

# Save to file
{baseDir}/scripts/sfx.sh --out effects/thunder.mp3 'Rolling thunder'
```

**Note:** Duration range is 0.5 to 22 seconds (rounded to nearest 0.5)
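
To see what a requested duration becomes under this rule, the rounding can be sketched with awk. `normalize_sfx_duration` is a hypothetical name; clamping out-of-range values to 0.5 and 22 is an assumption, since the note does not say whether the script clamps or rejects them.

```bash
# Hypothetical helper: round a duration to the nearest 0.5 s and keep it in [0.5, 22].
normalize_sfx_duration() {
  awk -v d="$1" 'BEGIN {
    r = int(d * 2 + 0.5) / 2      # round to the nearest 0.5 (positive inputs)
    if (r < 0.5) r = 0.5          # assumption: clamp rather than reject
    if (r > 22)  r = 22
    printf "%.1f\n", r
  }'
}

normalize_sfx_duration 3.3   # prints 3.5
normalize_sfx_duration 30    # prints 22.0
```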

## Voice Isolation

```bash
# Remove background noise and isolate voice
{baseDir}/scripts/isolate.sh noisy_recording.mp3

# Save to specific file
{baseDir}/scripts/isolate.sh --out clean_voice.mp3 meeting_recording.mp3

# Don't tag audio events
{baseDir}/scripts/isolate.sh --no-audio-events recording.mp3
```

**Requirements:**
- Minimum duration: 4.6 seconds
- Supported formats: MP3, WAV, M4A, OGG, FLAC

## Dubbing (Multi-Language Translation)

```bash
# Dub audio to Spanish
{baseDir}/scripts/dub.sh --target es audio.mp3

# Dub with source language specified
{baseDir}/scripts/dub.sh --source en --target ja video.mp4

# Check dubbing status
{baseDir}/scripts/dub.sh --status --id "dubbing_id"

# Download dubbed audio
{baseDir}/scripts/dub.sh --download --id "dubbing_id" --out dubbed.mp3
```

**Supported languages:** en, es, fr, de, it, pt, pl, hi, ar, zh, ja, ko, nl, ru, tr, vi, sv, da, fi, cs, el, he, id, ms, no, ro, uk, hu, th
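
A target code can be checked against this list before a dubbing job is submitted. The sketch below copies the codes from the line above; `is_supported_dub_lang` and `SUPPORTED_DUB_LANGS` are hypothetical names, not part of the ClawVox scripts.

```bash
# Codes copied from the supported-languages list above.
SUPPORTED_DUB_LANGS="en es fr de it pt pl hi ar zh ja ko nl ru tr vi sv da fi cs el he id ms no ro uk hu th"

# Hypothetical helper: succeed only if the code appears as a whole word in the list.
is_supported_dub_lang() {
  case " $SUPPORTED_DUB_LANGS " in
    *" $1 "*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `is_supported_dub_lang ja && echo ok` prints `ok`, while an unlisted code such as `xx` makes the function return non-zero.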

## API Usage Examples

The scripts use curl under the hood, so the same requests can be made directly against the API:

```bash
# Direct TTS API call
curl -X POST "https://api.elevenlabs.io/v1/text-to-speech/VOICE_ID" \
  -H "xi-api-key: $ELEVENLABS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world", "model_id": "eleven_turbo_v2_5"}' \
  --output speech.mp3
```
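
A direct call like the one above writes whatever the server returns into the output file, including a JSON error body. The sketch below wraps the same request and captures the HTTP status with curl's `-w '%{http_code}'` so a failure can be detected. `tts_to_file` is a hypothetical function, and the caller is assumed to pass an already valid JSON payload.

```bash
# Hypothetical wrapper around the direct TTS call shown above.
# Usage: tts_to_file VOICE_ID JSON_PAYLOAD OUTPUT_FILE
tts_to_file() {
  local voice_id="$1" payload="$2" out="$3" status
  # -s silences progress output, -o writes the body to the file,
  # -w prints only the HTTP status code on stdout.
  status=$(curl -s -o "$out" -w '%{http_code}' \
    -X POST "https://api.elevenlabs.io/v1/text-to-speech/$voice_id" \
    -H "xi-api-key: $ELEVENLABS_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$payload")
  if [ "$status" != "200" ]; then
    # On failure the output file holds the error response, not audio.
    echo "request failed with HTTP $status" >&2
    return 1
  fi
}
```

A call would look like `tts_to_file VOICE_ID '{"text": "Hello world", "model_id": "eleven_turbo_v2_5"}' speech.mp3`, with `VOICE_ID` replaced by a real voice ID.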

## Error Handling

All scripts provide helpful error messages:

- **401**: Authentication failed - Check your API key
- **403**: Permission denied - Your API key may not have access
- **429**: Rate limit exceeded - Wait before trying again
- **500/502/503**: ElevenLabs API issues - Try again later
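
For the transient cases (429 and 5xx), the advice to wait and try again can be automated with a small retry loop. This is a sketch, not ClawVox behavior: `retry_on_failure` is a hypothetical helper, and it assumes the wrapped command exits non-zero on failure, which the source does not state for the scripts.

```bash
# Hypothetical helper: retry a command with a doubling delay between attempts.
# Usage: retry_on_failure MAX_ATTEMPTS INITIAL_DELAY_SECONDS command [args...]
retry_on_failure() {
  local max="$1" delay="$2" attempt=1
  shift 2
  while :; do
    "$@" && return 0                        # success: stop retrying
    [ "$attempt" -ge "$max" ] && return 1   # out of attempts: give up
    sleep "$delay"
    delay=$((delay * 2))                    # integer seconds only
    attempt=$((attempt + 1))
  done
}
```

For instance, `retry_on_failure 4 2 some_command arg` runs the command up to four times, sleeping 2, 4, and 8 seconds between attempts.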

## Testing

Run the test suite to verify everything works:

```bash
{baseDir}/test.sh YOUR_API_KEY
```

Or with environment variable:
```bash
export ELEVENLABS_API_KEY="your_key"
{baseDir}/test.sh
```

## Troubleshooting

### Common Issues

1. **"exec host not allowed (requested gateway)"**
   - The skill needs to run commands in a sandbox environment
   - Configure OpenClaw to use sandbox: `tools.exec.host: "sandbox"`
   - Or enable sandboxing in your OpenClaw config
   - Alternative: Configure exec approvals for gateway host (see OpenClaw docs)

2. **Parse errors with quotes or exclamation marks**
   - Use single quotes instead of double quotes: `'Hello world'` not `"Hello world!"`
   - Avoid exclamation marks (`!`) in text when using double quotes
   - For complex text, use the `--input` option with a file

3. **"ELEVENLABS_API_KEY not set"**
   - Ensure `ELEVENLABS_API_KEY` is set or configured in openclaw.json
   - Check that the API key is at least 20 characters long

4. **"jq is required but not installed"**
   - Install jq: `apt-get install jq` (Linux) or `brew install jq` (macOS)

5. **"Rate limited"**
   - Check your ElevenLabs plan quota at elevenlabs.io/app/usage
   - Free tier: ~10,000 characters/month

6. **"Voice not found"**
   - Use `{baseDir}/scripts/voices.sh list` to see available voices
   - Check if the voice ID is correct

7. **"Dubbing failed"**
   - Ensure source audio is clear and audible
   - Check supported language codes

8. **"File too large"**
   - Transcription: 100MB max
   - Dubbing: 500MB max
   - Voice cloning: 50MB per file
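
For the quoting problems in the list above, one way to sidestep shell parsing entirely is to write the text to a file and pass it with `--input`. The helper below is a hypothetical sketch; `text_to_input_file` is not part of ClawVox.

```bash
# Hypothetical helper: write text to a temporary file and print the file's path.
text_to_input_file() {
  local path
  path=$(mktemp) || return 1
  printf '%s\n' "$1" > "$path"
  printf '%s\n' "$path"
}

# Example (left commented out because {baseDir} is a placeholder):
# input_file=$(text_to_input_file 'Wow! She said "hello" and left.')
# {baseDir}/scripts/speak.sh --input "$input_file" --out greeting.mp3
```

Single quotes around the argument keep `!` and double quotes literal, and the script then reads the text from the file instead of the command line.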

### Debug Mode
```bash
# Enable verbose output
DEBUG=1 {baseDir}/scripts/speak.sh 'test'

# Show API request details
DEBUG=1 {baseDir}/scripts/transcribe.sh audio.mp3
```

## Pricing Notes

ElevenLabs API pricing (approximate):
- **Flash v2.5**: ~$0.06/min
- **Turbo v2.5**: ~$0.06/min
- **Multilingual v2**: ~$0.12/min
- **Voice cloning**: Included in plan
- **Sound effects**: ~$0.02/generation
- **Transcription**: ~$0.02/min (Scribe v1)

Free tier: ~10,000 characters/month
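
As a rough illustration of how these per-minute figures add up, the arithmetic can be sketched with awk. The rates are the approximate values listed above and may not reflect current ElevenLabs pricing; `estimate_cost` is a hypothetical helper.

```bash
# Hypothetical helper: estimated cost = minutes of audio * rate per minute (USD).
estimate_cost() {
  awk -v minutes="$1" -v rate="$2" 'BEGIN { printf "%.2f\n", minutes * rate }'
}

estimate_cost 10 0.12   # 10 minutes at the Multilingual v2 rate, prints 1.20
estimate_cost 45 0.06   # 45 minutes at the Turbo v2.5 rate, prints 2.70
```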

## Links

- [ElevenLabs Dashboard](https://elevenlabs.io/app)
- [API Documentation](https://elevenlabs.io/docs)
- [Voice Library](https://elevenlabs.io/voice-library)
- [Pricing](https://elevenlabs.io/pricing)
