elevenlabs-speech

Text-to-Speech and Speech-to-Text using ElevenLabs AI. Use when the user wants to convert text to speech, transcribe voice messages, or work with voice in multiple languages. Supports high-quality AI voices and accurate transcription.

7 stars

byDemerzels-lab

View on GitHub Installation ↓

Best use case

elevenlabs-speech is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using elevenlabs-speech should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/elevenlabs-voice/SKILL.md --create-dirs "https://raw.githubusercontent.com/Demerzels-lab/elsamultiskillagent/main/public/skills/amreahmed/elevenlabs-voice/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/elevenlabs-voice/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How elevenlabs-speech Compares

Feature / Agent	elevenlabs-speech	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# ElevenLabs Speech

Complete voice solution — both TTS and STT using one API:
- **TTS**: Text-to-Speech (high-quality voices)
- **STT**: Speech-to-Text via Scribe (accurate transcription)

## Quick Start

### Environment Setup

Set your API key:
```bash
export ELEVENLABS_API_KEY="sk_..."
```

Or create `.env` file in workspace root.

### Text-to-Speech (TTS)

Convert text to natural-sounding speech:

```bash
python scripts/elevenlabs_speech.py tts -t "Hello world" -o greeting.mp3
```

With custom voice:
```bash
python scripts/elevenlabs_speech.py tts -t "Hello" -v "voice_id_here" -o output.mp3
```

### List Available Voices

```bash
python scripts/elevenlabs_speech.py voices
```

## Using in Code

```python
from scripts.elevenlabs_speech import ElevenLabsClient

client = ElevenLabsClient(api_key="sk_...")

# Basic TTS
result = client.text_to_speech(
    text="Hello from zerox",
    output_path="greeting.mp3"
)

# With custom settings
result = client.text_to_speech(
    text="Your text here",
    voice_id="21m00Tcm4TlvDq8ikWAM",  # Rachel
    stability=0.5,
    similarity_boost=0.75,
    output_path="output.mp3"
)

# Get available voices
voices = client.get_voices()
for voice in voices['voices']:
    print(f"{voice['name']}: {voice['voice_id']}")
```

## Popular Voices

| Voice ID | Name | Description |
|----------|------|-------------|
| `21m00Tcm4TlvDq8ikWAM` | Rachel | Natural, versatile (default) |
| `AZnzlk1XvdvUeBnXmlld` | Domi | Strong, energetic |
| `EXAVITQu4vr4xnSDxMaL` | Bella | Soft, soothing |
| `ErXwobaYiN019PkySvjV` | Antoni | Well-rounded |
| `MF3mGyEYCl7XYWbV9V6O` | Elli | Warm, friendly |
| `TxGEqnHWrfWFTfGW9XjX` | Josh | Deep, calm |
| `VR6AewLTigWG4xSOukaG` | Arnold | Authoritative |

## Voice Settings

- **stability** (0-1): Lower = more emotional, Higher = more stable
- **similarity_boost** (0-1): Higher = closer to original voice

Default: stability=0.5, similarity_boost=0.75

## Models

- `eleven_turbo_v2_5` - Fast, high quality (default)
- `eleven_multilingual_v2` - Best for non-English
- `eleven_monolingual_v1` - English only

## Integration with Telegram

When user sends text and wants voice reply:

```python
# Generate speech
result = client.text_to_speech(text=user_text, output_path="reply.mp3")

# Send via Telegram message tool with media path
message(action="send", media="path/to/reply.mp3", as_voice=True)
```

## Pricing

Check https://elevenlabs.io/pricing for current rates. Free tier available!

## Speech-to-Text (STT) with ElevenLabs Scribe

Transcribe voice messages using ElevenLabs Scribe:

### Transcribe Audio

```bash
python scripts/elevenlabs_scribe.py voice_message.ogg
```

With specific language:
```bash
python scripts/elevenlabs_scribe.py voice_message.ogg --language ara
```

With speaker diarization (multiple speakers):
```bash
python scripts/elevenlabs_scribe.py voice_message.ogg --speakers 2
```

### Using in Code

```python
from scripts.elevenlabs_scribe import ElevenLabsScribe

client = ElevenLabsScribe(api_key="sk-...")

# Basic transcription
result = client.transcribe("voice_message.ogg")
print(result['text'])

# With language hint (improves accuracy)
result = client.transcribe("voice_message.ogg", language_code="ara")

# With speaker detection
result = client.transcribe("voice_message.ogg", num_speakers=2)
```

### Supported Formats

- mp3, mp4, mpeg, mpga, m4a, wav, webm
- Max file size: 100 MB
- Works great with Telegram voice messages (`.ogg`)

### Language Support

Scribe supports 99 languages including:
- Arabic (`ara`)
- English (`eng`)
- Spanish (`spa`)
- French (`fra`)
- And many more...

Without language hint, it auto-detects.

## Complete Workflow Example

**User sends voice message → You reply with voice:**

```python
from scripts.elevenlabs_scribe import ElevenLabsScribe
from scripts.elevenlabs_speech import ElevenLabsClient

# 1. Transcribe user's voice message
stt = ElevenLabsScribe()
transcription = stt.transcribe("user_voice.ogg")
user_text = transcription['text']

# 2. Process/understand the text
# ... your logic here ...

# 3. Generate response text
response_text = "Your response here"

# 4. Convert to speech
tts = ElevenLabsClient()
tts.text_to_speech(response_text, output_path="reply.mp3")

# 5. Send voice reply
message(action="send", media="reply.mp3", as_voice=True)
```

## Pricing

Check https://elevenlabs.io/pricing for current rates:

**TTS (Text-to-Speech):**
- Free tier: 10,000 characters/month
- Paid plans available

**STT (Speech-to-Text) - Scribe:**
- Free tier available
- Check website for current pricing

Related Skills

miranda-elevenlabs-speech

from Demerzels-lab/elsamultiskillagent

Text-to-Speech and Speech-to-Text using ElevenLabs AI.

Speech is Cheap (SIC) Skill

from Demerzels-lab/elsamultiskillagent

Fast, accurate, and incredibly inexpensive automatic speech-to-text transcription service.

chichi-speech

from Demerzels-lab/elsamultiskillagent

A RESTful service for high-quality text-to-speech using Qwen3.

elevenlabs-cli

from Demerzels-lab/elsamultiskillagent

CLI for ElevenLabs AI audio platform - text-to-speech, speech-to-text, voice cloning.

elevenlabs-phone-reminder-lite

from Demerzels-lab/elsamultiskillagent

Build AI phone call reminders with ElevenLabs Conversational AI + Twilio. Free starter guide.

elevenlabs-ai

from Demerzels-lab/elsamultiskillagent

OpenClaw skill for ElevenLabs APIs: text-to-speech, speech-to-speech, realtime speech-to-text, voices/models.

elevenlabs-stt

from Demerzels-lab/elsamultiskillagent

Transcribe audio files using ElevenLabs Speech-to-Text (Scribe v2).

elevenlabs-music

from Demerzels-lab/elsamultiskillagent

Generate music from text prompts using ElevenLabs Eleven Music API. Use when creating songs, soundtracks, jingles, lullabies, or any audio music from descriptions. Supports vocals with AI-generated lyrics, instrumental tracks, and multiple genres/styles. Requires paid ElevenLabs plan.

salute-speech

from Demerzels-lab/elsamultiskillagent

Transcribe audio files using Sber Salute Speech async API.

speechall-cli

from Demerzels-lab/elsamultiskillagent

Install and use the speechall CLI tool for speech-to-text transcription.

paylock

from Demerzels-lab/elsamultiskillagent

Non-custodial SOL escrow for AI agent deals.

agent-reputation

from Demerzels-lab/elsamultiskillagent

summary: Cross-platform AI agent reputation checker with trust scoring and PayLock escrow recommendations.