elevenlabs-stt

Transcribe audio files using ElevenLabs Speech-to-Text (Scribe v2).

7 stars

by Demerzels-lab

Best use case

elevenlabs-stt is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using elevenlabs-stt can expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/elevenlabs-stt/SKILL.md --create-dirs "https://raw.githubusercontent.com/Demerzels-lab/elsamultiskillagent/main/public/skills/clawdbotborges/elevenlabs-stt/SKILL.md"

Manual Installation

1. Download SKILL.md from GitHub.
2. Place it in .claude/skills/elevenlabs-stt/SKILL.md inside your project.
3. Restart your AI agent so that it auto-discovers the skill.

How elevenlabs-stt Compares

| Feature / Agent | elevenlabs-stt | Standard Approach |
|-----------------|----------------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

Transcribe audio files using ElevenLabs Speech-to-Text (Scribe v2).

Where can I find the source code?

The source is hosted on GitHub in the Demerzels-lab/elsamultiskillagent repository, under public/skills/clawdbotborges/elevenlabs-stt.

SKILL.md Source

# ElevenLabs Speech-to-Text

Transcribe audio files using ElevenLabs' Scribe v2 model. Supports 90+ languages with speaker diarization.

## Quick Start

```bash
# Basic transcription
{baseDir}/scripts/transcribe.sh /path/to/audio.mp3

# With speaker diarization
{baseDir}/scripts/transcribe.sh /path/to/audio.mp3 --diarize

# Specify language (improves accuracy)
{baseDir}/scripts/transcribe.sh /path/to/audio.mp3 --lang en

# Full JSON output with timestamps
{baseDir}/scripts/transcribe.sh /path/to/audio.mp3 --json
```

## Options

| Flag | Description |
|------|-------------|
| `--diarize` | Identify different speakers |
| `--lang CODE` | ISO language code (e.g., en, pt, es) |
| `--json` | Output full JSON with word timestamps |
| `--events` | Tag audio events (laughter, music, etc.) |
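
The flags above presumably translate into fields of the speech-to-text request. The sketch below shows one way that mapping could look; it is not taken from the actual `transcribe.sh`. The function name `build_form_fields` is hypothetical, and the field names (`file`, `model_id`, `diarize`, `language_code`, `tag_audio_events`) and the model id `scribe_v2` are assumptions that should be checked against the ElevenLabs API reference.

```bash
# Hypothetical sketch: map the skill's flags to multipart form fields.
# Field names and the "scribe_v2" model id are assumptions, not read from
# the real transcribe.sh. The body runs in a subshell so variables stay local.
build_form_fields() (
  file="$1"; shift
  nl='
'
  # Every request carries the audio file and a model id.
  fields="file=@${file}${nl}model_id=scribe_v2"
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --diarize) fields="${fields}${nl}diarize=true" ;;
      --events)  fields="${fields}${nl}tag_audio_events=true" ;;
      --lang)
        [ "$#" -ge 2 ] || { echo "--lang needs a language code" >&2; exit 2; }
        fields="${fields}${nl}language_code=$2"
        shift ;;
      --json)    ;;  # assumed to affect output formatting only, not the request
      *) echo "unknown flag: $1" >&2; exit 2 ;;
    esac
    shift
  done
  # One field per line, so a caller can turn each line into a curl -F argument.
  printf '%s\n' "$fields"
)
```

A caller could read the printed lines and pass each one to `curl` as `-F "<field>"`, together with the API key in a request header. Validating the flags before any request is sent keeps a typo such as `--diarise` from silently producing an undiarized transcript.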

## Supported Formats

All major audio/video formats: mp3, m4a, wav, ogg, webm, mp4, etc.

## API Key

Set the `ELEVENLABS_API_KEY` environment variable, or configure the key in `clawdbot.json`:

```json5
{
  skills: {
    entries: {
      "elevenlabs-stt": {
        apiKey: "sk_..."
      }
    }
  }
}
```
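
When the key comes from the environment, a wrapper can fail early with a clear message instead of sending an unauthenticated request. The following is a minimal sketch, assuming the key is read from `ELEVENLABS_API_KEY`. The helper name `require_api_key` is hypothetical, and how clawdbot passes the `apiKey` entry from `clawdbot.json` to the script is not covered here.

```bash
# Hypothetical helper: stop before any network call if no key is available.
# Only the environment variable is checked; the clawdbot.json path is out of
# scope for this sketch.
require_api_key() {
  if [ -z "${ELEVENLABS_API_KEY:-}" ]; then
    echo "ELEVENLABS_API_KEY is not set; export it or add apiKey to clawdbot.json" >&2
    return 1
  fi
}
```

Typical use would be `require_api_key || exit 1` at the top of a wrapper script, after exporting the key in the shell profile.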

## Examples

```bash
# Transcribe a WhatsApp voice note
{baseDir}/scripts/transcribe.sh ~/Downloads/voice_note.ogg

# Meeting recording with multiple speakers
{baseDir}/scripts/transcribe.sh meeting.mp3 --diarize --lang en

# Get JSON for processing
{baseDir}/scripts/transcribe.sh podcast.mp3 --json > transcript.json
```
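
The single-file examples above extend naturally to a folder of recordings. The sketch below is one possible batch wrapper, not part of the skill. The function name `transcribe_dir` and the `TRANSCRIBE` variable (assumed to hold the resolved path of `{baseDir}/scripts/transcribe.sh`) are hypothetical, and the extension list is only a sample of the supported formats.

```bash
# Hypothetical batch wrapper around the skill's script.
# TRANSCRIBE is an assumed variable holding the path to transcribe.sh.
# Extra arguments (e.g. --diarize --lang en) are forwarded unchanged.
transcribe_dir() (
  dir="$1"; shift
  : "${TRANSCRIBE:?set TRANSCRIBE to the path of transcribe.sh}"
  for f in "$dir"/*.mp3 "$dir"/*.ogg "$dir"/*.m4a "$dir"/*.wav; do
    # A pattern that matched nothing stays literal; skip it.
    if [ ! -f "$f" ]; then continue; fi
    out="$f.txt"
    # Skip files that already have a transcript, so reruns are cheap.
    if [ -e "$out" ]; then continue; fi
    if ! "$TRANSCRIBE" "$f" "$@" > "$out"; then
      rm -f "$out"                     # do not leave a partial transcript
      echo "failed: $f" >&2
    fi
  done
)
```

For example, `transcribe_dir ~/recordings --diarize` would write `meeting.mp3.txt` next to `meeting.mp3`. Appending `.txt` to the full file name avoids collisions between files that differ only in extension.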

Related Skills

miranda-elevenlabs-speech

from Demerzels-lab/elsamultiskillagent

Text-to-Speech and Speech-to-Text using ElevenLabs AI.

elevenlabs-cli

from Demerzels-lab/elsamultiskillagent

CLI for ElevenLabs AI audio platform - text-to-speech, speech-to-text, voice cloning.

elevenlabs-phone-reminder-lite

from Demerzels-lab/elsamultiskillagent

Build AI phone call reminders with ElevenLabs Conversational AI + Twilio. Free starter guide.

elevenlabs-ai

from Demerzels-lab/elsamultiskillagent

OpenClaw skill for ElevenLabs APIs: text-to-speech, speech-to-speech, realtime speech-to-text, voices/models.

elevenlabs-music

from Demerzels-lab/elsamultiskillagent

Generate music from text prompts using ElevenLabs Eleven Music API. Use when creating songs, soundtracks, jingles, lullabies, or any audio music from descriptions. Supports vocals with AI-generated lyrics, instrumental tracks, and multiple genres/styles. Requires paid ElevenLabs plan.

elevenlabs-speech

from Demerzels-lab/elsamultiskillagent

Text-to-Speech and Speech-to-Text using ElevenLabs AI. Use when the user wants to convert text to speech, transcribe voice messages, or work with voice in multiple languages. Supports high-quality AI voices and accurate transcription.