voice-transcribe

Transcribe audio files using OpenAI's gpt-4o-mini-transcribe model with vocabulary hints and text replacements. Requires uv (https://docs.astral.sh/uv/).

7 stars

by Demerzels-lab

Best use case

voice-transcribe is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using voice-transcribe should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/voice-transcribe/SKILL.md --create-dirs "https://raw.githubusercontent.com/Demerzels-lab/elsamultiskillagent/main/public/skills/darinkishore/voice-transcribe/SKILL.md"

Manual Installation

1. Download SKILL.md from GitHub.
2. Place it in .claude/skills/voice-transcribe/SKILL.md inside your project.
3. Restart your AI agent; it will auto-discover the skill.

How voice-transcribe Compares

Feature / Agent	voice-transcribe	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Transcribe audio files using OpenAI's gpt-4o-mini-transcribe model with vocabulary hints and text replacements. Requires uv (https://docs.astral.sh/uv/).

Where can I find the source code?

The source code is in the Demerzels-lab/elsamultiskillagent repository on GitHub, under public/skills/darinkishore/voice-transcribe.

SKILL.md Source

# voice-transcribe

transcribe audio files using openai's gpt-4o-mini-transcribe model.

## when to use

when receiving voice memos (especially via whatsapp), just run:
```bash
uv run /Users/darin/clawd/skills/voice-transcribe/transcribe <audio-file>
```
then respond based on the transcribed content.

## fixing transcription errors

if darin says a word was transcribed wrong, add it to `vocab.txt` (for hints) or `replacements.txt` (for a guaranteed fix). see the sections below.

## supported formats

- mp3, mp4, mpeg, mpga, m4a, wav, webm, ogg, opus
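
a quick way to check a file against this list before sending it off, as a minimal sketch. `is_supported` is a hypothetical helper, not part of the skill, and it only looks at the file extension (it does not inspect the audio itself):

```python
from pathlib import Path

# extensions copied from the supported formats list above
SUPPORTED_EXTENSIONS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm", "ogg", "opus"}


def is_supported(path: str) -> bool:
    """return True if the file extension is in the supported list (case-insensitive)."""
    extension = Path(path).suffix.lower().lstrip(".")
    return extension in SUPPORTED_EXTENSIONS
```

for example, `is_supported("/tmp/voice-memo.ogg")` is true, while a `.txt` file or a file with no extension is rejected.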

## examples

```bash
# transcribe a voice memo
transcribe /tmp/voice-memo.ogg

# pipe to other tools
transcribe /tmp/memo.ogg | pbcopy
```

## setup

1. add your openai api key to `/Users/darin/clawd/skills/voice-transcribe/.env`:
   ```
   OPENAI_API_KEY=sk-...
   ```

## custom vocabulary

add words to `vocab.txt` (one per line) to help the model recognize names/jargon:
```
Clawdis
Clawdbot
```
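
a minimal sketch of how the entries in `vocab.txt` could be read and turned into a single hint string. `parse_vocab` and `build_hint` are hypothetical names, and the comma-joined format is an assumption for illustration; the actual script may pass hints to the model differently:

```python
def parse_vocab(text: str) -> list[str]:
    """return one term per non-blank line, with surrounding whitespace removed."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def build_hint(terms: list[str]) -> str:
    """join the terms into one comma-separated hint string (assumed format)."""
    return ", ".join(terms)
```

with the two entries shown above, `build_hint(parse_vocab("Clawdis\nClawdbot\n"))` gives `"Clawdis, Clawdbot"`.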

## text replacements

if the model still gets something wrong, add a replacement to `replacements.txt`:
```
wrong spelling -> correct spelling
```
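
a minimal sketch of how `wrong -> correct` lines could be parsed and applied to a transcript. the function names are hypothetical, and this version assumes plain, case-sensitive substring replacement applied in file order; the real script may match differently (for example case-insensitively or on word boundaries):

```python
def parse_replacements(text: str) -> list[tuple[str, str]]:
    """parse 'wrong -> correct' lines, skipping lines without an arrow."""
    rules = []
    for line in text.splitlines():
        if "->" not in line:
            continue
        wrong, correct = line.split("->", 1)
        wrong, correct = wrong.strip(), correct.strip()
        if wrong:  # ignore rules with an empty left-hand side
            rules.append((wrong, correct))
    return rules


def apply_replacements(transcript: str, rules: list[tuple[str, str]]) -> str:
    """apply each rule in order as a literal substring replacement."""
    for wrong, correct in rules:
        transcript = transcript.replace(wrong, correct)
    return transcript
```

because the replacement runs on the finished transcript rather than inside the model, a matching rule always takes effect, which is what makes it the fallback when a vocabulary hint is not enough.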

## notes

- assumes english (no language detection)
- uses gpt-4o-mini-transcribe model specifically
- caches by sha256 of audio file
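
a minimal sketch of what caching by sha256 can look like. the function names and the in-memory dict are assumptions for illustration; the skill's real cache location and format are not described here:

```python
import hashlib
from typing import Callable


def cache_key(audio_bytes: bytes) -> str:
    """return the hex sha256 digest of the audio content."""
    return hashlib.sha256(audio_bytes).hexdigest()


def cached_transcript(
    audio_bytes: bytes,
    cache: dict[str, str],
    transcribe_fn: Callable[[bytes], str],
) -> str:
    """return a cached transcript, calling transcribe_fn only on a cache miss."""
    key = cache_key(audio_bytes)
    if key not in cache:
        cache[key] = transcribe_fn(audio_bytes)
    return cache[key]
```

keying on the content hash rather than the filename means a renamed or re-downloaded copy of the same memo reuses the earlier transcript, while any change to the audio bytes produces a new key.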

Related Skills

gettr-transcribe-summarize

from Demerzels-lab/elsamultiskillagent

Download audio from a GETTR post (via HTML og:video), transcribe it locally with MLX Whisper on Apple Silicon (with timestamps via VTT), and summarize the transcript into bullet points and/or a timestamped outline. Use when given a GETTR post URL and asked to produce a transcript or summary.

invoice-tracker-pro

from Demerzels-lab/elsamultiskillagent

Complete freelance billing workflow — generate professional invoices, track payment status, send automated.

invoice-template

from Demerzels-lab/elsamultiskillagent

Free simple invoice generator.

voicemonkey

from Demerzels-lab/elsamultiskillagent

Control Alexa devices via VoiceMonkey API v2 - make announcements, trigger routines, start flows, and display media.

vibevoice

from Demerzels-lab/elsamultiskillagent

Local Spanish TTS using Microsoft VibeVoice.

transcribe

from Demerzels-lab/elsamultiskillagent

Transcribe audio files to text using local Whisper (Docker). Use when receiving voice messages, audio files (.mp3, .m4a, .ogg, .wav, .webm), or when asked to transcribe audio content.

percept-voice-cmd

from Demerzels-lab/elsamultiskillagent

Voice command detection and action execution for OpenClaw agents.

transcribee

from Demerzels-lab/elsamultiskillagent

Transcribe YouTube videos and local audio/video files with speaker diarization. Use when user asks to transcribe a YouTube URL, podcast, video, or audio file. Outputs clean speaker-labeled transcripts ready for LLM analysis.

free-groq-voice

from Demerzels-lab/elsamultiskillagent

FREE voice recognition using Groq's complimentary Whisper API.

voice-recognition

from Demerzels-lab/elsamultiskillagent

Local speech-to-text with OpenAI Whisper CLI.

x-voice-match

from Demerzels-lab/elsamultiskillagent

Analyze a Twitter/X account's posting style and generate authentic posts that match their voice. Use when the user wants to create X posts that sound like them, analyze their posting patterns, or maintain consistent voice across posts. Works with Bird CLI integration.

jarvis-voice

from Demerzels-lab/elsamultiskillagent

Metallic AI voice persona with TTS and visual transcript styling.