voice-transcribe
Transcribe audio files using OpenAI's gpt-4o-mini-transcribe model with vocabulary hints and text replacements. Requires uv (https://docs.astral.sh/uv/).
Best use case
voice-transcribe is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using voice-transcribe should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/voice-transcribe/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
How voice-transcribe Compares
| Feature / Agent | voice-transcribe | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Transcribe audio files using OpenAI's gpt-4o-mini-transcribe model with vocabulary hints and text replacements. Requires uv (https://docs.astral.sh/uv/).
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# voice-transcribe

transcribe audio files using openai's gpt-4o-mini-transcribe model.

## when to use

when receiving voice memos (especially via whatsapp), just run:

```bash
uv run /Users/darin/clawd/skills/voice-transcribe/transcribe <audio-file>
```

then respond based on the transcribed content.

## fixing transcription errors

if darin says a word was transcribed wrong, add it to `vocab.txt` (for hints) or `replacements.txt` (for guaranteed fix). see sections below.

## supported formats

- mp3, mp4, mpeg, mpga, m4a, wav, webm, ogg, opus

## examples

```bash
# transcribe a voice memo
transcribe /tmp/voice-memo.ogg

# pipe to other tools
transcribe /tmp/memo.ogg | pbcopy
```

## setup

1. add your openai api key to `/Users/darin/clawd/skills/voice-transcribe/.env`:

```
OPENAI_API_KEY=sk-...
```

## custom vocabulary

add words to `vocab.txt` (one per line) to help the model recognize names/jargon:

```
Clawdis
Clawdbot
```

## text replacements

if the model still gets something wrong, add a replacement to `replacements.txt`:

```
wrong spelling -> correct spelling
```

## notes

- assumes english (no language detection)
- uses gpt-4o-mini-transcribe model specifically
- caches by sha256 of audio file
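The `transcribe` script itself is not shown in the source, so the following is only a minimal sketch of how the documented `vocab.txt`, `replacements.txt`, and sha256 cache behavior could work. All function names here (`load_vocab`, `load_replacements`, `apply_replacements`, `cache_key`) are hypothetical. Treating replacements as plain case-sensitive substring matches applied in file order is an assumption, not confirmed behavior.

```python
import hashlib
from pathlib import Path


def load_vocab(path: Path) -> list[str]:
    """Read one vocabulary term per line, skipping blank lines."""
    if not path.exists():
        return []
    lines = path.read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def load_replacements(path: Path) -> list[tuple[str, str]]:
    """Parse 'wrong -> correct' lines into (wrong, correct) pairs."""
    pairs: list[tuple[str, str]] = []
    if not path.exists():
        return pairs
    for line in path.read_text(encoding="utf-8").splitlines():
        wrong, sep, correct = line.partition("->")
        # Skip lines without an arrow or with an empty left-hand side.
        if sep and wrong.strip():
            pairs.append((wrong.strip(), correct.strip()))
    return pairs


def apply_replacements(text: str, pairs: list[tuple[str, str]]) -> str:
    """Apply each pair in file order (assumed: plain substring match)."""
    for wrong, correct in pairs:
        text = text.replace(wrong, correct)
    return text


def cache_key(audio_path: Path) -> str:
    """Return the SHA-256 hex digest of the audio file's bytes."""
    digest = hashlib.sha256()
    with audio_path.open("rb") as f:
        # Read in 1 MiB chunks so large recordings are not loaded at once.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Under these assumptions, a wrapper would compute `cache_key(audio)` first and reuse a stored transcript when one exists for that digest. Otherwise it would transcribe, run `apply_replacements` on the result, and store it. Where the cache lives and how the vocabulary terms reach the model are not specified in the source.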
Related Skills
gettr-transcribe-summarize
Download audio from a GETTR post (via HTML og:video), transcribe it locally with MLX Whisper on Apple Silicon (with timestamps via VTT), and summarize the transcript into bullet points and/or a timestamped outline. Use when given a GETTR post URL and asked to produce a transcript or summary.
invoice-tracker-pro
Complete freelance billing workflow — generate professional invoices, track payment status, send automated.
invoice-template
Free simple invoice generator.
voicemonkey
Control Alexa devices via VoiceMonkey API v2 - make announcements, trigger routines, start flows, and display media.
vibevoice
Local Spanish TTS using Microsoft VibeVoice.
transcribe
Transcribe audio files to text using local Whisper (Docker). Use when receiving voice messages, audio files (.mp3, .m4a, .ogg, .wav, .webm), or when asked to transcribe audio content.
percept-voice-cmd
Voice command detection and action execution for OpenClaw agents.
transcribee
Transcribe YouTube videos and local audio/video files with speaker diarization. Use when user asks to transcribe a YouTube URL, podcast, video, or audio file. Outputs clean speaker-labeled transcripts ready for LLM analysis.
free-groq-voice
FREE voice recognition using Groq's complimentary Whisper API.
voice-recognition
Local speech-to-text with OpenAI Whisper CLI.
x-voice-match
Analyze a Twitter/X account's posting style and generate authentic posts that match their voice. Use when the user wants to create X posts that sound like them, analyze their posting patterns, or maintain consistent voice across posts. Works with Bird CLI integration.
jarvis-voice
Metallic AI voice persona with TTS and visual transcript styling.