whisper-transcription
Transcribe audio and video files to text using OpenAI Whisper. Use when: converting podcasts to blog posts; creating video subtitles; extracting quotes from interviews; repurposing video content to text; building searchable audio archives
Best use case
whisper-transcription is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using whisper-transcription should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/whisper-transcription/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
How whisper-transcription Compares
| Feature / Agent | whisper-transcription | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Transcribe audio and video files to text using OpenAI Whisper. Use when: converting podcasts to blog posts; creating video subtitles; extracting quotes from interviews; repurposing video content to text; building searchable audio archives
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Whisper Transcription

> Transcribe any audio or video to text using OpenAI's Whisper model - the same technology powering ChatGPT voice features.

## When to Use This Skill

- **Podcast repurposing** - Convert episodes to blog posts, show notes, social snippets
- **Video subtitles** - Generate SRT/VTT files for YouTube, social media
- **Interview extraction** - Pull quotes and insights from recorded calls
- **Content audit** - Make audio/video libraries searchable
- **Translation** - Transcribe and translate foreign language content

## What Claude Does vs What You Decide

| Claude Does | You Decide |
|-------------|------------|
| Structures production workflow | Final creative direction |
| Suggests technical approaches | Equipment and tool choices |
| Creates templates and checklists | Quality standards |
| Identifies best practices | Brand/voice decisions |
| Generates script outlines | Final script approval |

## Dependencies

```bash
pip install openai-whisper torch ffmpeg-python click

# Also requires ffmpeg installed on system
# macOS: brew install ffmpeg
# Ubuntu: sudo apt install ffmpeg
```

## Commands

### Transcribe Single File

```bash
python scripts/main.py transcribe audio.mp3 --model medium --output transcript.txt
python scripts/main.py transcribe video.mp4 --format srt --output subtitles.srt
```

### Batch Transcription

```bash
python scripts/main.py batch ./recordings/ --format txt --output ./transcripts/
```

### Transcribe + Translate

```bash
python scripts/main.py translate foreign-audio.mp3 --to en
```

### Extract Timestamps

```bash
python scripts/main.py timestamps podcast.mp3 --format json
```

## Examples

### Example 1: Podcast to Blog Post

```bash
# Transcribe 1-hour podcast
python scripts/main.py transcribe episode-42.mp3 --model medium

# Output: episode-42.txt (full transcript with timestamps)
# Processing time: ~5 min for 1 hour audio on M1 Mac
```

### Example 2: YouTube Subtitles

```bash
# Generate SRT for video upload
python scripts/main.py transcribe marketing-video.mp4 --format srt

# Output: marketing-video.srt
# Upload directly to YouTube/Vimeo
```

### Example 3: Batch Process Interview Library

```bash
# Transcribe all recordings in folder
python scripts/main.py batch ./customer-interviews/ --model small --format txt

# Output: ./customer-interviews/*.txt (one per audio file)
```

## Model Selection Guide

| Model | Speed | Accuracy | VRAM | Best For |
|-------|-------|----------|------|----------|
| `tiny` | Fastest | ~70% | 1GB | Quick drafts, short clips |
| `base` | Fast | ~80% | 1GB | Social media clips |
| `small` | Medium | ~85% | 2GB | Podcasts, interviews |
| `medium` | Slow | ~90% | 5GB | Professional transcripts |
| `large` | Slowest | ~95% | 10GB | Critical accuracy needs |

**Recommendation:** Start with `small` for most marketing content. Use `medium` for client deliverables.

## Output Formats

| Format | Extension | Use Case |
|--------|-----------|----------|
| `txt` | .txt | Blog posts, analysis |
| `srt` | .srt | Video subtitles (YouTube) |
| `vtt` | .vtt | Web video subtitles |
| `json` | .json | Programmatic access |
| `tsv` | .tsv | Spreadsheet analysis |

## Performance Tips

1. **GPU acceleration** - 10x faster with CUDA GPU
2. **Audio extraction** - Script auto-extracts audio from video
3. **Chunking** - Long files auto-split for memory efficiency
4. **Language detection** - Automatic, or specify with `--language`

## Skill Boundaries

### What This Skill Does Well

- Structuring audio production workflows
- Providing technical guidance
- Creating quality checklists
- Suggesting creative approaches

### What This Skill Cannot Do

- Replace audio engineering expertise
- Make subjective creative decisions
- Access or edit audio files directly
- Guarantee commercial success

## Related Skills

- [video-processing](../video-processing/) - Extract audio from video
- [youtube-downloader](../youtube-downloader/) - Download videos to transcribe
- [content-repurposer](../content-repurposer/) - Transform transcripts to content
- [podcast-production](../../audio/podcast-production/) - Create podcasts

## Skill Metadata

- **Mode**: cyborg

```yaml
category: automation
subcategory: audio-processing
dependencies: [openai-whisper, torch, ffmpeg-python]
difficulty: beginner
time_saved: 10+ hours/week
```
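
## Sketch: From Whisper Segments to SRT

The `srt` output format listed above can be illustrated with a short sketch. This is not the skill's actual `scripts/main.py`, whose contents are not shown here; the function names `srt_timestamp`, `segments_to_srt`, and `transcribe_to_srt` are hypothetical. It assumes the `openai-whisper` package's `load_model` and `transcribe` calls, where the result is a dict whose `"segments"` entries carry `start`, `end`, and `text`.

```python
from typing import Iterable, Mapping


def srt_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: Iterable[Mapping]) -> str:
    """Render Whisper-style segments (start, end, text) as numbered SRT cues."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(cues)


def transcribe_to_srt(path: str, model_name: str = "small") -> str:
    """Hypothetical helper: transcribe a file and return SRT text.

    Assumes openai-whisper and ffmpeg are installed.
    """
    import whisper  # imported lazily so the helpers above work without it

    model = whisper.load_model(model_name)
    result = model.transcribe(path)  # dict with "text" and "segments"
    return segments_to_srt(result["segments"])
```

The formatting helpers are pure Python, so they can be checked without loading a model; only `transcribe_to_srt` needs Whisper and ffmpeg available.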
Related Skills
azure-ai-transcription-py
Azure AI Transcription SDK for Python. Use for real-time and batch speech-to-text transcription with timestamps and diarization.
bgo
Automates the complete Blender build-go workflow, from building and packaging your extension/add-on to removing old versions, installing, enabling, and launching Blender for quick testing and iteration.
moai-lang-r
R 4.4+ best practices with testthat 3.2, lintr 3.2, and data analysis patterns.
moai-lang-python
Python 3.13+ development specialist covering FastAPI, Django, async patterns, data science, testing with pytest, and modern Python features. Use when developing Python APIs, web applications, data pipelines, or writing tests.
moai-icons-vector
Vector icon libraries ecosystem guide covering 10+ major libraries with 200K+ icons, including React Icons (35K+), Lucide (1000+), Tabler Icons (5900+), Iconify (200K+), Heroicons, Phosphor, and Radix Icons with implementation patterns, decision trees, and best practices.
moai-foundation-trust
Complete TRUST 4 principles guide covering Test First, Readable, Unified, Secured. Validation methods, enterprise quality gates, metrics, and November 2025 standards. Enterprise v4.0 with 50+ software quality standards references.
moai-foundation-memory
Persistent memory across sessions using MCP Memory Server for user preferences, project context, and learned patterns
moai-foundation-core
MoAI-ADK's foundational principles - TRUST 5, SPEC-First TDD, delegation patterns, token optimization, progressive disclosure, modular architecture, agent catalog, command reference, and execution rules for building AI-powered development workflows
moai-cc-claude-md
Authoring CLAUDE.md Project Instructions. Design project-specific AI guidance, document workflows, define architecture patterns. Use when creating CLAUDE.md files for projects, documenting team standards, or establishing AI collaboration guidelines.
moai-alfred-language-detection
Auto-detects project language and framework from package.json, pyproject.toml, etc.
mnemonic
Unified memory system - aggregates communications and AI sessions across all channels into searchable, analyzable memory
mlops
MLflow, model versioning, experiment tracking, model registry, and production ML systems