whisper-transcription

Transcribe audio and video files to text using OpenAI Whisper. Use when: converting podcasts to blog posts; creating video subtitles; extracting quotes from interviews; repurposing video content to text; building searchable audio archives

16 stars

Best use case

whisper-transcription is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using whisper-transcription should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/whisper-transcription/SKILL.md --create-dirs "https://raw.githubusercontent.com/diegosouzapw/awesome-omni-skill/main/skills/content-media/whisper-transcription/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/whisper-transcription/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How whisper-transcription Compares

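| Feature / Agent | whisper-transcription | Standard Approach |
|-----------------|-----------------------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |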

Frequently Asked Questions

What does this skill do?

Transcribe audio and video files to text using OpenAI Whisper. Use when: converting podcasts to blog posts; creating video subtitles; extracting quotes from interviews; repurposing video content to text; building searchable audio archives

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Whisper Transcription

> Transcribe any audio or video to text using OpenAI's Whisper model - the same technology powering ChatGPT voice features.

## When to Use This Skill

- **Podcast repurposing** - Convert episodes to blog posts, show notes, social snippets
- **Video subtitles** - Generate SRT/VTT files for YouTube, social media
- **Interview extraction** - Pull quotes and insights from recorded calls
- **Content audit** - Make audio/video libraries searchable
- **Translation** - Transcribe and translate foreign language content


## What Claude Does vs What You Decide

| Claude Does | You Decide |
|-------------|------------|
| Structures production workflow | Final creative direction |
| Suggests technical approaches | Equipment and tool choices |
| Creates templates and checklists | Quality standards |
| Identifies best practices | Brand/voice decisions |
| Generates script outlines | Final script approval |

## Dependencies

```bash
pip install openai-whisper torch ffmpeg-python click
# Also requires ffmpeg installed on system
# macOS: brew install ffmpeg
# Ubuntu: sudo apt install ffmpeg
```
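Before running any command, it can help to confirm that both the Python packages and the system `ffmpeg` binary are present. The following is a minimal sketch, not part of the skill itself: `missing_dependencies` is a hypothetical helper, and the import names (`whisper` for `openai-whisper`, `ffmpeg` for `ffmpeg-python`) are the modules those packages install.

```python
import importlib.util
import shutil


def missing_dependencies(
    modules=("whisper", "torch", "ffmpeg", "click"),
    binaries=("ffmpeg",),
):
    """Return human-readable names for anything that is not installed."""
    missing = []
    for name in modules:
        # find_spec returns None when a top-level module cannot be found
        if importlib.util.find_spec(name) is None:
            missing.append(f"python module: {name}")
    for name in binaries:
        # shutil.which returns None when the executable is not on PATH
        if shutil.which(name) is None:
            missing.append(f"system binary: {name}")
    return missing


if __name__ == "__main__":
    problems = missing_dependencies()
    print("all dependencies found" if not problems else "\n".join(problems))
```

An empty list means everything in the install line above was found. Anything listed needs to be installed with `pip` or the system package manager.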

## Commands

### Transcribe Single File
```bash
python scripts/main.py transcribe audio.mp3 --model medium --output transcript.txt
python scripts/main.py transcribe video.mp4 --format srt --output subtitles.srt
```

### Batch Transcription
```bash
python scripts/main.py batch ./recordings/ --format txt --output ./transcripts/
```

### Transcribe + Translate
```bash
python scripts/main.py translate foreign-audio.mp3 --to en
```

### Extract Timestamps
```bash
python scripts/main.py timestamps podcast.mp3 --format json
```
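The contents of `scripts/main.py` are not shown here, so the following is only a sketch of what the `transcribe` and `translate` commands might wrap, using the `openai-whisper` Python API (`whisper.load_model` and `model.transcribe`). The helper names `transcribe_options` and `transcribe_file` are hypothetical. Whisper's built-in `translate` task produces English output only, which matches the `--to en` usage above.

```python
def transcribe_options(translate=False, language=None):
    """Build keyword arguments for Whisper's model.transcribe()."""
    return {
        # "translate" asks Whisper for English output; "transcribe" keeps the source language
        "task": "translate" if translate else "transcribe",
        # None lets Whisper detect the spoken language automatically
        "language": language,
    }


def transcribe_file(path, model_name="medium", translate=False, language=None):
    """Run Whisper on one audio or video file and return (text, segments)."""
    import whisper  # imported lazily so this module loads without the package

    model = whisper.load_model(model_name)
    result = model.transcribe(path, **transcribe_options(translate, language))
    # result["segments"] is a list of dicts with "start", "end", and "text" keys
    return result["text"], result["segments"]
```

Under these assumptions, `transcribe_file("foreign-audio.mp3", translate=True)` would return an English transcript, and the segment list carries the timestamps that a `timestamps` command could serialize to JSON.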

## Examples

### Example 1: Podcast to Blog Post
```bash
# Transcribe 1-hour podcast
python scripts/main.py transcribe episode-42.mp3 --model medium

# Output: episode-42.txt (full transcript with timestamps)
# Processing time: ~5 min for 1 hour audio on M1 Mac
```

### Example 2: YouTube Subtitles
```bash
# Generate SRT for video upload
python scripts/main.py transcribe marketing-video.mp4 --format srt

# Output: marketing-video.srt
# Upload directly to YouTube/Vimeo
```

### Example 3: Batch Process Interview Library
```bash
# Transcribe all recordings in folder
python scripts/main.py batch ./customer-interviews/ --model small --format txt

# Output: ./customer-interviews/*.txt (one per audio file)
```
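Example 3 writes one transcript per recording. A minimal sketch of how a batch run could map inputs to outputs is shown below. `plan_batch` and the `MEDIA_EXTENSIONS` set are assumptions for illustration, not the skill's actual implementation.

```python
from pathlib import Path

# Assumed set of extensions to treat as transcribable media
MEDIA_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".mp4", ".mov", ".mkv"}


def plan_batch(input_dir, output_dir=None, fmt="txt"):
    """Map each media file in input_dir to the transcript path it would produce."""
    input_dir = Path(input_dir)
    # Default to writing transcripts next to the recordings
    output_dir = Path(output_dir) if output_dir else input_dir
    plan = []
    for source in sorted(input_dir.iterdir()):
        if source.is_file() and source.suffix.lower() in MEDIA_EXTENSIONS:
            plan.append((source, output_dir / f"{source.stem}.{fmt}"))
    return plan
```

Building the plan before transcribing makes it easy to print a dry run or skip files whose transcript already exists.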

## Model Selection Guide

| Model | Speed | Accuracy | VRAM | Best For |
|-------|-------|----------|------|----------|
| `tiny` | Fastest | ~70% | 1GB | Quick drafts, short clips |
| `base` | Fast | ~80% | 1GB | Social media clips |
| `small` | Medium | ~85% | 2GB | Podcasts, interviews |
| `medium` | Slow | ~90% | 5GB | Professional transcripts |
| `large` | Slowest | ~95% | 10GB | Critical accuracy needs |

**Recommendation:** Start with `small` for most marketing content. Use `medium` for client deliverables.
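When a GPU is available, the VRAM column can drive the choice automatically. The sketch below encodes the table's figures and picks the largest model that fits. `largest_model_that_fits` is a hypothetical helper, and the VRAM figures are approximate.

```python
# (model name, approximate VRAM in GB), largest first, taken from the table above
MODEL_VRAM_GB = [
    ("large", 10),
    ("medium", 5),
    ("small", 2),
    ("base", 1),
    ("tiny", 1),
]


def largest_model_that_fits(available_vram_gb):
    """Return the largest model whose VRAM need fits, or None if none do."""
    for name, needed_gb in MODEL_VRAM_GB:
        if available_vram_gb >= needed_gb:
            return name
    return None
```

For example, a card with 6 GB free would select `medium`, while 1 GB selects `base`. The recommendation above still applies: a smaller model than the maximum is often the better trade-off for speed.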

## Output Formats

| Format | Extension | Use Case |
|--------|-----------|----------|
| `txt` | .txt | Blog posts, analysis |
| `srt` | .srt | Video subtitles (YouTube) |
| `vtt` | .vtt | Web video subtitles |
| `json` | .json | Programmatic access |
| `tsv` | .tsv | Spreadsheet analysis |
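To make the `srt` format concrete, here is a minimal sketch that converts Whisper-style segments (dicts with `start`, `end`, and `text` keys) into SRT text. The function names are illustrative and the skill's own writer may differ.

```python
def srt_timestamp(seconds):
    """Format seconds as HH:MM:SS,mmm, the timestamp form used in SRT files."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments):
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(cues)


print(segments_to_srt([{"start": 0.0, "end": 2.5, "text": " Hello and welcome."}]))
# 1
# 00:00:00,000 --> 00:00:02,500
# Hello and welcome.
```

WebVTT is similar but starts with a `WEBVTT` header line and uses a period rather than a comma before the milliseconds.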

## Performance Tips

1. **GPU acceleration** - Roughly 10x faster on a CUDA GPU than on CPU
2. **Audio extraction** - Script auto-extracts audio from video
3. **Chunking** - Long files auto-split for memory efficiency
4. **Language detection** - Automatic, or specify with `--language`

## Skill Boundaries

### What This Skill Does Well
- Structuring audio production workflows
- Providing technical guidance
- Creating quality checklists
- Suggesting creative approaches

### What This Skill Cannot Do
- Replace audio engineering expertise
- Make subjective creative decisions
- Access or edit audio files directly
- Guarantee commercial success

## Related Skills

- [video-processing](../video-processing/) - Extract audio from video
- [youtube-downloader](../youtube-downloader/) - Download videos to transcribe
- [content-repurposer](../content-repurposer/) - Transform transcripts to content
- [podcast-production](../../audio/podcast-production/) - Create podcasts

## Skill Metadata


- **Mode**: cyborg
```yaml
category: automation
subcategory: audio-processing
dependencies: [openai-whisper, torch, ffmpeg-python]
difficulty: beginner
time_saved: 10+ hours/week
```
