whisper-transcription

Transcribe audio and video files to text using OpenAI Whisper. Use when: converting podcasts to blog posts; creating video subtitles; extracting quotes from interviews; repurposing video content to text; building searchable audio archives

by diegosouzapw

Best use case

whisper-transcription is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using whisper-transcription can expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/whisper-transcription/SKILL.md --create-dirs "https://raw.githubusercontent.com/diegosouzapw/awesome-omni-skill/main/skills/content-media/whisper-transcription/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/whisper-transcription/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How whisper-transcription Compares

| Feature / Agent | whisper-transcription | Standard Approach |
|-----------------|-----------------------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

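What does this skill do?

It transcribes audio and video files to text using OpenAI Whisper, for example to turn podcasts into blog posts or to create video subtitles.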

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Whisper Transcription

> Transcribe any audio or video to text using OpenAI's Whisper model - the same technology powering ChatGPT voice features.

## When to Use This Skill

- **Podcast repurposing** - Convert episodes to blog posts, show notes, social snippets
- **Video subtitles** - Generate SRT/VTT files for YouTube, social media
- **Interview extraction** - Pull quotes and insights from recorded calls
- **Content audit** - Make audio/video libraries searchable
- **Translation** - Transcribe and translate foreign language content


## What Claude Does vs What You Decide

| Claude Does | You Decide |
|-------------|------------|
| Structures production workflow | Final creative direction |
| Suggests technical approaches | Equipment and tool choices |
| Creates templates and checklists | Quality standards |
| Identifies best practices | Brand/voice decisions |
| Generates script outlines | Final script approval |

## Dependencies

```bash
pip install openai-whisper torch ffmpeg-python click
# Also requires ffmpeg installed on system
# macOS: brew install ffmpeg
# Ubuntu: sudo apt install ffmpeg
```
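
Before running any command, it can help to confirm that the packages and the `ffmpeg` binary are actually available. The following is a minimal sketch, not part of the skill's script: `missing_dependencies` is a hypothetical helper, and the module names assume the usual import names for these packages (`openai-whisper` imports as `whisper`, `ffmpeg-python` as `ffmpeg`).

```python
import importlib.util
import shutil


def missing_dependencies(
    modules=("whisper", "torch", "ffmpeg", "click"),
    binaries=("ffmpeg",),
) -> list:
    """Return the names of Python modules and system binaries that cannot be found."""
    # find_spec returns None when a top-level module is not installed.
    missing = [name for name in modules if importlib.util.find_spec(name) is None]
    # shutil.which returns None when the executable is not on PATH.
    missing += [name for name in binaries if shutil.which(name) is None]
    return missing
```

An empty list means everything was found; otherwise the list names what still needs to be installed.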

## Commands

### Transcribe Single File
```bash
python scripts/main.py transcribe audio.mp3 --model medium --output transcript.txt
python scripts/main.py transcribe video.mp4 --format srt --output subtitles.srt
```
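
The source of `scripts/main.py` is not shown here, so the following is only a sketch of what a single-file transcription could look like with the `openai-whisper` Python API (`whisper.load_model` and `model.transcribe`). The helper names `default_output_path` and `transcribe_file` are hypothetical.

```python
from pathlib import Path


def default_output_path(media_path: str, fmt: str = "txt") -> Path:
    """Hypothetical helper: swap the media extension for the transcript format."""
    return Path(media_path).with_suffix(f".{fmt}")


def transcribe_file(media_path: str, model_name: str = "medium") -> str:
    """Transcribe one audio or video file and return the plain-text transcript."""
    import whisper  # imported lazily so the path helper works without the package

    model = whisper.load_model(model_name)
    # transcribe() returns a dict that includes "text", "segments", and "language".
    result = model.transcribe(media_path)
    return result["text"].strip()


# Example usage (requires openai-whisper and ffmpeg):
#   text = transcribe_file("audio.mp3", model_name="medium")
#   default_output_path("audio.mp3").write_text(text, encoding="utf-8")
```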

### Batch Transcription
```bash
python scripts/main.py batch ./recordings/ --format txt --output ./transcripts/
```
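
A batch run boils down to pairing every media file in the input folder with an output path. The sketch below shows one way to do that; `plan_batch` and the `MEDIA_EXTENSIONS` set are assumptions for illustration, and the script may recognize a different set of file types.

```python
from pathlib import Path

# Assumed list of extensions; ffmpeg can decode many more formats.
MEDIA_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".mp4", ".mov"}


def plan_batch(input_dir: str, output_dir: str, fmt: str = "txt") -> list:
    """Return (source, target) path pairs for every media file in input_dir."""
    out_root = Path(output_dir)
    jobs = []
    for path in sorted(Path(input_dir).iterdir()):
        # Skip subfolders and non-media files; match extensions case-insensitively.
        if path.is_file() and path.suffix.lower() in MEDIA_EXTENSIONS:
            jobs.append((path, out_root / f"{path.stem}.{fmt}"))
    return jobs
```

Each pair can then be handed to whatever function performs the actual transcription.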

### Transcribe + Translate
```bash
python scripts/main.py translate foreign-audio.mp3 --to en
```
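
Whisper's built-in translation task produces English output only, which is consistent with the `--to en` flag above. As a hedged sketch, the options passed to `model.transcribe` could be built as follows; `translate_options` is a hypothetical helper and not part of the skill's script.

```python
from typing import Optional


def translate_options(to: str = "en", language: Optional[str] = None) -> dict:
    """Build keyword arguments for model.transcribe() in translation mode."""
    if to != "en":
        # Whisper's translate task only targets English.
        raise ValueError("Whisper's built-in translate task only outputs English")
    options = {"task": "translate"}
    if language is not None:
        # Optional source-language hint; Whisper auto-detects when omitted.
        options["language"] = language
    return options


# Example usage (requires openai-whisper):
#   result = model.transcribe("foreign-audio.mp3", **translate_options(language="pt"))
```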

### Extract Timestamps
```bash
python scripts/main.py timestamps podcast.mp3 --format json
```
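
Whisper returns timing information as a list of segment dicts, each with `start`, `end`, and `text` keys among others. The exact JSON layout the script emits is not documented here, so the sketch below assumes a simple list of objects; `segments_to_json` is a hypothetical name.

```python
import json


def segments_to_json(segments: list) -> str:
    """Serialize Whisper segments, keeping only start, end, and text."""
    rows = [
        {
            "start": round(seg["start"], 2),  # seconds from the start of the file
            "end": round(seg["end"], 2),
            "text": seg["text"].strip(),
        }
        for seg in segments
    ]
    return json.dumps(rows, indent=2, ensure_ascii=False)
```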

## Examples

### Example 1: Podcast to Blog Post
```bash
# Transcribe 1-hour podcast
python scripts/main.py transcribe episode-42.mp3 --model medium

# Output: episode-42.txt (full transcript with timestamps)
# Processing time: ~5 min for 1 hour audio on M1 Mac
```

### Example 2: YouTube Subtitles
```bash
# Generate SRT for video upload
python scripts/main.py transcribe marketing-video.mp4 --format srt

# Output: marketing-video.srt
# Upload directly to YouTube/Vimeo
```

### Example 3: Batch Process Interview Library
```bash
# Transcribe all recordings in folder
python scripts/main.py batch ./customer-interviews/ --model small --format txt

# Output: ./customer-interviews/*.txt (one per audio file)
```

## Model Selection Guide

| Model | Speed | Accuracy | VRAM | Best For |
|-------|-------|----------|------|----------|
| `tiny` | Fastest | ~70% | 1GB | Quick drafts, short clips |
| `base` | Fast | ~80% | 1GB | Social media clips |
| `small` | Medium | ~85% | 2GB | Podcasts, interviews |
| `medium` | Slow | ~90% | 5GB | Professional transcripts |
| `large` | Slowest | ~95% | 10GB | Critical accuracy needs |

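**Recommendation:** Start with `small` for most marketing content and move to `medium` for client deliverables. Treat the accuracy figures as rough guides; actual results vary with audio quality, language, and accent.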
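
If the available GPU memory is known, the table can be turned into a simple lookup. This is an illustrative sketch that copies the approximate VRAM values from the table; `largest_model_that_fits` is a hypothetical helper, and real memory use depends on the hardware and settings.

```python
from typing import Optional

# Approximate VRAM needs in GB, copied from the table above (smallest to largest).
MODEL_VRAM_GB = {"tiny": 1, "base": 1, "small": 2, "medium": 5, "large": 10}


def largest_model_that_fits(available_gb: float) -> Optional[str]:
    """Return the largest model whose listed VRAM need fits, or None if none fit."""
    fitting = [name for name, need in MODEL_VRAM_GB.items() if need <= available_gb]
    # Dict order is smallest to largest, so the last match is the largest model.
    return fitting[-1] if fitting else None
```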

## Output Formats

| Format | Extension | Use Case |
|--------|-----------|----------|
| `txt` | .txt | Blog posts, analysis |
| `srt` | .srt | Video subtitles (YouTube) |
| `vtt` | .vtt | Web video subtitles |
| `json` | .json | Programmatic access |
| `tsv` | .tsv | Spreadsheet analysis |
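
To make the `srt` format concrete, here is a minimal sketch that turns Whisper-style segments into SRT text. The function names are hypothetical and the script may format its output differently. SRT timestamps use a comma before the milliseconds, whereas VTT uses a period and starts with a `WEBVTT` header line.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(cues)
```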

## Performance Tips

1. **GPU acceleration** - Roughly 10x faster on a CUDA GPU than on CPU
2. **Audio extraction** - The script extracts the audio track from video files automatically
3. **Chunking** - Long files are split automatically to limit memory use
4. **Language detection** - Automatic by default; pass `--language` to set the language explicitly
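
How the script splits long files is not described, so the following is only one plausible approach: fixed-length windows with a small overlap so that words at a boundary are not cut off. The function name and the default chunk and overlap lengths are assumptions.

```python
def chunk_ranges(duration_s: float, chunk_s: float = 600.0, overlap_s: float = 5.0) -> list:
    """Return (start, end) windows in seconds that cover the whole recording."""
    if chunk_s <= overlap_s:
        raise ValueError("chunk_s must be larger than overlap_s")
    ranges = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        ranges.append((start, end))
        if end >= duration_s:
            break
        # Step back by the overlap so consecutive windows share a few seconds.
        start = end - overlap_s
    return ranges
```

Each window can be transcribed on its own, with segment timestamps shifted by the window's start time before the results are merged.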

## Skill Boundaries

### What This Skill Does Well
- Structuring audio production workflows
- Providing technical guidance
- Creating quality checklists
- Suggesting creative approaches

### What This Skill Cannot Do
- Replace audio engineering expertise
- Make subjective creative decisions
- Access or edit audio files directly
- Guarantee commercial success

## Related Skills

- [video-processing](../video-processing/) - Extract audio from video
- [youtube-downloader](../youtube-downloader/) - Download videos to transcribe
- [content-repurposer](../content-repurposer/) - Transform transcripts to content
- [podcast-production](../../audio/podcast-production/) - Create podcasts

## Skill Metadata

- **Mode**: cyborg

```yaml
category: automation
subcategory: audio-processing
dependencies: [openai-whisper, torch, ffmpeg-python]
difficulty: beginner
time_saved: 10+ hours/week
```
