assemblyai-transcribe
Transcribe podcast and audio files with speaker diarization using AssemblyAI API. Use when the user wants to: (1) Transcribe a podcast or audio file with AssemblyAI, (2) Get speaker-labeled transcripts (who said what), (3) Diarize audio to identify different speakers, (4) Generate SRT subtitles from audio. Triggers on: "assemblyai", "transcribe with assemblyai", "diarize podcast", "assemblyai transcribe".
Best use case
assemblyai-transcribe is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using assemblyai-transcribe should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/assemblyai-transcribe/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
How assemblyai-transcribe Compares
| Feature / Agent | assemblyai-transcribe | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
It transcribes podcast and audio files with speaker diarization using the AssemblyAI API. It can produce speaker-labeled transcripts (who said what), plain-text transcripts, and SRT subtitles. It triggers on phrases such as "assemblyai", "transcribe with assemblyai", "diarize podcast", and "assemblyai transcribe".
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Podcast Transcription with AssemblyAI

Transcribe audio files with speaker diarization using `scripts/transcribe.py`.

## Requirements

- Set `ASSEMBLYAI_API_KEY` environment variable
- Dependencies installed automatically via `uv run`

## Supported Formats

WAV, MP3, AIFF, AAC, OGG, FLAC, M4A, WMA, WEBM

## Usage

Transcribe a local file with speaker diarization (default):

```bash
uv run scripts/transcribe.py /path/to/podcast.mp3
```

Transcribe from a URL:

```bash
uv run scripts/transcribe.py https://example.com/podcast.mp3
```

Save to file:

```bash
uv run scripts/transcribe.py /path/to/podcast.mp3 -o transcript.txt
```

Specify expected number of speakers:

```bash
uv run scripts/transcribe.py /path/to/podcast.mp3 -n 3
```

Plain text output (no speaker labels):

```bash
uv run scripts/transcribe.py /path/to/podcast.mp3 --no-diarize -f text
```

SRT subtitle format:

```bash
uv run scripts/transcribe.py /path/to/podcast.mp3 -f srt -o subtitles.srt
```

## Options

| Flag | Description |
|------|-------------|
| `-o, --output` | Output file path (default: stdout) |
| `-f, --format` | Output format: `diarized` (default), `text`, `srt` |
| `--no-diarize` | Disable speaker diarization |
| `-n, --speakers` | Expected number of speakers (helps accuracy) |

## Output Formats

- **diarized** (default): `[MM:SS] Speaker A: text` with blank lines between utterances
- **text**: Plain transcript without speaker labels or timestamps
- **srt**: SRT subtitle format with speaker labels

## Notes

- Local files are uploaded to AssemblyAI's servers for processing, then transcribed
- URLs are passed directly (the audio must be publicly accessible)
- Polling interval is 5 seconds; long audio files may take several minutes
- By default, AssemblyAI detects up to 10 speakers; use `-n` to hint if you know the count
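To make the `diarized` and `srt` output formats described above concrete, here is a minimal Python sketch of how a list of utterances could be rendered into each. It is not the code in `scripts/transcribe.py`, which is not shown here. The function names and the utterance shape (dicts with `speaker`, `text`, and `start`/`end` in milliseconds) are assumptions made for illustration, as is the exact SRT cue layout.

```python
# Hypothetical helpers illustrating the two labeled output formats.
# Assumption: each utterance is a dict with "speaker", "text",
# and "start"/"end" offsets in milliseconds.

def format_timestamp(ms: int) -> str:
    """Render a millisecond offset as MM:SS for the diarized format."""
    total_seconds = ms // 1000
    return f"{total_seconds // 60:02d}:{total_seconds % 60:02d}"


def srt_timestamp(ms: int) -> str:
    """Render a millisecond offset as HH:MM:SS,mmm for SRT cues."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def to_diarized(utterances) -> str:
    """One '[MM:SS] Speaker X: text' line per utterance, blank line between."""
    lines = [
        f"[{format_timestamp(u['start'])}] Speaker {u['speaker']}: {u['text']}"
        for u in utterances
    ]
    return "\n\n".join(lines)


def to_srt(utterances) -> str:
    """Numbered SRT cues, with the speaker label prefixed to the text."""
    blocks = []
    for index, u in enumerate(utterances, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(u['start'])} --> {srt_timestamp(u['end'])}\n"
            f"Speaker {u['speaker']}: {u['text']}"
        )
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    sample = [
        {"speaker": "A", "start": 0, "end": 4200, "text": "Welcome to the show."},
        {"speaker": "B", "start": 65500, "end": 68000, "text": "Thanks for having me."},
    ]
    print(to_diarized(sample))
    print()
    print(to_srt(sample))
```

Keeping the formatters as pure functions over a plain list of utterances means they can be checked without an API key or network access.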