transcribe
Transcribe audio files to text with optional diarization and known-speaker hints. Use when a user asks to transcribe speech from audio/video, extract text from recordings, or label speakers in interviews or meetings.
Best use case
transcribe is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Transcribe audio files to text with optional diarization and known-speaker hints. Use when a user asks to transcribe speech from audio/video, extract text from recordings, or label speakers in interviews or meetings.
Teams using transcribe should expect a more consistent output, faster repeated execution, less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in
.claude/skills/transcribe/SKILL.mdinside your project - Restart your AI agent — it will auto-discover the skill
How transcribe Compares
| Feature / Agent | transcribe | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Transcribe audio files to text with optional diarization and known-speaker hints. Use when a user asks to transcribe speech from audio/video, extract text from recordings, or label speakers in interviews or meetings.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
Related Guides
Best AI Skills for Claude
Explore the best AI skills for Claude and Claude Code across coding, research, workflow automation, documentation, and agent operations.
ChatGPT vs Claude for Agent Skills
Compare ChatGPT and Claude for AI agent skills across coding, writing, research, and reusable workflow execution.
AI Agents for Coding
Browse AI agent skills for coding, debugging, testing, refactoring, code review, and developer workflows across Claude, Cursor, and Codex.
SKILL.md Source
# Audio Transcribe
Transcribe audio using OpenAI, with optional speaker diarization when requested. Prefer the bundled CLI for deterministic, repeatable runs.
## Workflow
1. Collect inputs: audio file path(s), desired response format (text/json/diarized_json), optional language hint, and any known speaker references.
2. Verify `OPENAI_API_KEY` is set. If missing, ask the user to set it locally (do not ask them to paste the key).
3. Run the bundled `transcribe_diarize.py` CLI with sensible defaults (fast text transcription).
4. Validate the output: transcription quality, speaker labels, and segment boundaries; iterate with a single targeted change if needed.
5. Save outputs under `output/transcribe/` when working in this repo.
## Decision rules
- Default to `gpt-4o-mini-transcribe` with `--response-format text` for fast transcription.
- If the user wants speaker labels or diarization, use `--model gpt-4o-transcribe-diarize --response-format diarized_json`.
- If audio is longer than ~30 seconds, keep `--chunking-strategy auto`.
- Prompting is not supported for `gpt-4o-transcribe-diarize`.
## Output conventions
- Use `output/transcribe/<job-id>/` for evaluation runs.
- Use `--out-dir` for multiple files to avoid overwriting.
## Dependencies (install if missing)
Prefer `uv` for dependency management.
```
uv pip install openai
```
If `uv` is unavailable:
```
python3 -m pip install openai
```
## Environment
- `OPENAI_API_KEY` must be set for live API calls.
- If the key is missing, instruct the user to create one in the OpenAI platform UI and export it in their shell.
- Never ask the user to paste the full key in chat.
## Skill path (set once)
```bash
export CODEX_HOME="${CODEX_HOME:-$HOME/.codex}"
export TRANSCRIBE_CLI="$CODEX_HOME/skills/transcribe/scripts/transcribe_diarize.py"
```
User-scoped skills install under `$CODEX_HOME/skills` (default: `~/.codex/skills`).
## CLI quick start
Single file (fast text default):
```
python3 "$TRANSCRIBE_CLI" \
path/to/audio.wav \
--out transcript.txt
```
Diarization with known speakers (up to 4):
```
python3 "$TRANSCRIBE_CLI" \
meeting.m4a \
--model gpt-4o-transcribe-diarize \
--known-speaker "Alice=refs/alice.wav" \
--known-speaker "Bob=refs/bob.wav" \
--response-format diarized_json \
--out-dir output/transcribe/meeting
```
Plain text output (explicit):
```
python3 "$TRANSCRIBE_CLI" \
interview.mp3 \
--response-format text \
--out interview.txt
```
## Reference map
- `references/api.md`: supported formats, limits, response formats, and known-speaker notes.Related Skills
async-python-patterns
Comprehensive guidance for implementing asynchronous Python applications using asyncio, concurrent programming patterns, and async/await for building high-performance, non-blocking systems.
slack-automation
Automate Slack workspace operations including messaging, search, channel management, and reaction workflows through Composio's Slack toolkit.
linear-automation
Automate Linear tasks via Rube MCP (Composio): issues, projects, cycles, teams, labels. Always search tools first for current schemas.
jira-automation
Automate Jira tasks via Rube MCP (Composio): issues, projects, sprints, boards, comments, users. Always search tools first for current schemas.
gitops-workflow
Complete guide to implementing GitOps workflows with ArgoCD and Flux for automated Kubernetes deployments.
github-automation
Automate GitHub repositories, issues, pull requests, branches, CI/CD, and permissions via Rube MCP (Composio). Manage code workflows, review PRs, search code, and handle deployments programmatically.
github-actions-templates
Production-ready GitHub Actions workflow patterns for testing, building, and deploying applications.
zustand-store-ts
Create Zustand stores following established patterns with proper TypeScript types and middleware.
zod-validation-expert
Expert in Zod — TypeScript-first schema validation. Covers parsing, custom errors, refinements, type inference, and integration with React Hook Form, Next.js, and tRPC.
tanstack-query-expert
Expert in TanStack Query (React Query) — asynchronous state management. Covers data fetching, stale time configuration, mutations, optimistic updates, and Next.js App Router (SSR) integration.
tailwind-design-system
Build production-ready design systems with Tailwind CSS, including design tokens, component variants, responsive patterns, and accessibility.
sveltekit
Build full-stack web applications with SvelteKit — file-based routing, SSR, SSG, API routes, and form actions in one framework.