openai-whisper-api

Transcribe audio via OpenAI Audio Transcriptions API (Whisper).

411 stars

Best use case

openai-whisper-api is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using openai-whisper-api should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/openai-whisper-api/SKILL.md --create-dirs "https://raw.githubusercontent.com/understudy-ai/understudy/main/skills/openai-whisper-api/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/openai-whisper-api/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How openai-whisper-api Compares

Feature / Agent	openai-whisper-api	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Transcribe audio via OpenAI Audio Transcriptions API (Whisper).

Where can I find the source code?

The source code is in the understudy-ai/understudy repository on GitHub, under skills/openai-whisper-api.

SKILL.md Source

# OpenAI Whisper API (curl)

Transcribe an audio file via OpenAI’s `/v1/audio/transcriptions` endpoint.

## Quick start

```bash
{baseDir}/scripts/transcribe.sh /path/to/audio.m4a
```

Defaults:

- Model: `whisper-1`
- Output: `<input>.txt`

## Useful flags

```bash
{baseDir}/scripts/transcribe.sh /path/to/audio.ogg --model whisper-1 --out /tmp/transcript.txt
{baseDir}/scripts/transcribe.sh /path/to/audio.m4a --language en
{baseDir}/scripts/transcribe.sh /path/to/audio.m4a --prompt "Speaker names: Peter, Daniel"
{baseDir}/scripts/transcribe.sh /path/to/audio.m4a --json --out /tmp/transcript.json
```
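
The contents of `transcribe.sh` are not shown here, so the following is only a sketch of how flags like these could map onto the form fields of the `/v1/audio/transcriptions` endpoint (`file`, `model`, `language`, `prompt`, `response_format`). The function names `whisper_form_fields` and `whisper_transcribe` are hypothetical, and the positional argument order is an assumption made for illustration.

```bash
#!/usr/bin/env bash
# Sketch only: NOT the skill's transcribe.sh. Function names are hypothetical.

# Print the multipart form fields, one per line, so the request can be
# inspected without making a network call.
# Args: file [model] [language] [prompt] [response_format]
whisper_form_fields() {
  local file="$1" model="${2:-whisper-1}" language="${3:-}" prompt="${4:-}" format="${5:-text}"
  printf '%s\n' "file=@${file}" "model=${model}" "response_format=${format}"
  if [ -n "$language" ]; then printf '%s\n' "language=${language}"; fi
  if [ -n "$prompt" ]; then printf '%s\n' "prompt=${prompt}"; fi
}

# Send the request with curl. Requires OPENAI_API_KEY and network access.
whisper_transcribe() {
  local args=() field
  while IFS= read -r field; do
    case "$field" in
      file=@*) args+=(-F "$field") ;;            # -F uploads the file after "@"
      *)       args+=(--form-string "$field") ;; # literal text, no @ or ; parsing
    esac
  done < <(whisper_form_fields "$@")
  curl -sS https://api.openai.com/v1/audio/transcriptions \
    -H "Authorization: Bearer ${OPENAI_API_KEY:?OPENAI_API_KEY is not set}" \
    "${args[@]}"
}
```

With these definitions, `whisper_transcribe /path/to/audio.m4a whisper-1 en > /tmp/transcript.txt` would roughly correspond to the `--language en` example above, and passing `json` as the fifth argument to the `--json` one. A prompt that contains a newline would be split by this sketch, so it is only suitable for single-line prompts.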

## API key

Set `OPENAI_API_KEY`, or configure it in `~/.understudy/config.json5`:

```json5
{
  skills: {
    "openai-whisper-api": {
      apiKey: "OPENAI_KEY_HERE",
    },
  },
}
```
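
Before running a transcription from a shell, it can help to confirm that the key is actually visible to the process. The helper below is a minimal sketch with a hypothetical name (`whisper_preflight`); it checks only the `OPENAI_API_KEY` environment variable and the input file. How a key stored in `config.json5` reaches the script is not shown in this document, so that path is not covered.

```bash
#!/usr/bin/env bash
# Hypothetical preflight helper; not part of the skill.
# Returns 1 if the key is missing, 2 if the audio file cannot be read.
whisper_preflight() {
  local file="$1"
  if [ -z "${OPENAI_API_KEY:-}" ]; then
    echo "OPENAI_API_KEY is not set" >&2
    return 1
  fi
  if [ ! -r "$file" ]; then
    echo "cannot read audio file: $file" >&2
    return 2
  fi
  return 0
}
```

One way to use it is `whisper_preflight /path/to/audio.m4a && {baseDir}/scripts/transcribe.sh /path/to/audio.m4a`, so that a missing key fails fast with a clear message instead of an API error.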

Related Skills

openai-whisper

411

from understudy-ai/understudy

Local speech-to-text with the Whisper CLI (no API key).

openai-image-gen

411

from understudy-ai/understudy

Batch-generate images via OpenAI Images API. Random prompt sampler + `index.html` gallery.

xurl

411

from understudy-ai/understudy

A CLI tool for making authenticated requests to the X (Twitter) API. Use this skill when you need to post tweets, reply, quote, search, read posts, manage followers, send DMs, upload media, or interact with any X API v2 endpoint.

weather

411

from understudy-ai/understudy

Get current weather and forecasts via wttr.in or Open-Meteo. Use when: user asks about weather, temperature, or forecasts for any location. NOT for: historical weather data, severe weather alerts, or detailed meteorological analysis. No API key needed.

wacli

411

from understudy-ai/understudy

Send WhatsApp messages to other people or search/sync WhatsApp history via the wacli CLI (not for normal user chats).

video-frames

411

from understudy-ai/understudy

Extract frames or short clips from videos using ffmpeg.

trello

411

from understudy-ai/understudy

Manage Trello boards, lists, and cards via the Trello REST API.

tmux

411

from understudy-ai/understudy

Remote-control tmux sessions for interactive CLIs by sending keystrokes and scraping pane output.

things-mac

411

from understudy-ai/understudy

Manage Things 3 via the `things` CLI on macOS (add/update projects+todos via URL scheme; read/search/list from the local Things database). Use when a user asks Understudy to add a task to Things, list inbox/today/upcoming, search tasks, or inspect projects/areas/tags.

summarize

411

from understudy-ai/understudy

Summarize or extract text/transcripts from URLs, podcasts, and local files (great fallback for “transcribe this YouTube/video”).

spotify-player

411

from understudy-ai/understudy

Terminal Spotify playback/search via spogo (preferred) or spotify_player.

sonoscli

411

from understudy-ai/understudy

Control Sonos speakers (discover/status/play/volume/group).