podcast-generation
Generate AI-powered podcast-style audio narratives using Azure OpenAI's GPT Realtime Mini model via WebSocket. Use when building text-to-speech features, audio narrative generation, podcast creatio...
Best use case
podcast-generation is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Generate AI-powered podcast-style audio narratives using Azure OpenAI's GPT Realtime Mini model via WebSocket. Use when building text-to-speech features, audio narrative generation, podcast creatio...
Teams using podcast-generation should expect a more consistent output, faster repeated execution, less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in
.claude/skills/podcast-generation/SKILL.mdinside your project - Restart your AI agent — it will auto-discover the skill
How podcast-generation Compares
| Feature / Agent | podcast-generation | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Generate AI-powered podcast-style audio narratives using Azure OpenAI's GPT Realtime Mini model via WebSocket. Use when building text-to-speech features, audio narrative generation, podcast creatio...
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Podcast Generation with GPT Realtime Mini
Generate real audio narratives from text content using Azure OpenAI's Realtime API.
## Quick Start
1. Configure environment variables for Realtime API
2. Connect via WebSocket to Azure OpenAI Realtime endpoint
3. Send text prompt, collect PCM audio chunks + transcript
4. Convert PCM to WAV format
5. Return base64-encoded audio to frontend for playback
## Environment Configuration
```env
AZURE_OPENAI_AUDIO_API_KEY=your_realtime_api_key
AZURE_OPENAI_AUDIO_ENDPOINT=https://your-resource.cognitiveservices.azure.com
AZURE_OPENAI_AUDIO_DEPLOYMENT=gpt-realtime-mini
```
**Note**: Endpoint should NOT include `/openai/v1/` - just the base URL.
## Core Workflow
### Backend Audio Generation
```python
from openai import AsyncOpenAI
import base64
# Convert HTTPS endpoint to WebSocket URL
ws_url = endpoint.replace("https://", "wss://") + "/openai/v1"
client = AsyncOpenAI(
websocket_base_url=ws_url,
api_key=api_key
)
audio_chunks = []
transcript_parts = []
async with client.realtime.connect(model="gpt-realtime-mini") as conn:
# Configure for audio-only output
await conn.session.update(session={
"output_modalities": ["audio"],
"instructions": "You are a narrator. Speak naturally."
})
# Send text to narrate
await conn.conversation.item.create(item={
"type": "message",
"role": "user",
"content": [{"type": "input_text", "text": prompt}]
})
await conn.response.create()
# Collect streaming events
async for event in conn:
if event.type == "response.output_audio.delta":
audio_chunks.append(base64.b64decode(event.delta))
elif event.type == "response.output_audio_transcript.delta":
transcript_parts.append(event.delta)
elif event.type == "response.done":
break
# Convert PCM to WAV (see scripts/pcm_to_wav.py)
pcm_audio = b''.join(audio_chunks)
wav_audio = pcm_to_wav(pcm_audio, sample_rate=24000)
```
### Frontend Audio Playback
```javascript
// Convert base64 WAV to playable blob
const base64ToBlob = (base64, mimeType) => {
const bytes = atob(base64);
const arr = new Uint8Array(bytes.length);
for (let i = 0; i < bytes.length; i++) arr[i] = bytes.charCodeAt(i);
return new Blob([arr], { type: mimeType });
};
const audioBlob = base64ToBlob(response.audio_data, 'audio/wav');
const audioUrl = URL.createObjectURL(audioBlob);
new Audio(audioUrl).play();
```
## Voice Options
| Voice | Character |
|-------|-----------|
| alloy | Neutral |
| echo | Warm |
| fable | Expressive |
| onyx | Deep |
| nova | Friendly |
| shimmer | Clear |
## Realtime API Events
- `response.output_audio.delta` - Base64 audio chunk
- `response.output_audio_transcript.delta` - Transcript text
- `response.done` - Generation complete
- `error` - Handle with `event.error.message`
## Audio Format
- **Input**: Text prompt
- **Output**: PCM audio (24kHz, 16-bit, mono)
- **Storage**: Base64-encoded WAV
## References
- **Full architecture**: See references/architecture.md for complete stack design
- **Code examples**: See references/code-examples.md for production patterns
- **PCM conversion**: Use scripts/pcm_to_wav.py for audio format conversion
## When to Use
This skill is applicable to execute the workflow or actions described in the overview.Related Skills
openapi-spec-generation
Generate and maintain OpenAPI 3.1 specifications from code, design-first specs, and validation patterns. Use when creating API documentation, generating SDKs, or ensuring API contract compliance.
documentation-generation-doc-generate
You are a documentation expert specializing in creating comprehensive, maintainable documentation from code. Generate API docs, architecture diagrams, user guides, and technical references using AI...
apify-lead-generation
Generates B2B/B2C leads by scraping Google Maps, websites, Instagram, TikTok, Facebook, LinkedIn, YouTube, and Google Search. Use when user asks to find leads, prospects, businesses, build lead lis...
zustand-store-ts
Create Zustand stores with TypeScript, subscribeWithSelector middleware, and proper state/action separation. Use when building React state management, creating global stores, or implementing reacti...
zoom-automation
Automate Zoom meeting creation, management, recordings, webinars, and participant tracking via Rube MCP (Composio). Always search tools first for current schemas.
zoho-crm-automation
Automate Zoho CRM tasks via Rube MCP (Composio): create/update records, search contacts, manage leads, and convert leads. Always search tools first for current schemas.
zendesk-automation
Automate Zendesk tasks via Rube MCP (Composio): tickets, users, organizations, replies. Always search tools first for current schemas.
zapier-make-patterns
No-code automation democratizes workflow building. Zapier and Make (formerly Integromat) let non-developers automate business processes without writing code. But no-code doesn't mean no-complexity ...
youtube-summarizer
Extract transcripts from YouTube videos and generate comprehensive, detailed summaries using intelligent analysis frameworks
youtube-automation
Automate YouTube tasks via Rube MCP (Composio): upload videos, manage playlists, search content, get analytics, and handle comments. Always search tools first for current schemas.
xss-html-injection
This skill should be used when the user asks to "test for XSS vulnerabilities", "perform cross-site scripting attacks", "identify HTML injection flaws", "exploit client-side injection...
xlsx-official
Comprehensive spreadsheet creation, editing, and analysis with support for formulas, formatting, data analysis, and visualization. When Claude needs to work with spreadsheets (.xlsx, .xlsm, .csv, ....