ai-podcast-creation

Create AI-powered podcasts with text-to-speech, music, and audio editing. Tools: Kokoro TTS, DIA TTS, Chatterbox, AI music generation, media merger. Capabilities: multi-voice conversations, background music, intro/outro, full episodes. Use for: podcast production, audiobooks, voice content, audio newsletters. Triggers: podcast, ai podcast, text to speech podcast, audio content, voice over, ai audiobook, multi voice, conversation ai, notebooklm alternative, audio generation, podcast automation, ai narrator, voice content, audio newsletter, podcast maker

16 stars

bydiegosouzapw

View on GitHub Installation ↓

Best use case

ai-podcast-creation is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using ai-podcast-creation should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/ai-podcast-creation/SKILL.md --create-dirs "https://raw.githubusercontent.com/diegosouzapw/awesome-omni-skill/main/skills/content-media/ai-podcast-creation/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/ai-podcast-creation/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How ai-podcast-creation Compares

Feature / Agent	ai-podcast-creation	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# AI Podcast Creation

Create AI-powered podcasts and audio content via [inference.sh](https://inference.sh) CLI.

![AI Podcast Creation](https://cloud.inference.sh/u/4mg21r6ta37mpaz6ktzwtt8krr/01jz00krptarq4bwm89g539aea.png)

## Quick Start

```bash
curl -fsSL https://cli.inference.sh | sh && infsh login

# Generate podcast segment
infsh app run infsh/kokoro-tts --input '{
  "text": "Welcome to the AI Frontiers podcast. Today we explore the latest developments in generative AI.",
  "voice": "am_michael"
}'
```

## Available Voices

### Kokoro TTS

| Voice ID | Description | Best For |
|----------|-------------|----------|
| `af_sarah` | American female, warm | Host, narrator |
| `af_nicole` | American female, professional | News, business |
| `am_michael` | American male, authoritative | Documentary, tech |
| `am_adam` | American male, conversational | Casual podcast |
| `bf_emma` | British female, refined | Audiobooks |
| `bm_george` | British male, classic | Formal content |

### DIA TTS (Conversational)

| Voice ID | Description | Best For |
|----------|-------------|----------|
| `dia-conversational` | Natural conversation | Dialogue, interviews |

### Chatterbox

| Voice ID | Description | Best For |
|----------|-------------|----------|
| `chatterbox-default` | Expressive | Casual, entertainment |

## Podcast Workflows

### Simple Narration

```bash
# Single voice podcast segment
infsh app run infsh/kokoro-tts --input '{
  "text": "Your podcast script here. Make it conversational and engaging. Add natural pauses with punctuation.",
  "voice": "am_michael"
}'
```

### Multi-Voice Conversation

```bash
# Host introduction
infsh app run infsh/kokoro-tts --input '{
  "text": "Welcome back to Tech Talk. Today I have a special guest to discuss AI developments.",
  "voice": "am_michael"
}' > host_intro.json

# Guest response
infsh app run infsh/kokoro-tts --input '{
  "text": "Thanks for having me. I am excited to share what we have been working on.",
  "voice": "af_sarah"
}' > guest_response.json

# Merge into conversation
infsh app run infsh/media-merger --input '{
  "audio_files": ["<host-url>", "<guest-url>"],
  "crossfade_ms": 500
}'
```

### Full Episode Pipeline

```bash
# 1. Generate script with Claude
infsh app run openrouter/claude-sonnet-45 --input '{
  "prompt": "Write a 5-minute podcast script about the impact of AI on creative work. Format as a two-person dialogue between HOST and GUEST. Include natural conversation, questions, and insights."
}' > script.json

# 2. Generate intro music
infsh app run infsh/ai-music --input '{
  "prompt": "Podcast intro music, upbeat, modern, tech feel, 15 seconds"
}' > intro_music.json

# 3. Generate host segments
infsh app run infsh/kokoro-tts --input '{
  "text": "<host-lines>",
  "voice": "am_michael"
}' > host.json

# 4. Generate guest segments
infsh app run infsh/kokoro-tts --input '{
  "text": "<guest-lines>",
  "voice": "af_sarah"
}' > guest.json

# 5. Generate outro music
infsh app run infsh/ai-music --input '{
  "prompt": "Podcast outro music, matching intro style, fade out, 10 seconds"
}' > outro_music.json

# 6. Merge everything
infsh app run infsh/media-merger --input '{
  "audio_files": [
    "<intro-music>",
    "<host>",
    "<guest>",
    "<outro-music>"
  ],
  "crossfade_ms": 1000
}'
```

### NotebookLM-Style Content

Generate podcast-style discussions from documents.

```bash
# 1. Extract key points
infsh app run openrouter/claude-sonnet-45 --input '{
  "prompt": "Read this document and create a podcast script where two hosts discuss the key points in an engaging, conversational way. Include questions, insights, and natural dialogue.\n\nDocument:\n<your-document-content>"
}' > discussion_script.json

# 2. Generate Host A
infsh app run infsh/kokoro-tts --input '{
  "text": "<host-a-lines>",
  "voice": "am_michael"
}' > host_a.json

# 3. Generate Host B
infsh app run infsh/kokoro-tts --input '{
  "text": "<host-b-lines>",
  "voice": "af_sarah"
}' > host_b.json

# 4. Interleave and merge
infsh app run infsh/media-merger --input '{
  "audio_files": ["<host-a-1>", "<host-b-1>", "<host-a-2>", "<host-b-2>"],
  "crossfade_ms": 300
}'
```

### Audiobook Chapter

```bash
# Long-form narration
infsh app run infsh/kokoro-tts --input '{
  "text": "Chapter One. It was a dark and stormy night when the first AI achieved consciousness...",
  "voice": "bf_emma",
  "speed": 0.9
}'
```

## Audio Enhancement

### Add Background Music

```bash
# 1. Generate podcast audio
infsh app run infsh/kokoro-tts --input '{
  "text": "<podcast-script>",
  "voice": "am_michael"
}' > podcast.json

# 2. Generate ambient music
infsh app run infsh/ai-music --input '{
  "prompt": "Soft ambient background music for podcast, subtle, non-distracting, loopable"
}' > background.json

# 3. Mix with lower background volume
infsh app run infsh/media-merger --input '{
  "audio_files": ["<podcast-url>"],
  "background_audio": "<background-url>",
  "background_volume": 0.15
}'
```

### Add Sound Effects

```bash
# Transition sounds between segments
infsh app run infsh/ai-music --input '{
  "prompt": "Short podcast transition sound, whoosh, 2 seconds"
}' > transition.json
```

## Script Writing Tips

### Prompt for Claude

```bash
infsh app run openrouter/claude-sonnet-45 --input '{
  "prompt": "Write a podcast script with these requirements:
  - Topic: [YOUR TOPIC]
  - Duration: 5 minutes (about 750 words)
  - Format: Two hosts (HOST_A and HOST_B)
  - Tone: Conversational, informative, engaging
  - Include: Hook intro, 3 main points, call to action
  - Mark speaker changes clearly

  Make it sound natural, not scripted. Add verbal fillers like \"you know\" and \"I mean\" occasionally."
}'
```

## Podcast Templates

### Interview Format

```
HOST: Introduction and welcome
GUEST: Thank you, happy to be here
HOST: First question about background
GUEST: Response with story
HOST: Follow-up question
GUEST: Deeper insight
... continue pattern ...
HOST: Closing question
GUEST: Final thoughts
HOST: Thank you and outro
```

### Solo Episode

```
Introduction with hook
Topic overview
Point 1 with examples
Point 2 with examples
Point 3 with examples
Summary and takeaways
Call to action
Outro
```

### News Roundup

```
Intro music
Welcome and date
Story 1: headline + details
Story 2: headline + details
Story 3: headline + details
Analysis/opinion segment
Outro
```

## Best Practices

1. **Natural punctuation** - Use commas and periods for pacing
2. **Short sentences** - Easier to speak and listen
3. **Varied voices** - Different speakers prevent monotony
4. **Background music** - Subtle, at 10-15% volume
5. **Crossfades** - Smooth transitions between segments
6. **Edit scripts** - Remove filler before generating

## Related Skills

```bash
# Text-to-speech models
npx skills add inference-sh/skills@text-to-speech

# AI music generation
npx skills add inference-sh/skills@ai-music-generation

# LLM for scripts
npx skills add inference-sh/skills@llm-models

# Content pipelines
npx skills add inference-sh/skills@ai-content-pipeline

# Full platform skill
npx skills add inference-sh/skills@inference-sh
```

Browse all apps: `infsh app list --category audio`

Related Skills

data-model-creation

from diegosouzapw/awesome-omni-skill

Professional rules for AI-driven data modeling and creation. Use this skill when users need to create and manage MySQL databases, design data models using Mermaid ER diagrams, and implement database schemas.

agent-creation

from diegosouzapw/awesome-omni-skill

Standards-compliant agent definitions with templates. Trigger: When creating agent definitions, setting up project agents, or documenting workflows.

content-creation-flow

from diegosouzapw/awesome-omni-skill

Step-by-step content creation workflow for documentation and training materials. Includes MCP validation, discovery, content design, Jira planning, and draft generation with priority-based execution. Use when starting the content creation flow, creating documentation or training materials, or when the user asks to follow the content creation workflow.

Automate YouTube Top-Ten Video Creation with OpenAI and Safe Image Search

from diegosouzapw/awesome-omni-skill

Integrates OpenAI API for content generation, Bing Image Search API for safe image retrieval, and Pexels API for video footage. Handles authentication via Bearer token, enforces safe search, formats ChatGPT responses into a top-ten list, and includes error handling for API failures.

podcast-generation

from diegosouzapw/awesome-omni-skill

Generate AI-powered podcast-style audio narratives using Azure OpenAI's GPT Realtime Mini model via WebSocket. Use when building text-to-speech features, audio narrative generation, podcast creatio...

adr-creation

from diegosouzapw/awesome-omni-skill

Generate Architecture Decision Records following the project template and numbering convention. Use when documenting architecture decisions, technical choices, or when the user asks to create an ADR.

claude-agent-sdk-agent-creation

from diegosouzapw/awesome-omni-skill

Use when creating or configuring Claude AI agents using the Agent SDK. Covers agent initialization, configuration, and basic setup patterns.

architecture-agent-creation

from diegosouzapw/awesome-omni-skill

Create specialized infrastructure agent definitions for platform/service management (Grafana, Prometheus, Traefik, ERPNext, etc.). Use when the user requests creation of an agent for a specific technology platform or infrastructure component. This skill produces complete agent prompts with integrated research, SOPs, tool references, and handoff protocols following the Linear-First Agentic Workflow framework.

agent-expert-creation

from diegosouzapw/awesome-omni-skill

Create specialized agent experts with pre-loaded domain knowledge using the Act-Learn-Reuse pattern. Use when building domain-specific agents that maintain mental models via expertise files and self-improve prompts.

bgo

from diegosouzapw/awesome-omni-skill

Automates the complete Blender build-go workflow, from building and packaging your extension/add-on to removing old versions, installing, enabling, and launching Blender for quick testing and iteration.

Coding & Development

moai-lang-r

from diegosouzapw/awesome-omni-skill

R 4.4+ best practices with testthat 3.2, lintr 3.2, and data analysis patterns.

moai-lang-python

from diegosouzapw/awesome-omni-skill

Python 3.13+ development specialist covering FastAPI, Django, async patterns, data science, testing with pytest, and modern Python features. Use when developing Python APIs, web applications, data pipelines, or writing tests.