AudioEditor

AI-powered audio/video editing — transcription, intelligent cut detection, automated editing with crossfades, and optional cloud polish. USE WHEN clean audio, edit audio, remove filler words, clean podcast, remove ums, fix audio, cut dead air, polish audio, clean recording, transcribe and edit.

11,146 stars

bydanielmiessler

View on GitHub Installation ↓

Best use case

AudioEditor is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using AudioEditor should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/AudioEditor/SKILL.md --create-dirs "https://raw.githubusercontent.com/danielmiessler/Personal_AI_Infrastructure/main/Packs/Utilities/src/AudioEditor/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/AudioEditor/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How AudioEditor Compares

Feature / Agent	AudioEditor	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

AI Agent for YouTube Script Writing

Find AI agent skills for YouTube script writing, video research, content outlining, and repeatable channel production workflows.

AI Agents for Marketing

Discover AI agents for marketing workflows, from SEO and content production to campaign research, outreach, and analytics.

AI Agents for Startups

Explore AI agent skills for startup validation, product research, growth experiments, documentation, and fast execution with small teams.

SKILL.md Source

# AudioEditor

AI-powered audio/video editing — transcription, intelligent cut detection, automated editing with crossfades, and optional cloud polish.

## Customization

**Before executing, check for user customizations at:**
`~/.claude/PAI/USER/SKILLCUSTOMIZATIONS/AudioEditor/`

If this directory exists, load and apply any PREFERENCES.md, configurations, or resources found there. These override default behavior. If the directory does not exist, proceed with skill defaults.

## Voice Notification

**You MUST send this notification BEFORE doing anything else when this skill is invoked.**

1. **Send voice notification**:
   ```bash
   curl -s -X POST http://localhost:8888/notify \
     -H "Content-Type: application/json" \
     -d '{"message": "Running the WORKFLOWNAME workflow in the AudioEditor skill to ACTION"}' \
     > /dev/null 2>&1 &
   ```

2. **Output text notification**:
   ```
   Running the **WorkflowName** workflow in the **AudioEditor** skill to ACTION...
   ```

**This is not optional. Execute this curl command immediately upon skill invocation.**

## Workflow Routing

| Workflow | Trigger | File |
|----------|---------|------|
| **Clean** | "clean audio", "edit audio", "remove filler words", "clean podcast", "remove ums", "cut dead air", "polish audio" | `Workflows/Clean.md` |

## Pipeline Architecture

```
Audio Input
    |
[Transcribe] Whisper word-level timestamps (insanely-fast-whisper on MPS)
    |
[Analyze] Claude classifies each segment:
    |   KEEP / CUT_FILLER / CUT_FALSE_START / CUT_EDIT_MARKER / CUT_STUTTER / CUT_DEAD_AIR
    |   Distinguishes rhetorical emphasis from accidental repetition
    |
[Edit] ffmpeg executes cuts:
    |   - 40ms qsin crossfades at every edit point
    |   - Room tone extraction and gap filling
    |   - Breath attenuation (50% volume, not removal)
    |
[Polish] (optional) Cleanvoice API final pass:
        - Mouth sound removal
        - Remaining filler detection
        - Loudness normalization

Output: cleaned MP3/WAV
```

## Tools

| Tool | Command | Purpose |
|------|---------|---------|
| **Transcribe** | `bun ~/.claude/skills/Utilities/AudioEditor/Tools/Transcribe.ts <file>` | Word-level transcription via Whisper |
| **Analyze** | `bun ~/.claude/skills/Utilities/AudioEditor/Tools/Analyze.ts <transcript.json>` | LLM-powered edit classification |
| **Edit** | `bun ~/.claude/skills/Utilities/AudioEditor/Tools/Edit.ts <file> <edits.json>` | Execute cuts with crossfades + room tone |
| **Polish** | `bun ~/.claude/skills/Utilities/AudioEditor/Tools/Polish.ts <file>` | Cleanvoice API cloud polish |
| **Pipeline** | `bun ~/.claude/skills/Utilities/AudioEditor/Tools/Pipeline.ts <file> [--polish]` | Full end-to-end pipeline |

## API Keys Required

| Service | Env Var | Where to Get |
|---------|---------|-------------|
| Anthropic (for analyze step) | `ANTHROPIC_API_KEY` | Already set via Claude Code |
| Cleanvoice (for polish step, optional) | `CLEANVOICE_API_KEY` | cleanvoice.ai Dashboard Settings API Key |

## Examples

**Example 1: Clean a podcast recording**
```
User: "clean up the audio on this podcast file"
-> Invokes Clean workflow
-> Runs full pipeline: transcribe -> analyze -> edit
-> Outputs cleaned MP3 with filler words, stutters, and dead air removed
```

**Example 2: Preview edits before applying**
```
User: "show me what edits you'd make to this recording"
-> Invokes Clean workflow with --preview flag
-> Transcribes and analyzes, shows proposed edits without modifying audio
-> User reviews edit list, then runs again to apply
```

**Example 3: Aggressive clean with cloud polish**
```
User: "aggressively clean this audio and polish it"
-> Invokes Clean workflow with --aggressive --polish flags
-> Tighter thresholds for filler detection
-> Cleanvoice API pass for mouth sounds and normalization
```

Related Skills

Utilities

11146

from danielmiessler/Personal_AI_Infrastructure

Developer utilities and tools — CLI generation, skill scaffolding, agent delegation, system upgrades, evals, documents, parsing, audio editing, Fabric patterns, Cloudflare infrastructure, browser automation, meta-prompting, and aphorisms. USE WHEN create CLI, build CLI, command-line tool, wrap API, add command, upgrade tier, TypeScript CLI, create skill, new skill, scaffold skill, validate skill, update skill, fix skill structure, canonicalize skill, parallel execution, agent teams, delegate, workstreams, swarm, upgrade, improve system, system upgrade, check Anthropic, algorithm upgrade, mine reflections, find sources, research upgrade, PAI upgrade, eval, evaluate, test agent, benchmark, verify behavior, regression test, capability test, run eval, compare models, compare prompts, create judge, view results, document, process file, create document, convert format, extract text, PDF, DOCX, XLSX, PPTX, Word, Excel, spreadsheet, PowerPoint, presentation, slides, consulting report, large PDF, merge PDF, fill form, tracked changes, redlining, parse, extract, URL, transcript, entities, JSON, batch, YouTube, article, newsletter, Twitter, browser extension, collision detection, detect content type, extract article, extract newsletter, extract YouTube, extract PDF, parse content, clean audio, edit audio, remove filler words, clean podcast, remove ums, cut dead air, polish audio, transcribe, analyze audio, audio pipeline, fabric, fabric pattern, run fabric, update patterns, sync fabric, summarize, threat model pattern, Cloudflare, worker, deploy, Pages, MCP server, wrangler, DNS, KV, R2, D1, Vectorize, browser, screenshot, debug web, verify UI, troubleshoot frontend, automate browser, browse website, review stories, run stories, web automation, meta-prompting, template generation, prompt optimization, programmatic prompt, render template, validate template, prompt engineering, aphorism, quote, saying, find quote, research thinker, newsletter quotes, add aphorism, search aphorisms.

ContentAnalysis

11146

from danielmiessler/Personal_AI_Infrastructure

Content extraction and analysis — wisdom extraction from videos, podcasts, articles, and YouTube. USE WHEN extract wisdom, content analysis, analyze content, insight report, analyze video, analyze podcast, extract insights, key takeaways, what did I miss, extract from YouTube.

WriteStory

11146

from danielmiessler/Personal_AI_Infrastructure

Layered fiction writing system using Will Storr's storytelling science and rhetorical figures. USE WHEN write story, fiction, novel, short story, book, chapter, story bible, character arc, plot outline, creative writing, worldbuilding, narrative, mystery writing, dialogue, prose, series planning.

USMetrics

11146

from danielmiessler/Personal_AI_Infrastructure

US economic indicators. USE WHEN GDP, inflation, unemployment, economic metrics, gas prices. SkillSearch('usmetrics') for docs.

Sales

11146

from danielmiessler/Personal_AI_Infrastructure

Sales workflows. USE WHEN sales, proposal, pricing. SkillSearch('sales') for docs.

PAI

11146

from danielmiessler/Personal_AI_Infrastructure

Personal AI Infrastructure core. The authoritative reference for how PAI works.

VoiceServer

11146

from danielmiessler/Personal_AI_Infrastructure

Voice server management. USE WHEN voice server, TTS server, voice notification, prosody.

THEALGORITHM

11146

from danielmiessler/Personal_AI_Infrastructure

Universal execution engine using scientific method to achieve ideal state. USE WHEN complex tasks, multi-step work, "run the algorithm", "use the algorithm", OR any non-trivial request that benefits from structured execution with ISC (Ideal State Criteria) tracking.

System

11146

from danielmiessler/Personal_AI_Infrastructure

System maintenance with three core operations - integrity check (find/fix broken references), document session (current transcript), document recent (catch-up since last update). Plus security workflows. USE WHEN integrity check, audit system, document session, document this session, document today, document recent, catch up docs, what's undocumented, check for secrets, security scan, privacy check, OR asking about past work ("we just worked on", "remember when we").

CORE

11146

from danielmiessler/Personal_AI_Infrastructure

Personal AI Infrastructure core. AUTO-LOADS at session start. The authoritative reference for how the PAI system works, how to use it, and all system-level configuration. USE WHEN any session begins, user asks about the system, identity, configuration, workflows, security, or any other question about how the PAI system operates.

thinking

11146

from danielmiessler/Personal_AI_Infrastructure

Multi-mode analytical and creative thinking — first principles decomposition, iterative depth analysis, creative brainstorming, multi-agent council debates, adversarial red teaming, world threat modeling, and scientific hypothesis testing. USE WHEN first principles, decompose, deconstruct, reconstruct, challenge assumptions, iterative depth, multi-angle, deep exploration, be creative, brainstorm, divergent ideas, tree of thoughts, maximum creativity, technical creativity, idea generation, domain specific, council, debate, perspectives, quick consensus, red team, critique, stress test, adversarial validation, parallel analysis, devil's advocate, threat model, world model, future analysis, test idea, test investment, update models, view models, time horizon, think about, figure out, experiment, iterate, science, hypothesis, define goal, design experiment, quick diagnosis, structured investigation, full cycle.

telos

11146

from danielmiessler/Personal_AI_Infrastructure

Life OS and project analysis. USE WHEN TELOS, life goals, projects, dependencies, books, movies. SkillSearch('telos') for docs.