Jump Cut Editor (VAD-based)
Automatically remove silences from talking-head videos using neural voice activity detection (Silero VAD). More accurate than FFmpeg silence detection, especially for videos with background noise, breathing sounds, or quiet speech.
Best use case
Jump Cut Editor (VAD-based) is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Automatically remove silences from talking-head videos using neural voice activity detection (Silero VAD). More accurate than FFmpeg silence detection, especially for videos with background noise, breathing sounds, or quiet speech.
Teams using Jump Cut Editor (VAD-based) should expect a more consistent output, faster repeated execution, less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in
.claude/skills/jump-cut-vad/SKILL.mdinside your project - Restart your AI agent — it will auto-discover the skill
How Jump Cut Editor (VAD-based) Compares
| Feature / Agent | Jump Cut Editor (VAD-based) | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Automatically remove silences from talking-head videos using neural voice activity detection (Silero VAD). More accurate than FFmpeg silence detection, especially for videos with background noise, breathing sounds, or quiet speech.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Jump Cut Editor (VAD-based)
Automatically remove silences from talking-head videos using neural voice activity detection (Silero VAD). More accurate than FFmpeg silence detection, especially for videos with background noise, breathing sounds, or quiet speech.
## Execution Script
`execution/jump_cut_vad.py`
---
## Quick Start
```bash
# Basic silence removal
python3 execution/jump_cut_vad.py input.mp4 output.mp4
# With audio enhancement and color grading
python3 execution/jump_cut_vad.py input.mp4 output.mp4 \
--enhance-audio \
--apply-lut .tmp/cinematic.cube
# With "cut cut" restart detection (removes mistakes)
python3 execution/jump_cut_vad.py input.mp4 output.mp4 \
--detect-restarts \
--enhance-audio
# Fine-tuned parameters
python3 execution/jump_cut_vad.py input.mp4 output.mp4 \
--min-silence 0.8 \
--padding 150 \
--enhance-audio
```
---
## What It Does
1. **Extracts audio** from video as WAV
2. **Runs Silero VAD** (neural voice activity detection) to identify speech segments
3. **Optionally detects "cut cut"** restart phrases and removes mistake segments
4. **Concatenates speech segments** with padding
5. **Applies audio enhancement** (optional): EQ, compression, loudness normalization
6. **Applies color grading** (optional): LUT-based color correction
---
## CLI Arguments
| Argument | Default | Description |
|----------|---------|-------------|
| `input` | required | Input video file path |
| `output` | required | Output video file path |
| `--min-silence` | 0.5 | Minimum silence gap to cut (seconds) |
| `--min-speech` | 0.25 | Minimum speech duration to keep (seconds) |
| `--padding` | 100 | Padding around speech in milliseconds |
| `--merge-gap` | 0.3 | Merge segments closer than this (seconds) |
| `--keep-start` | true | Preserve intro (start from 0:00) |
| `--no-keep-start` | - | Allow cutting silence at the beginning |
| `--enhance-audio` | false | Apply audio enhancement chain |
| `--detect-restarts` | false | Detect "cut cut" and remove mistakes |
| `--restart-phrase` | "cut cut" | Custom restart trigger phrase |
| `--whisper-model` | base | Whisper model for restart detection |
| `--apply-lut` | none | Path to LUT file for color grading |
---
## Features
### 1. Silero VAD (Voice Activity Detection)
Uses a neural network trained specifically for voice detection. Much better than FFmpeg's volume-based silence detection:
| Silero VAD | FFmpeg silencedetect |
|------------|---------------------|
| Detects actual speech | Detects volume drops |
| Ignores breathing | Cuts on breathing pauses |
| Works with background noise | Fails with background noise |
| Handles quiet speech | Misses quiet speech |
### 2. "Cut Cut" Restart Detection
Say "cut cut" during recording to mark a mistake. The script will:
1. Detect the phrase using Whisper transcription
2. Remove the segment containing "cut cut"
3. Remove the **previous** segment (where the mistake is)
This lets you redo takes naturally without stopping the recording.
```bash
# Enable restart detection
python3 execution/jump_cut_vad.py input.mp4 output.mp4 --detect-restarts
# Custom restart phrase
python3 execution/jump_cut_vad.py input.mp4 output.mp4 \
--detect-restarts --restart-phrase "start over"
```
### 3. Audio Enhancement
Applies a professional voice processing chain:
```
highpass=f=80 # Remove rumble below 80Hz
lowpass=f=12000 # Remove harsh highs above 12kHz
equalizer (200Hz, -1dB) # Reduce muddiness
equalizer (3kHz, +2dB) # Boost presence/clarity
acompressor # Gentle compression (3:1 ratio)
loudnorm=I=-16 # YouTube loudness standard (-16 LUFS)
```
```bash
python3 execution/jump_cut_vad.py input.mp4 output.mp4 --enhance-audio
```
### 4. LUT Color Grading
Apply color grading using standard LUT files:
```bash
# Apply .cube LUT
python3 execution/jump_cut_vad.py input.mp4 output.mp4 \
--apply-lut .tmp/cinematic.cube
```
**Supported formats:** `.cube`, `.3dl`, `.dat`, `.m3d`, `.csp`
---
## Parameter Tuning
### Silence Detection
| Goal | Parameter | Value |
|------|-----------|-------|
| More aggressive cuts | `--min-silence` | 0.3-0.4 |
| Preserve natural pauses | `--min-silence` | 0.8-1.0 |
| Keep very short utterances | `--min-speech` | 0.1-0.2 |
| Ignore brief sounds | `--min-speech` | 0.4-0.5 |
### Padding
| Goal | `--padding` value |
|------|------------------|
| Tight cuts | 50-80 |
| Natural feel | 100-150 |
| Extra breathing room | 200-300 |
---
## Recording Workflow
### With Restart Detection
1. Start recording
2. Speak naturally
3. Make a mistake → Say "cut cut" → Pause briefly → Redo from checkpoint
4. Continue recording
5. Stop when done
**The script automatically removes:**
- The segment containing "cut cut"
- The previous segment (your mistake)
### Without Restart Detection
1. Start recording
2. Speak with natural pauses
3. Long pauses (>0.5s default) will be cut
4. Finish and run the script
---
## Dependencies
### System Requirements
```bash
brew install ffmpeg # macOS
```
### Python Dependencies
```bash
pip install torch # For Silero VAD
pip install whisper # For restart detection (optional)
```
Silero VAD is downloaded automatically from torch.hub on first run.
---
## Example Output
```
🎬 Jump Cut Editor (Silero VAD)
Input: .tmp/recording.mp4
Output: .tmp/recording_edited.mp4
📏 Video duration: 180.00s
🎵 Extracting audio...
🎯 Running Silero VAD (min_silence=0.5s, min_speech=0.25s)...
Found 24 speech segments
1. 0.00s - 8.32s (8.32s)
2. 9.12s - 15.44s (6.32s)
3. 16.28s - 22.60s (6.32s)
... and 21 more
📎 After merging close segments: 18 segments
🔲 After adding 100ms padding: 18 segments
📌 Preserving intro: extended first segment to start at 0:00
✂️ Concatenating 18 segments...
🎧 Audio enhancement enabled
✅ Output saved to .tmp/recording_edited.mp4
📊 Stats:
Original: 180.00s
New: 142.34s
Removed: 37.66s (20.9%)
```
---
## vs. simple_video_edit.py
| Feature | jump_cut_vad.py | simple_video_edit.py |
|---------|-----------------|---------------------|
| Silence detection | Neural VAD | FFmpeg volume-based |
| Accuracy | High | Medium |
| Restart detection | Yes ("cut cut") | No |
| Audio enhancement | Full chain | Basic loudnorm |
| LUT color grading | Yes | No |
| Whisper transcription | Optional | Built-in |
| YouTube upload | No | Yes (Auphonic) |
**Use `jump_cut_vad.py` for:** Better silence detection, restart phrase support, audio enhancement, color grading.
**Use `simple_video_edit.py` for:** End-to-end YouTube workflow with metadata generation and upload.
---
## Troubleshooting
### "No speech detected"
- Check that audio track exists in the video
- Try lowering `--min-speech` to 0.1
### Cuts feel too aggressive
- Increase `--padding` (e.g., 150-200)
- Increase `--min-silence` (e.g., 0.8)
### Breathing sounds being cut
VAD should handle this automatically. If not:
- Increase `--merge-gap` to 0.5
- Increase `--padding` slightly
### Restart detection not finding "cut cut"
- Ensure you speak the phrase clearly
- Try `--whisper-model medium` for better accuracy
- Check that Whisper is installed: `pip install whisper`
### LUT not applying
- Check file path is correct
- Ensure format is supported (.cube, .3dl, .dat, .m3d, .csp)
- Check FFmpeg has lut3d filter: `ffmpeg -filters | grep lut3d`
---
## Performance
### Hardware Encoding (Apple Silicon)
The script automatically uses **hardware encoding** (`h264_videotoolbox`) on macOS when available:
| Encoding | Speed | File Size | When Used |
|----------|-------|-----------|-----------|
| Hardware (h264_videotoolbox) | 5-10x faster | ~10-20% larger | macOS with Apple Silicon/Intel |
| Software (libx264) | Baseline | Baseline | Fallback if hardware unavailable |
**Benchmark (90-minute 1080p video):**
- Software encoding: ~20-25 minutes
- Hardware encoding: ~2-4 minutes
The script checks for hardware encoder availability at startup and caches the result. You'll see either:
- `🚀 Hardware encoding enabled (h264_videotoolbox)`
- `💻 Using software encoding (libx264)`
---
## Output
- **Deliverable:** Edited video at specified output path
- **Format:** MP4 (H.264)
- **Encoding:** Hardware (10 Mbps) or Software (CRF 18), auto-detected
- **Audio:** AAC 192kbps (enhanced if `--enhance-audio`)
- **Resolution/FPS:** Matches sourceRelated Skills
Goal: Build an LLM-based RAG App
Here is the MVP Implementation Plan.
Convert this into a web based slide deck using reveal.js.
Use the following brand colour and logo.
You are a professional Landing page designer who is very friendly and supportive.
Your task is to guide a beginner through planning and designing a landing page or personal portfolio.
You are a professional Chief Marketing Officer. Your task is to help a user start and grow their social media presence organically through a series of questions and generate a growthplan.md blueprint.
Follow these instructions:
technical-article-writer
Write compelling technical articles and blog posts for developer audiences. Use this skill whenever the user asks to write a blog post, technical article, or any long-form technical content. Also trigger when the user says 'write about [technical topic]', 'help me draft an article', 'turn this into a blog post', 'write a post about', 'I want to publish something about', or mentions writing for a developer audience. Covers the full pipeline: idea sharpening, hook/title generation, article structure, body drafting, and editing. Even if the user just says 'I want to write about X' without specifying format, use this skill. Do NOT use for platform-specific optimization, newsletter strategy, or ghostwriting voice matching.
substack-ghostwriting
Write, optimize, and grow Substack content — both newsletter issues (email-first) and web posts (web-first articles/essays). Covers ghostwriting with voice matching, Substack algorithm optimization, Notes strategy, email formatting, SEO, growth tactics, and monetization planning. Use when the user mentions Substack, newsletters, write a newsletter issue, Substack post, Substack article, web post on Substack, evergreen content, SEO for Substack, newsletter growth, Notes strategy, ghostwrite for, match someone's voice, write in the style of, newsletter monetization, paid subscribers, or any task involving Substack as a platform. Also trigger for general article/newsletter writing even if Substack isn't named explicitly, or when the user wants to adapt existing content (blog post, talk, thread) into newsletter or web post format. Do NOT use for generic blog post writing without a newsletter/Substack context (-> See samber/cc-skills@technical-article-writer skill).
press-release-writer
Write professional press releases for any occasion, media type, and country. Use when the user wants to write, draft, or improve a press release, communiqué de presse, media announcement, news release, or PR statement — including product launches, funding rounds, partnerships, crisis communications, earnings, executive hires, events, M&A, open source milestones, and media advisories. Covers all release types, media targets (print, digital/wire, broadcast, social/SMPR, trade press), and region-specific conventions (Western/Eastern Europe, Americas, Middle East, Africa, Asia, Oceania). Also trigger when the user says 'I need to announce something' or 'how do I tell the press about X.'
Use this skill whenever the user wants to do anything with PDF files. This includes reading or extracting text/tables from PDFs, combining or merging multiple PDFs into one, splitting PDFs apart, rotating pages, adding watermarks, creating new PDFs, filling PDF forms, encrypting/decrypting PDFs, extracting images, and OCR on scanned PDFs to make them searchable. If the user mentions a .pdf file or asks to produce one, use this skill.
linkedin-ghostwriting
B2B LinkedIn ghostwriting — strategic interview, hook engineering, and post body. Use when the user wants to write LinkedIn content, create ghostwritten posts, ghostwrite for a founder or executive, develop a B2B social strategy, or needs hooks, post structures, or copywriting frameworks for LinkedIn. Apply when the user shares a story, result, or insight and wants it turned into a post.
docx
Use this skill whenever the user wants to create, read, edit, or manipulate Word documents (.docx files). Triggers include: any mention of 'Word doc', 'word document', '.docx', or requests to produce professional documents with formatting like tables of contents, headings, page numbers, or letterheads. Also use when extracting or reorganizing content from .docx files, inserting or replacing images in documents, performing find-and-replace in Word files, working with tracked changes or comments, or converting content into a polished Word document. If the user asks for a 'report', 'memo', 'letter', 'template', or similar deliverable as a Word or .docx file, use this skill. Do NOT use for PDFs, spreadsheets, Google Docs, or general coding tasks unrelated to document generation.
documentation
Creates, structures, and reviews technical documentation following the Diátaxis framework (tutorials, how-to guides, reference, and explanation pages). Use when a user needs to write or reorganize docs, structure a tutorial vs. a how-to guide, build reference docs or API documentation, create explanation pages, choose between Diátaxis documentation types, or improve existing documentation structure. Trigger terms include: documentation structure, Diátaxis, tutorials vs how-to guides, organize docs, user guide, reference docs, technical writing.
doc-coauthoring
Guide users through a structured workflow for co-authoring documentation. Use when user wants to write documentation, proposals, technical specs, decision docs, or similar structured content. This workflow helps users efficiently transfer context, refine content through iteration, and verify the doc works for readers. Trigger when user mentions writing docs, creating proposals, drafting specs, or similar documentation tasks.