youtube-transcript
Download YouTube video transcripts when the user provides a YouTube URL or asks to download, get, or fetch a transcript from YouTube. Also use when the user wants to transcribe a YouTube video or get its captions or subtitles.
Best use case
youtube-transcript is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using youtube-transcript should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/youtube-transcript/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
How youtube-transcript Compares
| Feature / Agent | youtube-transcript | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# YouTube Transcript Downloader
This skill helps download transcripts (subtitles/captions) from YouTube videos using yt-dlp.
## When to Use This Skill
Activate this skill when the user:
- Provides a YouTube URL and wants the transcript
- Asks to "download transcript from YouTube"
- Wants to "get captions" or "get subtitles" from a video
- Asks to "transcribe a YouTube video"
- Needs text content from a YouTube video
## How It Works
### Priority Order:
1. **Check if yt-dlp is installed** - install if needed
2. **List available subtitles** - see what's actually available
3. **Try manual subtitles first** (`--write-sub`) - highest quality
4. **Fallback to auto-generated** (`--write-auto-sub`) - usually available
5. **Last resort: Whisper transcription** - if no subtitles exist (requires user confirmation)
6. **Confirm the download** and show the user where the file is saved
7. **Optionally clean up** the VTT format if the user wants plain text
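The priority order above amounts to a small decision rule. The following is a minimal sketch of that rule in Python, assuming the available languages have already been collected into two lists; the function name `choose_strategy` and its return labels are hypothetical and not part of yt-dlp.

```python
from typing import Sequence


def choose_strategy(manual_langs: Sequence[str],
                    auto_langs: Sequence[str],
                    lang: str = "en") -> str:
    """Pick a transcript source following the priority order (hypothetical helper)."""
    if lang in manual_langs:
        return "manual"   # human-written subtitles, highest quality
    if lang in auto_langs:
        return "auto"     # auto-generated captions
    return "whisper"      # last resort, needs user confirmation first


print(choose_strategy(["en", "de"], ["en", "fr"]))  # manual
```

Keeping the decision in one pure function makes it easy to see that Whisper is only reached when neither subtitle type offers the requested language.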
## Installation Check
**IMPORTANT**: Always check if yt-dlp is installed first:
```bash
which yt-dlp || command -v yt-dlp
```
### If Not Installed
Attempt automatic installation based on the system:
**macOS (Homebrew)**:
```bash
brew install yt-dlp
```
**Linux (apt/Debian/Ubuntu)**:
```bash
sudo apt update && sudo apt install -y yt-dlp
```
**Alternative (pip - works on all systems)**:
```bash
pip3 install yt-dlp
# or
python3 -m pip install yt-dlp
```
**If installation fails**: Inform the user they need to install yt-dlp manually and provide them with installation instructions from https://github.com/yt-dlp/yt-dlp#installation
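The same check can be sketched in Python with `shutil.which`, which looks a command up on `PATH`. The helper names `install_command` and `ytdlp_status` are hypothetical, and the sketch only suggests a command rather than running it; the apt variant omits the `apt update` step shown above.

```python
import shutil
from typing import List


def install_command(has_brew: bool, has_apt: bool) -> List[str]:
    """Return a suggested install command, preferring Homebrew, then apt, then pip."""
    if has_brew:
        return ["brew", "install", "yt-dlp"]
    if has_apt:
        return ["sudo", "apt", "install", "-y", "yt-dlp"]
    return ["python3", "-m", "pip", "install", "yt-dlp"]


def ytdlp_status() -> str:
    """Report where yt-dlp is, or which install command to try."""
    path = shutil.which("yt-dlp")
    if path:
        return f"yt-dlp found at {path}"
    cmd = install_command(shutil.which("brew") is not None,
                          shutil.which("apt") is not None)
    return "yt-dlp not found; suggested install: " + " ".join(cmd)


print(ytdlp_status())
```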
## Check Available Subtitles
**ALWAYS do this first** before attempting to download:
```bash
yt-dlp --list-subs "YOUTUBE_URL"
```
This shows what subtitle types are available without downloading anything. Look for:
- Manual subtitles (better quality)
- Auto-generated subtitles (usually available)
- Available languages
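When the result needs to be inspected programmatically rather than read from the `--list-subs` table, one option is yt-dlp's JSON metadata dump (`-J`), whose `subtitles` and `automatic_captions` keys hold the manual and auto-generated tracks. The sketch below assumes that approach; `subtitle_languages` and `fetch_info` are hypothetical helper names, and the sample dictionary is hand-written, not real output.

```python
import json
import subprocess
from typing import Dict, List


def subtitle_languages(info: dict) -> Dict[str, List[str]]:
    """Split the language codes in yt-dlp metadata into manual and auto-generated."""
    manual = sorted((info.get("subtitles") or {}).keys())
    auto = sorted((info.get("automatic_captions") or {}).keys())
    return {"manual": manual, "auto": auto}


def fetch_info(url: str) -> dict:
    """Run `yt-dlp -J` (metadata only, nothing is downloaded) and parse the JSON."""
    result = subprocess.run(["yt-dlp", "-J", url],
                            capture_output=True, text=True, check=True)
    return json.loads(result.stdout)


# Offline example with a hand-written metadata fragment
sample = {"subtitles": {"en": [], "de": []},
          "automatic_captions": {"en": [], "fr": []}}
print(subtitle_languages(sample))  # {'manual': ['de', 'en'], 'auto': ['en', 'fr']}
```

With network access, `subtitle_languages(fetch_info("YOUTUBE_URL"))` would give the same structure for a real video.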
## Download Strategy
### Option 1: Manual Subtitles (Preferred)
Try this first - highest quality, human-created:
```bash
yt-dlp --write-sub --skip-download --output "OUTPUT_NAME" "YOUTUBE_URL"
```
### Option 2: Auto-Generated Subtitles (Fallback)
If manual subtitles aren't available:
```bash
yt-dlp --write-auto-sub --skip-download --output "OUTPUT_NAME" "YOUTUBE_URL"
```
Both commands create a `.vtt` file (WebVTT subtitle format).
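The two commands differ only in one flag, so a caller that builds them from Python can share the rest. This is a small sketch under that assumption; `subtitle_command` is a hypothetical helper, and the optional `--sub-langs` argument is the language filter described under Error Handling.

```python
import shlex
from typing import List, Optional


def subtitle_command(url: str, output_name: str,
                     auto: bool = False,
                     lang: Optional[str] = None) -> List[str]:
    """Build the yt-dlp argument list for manual or auto-generated subtitles."""
    cmd = ["yt-dlp", "--write-auto-sub" if auto else "--write-sub", "--skip-download"]
    if lang:
        cmd += ["--sub-langs", lang]  # restrict to one language
    cmd += ["--output", output_name, url]
    return cmd


# Print the commands instead of running them
print(shlex.join(subtitle_command("YOUTUBE_URL", "OUTPUT_NAME")))
print(shlex.join(subtitle_command("YOUTUBE_URL", "OUTPUT_NAME", auto=True, lang="en")))
```

Passing the list to `subprocess.run` avoids shell quoting problems with URLs that contain `&` or `?`.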
## Option 3: Whisper Transcription (Last Resort)
**ONLY use this if both manual and auto-generated subtitles are unavailable.**
### Step 1: Show File Size and Ask for Confirmation
```bash
# Get audio file size estimate
yt-dlp --print "%(filesize,filesize_approx)s" -f "bestaudio" "YOUTUBE_URL"
# Or get duration to estimate
yt-dlp --print "%(duration)s %(title)s" "YOUTUBE_URL"
```
**IMPORTANT**: Display the file size to the user and ask: "No subtitles are available. I can download the audio (approximately X MB) and transcribe it using Whisper. Would you like to proceed?"
**Wait for user confirmation before continuing.**
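The size printed by yt-dlp is a byte count, or the placeholder `NA` when the field is unknown, so it needs converting before it goes into the prompt. A minimal sketch of that conversion follows; `confirmation_message` is a hypothetical helper name.

```python
def confirmation_message(size_field: str) -> str:
    """Turn yt-dlp's printed size (bytes, or 'NA' when unknown) into the user prompt."""
    try:
        size_mb = float(size_field) / (1024 * 1024)
        size_text = f"approximately {size_mb:.0f} MB"
    except ValueError:
        # 'NA' or any other non-numeric value
        size_text = "size unknown"
    return ("No subtitles are available. I can download the audio "
            f"({size_text}) and transcribe it using Whisper. "
            "Would you like to proceed?")


print(confirmation_message("5242880"))
print(confirmation_message("NA"))
```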
### Step 2: Check for Whisper Installation
```bash
command -v whisper
```
If not installed, ask user: "Whisper is not installed. Install it with `pip install openai-whisper` (requires ~1-3GB for models)? This is a one-time installation."
**Wait for user confirmation before installing.**
Install if approved:
```bash
pip3 install openai-whisper
```
### Step 3: Download Audio Only
```bash
yt-dlp -x --audio-format mp3 --output "audio_%(id)s.%(ext)s" "YOUTUBE_URL"
```
### Step 4: Transcribe with Whisper
```bash
# Auto-detect language (recommended)
whisper audio_VIDEO_ID.mp3 --model base --output_format vtt
# Or specify language if known
whisper audio_VIDEO_ID.mp3 --model base --language en --output_format vtt
```
**Model Options** (stick to `base` for now; the sizes are the approximate memory each model needs to run, not download sizes):
- `tiny` - fastest, least accurate (~1GB)
- `base` - good balance (~1GB) ← **USE THIS**
- `small` - better accuracy (~2GB)
- `medium` - very good (~5GB)
- `large` - best accuracy (~10GB)
### Step 5: Cleanup
After transcription completes, ask user: "Transcription complete! Would you like me to delete the audio file to save space?"
If yes:
```bash
rm audio_VIDEO_ID.mp3
```
## Getting Video Information
### Extract Video Title (for filename)
```bash
yt-dlp --print "%(title)s" "YOUTUBE_URL"
```
Use this to create meaningful filenames based on the video title. Clean the title for filesystem compatibility:
- Replace `/` with `-`
- Replace special characters that might cause issues
- Consider using sanitized version: `$(yt-dlp --print "%(title)s" "URL" | tr '/' '-' | tr ':' '-')`
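A `tr` pipeline handles one character at a time; when more characters need cleaning, a small Python function may be easier to maintain. The sketch below is one possible rule set, not a standard: `sanitize_title`, the set of removed characters, and the 120-character limit are all assumptions.

```python
import re


def sanitize_title(title: str, max_length: int = 120) -> str:
    """Make a video title safe to use as a filename (hypothetical rule set)."""
    cleaned = title.replace("/", "-").replace(":", "-")
    # Drop characters that are awkward on common filesystems, plus control characters
    cleaned = re.sub(r'[\\?*"<>|\x00-\x1f]', "", cleaned)
    # Collapse runs of whitespace and trim trailing dots and spaces
    cleaned = re.sub(r"\s+", " ", cleaned).strip(" .")
    return cleaned[:max_length] or "transcript"


print(sanitize_title('What is 1/2? A "quick" intro: part 1'))
# What is 1-2 A quick intro- part 1
```

The fallback to `transcript` covers titles that consist only of removed characters.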
## Post-Processing
### Convert to Plain Text (Recommended)
YouTube's auto-generated VTT files contain **duplicate lines** because captions are shown progressively with overlapping timestamps. Always deduplicate when converting to plain text while preserving the original speaking order.
```bash
python3 -c "
import html, re
seen = set()
with open('transcript.en.vtt', 'r', encoding='utf-8') as f:
    for line in f:
        line = line.strip()
        # Skip the header, metadata, and cue-timing lines
        if line and not line.startswith(('WEBVTT', 'Kind:', 'Language:')) and '-->' not in line:
            # Strip inline tags, then decode HTML entities
            clean = html.unescape(re.sub('<[^>]*>', '', line)).strip()
            # Print each distinct line once, in original order
            if clean and clean not in seen:
                print(clean)
                seen.add(clean)
" > transcript.txt
```
### Complete Post-Processing with Video Title
```bash
# Get video title (replace / and :, drop ? and double quotes)
VIDEO_TITLE=$(yt-dlp --print "%(title)s" "YOUTUBE_URL" | tr '/' '_' | tr ':' '-' | tr -d '?"')
# Find the VTT file
VTT_FILE=$(ls *.vtt | head -n 1)
# Convert with deduplication; the filename is passed as an argument
# so that quotes in it cannot break the Python code
python3 -c "
import html, re, sys
seen = set()
with open(sys.argv[1], 'r', encoding='utf-8') as f:
    for line in f:
        line = line.strip()
        # Skip the header, metadata, and cue-timing lines
        if line and not line.startswith(('WEBVTT', 'Kind:', 'Language:')) and '-->' not in line:
            # Strip inline tags, then decode HTML entities
            clean = html.unescape(re.sub('<[^>]*>', '', line)).strip()
            if clean and clean not in seen:
                print(clean)
                seen.add(clean)
" "$VTT_FILE" > "${VIDEO_TITLE}.txt"
echo "✓ Saved to: ${VIDEO_TITLE}.txt"
# Clean up VTT file
rm "$VTT_FILE"
echo "✓ Cleaned up temporary VTT file"
```
## Output Formats
- **VTT format** (`.vtt`): Includes timestamps and formatting, good for video players
- **Plain text** (`.txt`): Just the text content, good for reading or analysis
## Tips
- The filename will be `{output_name}.{language_code}.vtt` (e.g., `transcript.en.vtt`)
- Most YouTube videos have auto-generated English subtitles
- Some videos may have multiple language options
- If auto-subtitles aren't available, try `--write-sub` instead for manual subtitles
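Because the language code sits between the output name and the `.vtt` extension, the downloaded files can be mapped back to languages by name alone. The sketch below assumes that naming pattern; `subtitle_files` is a hypothetical helper, and the example runs against throwaway files in a temporary directory.

```python
import tempfile
from pathlib import Path
from typing import Dict


def subtitle_files(directory: Path, output_name: str) -> Dict[str, Path]:
    """Map language code to file for names like {output_name}.{language_code}.vtt."""
    prefix = output_name + "."
    found = {}
    for path in sorted(directory.glob(f"{output_name}.*.vtt")):
        lang = path.name[len(prefix):-len(".vtt")]
        found[lang] = path
    return found


# Demonstration with empty placeholder files
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    for name in ("transcript.en.vtt", "transcript.de.vtt", "notes.txt"):
        (folder / name).write_text("WEBVTT\n")
    print(sorted(subtitle_files(folder, "transcript")))  # ['de', 'en']
```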
## Complete Workflow Example
```bash
VIDEO_URL="https://www.youtube.com/watch?v=dQw4w9WgXcQ"
OUTPUT_NAME="transcript_temp"
# yt-dlp can exit 0 even when it writes no subtitle file,
# so test for the file itself rather than the exit status
have_subs() { compgen -G "${OUTPUT_NAME}*.vtt" > /dev/null; }
# ============================================
# STEP 1: Check if yt-dlp is installed
# ============================================
if ! command -v yt-dlp &> /dev/null; then
    echo "yt-dlp not found, attempting to install..."
    if command -v brew &> /dev/null; then
        brew install yt-dlp
    elif command -v apt &> /dev/null; then
        sudo apt update && sudo apt install -y yt-dlp
    else
        pip3 install yt-dlp
    fi
fi
# Get video title for filename (runs after step 1 because it needs yt-dlp)
VIDEO_TITLE=$(yt-dlp --print "%(title)s" "$VIDEO_URL" | tr '/' '_' | tr ':' '-' | tr -d '?"')
# ============================================
# STEP 2: List available subtitles
# ============================================
echo "Checking available subtitles..."
yt-dlp --list-subs "$VIDEO_URL"
# ============================================
# STEP 3: Try manual subtitles first
# ============================================
echo "Attempting to download manual subtitles..."
yt-dlp --write-sub --skip-download --output "$OUTPUT_NAME" "$VIDEO_URL" 2>/dev/null
if have_subs; then
    echo "✓ Manual subtitles downloaded successfully!"
    ls -lh "${OUTPUT_NAME}".*
else
    # ============================================
    # STEP 4: Fallback to auto-generated
    # ============================================
    echo "Manual subtitles not available. Trying auto-generated..."
    yt-dlp --write-auto-sub --skip-download --output "$OUTPUT_NAME" "$VIDEO_URL" 2>/dev/null
    if have_subs; then
        echo "✓ Auto-generated subtitles downloaded successfully!"
        ls -lh "${OUTPUT_NAME}".*
    else
        # ============================================
        # STEP 5: Last resort - Whisper transcription
        # ============================================
        echo "⚠ No subtitles available for this video."
        # Get file size (exact if known, otherwise approximate)
        FILE_SIZE=$(yt-dlp --print "%(filesize,filesize_approx)s" -f "bestaudio" "$VIDEO_URL")
        DURATION=$(yt-dlp --print "%(duration)s" "$VIDEO_URL")
        TITLE=$(yt-dlp --print "%(title)s" "$VIDEO_URL")
        echo "Video: $TITLE"
        echo "Duration: $((DURATION / 60)) minutes"
        echo "Audio size: ~$((FILE_SIZE / 1024 / 1024)) MB"
        echo ""
        echo "Would you like to download and transcribe with Whisper? (y/n)"
        read -r RESPONSE
        if [[ "$RESPONSE" =~ ^[Yy]$ ]]; then
            # Check for Whisper
            if ! command -v whisper &> /dev/null; then
                echo "Whisper not installed. Install now? (requires ~1-3GB) (y/n)"
                read -r INSTALL_RESPONSE
                if [[ "$INSTALL_RESPONSE" =~ ^[Yy]$ ]]; then
                    pip3 install openai-whisper
                else
                    echo "Cannot proceed without Whisper. Exiting."
                    exit 1
                fi
            fi
            # Download audio
            echo "Downloading audio..."
            yt-dlp -x --audio-format mp3 --output "audio_%(id)s.%(ext)s" "$VIDEO_URL"
            # Get the actual audio filename
            AUDIO_FILE=$(ls audio_*.mp3 | head -n 1)
            # Transcribe
            echo "Transcribing with Whisper (this may take a few minutes)..."
            whisper "$AUDIO_FILE" --model base --output_format vtt
            # Cleanup
            echo "Transcription complete! Delete audio file? (y/n)"
            read -r CLEANUP_RESPONSE
            if [[ "$CLEANUP_RESPONSE" =~ ^[Yy]$ ]]; then
                rm "$AUDIO_FILE"
                echo "Audio file deleted."
            fi
            ls -lh *.vtt
        else
            echo "Transcription cancelled."
            exit 0
        fi
    fi
fi
# ============================================
# STEP 6: Convert to readable plain text with deduplication
# ============================================
# Prefer the subtitle download; fall back to any VTT (e.g. Whisper output). Keep one file only.
VTT_FILE=$( { ls "${OUTPUT_NAME}"*.vtt 2>/dev/null || ls *.vtt 2>/dev/null; } | head -n 1 )
if [ -f "$VTT_FILE" ]; then
    echo "Converting to readable format and removing duplicates..."
    # The filename is passed as an argument so quotes in it cannot break the Python code
    python3 -c "
import html, re, sys
seen = set()
with open(sys.argv[1], 'r', encoding='utf-8') as f:
    for line in f:
        line = line.strip()
        # Skip the header, metadata, and cue-timing lines
        if line and not line.startswith(('WEBVTT', 'Kind:', 'Language:')) and '-->' not in line:
            # Strip inline tags, then decode HTML entities
            clean = html.unescape(re.sub('<[^>]*>', '', line)).strip()
            if clean and clean not in seen:
                print(clean)
                seen.add(clean)
" "$VTT_FILE" > "${VIDEO_TITLE}.txt"
    echo "✓ Saved to: ${VIDEO_TITLE}.txt"
    # Clean up temporary VTT file
    rm "$VTT_FILE"
    echo "✓ Cleaned up temporary VTT file"
else
    echo "⚠ No VTT file found to convert"
fi
echo "✓ Complete!"
```
**Note**: This workflow covers the manual, auto-generated, and Whisper paths and prompts the user at each decision point.
## Error Handling
### Common Issues and Solutions:
**1. yt-dlp not installed**
- Attempt automatic installation based on system (Homebrew/apt/pip)
- If installation fails, provide manual installation link
- Verify installation before proceeding
**2. No subtitles available**
- List available subtitles first to confirm
- Try both `--write-sub` and `--write-auto-sub`
- If both fail, offer Whisper transcription option
- Show file size and ask for user confirmation before downloading audio
**3. Invalid or private video**
- Check if URL is correct format: `https://www.youtube.com/watch?v=VIDEO_ID`
- Some videos may be private, age-restricted, or geo-blocked
- Inform user of the specific error from yt-dlp
**4. Whisper installation fails**
- May require system dependencies (ffmpeg, rust)
- Provide fallback: "Install manually with: `pip3 install openai-whisper`"
- Check available disk space (models require 1-10GB depending on size)
**5. Download interrupted or failed**
- Check internet connection
- Verify sufficient disk space
- Try again with `--no-check-certificate` if SSL issues occur (this disables certificate validation, so use it only as a last resort)
**6. Multiple subtitle languages**
- By default, yt-dlp picks a single language (English when available) unless told otherwise
- Specify one with `--sub-langs en` for English only, or request every language with `--sub-langs all`
- List available with `--list-subs` first
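For the URL-format check in issue 3, a quick validation before calling yt-dlp can catch obvious typos. This is a minimal sketch that accepts the `watch?v=VIDEO_ID` form named above plus the `youtu.be/VIDEO_ID` short form, which is an addition not covered in this document; `extract_video_id` is a hypothetical helper, and yt-dlp itself accepts more URL shapes than this.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID from a watch or youtu.be URL, or None if it does not match."""
    parsed = urlparse(url.strip())
    host = (parsed.hostname or "").lower()
    # Treat www. and m. prefixes as the same site
    if host.startswith("www."):
        host = host[4:]
    elif host.startswith("m."):
        host = host[2:]
    if host == "youtu.be":
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host == "youtube.com" and parsed.path == "/watch":
        candidate = parse_qs(parsed.query).get("v", [""])[0]
    else:
        return None
    return candidate or None


print(extract_video_id("https://www.youtube.com/watch?v=dQw4w9WgXcQ"))  # dQw4w9WgXcQ
print(extract_video_id("https://example.com/watch?v=abc"))              # None
```

A `None` result only means the URL does not match these two shapes; private, age-restricted, or geo-blocked videos still have to be detected from yt-dlp's own error message.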
### Best Practices:
- ✅ Always check what's available before attempting download (`--list-subs`)
- ✅ Verify success at each step before proceeding to next
- ✅ Ask user before large downloads (audio files, Whisper models)
- ✅ Clean up temporary files after processing
- ✅ Provide clear feedback about what's happening at each stage
- ✅ Handle errors gracefully with helpful messages