youtube-anycaption-summarizer
Turn YouTube videos into dependable markdown transcripts and polished summaries — even when caption coverage is messy. This skill works with manual closed captions (CC), auto-generated subtitles, or no usable subtitles at all by using subtitle-first extraction with local Whisper fallback. Supports private/restricted videos via cookies, batch processing, transcript cleanup, language backfill, source-language or user-selected summary language, and end-to-end completion reporting. Ideal for YouTube research, technical walkthroughs, founder content, tutorials, private/internal uploads, and batch video summarization workflows.
Best use case
youtube-anycaption-summarizer is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Turn YouTube videos into dependable markdown transcripts and polished summaries — even when caption coverage is messy. This skill works with manual closed captions (CC), auto-generated subtitles, or no usable subtitles at all by using subtitle-first extraction with local Whisper fallback. Supports private/restricted videos via cookies, batch processing, transcript cleanup, language backfill, source-language or user-selected summary language, and end-to-end completion reporting. Ideal for YouTube research, technical walkthroughs, founder content, tutorials, private/internal uploads, and batch video summarization workflows.
Teams using youtube-anycaption-summarizer should expect a more consistent output, faster repeated execution, less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in
.claude/skills/youtube-anycaption-summarizer/SKILL.mdinside your project - Restart your AI agent — it will auto-discover the skill
How youtube-anycaption-summarizer Compares
| Feature / Agent | youtube-anycaption-summarizer | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Turn YouTube videos into dependable markdown transcripts and polished summaries — even when caption coverage is messy. This skill works with manual closed captions (CC), auto-generated subtitles, or no usable subtitles at all by using subtitle-first extraction with local Whisper fallback. Supports private/restricted videos via cookies, batch processing, transcript cleanup, language backfill, source-language or user-selected summary language, and end-to-end completion reporting. Ideal for YouTube research, technical walkthroughs, founder content, tutorials, private/internal uploads, and batch video summarization workflows.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
Related Guides
AI Agent for YouTube Script Writing
Find AI agent skills for YouTube script writing, video research, content outlining, and repeatable channel production workflows.
AI Agents for Marketing
Discover AI agents for marketing workflows, from SEO and content production to campaign research, outreach, and analytics.
Best AI Skills for ChatGPT
Find the best AI skills to adapt into ChatGPT workflows for research, writing, summarization, planning, and repeatable assistant tasks.
SKILL.md Source
# YouTube AnyCaption Summarizer
**The YouTube summarizer that still works when captions are broken, missing, or inconsistent.**
Outputs: raw markdown transcript + polished markdown summary + session-ready result block.
Unlike caption-only tools, this skill still works when subtitles are missing by falling back to local Whisper transcription.
Generate a raw transcript markdown file and a polished summary markdown file from one or more YouTube videos.
This skill is self-contained. It does not require any other YouTube summarizer skill or prior workflow context.
## Best for
- founder videos, operator walkthroughs, and technical explainers
- long tutorial videos that need transcript + implementation summary
- private/internal YouTube uploads that may require cookies
- mixed-caption environments where some videos have CC, some only have auto-captions, and some have no usable subtitles
- batch research workflows where many YouTube links need standardized markdown outputs
- users who want reliable markdown artifacts, not just a one-off chat summary
## Why choose this over simpler transcript skills?
- manual CC first, auto-captions second, local Whisper fallback last
- keeps working when subtitle coverage is weak or missing
- supports private/restricted YouTube videos via cookies
- returns durable markdown artifacts, not just chat text
- supports batch processing and session-ready completion reporting
## Install dependencies
For a fresh macOS setup, new users should be able to copy-paste the following exactly:
```bash
brew install yt-dlp ffmpeg whisper-cpp
MODELS_DIR="$HOME/.openclaw/workspace"
MODEL_PATH="$MODELS_DIR/ggml-medium.bin"
mkdir -p "$MODELS_DIR"
if [ ! -f "$MODEL_PATH" ]; then
curl -L https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-medium.bin \
-o "$MODEL_PATH.part" && mv "$MODEL_PATH.part" "$MODEL_PATH"
else
echo "Model already exists at $MODEL_PATH — leaving it unchanged."
fi
command -v python3 yt-dlp ffmpeg whisper-cli
ls -lh "$MODEL_PATH"
```
What this does:
- installs `yt-dlp`, `ffmpeg`, and `whisper-cli`
- creates the default models directory used by this skill if it does not already exist: `~/.openclaw/workspace`
- downloads the default Whisper model file only if it is missing
- avoids touching `~/.openclaw/openclaw.json` or any other OpenClaw config file
- does not delete, replace, or overwrite other files in your existing workspace folder
- verifies that the required binaries and model file are present
If you want to store models elsewhere, pass `--models-dir /path/to/models` when running the workflow.
## Example requests
- “Summarize this YouTube video into markdown.”
- “Generate a transcript and polished summary for this YouTube link.”
- “Process this private YouTube video with my browser cookies.”
- “Batch summarize these YouTube links and give me transcript + summary files.”
- “Use subtitles when available, otherwise transcribe locally.”
- “Create a Chinese summary from this English YouTube video.”
## Quick start
### Single video
```bash
python3 scripts/run_youtube_workflow.py "https://www.youtube.com/watch?v=VIDEO_ID"
```
This creates a dedicated per-video folder, writes the raw transcript markdown, creates the summary placeholder markdown, and prints JSON describing the outputs plus the exact follow-up commands/prompts needed to finish the summary step.
Important: the workflow script alone is not the finished deliverable. The current OpenClaw session must still:
1. infer/backfill the language if the workflow left it as `unknown`
2. overwrite the placeholder `Summary.md` with a real polished summary
3. run `scripts/complete_youtube_summary.py` to validate/finalize the result
### Force simplified Chinese summary
```bash
python3 scripts/run_youtube_workflow.py "https://www.youtube.com/watch?v=VIDEO_ID" \
--summary-language zh-CN
```
### Restricted video with cookies
```bash
python3 scripts/run_youtube_workflow.py "https://www.youtube.com/watch?v=VIDEO_ID" \
--cookies /path/to/cookies.txt
```
or
```bash
python3 scripts/run_youtube_workflow.py "https://www.youtube.com/watch?v=VIDEO_ID" \
--cookies-from-browser chrome
```
### Batch / queue mode
See `references/batch-input-format.md`.
```bash
python3 scripts/run_youtube_workflow.py --batch-file ./youtube-urls.txt
```
## Why this skill stands out
This skill is designed to keep working across the messy reality of YouTube:
- if a video has **manual closed captions (CC)**, use them first
- if it only has **auto-generated subtitles**, use those next
- if it has **no usable subtitles at all**, fall back to **local Whisper transcription**
That makes it materially more reliable than caption-only workflows. It works well for caption-rich videos, caption-poor videos, and private/internal uploads where subtitle coverage is inconsistent.
Core capabilities:
- fetch YouTube metadata first and derive safe output paths
- support single-video mode and batch / queue mode
- handle manual CC, auto-generated subtitles, or no subtitles via subtitle-first extraction with local Whisper fallback
- support restricted/private videos via cookies or browser-cookie extraction
- normalize noisy transcript text before summarization
- create a placeholder summary file, overwrite it with the final summary, and finalize end-to-end timing
- clean up only known intermediates created by the workflow unless explicitly told otherwise
## What this skill produces
For each video, create exactly one dedicated output folder containing these final deliverables:
- `SANITIZED_VIDEO_NAME_transcript_raw.md`
- `SANITIZED_VIDEO_NAME_Summary.md`
By default, delete only the known intermediate media, subtitle, and WAV files created by the workflow. Do not wipe unrelated files that may already exist in the per-video folder.
## Required local tools
Verify these tools exist before running the workflow:
- `yt-dlp`
- `ffmpeg`
- `whisper-cli`
- `python3`
The workflow also requires a supported Whisper ggml model file in the configured models directory.
## Bundled scripts
Use these scripts directly:
- `scripts/run_youtube_workflow.py` — main deterministic workflow for metadata, download/subtitles, transcription, placeholder summary creation, cleanup, and workflow metadata emission
- `scripts/backfill_detected_language.py` — update `transcript_raw.md`, `Summary.md`, and workflow metadata after the current session LLM decides the major transcript language
- `scripts/complete_youtube_summary.py` — validate that `Summary.md` is no longer a placeholder, optionally backfill language, compute the final end-to-end timing report for one item, and emit a session-ready result block
- `scripts/normalize_transcript_text.py` — convert raw timestamped transcript text into cleaner summary input without modifying the raw transcript file
- `scripts/finalize_youtube_summary.py` — lower-level timing helper used by the completion flow
- `scripts/prepare_video_paths.py` — derive sanitized folder and output file paths from a title and video ID
Useful references:
- `references/detailed-workflow.md` — full operational workflow, completion rules, batch guidance, naming rules, and practical notes
- `references/summary-template.md` — required structure and writing rules for the final `Summary.md`
- `references/session-output-template.md` — required user-facing output format to return to the current OpenClaw session after completion
- `references/batch-input-format.md` — input format for queue / batch processing
## Defaults
- Default parent output folder: `~/Downloads`
- Default whisper model: `ggml-medium`
- Supported whisper models: `ggml-base`, `ggml-small`, `ggml-medium`
- Default media mode: audio-only
- Default transcript language: auto-detect if transcription is needed
- Default summary language: `source`
- Raw transcript keeps timestamps
## Public workflow overview
At a high level, the skill does this:
1. fetch metadata first and create safe output paths
2. try manual subtitles, then auto-captions, then local Whisper fallback
3. write `SANITIZED_VIDEO_NAME_transcript_raw.md`
4. create `SANITIZED_VIDEO_NAME_Summary.md` as a placeholder
5. have the current OpenClaw session overwrite the placeholder with a real summary
6. run `scripts/complete_youtube_summary.py` to validate completion, backfill language if needed, and emit a session-ready result block
## What counts as completion
For a normal end-to-end request, completion means all of the following are true:
1. the workflow script succeeded
2. if language was initially `unknown`, the language was backfilled into both markdown files
3. the placeholder summary file was overwritten with a real summary
4. `scripts/complete_youtube_summary.py` was run successfully
5. the user received the resulting output paths and timing/result status
If the workflow script succeeded but the summary/completion step did not happen yet, describe the state as partial/in-progress rather than complete.
## When to read the deeper references
Read these as needed:
- `references/detailed-workflow.md` when you need the full implementation contract, batch guidance, naming rules, cleanup rules, timing flow, or debugging details
- `references/summary-template.md` before writing the final polished `Summary.md`
- `references/session-output-template.md` before returning the final user-facing per-video result block
- `references/batch-input-format.md` when handling `--batch-file`
## Practical public promise
This skill is optimized for dependable end-to-end output, not just quick transcript extraction:
- raw transcript markdown
- polished summary markdown
- session-ready completion reportRelated Skills
openclaw-youtube
YouTube SERP Scout for agents. Search top-ranking videos, channels, and trends for content research and competitor tracking.
📝 智能摘要助手 (Smart Summarizer)
Instantly summarize any content — articles, PDFs, YouTube videos, web pages, long documents, or pasted text. Extracts key points, action items, and insights. Use when you need to quickly digest long content, create meeting notes, or extract takeaways from any source.
youtube-search
YouTube Search API via AIsa unified endpoint. Search YouTube videos, channels, and playlists with a single AIsa API key — no Google API key or OAuth required. Use this skill when users want to search YouTube content. For other AIsa capabilities (LLM, financial data, Twitter, web search), see the aisa-core skill.
ib-summarizer
Summarize core safety information from Investigator's Brochures for clinical researchers
youtube-archiver
Archive YouTube playlists into markdown notes with metadata, transcripts, AI summaries, and tags. Use when a user asks to import/sync YouTube playlists, archive Watch Later or Liked videos, enrich YouTube notes, batch process video notes, or automate recurring YouTube-to-markdown sync jobs with cron.
youtube-digest
Understand, summarize, translate, and extract key points from YouTube videos. Use when a user provides a YouTube URL and wants: (1) a Chinese summary, (2) a transcript or subtitle extraction, (3) translation of spoken content, (4) timestamps / chapter notes, (5) visual understanding via key frames, or (6) question answering about a video. Prefer this skill for transcript-first workflows.
youtube-content-manager
YouTube内容管理后台,支持AI选题生成、脚本创作、标题优化、SEO描述生成、缩略图文案建议、发布记录管理和数据分析。集成SkillPay支付接口,每次调用收0.001USDT。
youtube-content-manager-pro
All-in-one YouTube Content Management Tool, AI generate topics, scripts, titles, SEO descriptions, tags, thumbnails, analytics. $0.005 USDT per use.
youtube-audio-download
Download YouTube video audio and convert to MP3. Supports age-restricted videos with cookies.
banner-youtube-translate-workflow
Complete workflow: download YouTube audio, launch Doubao, play audio, capture translation. Activates when user needs full video translation.
YouTube Channel Scraper
A browser-based YouTube channel discovery and scraping tool.
YouTube Transcript Extraction
Extract high-quality transcripts from YouTube videos using multiple methods.