comfyui-video

Automate AI video generation with ComfyUI and LTX-2.3. Supports text-to-video (T2V), image-to-video (I2V), batch scene rendering for music videos, and multi-scene workflows. Includes progress monitoring, fault recovery, and performance tuning. Use when generating AI videos with ComfyUI, creating MV scenes in batch, troubleshooting video rendering, or optimizing generation speed.

3,891 stars

Best use case

comfyui-video is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using comfyui-video should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/comfyui-video/SKILL.md --create-dirs "https://raw.githubusercontent.com/openclaw/skills/main/skills/a3165458/comfyui-video/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/comfyui-video/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How comfyui-video Compares

| Feature / Agent | comfyui-video | Standard Approach |
|-----------------|---------------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

SKILL.md Source

# ComfyUI Video Generation

Automate AI video generation using ComfyUI and the LTX-2.3 model. Ideal for music video (MV) production, multi-scene batch rendering, and AI video content creation.

## Requirements

| Item | Spec |
|------|------|
| GPU | ≥24GB VRAM (Turing/Ampere/Ada) |
| ComfyUI | 0.17+ |
| PyTorch | 2.6+cu124 |
| Access | SSH tunnel forwarding port 18188 |

## Model Setup

| Model | Size | Path |
|-------|------|------|
| LTX-2.3 dev (bf16) | 43GB | `models/checkpoints/ltx-2.3-22b-dev.safetensors` |
| Gemma 3 12B | 23GB | `models/text_encoders/comfy_gemma_3_12B_it.safetensors` |
| Distilled LoRA | 7.1GB | `models/loras/ltxv/ltx2/ltx-2.3-22b-distilled-lora-384.safetensors` |
| Video VAE (bf16) | - | `models/vae/LTX23_video_vae_bf16.safetensors` |

**Turing GPUs** (e.g., Quadro RTX 8000) do NOT support `fp8_e4m3fn`. Use bf16/fp16 models only.

## Performance Baseline

```
Per-step time: ~221s (constant, regardless of frame count!)
15 steps: ~57 min
25 steps: ~1h45m
Frames: 72=3s, 121=5s, 480=20s (24fps)
```

**Key insight**: In these measurements, frame count did not affect total generation time. The bottleneck is the model forward pass, so time scales with step count.
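
The baseline figures reduce to simple arithmetic, sketched below. The function names are illustrative and not part of the skill. The estimate covers sampling only; the measured totals above are somewhat higher than steps × 221 s, presumably because of model loading and decoding, which this sketch does not model.

```javascript
// Rough planning helpers based on the baseline figures above.
// Function names are hypothetical; they are not part of batch_scenes.js.

const SECONDS_PER_STEP = 221; // measured per-step time from the baseline
const FPS = 24;               // frame rate used in the baseline

// Sampling time only (steps x per-step time), in minutes.
// Model loading and decoding overhead are not included (assumption).
function estimateSamplingMinutes(steps) {
  return (steps * SECONDS_PER_STEP) / 60;
}

// Clip duration in seconds for a given frame count.
function framesToSeconds(frames, fps = FPS) {
  return frames / fps;
}

console.log(Math.round(estimateSamplingMinutes(15))); // 55
console.log(framesToSeconds(480));                    // 20
```

Treat the result as a lower bound when planning a batch, and prefer the measured times in the baseline when they are available.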

## Workflow Node Reference

| Node | ID | Purpose |
|------|-----|---------|
| LoadImage | 2004 | I2V reference input |
| CLIPTextEncode (positive) | 2483 | Positive prompt |
| CLIPTextEncode (negative) | 2612 | Negative prompt |
| EmptyLTXVLatentVideo | 3059 | Empty latent |
| LTXVScheduler | 4966 | Steps/length params |
| LoraLoaderModelOnly | 4922+ | LoRA loader |
| SaveVideo | 4823/4852 | Output mp4 |
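
As a minimal sketch of how these IDs could be used, the snippet below writes scene settings into a workflow object keyed by node ID, in the shape of ComfyUI's API-format export (`{ nodeId: { class_type, inputs } }`). It is not how `scripts/batch_scenes.js` is known to work. The input names (`image`, `text`, `length`, `steps`) and the helper name `applySceneToWorkflow` are assumptions; check them against your own exported JSON.

```javascript
// Node IDs taken from the table above. Input names are assumptions about
// the exported workflow; verify them before relying on this sketch.
const NODE_IDS = {
  loadImage: "2004",
  positivePrompt: "2483",
  negativePrompt: "2612",
  emptyLatent: "3059",
  scheduler: "4966",
};

// Returns a modified deep copy of an API-format workflow object
// without mutating the original.
function applySceneToWorkflow(workflow, scene) {
  const updated = JSON.parse(JSON.stringify(workflow));
  const setInput = (nodeId, key, value) => {
    if (!updated[nodeId] || !updated[nodeId].inputs) {
      throw new Error(`Node ${nodeId} not found in workflow`);
    }
    updated[nodeId].inputs[key] = value;
  };
  setInput(NODE_IDS.loadImage, "image", scene.image);
  setInput(NODE_IDS.positivePrompt, "text", scene.prompt);
  setInput(NODE_IDS.emptyLatent, "length", scene.frames);
  setInput(NODE_IDS.scheduler, "steps", scene.steps);
  return updated;
}
```

Failing loudly on a missing node ID is deliberate: IDs can differ between workflow versions, and a silent no-op would cost an hour-long render.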

## Quick Start

### Generate a Single Video (I2V)

1. Load workflow: `/workspace/ComfyUI/custom_nodes/ComfyUI-LTXVideo/example_workflows/2.3/LTX-2.3_T2V_I2V_Single_Stage_Distilled_Full.json`
2. Set params using `scripts/batch_scenes.js`
3. Click Run
4. Wait ~1 hour
5. Download from `/workspace/ComfyUI/output/`

### Batch Scene Generation

Use `scripts/batch_scenes.js` for automation:

```javascript
// Load script first, then configure each scene:
await comfyui_batch.configureScene({
  name: "scene_01",
  prompt: "A lonely girl running through rain at night, neon reflections",
  image: "unified_ref.png",
  steps: 15,
  frames: 72
});
// Click Run, repeat for next scene
```
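
For several scenes, the settings objects can be prepared up front. The sketch below is a hypothetical helper (`buildScenes` is not part of the skill) that produces objects shaped like the `configureScene` argument above, all sharing one reference image. The second prompt is invented for illustration.

```javascript
// Hypothetical helper: turns a list of prompts into scene settings shaped
// like the configureScene() argument shown above.
function buildScenes(prompts, { image, steps = 15, frames = 72 } = {}) {
  if (!image) {
    throw new Error("A reference image is required for I2V scenes");
  }
  return prompts.map((prompt, index) => ({
    name: `scene_${String(index + 1).padStart(2, "0")}`,
    prompt,
    image, // same reference image for every scene
    steps,
    frames,
  }));
}

const scenes = buildScenes(
  [
    "A lonely girl running through rain at night, neon reflections",
    "She stops under a flickering streetlight", // made-up example prompt
  ],
  { image: "unified_ref.png" }
);
console.log(scenes.map((s) => s.name).join(", ")); // scene_01, scene_02
```

Each object can then be passed to `comfyui_batch.configureScene` one at a time, running the workflow and waiting for it to finish before configuring the next scene.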

## Step Count Guide

| Steps | Quality | Time/Scene | Use Case |
|-------|---------|------------|----------|
| 8 | Rough | ~30min | Quick preview |
| 15 | Good | ~57min | **Recommended sweet spot** |
| 25 | Best | ~1h45m | Final quality output |

I2V + LoRA at 15 steps achieves ~90% of 25-step quality in about 40% less time (15 vs. 25 sampling steps).
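
The table can be turned into a small budget check, sketched below. `pickPreset` is a hypothetical helper, and it assumes scenes are rendered one after another, so total time is the per-scene time multiplied by the scene count.

```javascript
// Per-scene times from the Step Count Guide (1h45m = 105 min).
const STEP_PRESETS = [
  { steps: 8, quality: "Rough", minutes: 30 },
  { steps: 15, quality: "Good", minutes: 57 },
  { steps: 25, quality: "Best", minutes: 105 },
];

// Returns the highest-quality preset whose total time fits the budget,
// or null if even the 8-step preview does not fit.
function pickPreset(sceneCount, budgetMinutes) {
  const fitting = STEP_PRESETS.filter(
    (preset) => preset.minutes * sceneCount <= budgetMinutes
  );
  return fitting.length > 0 ? fitting[fitting.length - 1] : null;
}

console.log(pickPreset(4, 240).steps); // 15
```

With four scenes and a four-hour budget, 15 steps fits (4 × 57 = 228 min) while 25 steps does not (4 × 105 = 420 min).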

## Troubleshooting

### VAEDecode Validation Failed

**Error**: `Exception when validating node: 'VAEDecode'`
**Cause**: VAE load timing or insufficient VRAM
**Fix**: Reload the entire workflow (fetch + loadGraphData), wait for models to fully load, then run. Never reload during execution.

### Browser Tab Lost

**Cause**: SSH tunnel disconnected
**Fix**:
1. Rebuild tunnel: `ssh -f -N -L 18188:localhost:18188 user@host -p port`
2. Navigate to ComfyUI
3. Reload workflow

### Inconsistent Characters Across Scenes

**Cause**: Different reference images per scene
**Fix**: Use the SAME reference image for all scenes. Extract a clear frame from an existing video if needed. The I2V input image dictates the visual baseline.
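
A quick pre-flight check can catch mixed reference images before a long batch starts. This is a sketch with hypothetical helper names; it assumes each scene object has `name` and `image` fields like the `configureScene` argument.

```javascript
// Groups scene names by the reference image they use.
function groupByReferenceImage(scenes) {
  const groups = new Map();
  for (const scene of scenes) {
    if (!groups.has(scene.image)) {
      groups.set(scene.image, []);
    }
    groups.get(scene.image).push(scene.name);
  }
  return groups;
}

// True when every scene shares one reference image (or the list is empty).
function hasUnifiedReference(scenes) {
  return groupByReferenceImage(scenes).size <= 1;
}

const plan = [
  { name: "scene_01", image: "unified_ref.png" },
  { name: "scene_02", image: "other_ref.png" },
];
console.log(hasUnifiedReference(plan)); // false
```

When the check fails, the map returned by `groupByReferenceImage` shows which scenes use which image.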

### Output Video Not Saved

**Check**: `ssh -p PORT root@HOST "ls -lht /workspace/ComfyUI/output/*.mp4"`
**Fix**: Check for VAEDecode errors in log, then re-run.

## Monitoring Progress

```bash
# Current sampling progress
# (tqdm-style bars print "s/it" instead of "it/s" when a step takes longer than 1 s)
ssh -p PORT root@HOST "grep -E 'it/s|s/it' /tmp/comfy.log | tail -1"

# Completion check
ssh -p PORT root@HOST "grep 'Prompt executed' /tmp/comfy.log | tail -1"

# Output files
ssh -p PORT root@HOST "ls -lht /workspace/ComfyUI/output/*.mp4"
```
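
The progress line can also be parsed to estimate remaining time. The sketch below assumes a tqdm-style line (for example ` 40%|####      | 6/15 [22:06<33:09, 221.00s/it]`); the exact format in `/tmp/comfy.log` is an assumption, and `parseProgressLine` is a hypothetical helper.

```javascript
// Parses a tqdm-style progress line and estimates the remaining time.
// Handles both "s/it" (slow steps) and "it/s" (fast steps).
// Returns null when the line does not look like a progress line.
function parseProgressLine(line) {
  const counts = line.match(/(\d+)\/(\d+)\s*\[/);
  const rate = line.match(/([\d.]+)\s*(s\/it|it\/s)/);
  if (!counts || !rate) {
    return null;
  }
  const done = Number(counts[1]);
  const total = Number(counts[2]);
  const value = Number(rate[1]);
  const secondsPerStep = rate[2] === "s/it" ? value : 1 / value;
  return {
    done,
    total,
    secondsPerStep,
    remainingMinutes: ((total - done) * secondsPerStep) / 60,
  };
}

const sample = " 40%|####      | 6/15 [22:06<33:09, 221.00s/it]";
const progress = parseProgressLine(sample);
console.log(`${progress.done}/${progress.total} steps done`); // 6/15 steps done
```

The estimate covers the remaining sampling steps only, so the output file may appear somewhat later than predicted.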

## Best Practices

1. **15 steps is the sweet spot** — I2V converges at 15-20 steps, 25 has diminishing returns
2. **Unified reference image** — Same input image for all scenes ensures character consistency
3. **Reload workflow every time** — Avoids VAEDecode validation failures
4. **Never reload during execution** — Current run will fail
5. **Frame selection** — 72 frames (3s) for testing, 480 frames (20s) for final output
6. **VRAM management** — Wait for each generation to complete before starting next

## T2V vs I2V Comparison

| Mode | Steps | Quality | Notes |
|------|-------|---------|-------|
| T2V (no LoRA) | 15 | ❌ Very blurry | Not recommended |
| I2V + LoRA | 25 | ✅ Excellent | Major quality improvement |
| I2V + LoRA | 15 | ✅ Very good | Best time/quality ratio |

**Conclusion**: I2V + LoRA is the recommended combination.

## Resources

- `scripts/batch_scenes.js` — Batch scene automation
- `references/workflow_nodes.md` — Full node ID mapping
- `references/tips.md` — Prompt tips, VRAM management, optimization

Related Skills

demo-video

3891
from openclaw/skills

Create product demo videos by automating browser interactions and capturing frames. Use when the user wants to record a demo, walkthrough, product showcase, or interactive video of a web application. Supports Playwright CDP screencast for high-quality capture and FFmpeg for video encoding.

Video Production

seedance-video

3891
from openclaw/skills

Generate AI videos using ByteDance Seedance. Use when the user wants to: (1) generate videos from text prompts, (2) generate videos from images (first frame, first+last frame, reference images), or (3) query/manage video generation tasks. Supports Seedance 1.5 Pro (with audio), 1.0 Pro, 1.0 Pro Fast, and 1.0 Lite models.

recipe-video-extractor

3891
from openclaw/skills

Extract a structured cooking recipe from a shared video URL when the user sends `recipe <url>`. Prioritize caption/description and comments via browser automation, then use web search/fetch as fallback with clear source attribution.

json2video-pinterest

3891
from openclaw/skills

Generate Pinterest-optimized vertical videos using JSON2Video API. Supports AI-generated or URL-based images, AI-generated or provided voiceovers, optional subtitles, and zoom effects. Use when creating video content for Pinterest affiliate marketing, creating vertical social media videos, automating video production with JSON2Video API, or generating videos with voiceovers and subtitles.

arch-video-cut

3891
from openclaw/skills

Automatic Architecture Video Editing Workflow with Self-Learning Preferences

short-video-script-generator-pro

3891
from openclaw/skills

AI short video script generator. Supports TikTok, YouTube Shorts, and Instagram Reels; automatically generates the hook, shots, voiceover, subtitles, BGM, and CTA. $0.005 USDT per use.

ai-notes-of-video

3891
from openclaw/skills

The video AI notes tool is provided by Baidu. Based on the video download address provided by the user, it downloads and parses the video, and finally generates AI notes corresponding to the video (a total of three types of notes can be generated: document notes, outline notes, and image-text notes).

keevx-video-translate

3891
from openclaw/skills

Translate videos into a specified target language using the Keevx API. Supports audio-only translation, subtitle generation, and dynamic duration adjustment. Use this skill when the user needs to (1) Translate/dub a video (2) Translate a video from one language to another (3) Query the list of supported translation languages (4) Check the status of a video translation task. Keywords: video translate, Keevx, dubbing.

keevx-image-to-video

3891
from openclaw/skills

Use the Keevx API to convert images to videos. Supports multiple models (V/KL), various resolutions (720p/1080p/4K), and audio generation. Use this skill when the user needs to: (1) Convert images to video (2) Generate video with Keevx (3) Create and query image-to-video tasks (4) Batch image-to-video conversion. Keywords: image to video, Keevx, video generation.

ai-video-prompt

3891
from openclaw/skills

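AI video prompt builder. Uses a "first/last-frame image + video" workflow and supports stitching multiple 5-second clips into longer videos (30 s / 60 s). It generates keyframe images first, then the video prompts, so that consecutive segments join seamlessly. Optimized for the Jimeng (即梦) platform, with support for fully Chinese prompt output.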

seeddance-ai-video

3891
from openclaw/skills

Integrates ByteDance's SeedDance AI video generation API, supporting text-to-video, image-to-video, and related features.

douyin-video-downloader

3891
from openclaw/skills

Douyin video downloader: downloads watermark-free videos through a third-party parsing service.