ltxv2-video

Build LTX-V2 19B video workflows — text-to-video, image-to-video, distilled model, camera control LoRAs, and two-stage upscaling

16 stars

by diegosouzapw

Best use case

ltxv2-video is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using ltxv2-video should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/ltxv2-video/SKILL.md --create-dirs "https://raw.githubusercontent.com/diegosouzapw/awesome-omni-skill/main/skills/content-media/ltxv2-video/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/ltxv2-video/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How ltxv2-video Compares

Feature / Agent	ltxv2-video	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Build LTX-V2 19B video workflows — text-to-video, image-to-video, distilled model, camera control LoRAs, and two-stage upscaling

Where can I find the source code?

The source lives in the `diegosouzapw/awesome-omni-skill` repository on GitHub at `skills/content-media/ltxv2-video/SKILL.md`, the same path used by the install command above.

SKILL.md Source

# LTX-V2 19B Video Workflows

## Overview

LTX-V2 (LTX-2) is a 19-billion-parameter, DiT-based video foundation model from Lightricks. It uses a Gemma 3 12B text encoder and supports both text-to-video (T2V) and image-to-video (I2V). Key features:

- **Distilled model** for fast 8-step generation
- **Two-stage pipeline**: Generate at low res, then 2x spatial upscale in latent space
- **Camera control LoRAs** for cinematic movements
- **Audio-video generation** in a single pass (optional)

## Models

### Checkpoint (Installed)

| Component | Node | Model | Notes |
|-----------|------|-------|-------|
| **Checkpoint** | `CheckpointLoaderSimple` | `ltx-2-19b-distilled.safetensors` | 41GB bf16, distilled variant |

### Text Encoder (Installed)

| Component | Node | Model | Notes |
|-----------|------|-------|-------|
| **Gemma 3** | `CLIPLoader` (type=`ltxv`) | `gemma_3_12B_it_fp4_mixed.safetensors` | 9GB FP4, in text_encoders/ |

**Loading note**: The checkpoint bundles the VAE, so the VAE comes from output 2 of `CheckpointLoaderSimple`. The Gemma 3 text encoder loads separately: use `CLIPLoader` with `type: "ltxv"` and select the Gemma file from the `text_encoders/` directory.

### LoRAs (Installed)

| LoRA | File | Purpose |
|------|------|---------|
| **Camera Dolly Left** (`ltx2/` copy) | `ltx2/ltx-2-19b-lora-camera-control-dolly-left.safetensors` | Camera dolly left |
| **Distilled LoRA (384)** | `ltx2/ltx-2-19b-distilled-lora-384.safetensors` | Apply to base model for distilled behavior |
| **Camera Dolly Left** | `ltx-2-19b-lora-camera-control-dolly-left.safetensors` | Camera movement |

### Concept/Style LoRAs (Installed)

Located in `loras/LTXV2/`:
- `style/PLORAV7_LTX_000010500.safetensors`
- `concept/head_swap_v1_13500_first_frame.safetensors`
- `concept/LTX-2 - Better Female Nudity.safetensors`
- `action/LTX2-i2v-OralSuite.safetensors`
- `action/LTX2-i2v-SexThrust.safetensors`
- And more in `concept/` and `action/` subfolders
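Before building a workflow it can help to confirm that the core files above sit where ComfyUI looks for them. The sketch below is a hypothetical preflight check: the `models/` subfolder layout, the example `COMFY_ROOT` path, and the helper name `missing_models` are assumptions about a standard ComfyUI install, and the upscaler entry only matters for the two-stage pipeline.

```python
from pathlib import Path

# File names come from this skill; the models/ subfolders assume a standard
# ComfyUI layout. Adjust both to match your install.
REQUIRED_MODELS = {
    "checkpoints": ["ltx-2-19b-distilled.safetensors"],
    "text_encoders": ["gemma_3_12B_it_fp4_mixed.safetensors"],
    "loras": ["ltx-2-19b-lora-camera-control-dolly-left.safetensors"],
    # Only needed for the two-stage upscale pattern.
    "latent_upscale_models": ["ltx-2-spatial-upscaler-x2-1.0.safetensors"],
}


def missing_models(comfy_root):
    """Return 'subfolder/file' entries that are absent under comfy_root/models."""
    models_dir = Path(comfy_root) / "models"
    missing = []
    for subfolder, names in REQUIRED_MODELS.items():
        for name in names:
            if not (models_dir / subfolder / name).is_file():
                missing.append(f"{subfolder}/{name}")
    return missing


if __name__ == "__main__":
    COMFY_ROOT = "~/ComfyUI"  # hypothetical install location
    for entry in missing_models(Path(COMFY_ROOT).expanduser()):
        print("missing:", entry)
```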

## Key Nodes

### LTXVConditioning

Binds text conditioning with frame rate information:

```json
{
  "class_type": "LTXVConditioning",
  "inputs": {
    "positive": ["<clip_text_encode>", 0],
    "negative": ["<clip_text_encode_neg>", 0],
    "frame_rate": 25
  }
}
```

### EmptyLTXVLatentVideo

Creates the initial video latent (for T2V):

```json
{
  "class_type": "EmptyLTXVLatentVideo",
  "inputs": {
    "width": 768,
    "height": 512,
    "length": 97,
    "batch_size": 1
  }
}
```

**Frame count constraint**: `length` must be `8n + 1` (9, 17, 25, 33, 41, 49, 57, 65, 73, 81, 89, 97, 105, 113, 121, and so on up to the 257-frame maximum).
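One way to see where `8n + 1` comes from: the latent keeps the first frame on its own and packs each following group of 8 frames into one latent frame, while each spatial axis shrinks by 32. The sketch below assumes that layout (128 channels, 32x spatial, 8x temporal), which is what ComfyUI's `EmptyLTXVLatentVideo` allocates for LTX-Video; treat the channel count for LTX-2 as an assumption. `is_valid_length`, `snap_length`, and `latent_shape` are illustrative helpers, not ComfyUI functions.

```python
def is_valid_length(frames: int) -> bool:
    """True when frames has the form 8n + 1."""
    return frames >= 1 and (frames - 1) % 8 == 0


def snap_length(frames: int) -> int:
    """Round a requested frame count down to the nearest 8n + 1 (minimum 1)."""
    return max(1, ((frames - 1) // 8) * 8 + 1)


def latent_shape(width, height, length, batch_size=1, channels=128):
    """Empty video latent shape, assuming 32x spatial / 8x temporal compression."""
    if width % 32 or height % 32:
        raise ValueError("width and height must be multiples of 32")
    if not is_valid_length(length):
        raise ValueError(f"length {length} is not 8n + 1; try {snap_length(length)}")
    latent_frames = (length - 1) // 8 + 1  # first frame + one latent frame per 8 frames
    return (batch_size, channels, latent_frames, height // 32, width // 32)


print(latent_shape(768, 512, 97))  # (1, 128, 13, 16, 24)
print(snap_length(100))            # 97
```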

### LTXVScheduler

Dedicated sigma schedule for LTX-V2 latent space:

```json
{
  "class_type": "LTXVScheduler",
  "inputs": {
    "steps": 8,
    "max_shift": 2.05,
    "base_shift": 0.95,
    "stretch": true,
    "terminal": 0.1
  }
}
```

Connect the optional `latent` input for latent-aware shift scaling.
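To make "latent-aware shift scaling" concrete, here is a pure-Python sketch of how `LTXVScheduler` appears to build its sigmas, modeled on ComfyUI's implementation as understood here. Treat the reference token counts (1024 and 4096), the 4096-token default when no latent is connected, and the helper name `ltxv_sigmas` as assumptions. A larger latent (more tokens) gets a bigger shift, which keeps more of the schedule at high noise; `stretch` then rescales so the last non-zero sigma lands exactly on `terminal`.

```python
import math


def ltxv_sigmas(steps, max_shift=2.05, base_shift=0.95, stretch=True,
                terminal=0.1, tokens=4096):
    """Approximate LTXVScheduler sigmas: steps + 1 values ending at 0.0."""
    if steps < 2:
        raise ValueError("this sketch assumes steps >= 2")
    # Evenly spaced timesteps from 1.0 down to 0.0.
    t = [1.0 - i / steps for i in range(steps + 1)]
    # Shift is linear in token count: base_shift at 1024 tokens, max_shift at
    # 4096 tokens, extrapolated beyond that range.
    slope = (max_shift - base_shift) / (4096 - 1024)
    shift = base_shift + slope * (tokens - 1024)
    e = math.exp(shift)
    # Time shift: a larger shift pushes intermediate sigmas toward 1.0.
    sigmas = [e / (e + (1.0 / s - 1.0)) if s > 0 else 0.0 for s in t]
    if stretch:
        # Rescale (1 - sigma) so the last non-zero sigma equals `terminal`.
        last = [s for s in sigmas if s > 0][-1]
        scale = (1.0 - last) / (1.0 - terminal)
        sigmas = [1.0 - (1.0 - s) / scale if s > 0 else 0.0 for s in sigmas]
    return sigmas


# 6144 tokens = 16 latent frames x 16 x 24 for a 768x512, 121-frame latent.
print([round(s, 3) for s in ltxv_sigmas(8, tokens=6144)])
```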

### LTXVImgToVideo (For I2V)

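All-in-one node that encodes the input image, creates the video latent starting from that image, and outputs positive, negative, and latent. `strength` sets how closely the first frame follows the input image: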

```json
{
  "class_type": "LTXVImgToVideo",
  "inputs": {
    "positive": ["<conditioning>", 0],
    "negative": ["<conditioning>", 0],
    "vae": ["<checkpoint>", 2],
    "image": ["<load_image>", 0],
    "width": 768,
    "height": 512,
    "length": 97,
    "batch_size": 1,
    "strength": 0.6
  }
}
```
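The T2V graph in this skill can be turned into an I2V graph by swapping `EmptyLTXVLatentVideo` for `LoadImage` + `LTXVImgToVideo` and rewiring every consumer. A minimal sketch, assuming `LTXVImgToVideo` returns `(positive, negative, latent)` at indices 0, 1, 2 and that the image file is already in ComfyUI's input folder; the function name `t2v_to_i2v` and the new node ids `"20"`/`"21"` are hypothetical.

```python
import copy


def t2v_to_i2v(workflow, cond_id, latent_id, vae_ref, image_name,
               img_id="20", i2v_id="21", strength=0.6):
    """Return a copy of an API-format T2V graph rewired for image-to-video."""
    for nid in (img_id, i2v_id):
        if nid in workflow:
            raise ValueError(f"node id {nid} already in use")
    wf = copy.deepcopy(workflow)
    empty = wf.pop(latent_id)["inputs"]  # EmptyLTXVLatentVideo settings

    # Old output -> new output (assumed LTXVImgToVideo output order).
    remap = {
        (cond_id, 0): [i2v_id, 0],
        (cond_id, 1): [i2v_id, 1],
        (latent_id, 0): [i2v_id, 2],
    }
    for node in wf.values():
        inputs = node["inputs"]
        for key, value in inputs.items():
            is_link = (isinstance(value, list) and len(value) == 2
                       and isinstance(value[0], str) and isinstance(value[1], int))
            if is_link and tuple(value) in remap:
                inputs[key] = list(remap[tuple(value)])

    # Added after remapping so the new node still reads from cond_id.
    wf[img_id] = {"class_type": "LoadImage", "inputs": {"image": image_name}}
    wf[i2v_id] = {"class_type": "LTXVImgToVideo", "inputs": {
        "positive": [cond_id, 0],
        "negative": [cond_id, 1],
        "vae": list(vae_ref),
        "image": [img_id, 0],
        "width": empty["width"],
        "height": empty["height"],
        "length": empty["length"],
        "batch_size": empty["batch_size"],
        "strength": strength,
    }}
    return wf
```

Usage against the complete T2V workflow would look like `t2v_to_i2v(workflow, "5", "6", ("1", 2), "start.png")`, which keeps `LTXVConditioning` in place and feeds its outputs through the new node.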

### LTXVLatentUpsampler (For Two-Stage Upscale)

```json
{
  "class_type": "LTXVLatentUpsampler",
  "inputs": {
    "latent": ["<sampler_output>", 0],
    "upscale_model": ["<upscale_loader>", 0]
  }
}
```

Requires `LatentUpscaleModelLoader` with `ltx-2-spatial-upscaler-x2-1.0.safetensors`.

## Sampler Settings

### Distilled Model (Installed)

Uses `SamplerCustomAdvanced` with an explicit sigma schedule (from `LTXVScheduler` in Stage 1, a manual list in Stage 2), NOT the standard `KSampler`:

| Parameter | Stage 1 (Generate) | Stage 2 (Upscale) |
|-----------|--------------------|--------------------|
| sampler | euler | euler |
| steps | 8 | 3 (four sigma values) |
| cfg | 1.0 | 1.0 |
| scheduler | LTXVScheduler | Manual sigmas |

**Stage 1 sigmas** (via LTXVScheduler): `max_shift=2.05`, `base_shift=0.95`, `stretch=true`, `terminal=0.1`

**Stage 2 sigmas** (manual, for upscale refinement): `0.909375, 0.725, 0.421875, 0.0`
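A sigma list of length k gives k - 1 sampling steps, so these four values mean three Stage 2 steps. Because the list starts below 1.0, the upscaled latent is only partly re-noised before refinement. The sketch assumes rectified-flow noise scaling (`sigma * noise + (1 - sigma) * latent`), the form ComfyUI uses for flow-matching models (an assumption worth checking for LTX-2); `renoise` works on plain lists purely for illustration.

```python
STAGE2_SIGMAS = [0.909375, 0.725, 0.421875, 0.0]


def num_steps(sigmas):
    """Each step moves from sigmas[i] to sigmas[i + 1]."""
    return len(sigmas) - 1


def renoise(latent, noise, sigma):
    """Rectified-flow mix of a clean latent and noise at noise level sigma."""
    return [sigma * n + (1.0 - sigma) * x for x, n in zip(latent, noise)]


print(num_steps(STAGE2_SIGMAS))          # 3
print(round(1.0 - STAGE2_SIGMAS[0], 6))  # 0.090625, share of the upscaled latent kept
```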

### Base Model (If Using Distilled LoRA on Base)

| Parameter | Value |
|-----------|-------|
| sampler | res_2s |
| steps | 20 |
| cfg | 4.0 |
| scheduler | LTXVScheduler |
| distilled_lora_strength | 0.6 |

## Resolution and Frame Count

### Resolutions (Must be multiples of 32)

| Aspect | Stage 1 | After 2x Upscale | Notes |
|--------|---------|-------------------|-------|
| 3:2 landscape | 768x512 | 1536x1024 | Default |
| 16:9 landscape | 960x544 | 1920x1088 | Official example |
| 1:1 square | 640x640 | 1280x1280 | |
| ≈4:3 landscape | 704x512 | 1408x1024 | Exact ratio is 11:8 |

Start at lower resolution for Stage 1 to manage VRAM, then upscale.
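A small helper can snap an arbitrary aspect ratio to the multiple-of-32 rule and report the size after the 2x latent upscale. This is a sketch with hypothetical names; it rounds the width to the nearest multiple of 32 for a fixed height, so for 4:3 at height 512 it picks 672x512 rather than the 704x512 in the table (both satisfy the rule).

```python
def snap32(value: float) -> int:
    """Nearest multiple of 32, never below 32."""
    return max(32, int(round(value / 32)) * 32)


def stage_sizes(aspect_w, aspect_h, height=512, upscale=2):
    """Return ((w1, h1), (w2, h2)) for Stage 1 and after the latent upscale."""
    h1 = snap32(height)
    w1 = snap32(height * aspect_w / aspect_h)
    return (w1, h1), (w1 * upscale, h1 * upscale)


print(stage_sizes(3, 2))               # ((768, 512), (1536, 1024))
print(stage_sizes(16, 9, height=544))  # ((960, 544), (1920, 1088))
```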

### Frame Count (`8n + 1`)

| Frames | Duration @25fps | Duration @24fps | Notes |
|--------|----------------|-----------------|-------|
| 49 | 1.96s | 2.04s | Quick test |
| 81 | 3.24s | 3.38s | Short clip |
| 97 | 3.88s | 4.04s | Default |
| 121 | 4.84s | 5.04s | Official example, recommended |
| 161 | 6.44s | 6.71s | Longer clip |
| 257 | 10.28s | 10.71s | Maximum |
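The durations in the table are frames divided by fps. Going the other way, a hypothetical helper can pick the longest valid `8n + 1` count that fits a target duration, capped at the 257-frame maximum listed above.

```python
MAX_FRAMES = 257  # maximum from the table; must itself be 8n + 1


def duration_s(frames: int, fps: float) -> float:
    return frames / fps


def frames_for_duration(seconds: float, fps: float, max_frames: int = MAX_FRAMES) -> int:
    """Longest 8n + 1 frame count whose duration does not exceed `seconds`."""
    n = int((seconds * fps - 1) // 8)
    return max(1, min(8 * n + 1, max_frames))


print(frames_for_duration(5, 25))  # 121 -> 4.84 s
print(frames_for_duration(5, 24))  # 113 -> 4.71 s
```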

### Frame Rate

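Standard: **25 fps** (conditioned via `LTXVConditioning`). 24 and 30 fps are also supported. Use the same value for the output node's frame rate (`frame_rate` on `VHS_VideoCombine`, `fps` on `CreateVideo`) so playback speed matches the conditioning.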

## Pipeline Flow: T2V Distilled

```
CheckpointLoaderSimple → MODEL + VAE
CLIPLoader (ltxv, gemma_3_12B_it_fp4_mixed) → CLIP
  ├─ CLIPTextEncode (positive) → CONDITIONING
  └─ CLIPTextEncode (negative) → CONDITIONING

LTXVConditioning (positive, negative, frame_rate=25) → pos/neg CONDITIONING
EmptyLTXVLatentVideo (768x512, 121 frames) → LATENT
LTXVScheduler (steps=8, max_shift=2.05, base_shift=0.95) → SIGMAS

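KSamplerSelect (euler) → SAMPLER
RandomNoise (seed) → NOISE
CFGGuider (model, positive, negative, cfg=1.0) → GUIDER

SamplerCustomAdvanced (noise, guider, sampler, sigmas, latent_image)
  → Stage 1 LATENT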

[Optional: LTXVLatentUpsampler → 2x LATENT → SamplerCustomAdvanced Stage 2]

VAEDecode (or LTXVSpatioTemporalTiledVAEDecode for VRAM savings) → IMAGE
VHS_VideoCombine (or CreateVideo + SaveVideo) → MP4
```

## Complete Workflow: T2V Distilled (8-Step)

```json
{
  "1": { "class_type": "CheckpointLoaderSimple", "inputs": { "ckpt_name": "ltx-2-19b-distilled.safetensors" }},
  "2": { "class_type": "CLIPLoader", "inputs": { "clip_name": "gemma_3_12B_it_fp4_mixed.safetensors", "type": "ltxv" }},
  "3": { "class_type": "CLIPTextEncode", "inputs": { "clip": ["2", 0], "text": "<positive prompt>" }},
  "4": { "class_type": "CLIPTextEncode", "inputs": { "clip": ["2", 0], "text": "" }},
  "5": { "class_type": "LTXVConditioning", "inputs": {
    "positive": ["3", 0], "negative": ["4", 0], "frame_rate": 25
  }},
  "6": { "class_type": "EmptyLTXVLatentVideo", "inputs": {
    "width": 768, "height": 512, "length": 121, "batch_size": 1
  }},
  "7": { "class_type": "LTXVScheduler", "inputs": {
    "steps": 8, "max_shift": 2.05, "base_shift": 0.95,
    "stretch": true, "terminal": 0.1, "latent": ["6", 0]
  }},
  "8": { "class_type": "KSamplerSelect", "inputs": { "sampler_name": "euler" }},
  "9": { "class_type": "SamplerCustomAdvanced", "inputs": {
    "noise": ["10", 0],
    "guider": ["11", 0],
    "sampler": ["8", 0],
    "sigmas": ["7", 0],
    "latent_image": ["6", 0]
  }},
  "10": { "class_type": "RandomNoise", "inputs": { "noise_seed": 42 }},
  "11": { "class_type": "CFGGuider", "inputs": {
    "model": ["1", 0],
    "positive": ["5", 0],
    "negative": ["5", 1],
    "cfg": 1.0
  }},
  "12": { "class_type": "VAEDecode", "inputs": { "samples": ["9", 0], "vae": ["1", 2] }},
  "13": { "class_type": "VHS_VideoCombine", "inputs": {
    "images": ["12", 0], "frame_rate": 25, "loop_count": 0,
    "filename_prefix": "ltxv2", "format": "video/h264-mp4",
    "pingpong": false, "save_output": true,
    "pix_fmt": "yuv420p", "crf": 19, "save_metadata": true, "trim_to_audio": false
  }}
}
```

**Alternative output** with built-in nodes instead of VHS (replaces node 13 above and adds node 14):
```json
{
  "12": { "class_type": "VAEDecode", "inputs": { "samples": ["9", 0], "vae": ["1", 2] }},
  "13": { "class_type": "CreateVideo", "inputs": { "images": ["12", 0], "fps": 25 }},
  "14": { "class_type": "SaveVideo", "inputs": { "video": ["13", 0], "filename_prefix": "video/ltxv2", "format": "auto", "codec": "auto" }}
}
```
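To queue either graph from a script, ComfyUI's HTTP server accepts API-format JSON at `POST /prompt`. The sketch below assumes a local server on the default port 8188 and adds a small check for links that point at missing node ids; `find_dangling_links`, `build_payload`, `queue_prompt`, and the JSON file name are illustrative, not part of ComfyUI.

```python
import json
import urllib.request
import uuid


def find_dangling_links(workflow):
    """List (node_id, input_name, missing_id) for links to ids not in the graph."""
    problems = []
    for node_id, node in workflow.items():
        for name, value in node.get("inputs", {}).items():
            is_link = (isinstance(value, list) and len(value) == 2
                       and isinstance(value[0], str) and isinstance(value[1], int))
            if is_link and value[0] not in workflow:
                problems.append((node_id, name, value[0]))
    return problems


def build_payload(workflow, client_id=None):
    """Body for POST /prompt: the API-format graph plus a client id."""
    return {"prompt": workflow, "client_id": client_id or uuid.uuid4().hex}


def queue_prompt(workflow, host="127.0.0.1:8188"):
    """Send the graph to a running ComfyUI server and return its JSON reply."""
    problems = find_dangling_links(workflow)
    if problems:
        raise ValueError(f"dangling links: {problems}")
    body = json.dumps(build_payload(workflow)).encode("utf-8")
    request = urllib.request.Request(
        f"http://{host}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


if __name__ == "__main__":
    # Hypothetical file holding the T2V workflow JSON.
    with open("ltxv2_t2v_distilled.json", encoding="utf-8") as f:
        reply = queue_prompt(json.load(f))
    print(reply.get("prompt_id"))
```

On success the server replies with a `prompt_id`, which can be used to look up outputs via `GET /history/<prompt_id>`.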

## Camera Control LoRAs

Seven official camera control LoRAs from Lightricks:

| Movement | LoRA File |
|----------|-----------|
| Dolly Left | `ltx-2-19b-lora-camera-control-dolly-left.safetensors` |
| Dolly Right | `ltx-2-19b-lora-camera-control-dolly-right.safetensors` |
| Dolly In | `ltx-2-19b-lora-camera-control-dolly-in.safetensors` |
| Dolly Out | `ltx-2-19b-lora-camera-control-dolly-out.safetensors` |
| Jib Up | `ltx-2-19b-lora-camera-control-jib-up.safetensors` |
| Jib Down | `ltx-2-19b-lora-camera-control-jib-down.safetensors` |
| Static | `ltx-2-19b-lora-camera-control-static.safetensors` |

**Usage**: Apply with `LoraLoaderModelOnly` at strength **1.0**. Do NOT describe camera movement in your prompt — the LoRA handles it.

```json
{
  "class_type": "LoraLoaderModelOnly",
  "inputs": {
    "model": ["<checkpoint>", 0],
    "lora_name": "ltx-2-19b-lora-camera-control-dolly-left.safetensors",
    "strength_model": 1.0
  }
}
```

**Cannot combine** camera control LoRA with IC-LoRA (canny/depth/pose) in the same generation.

## Concept/Style LoRAs

Apply with `LoraLoaderModelOnly`. Typical strength: 0.5–1.0.

```json
{
  "class_type": "LoraLoaderModelOnly",
  "inputs": {
    "model": ["<checkpoint_or_camera_lora>", 0],
    "lora_name": "LTXV2\\concept\\LTX-2 - Better Female Nudity.safetensors",
    "strength_model": 0.8
  }
}
```

Concept/style LoRAs CAN be stacked with camera control LoRAs.
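Stacking means chaining `LoraLoaderModelOnly` nodes and pointing every former consumer of the checkpoint's MODEL output at the last LoRA in the chain. A sketch of that rewiring for API-format graphs like the one above; `insert_lora_chain` and the starting node id `100` are hypothetical, and the LoRA path separator should follow how your ComfyUI install lists the file (the JSON example above uses Windows backslashes).

```python
import copy


def insert_lora_chain(workflow, model_ref, loras, first_id=100):
    """Chain LoraLoaderModelOnly nodes after model_ref and rewire its consumers."""
    wf = copy.deepcopy(workflow)
    source = list(model_ref)
    # Record every input fed by the base MODEL output before adding nodes.
    consumers = [
        (nid, key)
        for nid, node in wf.items()
        for key, value in node["inputs"].items()
        if value == source
    ]
    previous = source
    for offset, (lora_name, strength) in enumerate(loras):
        nid = str(first_id + offset)
        if nid in wf:
            raise ValueError(f"node id {nid} already in use")
        wf[nid] = {"class_type": "LoraLoaderModelOnly", "inputs": {
            "model": list(previous),
            "lora_name": lora_name,
            "strength_model": strength,
        }}
        previous = [nid, 0]
    for nid, key in consumers:
        wf[nid]["inputs"][key] = list(previous)
    return wf
```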

## VRAM Considerations

| Config | VRAM | Notes |
|--------|------|-------|
| bf16 checkpoint + FP4 Gemma | ~24GB+ | Tight on RTX 4090, may OOM |
| FP8 checkpoint + FP4 Gemma | ~16-20GB | Recommended for 24GB GPUs |
| bf16 + tiled VAE decode | ~22GB | Use `LTXVSpatioTemporalTiledVAEDecode` |

**VRAM warning** (from the author's MEMORY.md notes): LTXV2 can OOM on 24GB GPUs, so prefer FP8-quantized models or launch ComfyUI with `--lowvram`.
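The warning follows from weight size alone. A back-of-envelope sketch: parameter count times bits per parameter, ignoring activations, the VAE, and any tensors kept at higher precision (which is likely why the installed files are larger than these figures). `weight_gib` is an illustrative helper.

```python
GIB = 1024 ** 3


def weight_gib(params: float, bits_per_param: float) -> float:
    """Approximate weight memory in GiB (weights only, no activations)."""
    return params * bits_per_param / 8 / GIB


for label, params, bits in [
    ("LTX-2 19B, bf16", 19e9, 16),
    ("LTX-2 19B, FP8", 19e9, 8),
    ("Gemma 3 12B, FP4", 12e9, 4),
]:
    print(f"{label}: ~{weight_gib(params, bits):.1f} GiB")
```

The bf16 weights alone (about 35 GiB) exceed a 24 GB card, so ComfyUI has to offload part of the model; FP8 weights (about 18 GiB) fit, especially once the text encoder is unloaded after encoding.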

### Tips for 24GB GPUs

1. Use `VAEDecodeTiled` or `LTXVSpatioTemporalTiledVAEDecode` instead of standard `VAEDecode`
2. Start at 768x512 resolution, upscale in Stage 2
3. Use FP4 Gemma text encoder (installed)
4. Consider GGUF quantized models for tighter VRAM budgets
5. **Always `clear_vram`** before switching to LTX-V2 from another model family
6. Reduce frame count to 81 or 49 if OOM persists

## Prompt Style

Natural language descriptions. Be specific about motion, camera angles, and temporal progression:

```
Good: "A woman with flowing auburn hair walks through a sun-dappled forest, leaves falling gently around her, soft golden hour lighting, cinematic depth of field"
Bad: "woman, forest, walking"
```

Describe the **entire scene progression**, not just a single moment. Include lighting, mood, and motion cues.

## Two-Stage Upscale Pattern

For production quality, generate at low resolution then upscale:

1. **Stage 1**: Generate at 768x512, 121 frames, 8 steps (distilled)
2. **Upscale**: `LTXVLatentUpsampler` (2x spatial) → 1536x1024
3. **Stage 2**: Resample the upscaled latent with the Stage 2 sigmas (`0.909375, 0.725, 0.421875, 0.0`, i.e. 3 steps) at CFG 1.0
4. **Decode**: Use tiled VAE decode for the larger resolution

This requires the spatial upscaler model: `ltx-2-spatial-upscaler-x2-1.0.safetensors` (place in `models/latent_upscale_models/`).
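Why generate small and upscale: the transformer's cost grows with the number of latent tokens. Assuming the same 32x spatial / 8x temporal compression as the empty-latent node, a quick count shows that direct 1536x1024 generation has 4x the tokens of the 768x512 Stage 1, and roughly 16x the full self-attention work. Treat this as an order-of-magnitude sketch; it ignores the Stage 2 passes and the non-attention layers, and `latent_tokens` is an illustrative helper.

```python
def latent_tokens(width: int, height: int, frames: int) -> int:
    """Latent positions the transformer attends over (32x spatial, 8x temporal assumed)."""
    return ((frames - 1) // 8 + 1) * (height // 32) * (width // 32)


stage1 = latent_tokens(768, 512, 121)    # 16 * 16 * 24
direct = latent_tokens(1536, 1024, 121)  # 16 * 32 * 48
ratio = direct / stage1

print(stage1, direct, ratio)             # 6144 24576 4.0
print("attention work, roughly:", ratio ** 2, "x")
```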

Related Skills

vidu-video

from diegosouzapw/awesome-omni-skill

Generates video with the Vidu Q3 Pro model. Use this skill when the user wants text-to-video, video with audio, or mentions vidu.

videodb-skills

from diegosouzapw/awesome-omni-skill

Upload, stream, search, edit, transcribe, and generate AI video and audio using the VideoDB SDK.

videocut:安装 (install)

from diegosouzapw/awesome-omni-skill

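Environment setup: installs dependencies and downloads models. Trigger words: 安装 (install), 环境准备 (environment setup), 初始化 (initialize).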

video

from diegosouzapw/awesome-omni-skill

Generate videos using fal.ai (Wan, Kling) or Sora. Text-to-video and image-to-video.

video-toolkit

from diegosouzapw/awesome-omni-skill

Intelligent video processor for downloading media and extracting transcripts from YouTube and 1000+ supported sites. Automatically handles format selection, subtitle extraction, and post-processing.

video-processing-editing

from diegosouzapw/awesome-omni-skill

FFmpeg automation for cutting, trimming, concatenating videos. Audio mixing, timeline editing, transitions, effects. Export optimization for YouTube, social media. Subtitle handling, color grading, batch processing. Use for videogen projects, content creation, automated video production. Activate on "video editing", "FFmpeg", "trim video", "concatenate", "transitions", "export optimization". NOT for real-time video editing UI, 3D compositing, or motion graphics.

video-commercial

from diegosouzapw/awesome-omni-skill

Generate 30-second video commercials from a concept. Creates storyboard, generates scene images, adds narration via ElevenLabs, assembles final video. Use when asked to create commercials, promo videos, video ads, or short marketing videos.

video-analyzer

from diegosouzapw/awesome-omni-skill

Intelligently analyzes Bilibili, YouTube, and local videos to produce transcripts, evaluations, and summaries. Supports automatic embedding of keyframe screenshots.

Media Uploader - R2/S3 with video download

from diegosouzapw/awesome-omni-skill

Upload files or download videos from popular platforms (YouTube, Vimeo, Bilibili, etc.) and upload to Cloudflare R2, AWS S3, or any S3-compatible storage with secure presigned download links.

edu-video-analyzer

from diegosouzapw/awesome-omni-skill

Analyze educational YouTube channels for classroom adoption potential, curriculum alignment, and pedagogical effectiveness. Use when comparing educational video content (like MRU vs Crash Course), evaluating teaching methodologies, identifying content gaps for course design, or developing educational video strategy focused on student learning outcomes rather than monetization.

Automate YouTube Top-Ten Video Creation with OpenAI and Safe Image Search

from diegosouzapw/awesome-omni-skill

Integrates OpenAI API for content generation, Bing Image Search API for safe image retrieval, and Pexels API for video footage. Handles authentication via Bearer token, enforces safe search, formats ChatGPT responses into a top-ten list, and includes error handling for API failures.

apex-video-generator

from diegosouzapw/awesome-omni-skill

Generate real estate marketing videos from property data. Use when creating property showcases, social media content, market reports, or neighborhood tours. Integrates Firecrawl scraped data with Remotion rendering.