ltx-video

Generate videos via LTX-2.3 API (ltx.video). Supports text-to-video, image-to-video, audio-to-video (lip-sync from audio + image), extend, and retake. Use when: generating AI video from text/image/audio, animating a portrait, creating lip-sync video from an existing image + audio recording.

1,864 stars

Best use case

ltx-video is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using ltx-video should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/ltx-video/SKILL.md --create-dirs "https://raw.githubusercontent.com/LeoYeAI/openclaw-master-skills/main/skills/ltx-video/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/ltx-video/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How ltx-video Compares

| Feature / Agent | ltx-video | Standard Approach |
|-----------------|-----------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

Generate videos via LTX-2.3 API (ltx.video). Supports text-to-video, image-to-video, audio-to-video (lip-sync from audio + image), extend, and retake. Use when: generating AI video from text/image/audio, animating a portrait, creating lip-sync video from an existing image + audio recording.

Where can I find the source code?

The source code is on GitHub in the LeoYeAI/openclaw-master-skills repository, under skills/ltx-video/.

SKILL.md Source

# LTX-2.3 Video API

## API Reference

**Base URL:** `https://api.ltx.video` (the endpoint paths below already include the `/v1` prefix)  
**Auth:** `Authorization: Bearer <API_KEY>`  
**Response:** MP4 binary (direct download, no polling)

### Endpoints

| Endpoint | Input | Use |
|----------|-------|-----|
| `/v1/text-to-video` | prompt | Generate video from text |
| `/v1/image-to-video` | image_uri + prompt | Animate a still image |
| `/v1/audio-to-video` | audio_uri + image_uri + prompt | Lip-sync video from audio + image |
| `/v1/extend` | video_uri + prompt | Extend a video at start or end |
| `/v1/retake` | video_uri + time range | Regenerate a section of a video |
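The input column above can be turned into a small pre-flight check so a request is not sent with a field missing. This is a minimal sketch: `REQUIRED_INPUTS` and `missing_inputs` are hypothetical names, and `time_range` is a placeholder key, since the table only says "time range" and does not give the actual field name.

```python
# Required inputs per endpoint, copied from the endpoint table.
# "time_range" is a placeholder: the real field name is not documented here.
REQUIRED_INPUTS = {
    "text-to-video": {"prompt"},
    "image-to-video": {"image_uri", "prompt"},
    "audio-to-video": {"audio_uri", "image_uri", "prompt"},
    "extend": {"video_uri", "prompt"},
    "retake": {"video_uri", "time_range"},
}


def missing_inputs(endpoint, payload):
    """Return the sorted list of required inputs absent from payload."""
    if endpoint not in REQUIRED_INPUTS:
        raise ValueError(f"unknown endpoint: {endpoint}")
    return sorted(REQUIRED_INPUTS[endpoint] - set(payload))


print(missing_inputs("audio-to-video", {"prompt": "A man speaks to camera"}))
# prints ['audio_uri', 'image_uri']
```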

### Models

| Model | Speed | Quality |
|-------|-------|---------|
| `ltx-2-3-fast` | ~17s | Good (use for tests) |
| `ltx-2-3-pro` | ~30-60s | Best (use for final) |

### Supported Resolutions

- `1920x1080` (landscape 16:9)
- `1080x1920` (portrait 9:16 — native vertical, trained on vertical data)
- `1440x1080`, `4096x2160` (text-to-video only)

**audio-to-video only supports:** `1920x1080` or `1080x1920`
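The resolution rules above are easy to get wrong when switching endpoints, so a local check before calling the API can help. A minimal sketch, assuming the lists above are complete; `resolution_allowed` is a hypothetical helper, not part of the API.

```python
# Resolution rules as listed in "Supported Resolutions".
COMMON = {"1920x1080", "1080x1920"}
TEXT_TO_VIDEO_ONLY = {"1440x1080", "4096x2160"}


def resolution_allowed(endpoint, resolution):
    """Return True if the resolution is listed as supported for the endpoint."""
    if resolution in COMMON:
        return True
    return endpoint == "text-to-video" and resolution in TEXT_TO_VIDEO_ONLY


print(resolution_allowed("text-to-video", "4096x2160"))   # prints True
print(resolution_allowed("audio-to-video", "4096x2160"))  # prints False
```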

---

## Quick Examples

### Text to Video
```bash
curl -X POST "https://api.ltx.video/v1/text-to-video" \
  -H "Authorization: Bearer $LTX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "A man in a navy blue suit sits at a luxury restaurant table...",
    "model": "ltx-2-3-pro",
    "duration": 8,
    "resolution": "1920x1080"
  }' -o output.mp4
```
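The same request can be expressed in Python. The sketch below uses only the standard library and builds the request without sending it, so the payload and headers can be inspected first. `build_text_to_video_request` is a hypothetical helper; the field names and values mirror the curl example.

```python
import json
import urllib.request


def build_text_to_video_request(prompt, api_key, model="ltx-2-3-pro",
                                duration=8, resolution="1920x1080"):
    """Build (but do not send) the POST request from the curl example."""
    body = json.dumps({
        "prompt": prompt,
        "model": model,
        "duration": duration,
        "resolution": resolution,
    }).encode("utf-8")
    return urllib.request.Request(
        "https://api.ltx.video/v1/text-to-video",
        data=body,
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )


req = build_text_to_video_request("A man stands on a rooftop.", "YOUR_KEY")
print(req.get_method(), req.full_url)
# prints POST https://api.ltx.video/v1/text-to-video
```

To send it and save the result, pass the request to `urllib.request.urlopen(req, timeout=300)` and write `resp.read()` to an `.mp4` file.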

### Audio to Video (Lip-sync)
```bash
curl -X POST "https://api.ltx.video/v1/audio-to-video" \
  -H "Authorization: Bearer $LTX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "audio_uri": "https://example.com/voice.mp3",
    "image_uri": "https://example.com/portrait.jpg",
    "prompt": "A man speaks directly to camera...",
    "model": "ltx-2-3-pro",
    "resolution": "1920x1080"
  }' -o output.mp4
```

### Python Wrapper
```python
import requests


def ltx_audio_to_video(audio_url, image_url, prompt, api_key,
                       model="ltx-2-3-pro", resolution="1920x1080",
                       output_path="output.mp4"):
    """Generate a lip-sync video and save the returned MP4 to output_path."""
    r = requests.post(
        "https://api.ltx.video/v1/audio-to-video",
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        json={"audio_uri": audio_url, "image_uri": image_url,
              "prompt": prompt, "model": model, "resolution": resolution},
        timeout=300, stream=True,
    )
    # Treat any non-200 response as an error and surface its body.
    if r.status_code != 200:
        raise RuntimeError(f"LTX error {r.status_code}: {r.text}")
    # Stream the MP4 body to disk in 8 KiB chunks.
    with open(output_path, "wb") as f:
        for chunk in r.iter_content(8192):
            f.write(chunk)
    return output_path
```
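Because the API returns the MP4 bytes directly, a quick check on the first bytes of the response can catch the case where something other than video was saved. A minimal sketch: `looks_like_mp4` is a hypothetical helper that relies on MP4 files normally starting with an `ftyp` box, and the JSON error body in the example is invented for illustration.

```python
def looks_like_mp4(data):
    """Return True if the bytes start like an MP4 container.

    MP4 files normally begin with a box whose type, at bytes 4-8, is "ftyp".
    """
    return len(data) >= 8 and data[4:8] == b"ftyp"


print(looks_like_mp4(b"\x00\x00\x00\x20ftypisom" + b"\x00" * 20))  # prints True
print(looks_like_mp4(b'{"error": "invalid image_uri"}'))            # prints False
```

Reading the first 8 bytes of `output.mp4` and passing them to this function is enough; the whole file does not need to be loaded.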

---

## ⚠️ Critical Rules (learned from experience)

### File Hosting
- URLs must be **HTTPS** — HTTP is rejected
- Files must return correct MIME type (not `application/octet-stream`)
- **uguu.se** works: upload with `curl -F "files[]=@file.mp3" https://uguu.se/upload`
- Audio: upload as **MP3** (not WAV) → uguu returns `audio/mpeg` ✅
- **4K images fail** → resize to 1920x1080 before uploading

```bash
# Upload MP3 to uguu.se
AUDIO_URL=$(curl -s -F "files[]=@audio.mp3" "https://uguu.se/upload" | \
  python3 -c "import sys,json; print(json.load(sys.stdin)['files'][0]['url'])")

# Upload image
IMAGE_URL=$(curl -s -F "files[]=@portrait.jpg" "https://uguu.se/upload" | \
  python3 -c "import sys,json; print(json.load(sys.stdin)['files'][0]['url'])")
```
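The first two hosting rules can be checked locally before a URL is passed to the API. The sketch below takes the URL and the `Content-Type` value the host returns (for example from `curl -sI`) and lists any violations. `hosting_problems` is a hypothetical helper, and it only covers the HTTPS and MIME-type rules.

```python
from urllib.parse import urlparse


def hosting_problems(url, content_type):
    """List rule violations for a hosted file, given its URL and the
    Content-Type header its server returns."""
    problems = []
    if urlparse(url).scheme != "https":
        problems.append("URL must use HTTPS")
    # Drop parameters such as "; charset=..." before comparing.
    mime = content_type.split(";")[0].strip().lower()
    if mime in ("", "application/octet-stream"):
        problems.append("server must return a specific MIME type")
    return problems


print(hosting_problems("https://example.com/voice.mp3", "audio/mpeg"))
# prints []
print(hosting_problems("http://example.com/voice.wav", "application/octet-stream"))
# prints ['URL must use HTTPS', 'server must return a specific MIME type']
```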

### Image Size Limit
```bash
# Resize large images before upload.
# Note: scale=1920:1080 forces exactly 1920x1080 and stretches any input that is not 16:9.
ffmpeg -y -i input_4k.png -vf "scale=1920:1080" output_1080.jpg
```
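For source images that are not 16:9, it may be preferable to shrink them to fit inside the target frame rather than force exact dimensions. The sketch below computes such a size with integer arithmetic. `fit_within` is a hypothetical helper, and whether the API accepts images smaller than the target resolution on one side is an assumption to verify.

```python
def fit_within(width, height, max_w=1920, max_h=1080):
    """Return (width, height) scaled down to fit inside max_w x max_h,
    keeping the aspect ratio and never upscaling."""
    if width * max_h > height * max_w:
        # Width is the limiting side.
        new_w = min(width, max_w)
        new_h = height * new_w // width
    else:
        # Height is the limiting side (or the ratios match).
        new_h = min(height, max_h)
        new_w = width * new_h // height
    return new_w, new_h


print(fit_within(3840, 2160))  # prints (1920, 1080)
print(fit_within(4000, 3000))  # prints (1440, 1080)
```

For portrait output, call it with `max_w=1080, max_h=1920`. The resulting pair can be passed to ffmpeg as `scale=W:H`.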

### Face Consistency
- Avoid prompts where the character **looks down** — breaks face consistency
- Keep head level and gaze forward throughout
- Place objects already in frame instead of having character reach below frame
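These rules can be applied as a simple check on a prompt before it is submitted. A minimal sketch: `RISKY_PATTERNS` is a hypothetical, deliberately short phrase list and not an exhaustive rule set.

```python
import re

# Hypothetical phrase list; extend it with whatever breaks your own shots.
RISKY_PATTERNS = [
    r"\blooks? down\b",
    r"\blooking down\b",
    r"\bglances? down\b",
    r"\breach(?:es|ing)? (?:down|below)\b",
]


def risky_phrases(prompt):
    """Return the phrases in prompt that match a risky pattern."""
    found = []
    for pattern in RISKY_PATTERNS:
        found.extend(m.group(0) for m in re.finditer(pattern, prompt, re.IGNORECASE))
    return found


print(risky_phrases("He looks down at the menu, then reaches below the table."))
# prints ['looks down', 'reaches below']
```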

### Last Frame
- LTX does **not** support first+last frame natively
- Workaround: generate clip A, generate clip B, then use `/v1/extend` to chain them

---

## Prompting Guide (LTX-2.3)

LTX-2.3 has a much stronger text connector. **Specificity wins.**

### 1. Use Verbs, Not Nouns
❌ `"A dramatic portrait of a man standing"`  
✅ `"A man stands on a rooftop. His coat flaps in the wind. He adjusts his collar and steps forward as the camera tracks right."`

### 2. Block the Scene Like a Director
- Specify **left vs right**, **foreground vs background**
- Describe **who moves**, **what moves**, **how they move**, **what the camera does**
- Spatial relationships are now respected

### 3. Describe Audio Explicitly (for text-to-video)
- Name the type of sound: dialogue, ambient, music
- Specify tone and intensity
- Example: `"His voice is clear and warm. Restaurant ambient sound softly in the background."`

### 4. Avoid Static Photo-Like Prompts
- If the prompt reads like a still image → the output behaves like one
- Add wind, motion, breathing, gestures, camera movement

### 5. Describe Texture and Material
- Hair, fabric, surface finish, lighting fall-off
- `"Individual hair strands visible in the backlight"` → now renders correctly

### 6. Portrait (9:16) Native
- `resolution: "1080x1920"` → trained on vertical data
- Frame for vertical intentionally, don't treat as cropped landscape

### 7. Complex Shots Work Now
- Layer multiple actions: `"He picks up the banana, raises it to his ear, and smirks"`
- Combine character performance + environment + camera motion

### Lip-Sync Prompt Template
```
A [description of person] sits/stands [location]. He/she speaks directly 
to camera, lips moving in perfect sync with his/her voice. [Gesture details]. 
Head stays level and gaze remains locked on camera throughout. 
[Environment description softly blurred in background]. 
[Lighting]. [Camera: holds steady at eye level, front-on].
```
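When many lip-sync clips are generated, filling the template programmatically keeps the wording consistent. The sketch below is one possible encoding of the template as a Python format string. The field names (`person`, `pose`, `gesture`, and so on) are hypothetical, and the sample values are illustrative.

```python
LIP_SYNC_TEMPLATE = (
    "{person} {pose} {location}. {pronoun} speaks directly to camera, "
    "lips moving in perfect sync with {possessive} voice. {gesture} "
    "Head stays level and gaze remains locked on camera throughout. "
    "{environment} softly blurred in background. {lighting} "
    "Camera holds steady at eye level, front-on."
)


def lip_sync_prompt(**fields):
    """Fill the template; raises KeyError if a field is missing."""
    return LIP_SYNC_TEMPLATE.format(**fields)


prompt = lip_sync_prompt(
    person="A man in a navy blue suit", pose="sits at",
    location="a luxury restaurant table", pronoun="He", possessive="his",
    gesture="He gestures lightly with one hand.",
    environment="Warm restaurant interior",
    lighting="Soft key light from the left.",
)
print(prompt)
```

A missing field raises `KeyError`, which surfaces an incomplete prompt before any API call is made.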

---

## ComfyUI Node

Custom nodes for ComfyUI (no manual API calls):

```bash
cd ComfyUI/custom_nodes
git clone https://github.com/PauldeLavallaz/comfyui-ltx-node
```

Nodes: `LTX Text to Video`, `LTX Image to Video`, `LTX Extend Video`  
Category: **LTX Video**

---

## API Key
Store the key in an environment file (Paul keeps his in `~/clawd/.env`) as `LTX_API_KEY` and export it before running the examples. Never paste the key itself into this file or commit it to a repository; the key previously shown here has been removed and should be treated as compromised.
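Scripts can read the key at runtime rather than embedding it. The sketch below checks the process environment first and then falls back to a `.env` file. `parse_env_line` and `load_api_key` are hypothetical helpers, and the parser assumes simple `KEY=VALUE` lines with optional quotes, not the full dotenv syntax.

```python
import os


def parse_env_line(line):
    """Parse one KEY=VALUE line of a .env file; return None for blanks/comments."""
    line = line.strip()
    if not line or line.startswith("#") or "=" not in line:
        return None
    key, value = line.split("=", 1)
    return key.strip(), value.strip().strip("'\"")


def load_api_key(env_path="~/clawd/.env", name="LTX_API_KEY"):
    """Prefer the process environment, then fall back to the .env file."""
    if os.environ.get(name):
        return os.environ[name]
    path = os.path.expanduser(env_path)
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            for raw in f:
                pair = parse_env_line(raw)
                if pair and pair[0] == name:
                    return pair[1]
    return None
```

`load_api_key()` returns `None` when no key is found, so callers can fail with a clear message instead of sending an empty `Authorization` header.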

Related Skills

video-transcript-downloader

1864
from LeoYeAI/openclaw-master-skills

Download videos, audio, subtitles, and clean paragraph-style transcripts from YouTube and any other yt-dlp supported site. Use when asked to “download this video”, “save this clip”, “rip audio”, “get subtitles”, “get transcript”, or to troubleshoot yt-dlp/ffmpeg and formats/playlists.

video-frames

1864
from LeoYeAI/openclaw-master-skills

Extract frames or short clips from videos using ffmpeg.

sglang-diffusion-video

1864
from LeoYeAI/openclaw-master-skills

Generate videos using a local SGLang-Diffusion server (Wan2.2, Hunyuan, FastWan, etc.). Use when: user asks to generate, create, or render a video with a locally running SGLang-Diffusion instance. NOT for: cloud-hosted video APIs or image generation (use sglang-diffusion for images). Requires a running SGLang-Diffusion server with a video model loaded.

seek-and-analyze-video

1864
from LeoYeAI/openclaw-master-skills

Video intelligence and content analysis using Memories.ai LVMM. Discover videos on TikTok, YouTube, Instagram by topic or creator. Analyze video content, summarize meetings, build searchable knowledge bases across multiple videos. Use for video research, competitor content analysis, meeting notes, lecture summaries, or building video knowledge libraries.

citedy-video-shorts

1864
from LeoYeAI/openclaw-master-skills

Generate branded AI avatar lip-sync video shorts for TikTok, Reels, and YouTube Shorts. Create 15-second talking-head videos with custom avatars, auto-generated scripts, and burned-in subtitles for $1.85.

youtube-watcher

1864
from LeoYeAI/openclaw-master-skills

Fetch and read transcripts from YouTube videos. Use when you need to summarize a video, answer questions about its content, or extract information from it.

youtube-transcript

1864
from LeoYeAI/openclaw-master-skills

Fetch and summarize YouTube video transcripts. Use when asked to summarize, transcribe, or extract content from YouTube videos. Handles transcript fetching via residential IP proxy to bypass YouTube's cloud IP blocks.

youtube-auto-captions - YouTube Auto Captions

1864
from LeoYeAI/openclaw-master-skills

## Description

youtube

1864
from LeoYeAI/openclaw-master-skills

YouTube Data API integration with managed OAuth. Search videos, manage playlists, access channel data, and interact with comments. Use this skill when users want to interact with YouTube. For other third party apps, use the api-gateway skill (https://clawhub.ai/byungkyu/api-gateway).

yahoo-finance

1864
from LeoYeAI/openclaw-master-skills

Get stock prices, quotes, fundamentals, earnings, options, dividends, and analyst ratings using Yahoo Finance. Uses yfinance library - no API key required.

xurl

1864
from LeoYeAI/openclaw-master-skills

A Twitter research and content intelligence skill focused on attracting WordPress and Shopify clients. Use to analyze Twitter profiles, threads, and conversations for: (1) Identifying what small agency founders and eCommerce brands are discussing; (2) Understanding pain points around WordPress performance, Shopify CRO, and development bottlenecks; (3) Extracting high-performing content angles; (4) Turning insights into authority-building posts; (5) Converting Twitter intelligence into business leverage for clear content angles, strong positioning, and qualified inbound leads.

xlsx

1864
from LeoYeAI/openclaw-master-skills

Use this skill any time a spreadsheet file is the primary input or output. This means any task where the user wants to: open, read, edit, or fix an existing .xlsx, .xlsm, .csv, or .tsv file (e.g., adding columns, computing formulas, formatting, charting, cleaning messy data); create a new spreadsheet from scratch or from other data sources; or convert between tabular file formats. Trigger especially when the user references a spreadsheet file by name or path — even casually (like "the xlsx in my downloads") — and wants something done to it or produced from it. Also trigger for cleaning or restructuring messy tabular data files (malformed rows, misplaced headers, junk data) into proper spreadsheets. The deliverable must be a spreadsheet file. Do NOT trigger when the primary deliverable is a Word document, HTML report, standalone Python script, database pipeline, or Google Sheets API integration, even if tabular data is involved.