image-to-video

FFmpeg-based video creation from image and audio.

290 stars

by notque

Best use case

image-to-video is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using image-to-video should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/image-to-video/SKILL.md --create-dirs "https://raw.githubusercontent.com/notque/claude-code-toolkit/main/skills/image-to-video/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/image-to-video/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How image-to-video Compares

Feature / Agent	image-to-video	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

FFmpeg-based video creation from image and audio.

Where can I find the source code?

The source code is in the notque/claude-code-toolkit repository on GitHub, under skills/image-to-video/.

SKILL.md Source

# Image to Video Skill

Combine a static image with an audio file to produce an MP4 video using FFmpeg. Supports resolution presets (1080p, 720p, square, vertical), optional audio visualization overlays (waveform, spectrum, cqt, bars), and batch processing of matched image+audio pairs. For image generation, use `gemini-image-generator` instead.

## Instructions

### Phase 1: VALIDATE

Confirm all prerequisites before attempting video creation.

**Step 1: Check FFmpeg installation**

Always run this check first -- many systems lack FFmpeg or have minimal builds, and skipping it produces confusing subprocess errors instead of clear install guidance.

```bash
ffmpeg -version
```

If FFmpeg is not installed, provide platform-specific install instructions and stop.
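
The same check can be scripted so that a missing binary is caught before any encode is attempted. This is a minimal Python sketch, not code from the skill's script: the helper name `ffmpeg_available` is hypothetical, and it assumes that finding the binary on PATH and getting exit code 0 from `-version` is a sufficient test.

```python
import shutil
import subprocess


def ffmpeg_available(binary: str = "ffmpeg") -> bool:
    """Return True if `binary` is on PATH and `<binary> -version` exits with 0."""
    path = shutil.which(binary)
    if path is None:
        return False
    try:
        result = subprocess.run(
            [path, "-version"],
            capture_output=True,
            text=True,
            timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


if __name__ == "__main__":
    if ffmpeg_available():
        print("ffmpeg found")
    else:
        print("ffmpeg missing: give install instructions and stop")
```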

**Step 2: Verify input files exist**

Both the image and audio files must be confirmed present before processing. Use absolute paths for all arguments -- relative paths break silently when the script executes from a different working directory.

```bash
ls -la /absolute/path/to/image.png /absolute/path/to/audio.mp3
```

Confirm both files exist and have non-zero size. Supported formats:
- **Images**: PNG, JPG, JPEG, GIF, WEBP, BMP
- **Audio**: MP3, WAV, M4A, OGG, FLAC
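
The existence, size, and format checks above can be sketched as a small validator. The function name `check_input` and the idea of returning a list of problems are illustrative assumptions; only the extension lists come from the supported formats stated here.

```python
from pathlib import Path

# Extension sets taken from the supported-formats list above.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".gif", ".webp", ".bmp"}
AUDIO_EXTS = {".mp3", ".wav", ".m4a", ".ogg", ".flac"}


def check_input(path_str: str, allowed_exts: set) -> list:
    """Return a list of problems with the file; an empty list means it passed."""
    path = Path(path_str)
    problems = []
    if not path.is_absolute():
        problems.append(f"{path} is not an absolute path")
    if not path.is_file():
        problems.append(f"{path} does not exist or is not a regular file")
    elif path.stat().st_size == 0:
        problems.append(f"{path} is empty (0 bytes)")
    if path.suffix.lower() not in allowed_exts:
        problems.append(f"'{path.suffix}' is not a supported extension")
    return problems
```

Calling `check_input(image_path, IMAGE_EXTS)` and `check_input(audio_path, AUDIO_EXTS)` and stopping on any non-empty result mirrors the gate for this phase.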

**Step 3: Determine parameters**

Re-read the user's request before selecting defaults. Resolve resolution preset and visualization mode from what the user actually asked for. Only apply defaults (1080p, static) when the user did not specify -- defaulting to static when the user requested a visualization is a common mistake.

If the user mentions a target platform, select the matching preset to avoid cropping or black bars on delivery:

| Preset | Dimensions | Platform |
|--------|------------|----------|
| `1080p` | 1920x1080 | YouTube HD (default) |
| `720p` | 1280x720 | Standard HD, smaller files |
| `square` | 1080x1080 | Instagram, social media |
| `vertical` | 1080x1920 | Stories, Reels, TikTok |

Optional visualization modes (off unless the user requests one):
- `--visualization waveform` -- Neon waveform overlay
- `--visualization spectrum` -- Scrolling frequency spectrum
- `--visualization cqt` -- Piano-roll style bars
- `--visualization bars` -- Frequency bar graph
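
As a rough illustration of the "defaults only when unspecified" rule, the preset table and mode list can be expressed as lookup data. The names `RESOLUTIONS`, `VISUALIZATIONS`, and `resolve_parameters` are hypothetical and not taken from `image_to_video.py`; the dimensions and mode names are the ones listed above.

```python
# Preset name -> (width, height), as listed in the preset table.
RESOLUTIONS = {
    "1080p": (1920, 1080),
    "720p": (1280, 720),
    "square": (1080, 1080),
    "vertical": (1080, 1920),
}
VISUALIZATIONS = ("static", "waveform", "spectrum", "cqt", "bars")


def resolve_parameters(resolution=None, visualization=None):
    """Fill in 1080p/static only when the user gave no value, then validate."""
    resolution = resolution or "1080p"
    visualization = visualization or "static"
    if resolution not in RESOLUTIONS:
        raise ValueError(f"unknown resolution preset: {resolution}")
    if visualization not in VISUALIZATIONS:
        raise ValueError(f"unknown visualization mode: {visualization}")
    width, height = RESOLUTIONS[resolution]
    return {
        "resolution": resolution,
        "width": width,
        "height": height,
        "visualization": visualization,
    }
```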

**Gate**: FFmpeg installed, both input files exist, parameters resolved. Proceed only when gate passes.

### Phase 2: PREPARE

Set up output path and confirm no conflicts.

**Step 1: Determine output path**

Use the path provided by the user. If none given, derive from the audio filename:
```
/same/directory/as/audio/filename.mp4
```
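
A minimal sketch of that derivation, assuming "derive from the audio filename" means swapping the audio extension for `.mp4` in the same directory. The function name `derive_output_path` is illustrative.

```python
from pathlib import Path
from typing import Optional


def derive_output_path(audio_path: str, user_output: Optional[str] = None) -> Path:
    """Prefer the user's path; otherwise place <audio stem>.mp4 next to the audio file."""
    if user_output:
        return Path(user_output)
    return Path(audio_path).with_suffix(".mp4")
```

For example, `derive_output_path("/music/track01.mp3")` yields `/music/track01.mp4`.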

**Step 2: Ensure output directory exists**

The script creates parent directories automatically. Verify the target directory is writable.

**Gate**: Output path determined, directory accessible. Proceed only when gate passes.

### Phase 3: ENCODE

Execute FFmpeg to produce the video. Only implement what the user requested -- no extra visualizations or format conversions beyond MP4.

Encoding defaults: libx264 preset medium, CRF 23, yuv420p pixel format, 192k AAC audio.
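
To make those defaults concrete, here is a sketch of an FFmpeg argument list that applies them for the static (no visualization) case. It is an assumption about what an equivalent invocation could look like, not the command that `image_to_video.py` actually builds. The function name `build_static_command` and the scale-then-pad filter chain used to fit the image into the frame are illustrative choices.

```python
def build_static_command(image, audio, output, width=1920, height=1080):
    """Build an ffmpeg argv list: still image + audio -> H.264/AAC MP4."""
    # Scale the image to fit inside the frame, then pad to the exact size
    # so the aspect ratio is preserved (centered, with bars if needed).
    fit = (
        f"scale={width}:{height}:force_original_aspect_ratio=decrease,"
        f"pad={width}:{height}:(ow-iw)/2:(oh-ih)/2"
    )
    return [
        "ffmpeg",
        "-loop", "1", "-i", image,   # repeat the single still image
        "-i", audio,
        "-vf", fit,
        "-c:v", "libx264", "-preset", "medium", "-crf", "23",
        "-pix_fmt", "yuv420p",
        "-c:a", "aac", "-b:a", "192k",
        "-shortest",                 # stop when the audio ends
        output,
    ]
```

Passing the list to `subprocess.run(cmd, check=True)` avoids shell quoting issues with the parentheses in the pad expression.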

**Step 1: Run the script**

```bash
python3 $HOME/claude-code-toolkit/skills/image-to-video/scripts/image_to_video.py \
--image /absolute/path/to/image.png \
--audio /absolute/path/to/audio.mp3 \
--output /absolute/path/to/output.mp4 \
--resolution 1080p \
--visualization static
```

For workspace batch mode (processes all matched pairs in `workspace/input/`):

```bash
python3 $HOME/claude-code-toolkit/skills/image-to-video/scripts/image_to_video.py \
--process-workspace \
--visualization waveform
```
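
The skill does not say how pairs are matched. One plausible rule, shown here purely as an assumption, is to pair an image and an audio file that share the same filename stem (for example `ep1.png` with `ep1.mp3`). The function name `find_pairs` is hypothetical.

```python
from pathlib import Path

IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".gif", ".webp", ".bmp"}
AUDIO_EXTS = {".mp3", ".wav", ".m4a", ".ogg", ".flac"}


def find_pairs(input_dir):
    """Return (image, audio) Path pairs whose stems match (assumed rule)."""
    images, audio = {}, {}
    for p in sorted(Path(input_dir).iterdir()):
        if not p.is_file():
            continue
        ext = p.suffix.lower()
        if ext in IMAGE_EXTS:
            images.setdefault(p.stem, p)  # keep the first image per stem
        elif ext in AUDIO_EXTS:
            audio.setdefault(p.stem, p)   # keep the first audio file per stem
    shared = sorted(images.keys() & audio.keys())
    return [(images[stem], audio[stem]) for stem in shared]
```

Files without a partner are simply skipped in this sketch; the real script may report them instead.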

**Step 2: Monitor output**

The script prints progress including input paths, resolution, visualization mode, and duration. Watch for ERROR lines in output.

**Gate**: Script exits with code 0. Proceed only when gate passes.

### Phase 4: VERIFY

Confirm the output video is valid. Do not report success based on exit code alone -- FFmpeg can exit 0 but produce a corrupt or zero-duration file.

**Step 1: Check file exists and has reasonable size**

```bash
ls -la /absolute/path/to/output.mp4
```

**Step 2: Probe video metadata**

File size alone does not prove video integrity. Always probe with ffprobe to confirm the output is a valid video with correct duration.

```bash
ffprobe -v error -show_entries format=duration,size -show_entries stream=codec_name,width,height \
-of default=noprint_wrappers=1 /absolute/path/to/output.mp4
```

Confirm video duration matches audio duration (within 1 second tolerance).
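
The tolerance check can be sketched as follows, assuming ffprobe is asked for JSON (`-of json`) instead of the default writer used above. ffprobe reports `format.duration` as a string of seconds. The helper names are illustrative.

```python
import json


def duration_from_ffprobe_json(text: str) -> float:
    """Extract format.duration (in seconds) from ffprobe's JSON output."""
    data = json.loads(text)
    return float(data["format"]["duration"])


def durations_match(video_seconds: float, audio_seconds: float,
                    tolerance: float = 1.0) -> bool:
    """True when the two durations differ by at most `tolerance` seconds."""
    return abs(video_seconds - audio_seconds) <= tolerance
```

The JSON text would come from running `ffprobe -v error -show_entries format=duration -of json <file>` once for the output video and once for the source audio.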

**Step 3: Report to user**

Provide: output file path, file size, duration, resolution, and visualization mode used.

**Gate**: Output file exists, duration matches audio, metadata is valid. Task complete.

## Error Handling

### Error: "FFmpeg is not installed or not in PATH"
Cause: FFmpeg binary not found on system
Solution:
1. Install via package manager: `brew install ffmpeg` (macOS), `sudo apt install ffmpeg` (Ubuntu)
2. Verify with `ffmpeg -version` after install
3. Ensure FFmpeg is in system PATH

### Error: "Image file not found" or "Audio file not found"
Cause: Path is incorrect, relative, or file does not exist
Solution:
1. Verify the path is absolute, not relative
2. Check file permissions with `ls -la`
3. Confirm the file extension matches a supported format

### Error: "FFmpeg failed" with filter errors
Cause: FFmpeg build lacks filter support (showwaves, showspectrum, showcqt)
Solution:
1. Install the full FFmpeg build, not a minimal variant
2. On Ubuntu: `sudo apt install ffmpeg` (full package)
3. Fall back to `--visualization static` which requires no special filters
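
Whether a build includes these filters can be checked up front by parsing the output of `ffmpeg -hide_banner -filters`. The sketch below is an assumption about how such a check might look, not part of the skill. It covers only the three filters named in the cause line, and the function name `missing_filters` is hypothetical.

```python
REQUIRED_FILTERS = ("showwaves", "showspectrum", "showcqt")


def missing_filters(filters_output: str, required=REQUIRED_FILTERS):
    """Return the required filter names absent from `ffmpeg -filters` output."""
    available = set()
    for line in filters_output.splitlines():
        parts = line.split()
        # Filter rows look like: "<flags> <name> <in>-><out> <description>";
        # legend and header lines have no "->" in the third column.
        if len(parts) >= 3 and "->" in parts[2]:
            available.add(parts[1])
    return [name for name in required if name not in available]
```

The input text can be captured with `subprocess.run(["ffmpeg", "-hide_banner", "-filters"], capture_output=True, text=True).stdout`. A non-empty result suggests falling back to `--visualization static`.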

### Error: "Could not determine audio duration"
Cause: Audio file is corrupted or uses an unsupported container format
Solution:
1. Test the audio independently: `ffprobe /path/to/audio.mp3`
2. Convert to a known format: `ffmpeg -i input.audio -acodec pcm_s16le output.wav`
3. Re-run with the converted file

## References

- `${CLAUDE_SKILL_DIR}/references/ffmpeg-filters.md`: FFmpeg filter documentation for visualization modes
- `${CLAUDE_SKILL_DIR}/scripts/image_to_video.py`: Python CLI script (exit codes: 0=success, 1=no FFmpeg, 2=encode failed, 3=missing args)
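
For callers that wrap the script, the documented exit codes can be turned into readable messages. This is a small sketch: the codes and their meanings are the ones listed above, while the names `EXIT_CODES` and `describe_exit` are illustrative.

```python
# Exit codes as documented for image_to_video.py.
EXIT_CODES = {
    0: "success",
    1: "FFmpeg not installed",
    2: "encode failed",
    3: "missing arguments",
}


def describe_exit(code: int) -> str:
    """Map a script exit code to a short description."""
    return EXIT_CODES.get(code, f"unexpected exit code {code}")
```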

Related Skills

video-editing

290

from notque/claude-code-toolkit

Video editing pipeline: cut footage, assemble clips via FFmpeg and Remotion.

image-auditor

290

from notque/claude-code-toolkit

Non-destructive image validation for accessibility and health.

gemini-image-generator

290

from notque/claude-code-toolkit

Generate images from text prompts via Google Gemini.

x-api

290

from notque/claude-code-toolkit

Post tweets, build threads, upload media via the X API.

worktree-agent

290

from notque/claude-code-toolkit

Mandatory rules for agents in git worktree isolation.

workflow

290

from notque/claude-code-toolkit

Structured multi-phase workflows: review, debug, refactor, deploy, create, research, and more.

workflow-help

290

from notque/claude-code-toolkit

Interactive guide to workflow system: agents, skills, routing, execution patterns.

wordpress-uploader

290

from notque/claude-code-toolkit

WordPress REST API integration for posts and media uploads.

wordpress-live-validation

290

from notque/claude-code-toolkit

Validate published WordPress posts in browser via Playwright.

with-anti-rationalization

290

from notque/claude-code-toolkit

Anti-rationalization enforcement for maximum-rigor task execution.

voice-writer

290

from notque/claude-code-toolkit

Unified voice content generation pipeline with mandatory validation and joy-check. 8-phase pipeline: LOAD, GROUND, GENERATE, VALIDATE, REFINE, JOY-CHECK, OUTPUT, CLEANUP. Use when writing articles, blog posts, or any content that uses a voice profile. Use for "write article", "blog post", "write in voice", "generate content", "draft article", "write about".

voice-validator

290

from notque/claude-code-toolkit

Critique-and-rewrite loop for voice fidelity validation.