image-to-video

Still-to-video conversion guide: model selection, motion prompting, and camera movement. Covers Wan 2.5 i2v, Seedance, Fabric, Grok Video with when to use each. Use for: animating images, creating video from stills, adding motion, product animations. Triggers: image to video, i2v, animate image, still to video, add motion to image, image animation, photo to video, animate still, wan i2v, image2video, bring image to life, animate photo, motion from image

1,592 stars

byopenakita

View on GitHub Installation ↓

Best use case

image-to-video is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using image-to-video should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/image-to-video/SKILL.md --create-dirs "https://raw.githubusercontent.com/openakita/openakita/main/skills/agent-browser/skills/image-to-video/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/image-to-video/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How image-to-video Compares

Feature / Agent	image-to-video	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

AI Agent for YouTube Script Writing

Find AI agent skills for YouTube script writing, video research, content outlining, and repeatable channel production workflows.

SKILL.md Source

# Image to Video

Convert still images to animated videos via [inference.sh](https://inference.sh) CLI.

## Quick Start

```bash
curl -fsSL https://cli.inference.sh | sh && infsh login

# Generate a still image
infsh app run falai/flux-dev-lora --input '{
  "prompt": "serene mountain lake at sunset, snow-capped peaks reflected in still water, golden hour light, landscape photography",
  "width": 1248,
  "height": 832
}'

# Animate it
infsh app run falai/wan-2-5-i2v --input '{
  "prompt": "gentle ripples on the lake surface, clouds slowly drifting, warm light shifting, birds flying in the distance",
  "image": "path/to/lake-image.png"
}'
```

> **Install note:** The [install script](https://cli.inference.sh) only detects your OS/architecture, downloads the matching binary from `dist.inference.sh`, and verifies its SHA-256 checksum. No elevated permissions or background processes. [Manual install & verification](https://dist.inference.sh/cli/checksums.txt) available.

## Model Selection

| Model | App ID | Best For | Motion Style |
|-------|--------|----------|-------------|
| **Wan 2.5 i2v** | `falai/wan-2-5-i2v` | Realistic motion, natural movement | Photorealistic, subtle |
| **Seedance 1.5 Pro** | `bytedance/seedance-1-5-pro` | Stylized, creative, animation-like | Artistic, expressive |
| **Seedance 1.0 Pro** | `bytedance/seedance-1-0-pro` | General purpose, good quality | Balanced |
| **Fabric 1.0** | `falai/fabric-1-0` | Cloth, fabric, liquid, flowing materials | Physics-based flow |
| **Grok Imagine Video** | `xai/grok-imagine-video` | General animation, text-guided | Versatile |

### When to Use Each

| Scenario | Best Model | Why |
|----------|-----------|-----|
| Landscape with water/clouds | **Wan 2.5 i2v** | Best at natural, realistic motion |
| Portrait with subtle expression | **Wan 2.5 i2v** | Maintains face fidelity |
| Product with fabric/cloth | **Fabric 1.0** | Specialized in material physics |
| Flag waving, curtain flowing | **Fabric 1.0** | Cloth simulation |
| Illustrated/artistic image | **Seedance** | Matches stylized content |
| General "bring to life" | **Seedance 1.5 Pro** | Good all-rounder |
| Quick test/iteration | **Seedance 1.0 Lite** | Fastest, 720p |

## Motion Types

### Camera Movement

| Movement | Prompt Keyword | Effect |
|----------|---------------|--------|
| Push in / Dolly forward | "slow dolly forward", "camera pushes in" | Increasing intimacy/focus |
| Pull out / Dolly back | "camera pulls back", "slow zoom out" | Reveal, context |
| Pan left/right | "camera pans slowly to the right" | Scanning, following |
| Tilt up/down | "camera tilts upward" | Revealing height |
| Orbit | "camera orbits around the subject" | 3D exploration |
| Crane up | "camera rises upward" | Grand reveal |
| Static | (no camera movement prompt) | Subject motion only |

### Subject Motion

| Type | Prompt Examples |
|------|----------------|
| Natural elements | "water rippling", "clouds drifting", "leaves rustling in breeze" |
| Hair/clothing | "hair blowing gently in wind", "dress fabric flowing" |
| Atmospheric | "fog slowly rolling", "dust particles floating in light beams" |
| Character | "person slowly turns to camera", "subtle breathing motion" |
| Mechanical | "gears turning", "clock hands moving" |
| Liquid | "coffee steam rising", "paint dripping", "water pouring" |

## Prompting Best Practices

### The Golden Rule: Subtle > Dramatic

AI video models produce better results with **gentle, subtle motion** than dramatic action. Requesting too much movement causes distortion and artifacts.

```
❌ "person running and jumping over obstacles while the camera spins"
✅ "person slowly walking forward, gentle breeze, camera follows alongside"

❌ "explosion with debris flying everywhere"
✅ "candle flame flickering gently, warm ambient light shifting"

❌ "fast zoom into the eyes with dramatic camera shake"
✅ "slow dolly forward toward the subject, subtle focus shift"
```

### Prompt Structure

```
[Camera movement] + [Subject motion] + [Atmospheric effects] + [Mood/pace]
```

### Examples by Scenario

```bash
# Landscape animation
infsh app run falai/wan-2-5-i2v --input '{
  "prompt": "gentle camera pan right, water reflecting moving clouds, trees swaying slightly in breeze, warm golden light, peaceful and slow",
  "image": "landscape.png"
}'

# Portrait animation
infsh app run falai/wan-2-5-i2v --input '{
  "prompt": "subtle breathing motion, slight head turn, natural eye blink, hair moving gently, soft ambient lighting shifts",
  "image": "portrait.png"
}'

# Product shot animation
infsh app run bytedance/seedance-1-5-pro --input '{
  "prompt": "slow 360 degree orbit around the product, gentle spotlight movement, subtle reflections shifting, premium product showcase, smooth motion",
  "image": "product.png"
}'

# Fabric/cloth animation
infsh app run falai/fabric-1-0 --input '{
  "prompt": "fabric flowing and rippling in gentle wind, natural cloth physics, soft movement",
  "image": "fabric-scene.png"
}'

# Architectural visualization
infsh app run falai/wan-2-5-i2v --input '{
  "prompt": "slow dolly forward through the entrance, slight camera tilt upward, ambient light filtering through windows, dust particles in light beams",
  "image": "building-interior.png"
}'
```

## Duration Guidelines

| Duration | Quality | Use For |
|----------|---------|---------|
| 2-3 seconds | Highest quality | GIFs, looping backgrounds, cinemagraphs |
| 4-5 seconds | High quality | Social media posts, product reveals |
| 6-8 seconds | Good quality | Short clips, transitions |
| 10+ seconds | Quality degrades | Avoid unless stitching shorter clips |

### Extending Duration

For longer videos, generate multiple short clips and stitch:

```bash
# Generate 3 clips from the same image with progressive motion
infsh app run falai/wan-2-5-i2v --input '{
  "prompt": "slow pan left, gentle water motion",
  "image": "scene.png"
}' --no-wait

infsh app run falai/wan-2-5-i2v --input '{
  "prompt": "continuing pan, clouds shifting, light changing",
  "image": "scene.png"
}' --no-wait

# Stitch together
infsh app run infsh/media-merger --input '{
  "media": ["clip1.mp4", "clip2.mp4"]
}'
```

## The Full Workflow

### Still-to-Final-Video Pipeline

```bash
# 1. Generate source image (best quality)
infsh app run bytedance/seedream-4-5 --input '{
  "prompt": "cinematic landscape, misty mountains at dawn, lake in foreground, dramatic clouds, golden hour, 4K quality, professional photography",
  "size": "2K"
}'

# 2. Animate the image
infsh app run falai/wan-2-5-i2v --input '{
  "prompt": "gentle mist rolling through the valley, lake surface rippling, clouds slowly moving, birds in distance, warm light shifting",
  "image": "landscape.png"
}'

# 3. Upscale video if needed
infsh app run falai/topaz-video-upscaler --input '{
  "video": "animated-landscape.mp4"
}'

# 4. Add ambient audio
infsh app run infsh/hunyuanvideo-foley --input '{
  "video": "animated-landscape.mp4",
  "prompt": "gentle nature ambience, distant birds, soft wind, water lapping"
}'

# 5. Merge video with audio
infsh app run infsh/video-audio-merger --input '{
  "video": "upscaled-landscape.mp4",
  "audio": "ambient-audio.mp3"
}'
```

## Cinemagraph Effect

A cinemagraph is a still photo where only one element moves (e.g., waterfall moving in an otherwise frozen scene). To achieve this:

1. Generate the still image with the motion element clearly defined
2. Prompt for motion only in that specific element
3. Keep to 2-4 seconds for seamless looping

```bash
infsh app run falai/wan-2-5-i2v --input '{
  "prompt": "only the waterfall is moving, everything else remains perfectly still, water cascading smoothly, rest of scene frozen",
  "image": "waterfall-scene.png"
}'
```

## Common Mistakes

| Mistake | Problem | Fix |
|---------|---------|-----|
| Too much motion requested | Distortion, artifacts, warping | Subtle > dramatic, always |
| Wrong model for content type | Poor results | Use selection guide above |
| Clips too long (10s+) | Quality degrades significantly | Keep to 3-5 seconds, stitch if needed |
| No camera movement specified | Random/unpredictable motion | Always specify camera behavior |
| Conflicting motion directions | Chaotic, unnatural | One primary motion direction |
| Low-res source image | Low-res video output | Start with highest quality source |
| Complex action scenes | Models can't handle | Keep motion simple and natural |

## Related Skills

```bash
npx skills add inference-sh/skills@ai-video-generation
npx skills add inference-sh/skills@ai-image-generation
npx skills add inference-sh/skills@video-prompting-guide
npx skills add inference-sh/skills@prompt-engineering
```

Browse all apps: `infsh app list`

Related Skills

openakita/skills@video-downloader

1592

from openakita/openakita

Download YouTube videos with customizable quality and format options. Use this skill when the user asks to download, save, or grab YouTube videos. Supports various quality settings (best, 1080p, 720p, 480p, 360p), multiple formats (mp4, webm, mkv), and audio-only downloads as MP3.

get-image-file

1592

from openakita/openakita

Get local file path of image sent by user. When user sends image, system auto-downloads it. When you need to process user's image or analyze image content.

generate-image

1592

from openakita/openakita

Generate images from text prompts using Qwen-Image (Dashscope). Saves output as local PNG files. Requires DASHSCOPE_API_KEY. Use deliver_artifacts to send generated images to IM chat.

openakita/skills@seedance-video

1592

from openakita/openakita

Generate AI videos using ByteDance Seedance models via Volcengine Ark API. Supports text-to-video, image-to-video (first frame, first+last frame, reference images), audio generation, and draft mode. Use when user wants to generate, create, or produce AI videos from text prompts or images.

openakita/skills@image-understanding

1592

from openakita/openakita

Analyze images using Dashscope (Qwen) Vision models for detailed description, OCR text extraction, object recognition, and visual Q&A. Use when the user needs to understand image content via Alibaba Cloud Dashscope API, especially for Chinese-language image analysis and documents.

openakita/skills@image-understander

1592

from openakita/openakita

Analyze images using GPT-4 Vision for detailed description, OCR text extraction, object recognition, and visual Q&A. Use when the user needs to understand image content, extract text from screenshots, identify objects in photos, or ask questions about images via OpenAI GPT-4 Vision API.

jimliu/baoyu-skills@baoyu-image-gen

1592

from openakita/openakita

Generate AI images using multiple providers (OpenAI DALL-E, Google Imagen, DashScope/Tongyi Wanxiang, Replicate). Supports various aspect ratios, quality presets, batch generation, and provider-specific prompt engineering techniques.

jimliu/baoyu-skills@baoyu-cover-image

1592

from openakita/openakita

Generates article cover images with 5 dimensions (type, palette, rendering, text, mood) combining 9 color palettes and 6 rendering styles. Supports cinematic (2.35:1), widescreen (16:9), and square (1:1) aspects. Use when user asks to "generate cover image", "create article cover", or "make cover".

openakita/skills@baidu-video-notes

1592

from openakita/openakita

AI Video Notes skill for video content analysis and note generation. Extract key information from videos for learning, meetings, and content summarization. Use when user wants to analyze video content or generate notes from videos.

video-prompting-guide

1592

from openakita/openakita

Best practices and techniques for writing effective AI video generation prompts. Covers: Veo, Seedance, Wan, Grok, Kling, Runway, Pika, Sora prompting strategies. Learn: shot types, camera movements, lighting, pacing, style keywords, negative prompts. Use for: improving video quality, getting consistent results, professional video prompts. Triggers: video prompt, how to prompt video, veo prompts, video generation tips, better ai video, video prompt engineering, video prompt guide, video prompt template, ai video tips, video prompt best practices, video prompt examples, cinematography prompts

video-ad-specs

1592

from openakita/openakita

Video ad creation with exact platform-specific specs for TikTok, Instagram, YouTube, Facebook, LinkedIn. Covers dimensions, duration limits, AIDA framework, and caption requirements. Use for: video ads, social media ads, paid media creative, video marketing, ad production. Triggers: video ad, social media ad, tiktok ad, instagram ad, youtube ad, facebook ad, linkedin ad, video creative, ad specs, paid media, video marketing, ad production, reels ad, stories ad, pre roll, bumper ad

og-image-design

1592

from openakita/openakita

Open Graph and social sharing image design with platform specs, text placement, and branding. Covers OG meta tags, Twitter cards, LinkedIn previews, and dynamic generation. Use for: social sharing images, blog thumbnails, link previews, social cards. Triggers: og image, open graph, social sharing image, twitter card, social card, link preview image, og meta, sharing preview, social thumbnail, meta image, og:image, twitter:image, linkedin preview