ai-avatar-video

Create AI avatar and talking head videos with OmniHuman, Fabric, PixVerse via inference.sh CLI. Models: OmniHuman 1.5, OmniHuman 1.0, Fabric 1.0, PixVerse Lipsync. Capabilities: audio-driven avatars, lipsync videos, talking head generation, virtual presenters. Use for: AI presenters, explainer videos, virtual influencers, dubbing, marketing videos. Triggers: ai avatar, talking head, lipsync, avatar video, virtual presenter, ai spokesperson, audio driven video, heygen alternative, synthesia alternative, talking avatar, lip sync, video avatar, ai presenter, digital human

16 stars

bydiegosouzapw

View on GitHub Installation ↓

Best use case

ai-avatar-video is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using ai-avatar-video should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/ai-avatar-video-romaingras42/SKILL.md --create-dirs "https://raw.githubusercontent.com/diegosouzapw/awesome-omni-skill/main/skills/tools/ai-avatar-video-romaingras42/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/ai-avatar-video-romaingras42/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How ai-avatar-video Compares

Feature / Agent	ai-avatar-video	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# AI Avatar & Talking Head Videos

Create AI avatars and talking head videos via [inference.sh](https://inference.sh) CLI.

![AI Avatar & Talking Head Videos](https://cloud.inference.sh/app/files/u/4mg21r6ta37mpaz6ktzwtt8krr/01kg0tszs96s0n8z5gy8y5mbg7.jpeg)

## Quick Start

```bash
curl -fsSL https://cli.inference.sh | sh && infsh login

# Create avatar video from image + audio
infsh app run bytedance/omnihuman-1-5 --input '{
  "image_url": "https://portrait.jpg",
  "audio_url": "https://speech.mp3"
}'
```

## Available Models

| Model | App ID | Best For |
|-------|--------|----------|
| OmniHuman 1.5 | `bytedance/omnihuman-1-5` | Multi-character, best quality |
| OmniHuman 1.0 | `bytedance/omnihuman-1-0` | Single character |
| Fabric 1.0 | `falai/fabric-1-0` | Image talks with lipsync |
| PixVerse Lipsync | `falai/pixverse-lipsync` | Highly realistic |

## Search Avatar Apps

```bash
infsh app list --search "omnihuman"
infsh app list --search "lipsync"
infsh app list --search "fabric"
```

## Examples

### OmniHuman 1.5 (Multi-Character)

```bash
infsh app run bytedance/omnihuman-1-5 --input '{
  "image_url": "https://portrait.jpg",
  "audio_url": "https://speech.mp3"
}'
```

Supports specifying which character to drive in multi-person images.

### Fabric 1.0 (Image Talks)

```bash
infsh app run falai/fabric-1-0 --input '{
  "image_url": "https://face.jpg",
  "audio_url": "https://audio.mp3"
}'
```

### PixVerse Lipsync

```bash
infsh app run falai/pixverse-lipsync --input '{
  "image_url": "https://portrait.jpg",
  "audio_url": "https://speech.mp3"
}'
```

Generates highly realistic lipsync from any audio.

## Full Workflow: TTS + Avatar

```bash
# 1. Generate speech from text
infsh app run infsh/kokoro-tts --input '{
  "text": "Welcome to our product demo. Today I will show you..."
}' > speech.json

# 2. Create avatar video with the speech
infsh app run bytedance/omnihuman-1-5 --input '{
  "image_url": "https://presenter-photo.jpg",
  "audio_url": "<audio-url-from-step-1>"
}'
```

## Full Workflow: Dub Video in Another Language

```bash
# 1. Transcribe original video
infsh app run infsh/fast-whisper-large-v3 --input '{"audio_url": "https://video.mp4"}' > transcript.json

# 2. Translate text (manually or with an LLM)

# 3. Generate speech in new language
infsh app run infsh/kokoro-tts --input '{"text": "<translated-text>"}' > new_speech.json

# 4. Lipsync the original video with new audio
infsh app run infsh/latentsync-1-6 --input '{
  "video_url": "https://original-video.mp4",
  "audio_url": "<new-audio-url>"
}'
```

## Use Cases

- **Marketing**: Product demos with AI presenter
- **Education**: Course videos, explainers
- **Localization**: Dub content in multiple languages
- **Social Media**: Consistent virtual influencer
- **Corporate**: Training videos, announcements

## Tips

- Use high-quality portrait photos (front-facing, good lighting)
- Audio should be clear with minimal background noise
- OmniHuman 1.5 supports multiple people in one image
- LatentSync is best for syncing existing videos to new audio

## Related Skills

```bash
# Full platform skill (all 150+ apps)
npx skills add inferencesh/skills@inference-sh

# Text-to-speech (generate audio for avatars)
npx skills add inferencesh/skills@text-to-speech

# Speech-to-text (transcribe for dubbing)
npx skills add inferencesh/skills@speech-to-text

# Video generation
npx skills add inferencesh/skills@ai-video-generation

# Image generation (create avatar images)
npx skills add inferencesh/skills@ai-image-generation
```

Browse all video apps: `infsh app list --category video`

## Documentation

- [Running Apps](https://inference.sh/docs/apps/running) - How to run apps via CLI
- [Content Pipeline Example](https://inference.sh/docs/examples/content-pipeline) - Building media workflows
- [Streaming Results](https://inference.sh/docs/api/sdk/streaming) - Real-time progress updates

Related Skills

analyze-video-performance-skill

from diegosouzapw/awesome-omni-skill

Analyze your uploaded YouTube videos' performance against competitor benchmarks using Playwright MCP. Use when asked to review video analytics, compare performance, check how a video is doing, or reflect on a video's success.

video-sdk/web

from diegosouzapw/awesome-omni-skill

Zoom Video SDK for Web - JavaScript/TypeScript integration for browser-based video sessions, real-time communication, screen sharing, recording, and live transcription

Fixed Video Format (9:16)

from diegosouzapw/awesome-omni-skill

Fixed 1080x1920 pixel video format with percentage-based positioning. Use this when laying out video compositions, positioning elements on the canvas, or calculating dimensions. All videos render at exactly 9:16 aspect ratio for TikTok/Instagram Reels.

avatar

from diegosouzapw/awesome-omni-skill

Control the VTuber avatar system — speak through it with lip sync, change expressions, manage the avatar renderer and control server. Use when interacting with the avatar, making it speak, changing expressions, or troubleshooting avatar connection issues.

smart-short-video

from diegosouzapw/awesome-omni-skill

智能短影片生成器 - 混合 AI 圖片與原始影片片段

vidu-video

from diegosouzapw/awesome-omni-skill

使用 Vidu Q3 Pro 模型生成视频。当用户想要文生视频、生成带音频的视频，或提到 vidu 时使用此 skill。

videodb-skills

from diegosouzapw/awesome-omni-skill

Upload, stream, search, edit, transcribe, and generate AI video and audio using the VideoDB SDK.

videocut:安装

from diegosouzapw/awesome-omni-skill

环境准备。安装依赖、下载模型。触发词：安装、环境准备、初始化

video

from diegosouzapw/awesome-omni-skill

Generate videos using fal.ai (Wan, Kling) or Sora. Text-to-video and image-to-video.

video-toolkit

from diegosouzapw/awesome-omni-skill

Intelligent video processor for downloading media and extracting transcripts from YouTube and 1000+ supported sites. Automatically handles format selection, subtitle extraction, and post-processing.

video-processing-editing

from diegosouzapw/awesome-omni-skill

FFmpeg automation for cutting, trimming, concatenating videos. Audio mixing, timeline editing, transitions, effects. Export optimization for YouTube, social media. Subtitle handling, color grading, batch processing. Use for videogen projects, content creation, automated video production. Activate on "video editing", "FFmpeg", "trim video", "concatenate", "transitions", "export optimization". NOT for real-time video editing UI, 3D compositing, or motion graphics.

video-commercial

from diegosouzapw/awesome-omni-skill

Generate 30-second video commercials from a concept. Creates storyboard, generates scene images, adds narration via ElevenLabs, assembles final video. Use when asked to create commercials, promo videos, video ads, or short marketing videos.