digital-human-api

Digital human video generation via Qingyun API — avatar-based talking head videos

33 stars

by aAAaqwq

Best use case

digital-human-api is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using digital-human-api should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/digital-human-api/SKILL.md --create-dirs "https://raw.githubusercontent.com/aAAaqwq/AGI-Super-Team/main/skills/digital-human-api/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/digital-human-api/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How digital-human-api Compares

Feature / Agent	digital-human-api	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Digital human video generation via Qingyun API — avatar-based talking head videos

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# digital-human-api v3

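A general-purpose Skill for generating digital human talking-head videos, built on the Qingyun API.

**Core v3 improvement: each shot gets its own scene image (Daniel's real face plus a shot-specific scene), so the video looks natural rather than abstract.**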

## Triggers
- `数字人视频` (digital human video), `口播视频` (talking-head video), `digital human`
- Generating a shot-by-shot digital human video from a script

## New v3 pipeline (4 steps per shot)

```
Script JSON → [Scene Image] → [TTS] → [Kling Video] → [Lip Sync] → FFmpeg merge
              ↑ new: each shot gets its own scene-matched image
```

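**Independent flow for each shot:**
1. 🖼️ **Scene image generation**: Grok generates a scene-matched image from the reference face (keeping Daniel's face)
2. 📝 **TTS audio**: Gemini generates the voiceover audio
3. 🎬 **Kling video**: scene image + motion prompt → animated video
4. 👄 **Lip sync**: Kling LipSync synchronizes audio and video
5. 🔗 **FFmpeg merge**: all shots + BGM → final video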
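
The ordering above can be sketched in Python as follows. This is a minimal illustration, not the actual `scripts/generate.py`: the stage functions are hypothetical stubs that only return labels, and the names `run_shot`, `run_pipeline`, and `make_stub` are assumptions introduced here.

```python
from typing import Callable, Dict, List

# Per-shot stages, in the order listed above. The names are illustrative only.
PER_SHOT_STAGES: List[str] = ["image", "tts", "video", "lipsync"]


def run_shot(shot: Dict, stages: Dict[str, Callable[[Dict], str]]) -> List[str]:
    """Run every per-shot stage in order and return the labels produced."""
    return [stages[name](shot) for name in PER_SHOT_STAGES]


def run_pipeline(script: Dict, stages: Dict[str, Callable[[Dict], str]]) -> List[str]:
    """Process shots one at a time, then record a single merge step at the end."""
    log: List[str] = []
    for shot in script["shots"]:
        log.extend(run_shot(shot, stages))
    log.append("merge:final.mp4")  # the merge runs once, after all shots
    return log


def make_stub(stage: str) -> Callable[[Dict], str]:
    """Hypothetical stage: returns a label instead of calling any real API."""
    return lambda shot: f"{stage}:shot_{shot['id']:02d}"


STUB_STAGES = {name: make_stub(name) for name in PER_SHOT_STAGES}

if __name__ == "__main__":
    demo = {"shots": [{"id": 1}, {"id": 2}]}
    for entry in run_pipeline(demo, STUB_STAGES):
        print(entry)
```

Processing one shot at a time matches the concurrency of 1 that the skill recommends for avoiding rate limits.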

## v3 script format

```json
{
  "title": "Video title",
  "avatar_image": "/path/to/daniel-headshot.jpg",
  "shots": [
    {
      "id": 1,
      "text": "Voiceover copy",
      "emotion": "sarcastic",
      "scene_description": "(optional) detailed description of the scene image",
      "duration": 5
    }
  ]
}
```
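
A script in this format could be sanity-checked before the pipeline runs. The sketch below is an assumption about what such a check might look like: the source does not say which fields `generate.py` actually requires, so treating everything except `scene_description` as mandatory, and the helper name `validate_script`, are illustrative choices.

```python
import json
from typing import Any, Dict, List

# Emotion values listed in this document.
EMOTIONS = {
    "serious", "friendly", "excited", "sarcastic", "storytelling",
    "humorous", "intense", "confident", "questioning", "casual",
}


def validate_script(script: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems: List[str] = []
    for key in ("title", "avatar_image"):
        if key not in script:
            problems.append(f"missing top-level field: {key}")
    shots = script.get("shots")
    if not isinstance(shots, list) or not shots:
        problems.append("shots must be a non-empty list")
        return problems
    for index, shot in enumerate(shots, start=1):
        # Assumed required fields; scene_description is optional per the format.
        for key in ("id", "text", "emotion", "duration"):
            if key not in shot:
                problems.append(f"shot {index}: missing field {key}")
        emotion = shot.get("emotion")
        if emotion is not None and emotion not in EMOTIONS:
            problems.append(f"shot {index}: unknown emotion {emotion!r}")
        duration = shot.get("duration")
        if duration is not None and (
            not isinstance(duration, (int, float)) or duration <= 0
        ):
            problems.append(f"shot {index}: duration must be a positive number")
    return problems


if __name__ == "__main__":
    sample = json.loads(
        '{"title": "t", "avatar_image": "/path/to/daniel-headshot.jpg", '
        '"shots": [{"id": 1, "text": "hi", "emotion": "sarcastic", "duration": 5}]}'
    )
    print(validate_script(sample))
```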

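### Allowed emotion values

| emotion | Action style |
|---------|---------|
| `serious` | Serious, looking straight at the camera |
| `friendly` | Friendly smile |
| `excited` | Excited, lots of hand gestures |
| `sarcastic` | Sarcastic, raised eyebrow |
| `storytelling` | Storytelling gestures |
| `humorous` | Humorous and relaxed |
| `intense` | Tense / agitated |
| `confident` | Confident and authoritative |
| `questioning` | Puzzled, head tilted |
| `casual` | Everyday conversation |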
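
One way to use this table in code is a lookup from `emotion` to a motion hint that gets appended to the video prompt. The sketch below assumes that design; the English phrasings, the `casual` fallback, and the `motion_prompt` helper are illustrative and not taken from `generate.py`.

```python
# English renderings of the action styles in the table above (illustrative wording).
EMOTION_STYLES = {
    "serious": "serious, looking straight at the camera",
    "friendly": "friendly smile",
    "excited": "excited, frequent hand gestures",
    "sarcastic": "sarcastic, raised eyebrow",
    "storytelling": "storytelling gestures",
    "humorous": "humorous and relaxed",
    "intense": "tense, agitated",
    "confident": "confident and authoritative",
    "questioning": "puzzled, head tilted",
    "casual": "everyday conversation",
}


def motion_prompt(emotion: str, default: str = "casual") -> str:
    """Return the motion hint for an emotion, falling back to `default` if unknown."""
    return EMOTION_STYLES.get(emotion, EMOTION_STYLES[default])


if __name__ == "__main__":
    print(motion_prompt("sarcastic"))
    print(motion_prompt("not-a-real-emotion"))
```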

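### Writing scene_description

The more specific the description, the better the scene image fits. Suggested format:
- Facial expression + action (e.g. raised eyebrow, holding coffee cup)
- Setting (e.g. modern cafe, restaurant table)
- Lighting (e.g. warm natural lighting)
- Style (e.g. realistic photo, shot on iPhone)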
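
The four-part format above can be assembled programmatically. This is a small sketch, assuming the parts are simply joined with commas; the helper name `build_scene_description` and its parameters are hypothetical.

```python
from typing import Optional


def build_scene_description(
    expression_action: str,
    setting: str,
    lighting: Optional[str] = None,
    style: Optional[str] = None,
) -> str:
    """Join the non-empty parts into one comma-separated description."""
    parts = [expression_action, setting, lighting, style]
    return ", ".join(p.strip() for p in parts if p and p.strip())


if __name__ == "__main__":
    print(build_scene_description(
        "raised eyebrow, holding coffee cup",
        "modern cafe",
        "warm natural lighting",
        "realistic photo, shot on iPhone",
    ))
```

Keeping the parts as separate arguments makes it harder to forget lighting or style when writing many shots.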

## Usage

```bash
export QINGYUN_API_KEY=$(pass show api/qingyun)

# Full pipeline
python3 scripts/generate.py --script script.json --concurrent 1

# Step by step
python3 scripts/generate.py --script script.json --step image    # scene images
python3 scripts/generate.py --script script.json --step tts      # TTS audio
python3 scripts/generate.py --script script.json --step video    # Kling video
python3 scripts/generate.py --script script.json --step lipsync  # lip sync
python3 scripts/generate.py --script script.json --step merge    # merge
```
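
The step-by-step commands can also be driven from Python. The sketch below only builds the argument lists by default; it uses the `--script` and `--step` flags shown above and assumes (this is not confirmed by the source) that `generate.py` exits with a non-zero status when a step fails. The helper names `step_command` and `run_steps` are hypothetical.

```python
import subprocess
from typing import List

# Step names taken from the commands above, in pipeline order.
STEPS = ["image", "tts", "video", "lipsync", "merge"]


def step_command(script_path: str, step: str) -> List[str]:
    """Build the argv for one pipeline step."""
    if step not in STEPS:
        raise ValueError(f"unknown step: {step}")
    return ["python3", "scripts/generate.py", "--script", script_path, "--step", step]


def run_steps(script_path: str, dry_run: bool = True) -> List[List[str]]:
    """Return the commands for all steps; execute them in order unless dry_run.

    With dry_run=True (the default) nothing is executed.
    """
    commands = [step_command(script_path, step) for step in STEPS]
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError on a non-zero exit status,
            # which stops the sequence at the first failing step.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    for cmd in run_steps("script.json"):
        print(" ".join(cmd))
```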

## Output structure

```
output_dir/
├── shot_01_scene.jpg          # original scene image
├── shot_01_scene_768.jpg      # resized for Kling
├── shot_01_audio.mp3          # TTS audio
├── shot_01_video.mp4          # Kling video
├── shot_01_lipsync.mp4        # lip sync done
├── ...
└── final.mp4                  # final video
```
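
The naming scheme above makes it easy to check which artifacts exist for a shot, for example before re-running a single step. This sketch assumes shot ids are zero-padded to two digits, as in `shot_01`; the source shows no other ids, and the helper names `shot_paths` and `missing_artifacts` are hypothetical.

```python
from pathlib import Path
from typing import Dict, List


def shot_paths(output_dir: str, shot_id: int) -> Dict[str, Path]:
    """Return the per-shot artifact paths following the naming shown above."""
    base = Path(output_dir)
    prefix = f"shot_{shot_id:02d}"  # assumption: two-digit zero padding
    return {
        "scene": base / f"{prefix}_scene.jpg",
        "scene_768": base / f"{prefix}_scene_768.jpg",
        "audio": base / f"{prefix}_audio.mp3",
        "video": base / f"{prefix}_video.mp4",
        "lipsync": base / f"{prefix}_lipsync.mp4",
    }


def missing_artifacts(output_dir: str, shot_id: int) -> List[str]:
    """List the artifact keys whose files do not exist yet."""
    return [
        key for key, path in shot_paths(output_dir, shot_id).items()
        if not path.exists()
    ]


if __name__ == "__main__":
    print(missing_artifacts("output_dir", 1))
```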

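## Known limitations

| Issue | Fix |
|------|------|
| Video looks too abstract | v3 switches to scene images, generated independently for each shot |
| 429 rate limiting | Concurrency = 1, polling interval 15 s |
| Invalid image pixel dimensions | Automatically resized to 768 px wide |
| Grok scene image generation fails | Automatically falls back to generation without the reference image |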
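
Two of the mitigations above can be illustrated in a few lines. The sketch below is not the code from `generate.py`: it assumes the resize preserves aspect ratio (the source only states the 768 px width), and the `"pending"` status string, the `max_polls` limit, and the helper names are hypothetical.

```python
import time
from typing import Callable, Tuple

TARGET_WIDTH = 768      # width mentioned in the table above
POLL_INTERVAL_S = 15    # polling interval mentioned in the table above


def scaled_size(width: int, height: int, target_width: int = TARGET_WIDTH) -> Tuple[int, int]:
    """Scale (width, height) to the target width, preserving aspect ratio."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    return target_width, max(1, round(height * target_width / width))


def poll_until_done(
    check: Callable[[], str],
    interval_s: float = POLL_INTERVAL_S,
    max_polls: int = 40,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call `check` until it returns something other than "pending".

    Waits `interval_s` seconds between calls; `sleep` is injectable for testing.
    """
    for attempt in range(max_polls):
        status = check()
        if status != "pending":
            return status
        if attempt < max_polls - 1:
            sleep(interval_s)
    raise TimeoutError(f"task still pending after {max_polls} polls")


if __name__ == "__main__":
    print(scaled_size(1536, 1024))
```

A fixed, fairly long polling interval keeps the request rate low, which is the point of the 429 mitigation.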

## File list

```
digital-human-api/
├── SKILL.md              # this file
├── scripts/
│   ├── generate.py       # main script v3 (~800 lines)
│   └── config.yaml       # config v3
```

Related Skills

jimeng-digital-human

from aAAaqwq/AGI-Super-Team

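End-to-end automation of Jimeng AI digital human video generation. Drives the digital human interface at jimeng.jianying.com through browser automation to upload the character, select a voice, fill in the script lines, generate the video, and download it. Triggers: the user needs to generate a digital human video, a Jimeng digital human, an AI digital human talking-head video, or digital human video production. Depends on the jimeng-login skill to handle login.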

humanizer

from aAAaqwq/AGI-Super-Team

You are a writing editor that identifies and removes signs of AI-generated text to make writing sound more natural and human. This guide is based on W

wemp-operator

from aAAaqwq/AGI-Super-Team

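> Full-featured WeChat Official Account operations: an API wrapper for drafts, publishing, comments, users, media assets, mass messaging, statistics, menus, and QR codes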

Content & Documentation

zsxq-smart-publish

from aAAaqwq/AGI-Super-Team

Publish and manage content on 知识星球 (zsxq.com). Supports talk posts, Q&A, long articles, file sharing, digest/bookmark, homework tasks, and tag management. Use when publishing content to 知识星球, creating/editing posts, uploading files/images/audio, managing digests, batch publishing, or formatting content for 知识星球.

zoom-automation

from aAAaqwq/AGI-Super-Team

Automate Zoom meeting creation, management, recordings, webinars, and participant tracking via Rube MCP (Composio). Always search tools first for current schemas.

zoho-crm-automation

from aAAaqwq/AGI-Super-Team

Automate Zoho CRM tasks via Rube MCP (Composio): create/update records, search contacts, manage leads, and convert leads. Always search tools first for current schemas.

ziliu-publisher

from aAAaqwq/AGI-Super-Team

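字流 (Ziliu): an AI-driven multi-platform content distribution tool. Create once, adapt the layout intelligently, and distribute to 16+ platforms with one click (WeChat Official Accounts, Zhihu, Xiaohongshu, Bilibili, Douyin, Weibo, X, and others). Use when the user needs multi-platform publishing, content layout, or format adaptation. Trigger words: 字流, ziliu, 多平台发布 (multi-platform publishing), 一键分发 (one-click distribution), 内容分发 (content distribution), 排版发布 (layout and publish).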

zhihu-post-skill

from aAAaqwq/AGI-Super-Team

> Zhihu article publishing: automates content creation and publishing on the Zhihu platform

zendesk-automation

from aAAaqwq/AGI-Super-Team

Automate Zendesk tasks via Rube MCP (Composio): tickets, users, organizations, replies. Always search tools first for current schemas.

youtube-knowledge-extractor

from aAAaqwq/AGI-Super-Team

This skill performs deep, multimodal analysis of YouTube videos through both information channels: audio (transcript) and visual (frame extraction + image analysis). Especially powerful for HowTo videos, tutorials, demos, and explainer videos where what is SHOWN (screenshots, UI demos, diagrams, code, physical actions) is just as important as what is SAID. Use this skill whenever a user wants to analyze, summarize, or create step-by-step guides from YouTube videos, or when they share a YouTube URL and want to understand what happens in the video. Triggers on requests like "Analyze this YouTube video", "Create a step-by-step guide from this video", "What does this video show?", "Summarize this tutorial", or any YouTube URL shared with analysis intent.

youtube-factory

from aAAaqwq/AGI-Super-Team

Generate complete YouTube videos from a single prompt - script, voiceover, stock footage, captions, thumbnail. Self-contained, no external modules. 100% free tools.

youtube-automation

from aAAaqwq/AGI-Super-Team

Automate YouTube tasks via Rube MCP (Composio): upload videos, manage playlists, search content, get analytics, and handle comments. Always search tools first for current schemas.