AI Agent Skill HUB

Content & Documentation

explainer

Create explainer videos with narration and AI-generated visuals. Triggers on: "解说视频", "explainer video", "explain this as a video", "tutorial video", "introduce X (video)", "解释一下XX（视频形式）".

3,891 stars

Complexity: medium

View on GitHub Installation ↓

About this skill

This skill creates explainer videos that pair a single narrator's voiceover with AI-generated visuals. It suits product introductions, concept explanations, and instructional tutorials. The agent collects the required parameters through interactive multiple-choice questions, summarizes the choices, and starts generation only after the user confirms. The output is either a text-only script or a complete video with synchronized audio and visuals.

Best use case

Use this skill to produce explainer and tutorial videos without manual video production work. It suits marketers, educators, content creators, and businesses that need to introduce a product or teach a concept in a narrated, visual format.

The skill returns a link to the completed explainer or tutorial video with AI-generated visuals and narration, or a text script for such a video, depending on the user's choice.

Practical example

Example input

Create an explainer video about the benefits of renewable energy.

Example output

Here is your explainer video about renewable energy: [Link to video]. It features a narrator explaining the concepts with AI-generated visuals.

When to use this skill

User wants to create an explainer or tutorial video.
User asks to explain something in video form.
User wants narrated content with AI-generated visuals.
User explicitly says "explainer video", "解说视频", or "tutorial video".

When not to use this skill

User wants audio-only content without visuals (use `/speech` or `/podcast` instead).
User wants a podcast-style discussion (use `/podcast` instead).
User wants to generate a standalone image (use `/image-gen` instead).
User wants to read text aloud without video (use `/speech` instead).

Installation

Claude Code / Cursor / Codex

curl -o ~/.claude/skills/explainer-video/SKILL.md --create-dirs "https://raw.githubusercontent.com/openclaw/skills/main/skills/0xfango/explainer-video/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/explainer-video/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How explainer Compares

Feature / Agent	explainer	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	medium	N/A

Frequently Asked Questions

What does this skill do?

It creates explainer videos with narration and AI-generated visuals. It triggers on phrases such as "解说视频", "explainer video", "explain this as a video", "tutorial video", "introduce X (video)", and "解释一下XX（视频形式）".

How difficult is it to install?

The installation complexity is rated as medium. You can find the installation instructions above.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

## When to Use

- User wants to create an explainer or tutorial video
- User asks to "explain" something in video form
- User wants narrated content with AI-generated visuals
- User says "explainer video", "解说视频", "tutorial video"

## When NOT to Use

- User wants audio-only content without visuals (use `/speech` or `/podcast`)
- User wants a podcast-style discussion (use `/podcast`)
- User wants to generate a standalone image (use `/image-gen`)
- User wants to read text aloud without video (use `/speech`)

## Purpose

Generate explainer videos that combine a single narrator's voiceover with AI-generated visuals. Ideal for product introductions, concept explanations, and tutorials. Supports text-only script generation or full text + video output.

## Hard Constraints

- No shell scripts. Construct curl commands from the API reference files listed in API Reference
- Always read `shared/authentication.md` for API key and headers
- Follow `shared/common-patterns.md` for polling, errors, and interaction patterns
- Always read config following `shared/config-pattern.md` before any interaction
- Never hardcode speaker IDs — always fetch from the speakers API
- Never save files to `~/Downloads/` — use `.listenhub/explainer/` from config
- Explainer uses exactly 1 speaker
- Mode must be `info` (for Info style) or `story` (for Story style) — never `slides` (use `/slides` skill instead)
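
The mode constraint above can be checked before any request is built. The following is a minimal sketch, not part of the skill itself: the function name `explainer_mode_for_style` is hypothetical, and it assumes the style answer arrives as a plain string such as "Info" or "Story".

```bash
# Hypothetical helper: map the user's style choice to the API mode.
# Only "info" and "story" are valid here; "slides" belongs to the /slides skill.
explainer_mode_for_style() {
  local style
  # Lowercase the input so "Info" and "info" are treated alike.
  style=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$style" in
    info|story) printf '%s\n' "$style" ;;
    slides) echo "slides is handled by the /slides skill" >&2; return 1 ;;
    *) echo "unknown style: $1" >&2; return 1 ;;
  esac
}

explainer_mode_for_style "Info"
```

Rejecting `slides` explicitly, rather than letting it fall through to the generic error, keeps the hint about the `/slides` skill visible to the caller.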

<HARD-GATE>
Use the AskUserQuestion tool for every multiple-choice step — do NOT print options as plain text. Ask one question at a time. Wait for the user's answer before proceeding to the next step. After all parameters are collected, summarize the choices and ask the user to confirm. Do NOT call any generation API until the user has explicitly confirmed.

</HARD-GATE>

## Step -1: API Key Check

Follow `shared/config-pattern.md` § API Key Check. If the key is missing, stop immediately.
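
The authoritative check lives in `shared/config-pattern.md`, which is not reproduced here. As a rough sketch of the "stop immediately" rule, assuming the key is exposed as the `LISTENHUB_API_KEY` environment variable used by the curl commands in this skill (the function name `require_api_key` is hypothetical):

```bash
# Hypothetical helper: fail fast when the API key is missing or empty.
require_api_key() {
  if [ -z "${LISTENHUB_API_KEY:-}" ]; then
    echo "LISTENHUB_API_KEY is not set; stopping before any interaction." >&2
    return 1
  fi
}
```

A caller would run `require_api_key || exit 1` before Step 0 so that no question is asked when generation cannot succeed.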

## Step 0: Config Setup

Follow `shared/config-pattern.md` Step 0.

**If file doesn't exist** — ask location, then create immediately:
```bash
# Pick the config location first so every later command uses the same path
CONFIG_PATH=".listenhub/explainer/config.json"
# (or "$HOME/.listenhub/explainer/config.json" for global)
mkdir -p "$(dirname "$CONFIG_PATH")"
echo '{"outputDir":".listenhub","outputMode":"inline","language":null,"defaultStyle":null,"defaultSpeakers":{}}' > "$CONFIG_PATH"
```
Then run **Setup Flow** below.

**If file exists** — read config, display summary, and confirm:
```
Current config (explainer):
  Output mode: {inline / download / both}
  Language: {zh / en / not set}
  Default style: {info / story / not set}
  Default speaker: {speakerName / not set}
```
Ask: "Use the saved config?" → **Confirm and continue** / **Reconfigure**

### Setup Flow (first run or reconfigure)

Ask these questions in order, then save all answers to config at once:

1. **outputMode**: Follow `shared/output-mode.md` § Setup Flow Question.

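2. **Language** (optional): "Default language?"
   - "Chinese (zh)"
   - "English (en)"
   - "Choose manually each time" → keep `null`

3. **Style** (optional): "Default style?"
   - "Info (informational presentation)"
   - "Story (narrative storytelling)"
   - "Choose manually each time" → keep `null`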

After collecting answers, save immediately:
```bash
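# Follow shared/output-mode.md § Save to Config
# Load the current config first so $CONFIG is defined on a first run
CONFIG=$(cat "$CONFIG_PATH")
NEW_CONFIG=$(echo "$CONFIG" | jq --arg m "$OUTPUT_MODE" '. + {"outputMode": $m}')
echo "$NEW_CONFIG" > "$CONFIG_PATH"
# Re-read so later steps see the saved values
CONFIG=$(cat "$CONFIG_PATH")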
```

Note: `defaultSpeakers` are saved after generation (see After Successful Generation section).

## Interaction Flow

### Step 1: Topic / Content

Free text input. Ask the user:

> What would you like to explain or introduce?

Accept: topic description, text content, or concept to explain.

### Step 2: Language

If `config.language` is set, pre-fill and show in summary — skip this question.
Otherwise ask:

```
Question: "What language?"
Options:
  - "Chinese (zh)" — Content in Mandarin Chinese
  - "English (en)" — Content in English
```

### Step 3: Style

If `config.defaultStyle` is set, pre-fill and show in summary — skip this question.
Otherwise ask:

```
Question: "What style of explainer?"
Options:
  - "Info" — Informational, factual presentation style
  - "Story" — Narrative, storytelling approach
```
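
Steps 2 and 3 share the same pre-fill rule: a saved, non-null config value skips the question. The sketch below illustrates that decision only. The name `prefill_or_ask` and the `prefill:` / `ask` output convention are hypothetical, and it assumes the value was read with something like `jq -r '.language // empty' "$CONFIG_PATH"`.

```bash
# Hypothetical helper for Steps 2 and 3: decide whether a question can be skipped.
# $1 is the saved config value. With jq's "// empty", a null value prints nothing,
# so an empty string means "not set". A literal "null" is treated the same way.
prefill_or_ask() {
  local saved="$1"
  if [ -n "$saved" ] && [ "$saved" != "null" ]; then
    printf 'prefill:%s\n' "$saved"   # show this value in the summary, skip the question
  else
    printf 'ask\n'                   # no default saved, ask the user
  fi
}

prefill_or_ask "en"
prefill_or_ask ""
```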

### Step 4: Speaker Selection

Follow `shared/speaker-selection.md` for the full selection flow, including:
- Default from `config.defaultSpeakers.{language}` (skip step if set)
- Text table + free-text input
- Input matching and re-prompt on no match

Only 1 speaker is supported for explainer videos.

### Step 5: Output Type

```
Question: "What output do you want?"
Options:
  - "Text script only" — Generate narration script, no video
  - "Text + Video" — Generate full explainer video with AI visuals
```

### Step 6: Confirm & Generate

Summarize all choices:

```
Ready to generate explainer:

  Topic: {topic}
  Language: {language}
  Style: {info/story}
  Speaker: {speaker name}
  Output: {text only / text + video}

  Proceed?
```

Wait for explicit confirmation before calling any API.
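
For illustration, the summary template above could be filled from the collected answers as follows. This is a sketch only: `print_summary` is a hypothetical name, and in the actual flow the agent presents the summary and waits for the user's confirmation rather than running a script.

```bash
# Hypothetical helper that fills the Step 6 summary template.
# Arguments: topic, language, style, speaker name, output type.
print_summary() {
  local topic="$1" language="$2" style="$3" speaker="$4" output="$5"
  printf 'Ready to generate explainer:\n\n'
  printf '  Topic: %s\n' "$topic"
  printf '  Language: %s\n' "$language"
  printf '  Style: %s\n' "$style"
  printf '  Speaker: %s\n' "$speaker"
  printf '  Output: %s\n\n' "$output"
  printf '  Proceed?\n'
}

print_summary "Claude Code introduction" "en" "info" "cozy-man-english" "text + video"
```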

## Workflow

1. **Submit (foreground)**: `POST /storybook/episodes` with content, speaker, language, mode → extract `episodeId`
2. Tell the user the task is submitted
3. **Poll (background)**: Run the following **exact** bash command with `run_in_background: true` and `timeout: 600000`. Do NOT use python3, awk, or any other JSON parser — use `jq` as shown:

   ```bash
   EPISODE_ID="<id-from-step-1>"
   for i in $(seq 1 30); do
     RESULT=$(curl -sS "https://api.marswave.ai/openapi/v1/storybook/episodes/$EPISODE_ID" \
       -H "Authorization: Bearer $LISTENHUB_API_KEY" 2>/dev/null)
     STATUS=$(echo "$RESULT" | tr -d '\000-\037\177' | jq -r '.data.processStatus // "pending"')
     case "$STATUS" in
       success|completed) echo "$RESULT"; exit 0 ;;
       failed|error) echo "FAILED: $RESULT" >&2; exit 1 ;;
       *) sleep 10 ;;
     esac
   done
   echo "TIMEOUT" >&2; exit 2
   ```

4. When notified, **download and present script**:

   Read `OUTPUT_MODE` from config. Follow `shared/output-mode.md` for behavior.

   **`inline` or `both`**: Present the script inline.

   Present:
   ```
   Explainer script generated!

   "{title}"

   View online: https://listenhub.ai/app/explainer/{episodeId}
   ```

   **`download` or `both`**: Also save the script file.
   - Create `.listenhub/explainer/YYYY-MM-DD-{episodeId}/`
   - Write `{episodeId}.md` from the generated script content
   - Present the download path in addition to the above summary.

5. **If video requested**: `POST /storybook/episodes/{episodeId}/video` (foreground) → **poll again (background)** using the **exact** bash command below with `run_in_background: true` and `timeout: 600000`. Poll for `videoStatus`, not `processStatus`:

   ```bash
   EPISODE_ID="<id-from-step-1>"
   for i in $(seq 1 30); do
     RESULT=$(curl -sS "https://api.marswave.ai/openapi/v1/storybook/episodes/$EPISODE_ID" \
       -H "Authorization: Bearer $LISTENHUB_API_KEY" 2>/dev/null)
     STATUS=$(echo "$RESULT" | tr -d '\000-\037\177' | jq -r '.data.videoStatus // "pending"')
     case "$STATUS" in
       success|completed) echo "$RESULT"; exit 0 ;;
       failed|error) echo "FAILED: $RESULT" >&2; exit 1 ;;
       *) sleep 10 ;;
     esac
   done
   echo "TIMEOUT" >&2; exit 2
   ```
6. When notified, **download and present result**:

**Present result**

Read `OUTPUT_MODE` from config. Follow `shared/output-mode.md` for behavior.

**`inline` or `both`**: Display video URL and audio URL as clickable links.

Present:
```
Explainer video generated!

Video URL: {videoUrl}
Audio URL: {audioUrl}
Duration: {duration}s
Credits used: {credits}
```

**`download` or `both`**: Also download the audio file.
```bash
DATE=$(date +%Y-%m-%d)
JOB_DIR=".listenhub/explainer/${DATE}-{episodeId}"
mkdir -p "$JOB_DIR"
curl -sS -o "${JOB_DIR}/{episodeId}.mp3" "{audioUrl}"
```
Present the download path in addition to the above summary.
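
The `download` branch of step 4 (create `.listenhub/explainer/YYYY-MM-DD-{episodeId}/`, then write `{episodeId}.md`) could look like the sketch below. The function name `save_script` and its third argument are hypothetical. The script text is passed in as an argument because the response field that holds it is not specified in this document.

```bash
# Hypothetical helper for the download branch of step 4.
# $1 = episodeId, $2 = script text, $3 = base dir (defaults to .listenhub/explainer)
save_script() {
  local episode_id="$1" script_text="$2" base_dir="${3:-.listenhub/explainer}"
  local job_dir
  # Directory name follows the YYYY-MM-DD-{episodeId} convention from step 4.
  job_dir="${base_dir}/$(date +%Y-%m-%d)-${episode_id}"
  mkdir -p "$job_dir" || return 1
  printf '%s\n' "$script_text" > "${job_dir}/${episode_id}.md" || return 1
  # Print the saved path so it can be presented to the user.
  printf '%s\n' "${job_dir}/${episode_id}.md"
}
```

Calling `save_script "$EPISODE_ID" "$SCRIPT_TEXT"` prints the path of the written file, which is the download path to show alongside the summary.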

### After Successful Generation

Update config with the choices made this session:

```bash
NEW_CONFIG=$(echo "$CONFIG" | jq \
  --arg lang "{language}" \
  --arg style "{info/story}" \
  --arg speakerId "{speakerId}" \
  '. + {"language": $lang, "defaultStyle": $style, "defaultSpeakers": (.defaultSpeakers + {($lang): [$speakerId]})}')
echo "$NEW_CONFIG" > "$CONFIG_PATH"
```

**Estimated times**:
- Text script only: 2-3 minutes
- Text + Video: 3-5 minutes
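
These estimates can be compared with the polling loops in the Workflow section, which each make up to 30 attempts with a 10 second sleep between them. A short worked calculation (the variable names are illustrative):

```bash
# Upper bound on sleep time in each polling loop: 30 attempts x 10 s.
ATTEMPTS=30
SLEEP_SECONDS=10
BUDGET_SECONDS=$((ATTEMPTS * SLEEP_SECONDS))
echo "poll budget: ${BUDGET_SECONDS}s ($((BUDGET_SECONDS / 60)) min)"
# prints: poll budget: 300s (5 min)
```

Each loop therefore gives up after roughly 5 minutes of sleeping plus request time, which sits at the upper end of the Text + Video estimate and well inside the 600000 ms tool timeout.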

## API Reference

- Speaker list: `shared/api-speakers.md`
- Speaker selection guide: `shared/speaker-selection.md`
- Episode creation: `shared/api-storybook.md`
- Polling: `shared/common-patterns.md` § Async Polling
- Config pattern: `shared/config-pattern.md`

## Composability

- **Invokes**: speakers API (for speaker selection); may invoke `/speech` for voiceover
- **Invoked by**: content-planner (Phase 3)

## Example

**User**: "Create an explainer video introducing Claude Code"

**Agent workflow**:
1. Topic: "Claude Code introduction"
2. Ask language → "English"
3. Ask style → "Info"
4. Fetch speakers, user picks "cozy-man-english"
5. Ask output → "Text + Video"

```bash
curl -sS -X POST "https://api.marswave.ai/openapi/v1/storybook/episodes" \
  -H "Authorization: Bearer $LISTENHUB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "sources": [{"type": "text", "content": "Introduce Claude Code: what it is, key features, and how to get started"}],
    "speakers": [{"speakerId": "cozy-man-english"}],
    "language": "en",
    "mode": "info"
  }'
```

Poll until text is ready, then generate video if requested.
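
The follow-up video request for this example uses the `POST /storybook/episodes/{episodeId}/video` endpoint from Workflow step 5. The sketch below only shows how the URL is formed. The helper names are hypothetical, and whether the endpoint expects a request body is not stated in this document, so check `shared/api-storybook.md` before relying on it.

```bash
API_BASE="https://api.marswave.ai/openapi/v1"

# Build the video endpoint for an episode (path taken from Workflow step 5).
video_url() {
  printf '%s/storybook/episodes/%s/video\n' "$API_BASE" "$1"
}

# Hypothetical wrapper. It is defined but not called here, so nothing is sent.
# Assumption: no request body is required; this is not confirmed by the source.
request_video() {
  curl -sS -X POST "$(video_url "$1")" \
    -H "Authorization: Bearer $LISTENHUB_API_KEY"
}

video_url "<id-from-step-1>"
```

After the request is accepted, poll the episode for `videoStatus` as described in step 5.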
