video-storytelling-core-principles

Core storytelling rules for AI video scripts: concrete metaphors instead of abstract jargon, the mute test (story reads without audio), visual contrast and closure, physically visible causes of failure or success, visualizing the “eureka” beat, camera motion tied to physics, in-scene transitions instead of black cuts, character consistency and multi-speaker action/lip sync timelines, and three-act pacing with Mandarin VO speed (~4.5 chars/s) and breathing room for action and SFX.

1,172 stars

Best use case

video-storytelling-core-principles is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using video-storytelling-core-principles should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/video_script_review/SKILL.md --create-dirs "https://raw.githubusercontent.com/inclusionAI/AWorld/main/aworld-skills/video_script_review/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/video_script_review/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How video-storytelling-core-principles Compares

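| Feature / Agent | video-storytelling-core-principles | Standard Approach |
|-----------------|------------------------------------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |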

Frequently Asked Questions

What does this skill do?

Core storytelling rules for AI video scripts: concrete metaphors instead of abstract jargon, the mute test (story reads without audio), visual contrast and closure, physically visible causes of failure or success, visualizing the “eureka” beat, camera motion tied to physics, in-scene transitions instead of black cuts, character consistency and multi-speaker action/lip sync timelines, and three-act pacing with Mandarin VO speed (~4.5 chars/s) and breathing room for action and SFX.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

## 1. The “smoke and fire” rule (make it concrete)

*   **Reject vague grandeur**: Do **not** build visuals around abstract buzzwords like “system,” “architecture,” “underlying logic,” or “empowerment.”
*   **Use lived-in detail**: Map ideas to warm, everyday scenes—e.g. multi-agent teamwork as **tiny kitchen sprites dividing cake work**; single-model limits as **one person juggling chores until everything breaks**.
*   **Stress texture**: Prompts must highlight **physical material** (flour dust, cream sheen, strawberry color, oven glow) for “food appeal” or **satisfying, tactile** life moments.

## 2. The “mute” test

*   **Picture = story**: If you **mute** narration and dialogue, the audience should still follow setup, turn, and payoff. Voiceover **annotates** the image—images must not become a **slide deck for the VO**.
*   **Visual loop and contrast**:
  *   Problem and solution should read through **visual contrast**, not explanation-only VO.
  *   **Weak**: Scene 1 VO “solo is exhausting,” girl sighs; Scene 4 VO “teamwork is easy,” girl smiles—same vague staging.
  *   **Strong**: Scene 1—girl whisks with one hand and struggles to pour flour with the other → **flour explosion** (solo pain). Scene 4—friend takes whisking; girl can **sift flour calmly** (team gain). **Action contrast** closes the loop.

## 3. Physical logic and action breakdown

*   **Visible failure and success**:
  *   Conflict and outcome cannot be a vague label—they must split into **visible physical causes**.
  *   **Weak**: “She failed at the cake and got flour on her face.” (Why the face?)
  *   **Strong**: “Left hand whisks, right hand strains holding the flour bag, recipe in teeth. A sneeze drops the paper; hands slip; the bag tips and **flour blasts upward into her face**.” Tight chain, self-consistent motion.
*   **Visual bridge for the “aha”**:
  *   You cannot jump from “stuck” to “solution” without a **seen** link.
  *   If inspiration comes from watching something (e.g. ants), show **face change** (eyes widen, smile) and **physical action** (grabs a crayon)—then the next beat (drawing a plan) feels earned.

## 4. Camera narrative and transitions

*   **Motion serves physics**:
  *   Push, pull, pan, tilt must have a **story reason**—not motion for its own sake.
  *   Example: for a **sudden flour burst** hitting the face, keep the camera **locked (wide)** so the flour **flies toward the lens**; don’t chase the particles with the camera, which dilutes the impact.
*   **In-scene transitions (no arbitrary black)**:
  *   Avoid lazy black frames or random hard cuts. Bridge shots with **moving or glowing elements already in frame**.
  *   **Example A (occlusion)**: A flour speck flies straight at a **fixed** camera, whites out the frame; white clears into a **micro world** shot.
  *   **Example B (light)**: An icon on paper **glows**; glow expands to fill screen; fade reveals a real kitchen counter.

## 5. Character consistency and sync

*   **Start/end consistency**: In diffusion video prompts, require a **stable identity** from the first frame to the last: same character, outfit, and general framing, **unless** there is a deliberate occlusion or scene reset. One way to enforce this is an explicit style line such as “same person, same clothes, same hairstyle as previous shot.”
*   **Multi-character action sync**: When several characters speak, the script must include a **beat timeline** (e.g. `[0–2s] A speaks`, `[2–5s] B speaks`) and specify **gesture size** and **mouth movement** in each window so picture matches dubbing.
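
The beat timeline above can be checked mechanically before it goes into a prompt. The following is a minimal sketch, assuming each beat is written on its own line in the `[0–2s] A speaks` form shown above; the names `Beat`, `parse_beats`, and `find_problems` are hypothetical and not part of the skill. It only checks timing windows, so gesture size and mouth movement still have to be written by hand for each window.

```python
import re
from typing import NamedTuple


class Beat(NamedTuple):
    start: float   # window start, in seconds
    end: float     # window end, in seconds
    speaker: str   # speaker label, e.g. "A"


# Matches "[0–2s] A speaks"; accepts an en dash or a plain hyphen.
BEAT_RE = re.compile(
    r"\[(\d+(?:\.\d+)?)\s*[–-]\s*(\d+(?:\.\d+)?)s\]\s*(\S+)\s+speaks"
)


def parse_beats(lines):
    """Turn timeline lines into Beat records, failing loudly on bad lines."""
    beats = []
    for line in lines:
        match = BEAT_RE.search(line)
        if match is None:
            raise ValueError(f"not a beat line: {line!r}")
        beats.append(
            Beat(float(match.group(1)), float(match.group(2)), match.group(3))
        )
    return beats


def find_problems(beats):
    """Report empty windows and windows where two speakers overlap."""
    problems = []
    ordered = sorted(beats)  # sorts by start time first
    for beat in ordered:
        if beat.end <= beat.start:
            problems.append(
                f"empty window for {beat.speaker}: {beat.start}-{beat.end}s"
            )
    for prev, cur in zip(ordered, ordered[1:]):
        if cur.start < prev.end:
            problems.append(
                f"{prev.speaker} and {cur.speaker} overlap at {cur.start}s"
            )
    return problems
```

For example, `find_problems(parse_beats(["[0–2s] A speaks", "[2–5s] B speaks"]))` returns an empty list, while two windows such as `[0-3s]` and `[2-5s]` are reported as an overlap.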

## 6. Structure and pacing

*   **Compact three acts** (example for ~60s):
  *   **Opening (0–10s)**: Pain point—impossible task or chaos.
  *   **Middle (10–40s)**: Turning point (observe nature, code sketch, etc.)—dense execution and teamwork.
  *   **Ending (40–60s)**: Payoff—result, catharsis, theme lift.
*   **VO speed and breathing room**: Keep Mandarin VO at or below **~4.5 characters per second**. **Cut copy** before you cram frames—leave time for **physical acting**, **SFX**, and the viewer’s eye to rest.
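
The pacing rule above reduces to simple arithmetic: a window of N seconds allows at most 4.5 × N Mandarin characters, and fewer once silent time is reserved for action and SFX. The sketch below is one way to check a draft against that ceiling. The function names and the `silent_seconds` parameter are hypothetical, and counting only CJK ideographs (ignoring punctuation, digits, and Latin letters) is a simplifying assumption rather than something the skill specifies.

```python
import math

MAX_CHARS_PER_SECOND = 4.5  # ceiling stated in the pacing rule


def max_vo_chars(window_seconds, silent_seconds=0.0):
    """Largest VO length (in characters) that fits the window.

    silent_seconds is time reserved for acting and SFX (an assumed knob).
    """
    speaking_seconds = max(window_seconds - silent_seconds, 0.0)
    return math.floor(speaking_seconds * MAX_CHARS_PER_SECOND)


def count_spoken_chars(text):
    """Count CJK ideographs only; punctuation and spaces are skipped."""
    return sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")


def check_window(vo_text, start, end, silent_seconds=0.0):
    """Return (fits, used, allowed) for one act or shot."""
    used = count_spoken_chars(vo_text)
    allowed = max_vo_chars(end - start, silent_seconds)
    return used <= allowed, used, allowed
```

With the ~60s example, the 0–10s opening allows at most 45 characters of VO, or 31 if 3 seconds are kept silent for the physical gag. When a draft exceeds `allowed`, the rule says to cut copy rather than speed up the read.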

---

## Document metadata

| Field | Value |
|-------|-------|
| Source | `story_skill.md` (Chinese); section 5 completed where the original had placeholders |
| Last updated | 2026-03-30 |

Related Skills

video-subtitles-and-audio-insert-workflow

1172
from inclusionAI/AWorld

Burn hard subtitles from UTF-8 SRT files using moviepy 2.x with CJK-capable system fonts; tune font size, placement, stroke, and encode settings (bitrate or CRF) to avoid oversized outputs. Documents ffprobe/ffmpeg workflows for inspection, encoding, and batch jobs; troubleshooting for fonts, bitrate, and pacing. Covers voiceover with edge-tts (voice selection, rate/volume/pitch), matching narration length to video with atempo/apad, and multi-scene pacing with breathing room. Targets moviepy 2.x and Python 3.x on macOS, Linux, and Windows.

ai-video-script-sop-remotion-diffusion

1172
from inclusionAI/AWorld

Standard operating procedure for automated AI video production using a Remotion (code) and diffusion (model) hybrid pipeline. Covers narrative DNA (hero, show-don’t-tell, three-act arc), technical specs (duration, integer segment lengths, resolution, fps, Mandarin pacing), tech-selection matrix (diffusion vs code), a five-part diffusion prompt protocol (style, micro-timing, entities, camera, transitions), end-to-end execution workflow, and a fixed output template (metadata table + per-shot table). Complements create-video and Remotion best-practice skills for execution quality.

embedded-video-pip-smooth-playback

1172
from inclusionAI/AWorld

Prevent stutter and frozen frames when embedding a child video inside a parent in code-driven pipelines (Remotion, After Effects scripting, FFmpeg filter graphs). Explains why sparse keyframes break frame-accurate seek during per-frame export, and how re-encoding with H.264 all-intra GOP (-g 1) and yuv420p makes every frame independently decodable. Includes FFmpeg command, parameter notes, file-size tradeoffs, and a reusable rule for any seek-heavy programmatic video workflow.

xhs-scraper

1172
from inclusionAI/AWorld

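Xiaohongshu search scraping skill: scrapes Xiaohongshu search results through agent-browser (CDP), with support for list plus detail views and multiple output formats. Use cases: scraping note lists and body text by keyword, and generating RSS/JSON/Markdown.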

xhs-publisher

1172
from inclusionAI/AWorld

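Xiaohongshu publishing skill: automatically publishes Xiaohongshu image-and-text notes through agent-browser (CDP), with support for multi-image upload, filling in the title and body, and one-click publishing. Use case: automated publishing of image-and-text notes to the Xiaohongshu Creator Center.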

read large webpage or knowledge

1172
from inclusionAI/AWorld

This skill is used for segmented reading and organization when facing large-scale knowledge bases or web pages. It captures original content segment by segment, summarizes key points in real-time, and continuously deposits them into the knowledge base, ensuring orderly information ingestion, clear structure, and traceability.

text2agent

1172
from inclusionAI/AWorld

Creates new agents from user requirements by generating Python implementation and mcp_config.

optimizer

1172
from inclusionAI/AWorld

Analyzes and automatically optimizes existing agents by improving system prompts and tool configuration.

media_comprehension

1172
from inclusionAI/AWorld

An intelligent assistant specialized in handling media files (images/audio/video). **Only for media file analysis**, does not handle document types.

✅ Media files that can be processed:
- Images: .jpg, .jpeg, .png, .gif, .bmp, .webp, .svg
- Audio: .mp3, .wav, .m4a, .flac, .aac, .ogg
- Video: .mp4, .avi, .mov, .mkv, .webm, .flv

❌ Files that cannot be processed (please do not trigger this skill):
- Documents: .pdf, .doc, .docx, .txt, .md, .rtf
- Spreadsheets: .xlsx, .xls, .csv, .tsv
- Presentations: .pptx, .ppt, .key
- Code: .py, .js, .ts, .java, .cpp, .go, .rs
- Archives: .zip, .tar, .gz, .rar, .7z
- Executables: .exe, .bin, .app, .dmg
- Databases: .db, .sqlite, .sql
- Configuration files: .json, .xml, .yaml, .yml, .toml, .ini
- Web pages: .html, .htm, .css

**Trigger conditions**: When the user explicitly requests to analyze image/audio/video content, or when the file extension belongs to the aforementioned media types.

last_7_days_news

1172
from inclusionAI/AWorld

Search and summarize the latest 7 days of AI news and X discussions using public sources plus browser-based X collection. Use for recent AI news, trends, X discussions, industry briefs, and summaries organized into hot topics, viewpoints, and opportunity areas.

app_evaluator

1172
from inclusionAI/AWorld

A professional skill for App Evaluation (scoring an app's performance) and App Improvement (giving professional suggestions for improving the app's performance).

agent-browser

1172
from inclusionAI/AWorld

Automates browser interactions for web testing, form filling, screenshots, and data extraction. Use when the user needs to navigate websites, interact with web pages, fill forms, take screenshots, test web applications, or extract information from web pages.