ai-video-script-sop-remotion-diffusion

Standard operating procedure for automated AI video production using a Remotion (code) and diffusion (model) hybrid pipeline. Covers narrative DNA (hero, show-don’t-tell, three-act arc), technical specs (duration, integer segment lengths, resolution, fps, Mandarin pacing), tech-selection matrix (diffusion vs code), a five-part diffusion prompt protocol (style, micro-timing, entities, camera, transitions), end-to-end execution workflow, and a fixed output template (metadata table + per-shot table). Complements create-video and Remotion best-practice skills for execution quality.

1,172 stars

by inclusionAI

Best use case

ai-video-script-sop-remotion-diffusion is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using ai-video-script-sop-remotion-diffusion should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/video_script_writting/SKILL.md --create-dirs "https://raw.githubusercontent.com/inclusionAI/AWorld/main/aworld-skills/video_script_writting/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/video_script_writting/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How ai-video-script-sop-remotion-diffusion Compares

| Feature / Agent | ai-video-script-sop-remotion-diffusion | Standard Approach |
| :--- | :--- | :--- |
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

AI Agent for YouTube Script Writing

Find AI agent skills for YouTube script writing, video research, content outlining, and repeatable channel production workflows.

AI Agents for Coding

Browse AI agent skills for coding, debugging, testing, refactoring, code review, and developer workflows across Claude, Cursor, and Codex.

Best AI Skills for Claude

Explore the best AI skills for Claude and Claude Code across coding, research, workflow automation, documentation, and agent operations.

SKILL.md Source

## 1. Core narrative rules (narrative DNA)

To keep the video engaging and satisfying to watch, the script should follow these rules:

*   **Single hero**: One core character drives the story through **action** that solves the problem.
*   **Show, don’t tell**: No inner monologue; emphasize **what happens on screen**.
*   **Three-part arc**:
    1.  **Opening (hook)**: A clear, seemingly impossible big task.
    2.  **Middle (grind)**: Dense, fast execution (cathartic, orderly).
    3.  **Ending (payoff)**: A strong visual reward.
*   **Radical brevity**: Voice and subtitles stay **1:1**; lines only **announce** or **briefly react**—let the pictures carry meaning.

## 2. Technical specs and limits

*   **Total length**: $1\ \text{min}$–$3\ \text{min}$.
*   **Segment length**: Must be a whole number of seconds; round fractional lengths up (e.g. $4.5\text{s} \rightarrow 5\text{s}$). Diffusion clips are capped at **$10\text{s}$** per segment.
*   **Resolution**: $1080\text{p}$ or $720\text{p}$.
*   **Frame rate**: $24\text{fps}$ or $30\text{fps}$.
*   **Mandarin VO baseline**: Plan copy at about **4–5 characters per second** (the duration math in Section 5 uses $4.5$).
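
The limits above can be checked mechanically before any rendering starts. The following is a minimal sketch, not part of the original skill: the `Segment` class, the `check_specs` function, and the `"diffusion"` / `"code"` technique labels are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

# Limits copied from the spec list above.
ALLOWED_HEIGHTS = {720, 1080}
ALLOWED_FPS = {24, 30}
MAX_DIFFUSION_SECONDS = 10
MIN_TOTAL_SECONDS, MAX_TOTAL_SECONDS = 60, 180


@dataclass(frozen=True)
class Segment:
    seconds: float   # planned duration of the segment
    technique: str   # "diffusion" or "code" (hypothetical labels)


def check_specs(segments, height, fps):
    """Return a list of violations; an empty list means the plan passes."""
    problems = []
    if height not in ALLOWED_HEIGHTS:
        problems.append(f"resolution {height}p is not 720p or 1080p")
    if fps not in ALLOWED_FPS:
        problems.append(f"frame rate {fps} is not 24 or 30")
    for i, seg in enumerate(segments, start=1):
        if seg.seconds != int(seg.seconds):
            problems.append(f"segment {i}: {seg.seconds}s is not an integer")
        if seg.technique == "diffusion" and seg.seconds > MAX_DIFFUSION_SECONDS:
            problems.append(f"segment {i}: diffusion clip exceeds {MAX_DIFFUSION_SECONDS}s")
    total = sum(seg.seconds for seg in segments)
    if not MIN_TOTAL_SECONDS <= total <= MAX_TOTAL_SECONDS:
        problems.append(f"total {total}s is outside 60-180s")
    return problems
```

Returning every violation at once, rather than stopping at the first, lets a script author fix the whole plan in one pass.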

## 3. Shot tech-selection matrix

| Need | Recommended tech | Why | Avoid / watch out for |
| :--- | :--- | :--- | :--- |
| **Photoreal / complex lighting** | **Diffusion (video)** | Texture, mood, physics, transitions. | On-screen **text or charts** in the same shot; don’t mix code and diffusion **in one shot**. |
| **Character close-up / background change** | **Diffusion (I2V)** | Image-to-video keeps continuity. | Control **physical camera motion** strictly. |
| **Cartoon / vector motion** | **Code (SVG/TSX)** | Clean edges, flat look, precise paths. | Hard to express rich texture. |
| **Info / formulas / charts** | **Code (HTML/Remotion)** | Exact typography, math, data. | Don’t use for photoreal landscapes. |
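
As a rough illustration, the matrix can be encoded as a lookup so that each shot's technique is chosen the same way every time. The `Need` enum, `RECOMMENDED` mapping, and `pick_tech` function are hypothetical names; the `needs_exact_text` flag is one assumed way to enforce the rule that text and charts stay out of diffusion shots.

```python
from enum import Enum


class Need(Enum):
    PHOTOREAL = "photoreal / complex lighting"
    CHARACTER_OR_BACKGROUND = "character close-up / background change"
    CARTOON_VECTOR = "cartoon / vector motion"
    INFO_GRAPHICS = "info / formulas / charts"


# One entry per row of the matrix above.
RECOMMENDED = {
    Need.PHOTOREAL: "Diffusion (video)",
    Need.CHARACTER_OR_BACKGROUND: "Diffusion (I2V)",
    Need.CARTOON_VECTOR: "Code (SVG/TSX)",
    Need.INFO_GRAPHICS: "Code (HTML/Remotion)",
}


def pick_tech(need, needs_exact_text=False):
    """Return the recommended technique, refusing text-bearing diffusion shots."""
    tech = RECOMMENDED[need]
    if needs_exact_text and tech.startswith("Diffusion"):
        # The matrix says not to mix code and diffusion in one shot.
        raise ValueError("split the shot: text or charts belong in a code shot")
    return tech
```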

---

## 4. Diffusion prompt protocol

**This is what keeps visuals high quality and coherent.** Every diffusion shot description should combine **five parts**:

$$ \text{Prompt} = \text{[Style anchor]} + \text{[Micro-timeline]} + \text{[Concrete entities]} + \text{[Camera physics]} + \text{[Physical bridge]} $$

### A. Style anchors

*   **Force consistency**: Start every shot with the **same style phrase**, e.g. `【Impressionist oil painting】` or `【Cyberpunk photoreal】`.
*   **Push intensity**: Use vivid, extreme wording; do not settle for bland descriptions.
    *   *Weak:* “sunflowers”
    *   *Strong:* “**Van Gogh sunflowers as extremely thick, rough impasto in blazing yellow**”

### B. Micro-timing

*   **Avoid evenly paced mush**: State what happens in **each time window** of the shot.
    *   *Pattern:* `【0–2s】action A, 【2–10s】action B`.
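
A small helper can produce the pattern above and catch gaps or overlaps between windows. This is a sketch under assumptions: `format_timeline` is a hypothetical name, and the requirement that windows cover the whole shot without gaps is an added check, not something the skill states.

```python
def format_timeline(beats, total_seconds):
    """Format (start, end, action) beats as a micro-timeline string.

    Raises ValueError unless the windows run back to back from 0 to total_seconds.
    """
    cursor = 0
    parts = []
    for start, end, action in beats:
        if start != cursor:
            raise ValueError(f"gap or overlap: expected a window starting at {cursor}s, got {start}s")
        if end <= start:
            raise ValueError(f"empty window from {start}s to {end}s")
        parts.append(f"【{start}–{end}s】{action}")
        cursor = end
    if cursor != total_seconds:
        raise ValueError(f"timeline ends at {cursor}s but the shot lasts {total_seconds}s")
    return ", ".join(parts)
```

For example, `format_timeline([(0, 2, "action A"), (2, 10, "action B")], 10)` reproduces the pattern string shown above.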

### C. Concrete entities

*   **Make everything physical**: Turn abstractions into **objects**. Models don’t understand metaphor alone.
    *   *Weak:* “falling into despair”
    *   *Strong:* “**the floor collapses underfoot into a bottomless pit of black tar**”

### D. Camera physics

*   **Lock direction**: Name the move explicitly: push in, pull back, or pan.
*   **Keep inertia**: If the last shot **pushed in**, this shot must **continue** pushing in—random moves cause visual whiplash.
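
The inertia rule can be linted across a storyboard. The sketch below assumes each shot carries a single camera-move label and treats any change of label between consecutive shots as a break worth reviewing; `inertia_breaks` is a hypothetical helper, and a real pipeline might allow some changes deliberately.

```python
def inertia_breaks(moves):
    """Return 1-based shot numbers whose camera move differs from the previous shot's."""
    return [i + 1 for i in range(1, len(moves)) if moves[i] != moves[i - 1]]


# Shot 3 switches from pushing in to panning, so it is flagged for review.
flagged = inertia_breaks(["push in", "push in", "pan"])
```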

### E. Physical transitions

*   **Input dependency**: Say explicitly: “this shot is generated from the **last frame of the previous shot**.”
*   **No pop in/out**: Nothing appears or vanishes without a visible process.
    *   *Weak:* “the house disappears”
    *   *Strong:* “**the house crumbles from the roof into golden sand blown away by wind**”
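
The five parts can be assembled by a small builder that refuses incomplete prompts. This is an illustrative sketch only: `DiffusionShot`, `build_prompt`, and the `cold_open` flag are hypothetical, and letting a cold open skip the bridge is an assumption based on the first shot having no previous frame.

```python
from dataclasses import dataclass

# (label, attribute) pairs in the order of the prompt formula.
FIELDS = (
    ("Style", "style_anchor"),
    ("Time", "timeline"),
    ("Entity", "entities"),
    ("Camera", "camera"),
    ("Bridge", "bridge"),
)


@dataclass
class DiffusionShot:
    style_anchor: str
    timeline: str
    entities: str
    camera: str
    bridge: str = ""  # physical transition from the previous shot


def build_prompt(shot, cold_open=False):
    """Join the five labeled parts, one per line; raise if a required part is empty."""
    lines = []
    for label, attr in FIELDS:
        value = getattr(shot, attr).strip()
        if not value:
            if attr == "bridge" and cold_open:
                continue  # assumption: the opening shot has nothing to bridge from
            raise ValueError(f"missing prompt part: {label}")
        lines.append(f"[{label}] {value}")
    return "\n".join(lines)
```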

---

## 5. Execution workflow

1.  **Storyboard**: Lock the story, split into $N$ shots.
2.  **Duration math**:
    *   Write lines $\rightarrow$ count characters $\rightarrow$ divide by the speech rate ($4.5$ characters per second) $\rightarrow$ **round up** to the integer duration $T$.
    *   *Check:* $T \le 10\text{s}$ for diffusion segments.
3.  **Continuity**:
    *   For each shot, define **start frame** and **end frame** sources.
    *   *Strategy A (Diff $\rightarrow$ Diff)*: previous **end frame** = next **start frame** (I2V).
    *   *Strategy B (Code $\rightarrow$ Diff)*: last **code frame export** = first **diffusion** frame.
4.  **Asset build**:
    *   Render all **silent** video segments.
    *   Generate matching **TTS** and **SRT**.
    *   **Verify:** $\sum(\text{segment durations}) = \text{total audio duration}$.
5.  **Final mux**: Remotion combines video, audio, and subtitle layers into MP4.
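
The duration math in step 2 and the verification in step 4 can be sketched as follows. The function names are hypothetical, and two details are assumptions rather than rules from the skill: punctuation and whitespace are not counted as spoken characters, and the sum check allows a small tolerance because TTS audio rarely lands on an exact integer.

```python
import math

SPEECH_RATE = 4.5            # Mandarin characters per second (step 2)
MAX_DIFFUSION_SECONDS = 10   # diffusion cap per segment


def count_spoken_chars(line):
    """Count voiced characters; skipping punctuation and spaces is an assumption."""
    return sum(1 for ch in line if ch.isalnum())


def shot_duration(line, technique):
    """Round characters / speech rate up to a whole number of seconds (minimum 1)."""
    seconds = max(1, math.ceil(count_spoken_chars(line) / SPEECH_RATE))
    if technique == "diffusion" and seconds > MAX_DIFFUSION_SECONDS:
        raise ValueError(f"{seconds}s exceeds the diffusion cap; split the line or the shot")
    return seconds


def durations_match(segment_seconds, audio_seconds, tolerance=0.05):
    """Step 4 check: summed segment lengths equal the audio length within a tolerance."""
    return abs(sum(segment_seconds) - audio_seconds) <= tolerance
```

An 11-character line gives $\lceil 11 / 4.5 \rceil = 3$ seconds. If `durations_match` fails, the audio typically needs padding or trimming so that each line fills its integer-length segment.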

---

## 6. Standard script output template

When writing a script, use this structure.

### Video basics

*   **Theme**: [e.g. a developer sorting a mountain of messy code]
*   **Estimated total length**: $[xx]\ \text{s}$
*   **Resolution**: $1920 \times 1080$ ($1080\text{p}$)
*   **Style keywords**: [e.g. minimal, low-poly, cool palette]

### Shot execution table

| Shot ID | Duration (s) | Technique | Visual & diffusion prompt / code logic | Audio (VO + subtitles) | Transition strategy |
| :--- | :--- | :--- | :--- | :--- | :--- |
| **01** | 5 | **Diffusion** (T2V) | **[Style]** … <br> **[Time]** 【0–2s】… <br> **[Entity]** … <br> **[Camera]** … | “This is everything that piled up this week.” | **Cold open**: text-only generation; no prior frame. |
| **02** | 8 | **Code** (React/SVG) | **UI**: giant red progress bar SVG.<br> **Motion**: numbers jump 0%→99%; warning icon blinks. | “The system is on the edge.” | **Hard cut**: clean code look vs previous chaos. |
| **03** | 6 | **Diffusion** (I2V) | **[Style]** …<br> **[Bridge]** Start from the red warning; red **liquifies** into flowing lava… | “We must cool it down now.” | **I2V**: **Shot 02 last frame** → **Shot 03 first frame**. |
| … | … | … | … | … | … |
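
For agents that generate the table programmatically, the rows and the transition column can be derived from a list of shots. This is a sketch under assumptions: `Shot`, `transition_for`, and `render_table` are hypothetical names, the transition wording mirrors the continuity strategies in Section 5, and defaulting every code shot to a hard cut is a simplification of the example rows.

```python
from dataclasses import dataclass

HEADER = ("| Shot ID | Duration (s) | Technique | Visual & diffusion prompt / code logic "
          "| Audio (VO + subtitles) | Transition strategy |")
RULE = "| :--- | :--- | :--- | :--- | :--- | :--- |"


@dataclass
class Shot:
    shot_id: str
    seconds: int
    technique: str   # "diffusion" or "code" (hypothetical labels)
    visual: str
    voiceover: str


def transition_for(prev, cur):
    """Pick a transition note from the techniques of the previous and current shot."""
    if prev is None:
        return "Cold open: no prior frame"
    if cur.technique == "diffusion":
        # Strategy A (diffusion to diffusion) or Strategy B (code to diffusion).
        source = "end frame" if prev.technique == "diffusion" else "last code frame export"
        return f"I2V: Shot {prev.shot_id} {source} → Shot {cur.shot_id} first frame"
    return "Hard cut"  # assumption: code shots start clean


def render_table(shots):
    """Render the shot list as a Markdown table matching the template columns."""
    rows = [HEADER, RULE]
    prev = None
    for shot in shots:
        rows.append(
            f"| **{shot.shot_id}** | {shot.seconds} | {shot.technique} | {shot.visual} "
            f"| “{shot.voiceover}” | {transition_for(prev, shot)} |"
        )
        prev = shot
    return "\n".join(rows)
```

Cell text containing a literal `|` would break the table, so such characters need escaping before rendering.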

---

## Document metadata

| Field | Value |
|-------|-------|
| Source | `script_skill.md` (Chinese) |
| Last updated | 2026-03-30 |

Related Skills

video-subtitles-and-audio-insert-workflow

1172

from inclusionAI/AWorld

Burn hard subtitles from UTF-8 SRT files using moviepy 2.x with CJK-capable system fonts; tune font size, placement, stroke, and encode settings (bitrate or CRF) to avoid oversized outputs. Documents ffprobe/ffmpeg workflows for inspection, encoding, and batch jobs; troubleshooting for fonts, bitrate, and pacing. Covers voiceover with edge-tts (voice selection, rate/volume/pitch), matching narration length to video with atempo/apad, and multi-scene pacing with breathing room. Targets moviepy 2.x and Python 3.x on macOS, Linux, and Windows.

video-storytelling-core-principles

1172

from inclusionAI/AWorld

Core storytelling rules for AI video scripts: concrete metaphors instead of abstract jargon, the mute test (story reads without audio), visual contrast and closure, physically visible causes of failure or success, visualizing the “eureka” beat, camera motion tied to physics, in-scene transitions instead of black cuts, character consistency and multi-speaker action/lip sync timelines, and three-act pacing with Mandarin VO speed (~4.5 chars/s) and breathing room for action and SFX.

embedded-video-pip-smooth-playback

1172

from inclusionAI/AWorld

Prevent stutter and frozen frames when embedding a child video inside a parent in code-driven pipelines (Remotion, After Effects scripting, FFmpeg filter graphs). Explains why sparse keyframes break frame-accurate seek during per-frame export, and how re-encoding with H.264 all-intra GOP (-g 1) and yuv420p makes every frame independently decodable. Includes FFmpeg command, parameter notes, file-size tradeoffs, and a reusable rule for any seek-heavy programmatic video workflow.

xhs-scraper

1172

from inclusionAI/AWorld

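Xiaohongshu search scraping skill: scrapes Xiaohongshu search results through agent-browser (CDP), with support for list plus detail pages and multiple output formats. Use cases: scraping note lists and full text by keyword, and generating RSS/JSON/Markdown.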

xhs-publisher

1172

from inclusionAI/AWorld

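Xiaohongshu publishing skill: automatically publishes Xiaohongshu image-and-text notes through agent-browser (CDP), with support for multi-image upload, filling in the title and body, and one-click publishing. Use case: automating the publishing of image-and-text notes to the Xiaohongshu creator center.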

read large webpage or knowledge

1172

from inclusionAI/AWorld

This skill is used for segmented reading and organization when facing large-scale knowledge bases or web pages. It captures original content segment by segment, summarizes key points in real-time, and continuously deposits them into the knowledge base, ensuring orderly information ingestion, clear structure, and traceability.

text2agent

1172

from inclusionAI/AWorld

Creates new agents from user requirements by generating Python implementation and mcp_config.

optimizer

1172

from inclusionAI/AWorld

Analyzes and automatically optimizes existing agents by improving system prompts and tool configuration.

media_comprehension

1172

from inclusionAI/AWorld

An intelligent assistant specialized in handling media files (images/audio/video). **Only for media file analysis**, does not handle document types.

✅ Media files that can be processed:
- Images: .jpg, .jpeg, .png, .gif, .bmp, .webp, .svg
- Audio: .mp3, .wav, .m4a, .flac, .aac, .ogg
- Video: .mp4, .avi, .mov, .mkv, .webm, .flv

❌ Files that cannot be processed (please do not trigger this skill):
- Documents: .pdf, .doc, .docx, .txt, .md, .rtf
- Spreadsheets: .xlsx, .xls, .csv, .tsv
- Presentations: .pptx, .ppt, .key
- Code: .py, .js, .ts, .java, .cpp, .go, .rs
- Archives: .zip, .tar, .gz, .rar, .7z
- Executables: .exe, .bin, .app, .dmg
- Databases: .db, .sqlite, .sql
- Configuration files: .json, .xml, .yaml, .yml, .toml, .ini
- Web pages: .html, .htm, .css

**Trigger conditions**: When the user explicitly requests to analyze image/audio/video content, or when the file extension belongs to the aforementioned media types.

last_7_days_news

1172

from inclusionAI/AWorld

Search and summarize the latest 7 days of AI news and X discussions using public sources plus browser-based X collection. Use for recent AI news, trends, X discussions, industry briefs, and summaries organized into hot topics, viewpoints, and opportunity areas.

app_evaluator

1172

from inclusionAI/AWorld

A professional skill for App Evaluation (evaluating app's performance with score) and App Improvement (giving professional suggestions for improving the app's performance).

agent-browser

1172

from inclusionAI/AWorld

Automates browser interactions for web testing, form filling, screenshots, and data extraction. Use when the user needs to navigate websites, interact with web pages, fill forms, take screenshots, test web applications, or extract information from web pages.