av-sync-workflow

Audio-to-video synchronization workflow: analyze audio (beats, tempo, emotion, mood), find video clips that match the scene and feeling, sync cuts to the music's beats, and generate beat-marked videos. Use when the user wants to: (1) turn a song into a music video, (2) sync video clips to music beats, (3) create a video that matches the audio's mood, scene, and rhythm, or (4) do beat-matched video editing. Triggers: "制作音乐视频" (make a music video), "音频转视频" (audio to video), "beat matching", "卡点视频" (beat-cut video), "音视频同步" (audio-video sync), "视频踩点" (cut video on the beat), "music video creation", "sync video to audio"

by aAAaqwq

Best use case

av-sync-workflow is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using av-sync-workflow should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/av-sync-workflow/SKILL.md --create-dirs "https://raw.githubusercontent.com/aAAaqwq/AGI-Super-Team/main/skills/av-sync-workflow/SKILL.md"

Manual Installation

1. Download SKILL.md from GitHub.
2. Place it in .claude/skills/av-sync-workflow/SKILL.md inside your project.
3. Restart your AI agent; it will auto-discover the skill.

How av-sync-workflow Compares

Feature / Agent	av-sync-workflow	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

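What does this skill do?

It analyzes a song's beats, tempo, and mood, matches video clips to the audio's scene and feeling, and assembles a video whose cuts are synchronized to the beats.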

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# AV-Sync Workflow

Transform audio into a professionally edited video synchronized to beats, mood, and scene.

## Workflow Overview

```
Audio → Analysis → Clip Matching → Beat Sync → Video Assembly → Export
```

## Step 1: Analyze Audio

Use `scripts/audio_analysis.py` to extract:
- **Beats/BPM**: Timestamp of each beat, overall tempo (BPM)
- **Sections**: Verse, chorus, bridge, outro markers
- **Emotion/Mood**: Energy level, valence (happy/sad), tempo category
- **Key moments**: High-impact points (drops, climaxes, transitions)

```bash
python3 scripts/audio_analysis.py /path/to/song.mp3 --output /tmp/analysis.json
```

Output structure:
```json
{
  "bpm": 120,
  "duration": 214,
  "beats": [0.0, 0.5, 1.0, ...],
  "sections": [
    {"type": "intro", "start": 0, "end": 15},
    {"type": "verse", "start": 15, "end": 45},
    {"type": "chorus", "start": 45, "end": 75}
  ],
  "mood": {"energy": 0.7, "valence": 0.6, "danceability": 0.8},
  "key_moments": [
    {"time": 45.0, "type": "chorus_drop", "intensity": 1.0}
  ]
}
```
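As a rough illustration of how later steps might consume this structure, the sketch below groups beat timestamps by section. The sample data mirrors the structure above with a shortened, made-up beat list, and `beats_by_section` is a hypothetical helper, not part of the skill's scripts.

```python
import json

# Sample analysis shaped like the output above; the values are illustrative only.
ANALYSIS_JSON = """
{
  "bpm": 120,
  "beats": [0.0, 0.5, 1.0, 15.0, 15.5, 45.0, 45.5],
  "sections": [
    {"type": "intro", "start": 0, "end": 15},
    {"type": "verse", "start": 15, "end": 45},
    {"type": "chorus", "start": 45, "end": 75}
  ]
}
"""


def beats_by_section(analysis):
    """Return (section type, beats in [start, end)) pairs (hypothetical helper)."""
    grouped = []
    for section in analysis["sections"]:
        in_range = [
            t for t in analysis["beats"]
            if section["start"] <= t < section["end"]
        ]
        grouped.append((section["type"], in_range))
    return grouped


analysis = json.loads(ANALYSIS_JSON)
for section_type, beats in beats_by_section(analysis):
    print(section_type, beats)
```

Treating each section as a half-open interval keeps a beat that falls exactly on a boundary (such as 45.0) in the section that starts there.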

## Step 2: Gather Video Clips

Use clips provided by the user, or search for stock footage:

**Stock footage sources:**
- Pexels: `https://www.pexels.com/search/videos/{query}/`
- Pixabay: `https://pixabay.com/videos/search/{query}/`
- Coverr: `https://coverr.co/search/{query}`

**Download stock video:**
```bash
# Via yt-dlp (for pexels/pixabay)
yt-dlp -f "best[height<=1080]" -o "/tmp/clip_%(id)s.%(ext)s" "https://pexels.com/video/12345"

# Via direct URL
ffmpeg -i "https://example.com/video.mp4" -c copy /tmp/clip.mp4
```
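The search URL templates listed under stock footage sources can be filled in programmatically. The sketch below is one way to do that; `build_search_urls` is a hypothetical helper, and percent-encoding the query is an assumption about what each site accepts.

```python
from urllib.parse import quote

# URL templates copied from the stock footage list above.
STOCK_SEARCH_URLS = {
    "pexels": "https://www.pexels.com/search/videos/{query}/",
    "pixabay": "https://pixabay.com/videos/search/{query}/",
    "coverr": "https://coverr.co/search/{query}",
}


def build_search_urls(query):
    """Fill each template with a percent-encoded query (hypothetical helper)."""
    encoded = quote(query)
    return {
        site: template.format(query=encoded)
        for site, template in STOCK_SEARCH_URLS.items()
    }


for site, url in build_search_urls("rainy city").items():
    print(site, url)
```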

## Step 3: Analyze Each Clip

For each clip, extract:
- Scene type (indoor/outdoor, city/nature, close-up/wide)
- Mood/style (energetic/calm, happy/sad)
- Duration and cut points
- Visual elements (faces, motion, colors)

```bash
python3 scripts/video_analysis.py /tmp/clip.mp4 --output /tmp/clip_analysis.json
```

## Step 4: Match Clips to Audio Sections

Algorithm: Map clips to audio sections based on:
1. **Emotion matching**: High-energy chorus → energetic clips
2. **Scene continuity**: Smooth transitions between scenes
3. **Beat alignment**: Cut on beats for rhythm
4. **Length fit**: Clip duration matches section duration

```bash
python3 scripts/match_clips.py \
  --audio-analysis /tmp/analysis.json \
  --clips /tmp/clip1.mp4,/tmp/clip2.mp4 \
  --clip-analyses /tmp/clip1_analysis.json,/tmp/clip2_analysis.json \
  --output /tmp/edit_plan.json
```
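The source does not show how `match_clips.py` scores candidates, so the following is only a minimal sketch of two of the criteria above (emotion matching and length fit) plus a simple no-repeat rule. The per-section `energy` field, the clip fields, and the 0.7/0.3 weights are all assumptions made for illustration.

```python
def match_score(section, clip):
    """Score a clip for a section in [0, 1]; higher is better (assumed weights)."""
    # Emotion matching: closer energy levels score higher.
    emotion = 1.0 - abs(section["energy"] - clip["energy"])
    # Length fit: clips at least as long as the section score 1.0.
    section_len = section["end"] - section["start"]
    length = min(clip["duration"] / section_len, 1.0)
    return 0.7 * emotion + 0.3 * length


def match_clips(sections, clips):
    """Greedily pick the best clip per section, avoiding the previous clip."""
    plan = []
    previous = None
    for section in sections:
        # Fall back to all clips if excluding the previous one leaves nothing.
        candidates = [c for c in clips if c["path"] != previous] or clips
        best = max(candidates, key=lambda c: match_score(section, c))
        plan.append({"section": section["type"], "clip": best["path"]})
        previous = best["path"]
    return plan


# Hypothetical inputs: per-section energy and clip fields are assumed.
sections = [
    {"type": "verse", "start": 15, "end": 45, "energy": 0.4},
    {"type": "chorus", "start": 45, "end": 75, "energy": 0.9},
]
clips = [
    {"path": "/tmp/clip1.mp4", "energy": 0.3, "duration": 40.0},
    {"path": "/tmp/clip2.mp4", "energy": 0.95, "duration": 20.0},
]
plan = match_clips(sections, clips)
print(plan)
```

A greedy pass is easy to reason about but does not optimize scene continuity across the whole song; a real matcher would likely need a global view of the edit.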

## Step 5: Generate Beat-Synced Video

```bash
python3 scripts/assemble_video.py \
  --edit-plan /tmp/edit_plan.json \
  --audio /path/to/song.mp3 \
  --output /tmp/final_video.mp4 \
  --format mp4 \
  --codec h264 \
  --quality high
```
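The edit plan format is not documented here, so this sketch assumes a plan is a list of cuts with hypothetical `clip`, `in`, and `out` fields. It shows one possible way to turn such a plan into an FFmpeg concat-demuxer list and a matching command line; it is not the actual behavior of `assemble_video.py`.

```python
def concat_list(edit_plan):
    """Render cuts as an FFmpeg concat-demuxer script (assumed plan format)."""
    lines = ["ffconcat version 1.0"]
    for cut in edit_plan:
        lines.append(f"file '{cut['clip']}'")
        lines.append(f"inpoint {cut['in']:.3f}")
        lines.append(f"outpoint {cut['out']:.3f}")
    return "\n".join(lines) + "\n"


def assemble_command(list_path, audio_path, output_path):
    """Build an ffmpeg invocation that joins the cuts and lays the song over them."""
    return [
        "ffmpeg",
        "-f", "concat", "-safe", "0", "-i", list_path,  # video cuts
        "-i", audio_path,                                 # music track
        "-map", "0:v", "-map", "1:a",                     # video from cuts, audio from song
        "-c:v", "libx264",                                # re-encode so cuts need not land on keyframes
        "-shortest",
        output_path,
    ]


# Hypothetical two-cut plan.
edit_plan = [
    {"clip": "/tmp/clip1.mp4", "in": 0.0, "out": 2.0},
    {"clip": "/tmp/clip2.mp4", "in": 1.5, "out": 3.5},
]
print(concat_list(edit_plan))
print(" ".join(assemble_command("/tmp/cuts.txt", "/path/to/song.mp3", "/tmp/final_video.mp4")))
```

Transitions and slow-motion are left out; they would need a filter graph rather than a plain concat list.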

## Reference Scripts

### `scripts/audio_analysis.py`

Analyzes audio file using librosa. Extracts:
- Beat timestamps (per-beat and bar-level)
- BPM
- Onset strength envelope
- Spectral features for mood
- librosa-beat-grid output option

### `scripts/video_analysis.py`

Analyzes video clip:
- Dominant colors / color mood
- Scene type classification (urban, nature, indoor, etc.)
- Motion level (static, moderate, high)
- Detected faces / people
- Suggested cut points (scene changes)

### `scripts/match_clips.py`

Intelligent clip-to-audio matching:
- Emotion/mood alignment scoring
- Scene variety scoring to avoid repetitive cuts
- Beat-synced cut point optimization
- Output: detailed edit decision list (EDL)

### `scripts/assemble_video.py`

Final video assembly:
- Apply cut points from edit plan
- Add smooth transitions (dissolve, fade)
- Add slow-motion on climactic beats
- Mix audio track
- Export at specified quality

## Beat-Sync Cut Points

For every beat in the audio, consider:
- **Strong beat (beat 1 of a bar)**: Major cut or transition
- **Weak beat (beats 2-4 of a bar)**: Minor cut or no cut
- **Off-beat**: Effect triggers (zoom, flash)

Standard cut cadence:
- 4-beat bars: Cut every 4 or 8 beats
- Chorus: Cut every 2 beats for high energy
- Outro: Gradual slowdown, fade
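The cadence rules above can be sketched in a few lines. This example assumes 4/4 time, that the first detected beat is a downbeat, and that sections start on bar boundaries; `classify_beats` and `cut_points` are hypothetical helpers.

```python
def classify_beats(beats, beats_per_bar=4):
    """Label each beat 'strong' (first beat of a bar) or 'weak'."""
    return [
        (t, "strong" if i % beats_per_bar == 0 else "weak")
        for i, t in enumerate(beats)
    ]


def cut_points(beats, sections, default_every=4, chorus_every=2):
    """Cut every `chorus_every` beats in a chorus, else every `default_every` beats."""
    cuts = []
    for section in sections:
        step = chorus_every if section["type"] == "chorus" else default_every
        in_section = [t for t in beats if section["start"] <= t < section["end"]]
        cuts.extend(in_section[::step])
    return cuts


beats = [i * 0.5 for i in range(16)]  # 120 BPM for 8 seconds
sections = [
    {"type": "verse", "start": 0, "end": 4},
    {"type": "chorus", "start": 4, "end": 8},
]
print(classify_beats(beats)[:5])
print(cut_points(beats, sections))  # [0.0, 2.0, 4.0, 5.0, 6.0, 7.0]
```

The verse gets a cut on each bar line, while the chorus cuts twice per bar for a faster feel.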

## Quick Start (Minimal)

If user provides just audio + one video:
```bash
# 1. Detect beats
python3 scripts/audio_analysis.py song.mp3 -o beats.json

# 2. Simple beat-sync assembly
python3 scripts/simple_sync.py --audio song.mp3 --clip video.mp4 --beats beats.json -o output.mp4
```

## Quality Settings

| Quality | Resolution | Bitrate | Use Case |
|---------|------------|---------|----------|
| draft | 720p | 2 Mbps | Quick preview |
| standard | 1080p | 5 Mbps | Social media |
| high | 1080p | 10 Mbps | YouTube |
| premium | 4K | 20 Mbps | Final output |
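One way these presets might translate into FFmpeg options is sketched below. The mapping is an assumption rather than what `assemble_video.py` actually does: it treats 4K as 2160 lines and encodes with `libx264` at a target video bitrate.

```python
# Preset values taken from the table above; heights are assumed (4K = 2160).
QUALITY_PRESETS = {
    "draft": {"height": 720, "bitrate": "2M"},
    "standard": {"height": 1080, "bitrate": "5M"},
    "high": {"height": 1080, "bitrate": "10M"},
    "premium": {"height": 2160, "bitrate": "20M"},
}


def ffmpeg_quality_args(quality):
    """Return ffmpeg video options for a preset name (hypothetical helper)."""
    preset = QUALITY_PRESETS[quality]
    return [
        # -2 keeps the aspect ratio and rounds the width to an even number.
        "-vf", f"scale=-2:{preset['height']}",
        "-c:v", "libx264",
        "-b:v", preset["bitrate"],
    ]


print(ffmpeg_quality_args("draft"))
```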

## Key Notes

- **FFmpeg required**: Most scripts depend on ffmpeg being installed
- **Audio duration vs video clips**: If the clips are shorter than the audio, loop them or find more clips
- **BPM > 140**: Consider half-time editing (cutting on every other beat) for drop-heavy songs
- **Transitions**: Default is cut-only (beat-sync), add dissolves for chorus sections
- **Mood input**: If user specifies mood (e.g., "sad, rainy, nostalgic"), prioritize that over automatic analysis
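Half-time editing, as mentioned in the notes above, can be approximated by keeping every other beat when the tempo is high. This is a minimal sketch; `editing_beats` is a hypothetical helper, and the 140 BPM threshold comes from the note above.

```python
def editing_beats(beats, bpm, threshold=140):
    """Return every other beat when bpm exceeds the threshold, else all beats."""
    return list(beats[::2]) if bpm > threshold else list(beats)


fast = editing_beats([0.0, 0.4, 0.8, 1.2, 1.6], bpm=150)
slow = editing_beats([0.0, 0.5, 1.0, 1.5], bpm=120)
print(fast)  # [0.0, 0.8, 1.6]
print(slow)  # [0.0, 0.5, 1.0, 1.5]
```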

## Troubleshooting

- **No beats detected**: The audio may be poorly recorded; try `--spectral` mode
- **Clip too short**: Auto-loop short clips up to 3x their original length
- **Aspect ratio mismatch**: Automatically crop or pad to 16:9 (landscape) or 9:16 (vertical, for reels)
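For the aspect ratio case, a centered crop can be computed from the source dimensions and passed to FFmpeg's `crop` filter. The sketch below covers cropping only, not padding; `center_crop` and `crop_filter` are hypothetical helpers, and rounding to even sizes is an assumption that suits common H.264 pixel formats.

```python
def center_crop(width, height, target_w=16, target_h=9):
    """Return (crop_w, crop_h, x, y) for the largest centered crop at the target aspect."""
    if width * target_h > height * target_w:
        # Source is wider than the target: keep full height, trim the sides.
        crop_h = height
        crop_w = height * target_w // target_h
    else:
        # Source is taller or already matches: keep full width, trim top and bottom.
        crop_w = width
        crop_h = width * target_h // target_w
    # Round down to even dimensions.
    crop_w -= crop_w % 2
    crop_h -= crop_h % 2
    return crop_w, crop_h, (width - crop_w) // 2, (height - crop_h) // 2


def crop_filter(width, height, target_w=16, target_h=9):
    """Format the crop as an ffmpeg filter string, crop=w:h:x:y."""
    w, h, x, y = center_crop(width, height, target_w, target_h)
    return f"crop={w}:{h}:{x}:{y}"


print(crop_filter(1920, 1080, 9, 16))  # landscape source to vertical
print(crop_filter(1080, 1920, 16, 9))  # vertical source to landscape
```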
