music-analysis

Analyze music/audio files locally without external APIs. Extract tempo, pocket/groove feel, pulse stability, swing proxy, section/repetition structure, key clarity, harmonic tension, timbre descriptors, temporal mood-energy journeys, and lyric-aware emotional reads where real Whisper lyrics can override the vibe when the words are clearly darker, warmer, or more intense than the arrangement alone suggests. Use when asked to 'listen to this', 'hear the music', audit tracks, compare mixes, inspect structure, or generate producer-facing notes from audio files.

3,891 stars

Best use case

music-analysis is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Analyze music/audio files locally without external APIs. Extract tempo, pocket/groove feel, pulse stability, swing proxy, section/repetition structure, key clarity, harmonic tension, timbre descriptors, temporal mood-energy journeys, and lyric-aware emotional reads where real Whisper lyrics can override the vibe when the words are clearly darker, warmer, or more intense than the arrangement alone suggests. Use when asked to 'listen to this', 'hear the music', audit tracks, compare mixes, inspect structure, or generate producer-facing notes from audio files.

Teams using music-analysis should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/music-analysis/SKILL.md --create-dirs "https://raw.githubusercontent.com/openclaw/skills/main/skills/adam-researchh/music-analysis/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/music-analysis/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How music-analysis Compares

Feature / Agentmusic-analysisStandard Approach
Platform SupportNot specifiedLimited / Varies
Context Awareness High Baseline
Installation ComplexityUnknownN/A

Frequently Asked Questions

What does this skill do?

Analyze music/audio files locally without external APIs. Extract tempo, pocket/groove feel, pulse stability, swing proxy, section/repetition structure, key clarity, harmonic tension, timbre descriptors, temporal mood-energy journeys, and lyric-aware emotional reads where real Whisper lyrics can override the vibe when the words are clearly darker, warmer, or more intense than the arrangement alone suggests. Use when asked to 'listen to this', 'hear the music', audit tracks, compare mixes, inspect structure, or generate producer-facing notes from audio files.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

SKILL.md Source

# Music Analysis (Local, No External APIs)

Primary tool: a **full listen** that combines snapshot analysis, structure, groove, harmonic tension, temporal mood mapping, and optional Whisper lyric alignment into one report.

## 1. Full Listen — primary / recommended

```bash
python3 skills/music-analysis/scripts/listen.py /path/to/audio.mp3
python3 skills/music-analysis/scripts/listen.py track.mp3 --json
python3 skills/music-analysis/scripts/listen.py track.mp3 --out report.txt
python3 skills/music-analysis/scripts/listen.py track.mp3 --json --out report.json
```

**What it does in one pass:**
1. Snapshot analysis: tempo, pulse stability, swing proxy, key clarity, harmonic tension, timbre, structure
2. Whisper lyric transcription and filtering first — keep only real lyric text, drop artifact tags like `[MUSIC]`
3. Temporal listen: windowed energy / mood / tension journey
4. Synthesis layer that aligns lyrics with peak / tension / quiet windows and lets the lyric layer override the final vibe when confidence is high

### Human-readable output structure

- **SNAPSHOT**
  - groove/pocket
  - structure summary + repeated sections
  - harmony (key clarity + tension)
  - timbre descriptor tags
- **INSTRUMENT READ**
  - likely instrument palette (strong/likely/possible confidence)
  - per-section instrument entrances and exits
  - how instruments color the emotional feel
  - written as natural language, not clinical data
- **TEMPORAL JOURNEY**
  - opening / middle / closing mood-energy-tension read
  - peak / quietest / tensest moments
  - mood journey and transition count
- **EMOTIONAL READ**
  - explainable emotion summary based on measured features
- **LYRICS**
  - Whisper segment count
  - excerpt or graceful skip note
- **SYNTHESIS**
  - lyric-energy/tension alignment
  - peak / tension / quiet lyric moments
- **ALIGNED TIMELINE**
  - per-window moments where transitions / lyrics / tension spikes occur

## 2. Snapshot Analysis — standalone

```bash
python3 skills/music-analysis/scripts/analyze_music.py /path/to/audio.mp3
python3 skills/music-analysis/scripts/analyze_music.py track.mp3 --json
```

Reports:
- tempo / pulse stability / pulse confidence / swing proxy / pocket
- key estimate / key clarity / chroma entropy / harmonic change / tonal motion / tension
- timbre descriptors (brightness, richness, low-end, contrast, dynamic range)
- section labels (A/B/C...) and repeated material detection
- explainable emotional read with reasons

## 3. Temporal Listen — standalone

```bash
python3 skills/music-analysis/scripts/temporal_listen.py /path/to/audio.mp3
python3 skills/music-analysis/scripts/temporal_listen.py track.mp3 --json
```

Reports:
- sliding-window timeline (4s windows, 2s hops)
- energy contour
- mood labels
- harmonic tension + tonal motion
- transition types (drop hits, pulls back, tightens harmonically, shifts color, evolves)
- narrative arc (mountain / ascending / descending / plateau / wave)

## Interpretation rules

- **Structure labels are similarity labels**, not verse/chorus claims.
- **Swing proxy is a feel estimate**, not drummer-grade microtiming truth.
- **Emotion is explainable**, derived from pulse + timbre + harmonic tension rather than a black-box mood guess.
- **Lyrics can override the final vibe** when filtered Whisper text is confident and emotionally clear.

## Audio sourcing

The tool needs a real audio file on disk.
- Direct file (mp3, wav, flac, ogg, m4a — anything ffmpeg/librosa can read)
- YouTube / supported URLs: `yt-dlp -x --audio-format mp3 -o "output.mp3" "URL_OR_SEARCH"`

## Whisper lyrics transcription

`listen.py` uses:
- CLI: `/opt/homebrew/bin/whisper-cli`
- Model: `~/.local/share/whisper-cpp/ggml-large-v3-turbo.bin`
- Preprocess: convert input to mono 16kHz WAV via ffmpeg
- Fallback: skip gracefully if Whisper is missing or errors

## Dependencies

Python:
- librosa
- numpy

System:
- ffmpeg
- ffprobe

## Workspace hygiene

- Keep temporary audio files in a dedicated temp/output folder for the skill.
- Avoid modifying unrelated project files while working on audio analysis tasks.

Related Skills

Margin Analysis & Profit Optimization

3891
from openclaw/skills

Analyze gross, operating, and net margins by product line, customer segment, and channel. Identify margin erosion patterns and build pricing power.

Business Analysis

Investment Analysis & Portfolio Management Engine

3891
from openclaw/skills

Complete investment analysis, portfolio construction, risk management, and trade execution methodology. Works across stocks, crypto, ETFs, bonds, and alternatives. Zero dependencies — pure agent skill.

Finance & Investing

FP&A Command Center — Financial Planning & Analysis Engine

3891
from openclaw/skills

You are a senior FP&A professional. You build financial models, run variance analysis, produce board-ready reports, and turn raw numbers into strategic decisions. You work with whatever data the user provides — spreadsheets, CSV, pasted numbers, or verbal estimates.

Finance & Analytics

data-analysis-partner

3891
from openclaw/skills

智能数据分析 Skill,输入 CSV/Excel 文件和分析需求,输出带交互式 ECharts 图表的 HTML 自包含分析报告

Data & Research

onchain-contract-token-analysis

3891
from openclaw/skills

Analyze smart contracts, token mechanics, permissions, fee flows, upgradeability, market risks, and likely attack surfaces for onchain projects. Use when reviewing ERC-20s, launchpads, vaults, staking systems, LP fee routing, ownership controls, proxy setups, or suspicious token behavior.

Security

resume-analysis

3891
from openclaw/skills

简历分析 skill。用于诊断整份简历的完整性、清晰度、岗位相关性、成果表达和结构质量。当用户说“分析简历”“看看我的简历”“简历诊断”时使用。

Workflow & Productivity

contradiction-analysis

3891
from openclaw/skills

触发:当问题复杂、存在多个冲突因素、优先级不清,或你不知道应该先解决什么时调用;常见信号包括 trade-off、瓶颈、根因不明、主次不清、多个问题互相牵制。 English: Trigger when a problem contains competing forces, unclear priorities, or no obvious entry point. Use this skill to identify contradictions, isolate the principal contradiction, classify its nature, and choose the right response.

survey-analysis

3891
from openclaw/skills

AI-powered survey response analysis. Analyzes open-ended survey responses, clusters themes, detects sentiment, and generates actionable insights. Uses BERTopic + GPT-4o-mini.

ths-advanced-analysis

3891
from openclaw/skills

基于 thsdk 进行高级股票分析:分钟K线(1m/5m/15m/30m/60m/120m)、板块/指数行情(主要指数/申万行业/概念板块成分股)、多股票批量对比(表格+归一化走势图+相关性热力图)、盘口深度、大单流向、集合竞价异动、日内分时、历史分时。当用户提到"分钟K线"、"日内走势"、"盘口"、"大单"、"竞价异动"、"板块行情"、"行业排名"、"概念板块"、"成分股"、"对比多只股票"、"批量分析"、"涨幅对比"、"相关性"、"港股"、"美股"、"外汇"、"期货"、"资讯"、"快讯",或者需要同时查看2只以上股票、关注短线交易、量化研究时,必须使用此skill。

ohyesai-music

3891
from openclaw/skills

Generate custom music tracks (vocal or instrumental) via OhYesAI asynchronously.

ad-creative-analysis

3891
from openclaw/skills

Analyze ad creatives (images and videos) extracted from competitor research. Use when given a directory of ad images, video files, or transcripts to evaluate ad quality, score visual and messaging effectiveness, assign a scale score for viral/engagement potential, and generate a cross-creative pattern summary. Triggered by requests like "analyze these ads", "score these creatives", "what hooks are competitors using", "evaluate the ad library", "give me a scale score", "analyze the ad folder", or "what's working in these ads".

🎵 Play Music Skill

3891
from openclaw/skills

**Controlled music player with pause/resume/stop support**