Codex

check-completeness

Analyze collection completeness against canonical discography and generate prioritized gap report

104 stars

Best use case

check-completeness is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

It is a strong fit for teams already working in Codex.

Analyze collection completeness against canonical discography and generate prioritized gap report

Teams using check-completeness should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/check-completeness/SKILL.md --create-dirs "https://raw.githubusercontent.com/jmagly/aiwg/main/.agents/skills/check-completeness/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/check-completeness/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How check-completeness Compares

Feature / Agentcheck-completenessStandard Approach
Platform SupportCodexLimited / Varies
Context Awareness High Baseline
Installation ComplexityUnknownN/A

Frequently Asked Questions

What does this skill do?

Analyze collection completeness against canonical discography and generate prioritized gap report

Which AI agents support this skill?

This skill is designed for Codex.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

SKILL.md Source

# check-completeness

Analyze collection completeness by comparing actual inventory against canonical discography, identify gaps, score coverage, and generate prioritized acquisition plans.

## Purpose

Answer collection completeness questions:
- "How complete is my [artist] collection?"
- "What am I missing?"
- "What should I acquire next?"
- "What legendary performances am I missing?"
- "Where should I upgrade quality?"

Produces comprehensive gap reports with actionable acquisition strategies prioritized by importance, availability, and cost.

## Parameters

### Required

**`<artist>`** - Artist name (quoted if contains spaces)
- Matched against collection directory structure
- Used to fetch canonical discography from MusicBrainz
- Example: `"Pink Floyd"` or `"The Beatles"`

### Optional

**`--collection <path>`** - Collection root directory
- Default: Current working directory
- Must contain artist subdirectories
- Example: `/mnt/media/Music`

**`--gaps-only`** - Show only missing items, skip completeness scoring
- Faster execution when you just need gap list
- Useful for quick "what's missing?" queries

**`--priority <level>`** - Filter gaps by priority level
- Values: `high`, `medium`, `low`, `all` (default: `all`)
- Example: `--priority high` shows only critical gaps

**`--output <file>`** - Write report to file instead of stdout
- Format determined by extension: `.yaml`, `.json`, `.md`
- Default: stdout (terminal display)
- Example: `--output gaps-report.yaml`

**`--canonical-source <source>`** - Canonical discography source
- Values: `musicbrainz` (default), `discogs`, `allmusic`
- Falls back to secondary sources if primary fails
- Example: `--canonical-source discogs`

**`--include-legendary`** - Include legendary/bootleg content in analysis
- Default: Official releases only
- Adds section for high-priority unofficial recordings
- Requires legendary catalog file or web search

**`--quality-threshold <level>`** - Minimum quality for "complete" status
- Values: `lossless`, `320kbps`, `256kbps`, `any` (default: `any`)
- Example: `--quality-threshold lossless` treats 320kbps as needing upgrade

**`--create-gap-notes`** - Generate GAP-NOTE.md files for missing items
- Creates placeholder files in artist directory structure
- Each contains acquisition strategy and priority
- Enables parallel gap-fill workflow

**`--incremental`** - Incremental update mode
- Loads previous state file if exists
- Only re-fetches canonical if >7 days old
- Faster for routine monitoring

**`--cost-estimate`** - Include cost estimates in report
- Estimates based on typical pricing for format/availability
- Adds budget planning section to output
- Example range: "$15-25 for used CD"

## Workflow

### 1. Load Collection Inventory

```bash
# Use existing scan results if available
SCAN_FILE=".media-curator/scans/${ARTIST_SLUG}/scan-latest.json"

if [[ -f "$SCAN_FILE" ]]; then
  INVENTORY=$(cat "$SCAN_FILE")
else
  # Trigger fresh scan
  /scan-collection "$ARTIST" --format json --output "$SCAN_FILE"
  INVENTORY=$(cat "$SCAN_FILE")
fi
```

Inventory provides:
- All albums/releases present in collection
- Track counts and file formats
- Quality levels (lossless/lossy bitrates)
- File sizes and directory structure

### 2. Fetch Canonical Discography

```bash
# Check for cached canonical data
CANONICAL_FILE=".media-curator/completeness/${ARTIST_SLUG}/canonical.yaml"
CACHE_AGE_DAYS=$((( $(date +%s) - $(stat -c %Y "$CANONICAL_FILE" 2>/dev/null || echo 0) ) / 86400))

if [[ $CACHE_AGE_DAYS -gt 7 ]] || [[ ! -f "$CANONICAL_FILE" ]]; then
  # Fetch fresh canonical discography
  # MusicBrainz API example:
  MBID=$(musicbrainz-lookup "$ARTIST" --type artist --format json | jq -r '.artists[0].id')
  RELEASES=$(musicbrainz-releases "$MBID" --format json)

  # Parse into canonical.yaml structure
  echo "$RELEASES" | parse-canonical > "$CANONICAL_FILE"
fi
```

Canonical discography includes:
- Studio albums with track counts and years
- EPs and singles with A/B sides
- Official live albums
- Compilations (filtered by novelty)
- Official box sets

**MusicBrainz advantages:**
- Comprehensive and accurate
- Includes release dates and track counts
- Distinguishes editions (original, remaster, deluxe)
- Free API with rate limiting

**Discogs alternative:**
- More complete for rare/indie releases
- Includes pressing details and variants
- Better for vinyl collectors
- Requires API key

### 3. Build Extended Content Catalog

If `--include-legendary` flag set:

```bash
# Load legendary performances catalog
LEGENDARY_FILE=".media-curator/completeness/${ARTIST_SLUG}/legendary.yaml"

if [[ ! -f "$LEGENDARY_FILE" ]]; then
  # Web search for "artist legendary performances bootleg"
  # Or load from community-maintained catalog
  # Or use private tracker tags (if integrated)

  fetch-legendary-catalog "$ARTIST" > "$LEGENDARY_FILE"
fi
```

Extended catalog includes:
- Notable covers (tribute albums, sessions)
- Soundtrack contributions
- Collaboration tracks (appears on other artists' albums)
- Legendary live performances (soundboard, historical significance)
- Unreleased demos and outtakes

### 4. Compare Inventory to Canonical

```yaml
# Comparison logic pseudocode
for release in canonical_releases:
  match = find_in_inventory(release.title, release.year)

  if match:
    if match.track_count == release.track_count:
      release.status = "complete"
    else:
      release.status = "partial"
      release.missing_tracks = release.track_count - match.track_count

    release.quality = match.quality
    release.source = match.source
  else:
    release.status = "missing"
```

**Fuzzy matching rules:**
- Tolerate minor title differences ("The Album" vs "Album, The")
- Match by year if title ambiguous
- Prefer exact track count match over year match
- Flag multiple matches as potential duplicates

### 5. Score Completeness

```python
# Weighted scoring formula
weights = {
  'studio_albums': 0.40,
  'eps': 0.15,
  'singles': 0.15,
  'live_albums': 0.10,
  'notable_covers': 0.10,
  'legendary': 0.10
}

component_scores = {}
for category, weight in weights.items():
  total = len(canonical[category])
  complete = sum(1 for item in canonical[category] if item.status == 'complete')
  partial = sum(0.5 for item in canonical[category] if item.status == 'partial')

  component_scores[category] = ((complete + partial) / total * 100) if total > 0 else 100

overall_score = sum(component_scores[cat] * weights[cat] for cat in weights)

# Quality bonus
lossless_pct = (lossless_count / total_count) * 100
if lossless_pct > 80:
  overall_score += 5
```

### 6. Identify and Prioritize Gaps

```yaml
# Gap prioritization logic
for release in canonical_releases:
  if release.status == "missing" or release.status == "partial":
    priority = calculate_priority(release)
    availability = check_availability(release)
    cost = estimate_cost(release, availability)

    gaps.append({
      'item': release.title,
      'type': release.type,
      'year': release.year,
      'priority': priority,
      'availability': availability,
      'estimated_cost': cost,
      'reason': priority_reason(release)
    })

gaps.sort(key=lambda x: (priority_order[x['priority']], x['year']))
```

**Priority calculation:**

```python
def calculate_priority(release):
  score = 0

  # Type importance
  if release.type == 'studio_album':
    score += 40
  elif release.type in ['ep', 'single']:
    score += 15
  elif release.type == 'live_album':
    score += 10

  # Critical acclaim (if available)
  if release.rating and release.rating > 8.0:
    score += 20

  # Classic period (artist-specific)
  if release.year in artist_classic_years:
    score += 15

  # Completes a set
  if is_part_of_trilogy(release) and have_other_parts(release):
    score += 10

  # Historical significance
  if release.legendary or release.historically_significant:
    score += 25

  # Assign tier
  if score >= 60:
    return "HIGH"
  elif score >= 30:
    return "MEDIUM"
  else:
    return "LOW"
```

### 7. Check Availability

```python
def check_availability(release):
  sources = []

  # Streaming services (easy, legal, lossy)
  if available_on_streaming(release):
    sources.append({
      'type': 'streaming',
      'services': ['Apple Music', 'Spotify', 'Tidal'],
      'quality': '256kbps AAC',
      'cost': 'subscription',
      'difficulty': 'easy'
    })

  # Digital purchase (legal, often lossless)
  if available_on_bandcamp(release):
    sources.append({
      'type': 'digital_purchase',
      'platform': 'Bandcamp',
      'quality': 'FLAC / 320kbps MP3',
      'cost': '$7-12',
      'difficulty': 'easy'
    })

  # Physical (CD/vinyl)
  discogs_listings = check_discogs(release)
  if discogs_listings:
    sources.append({
      'type': 'physical',
      'format': 'CD',
      'availability': f"{len(discogs_listings)} listings",
      'price_range': f"${min(discogs_listings)} - ${max(discogs_listings)}",
      'difficulty': 'moderate'
    })

  # Private trackers (torrents, requires ratio)
  if user_has_tracker_access():
    tracker_results = search_trackers(release)
    if tracker_results:
      sources.append({
        'type': 'private_tracker',
        'sites': tracker_results.sites,
        'quality': 'FLAC',
        'cost': 'ratio',
        'difficulty': 'moderate' if tracker_results.seeded else 'hard'
      })

  return sources
```

### 8. Generate Report

Output format depends on `--output` extension or defaults to rich terminal display:

**Terminal output (default):**

```
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  Pink Floyd Collection Completeness Report
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

📊 Overall Score: 92.3% (Excellent Collection)

🎵 Canonical Releases: 94.1%
  ✅ Studio Albums: 16/17 (94.1%)
  ✅ EPs: 4/4 (100%)
  ✅ Singles: 24/30 (80.0%)
  ✅ Live Albums: 12/15 (80.0%)

🎸 Extended Content: 78.5%
  ✅ Notable Covers: 8/12 (66.7%)
  ✅ Soundtracks: 3/5 (60.0%)
  ⚠️  Legendary: 12/47 (25.5%)

💿 Quality Distribution:
  🟢 Lossless (FLAC): 87% (298 files)
  🟡 High (320kbps): 11% (38 files)
  🔴 Medium (256kbps): 2% (6 files)

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  Top 5 Acquisition Priorities
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

🔴 HIGH: The Endless River (2014)
   Missing: Complete studio album
   Reason: Only missing album from official discography
   Sources:
     • Apple Music (stream rip, 256kbps) - FREE with subscription
     • Bandcamp (FLAC) - $9.99
     • Discogs (used CD) - $12-18
   Recommended: Bandcamp FLAC purchase

🔴 HIGH: Live at Pompeii (1972) - Soundboard
   Missing: Legendary performance
   Reason: Exceptional quality, historically significant
   Sources:
     • Private tracker (FLAC soundboard) - Ratio cost
     • YouTube rip (256kbps, unofficial) - FREE but low quality
   Recommended: Private tracker if available

🟡 MEDIUM: Obscured by Clouds B-sides
   Missing: 3 tracks from singles
   Reason: Completes album era, unique content
   Sources:
     • Compilation "Works" (1983) - Discogs $15-20
     • Individual singles (rare) - eBay ~$30 each
   Recommended: Works compilation

🟡 MEDIUM: The Wall (24bit/96kHz remaster)
   Upgrade: Current 320kbps → Lossless remaster
   Reason: Significant audio improvement, bonus tracks
   Sources:
     • HDtracks (24bit FLAC) - $19.99
     • Qobuz (24bit streaming) - Subscription
   Recommended: Wait for HDtracks sale (<$15)

🟢 LOW: Relics (1971 compilation)
   Missing: Compilation of earlier singles
   Reason: Low priority, all tracks owned on original albums
   Sources:
     • Streaming (256kbps) - FREE
     • CD (out of print) - eBay $20-30
   Recommended: Stream rip if desired for convenience

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  Acquisition Plan Summary
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Next 5 acquisitions:
  1. The Endless River - Bandcamp FLAC ($9.99)
  2. Live at Pompeii - Private tracker (ratio)
  3. Obscured by Clouds B-sides - Works CD ($18)
  4. The Wall remaster - Wait for sale (~$15)
  5. (Optional) Relics - Stream rip (FREE)

💰 Estimated cost: $43-60 (excluding sale wait items)
⏱️  Estimated time: 1-2 weeks
🎯 Next score: ~96.5% (Completionist tier)

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Report saved to: .media-curator/completeness/pink-floyd/gap-report-2026-02-14.yaml
GAP-NOTE files created: 5 items
```

**YAML output (--output report.yaml):**

Uses the full gap report structure defined in the Completeness Tracker agent documentation.

**Markdown output (--output report.md):**

Human-readable report with sections, tables, and acquisition instructions.

### 9. Create GAP-NOTE Files (if --create-gap-notes)

```bash
for gap in high_priority_gaps:
  NOTE_PATH="${COLLECTION}/${ARTIST}/${gap.year} - ${gap.title}/GAP-NOTE.md"
  mkdir -p "$(dirname "$NOTE_PATH")"

  cat > "$NOTE_PATH" << EOF
# GAP-NOTE: ${gap.title} (${gap.year})

## Expected Content
- ${gap.type}: ${gap.track_count} tracks
- ${gap.description}
- Rating: ${gap.rating}/10

## Acquisition Strategy
${format_acquisition_steps(gap.sources)}

## Priority
**${gap.priority}** - ${gap.reason}

## Sources
${format_source_links(gap)}
- Last searched: Never
EOF
done
```

## Examples

### Example 1: Basic completeness check

```bash
/check-completeness "Pink Floyd" --collection /mnt/media/Music
```

Output: Full terminal report with scores and top 5 priorities

### Example 2: Just show high-priority gaps

```bash
/check-completeness "The Beatles" --gaps-only --priority high
```

Output:
```
High-Priority Gaps for The Beatles:

1. Let It Be Sessions (1970) - Legendary unreleased sessions
2. Live at Hollywood Bowl (original 1977) - Out of print, reissue available
3. Anthology outtakes (various) - 12 tracks not on official release
```

### Example 3: Create acquisition workflow

```bash
/check-completeness "Led Zeppelin" \
  --include-legendary \
  --create-gap-notes \
  --output gaps-report.yaml
```

Creates:
- `gaps-report.yaml` - Full structured report
- GAP-NOTE.md files in each missing album directory
- Updated completeness state file

### Example 4: Incremental monitoring

```bash
# Weekly cron job
/check-completeness "Grateful Dead" \
  --incremental \
  --quality-threshold lossless \
  --output /reports/weekly-$(date +%F).yaml
```

Fast execution, only re-fetches canonical if stale.

## Integration with Other Commands

**Before completeness check:**
```bash
/scan-collection "Artist" --format json  # Generates inventory
```

**After completeness check:**
```bash
/acquisition-plan "Artist" --from-gaps gaps-report.yaml  # Plan downloads
/quality-audit "Artist" --upgrades-only  # Find quality improvement targets
```

**Parallel gap filling:**
```bash
/completeness-fill "Artist" --priority high --source streaming &
/completeness-fill "Artist" --priority high --source physical &
wait
/check-completeness "Artist" --incremental  # Re-check after acquisition
```

## Error Handling

**MusicBrainz API failure:**
- Fall back to Discogs API
- If both fail, use cached canonical (warn if >30 days old)
- Suggest manual canonical.yaml creation

**Artist not found:**
- Suggest fuzzy matches from collection directory names
- Offer to use directory name as canonical source

**No collection inventory:**
- Auto-trigger `/scan-collection` if not found
- Error if collection path invalid

## Performance

**Expected execution time:**
- Small collection (<500 files): 5-10 seconds
- Medium collection (500-2000 files): 15-30 seconds
- Large collection (>2000 files): 30-60 seconds
- Incremental mode: 50% faster (uses cached canonical)

**Rate limiting:**
- MusicBrainz: 1 request/second (built-in delay)
- Discogs: 60 requests/minute (with API key)
- Wait and retry on 429 responses

## Output Files

All output written to `.media-curator/completeness/{artist-slug}/`:

```
.media-curator/completeness/pink-floyd/
├── canonical.yaml           # Cached canonical discography
├── legendary.yaml           # Extended content catalog
├── state.json              # Completeness tracking state
├── gap-report-2026-02-14.yaml
├── gap-report-2026-01-15.yaml  # Historical reports
└── history.json            # Completeness score over time
```

## Success Criteria

- Completeness scores accurate within ±2%
- Zero false positives (claiming missing when present)
- Priority assignments align with collector consensus >90%
- Availability checks current within 7 days
- Report generation <60 seconds for any collection size

## References

- @$AIWG_ROOT/agentic/code/addons/aiwg-utils/rules/research-before-decision.md — Research canonical discography (MusicBrainz, Discogs) before determining completeness
- @$AIWG_ROOT/agentic/code/frameworks/media-curator/skills/analyze-artist/SKILL.md — Artist analysis that establishes the canonical discography used as completeness baseline
- @$AIWG_ROOT/agentic/code/frameworks/media-curator/skills/gap-documentation/SKILL.md — Gap documentation skill used to record identified missing items
- @$AIWG_ROOT/agentic/code/frameworks/media-curator/skills/find-sources/SKILL.md — Source discovery skill used to locate missing releases identified by completeness check

Related Skills

eslint-checker

104
from jmagly/aiwg

Run ESLint for JavaScript/TypeScript code quality and style enforcement. Use for static analysis and auto-fixing.

traceability-check

104
from jmagly/aiwg

Verify bidirectional traceability from requirements to code to tests and identify coverage gaps and orphan artifacts

Codex

regression-check

104
from jmagly/aiwg

Compare current behavior against baseline to detect regressions

Codex

quality-checker

104
from jmagly/aiwg

Validate skill quality, completeness, and adherence to standards. Use before packaging to ensure skill meets quality requirements.

Codex

project-health-check

104
from jmagly/aiwg

Analyze overall project health and metrics

Codex

link-check

104
from jmagly/aiwg

Verify @file references in AIWG skills and agents against the linking contract — per-file or corpus-wide, with optional auto-fix

Codex

flow-handoff-checklist

104
from jmagly/aiwg

Orchestrate handoff validation between SDLC phases and tracks (Discovery→Delivery, Delivery→Ops, phase transitions)

Codex

flow-gate-check

104
from jmagly/aiwg

Orchestrate SDLC phase gate validation with multi-agent review and comprehensive reporting

Codex

citation-check

104
from jmagly/aiwg

Check a file for citation quality and GRADE compliance

Codex

checkpoint

104
from jmagly/aiwg

Create, list, or recover mid-workflow checkpoints so interrupted work resumes from a known-good position

Codex

check-traceability

104
from jmagly/aiwg

Verify the full refinement chain from use cases through behavioral specs, pseudo-code specs, code, and tests — report coverage at each layer and identify gaps

Codex

aiwg-orchestrate

104
from jmagly/aiwg

Route structured artifact work to AIWG workflows via MCP with zero parent context cost