Provenance Tracking

W3C PROV-O patterns for tracking media derivation chains and production history

104 stars

Best use case

Provenance Tracking is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using Provenance Tracking can expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/provenance-tracking/SKILL.md --create-dirs "https://raw.githubusercontent.com/jmagly/aiwg/main/agentic/code/frameworks/media-curator/skills/provenance-tracking/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/provenance-tracking/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How Provenance Tracking Compares

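| Feature / Agent | Provenance Tracking | Standard Approach |
|-----------------|---------------------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |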

Frequently Asked Questions

What does this skill do?

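It provides W3C PROV-O patterns for tracking media derivation chains and production history, recording each media file, transformation, and tool as an Entity, Activity, or Agent in JSON-LD.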

Where can I find the source code?

The source code is in the jmagly/aiwg repository on GitHub, at agentic/code/frameworks/media-curator/skills/provenance-tracking/SKILL.md.

SKILL.md Source

# Provenance Tracking

W3C PROV-compliant provenance tracking for media archives using the Entity-Activity-Agent model in JSON-LD format.

## PROV Model for Media

### Entity (prov:Entity)
A media file — the thing being tracked.

### Activity (prov:Activity)
A transformation — download, transcode, tag, extract audio.

### Agent (prov:Agent)
A tool or person — yt-dlp, ffmpeg, opustags, the curator.

## Derivation Chains

Media files often go through multiple transformations:

```
YouTube video (Entity)
  → downloaded by yt-dlp (Activity, Agent: yt-dlp)
  → raw .mkv file (Entity)
    → audio extracted by ffmpeg (Activity, Agent: ffmpeg)
    → raw .opus file (Entity)
      → tagged by opustags (Activity, Agent: opustags)
      → final .opus file (Entity)
```

Each step is recorded with timestamps, tool versions, and parameters.
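
The chain above could be recorded as PROV nodes along the following lines. This is a minimal Python sketch, not part of the skill itself: the helper names `record_step` and `build_example_chain`, the URN slugs, the timestamps, and the placeholder command strings are all assumptions made for illustration. Tool versions are assumed to live on the agent nodes, as in the template further down.

```python
def record_step(graph, source_id, output_id, activity_id, agent_id,
                started_at, command):
    """Append one derivation step: the activity, then the entity it produced."""
    graph.append({
        "@id": activity_id,
        "@type": "prov:Activity",
        "prov:used": {"@id": source_id},
        "prov:wasAssociatedWith": {"@id": agent_id},
        "prov:startedAtTime": started_at,
        # The command line is stored so the parameters of the step survive.
        "schema:description": command,
    })
    graph.append({
        "@id": output_id,
        "@type": "prov:Entity",
        "prov:wasDerivedFrom": {"@id": source_id},
        "prov:wasGeneratedBy": {"@id": activity_id},
    })


def build_example_chain():
    """Record the three steps of the chain: download, extract audio, tag."""
    prefix = "urn:archive"
    # The original video is the root entity; it is not derived from anything.
    graph = [{"@id": f"{prefix}:entity:source-video", "@type": "prov:Entity"}]
    # All slugs, times, and command strings below are hypothetical examples.
    steps = [
        ("source-video", "raw-mkv", "download-001", "yt-dlp",
         "2026-01-25T10:00:00.000000000Z", "yt-dlp <video-url>"),
        ("raw-mkv", "raw-opus", "extract-audio-001", "ffmpeg",
         "2026-01-25T10:15:30.000000000Z",
         "ffmpeg -i in.mkv -vn -acodec libopus -b:a 128k out.opus"),
        ("raw-opus", "final-opus", "tag-001", "opustags",
         "2026-01-25T10:16:00.000000000Z", "opustags <options> out.opus"),
    ]
    for source, output, activity, agent, started_at, command in steps:
        record_step(
            graph,
            source_id=f"{prefix}:entity:{source}",
            output_id=f"{prefix}:entity:{output}",
            activity_id=f"{prefix}:activity:{activity}",
            agent_id=f"{prefix}:agent:{agent}",
            started_at=started_at,
            command=command,
        )
    return graph
```

Each step contributes two nodes, so the three-step chain yields seven nodes in total, including the root entity.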

## PROVENANCE.jsonld Template

```json
{
  "@context": {
    "prov": "http://www.w3.org/ns/prov#",
    "schema": "http://schema.org/",
    "premis": "http://www.loc.gov/premis/rdf/v3/",
    "dc": "http://purl.org/dc/terms/",
    "xsd": "http://www.w3.org/2001/XMLSchema#"
  },
  "@graph": [
    {
      "@id": "urn:archive:entity:collection",
      "@type": "prov:Collection",
      "dc:title": "Artist Name - Complete Collection",
      "prov:generatedAtTime": "2026-01-26T01:33:51.302755851Z",
      "schema:numberOfItems": 1109,
      "schema:size": "94GB",
      "premis:hasFixity": {
        "@type": "premis:Fixity",
        "premis:messageDigestAlgorithm": {
          "@id": "http://id.loc.gov/vocabulary/preservation/cryptographicHashFunctions/sha256"
        },
        "premis:messageDigestOriginator": "sha256sum (GNU coreutils)",
        "schema:url": "CHECKSUMS.sha256",
        "schema:numberOfItems": 1109
      }
    },
    {
      "@id": "urn:archive:activity:acquisition",
      "@type": "prov:Activity",
      "prov:startedAtTime": "2026-01-24T15:00:00.000000000Z",
      "prov:endedAtTime": "2026-01-26T01:33:51.302755851Z",
      "prov:wasAssociatedWith": [
        {"@id": "urn:archive:agent:yt-dlp"},
        {"@id": "urn:archive:agent:ffmpeg"},
        {"@id": "urn:archive:agent:opustags"}
      ],
      "prov:generated": {"@id": "urn:archive:entity:collection"}
    },
    {
      "@id": "urn:archive:agent:yt-dlp",
      "@type": ["prov:SoftwareAgent", "prov:Agent"],
      "schema:name": "yt-dlp",
      "schema:softwareVersion": "2024.12.23",
      "schema:description": "Video/audio downloader"
    },
    {
      "@id": "urn:archive:agent:ffmpeg",
      "@type": ["prov:SoftwareAgent", "prov:Agent"],
      "schema:name": "ffmpeg",
      "schema:softwareVersion": "7.1",
      "schema:description": "Audio/video transcoder"
    },
    {
      "@id": "urn:archive:agent:opustags",
      "@type": ["prov:SoftwareAgent", "prov:Agent"],
      "schema:name": "opustags",
      "schema:softwareVersion": "1.9.0",
      "schema:description": "Opus metadata tagger"
    }
  ]
}
```
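
One way to fill in this template programmatically is sketched below in Python. The function name `build_provenance` and its parameters are hypothetical, and the fixity block and tool descriptions are omitted to keep the example short; treat it as a starting point under those assumptions rather than the skill's own generator.

```python
import json

# Same prefixes as the template above.
CONTEXT = {
    "prov": "http://www.w3.org/ns/prov#",
    "schema": "http://schema.org/",
    "premis": "http://www.loc.gov/premis/rdf/v3/",
    "dc": "http://purl.org/dc/terms/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def build_provenance(title, file_count, size, started_at, ended_at, tools):
    """Build a collection-level PROV document.

    `tools` maps a tool name to its version string, for example
    {"ffmpeg": "7.1"}. Timestamps are passed in as preformatted strings.
    """
    agents = [
        {
            "@id": f"urn:archive:agent:{name}",
            "@type": ["prov:SoftwareAgent", "prov:Agent"],
            "schema:name": name,
            "schema:softwareVersion": version,
        }
        for name, version in tools.items()
    ]
    collection = {
        "@id": "urn:archive:entity:collection",
        "@type": "prov:Collection",
        "dc:title": title,
        "prov:generatedAtTime": ended_at,
        "schema:numberOfItems": file_count,
        "schema:size": size,
    }
    activity = {
        "@id": "urn:archive:activity:acquisition",
        "@type": "prov:Activity",
        "prov:startedAtTime": started_at,
        "prov:endedAtTime": ended_at,
        # Reference each agent by @id instead of embedding it again.
        "prov:wasAssociatedWith": [{"@id": agent["@id"]} for agent in agents],
        "prov:generated": {"@id": collection["@id"]},
    }
    return {"@context": CONTEXT, "@graph": [collection, activity, *agents]}


if __name__ == "__main__":
    document = build_provenance(
        title="Artist Name - Complete Collection",
        file_count=1109,
        size="94GB",
        started_at="2026-01-24T15:00:00.000000000Z",
        ended_at="2026-01-26T01:33:51.302755851Z",
        tools={"yt-dlp": "2024.12.23", "ffmpeg": "7.1", "opustags": "1.9.0"},
    )
    print(json.dumps(document, indent=2))
```

Redirecting the printed output to `PROVENANCE.jsonld` would produce a file with the same shape as the template, minus the omitted fields.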

## Per-File Provenance

For detailed tracking, record provenance per file:

```json
{
  "@id": "urn:archive:entity:car-radio-opus",
  "@type": "prov:Entity",
  "dc:title": "Car Radio",
  "schema:encodingFormat": "audio/opus",
  "prov:wasDerivedFrom": {
    "@id": "urn:archive:entity:car-radio-mkv"
  },
  "prov:wasGeneratedBy": {
    "@id": "urn:archive:activity:extract-audio-001",
    "@type": "prov:Activity",
    "prov:used": {"@id": "urn:archive:entity:car-radio-mkv"},
    "prov:wasAssociatedWith": {"@id": "urn:archive:agent:ffmpeg"},
    "prov:startedAtTime": "2026-01-25T10:15:30.000000000Z",
    "schema:description": "ffmpeg -i car-radio.mkv -vn -acodec libopus -b:a 128k car-radio.opus"
  }
}
```
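
Per-file records make it possible to walk a file's history by following `prov:wasDerivedFrom` links. The Python sketch below assumes each record carries at most one `prov:wasDerivedFrom` object (not a list) and that all records are available in memory; `trace_lineage` and the `car-radio-source` identifier are hypothetical names used only for this example.

```python
def trace_lineage(nodes, start_id):
    """Return the ids from `start_id` back to its earliest known ancestor."""
    by_id = {node["@id"]: node for node in nodes}
    lineage = [start_id]
    seen = {start_id}
    current = by_id.get(start_id)
    while current is not None and "prov:wasDerivedFrom" in current:
        parent_id = current["prov:wasDerivedFrom"]["@id"]
        if parent_id in seen:
            # A well-formed derivation chain never loops back on itself.
            raise ValueError(f"derivation cycle at {parent_id}")
        lineage.append(parent_id)
        seen.add(parent_id)
        # The parent may be unknown to this record set; the walk then stops.
        current = by_id.get(parent_id)
    return lineage


records = [
    {
        "@id": "urn:archive:entity:car-radio-opus",
        "@type": "prov:Entity",
        "prov:wasDerivedFrom": {"@id": "urn:archive:entity:car-radio-mkv"},
    },
    {
        "@id": "urn:archive:entity:car-radio-mkv",
        "@type": "prov:Entity",
        "prov:wasDerivedFrom": {"@id": "urn:archive:entity:car-radio-source"},
    },
]

if __name__ == "__main__":
    for entity_id in trace_lineage(records, "urn:archive:entity:car-radio-opus"):
        print(entity_id)
```

The walk ends at the first entity that has no recorded parent, which here is the source identifier even though no record exists for it.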

## Timestamp Standard

All timestamps MUST use ISO 8601 UTC with nanosecond precision:

```
Format: YYYY-MM-DDTHH:MM:SS.NNNNNNNNNZ
Example: 2026-01-26T01:33:51.302755851Z
Command: date -u +%Y-%m-%dT%H:%M:%S.%NZ   (GNU date; BSD/macOS date does not support %N)
```

- Always UTC (Z suffix), never local timezone
- 9 decimal places (nanosecond precision)
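
The format rules above can be generated and checked in Python as well. This is a minimal sketch: `nanosecond_timestamp` and `is_valid_timestamp` are hypothetical helper names, the regular expression checks only the shape of the string (not whether the date exists), and the real resolution of `time.time_ns()` depends on the platform clock.

```python
import re
import time
from datetime import datetime, timezone

# YYYY-MM-DDTHH:MM:SS.NNNNNNNNNZ with exactly nine fractional digits.
TIMESTAMP_RE = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{9}Z")


def nanosecond_timestamp(ns_since_epoch=None):
    """Format nanoseconds since the Unix epoch as a UTC timestamp string."""
    if ns_since_epoch is None:
        ns_since_epoch = time.time_ns()
    seconds, nanos = divmod(ns_since_epoch, 1_000_000_000)
    stamp = datetime.fromtimestamp(seconds, tz=timezone.utc)
    # datetime only stores microseconds, so the fraction is formatted by hand.
    return stamp.strftime("%Y-%m-%dT%H:%M:%S") + f".{nanos:09d}Z"


def is_valid_timestamp(value):
    """Return True if `value` has the required shape and Z suffix."""
    return TIMESTAMP_RE.fullmatch(value) is not None


if __name__ == "__main__":
    now = nanosecond_timestamp()
    print(now, is_valid_timestamp(now))
```

A timestamp with a local offset such as `+01:00`, or with fewer than nine fractional digits, fails the check.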

## Generation Commands

### Get Tool Versions
```bash
yt-dlp --version
ffmpeg -version | head -1 | awk '{print $3}'
opustags --version 2>&1 | head -1
```
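
If the versions are collected from a script instead of an interactive shell, something like the following Python sketch could mirror the commands above. The helper names `first_line` and `parse_ffmpeg_version` are hypothetical, and the parser assumes the first line of `ffmpeg -version` has the version as its third whitespace-separated field, which is what the `awk '{print $3}'` command relies on.

```python
import shutil
import subprocess


def first_line(tool, flag):
    """Return the first output line of `tool flag`, or None if unavailable."""
    if shutil.which(tool) is None:
        return None  # Tool is not on PATH.
    result = subprocess.run(
        [tool, flag], capture_output=True, text=True, check=False
    )
    # Some tools print version text to stderr; fall back to it if stdout is empty.
    lines = (result.stdout or result.stderr).splitlines()
    return lines[0].strip() if lines else None


def parse_ffmpeg_version(line):
    """Pick the third field, like awk '{print $3}', or None if it is missing."""
    fields = line.split()
    return fields[2] if len(fields) >= 3 else None


if __name__ == "__main__":
    ffmpeg_line = first_line("ffmpeg", "-version")
    versions = {
        "yt-dlp": first_line("yt-dlp", "--version"),
        "ffmpeg": parse_ffmpeg_version(ffmpeg_line) if ffmpeg_line else None,
    }
    print(versions)
```

A value of `None` signals that a tool is missing, which a caller may prefer to treat as an error before writing any provenance record.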

### Generate Timestamp
```bash
# %N (nanoseconds) is a GNU date extension; BSD/macOS date does not support it
date -u +%Y-%m-%dT%H:%M:%S.%NZ
```

### Count Files
```bash
find . -type f ! -name "CHECKSUMS.sha256" ! -name "PROVENANCE.jsonld" | wc -l
```

### Calculate Total Size
```bash
du -sh . | cut -f1
```
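
The file count and total size can also be gathered in one pass from Python. The sketch below is an approximation under stated assumptions, not an exact equivalent of the shell commands: the hypothetical `archive_stats` helper sums file sizes in bytes, whereas `du -sh` reports disk usage in human-readable units, and it leaves the two bookkeeping files out of the size as well as the count.

```python
from pathlib import Path

# Bookkeeping files that are not part of the archived media.
EXCLUDED = {"CHECKSUMS.sha256", "PROVENANCE.jsonld"}


def archive_stats(root):
    """Return (file_count, total_bytes) for regular files under `root`."""
    count = 0
    total_bytes = 0
    for path in Path(root).rglob("*"):
        # Skip symlinks so that, like find -type f, only regular files count.
        if path.is_symlink() or not path.is_file():
            continue
        if path.name in EXCLUDED:
            continue
        count += 1
        total_bytes += path.stat().st_size
    return count, total_bytes


if __name__ == "__main__":
    files, size = archive_stats(".")
    print(f"{files} files, {size} bytes")
```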

## Standards Reference

| Standard | Version | Usage |
|----------|---------|-------|
| W3C PROV-O | 2013-04-30 | Provenance ontology |
| PREMIS | 3.0 | Preservation fixity metadata |
| Dublin Core | DCMI Metadata Terms (`http://purl.org/dc/terms/`) | Descriptive metadata |
| Schema.org | Latest | Structured data |
| JSON-LD | 1.1 | Serialization format |
| ISO 8601 | 2019 | Timestamps |

## References

- @$AIWG_ROOT/agentic/code/frameworks/media-curator/skills/integrity-verification/SKILL.md — Integrity verification that generates fixity records tracked by provenance
- @$AIWG_ROOT/agentic/code/frameworks/media-curator/skills/acquire/SKILL.md — Acquisition skill whose download events are recorded as provenance activities
- @$AIWG_ROOT/agentic/code/frameworks/media-curator/skills/audio-extraction/SKILL.md — Audio extraction that produces derivation relationships tracked in provenance
- @$AIWG_ROOT/agentic/code/frameworks/media-curator/skills/verify-archive/SKILL.md — Archive verification skill that can generate provenance records via --provenance flag

Related Skills

research-provenance

104
from jmagly/aiwg

Query provenance chains and artifact relationships

Codex

provenance-validate

104
from jmagly/aiwg

Validate provenance records and chains for completeness and consistency

Codex

provenance-report

104
from jmagly/aiwg

Generate provenance coverage dashboard and statistics

Codex

provenance-query

104
from jmagly/aiwg

Query provenance chains to trace artifact derivation and impact

Codex

provenance-create

104
from jmagly/aiwg

Create a W3C PROV-compliant provenance record for an artifact

Codex

auto-provenance

104
from jmagly/aiwg

Generate W3C PROV-compliant provenance records automatically when agents create or modify artifacts

Codex

aiwg-orchestrate

104
from jmagly/aiwg

Route structured artifact work to AIWG workflows via MCP with zero parent context cost

venv-manager

104
from jmagly/aiwg

Create, manage, and validate Python virtual environments. Use for project isolation and dependency management.

pytest-runner

104
from jmagly/aiwg

Execute Python tests with pytest, supporting fixtures, markers, coverage, and parallel execution. Use for Python test automation.

vitest-runner

104
from jmagly/aiwg

Execute JavaScript/TypeScript tests with Vitest, supporting coverage, watch mode, and parallel execution. Use for JS/TS test automation.

eslint-checker

104
from jmagly/aiwg

Run ESLint for JavaScript/TypeScript code quality and style enforcement. Use for static analysis and auto-fixing.

repo-analyzer

104
from jmagly/aiwg

Analyze GitHub repositories for structure, documentation, dependencies, and contribution patterns. Use for codebase understanding and health assessment.