verify-archive
Verify archive integrity with self-verifying SHA-256 checksums, generate VERIFY.md, and optionally create W3C PROV provenance
Best use case
verify-archive is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
It is a strong fit for teams already working in Codex.
Verify archive integrity with self-verifying SHA-256 checksums, generate VERIFY.md, and optionally create W3C PROV provenance
Teams using verify-archive should expect a more consistent output, faster repeated execution, less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in
.claude/skills/verify-archive/SKILL.mdinside your project - Restart your AI agent — it will auto-discover the skill
How verify-archive Compares
| Feature / Agent | verify-archive | Standard Approach |
|---|---|---|
| Platform Support | Codex | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Verify archive integrity with self-verifying SHA-256 checksums, generate VERIFY.md, and optionally create W3C PROV provenance
Which AI agents support this skill?
This skill is designed for Codex.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
Related Guides
Cursor vs Codex for AI Workflows
Compare Cursor and Codex for AI coding workflows, repository assistance, debugging, refactoring, and reusable developer skills.
AI Agents for Coding
Browse AI agent skills for coding, debugging, testing, refactoring, code review, and developer workflows across Claude, Cursor, and Codex.
AI Agents for Marketing
Discover AI agents for marketing workflows, from SEO and content production to campaign research, outreach, and analytics.
SKILL.md Source
# verify-archive
Detect bit rot, tampering, and transfer errors in media archives through cryptographic checksum verification. Maintain provenance records and fixity information for long-term preservation.
## Purpose
Media archives degrade over time through:
- **Bit rot**: Silent data corruption in storage media
- **Transfer errors**: Corruption during network transfer or backup
- **Tampering**: Unauthorized modifications
- **Media failure**: Gradual deterioration of physical storage
This command generates self-verifying checksum manifests, performs integrity verification, and optionally creates W3C PROV provenance records with PREMIS fixity metadata.
## Parameters
| Parameter | Required | Description |
|-----------|----------|-------------|
| `archive_path` | Yes | Path to media archive directory |
| `--generate` | No | Generate new CHECKSUMS.sha256 manifest |
| `--verify` | No | Verify existing manifest against files |
| `--provenance` | No | Generate PROVENANCE.jsonld with PREMIS fixity |
| `--fix` | No | Regenerate manifest after archive changes |
**Mutually exclusive modes**: Use `--generate` for initial setup, `--verify` for routine checks, `--fix` after making changes.
## CHECKSUMS.sha256 Format
Self-verifying manifest with cryptographic integrity protection:
```
# MANIFEST_HASH: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
# Generated: 2026-02-14T18:45:22.387654321Z
# Verify with: tail -n +4 CHECKSUMS.sha256 | sha256sum
1a2b3c4d... ./audio/episode-001.opus
5e6f7g8h... ./audio/episode-002.opus
9i0j1k2l... ./video/recording-2026-02-14.mp4
3m4n5o6p... ./playlists/favorites.m3u
7q8r9s0t... ./README.md
```
**Format specification**:
- **Line 1**: `# MANIFEST_HASH: <sha256>` - SHA-256 hash of manifest content (lines 4+)
- **Line 2**: `# Generated: <timestamp>` - ISO 8601 UTC timestamp with nanosecond precision
- **Line 3**: `# Verify with: tail -n +4 CHECKSUMS.sha256 | sha256sum` - Self-verification command
- **Lines 4+**: `<hash> <path>` - File checksums in sha256sum format
**Self-verification property**: Modifying any hash entry invalidates the manifest hash. The manifest is tamper-evident.
**Coverage**: ALL files in the archive are checksummed:
- Audio files (opus, mp3, flac, m4a, aac)
- Video files (mp4, mkv, webm, avi)
- Images (jpg, png, webp, svg)
- Text files (md, txt, srt, vtt)
- Playlists (m3u, m3u8, pls)
- Scripts and metadata (json, yaml, sh)
**Exclusions**: Only `CHECKSUMS.sha256` itself is excluded to prevent circular dependency.
## Verification Procedures
### Quick Verification (Manifest Integrity)
Verify manifest has not been tampered with (sub-second):
```bash
cd /path/to/archive
# Extract manifest hash from header
EXPECTED=$(grep '^# MANIFEST_HASH:' CHECKSUMS.sha256 | awk '{print $3}')
# Compute hash of manifest content (lines 4+)
ACTUAL=$(tail -n +4 CHECKSUMS.sha256 | sha256sum | awk '{print $1}')
# Compare
if [ "$EXPECTED" = "$ACTUAL" ]; then
echo "✓ Manifest integrity verified"
else
echo "✗ Manifest has been tampered with"
exit 1
fi
```
**Use case**: Daily automated checks to detect manifest corruption without verifying all files.
### Full Verification (All Files)
Verify all files match their checksums (minutes to hours depending on archive size):
```bash
cd /path/to/archive
# First verify manifest integrity
EXPECTED=$(grep '^# MANIFEST_HASH:' CHECKSUMS.sha256 | awk '{print $3}')
ACTUAL=$(tail -n +4 CHECKSUMS.sha256 | sha256sum | awk '{print $1}')
if [ "$EXPECTED" != "$ACTUAL" ]; then
echo "✗ Manifest integrity check failed - stopping"
exit 1
fi
echo "✓ Manifest integrity verified"
# Then verify all files
tail -n +4 CHECKSUMS.sha256 | sha256sum -c
```
**Use case**: Weekly/monthly deep verification to detect bit rot.
### Quiet Mode (Failures Only)
Show only files that failed verification:
```bash
cd /path/to/archive
tail -n +4 CHECKSUMS.sha256 | sha256sum -c --quiet
```
**Exit codes**:
- `0` - All files verified successfully
- `1` - One or more files failed verification
**Use case**: Cron jobs and automated monitoring.
## Timestamp Standard
All timestamps use ISO 8601 UTC with nanosecond precision:
**Format**: `YYYY-MM-DDTHH:MM:SS.NNNNNNNNNZ`
**Example**: `2026-02-14T18:45:22.387654321Z`
**Generation command**:
```bash
date -u +%Y-%m-%dT%H:%M:%S.%NZ
```
**Requirements**:
- Always UTC (trailing `Z`), never local timezone
- Nanosecond precision (9 decimal places)
- ISO 8601 compliant
- Monotonic (later timestamps are lexicographically greater)
## Workflow
### Initial Setup (--generate)
```bash
aiwg verify-archive /path/to/archive --generate
```
**Steps**:
1. Find all files recursively (excluding `CHECKSUMS.sha256`)
2. Compute SHA-256 hash for each file
3. Sort results by path (deterministic order)
4. Generate timestamp in ISO 8601 UTC format
5. Write checksums to temporary file
6. Compute SHA-256 of checksum content
7. Add self-verifying header with manifest hash
8. Write final `CHECKSUMS.sha256`
9. Generate `VERIFY.md` with human-readable instructions
**Bash implementation**:
```bash
ARCHIVE_PATH="/path/to/archive"
TIMESTAMP=$(date -u +%Y-%m-%dT%H:%M:%S.%NZ)
cd "$ARCHIVE_PATH"
# Generate checksums (exclude manifest itself)
find . -type f ! -name "CHECKSUMS.sha256" -print0 | \
sort -z | \
xargs -0 sha256sum > /tmp/checksums.tmp
# Compute manifest hash
MANIFEST_HASH=$(sha256sum /tmp/checksums.tmp | awk '{print $1}')
# Write final manifest with self-verifying header
{
echo "# MANIFEST_HASH: $MANIFEST_HASH"
echo "# Generated: $TIMESTAMP"
echo "# Verify with: tail -n +4 CHECKSUMS.sha256 | sha256sum"
cat /tmp/checksums.tmp
} > CHECKSUMS.sha256
rm /tmp/checksums.tmp
echo "✓ Generated CHECKSUMS.sha256 with $MANIFEST_HASH"
```
### Routine Verification (--verify)
```bash
aiwg verify-archive /path/to/archive --verify
```
**Steps**:
1. Read `MANIFEST_HASH` from header
2. Compute hash of manifest content (lines 4+)
3. Compare hashes (quick integrity check)
4. If manifest is intact, verify all files against checksums
5. Report any mismatches or missing files
### Regeneration After Changes (--fix)
```bash
aiwg verify-archive /path/to/archive --fix
```
**Use case**: After adding/removing/modifying files in the archive.
**Steps**:
1. Backup existing `CHECKSUMS.sha256` to `CHECKSUMS.sha256.bak`
2. Regenerate manifest (same as `--generate`)
3. Report changes: added files, removed files, modified files
4. Keep backup for comparison
**Example output**:
```
Backed up existing manifest to CHECKSUMS.sha256.bak
Regenerating checksums...
Changes detected:
Added: ./audio/episode-003.opus
Modified: ./README.md (hash changed)
Removed: ./audio/episode-001-draft.opus
✓ Generated new CHECKSUMS.sha256
```
## VERIFY.md Template
Human-readable verification instructions placed in archive root:
```markdown
# Archive Integrity Verification
This archive contains a self-verifying checksum manifest for detecting corruption, tampering, or transfer errors.
## Quick Verification (Manifest Integrity)
Verify the manifest has not been tampered with (sub-second):
\`\`\`bash
cd "$(dirname "$0")"
EXPECTED=$(grep '^# MANIFEST_HASH:' CHECKSUMS.sha256 | awk '{print $3}')
ACTUAL=$(tail -n +4 CHECKSUMS.sha256 | sha256sum | awk '{print $1}')
[ "$EXPECTED" = "$ACTUAL" ] && echo "✓ Manifest integrity verified" || echo "✗ Manifest tampered"
\`\`\`
## Full Verification (All Files)
Verify all files match their checksums:
\`\`\`bash
cd "$(dirname "$0")"
tail -n +4 CHECKSUMS.sha256 | sha256sum -c
\`\`\`
## Archive Information
- **Generated**: {TIMESTAMP}
- **File count**: {FILE_COUNT}
- **Total size**: {TOTAL_SIZE}
- **Manifest hash**: {MANIFEST_HASH}
## Recommended Verification Schedule
- **Daily**: Quick manifest integrity check
- **Weekly**: Full file verification
- **After transfer**: Full verification immediately after copying/downloading
- **Before backup**: Verify source integrity before creating backup
## If Verification Fails
1. **Manifest integrity failure**: Manifest has been tampered with or corrupted. Regenerate from source.
2. **File verification failure**: File has been corrupted or modified. Restore from backup or regenerate.
## Regeneration
After making changes to the archive, regenerate the manifest:
\`\`\`bash
aiwg verify-archive . --fix
\`\`\`
---
*Self-verifying checksums using SHA-256 (NIST FIPS 180-4)*
```
**Template variables**:
- `{TIMESTAMP}` - ISO 8601 generation timestamp
- `{FILE_COUNT}` - Number of files in manifest
- `{TOTAL_SIZE}` - Total archive size in human-readable format
- `{MANIFEST_HASH}` - SHA-256 hash of manifest content
## PROVENANCE.jsonld Integration
W3C PROV-O format with PREMIS fixity metadata:
```json
{
"@context": {
"prov": "http://www.w3.org/ns/prov#",
"premis": "http://www.loc.gov/premis/rdf/v3/",
"schema": "http://schema.org/",
"xsd": "http://www.w3.org/2001/XMLSchema#"
},
"@graph": [
{
"@id": "urn:uuid:ARCHIVE_UUID",
"@type": ["prov:Entity", "premis:IntellectualEntity"],
"prov:generatedAtTime": {
"@type": "xsd:dateTime",
"@value": "2026-02-14T18:45:22.387654321Z"
},
"premis:hasFixity": {
"@type": "premis:Fixity",
"premis:hasMessageDigest": "sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
"premis:hasMessageDigestAlgorithm": "SHA-256",
"premis:hasMessageDigestOriginator": "aiwg verify-archive v2026.2.6",
"premis:fixityCheckDateTime": {
"@type": "xsd:dateTime",
"@value": "2026-02-14T18:45:22.387654321Z"
}
},
"schema:contentUrl": "./CHECKSUMS.sha256"
},
{
"@id": "urn:uuid:ACTIVITY_UUID",
"@type": "prov:Activity",
"prov:startedAtTime": {
"@type": "xsd:dateTime",
"@value": "2026-02-14T18:45:20.123456789Z"
},
"prov:endedAtTime": {
"@type": "xsd:dateTime",
"@value": "2026-02-14T18:45:22.387654321Z"
},
"prov:wasAssociatedWith": {
"@id": "urn:aiwg:agent:media-curator"
}
},
{
"@id": "urn:aiwg:agent:media-curator",
"@type": "prov:SoftwareAgent",
"schema:name": "AIWG Media Curator",
"schema:softwareVersion": "2026.2.6"
}
]
}
```
**Key relationships**:
- `premis:hasFixity` - Links entity to fixity information
- `premis:hasMessageDigest` - SHA-256 hash with `sha256:` prefix
- `premis:fixityCheckDateTime` - When verification was performed
- `prov:wasAssociatedWith` - Agent that performed verification
**Use case**: Long-term digital preservation requiring audit trails and provenance chains.
## Standards Compliance
| Standard | Purpose | Reference |
|----------|---------|-----------|
| SHA-256 | Cryptographic hash function | NIST FIPS 180-4 |
| ISO 8601 | Timestamp format | ISO 8601:2019 |
| PREMIS 3.0 | Preservation metadata | Library of Congress |
| W3C PROV-O | Provenance ontology | W3C Recommendation 2013 |
| JSON-LD 1.1 | Linked data format | W3C Recommendation 2020 |
## Error Handling
**Missing archive**: Exit with error if archive path does not exist.
**No CHECKSUMS.sha256**: For `--verify`, exit with error. For `--generate`, create new manifest.
**Hash mismatch**: Report which files failed, exit code 1.
**Permission errors**: Report files that cannot be read, continue verification for accessible files.
**Corrupt manifest**: If manifest cannot be parsed, exit with error and suggest regeneration.
## Deliverables
When running `--generate` or `--fix`:
1. **CHECKSUMS.sha256** - Self-verifying checksum manifest
2. **VERIFY.md** - Human-readable verification instructions
When running `--provenance`:
3. **PROVENANCE.jsonld** - W3C PROV-O provenance record with PREMIS fixity
## Examples
**Initial setup**:
```bash
aiwg verify-archive ~/media/podcast-archive --generate
```
**Routine verification**:
```bash
aiwg verify-archive ~/media/podcast-archive --verify
```
**After adding new episodes**:
```bash
aiwg verify-archive ~/media/podcast-archive --fix
```
**Generate with provenance**:
```bash
aiwg verify-archive ~/media/podcast-archive --generate --provenance
```
**Automated monitoring** (cron):
```bash
0 2 * * * aiwg verify-archive /media/archives/podcast --verify --quiet || echo "Archive verification failed" | mail -s "Alert" admin@example.com
```
## References
- @$AIWG_ROOT/agentic/code/addons/aiwg-utils/rules/human-authorization.md — Seek explicit authorization before regenerating or fixing checksums (--generate, --fix)
- @$AIWG_ROOT/agentic/code/frameworks/media-curator/skills/integrity-verification/SKILL.md — Core integrity verification patterns used by this skill
- @$AIWG_ROOT/agentic/code/frameworks/media-curator/skills/provenance-tracking/SKILL.md — Provenance records generated via --provenance flag
- @$AIWG_ROOT/agentic/code/frameworks/media-curator/skills/curate/SKILL.md — Orchestration skill that invokes verify-archive as a curation phaseRelated Skills
Archive Acquisition
Patterns for acquiring content from Internet Archive and archival sources
verify-citations
Verify all citations in a document against the research corpus
research-archive
Package research artifacts for long-term archival
archive-answer
Capture a query answer and file it as a persistent artifact in .aiwg/working/answers/ with promotion guidance to a permanent destination.
aiwg-orchestrate
Route structured artifact work to AIWG workflows via MCP with zero parent context cost
venv-manager
Create, manage, and validate Python virtual environments. Use for project isolation and dependency management.
pytest-runner
Execute Python tests with pytest, supporting fixtures, markers, coverage, and parallel execution. Use for Python test automation.
vitest-runner
Execute JavaScript/TypeScript tests with Vitest, supporting coverage, watch mode, and parallel execution. Use for JS/TS test automation.
eslint-checker
Run ESLint for JavaScript/TypeScript code quality and style enforcement. Use for static analysis and auto-fixing.
repo-analyzer
Analyze GitHub repositories for structure, documentation, dependencies, and contribution patterns. Use for codebase understanding and health assessment.
pr-reviewer
Review GitHub pull requests for code quality, security, and best practices. Use for automated PR feedback and approval workflows.
YouTube Acquisition
yt-dlp patterns for acquiring content from YouTube and video platforms