template-reproducibility-audit
Deterministic reproducibility audit — fixed seeds, regenerate-from-clean, double-run diff before Zenodo/arXiv/release. USE WHEN outputs drift between runs, "worked on my machine", need regenerate-from-clean proof, or pre-release reproducibility check — even without naming docs/prompts.
Best use case
template-reproducibility-audit is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using template-reproducibility-audit should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/reproducibility-audit/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
How template-reproducibility-audit Compares
| Feature / Agent | template-reproducibility-audit | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
It runs a deterministic reproducibility audit: fixed seeds, a regenerate-from-clean pass, and a double-run diff before a Zenodo, arXiv, or other release. Use it when outputs drift between runs, when something only "worked on my machine", or when you need regenerate-from-clean proof ahead of a release.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Reproducibility audit

Complements [manuscript-claim-verification](../manuscript-claim-verification/SKILL.md) (claim truth) by focusing on **stability of artifacts**.

## Natural invoke

- "Prove template_code_project is reproducible before Zenodo"
- "Numbers in the PDF don't match after a clean rebuild"
- "Double-run the pipeline and diff outputs"

## Inputs to confirm

- **Project**: from [`docs/_generated/active_projects.md`](../../_generated/active_projects.md).

## Workflow

1. **Determinism**: fixed RNG seeds, `MPLBACKEND=Agg`; no wall-clock/hostname/path leaks. List nondeterministic sources.
2. **Regenerate from clean**: wipe working outputs, run core pipeline, regenerate manuscript variables. Capture exit status per stage.
3. **Diff**: compare regenerated `output/<name>/` and manuscript variables vs prose assertions. Hand-typed numbers not from generated variables are findings even if equal.
4. **Double-run stability**: regenerate twice with clean tree between; any run-1 vs run-2 diff is hard failure.
5. **Fix**: seed injection, variable-ize hard-typed numbers, remove timestamp leakage. Update `projects/<n>/AGENTS.md` and README.md with regeneration command. Never hand-edit `output/`.

## Deliverables

- Drift table: artifact | committed | regenerated | action.
- Commands + raw output; no invented coverage numbers.

## Verification commands

```bash
uv sync
uv run python scripts/execute_pipeline.py --project <project> --core-only
uv run python projects/<project>/scripts/z_generate_manuscript_variables.py
git stash --include-untracked -- output/<project> 2>/dev/null || true
uv run python scripts/execute_pipeline.py --project <project> --core-only
git status --porcelain output/<project>
uv run python scripts/01_run_tests.py --project <project>
uv run python -m infrastructure.validation.cli prerender projects/<project>/manuscript --repo-root .
uv run python -m infrastructure.validation.cli pdf output/<project>/pdf/
uv run python -m infrastructure.validation.cli integrity output/<project>/
```

## When NOT to use

- **Stage fails during regeneration** → [pipeline-debugging](../pipeline-debugging/SKILL.md)
- **Per-sentence claim audit** → [manuscript-claim-verification](../manuscript-claim-verification/SKILL.md)
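
The double-run stability step in the workflow can be pictured with a minimal sketch. It is not part of the skill or the template repository: the helper names `hash_tree` and `drift_rows` are hypothetical, and it assumes that copies of the two regenerated output trees have been kept in separate directories. It hashes every file in each tree and lists the artifacts whose bytes differ, in a shape loosely modelled on the drift table deliverable (with run 1 and run 2 standing in for committed and regenerated).

```python
import hashlib
from pathlib import Path


def hash_tree(root: Path) -> dict[str, str]:
    """Map each file path (relative to root) to the SHA-256 of its bytes."""
    digests: dict[str, str] = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def drift_rows(
    run1: dict[str, str], run2: dict[str, str]
) -> list[tuple[str, str, str, str]]:
    """Return (artifact, run-1 digest, run-2 digest, action) for each mismatch.

    A file present in only one run is reported with "missing" on the other
    side. Digests are shortened to 12 characters for readability.
    """
    rows = []
    for name in sorted(run1.keys() | run2.keys()):
        first = run1.get(name, "missing")
        second = run2.get(name, "missing")
        if first != second:
            # Hypothetical action label; the audit decides the real fix.
            rows.append((name, first[:12], second[:12], "investigate nondeterminism"))
    return rows
```

Calling `drift_rows(hash_tree(Path("run1")), hash_tree(Path("run2")))` on two saved copies returns an empty list when the runs are byte-identical; under the workflow's rule, any returned row would count as a hard failure. Byte-level hashing is deliberately strict, so formats that embed timestamps would show up here and point back at the determinism step.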
Related Skills
template-validation-quality
Run validation CLI, prerender, markdown/PDF/integrity gates, and QA workflows for the Research Project Template. USE WHEN validate manuscript, check PDF for ?? refs, prerender gate, link checker, output integrity, or pre-commit validation — even without validation_quality prompt.
template-test-creation
Create pytest suites under the no-mocks policy — real data, temp files, subprocess, pytest-httpserver. USE WHEN adding tests, raising coverage, testing new src/ module, or user forbids mocks.
template-refactoring
Clean-break refactors with migration for the Research Project Template — move logic to src/, split modules, rename APIs with test updates. USE WHEN restructuring code, extracting modules, removing duplication, or migration without behavior change.
template-pipeline-debugging
Systematic pipeline DAG failure triage for the Research Project Template. USE WHEN ./run.sh or execute_pipeline.py fails, a stage stalls (setup, tests, analysis, render, validate, LLM, copy), pytest/coverage gate fails mid-pipeline, PDF render or validate breaks, Project Analysis finishes too fast with no figures, or user says pipeline debug, stage failed, resume checkpoint, core-only triage — even without naming this skill or docs/prompts.
template-methods-orchestration
Repo-wide methods orchestration workflow for the Research Project Template. USE WHEN the user asks to add, audit, improve, or validate methods, methodology, method contracts, stage-to-method wiring, artifact/evidence provenance, or orchestration across template projects.
template-manuscript-cross-references
Audit or author registry-driven manuscript cross-refs — labels.yaml, [[FIG:]], [[THMREF:]], [[VAR:]] tokens. USE WHEN fixing figure/equation/theorem numbering, orphan registry keys, hard-coded "Theorem 7.3" in prose, or [[MISSING:]] injection failures — even for Pandoc projects that also use a YAML registry.
template-manuscript-creation
Scaffold a research manuscript and project layout from a research brief — sections, config.yaml, src/, scripts/, tests. USE WHEN starting a new paper, new projects/ tree, manuscript from topic description, or aligning with template_code_project exemplar — even without copy-paste prompts.
template-manuscript-claim-verification
Triple-pass verification of every manuscript claim against code, data, refs, and renderer; repair prose while staying renderable. USE WHEN pre-submission, pre-Zenodo, pre-arXiv, abstract numbers disagree with CSV, citations do not support sentences, or user asks to triple-check / verify every claim — even without docs/prompts. Not for casual PDF summary.
template-feature-addition
End-to-end feature work across src/, scripts/, tests/, manuscript, and docs for the Research Project Template. USE WHEN adding a pipeline-visible feature, new analysis stage, manuscript-facing output, or cross-layer integration — even without feature_addition prompt.
template-documentation-creation
Author or refresh AGENTS.md and README.md for template directories — accurate commands, Mermaid where helpful, link _generated/active_projects.md. USE WHEN folder needs AGENTS, README audit, doc contract fix, or signposting after code change — even without documentation_creation prompt.
template-deep-research
Template-native research intake, literature search, source verification, synthesis, fact-checking, and systematic-review planning. USE WHEN the user asks to research a topic, build a literature corpus, fact-check claims, prepare a PRISMA-style review, or clarify a research question before manuscript work.
template-comprehensive-assessment
Full checkout audit for the Research Project Template — tests, architecture, docs, manuscript, pipeline. USE WHEN user asks for comprehensive assessment, full repo review, health check across projects, audit everything, or pre-merge sanity sweep for template exemplars — even without naming docs/prompts or a skill. Not for single failing stage only.