infrastructure-validation

Skill for the validation infrastructure module providing PDF validation, markdown validation, output integrity checks, link verification, documentation audits, issue categorization, and repository scanning. Use when validating research outputs, checking document quality, running audits, or verifying cross-references.


by docxology

View on GitHub: docxology/template (infrastructure/validation/SKILL.md)

Best use case

infrastructure-validation is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using infrastructure-validation should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

curl -o ~/.claude/skills/validation/SKILL.md --create-dirs "https://raw.githubusercontent.com/docxology/template/main/infrastructure/validation/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/validation/SKILL.md inside your project
Restart your AI agent so that it auto-discovers the skill

How infrastructure-validation Compares

Feature / Agent	infrastructure-validation	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

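What does this skill do?

It gives an AI agent a reusable reference for the validation infrastructure module in docxology/template: PDF validation, markdown validation, output integrity checks, link verification, documentation audits, issue categorization, and repository scanning.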

Where can I find the source code?

The source code is in the docxology/template repository on GitHub; this skill's SKILL.md lives at infrastructure/validation/SKILL.md.


SKILL.md Source

# Validation Module

Quality assurance and content validation tools for research outputs. Covers PDFs, markdown, links, output integrity, and comprehensive audits.

## PDF Validation (`pdf_validator.py`)

```python
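from pathlib import Path

from infrastructure.validation import validate_pdf_rendering, extract_text_from_pdf, scan_for_issues

# Placeholder path: point this at a rendered PDF under output/{project}/pdf/
pdf_path = Path("output/<project>/pdf/<file>.pdf")

# Validate a rendered PDF
results = validate_pdf_rendering(pdf_path)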

# Extract text for analysis
text = extract_text_from_pdf(pdf_path)

# Scan for rendering issues
issues = scan_for_issues(text)
```

**CLI:**

```bash
uv run python -m infrastructure.validation.cli.main pdf output/{project}/pdf/
uv run python -m infrastructure.validation.cli.pdf output/{project}/pdf/
```
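The page does not say which patterns `scan_for_issues` looks for. As a rough illustration of the idea, the sketch below flags two markers that commonly survive into rendered LaTeX/Pandoc output: `??`, which LaTeX prints for an undefined `\ref`, and bracketed `?` placeholders left by unresolved citations. The pattern set and the `find_rendering_markers` / `RenderingMarker` names are hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns: "??" is what LaTeX prints for an undefined \ref;
# "[?]" and "(?)" are common placeholders for unresolved citations.
MARKER_PATTERNS = {
    "unresolved_reference": re.compile(r"\?\?"),
    "unresolved_citation": re.compile(r"\[\?\]|\(\?\)"),
}


@dataclass
class RenderingMarker:
    kind: str
    line_number: int  # 1-based line in the extracted text
    context: str      # the offending line, stripped


def find_rendering_markers(text: str) -> list[RenderingMarker]:
    """Return one marker per (line, pattern) match in extracted PDF text."""
    markers = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for kind, pattern in MARKER_PATTERNS.items():
            if pattern.search(line):
                markers.append(RenderingMarker(kind, line_number, line.strip()))
    return markers


sample = "As shown in Figure ??, the model converges.\nPrior work [?] disagrees.\nAll good here."
for marker in find_rendering_markers(sample):
    print(marker.kind, marker.line_number, marker.context)
```

Working on extracted text rather than the PDF itself keeps the check independent of the PDF library, which matches the extract-then-scan order used above.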

## Markdown Validation (`markdown_validator.py`)

```python
from pathlib import Path

from infrastructure.validation.content.discovery import discover_markdown_files
from infrastructure.validation.content.markdown_validator import (
    collect_symbols,
    validate_images,
    validate_markdown,
    validate_math,
    validate_refs,
)

repo_root = Path(".")
manuscript_dir = repo_root / "projects" / "project" / "manuscript"
md_files = [str(path) for path in discover_markdown_files(manuscript_dir, scope="tree")]
labels, anchors = collect_symbols(md_files)

# Validate all markdown in a directory
problems, exit_code = validate_markdown(manuscript_dir, repo_root)

# Individual checks
image_issues = validate_images(md_files, repo_root)
ref_issues = validate_refs(md_files, repo_root, labels, anchors)
math_issues = validate_math(md_files, repo_root)
```

**CLI:**

```bash
uv run python -m infrastructure.validation.cli.main markdown projects/{name}/manuscript/
uv run python -m infrastructure.validation.cli.markdown projects/{name}/manuscript/
```
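`collect_symbols` gathers labels and anchors before `validate_refs` checks references against them. A minimal sketch of that two-pass design follows, assuming pandoc-crossref-style syntax (`{#fig:name}` declares a label, `@fig:name` cites it). The regexes and `find_dangling_refs` are illustrative, not the module's implementation.

```python
import re

# Assumed pandoc-crossref syntax: labels are declared as {#fig:name} / {#eq:name}
# attributes and referenced as @fig:name / @eq:name.
LABEL_DEF = re.compile(r"\{#((?:fig|tbl|eq|sec):[\w-]+)\}")
LABEL_REF = re.compile(r"@((?:fig|tbl|eq|sec):[\w-]+)")


def find_dangling_refs(documents: dict[str, str]) -> list[tuple[str, str]]:
    """Return (document name, label) pairs for references with no matching definition."""
    # Pass 1: collect every label defined anywhere, so cross-file references resolve.
    defined = set()
    for text in documents.values():
        defined.update(LABEL_DEF.findall(text))
    # Pass 2: report references to labels that were never defined.
    dangling = []
    for name, text in documents.items():
        for label in LABEL_REF.findall(text):
            if label not in defined:
                dangling.append((name, label))
    return dangling


docs = {
    "01_intro.md": "![Results](figures/results.png){#fig:results}\nSee @fig:results and @eq:loss.",
    "02_methods.md": "$$ L = x^2 $$ {#eq:loss}\nCompare @fig:missing.",
}
print(find_dangling_refs(docs))  # [('02_methods.md', 'fig:missing')]
```

Collecting symbols across all files first is what lets `01_intro.md` cite an equation defined in `02_methods.md` without a false alarm.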

## Output Integrity (`integrity.py`)

```python
from infrastructure.validation import (
    verify_output_integrity, verify_file_integrity,
    verify_cross_references, verify_data_consistency,
    verify_academic_standards, generate_integrity_report,
)

# Full integrity check
report = verify_output_integrity(output_dir)

# Individual checks
verify_file_integrity(file_path)
verify_cross_references(manuscript_dir)
verify_data_consistency(data_dir)
verify_academic_standards(manuscript_dir)
```
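How `verify_file_integrity` decides that a file is intact is not documented here. One common approach, sketched below under that assumption, records a SHA-256 manifest of an output tree and later diffs the tree against it. `build_manifest` and `diff_manifest` are hypothetical helpers.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Stream a file through SHA-256 so large PDFs are never fully loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's POSIX-style path (relative to root) to its SHA-256 hex digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def diff_manifest(root: Path, expected: dict[str, str]) -> dict[str, list[str]]:
    """Compare the current tree against a previously recorded manifest."""
    current = build_manifest(root)
    return {
        "missing": sorted(set(expected) - set(current)),
        "unexpected": sorted(set(current) - set(expected)),
        "modified": sorted(k for k in expected.keys() & current.keys() if expected[k] != current[k]),
    }


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "data.csv").write_text("a,b\n1,2\n")
    manifest = build_manifest(root)                # record the known-good state
    (root / "data.csv").write_text("a,b\n1,3\n")   # simulate a silent change
    print(diff_manifest(root, manifest))
    # {'missing': [], 'unexpected': [], 'modified': ['data.csv']}
```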

## Output Structure Validation (`output_validator.py`)

```python
from infrastructure.validation import validate_output_structure, validate_copied_outputs

validate_output_structure(output_dir)
validate_copied_outputs(source_dir, dest_dir)
```
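The signature `validate_copied_outputs(source_dir, dest_dir)` suggests a source-versus-destination comparison. A minimal sketch of such a check, using the standard library's `filecmp` and a hypothetical `compare_copied_tree` helper, reports files that were not copied or whose bytes differ:

```python
import filecmp
from pathlib import Path


def compare_copied_tree(source: Path, dest: Path) -> dict[str, list[str]]:
    """Report files present in source but missing from, or different in, dest."""
    missing, differing = [], []
    for src_file in sorted(p for p in source.rglob("*") if p.is_file()):
        relative = src_file.relative_to(source)
        dst_file = dest / relative
        if not dst_file.is_file():
            missing.append(relative.as_posix())
        # shallow=False compares file contents instead of trusting os.stat() signatures
        elif not filecmp.cmp(src_file, dst_file, shallow=False):
            differing.append(relative.as_posix())
    return {"missing": missing, "differing": differing}
```

This sketch only walks the source side, so extra files that exist only in the destination are not reported; a real check may want that direction too.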

## Link Verification (`check_links.py`, `link_validator.py`)

```python
from infrastructure.validation import LinkValidator

validator = LinkValidator()
results = validator.check_all(docs_dir)
```
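The behavior of `LinkValidator.check_all` is not described on this page (for example, whether it fetches external URLs). The sketch below covers only the offline half of link checking: it extracts inline Markdown links and images and reports relative targets that do not exist on disk, skipping URLs with a scheme and same-page `#anchors`. `broken_local_links` is a hypothetical name, and reference-style links are out of scope.

```python
import re
from pathlib import Path
from urllib.parse import urlparse

# Inline links and images: [text](target) or ![alt](target "optional title").
INLINE_LINK = re.compile(r'!?\[[^\]]*\]\(([^)\s]+)(?:\s+"[^"]*")?\)')


def broken_local_links(md_file: Path) -> list[str]:
    """Return relative link targets in md_file whose files do not exist."""
    broken = []
    for target in INLINE_LINK.findall(md_file.read_text(encoding="utf-8")):
        parsed = urlparse(target)
        if parsed.scheme or target.startswith("#"):
            continue  # external URL (https, mailto, ...) or same-page anchor
        # parsed.path drops any "#section" fragment before the existence check
        if not (md_file.parent / parsed.path).exists():
            broken.append(target)
    return broken
```

Targets are resolved relative to the Markdown file's own directory, which is how most renderers interpret relative links.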

## Figure Validation (`figure_validator.py`)

```python
from pathlib import Path
from infrastructure.validation import validate_figure_registry

success, issues = validate_figure_registry(
    Path("projects/<name>/output/figures/figure_registry.json"),
    Path("projects/<name>/manuscript"),
)
```

Both registry shapes are accepted: ``{"fig:label": {...}, ...}`` (dict, emitted
by ``FigureManager``) and ``[{"label": "fig:label", ...}, ...]`` (list, emitted
by project-side scripts that produce a flat manifest).
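Because both registry shapes are accepted, a consumer has to normalize them before comparing labels. A sketch of that step follows; `normalize_registry` is a hypothetical helper and the `path` field in the example data is illustrative.

```python
import json
from typing import Any


def normalize_registry(registry: Any) -> dict[str, dict[str, Any]]:
    """Return {label: entry} for either accepted registry shape.

    - dict shape: {"fig:label": {...}, ...}
    - list shape: [{"label": "fig:label", ...}, ...]
    """
    if isinstance(registry, dict):
        return {label: dict(entry) for label, entry in registry.items()}
    if isinstance(registry, list):
        normalized = {}
        for entry in registry:
            label = entry.get("label")
            if not label:
                raise ValueError(f"registry entry without a label: {entry!r}")
            # Drop the label key so both shapes yield identical entries.
            normalized[label] = {k: v for k, v in entry.items() if k != "label"}
        return normalized
    raise TypeError(f"unsupported registry type: {type(registry).__name__}")


listing = json.loads('[{"label": "fig:loss", "path": "figures/loss.png"}]')
print(normalize_registry(listing))  # {'fig:loss': {'path': 'figures/loss.png'}}
```

Once normalized, the registry's label set can be compared directly with the figure labels cited in the manuscript.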

## Audit Orchestration (`audit_orchestrator.py`)

```python
from infrastructure.validation import run_comprehensive_audit, generate_audit_report

# Run all validation checks in one pass
audit_results = run_comprehensive_audit(project_path)
report = generate_audit_report(audit_results)
```
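The page does not say how `run_comprehensive_audit` handles a check that crashes. The sketch below shows one reasonable orchestration pattern, assumed rather than taken from the module: run each named check, isolate exceptions so one failure cannot hide the others, and render a plain-text report. `CheckResult`, `run_checks`, and `render_report` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class CheckResult:
    name: str
    issues: list[str] = field(default_factory=list)
    error: Optional[str] = None  # set only when the check itself raised

    @property
    def passed(self) -> bool:
        return self.error is None and not self.issues


def run_checks(checks: dict[str, Callable[[], list[str]]]) -> list[CheckResult]:
    """Run every check in order; an exception is recorded instead of aborting the audit."""
    results = []
    for name, check in checks.items():
        try:
            results.append(CheckResult(name, issues=list(check())))
        except Exception as exc:  # isolate per-check failures
            results.append(CheckResult(name, error=f"{type(exc).__name__}: {exc}"))
    return results


def render_report(results: list[CheckResult]) -> str:
    """One status line per check, with individual issues listed under failures."""
    lines = []
    for result in results:
        if result.error:
            lines.append(f"[ERROR] {result.name}: {result.error}")
        elif result.issues:
            lines.append(f"[FAIL] {result.name}: {len(result.issues)} issue(s)")
            lines.extend(f"    - {issue}" for issue in result.issues)
        else:
            lines.append(f"[PASS] {result.name}")
    return "\n".join(lines)


def check_pdf() -> list[str]:
    raise RuntimeError("pdf not found")  # simulates a crashing check


print(render_report(run_checks({
    "markdown": lambda: [],
    "links": lambda: ["docs/a.md: missing.md"],
    "pdf": check_pdf,
})))
```

Distinguishing `ERROR` (the check broke) from `FAIL` (the check found problems) keeps a tooling bug from being mistaken for a clean manuscript.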

## Issue Categorization (`issue_categorizer.py`)

```python
from infrastructure.validation import (
    categorize_by_type, assign_severity, filter_false_positives,
    prioritize_issues, group_related_issues, generate_issue_summary,
)

categorized = categorize_by_type(raw_issues)
filtered = filter_false_positives(categorized)
prioritized = prioritize_issues(filtered)
summary = generate_issue_summary(prioritized)
```
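The call chain above (categorize, filter false positives, prioritize, summarize) can be sketched with a small immutable `Issue` record. The severity levels, field names, and substring-based false-positive filter below are assumptions for illustration only and do not reflect the module's actual data model.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical severity ranking; a lower rank means "fix first".
SEVERITY_RANK = {"error": 0, "warning": 1, "info": 2}


@dataclass(frozen=True)
class Issue:
    kind: str      # e.g. "broken_link", "missing_figure"
    severity: str  # a key of SEVERITY_RANK
    location: str  # file path, optionally "path:line"
    message: str


def drop_false_positives(issues: list[Issue], ignored_substrings: list[str]) -> list[Issue]:
    """Remove issues whose message matches a known-benign pattern."""
    return [i for i in issues if not any(s in i.message for s in ignored_substrings)]


def prioritize(issues: list[Issue]) -> list[Issue]:
    """Order by severity, then location, so the report is stable across runs."""
    unknown_rank = len(SEVERITY_RANK)  # unknown severities sort last
    return sorted(issues, key=lambda i: (SEVERITY_RANK.get(i.severity, unknown_rank), i.location))


def summarize(issues: list[Issue]) -> Counter:
    """Count issues per (severity, kind) pair."""
    return Counter((i.severity, i.kind) for i in issues)


raw = [
    Issue("style", "info", "manuscript/02.md", "long sentence"),
    Issue("broken_link", "error", "manuscript/01.md", "missing.md not found"),
    Issue("broken_link", "warning", "README.md", "http://localhost:8000 unreachable"),
]
ranked = prioritize(drop_false_positives(raw, ignored_substrings=["localhost"]))
for issue in ranked:
    print(issue.severity, issue.location, issue.message)
print(summarize(ranked))
```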

## Documentation Scanning (`docs/scanner.py`, `docs/accuracy.py`, `docs/completeness.py`)

Comprehensive scanning of documentation for accuracy, completeness, and quality:

```python
from infrastructure.validation.docs.scanner import DocumentationScanner
from infrastructure.validation.docs.accuracy import verify_documentation_accuracy
from infrastructure.validation.docs.completeness import analyze_documentation_completeness

scanner = DocumentationScanner(repo_root)
inventory = scanner.discover_inventory()
accuracy_report, link_issues, accuracy_issues, headings = verify_documentation_accuracy(
    md_files, repo_root, config_files
)
completeness_report, gaps = analyze_documentation_completeness(repo_root, documentation_files, config_files)
```
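Completeness checks usually start from a simple inventory question: which code directories have no accompanying documentation? A minimal sketch follows, assuming that a package is any directory containing an `__init__.py` and that `README.md` is the expected doc file. Both are assumptions, which is why the file name is a parameter; `undocumented_packages` is a hypothetical helper.

```python
from pathlib import Path


def undocumented_packages(root: Path, required_doc: str = "README.md") -> list[str]:
    """Return package directories under root (relative, POSIX-style) that lack required_doc."""
    missing = []
    for init_file in root.rglob("__init__.py"):
        package_dir = init_file.parent
        if not (package_dir / required_doc).is_file():
            missing.append(package_dir.relative_to(root).as_posix())
    return sorted(missing)
```

For example, `undocumented_packages(Path("infrastructure"))` would list every package under `infrastructure/` that has no `README.md`.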

## Repository Scanning (`repo/scanner.py`)

```python
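from pathlib import Path

from infrastructure.validation.repo.scanner import RepositoryScanner

repo_root = Path(".")  # assumes the current directory is the repository root
scanner = RepositoryScanner(repo_root)
results = scanner.scan_all()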
```

## Mock Validation (`output/no_mock_enforcer.py`)

Validates that no mock/fake methods are used in the codebase (enforces the no-mocks policy):

```python
from infrastructure.validation.output.no_mock_enforcer import validate_no_mocks
violations = validate_no_mocks(tests_dir, repo_root)
```

Related Skills

infrastructure-steganography

from docxology/template

Skill for the steganography infrastructure module providing QR code generation with dynamic mailto links, hash manifests, metadata payloads, and document-wide overlay processing. Use this module to insert opt-in cryptographic and steganographic provenance data onto PDFs.

infrastructure-skills

from docxology/template

Programmatic discovery of first-party agent SKILL.md files under configured public repo roots (infrastructure, projects, docs/prompts, and .cursor/skills). Use when enumerating skills, validating .cursor/skill_manifest.json, writing docs/_generated/skills_index.md, checking docs/prompts workflow contracts, or wiring editor automation. Exposes discover_skills, write_skill_manifest, manifest_matches_discovery, and check_skill_contracts.

infrastructure-search-literature

from docxology/template

Paperclip-style multi-source literature search across arXiv, Crossref, local JSON corpora, and (opt-in) the Paperclip API. Provides Paper/SearchQuery/SearchResult data models, a LiteratureClient aggregator with per-backend failure isolation, DOI/arXiv-aware deduplication via merge_papers, deterministic JSON caching via SearchCache, an HttpClient protocol for test injection, and a CLI (search/to-bibtex). Use when finding papers by topic, building reading lists, populating references.bib from a query, or replaying a prior search reproducibly.

infrastructure-search

from docxology/template

Discovery utilities for academic literature. Currently exposes the `literature` submodule — Paperclip-style multi-source search across arXiv, Crossref, local JSON corpora, and (opt-in) the Paperclip API, with deterministic JSON caching, a `LiteratureClient` aggregator, normalised `Paper` records, and a CLI. Use when the user wants to find papers, build reading lists, populate references.bib from a query, or replay a prior search reproducibly. Designed to host additional discovery workflows without breaking the public API.

infrastructure-scientific

from docxology/template

Skill for the scientific infrastructure module providing numerical stability checks, performance benchmarking, scientific documentation generation, implementation validation, and module/workflow templates. Use when benchmarking functions, checking numerical stability, validating scientific implementations, or creating scientific module scaffolds.

infrastructure-reporting

from docxology/template

Skill for the reporting infrastructure module providing pipeline reporting, error aggregation, executive summaries, dashboard generation, test reporting, and multi-project reports. Use when generating build reports, aggregating errors, creating visual dashboards, or producing executive summaries across projects.

infrastructure-rendering

from docxology/template

Skill for the rendering infrastructure module providing multi-format output generation including PDF manuscripts, HTML web pages, Beamer/Reveal.js slides, and posters. Use when rendering research outputs, converting markdown to PDF, generating slides, or configuring LaTeX rendering.

infrastructure-reference-citation

from docxology/template

BibTeX read/write/convert that matches the syntax/semantics of projects/template_code_project/manuscript/references.bib (consumed by Pandoc with --natbib -- see infrastructure/rendering/_pdf_combined_renderer.py). Provides BibEntry/BibDatabase models, parse_bibfile/render_database functions, paper_to_bibentry conversion from literature search results, generate_citation_key in the project's house style (firstauthorlastname+year+firsttitleword), LaTeX-special-character escape helpers, and a CLI (validate/format/convert). Use when reading or writing .bib files, exporting search results to BibTeX, or generating citation keys.

infrastructure-reference

from docxology/template

Bibliographic-reference utilities for research projects. Read, write, and convert BibTeX entries that match the syntax/semantics of projects/template_code_project/manuscript/references.bib (consumed by Pandoc with --natbib during PDF render -- see infrastructure/rendering/_pdf_combined_renderer.py). Currently exposes the `citation` submodule (BibTeX I/O + Paper→BibEntry conversion); designed to host additional reference workflows (e.g. CSL-JSON export, ORCID lookups) without breaking the public API.

infrastructure-publishing

from docxology/template

Skill for the publishing infrastructure module providing academic publishing workflows including BibTeX CLI citation generation, APA/MLA citation helper functions, DOI management, Zenodo publication, arXiv submission preparation, GitHub releases, and publication readiness validation. Use when publishing research, generating citations, minting DOIs, or preparing submissions.

infrastructure-prose

from docxology/template

Prose analysis utilities for research manuscripts and prose-focused projects. Provides readability metrics (Flesch, Flesch-Kincaid, Gunning Fog), heading-outline structural analysis, editorial quality flags (passive voice, hedge words, citation density, long sentences), aggregate ManuscriptReport across a manuscript directory, and a CLI (metrics/outline/quality/report). Use when analyzing manuscripts for readability, building editorial dashboards, validating heading structure, extracting citation keys from prose, or wiring prose-quality gates into the pipeline.

infrastructure-project

from docxology/template

Skill for the project management infrastructure module providing multi-project discovery, structure validation, and metadata extraction. Use when discovering active projects, validating project directory structure, or extracting project configuration metadata.