link-auditor

Hugo site link health: scan markdown, build link graph, validate paths.

290 stars

by notque

Best use case

link-auditor is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using link-auditor should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/link-auditor/SKILL.md --create-dirs "https://raw.githubusercontent.com/notque/claude-code-toolkit/main/skills/link-auditor/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/link-auditor/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How link-auditor Compares

| Feature / Agent | link-auditor | Standard Approach |
|-----------------|--------------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

Hugo site link health: scan markdown, build link graph, validate paths.

Where can I find the source code?

The source code is on GitHub in the notque/claude-code-toolkit repository, under skills/link-auditor/.

SKILL.md Source

# Link Auditor Skill

Hugo site link health analysis through a 4-phase pipeline: Scan, Analyze, Validate, Report. Extracts internal, external, and image links from Hugo markdown content; builds an adjacency matrix of internal link relationships; identifies orphan pages, under-linked pages, link sinks, and hub pages; validates that link targets resolve to real files; and generates audit reports with actionable fix suggestions.

Read and follow the repository CLAUDE.md before starting any audit.

## Instructions

### Phase 1: SCAN

**Goal**: Extract all links from every markdown file and classify them by type.

**Step 1: Identify content root**

Scan all markdown files in `content/` because even small sites with 10 posts can have orphan pages, and partial scans miss graph-level issues. Locate the Hugo content directory and enumerate all markdown files:

```bash
# TODO: scripts/link_scanner.py not yet implemented
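# Manual alternative: list markdown links and image references with file and line number
grep -rnE '!?\[[^]]*\]\([^)]+\)' ~/your-blog/content/ --include="*.md"
# Hugo ref and figure shortcodes need a separate pattern
grep -rnE '\{\{< *(ref|figure) ' ~/your-blog/content/ --include="*.md"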
```

**Step 2: Extract links by type**

Parse each markdown file for three link categories. Classify by type to understand link distribution:

Internal Links:
- `[text](/posts/slug/)` -- absolute internal path
- `[text](../other-post/)` -- relative path
- `[text](/categories/tech/)` -- taxonomy pages
- `{{< ref "posts/slug.md" >}}` -- Hugo ref shortcode

External Links:
- `[text](https://example.com/path)`
- `[text](http://example.com/path)`

Image Links:
- `![alt](/images/filename.png)` -- static path
- `![alt](images/filename.png)` -- relative path
- `{{< figure src="/images/file.png" >}}` -- Hugo shortcode
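
The three categories above could be separated with a few regular expressions. The following is a minimal sketch, not the skill's own scanner (which the source marks as not yet implemented); the function name `classify_links` and the patterns are assumptions for illustration, and anchors or `mailto:` targets are not handled.

```python
import re

# Inline markdown links and images: optional "!" followed by [text](target)
MD_LINK = re.compile(r'(!?)\[([^\]]*)\]\(([^)\s]+)[^)]*\)')
# Hugo shortcodes: {{< ref "path" >}} and {{< figure src="path" >}}
REF_SHORTCODE = re.compile(r'\{\{<\s*ref\s+"([^"]+)"\s*>\}\}')
FIGURE_SHORTCODE = re.compile(r'\{\{<\s*figure\s+[^>]*?src="([^"]+)"')


def classify_links(markdown: str) -> list[tuple[str, str]]:
    """Return (category, target) pairs for one markdown document (hypothetical helper)."""
    found = []
    for bang, _text, target in MD_LINK.findall(markdown):
        if target.startswith("{{"):
            continue  # shortcode used as a link target; picked up by the patterns below
        if bang:
            found.append(("image", target))
        elif target.startswith(("http://", "https://")):
            found.append(("external", target))
        else:
            found.append(("internal", target))
    found.extend(("internal", t) for t in REF_SHORTCODE.findall(markdown))
    found.extend(("image", t) for t in FIGURE_SHORTCODE.findall(markdown))
    return found
```

Keeping the shortcode patterns separate from the inline-link pattern makes it easier to extend the scan to other shortcodes a site may use.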

**Step 3: Tally link counts per file**

Record total internal, external, and image links per file for the summary.
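
One way to keep the per-file tally is a `Counter` per markdown file. This sketch assumes a simplified inline-link pattern (shortcodes are ignored here) and a hypothetical helper name, `tally_links`.

```python
import re
from collections import Counter
from pathlib import Path

# Simplified pattern: optional "!" then [text](target); Hugo shortcodes are not counted here
LINK = re.compile(r'(!?)\[[^\]]*\]\(([^)\s]+)')


def tally_links(content_root: Path) -> dict[str, Counter]:
    """Map each markdown file (path relative to content_root) to per-type link counts."""
    tallies = {}
    for md_file in sorted(content_root.rglob("*.md")):
        counts = Counter(internal=0, external=0, image=0)
        for bang, target in LINK.findall(md_file.read_text(encoding="utf-8")):
            if bang:
                counts["image"] += 1
            elif target.startswith(("http://", "https://")):
                counts["external"] += 1
            else:
                counts["internal"] += 1
        tallies[md_file.relative_to(content_root).as_posix()] = counts
    return tallies
```

Summing the counters across files would give the totals for the report's scan summary.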

**Gate**: All markdown files scanned. Link extraction complete with counts by type. Proceed only when gate passes.

### Phase 2: ANALYZE

**Goal**: Build internal link graph and compute structural metrics.

**Step 1: Build adjacency matrix**

Always build the adjacency matrix and compute inbound link counts because orphan pages are invisible to search crawlers and this is often the highest-impact finding an audit produces. Map every internal link to its source and target:

```
Page A -> Page B (A links to B)
Page A -> Page C
Page B -> Page D
Page C -> (no outbound)
Page E -> (no outbound, no inbound = orphan)
```
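
The mapping above can be held as two dictionaries of sets, one for outbound and one for inbound edges. This is a minimal sketch using the example pages A to E; `build_graph` is a hypothetical name, and self-links and links to unknown pages are dropped as an assumption.

```python
def build_graph(pages, links):
    """pages: iterable of page ids; links: iterable of (source, target) pairs."""
    outbound = {page: set() for page in pages}
    inbound = {page: set() for page in pages}
    for source, target in links:
        # Ignore self-links and links whose endpoints are not known pages
        if source in outbound and target in inbound and source != target:
            outbound[source].add(target)
            inbound[target].add(source)
    return outbound, inbound


pages = ["A", "B", "C", "D", "E"]
links = [("A", "B"), ("A", "C"), ("B", "D")]
outbound, inbound = build_graph(pages, links)
inbound_counts = {page: len(sources) for page, sources in inbound.items()}
print(inbound_counts)  # {'A': 0, 'B': 1, 'C': 1, 'D': 1, 'E': 0}
```

Note that Page A also has zero inbound links in this example; whether an entry page such as the home page should count as an orphan is a decision for the audit.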

**Step 2: Compute graph metrics**

| Metric | Definition | SEO Impact |
|--------|------------|------------|
| Orphan Pages | 0 inbound internal links | Critical -- invisible to crawlers |
| Under-Linked | < N inbound links (default 2, adjustable with --min-inbound N) | Missed SEO opportunity |
| Link Sinks | Receives links, no outbound | May indicate incomplete content |
| Hub Pages | Many outbound links | Good for navigation |
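
Given outbound and inbound edge sets per page, the four metrics reduce to simple filters. In this sketch `graph_metrics` and `hub_threshold` are assumptions: the table only says "many outbound links", so the hub cutoff is an arbitrary illustrative default, and orphans are excluded from the under-linked list to avoid reporting them twice.

```python
def graph_metrics(outbound, inbound, min_inbound=2, hub_threshold=5):
    """Classify pages from dicts mapping page id -> set of linked page ids."""
    return {
        # No inbound internal links at all
        "orphans": sorted(p for p in inbound if not inbound[p]),
        # At least one inbound link, but fewer than the minimum
        "under_linked": sorted(p for p in inbound if 0 < len(inbound[p]) < min_inbound),
        # Receives links but has no outbound links
        "sinks": sorted(p for p in inbound if inbound[p] and not outbound[p]),
        # Outbound link count at or above the (assumed) hub cutoff
        "hubs": sorted(p for p in outbound if len(outbound[p]) >= hub_threshold),
    }


outbound = {"A": {"B", "C"}, "B": {"D"}, "C": set(), "D": set(), "E": set()}
inbound = {"A": set(), "B": {"A"}, "C": {"A"}, "D": {"B"}, "E": set()}
metrics = graph_metrics(outbound, inbound, min_inbound=2, hub_threshold=2)
```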

**Step 3: Classify findings by severity**

Clearly distinguish critical issues from suggestions because they require different urgency levels. Organize all findings by impact:

- **Critical**: Orphan pages, broken internal links, missing images
- **Warning**: Under-linked pages, link sinks
- **Info**: Hub pages, external link stats

**Gate**: Adjacency matrix built. All pages classified with inbound/outbound counts. Proceed only when gate passes.

### Phase 3: VALIDATE

**Goal**: Verify link targets resolve to real files or live URLs.

**Step 1: Validate internal links**

Hugo resolves paths through multiple conventions, so try every resolution before reporting an internal link as broken. For each internal link target:

1. Parse the link target path
2. Try Hugo path resolutions: `content/posts/slug.md`, `content/posts/slug/index.md`, `content/posts/slug/_index.md`
3. Mark as broken only if ALL resolutions fail
4. Record source file and line number for broken links
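
The resolution steps above can be sketched as follows for absolute targets such as `/posts/slug/`. The helper names are hypothetical, relative targets would first need to be joined with the source file's directory, and generated pages such as taxonomy listings may have no content file at all, so they likely need separate handling.

```python
from pathlib import Path
from typing import Optional


def candidate_paths(content_root: Path, link_target: str) -> list[Path]:
    """List the content files an absolute internal link could resolve to, in order."""
    # Drop any fragment or query string, then the surrounding slashes
    slug = link_target.split("#", 1)[0].split("?", 1)[0].strip("/")
    if not slug:
        return [content_root / "_index.md"]
    base = content_root / slug
    return [
        base.with_name(base.name + ".md"),  # content/posts/slug.md
        base / "index.md",                  # content/posts/slug/index.md
        base / "_index.md",                 # content/posts/slug/_index.md
    ]


def resolve_internal(content_root: Path, link_target: str) -> Optional[Path]:
    """Return the first existing candidate, or None if every resolution fails."""
    for candidate in candidate_paths(content_root, link_target):
        if candidate.is_file():
            return candidate
    return None
```

A link is reported as broken only when `resolve_internal` returns `None`.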

**Step 2: Validate image paths**

Check all image paths against `static/` because missing images are critical issues. Validate both absolute and relative interpretations:

1. Parse image source path (absolute or relative)
2. Map to static/ directory, checking both absolute and relative interpretations
3. Check file exists
4. Record source file and line number for missing images
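
A sketch of the image check, under two assumptions that are not stated in the source: the site root contains `static/` next to `content/`, and a relative path may also point at a file beside the page itself. `resolve_image` is a hypothetical helper name.

```python
from pathlib import Path
from typing import Optional


def resolve_image(site_root: Path, source_file: Path, src: str) -> Optional[Path]:
    """Return the image file that src resolves to, or None if it is missing."""
    # Interpretation 1: path relative to static/ (covers both "/images/x.png" and "images/x.png")
    candidates = [site_root / "static" / src.lstrip("/")]
    if not src.startswith("/"):
        # Interpretation 2 (assumption): path relative to the markdown file's directory
        candidates.append(source_file.parent / src)
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

When the function returns `None`, the caller records the source file and line number as a missing image.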

**Step 3: Validate external links (optional)**

Skip external URL validation by default because network latency, rate limiting, and bot-blocking make results unreliable. Run it only when explicitly enabled with the `--check-external` flag. When enabled, follow these steps:

1. HTTP HEAD request to URL
2. Follow redirects (up to 3)
3. Check response status code
4. Report known bot-blocked sites as "blocked (expected)", not broken: LinkedIn (403), Twitter/X (403/999), and Facebook actively block automated requests even though the links work fine in browsers
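
The steps above could look roughly like this with the standard library. Everything named here is an assumption for illustration: `BLOCKED_HOSTS` stands in for the false-positives list, `head_status` and `classify_external` are hypothetical helpers, and the redirect cap relies on `urllib`'s `max_redirections` attribute, which limits redirects only approximately.

```python
import urllib.error
import urllib.request
from urllib.parse import urlsplit

# Illustrative stand-in for the false-positives list (assumption, not the skill's data)
BLOCKED_HOSTS = {"linkedin.com", "twitter.com", "x.com", "facebook.com"}


class LimitedRedirects(urllib.request.HTTPRedirectHandler):
    max_redirections = 3  # stop following after roughly three redirects


def head_status(url, timeout=5.0):
    """Send a HEAD request; return the HTTP status code, or None on network failure."""
    request = urllib.request.Request(url, method="HEAD")
    opener = urllib.request.build_opener(LimitedRedirects)
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx responses still carry a status code
    except OSError:
        return None  # DNS failure, refused connection, timeout


def classify_external(url, status):
    """Turn a status code into 'ok', 'blocked (expected)', or 'broken'."""
    host = (urlsplit(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    if status is not None and 200 <= status < 400:
        return "ok"
    if host in BLOCKED_HOSTS:
        return "blocked (expected)"
    return "broken"
```

Keeping the classification separate from the network call means it can be tested without touching the network, for example `classify_external(url, head_status(url))` in the audit and fixed status codes in tests.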

Use `--verbose` to include valid links in the output (default: issues only).

**Gate**: All link targets checked. Broken links have file and line numbers. External results (if enabled) distinguish real failures from false positives. Proceed only when gate passes.

### Phase 4: REPORT

**Goal**: Present findings in a structured, actionable audit report.

Never modify content files during this phase because users must approve all content changes. Report findings with specific suggestions and let the user decide which fixes to apply.

**Step 1: Generate summary header**

```
===============================================================
 LINK AUDIT: ~/your-blog/content/
===============================================================

 SCAN SUMMARY:
   Posts scanned: 15
   Internal links: 42
   External links: 28
   Image references: 12
```

**Step 2: Report by severity**

List critical issues first (orphans, broken links, missing images), then warnings (under-linked, sinks), then info (hubs, valid external counts). Show all findings without summarizing or abbreviating because partial issue lists hide problems.

Each issue must include:
- File path
- Line number (for broken links and missing images)
- Specific suggestion for resolution
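
A small record type can enforce that every finding carries these fields and that critical issues are listed first. The `Issue` class, its field names, and the output format below are assumptions for illustration, not the skill's report format.

```python
from dataclasses import dataclass
from typing import Optional

# Lower rank sorts first: critical, then warning, then info
SEVERITY_ORDER = {"critical": 0, "warning": 1, "info": 2}


@dataclass
class Issue:
    severity: str
    kind: str
    file_path: str
    suggestion: str
    line: Optional[int] = None  # only set for broken links and missing images

    def render(self) -> str:
        location = self.file_path if self.line is None else f"{self.file_path}:{self.line}"
        return f"[{self.severity.upper()}] {self.kind}: {location} -> {self.suggestion}"


def ordered_report(issues):
    """Render every issue, most severe first; sorted() keeps the original order within a level."""
    ranked = sorted(issues, key=lambda issue: SEVERITY_ORDER[issue.severity])
    return [issue.render() for issue in ranked]
```

Because nothing is filtered or truncated, the rendered list contains every finding, in line with the requirement to show all issues.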

**Step 3: Generate recommendations**

Conclude with numbered, actionable recommendations ordered by impact:

```
===============================================================
 RECOMMENDATIONS:
   1. Add internal links to 2 orphan pages
   2. Fix 1 broken internal link in /posts/example.md line 45
   3. Update or remove 1 dead external link
   4. Add missing image or fix path in /posts/images.md line 12
===============================================================
```

Always run the full 4-phase audit regardless of how few issues appear because link rot is progressive and orphan pages are invisible without graph analysis.

**Gate**: Report generated with all findings. Every issue has a file path and actionable suggestion. Audit is complete.

---

## Error Handling

### Error: "No markdown files found"
Cause: Wrong directory path or empty content root
Solution:
1. Verify the content/ directory exists at the given path
2. Check that .md files exist (not just subdirectories)
3. Confirm the path is the Hugo content root, not the project root

### Error: "External validation timeout"
Cause: Target site is slow, blocking requests, or unreachable
Solution:
1. Check if the site is in the known false-positives list (LinkedIn, Twitter)
2. Add persistently failing sites to the false-positives list
3. Use a shorter timeout, such as `--timeout 5`, for slow sites

### Error: "Image path ambiguous"
Cause: Path could be relative or absolute, unclear resolution
Solution:
1. The scanner checks both interpretations automatically
2. Report shows which interpretation was attempted
3. Verify the Hugo site's static directory structure matches expectations

---

## References

- `${CLAUDE_SKILL_DIR}/references/link-graph-metrics.md`: Graph metrics definitions and SEO impact
- `${CLAUDE_SKILL_DIR}/references/false-positives.md`: Sites known to block validation requests
- `${CLAUDE_SKILL_DIR}/references/fix-strategies.md`: Resolution strategies for each issue type

Related Skills

All of the following are from notque/claude-code-toolkit.

image-auditor: Non-destructive image validation for accessibility and health.
cron-job-auditor: Audit cron scripts for reliability and safety.
x-api: Post tweets, build threads, upload media via the X API.
worktree-agent: Mandatory rules for agents in git worktree isolation.
workflow: Structured multi-phase workflows: review, debug, refactor, deploy, create, research, and more.
workflow-help: Interactive guide to workflow system: agents, skills, routing, execution patterns.
wordpress-uploader: WordPress REST API integration for posts and media uploads.
wordpress-live-validation: Validate published WordPress posts in browser via Playwright.
with-anti-rationalization: Anti-rationalization enforcement for maximum-rigor task execution.
voice-writer: Unified voice content generation pipeline with mandatory validation and joy-check. 8-phase pipeline: LOAD, GROUND, GENERATE, VALIDATE, REFINE, JOY-CHECK, OUTPUT, CLEANUP. Use when writing articles, blog posts, or any content that uses a voice profile. Use for "write article", "blog post", "write in voice", "generate content", "draft article", "write about".
voice-validator: Critique-and-rewrite loop for voice fidelity validation.
vitest-runner: Run Vitest tests and parse results into actionable output.