markdown-consolidator
Intelligent consolidation and synthesis of multiple markdown files with overlapping content and different update dates. Use when: (1) Multiple AI-generated markdown files need merging, (2) Knowledge bases have fragmented or duplicate content, (3) Documentation requires recency-aware synthesis, (4) Supporting documents need re-synthesis after AI task completion, (5) Project documentation has semantic overlap across files, (6) Periodic knowledge base maintenance and deduplication is needed.
Best use case
markdown-consolidator is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Intelligent consolidation and synthesis of multiple markdown files with overlapping content and different update dates. Use when: (1) Multiple AI-generated markdown files need merging, (2) Knowledge bases have fragmented or duplicate content, (3) Documentation requires recency-aware synthesis, (4) Supporting documents need re-synthesis after AI task completion, (5) Project documentation has semantic overlap across files, (6) Periodic knowledge base maintenance and deduplication is needed.
Teams using markdown-consolidator should expect a more consistent output, faster repeated execution, less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in
.claude/skills/markdown-consolidator/SKILL.mdinside your project - Restart your AI agent — it will auto-discover the skill
How markdown-consolidator Compares
| Feature / Agent | markdown-consolidator | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Intelligent consolidation and synthesis of multiple markdown files with overlapping content and different update dates. Use when: (1) Multiple AI-generated markdown files need merging, (2) Knowledge bases have fragmented or duplicate content, (3) Documentation requires recency-aware synthesis, (4) Supporting documents need re-synthesis after AI task completion, (5) Project documentation has semantic overlap across files, (6) Periodic knowledge base maintenance and deduplication is needed.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Markdown Consolidator
Consolidate and synthesize multiple markdown files with intelligent handling of overlapping content, different update dates, and semantic deduplication.
## Core Problem
AI-assisted workflows generate fragmented documentation:
- Each AI session creates task-specific markdown files
- AI references supporting docs but doesn't update them post-task
- Knowledge becomes scattered across files with overlapping content
- Different timestamps make version reconciliation complex
## Workflow Overview
```
1. ANALYZE → Inventory files, extract metadata, identify relationships
2. CLUSTER → Group semantically related files using content analysis
3. PLAN → Create merge strategy based on recency, overlap, authority
4. SYNTHESIZE → Merge content with intelligent conflict resolution
5. VALIDATE → Verify completeness and coherence of output
```
## Analysis Phase
### Step 1: File Inventory
Run the inventory script to analyze all markdown files:
```bash
python scripts/inventory.py <directory> --output inventory.json
```
The script extracts:
- File paths and sizes
- Modification timestamps (file system and YAML frontmatter)
- Section headers (H1-H6 structure)
- Word/token counts per section
- Internal links (`[[wikilinks]]` and `[markdown](links)`)
- YAML frontmatter metadata
- Content fingerprints for similarity detection
### Step 2: Relationship Mapping
```bash
python scripts/analyze_relationships.py inventory.json --output relationships.json
```
Identifies:
- **Semantic clusters**: Files covering similar topics (via TF-IDF/embedding similarity)
- **Temporal chains**: Files that evolved from each other (via timestamp + similarity)
- **Reference graphs**: Which files reference which (via link analysis)
- **Conflict zones**: Sections with contradictory or overlapping content
## Clustering Phase
### Clustering Strategies
Choose based on your consolidation goal:
**Topic-based clustering** (default)
Groups files by semantic similarity of content.
```bash
python scripts/cluster.py relationships.json --method topic --threshold 0.6
```
**Temporal clustering**
Groups files by modification date ranges.
```bash
python scripts/cluster.py relationships.json --method temporal --window 7d
```
**Hierarchical clustering**
Groups by directory structure + content similarity.
```bash
python scripts/cluster.py relationships.json --method hierarchical
```
### Cluster Output
Creates `clusters.json` with structure:
```json
{
"clusters": [
{
"id": "cluster_001",
"theme": "API Authentication",
"files": ["auth-design.md", "oauth-notes.md", "token-handling.md"],
"primary_file": "auth-design.md",
"overlap_score": 0.72,
"conflicts": ["token-handling.md:L45 vs oauth-notes.md:L23"]
}
]
}
```
## Planning Phase
### Merge Strategy Selection
**Authority-based** (recommended for documentation)
- Most recent file is authoritative for conflicts
- Older unique content is preserved with attribution
- Use when files represent evolving understanding
**Comprehensive** (for knowledge bases)
- Union of all unique information
- Conflicts flagged for manual review
- Use when completeness matters more than consistency
**Canonical** (for specifications)
- Designate one file as canonical
- Others provide supplementary/historical context
- Use when single source of truth is required
### Create Merge Plan
```bash
python scripts/plan_merge.py clusters.json --strategy authority --output merge_plan.json
```
Generates actionable merge plan:
```json
{
"cluster_id": "cluster_001",
"output_file": "consolidated/authentication.md",
"sections": [
{
"heading": "## Overview",
"sources": [{"file": "auth-design.md", "lines": "1-25", "action": "primary"}],
"conflicts": []
},
{
"heading": "## Token Handling",
"sources": [
{"file": "token-handling.md", "lines": "10-45", "action": "primary"},
{"file": "oauth-notes.md", "lines": "20-35", "action": "supplement"}
],
"conflicts": [
{
"description": "Token expiry differs: 24h vs 1h",
"resolution": "Use most recent (token-handling.md: 24h)"
}
]
}
]
}
```
## Synthesis Phase
### Execute Merge
```bash
python scripts/synthesize.py merge_plan.json --output consolidated/
```
The synthesizer:
1. Creates section-by-section merged content
2. Preserves original attribution via HTML comments
3. Resolves conflicts per strategy
4. Maintains internal link consistency
5. Updates frontmatter with merge metadata
### Synthesis Rules
**Content Deduplication**
- Exact duplicates: Remove, keep first occurrence
- Near duplicates (>80% similarity): Merge, note sources
- Partial overlap: Keep both with clear section breaks
**Conflict Resolution**
```
Authority strategy:
1. Prefer most recently modified source
2. Prefer explicitly dated content over undated
3. Prefer longer/more detailed explanations
4. Flag unresolvable conflicts for review
Comprehensive strategy:
1. Include all non-contradictory content
2. Present conflicts as "Version A / Version B" blocks
3. Add TODO markers for manual resolution
```
**Link Handling**
- Internal links updated to point to consolidated files
- Broken links flagged with `<!-- BROKEN: original-target.md -->`
- External links preserved as-is
### Output Format
Consolidated files include:
```markdown
---
title: Authentication System
consolidated_from:
- file: auth-design.md
modified: 2024-12-01T10:30:00
- file: oauth-notes.md
modified: 2024-11-28T15:45:00
- file: token-handling.md
modified: 2024-12-02T09:00:00
consolidated_at: 2024-12-03T14:00:00
strategy: authority
---
# Authentication System
<!-- SOURCE: auth-design.md:1-25 -->
## Overview
...
<!-- SOURCE: token-handling.md:10-45, SUPPLEMENTED: oauth-notes.md:20-35 -->
## Token Handling
...
<!-- CONFLICT RESOLVED: Used token-handling.md (most recent) -->
Token expiry is set to 24 hours...
```
## Validation Phase
```bash
python scripts/validate.py consolidated/ --original <source_dir>
```
Validates:
- **Completeness**: All source content represented or explicitly excluded
- **Link integrity**: All internal links resolve
- **Coherence**: No contradictions in final output
- **Metadata**: Proper attribution and timestamps
Generates `validation_report.md`:
```markdown
## Consolidation Validation Report
### Coverage
- 47/47 source files processed
- 3 files excluded (empty/invalid)
- 12 clusters created
- 8 consolidated files produced
### Content Coverage
- 98.3% of source content preserved
- 1.7% deduplicated (exact matches)
- 5 conflicts resolved automatically
- 2 conflicts flagged for review
### Issues
- [ ] REVIEW: consolidated/auth.md:L145 - conflicting token formats
- [ ] REVIEW: consolidated/api.md:L67 - unclear which version is correct
```
## Quick Start
For immediate consolidation of a directory:
```bash
# Full pipeline
python scripts/consolidate.py <source_dir> <output_dir> --strategy authority
# This runs: inventory → analyze → cluster → plan → synthesize → validate
```
## Advanced: Incremental Updates
For ongoing maintenance:
```bash
# Detect changes since last consolidation
python scripts/detect_changes.py <source_dir> --since "2024-12-01"
# Re-consolidate only affected clusters
python scripts/consolidate.py <source_dir> <output_dir> --incremental
```
## Configuration
Create `.consolidator.yaml` in project root:
```yaml
# Files/directories to exclude
exclude:
- "**/archive/**"
- "**/.obsidian/**"
- "**/templates/**"
# Similarity threshold for clustering (0-1)
similarity_threshold: 0.6
# Default merge strategy
default_strategy: authority
# Preserve original files
keep_originals: true
archive_path: .consolidated-archive/
# Frontmatter fields to preserve
preserve_frontmatter:
- tags
- aliases
- created
# Output format
output:
add_source_comments: true
add_merge_frontmatter: true
update_internal_links: true
```
## Integration Patterns
### With Claude Code Sessions
Add to your CLAUDE.md:
```markdown
## Post-Task Consolidation
After completing any task that creates or modifies markdown files:
1. Run `/project:consolidate` to update knowledge base
2. Review flagged conflicts in validation report
3. Archive original files if consolidation successful
```
### With Basic Memory MCP
The consolidator can output in Basic Memory format:
```bash
python scripts/synthesize.py merge_plan.json --format basic-memory
```
Outputs files with observation/relation syntax compatible with Basic Memory's knowledge graph.
## Reference Documentation
- [ALGORITHMS.md](references/ALGORITHMS.md) - Detailed similarity/clustering algorithms
- [CONFLICT-RESOLUTION.md](references/CONFLICT-RESOLUTION.md) - Conflict handling patterns
- [INTEGRATION.md](references/INTEGRATION.md) - Integration with other toolsRelated Skills
using-markdown-new
This skill should be used when the user asks to "fetch a website", "get webpage content", "scrape a URL", "download HTML", mentions "WebFetch", or needs to retrieve web content for analysis. Teaches Claude to use markdown.new service instead of direct HTML fetching for ~80% token reduction.
obsidian-markdown
Create and edit Obsidian Flavored Markdown with wikilinks, embeds, callouts, properties, and other Obsidian-specific syntax. Use when working with .md files in Obsidian, or when the user mentions wikilinks, callouts, frontmatter, tags, embeds, or Obsidian notes.
markdowntown-atlas-scan
Atlas Simulator scan flow and next-step guidance for markdowntown. Use when working on folder scanning, tool detection, cwd handling, results panels, or scan-to-workbench CTAs.
markdown-mdx
Advanced Markdown and MDX processing for technical documentation. Parse, validate, lint, and transform Markdown content with support for MDX components, front matter, and remark/rehype plugins.
Markdown Export
Specialist in generating comprehensive Markdown reports of the knowledge model.
markdown-drafts
Use markdown formatting when drafting content intended for external systems (GitHub issues/PRs, Jira tickets, wiki pages, design docs, etc.) so formatting is preserved when the user copies it. Load this skill before producing any draft the user will paste elsewhere.
fix-markdown
Fix lint, formatting, and prose issues in markdown files using Prettier and Vale. Use when the user or agent needs to fix lint, formatting, and prose issues in markdown files.
markdown-exporter
Markdown exporter for transform Markdown text to DOCX, PPTX, XLSX, PDF, PNG, HTML, MD, CSV, JSON, JSONL, XML, Mermaid files, and extract code blocks in Markdown to Python, Bash,JS and etc files. Also known as the md_exporter skill.
adding-markdown-highlighted-comments
Use when adding responses to markdown documents with user-highlighted comments, encountering markup errors, or unsure about mark tag placement - ensures proper model-highlight formatting with required attributes and correct placement within markdown elements
bgo
Automates the complete Blender build-go workflow, from building and packaging your extension/add-on to removing old versions, installing, enabling, and launching Blender for quick testing and iteration.
ai-search-technical-auditor
Audit front-end code for AI search readiness. Use when reviewing HTML structure, meta tags, schema markup, and technical elements that affect how AI crawlers understand and index web pages.
ai-output-validator
AI出力の品質を自動検証するスキル。事実確認、論理性、一貫性、幻覚(ハルシネーション)検出、バイアス分析、安全性チェックを実施し、改善提案を提供。