knowledge-extractor

Extract tribal knowledge from code, documentation, and commit history to preserve institutional memory

509 stars

Best use case

knowledge-extractor is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using knowledge-extractor can expect more consistent output, faster repeated runs, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/knowledge-extractor/SKILL.md --create-dirs "https://raw.githubusercontent.com/a5c-ai/babysitter/main/library/specializations/code-migration-modernization/skills/knowledge-extractor/SKILL.md"

Manual Installation

1. Download SKILL.md from GitHub.
2. Place it in .claude/skills/knowledge-extractor/SKILL.md inside your project.
3. Restart your AI agent; it will auto-discover the skill.

How knowledge-extractor Compares

Feature / Agent	knowledge-extractor	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Extract tribal knowledge from code, documentation, and commit history to preserve institutional memory

Where can I find the source code?

The source code is on GitHub in the a5c-ai/babysitter repository, under library/specializations/code-migration-modernization/skills/knowledge-extractor.

SKILL.md Source

# Knowledge Extractor Skill

Extracts tribal knowledge from code comments, commit messages, documentation, and other sources to preserve institutional memory during migration.

## Purpose

Enable knowledge preservation for:
- Comment analysis and extraction
- Commit message mining
- Documentation parsing
- Pattern recognition
- Business rule discovery

## Capabilities

### 1. Comment Analysis
- Extract TODO/FIXME comments
- Parse documentation comments
- Identify explanatory notes
- Find warning comments
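
The comment analysis described above can be approximated with a line-by-line scan. The following is a minimal sketch, not the skill's actual implementation: the function name `extract_comments`, the marker keyword list, and the choice of "next line of code" as context are all assumptions made for illustration. Record keys follow the `comments` entries in the Output Schema.

```python
import re
from typing import Dict, List

# Marker keywords mapped to the "type" values used in the Output Schema.
# The keyword list itself is an assumption for illustration.
MARKERS = {
    "TODO": "todo",
    "FIXME": "fixme",
    "NOTE": "note",
    "WARNING": "warning",
}

# Matches "#" or "//" comments whose text starts with one of the markers.
COMMENT_RE = re.compile(
    r"(?:#|//)\s*(TODO|FIXME|NOTE|WARNING)\b[:\s]*(.*)", re.IGNORECASE
)


def extract_comments(path: str, source: str) -> List[Dict[str, object]]:
    """Return one record per marker comment found in `source` (hypothetical helper)."""
    lines = source.splitlines()
    records: List[Dict[str, object]] = []
    for number, line in enumerate(lines, start=1):
        match = COMMENT_RE.search(line)
        if match is None:
            continue
        records.append(
            {
                "type": MARKERS[match.group(1).upper()],
                "file": path,
                "line": number,
                "content": match.group(2).strip(),
                # Assumption: context is simply the line that follows the comment.
                "context": lines[number].strip() if number < len(lines) else "",
            }
        )
    return records


sample = """\
def price(total):
    # TODO: confirm rounding rule with finance
    return round(total, 2)
// FIXME legacy path, remove after migration
"""
found = extract_comments("billing.py", sample)
for record in found:
    print(record)
```

A regex scan like this will also match markers that appear inside string literals; a language-aware tokenizer would be needed to avoid that.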

### 2. Commit Message Mining
- Extract rationale from commits
- Identify bug fix context
- Find feature explanations
- Track decision history
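
One way to mine commit messages is to read `git log` with explicit separators and tag each commit by keyword. This is a sketch under stated assumptions: the helper names, the keyword lists, and the use of the `context` field to hold tags are illustrative choices, not something the skill specifies. The parser is kept separate from the `git` call so it can be exercised on sample text.

```python
import re
import subprocess
from typing import Dict, List

# ASCII record/unit separators keep multi-line commit bodies parseable.
RECORD, FIELD = "\x1e", "\x1f"
GIT_FORMAT = "--pretty=format:%x1e%H%x1f%an%x1f%s%x1f%b%x1f"

# Keyword heuristics; both lists are assumptions for illustration.
BUGFIX_RE = re.compile(r"\b(fix(es|ed)?|bug|regression)\b", re.IGNORECASE)
RATIONALE_RE = re.compile(r"\b(because|so that|in order to|workaround)\b", re.IGNORECASE)


def read_git_log(repo: str, limit: int = 200) -> str:
    """Return raw `git log` output for `repo`, with changed file names per commit."""
    result = subprocess.run(
        ["git", "-C", repo, "log", "-n", str(limit), "--name-only", GIT_FORMAT],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_git_log(raw: str) -> List[Dict[str, object]]:
    """Turn separator-delimited log text into records shaped like the schema's commits."""
    commits: List[Dict[str, object]] = []
    for chunk in raw.split(RECORD):
        if not chunk.strip():
            continue
        commit_hash, author, subject, body, files = chunk.split(FIELD, 4)
        message = (subject + "\n\n" + body.strip()).strip()
        tags = []
        if BUGFIX_RE.search(subject):
            tags.append("bugfix")
        if RATIONALE_RE.search(message):
            tags.append("rationale")
        commits.append(
            {
                "hash": commit_hash.strip(),
                "message": message,
                "author": author,
                "context": ", ".join(tags),
                "relatedFiles": [f for f in files.splitlines() if f.strip()],
            }
        )
    return commits


# Hand-written sample in the same layout, so the parser runs without a repository.
sample = (
    "\x1eabc123\x1fDana\x1fFix rounding in invoice totals\x1f"
    "Half-up rounding is required because the ledger export rejects bankers rounding.\n\x1f"
    "\nbilling/invoice.py\nbilling/tests/test_invoice.py\n\n"
    "\x1edef456\x1fLee\x1fAdd CSV export\x1f\x1f\nexport/csv.py\n"
)
commits = parse_git_log(sample)
for commit in commits:
    print(commit["hash"], commit["context"], commit["relatedFiles"])
```

Against a real repository, `parse_git_log(read_git_log("."))` would feed the same parser.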

### 3. Documentation Parsing
- Parse markdown documentation
- Extract from wikis
- Process README files
- Catalog API docs
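
For Markdown sources such as README files, parsing can start by splitting a file into heading-delimited sections. The sketch below is illustrative only: the Output Schema leaves the shape of `documentation` entries undefined, so the `file`, `level`, `title`, and `body` keys are assumptions, as is the function name `split_sections`.

```python
import re
from typing import Dict, List

# ATX headings only ("# Title" through "###### Title").
HEADING_RE = re.compile(r"^(#{1,6})\s+(.+?)\s*$")
FENCE = "`" * 3  # code-fence marker, built this way to keep the example readable


def split_sections(path: str, text: str) -> List[Dict[str, object]]:
    """Split Markdown text into sections, ignoring '#' lines inside code fences."""
    sections = []
    current = {"file": path, "level": 0, "title": "", "body": []}
    in_fence = False
    for line in text.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        match = None if in_fence else HEADING_RE.match(line)
        if match:
            sections.append(current)
            current = {
                "file": path,
                "level": len(match.group(1)),
                "title": match.group(2),
                "body": [],
            }
        else:
            current["body"].append(line)
    sections.append(current)

    result: List[Dict[str, object]] = []
    for section in sections:
        body = "\n".join(section["body"]).strip()
        # Drop the empty preamble that precedes the first heading.
        if section["title"] or body:
            result.append({**section, "body": body})
    return result


sample = "\n".join(
    [
        "# Billing Service",
        "",
        "Handles invoices.",
        "",
        "## Rounding",
        "",
        "Totals use half-up rounding.",
        FENCE + "text",
        "# not a heading",
        FENCE,
    ]
)
sections = split_sections("README.md", sample)
for section in sections:
    print(section["level"], section["title"])
```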

### 4. Pattern Recognition
- Identify coding patterns
- Recognize idioms
- Detect conventions
- Map architectural patterns

### 5. Business Rule Extraction
- Find business logic comments
- Extract validation rules
- Identify calculation explanations
- Document edge cases

### 6. Glossary Generation
- Build domain vocabulary
- Define abbreviations
- Map term usage
- Create terminology guide
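
Abbreviation definitions are often written as "Long Form (ABBR)", which makes a simple first pass at glossary generation possible. This is a rough sketch under assumptions: the function name `build_glossary`, the `definition` and `uses` keys, and the initial-letter matching heuristic are illustrative, since the Output Schema only says `glossary` is an object.

```python
import re
from typing import Dict

ABBR_RE = re.compile(r"\(([A-Z]{2,6})\)")
WORD_RE = re.compile(r"[A-Za-z][A-Za-z-]*")


def build_glossary(text: str) -> Dict[str, Dict[str, object]]:
    """Map each abbreviation to its spelled-out form and a usage count."""
    glossary: Dict[str, Dict[str, object]] = {}
    for match in ABBR_RE.finditer(text):
        abbr = match.group(1)
        # Look at the words just before the parenthesis, one per letter.
        window = text[max(0, match.start() - 200): match.start()]
        words = WORD_RE.findall(window)[-len(abbr):]
        initials = "".join(word[0].upper() for word in words)
        # Accept the definition only when the initials spell the abbreviation.
        if initials == abbr:
            glossary[abbr] = {"definition": " ".join(words), "uses": 0}
    for abbr, entry in glossary.items():
        entry["uses"] = len(re.findall(rf"\b{abbr}\b", text))
    return glossary


sample = (
    "Each Service Level Agreement (SLA) is reviewed yearly. "
    "An SLA breach opens a Root Cause Analysis (RCA). "
    "See the appendix (ABC) for details."
)
glossary = build_glossary(sample)
for term, entry in glossary.items():
    print(term, entry)
```

The initials check deliberately skips parenthesized codes such as "(ABC)" above that have no matching long form; those would need a human or a domain dictionary to define.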

## Tool Integrations

| Tool | Purpose | Integration Method |
|------|---------|-------------------|
| Sourcegraph | Code search | API |
| GitHub API | Commit history | API |
| grep/ripgrep | Pattern search | CLI |
| Custom NLP | Text analysis | Library |
| Confluence API | Wiki extraction | API |

## Output Schema

```json
{
  "extractionId": "string",
  "timestamp": "ISO8601",
  "knowledge": {
    "comments": [
      {
        "type": "todo|fixme|note|warning|explanation",
        "file": "string",
        "line": "number",
        "content": "string",
        "context": "string"
      }
    ],
    "commits": [
      {
        "hash": "string",
        "message": "string",
        "author": "string",
        "context": "string",
        "relatedFiles": []
      }
    ],
    "documentation": [],
    "businessRules": [],
    "glossary": {}
  }
}
```
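
To make the schema concrete, here is a small sketch that assembles a document with this shape and runs a few sanity checks on it. The helper names `build_extraction` and `validate` are hypothetical, and using a UUID for `extractionId` is an assumption, since the schema only requires a string.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional

COMMENT_TYPES = {"todo", "fixme", "note", "warning", "explanation"}


def build_extraction(
    comments: Optional[List[Dict[str, Any]]] = None,
    commits: Optional[List[Dict[str, Any]]] = None,
    documentation: Optional[List[Any]] = None,
    business_rules: Optional[List[Any]] = None,
    glossary: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Assemble a result document with the top-level keys from the Output Schema."""
    return {
        "extractionId": str(uuid.uuid4()),  # assumption: any unique string would do
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "knowledge": {
            "comments": list(comments or []),
            "commits": list(commits or []),
            "documentation": list(documentation or []),
            "businessRules": list(business_rules or []),
            "glossary": dict(glossary or {}),
        },
    }


def validate(doc: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the basic checks passed."""
    problems = []
    for key in ("extractionId", "timestamp", "knowledge"):
        if key not in doc:
            problems.append(f"missing {key}")
    for index, comment in enumerate(doc.get("knowledge", {}).get("comments", [])):
        if comment.get("type") not in COMMENT_TYPES:
            problems.append(f"comments[{index}]: unknown type {comment.get('type')!r}")
        if not isinstance(comment.get("line"), int):
            problems.append(f"comments[{index}]: line must be a number")
    return problems


doc = build_extraction(
    comments=[
        {
            "type": "todo",
            "file": "billing.py",
            "line": 2,
            "content": "confirm rounding rule with finance",
            "context": "return round(total, 2)",
        }
    ]
)
print(json.dumps(doc, indent=2))
print(validate(doc))
```

Note that `"line": "number"` in the schema describes the field's type, so the sketch stores an integer rather than a string.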

## Integration with Migration Processes

- **legacy-codebase-assessment**: Knowledge discovery
- **documentation-migration**: Source material

## Related Skills

- `legacy-code-interpreter`: Code understanding
- `documentation-generator`: Doc creation

## Related Agents

- `legacy-system-archaeologist`: Uses for excavation
- `documentation-migration-agent`: Uses for doc creation

Related Skills

mock-spec-extractor

509

from a5c-ai/babysitter

Extracts design specifications from mock images including colors, typography, spacing, and component details

contract-extractor

509

from a5c-ai/babysitter

Extracts key terms from contracts, identifies risks, flags unusual provisions

knowledge-analytics

509

from a5c-ai/babysitter

Knowledge base analytics, usage reporting, and effectiveness measurement

domain-model-extractor

509

from a5c-ai/babysitter

Extract domain models from monolithic codebases using DDD principles for microservices decomposition

knowledge-curation

509

from a5c-ai/babysitter

Context priming before work (bd prime) and self-reflection after completion to extract patterns, gotchas, and decisions into the knowledge base.

knowledge-graph-management

509

from a5c-ai/babysitter

Capture, validate, query, and sync architectural patterns and design decisions in the knowledge graph

cog-knowledge-consolidation

509

from a5c-ai/babysitter

Build structured knowledge frameworks from scattered vault notes with source attribution

process-builder

509

from a5c-ai/babysitter

Scaffold new babysitter process definitions following SDK patterns, proper structure, and best practices. Guides the 3-phase workflow from research to implementation.

Workflow & Productivity

babysitter

509

from a5c-ai/babysitter

Orchestrate via @babysitter. Use this skill when asked to babysit a run, orchestrate a process or whenever it is called explicitly. (babysit, babysitter, orchestrate, orchestrate a run, workflow, etc.)

yolo

509

from a5c-ai/babysitter

Run Babysitter autonomously with minimal manual interruption.

user-install

509

from a5c-ai/babysitter

Install the user-level Babysitter Codex setup.

team-install

509

from a5c-ai/babysitter

Install the team-pinned Babysitter Codex workspace setup.