generate-knowledge-base
Generate a product knowledge base from a codebase. Analyzes source code to create an Obsidian vault with architecture docs, API references, domain logic, data models, and infrastructure documentation. Use when the user asks to document a codebase, create a knowledge base, or generate product docs.
Best use case
generate-knowledge-base is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using generate-knowledge-base should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/generate-knowledge-base/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
How generate-knowledge-base Compares
| Feature / Agent | generate-knowledge-base | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Generate a product knowledge base from a codebase. Analyzes source code to create an Obsidian vault with architecture docs, API references, domain logic, data models, and infrastructure documentation. Use when the user asks to document a codebase, create a knowledge base, or generate product docs.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Generate Product Knowledge Base

You are generating a comprehensive product knowledge base from source code analysis. The output is an Obsidian vault with interconnected documents covering architecture, data models, APIs, business domains, and infrastructure.

## Before You Start

Read these reference files to understand the expected output format and quality criteria:

- `references/document-formats.md`: the 4-part document structure with examples
- `references/category-patterns.md`: where to find information for each tech stack
- `references/quality-checklist.md`: self-review criteria for every document

## Workflow

Execute these steps in order. Do not skip steps. Wait for user approval at Step 2 before generating documents.

### Step 1: Setup & Discovery

Gather project information:

1. **Product name**: Ask the user for the product/project name. Use it in all generated doc titles and references.
2. **Codebase path**: Use `$ARGUMENTS` if provided, otherwise ask the user. Resolve to an absolute path. Verify the directory exists.
3. **Output directory**: Ask where to write the vault. Default: a sibling directory named `<product>-knowledge/` next to the codebase.

Detect the tech stack:

1. Glob for marker files at the codebase root and one level deep:
   - `package.json`, `tsconfig.json` → JavaScript/TypeScript
   - `requirements.txt`, `pyproject.toml`, `setup.py`, `Pipfile` → Python
   - `pom.xml`, `build.gradle`, `build.gradle.kts` → Java/Kotlin
   - `go.mod` → Go
   - `Cargo.toml` → Rust
   - `Gemfile` → Ruby
   - `composer.json` → PHP
   - `mix.exs` → Elixir
   - `*.sln`, `*.csproj` → C#/.NET
2. Read each detected marker file to identify specific frameworks:
   - `package.json` → check `dependencies` for `next`, `express`, `nestjs`, `react`, etc.
   - `requirements.txt` / `pyproject.toml` → check for `django`, `fastapi`, `flask`, etc.
   - `build.gradle.kts` → check for `ktor`, `spring-boot`, etc.
   - `go.mod` → check for `gin`, `echo`, `fiber`, etc.
3. Map the directory structure:
   - Find top-level directories: `src/`, `app/`, `cmd/`, `internal/`, `lib/`, `pkg/`, `server/`, `services/`, `api/`, `routes/`, `controllers/`, `models/`, `views/`, `templates/`, `static/`, `public/`, `frontend/`, `backend/`, `infra/`, `terraform/`, `deploy/`, `migrations/`, `.github/`, `.circleci/`
   - Identify monorepo patterns: multiple `package.json` files, workspace configs, `services/` directories with independent modules
   - Find test directories: `test/`, `tests/`, `__tests__/`, `spec/`
   - Find SDK/client directories: `sdk/`, `client/`, `packages/`
4. Report findings to the user:

   ```
   Detected: [Language] with [Framework]
   Services: [list of services/modules found]
   Database: [type if detected from configs]
   Infrastructure: [CI/CD, cloud provider if found]
   ```

### Step 2: Plan the Vault

Based on detected tech stack, determine which categories to generate:

**Always include:**

- `architecture/`: system overview, tech stack, data flows
- `api/`: endpoint documentation (if HTTP routes found)
- `domains/`: business logic by domain

**Include if relevant sources found:**

- `data-model/`: if migration files, ORM models, or schema definitions found
- `infrastructure/`: if Terraform, CloudFormation, Docker, or CI configs found
- `sdks/`: if SDK or client library code found
- `services/`: if multiple backend services (monorepo/microservices)
- `integrations/`: if third-party service integrations found

Identify business domains by analyzing:

- Directory names under `src/`, `app/`, `internal/`, `services/`
- Route/controller groupings
- Model/entity names
- Service class names

Present the plan to the user:

```
## Generation Plan

Product: [name]
Output: [path]
Tech Stack: [detected]

### Documents to Generate (~XX total)

**Architecture** (X docs)
- architecture/overview.md
- architecture/tech-stack.md
- ...

**API** (X docs)
- api/overview.md
- ...

**Domains** (X docs)
- domains/[domain-1]/overview.md
- ...

Shall I proceed?
```

**Wait for explicit user approval before continuing.**

### Step 3: Generate Architecture Docs

Generate 3-8 architecture documents by reading:

- README files, docker-compose files
- Entry points (`main.ts`, `app.py`, `Application.kt`, `main.go`, etc.)
- Infrastructure configs (Terraform, CloudFormation, Dockerfile)
- Build configs (`package.json` scripts, `Makefile`, `build.gradle.kts`)

Required documents:

- `architecture/overview.md`: system topology with a Mermaid diagram showing services, data stores, and external dependencies
- `architecture/tech-stack.md`: languages, frameworks, databases, queues, cloud services with version numbers where available

Optional documents (create if sufficient source material exists):

- `architecture/data-flow.md`: request lifecycle, async processing flows
- `architecture/backend-services.md`: service responsibilities, ports, deployment
- `architecture/frontend-apps.md`: frontend architecture, routing, state management

### Step 4: Generate Data Model Docs

Generate 2-10 data model documents by reading:

- Migration files (`migrations/`, `db/migrate/`, `alembic/`)
- ORM models (Django `models.py`, SQLAlchemy models, Exposed tables, GORM structs)
- Schema definitions (SQL files, Prisma schema, TypeORM entities)
- Seed data files

Required documents:

- `data-model/overview.md`: database architecture, schema organization

Per-entity documents:

- `data-model/<entity>.md`: table/collection schema with columns, types, constraints, relationships

### Step 5: Generate API Docs

Generate 3-20 API documents by reading:

- Route definitions (Express routers, Django URLs, Ktor routing, Go handlers)
- Controller/handler implementations
- OpenAPI/Swagger specs if available
- Middleware (auth, validation, rate limiting)
- Request/response types (protobuf, TypeScript interfaces, Pydantic models)

Required documents:

- `api/overview.md`: API architecture, authentication methods, common patterns

Per-resource documents:

- `api/<resource>.md`: endpoints for a resource group with routes, methods, request/response shapes, and auth requirements

If the codebase has multiple API servers (external + internal, public + admin), organize as:

- `api/external-api/overview.md`
- `api/internal-api/overview.md`

### Step 6: Generate Domain Docs

Generate 10-30 domain documents. This is the largest category and should be chunked.

For each identified business domain:

1. Read service layer, domain models, and business logic files
2. Generate `domains/<domain>/overview.md`: concept, lifecycle, state machine
3. Generate `domains/<domain>/<feature>.md`: specific feature logic

**Chunking strategy:**

- Generate domains in batches of 5-10 documents
- After each batch, verify wikilinks between generated docs
- Continue until all domains are covered

Use the Task tool to parallelize independent domain research when the codebase is large.

### Step 7: Generate Infrastructure Docs

Generate 2-5 infrastructure documents by reading:

- Terraform/CloudFormation/Pulumi files
- CI/CD configs (`.github/workflows/`, `.circleci/`, `Jenkinsfile`, `.gitlab-ci.yml`)
- Docker files (`Dockerfile`, `docker-compose.yml`)
- Monitoring configs (CloudWatch, Datadog, Prometheus)
- Deployment scripts

Required documents:

- `infrastructure/overview.md`: cloud architecture, deployment topology

Optional documents:

- `infrastructure/ci-cd.md`: build and deploy pipeline
- `infrastructure/monitoring.md`: observability, alerting, logging
- `infrastructure/database-management.md`: backup, scaling, connection pooling

### Step 8: Finalize

1. **Generate README.md**: Create the vault's master index using the `assets/README.md.template`. List every generated document as a `[[wikilink]]` organized by category.
2. **Generate CLAUDE.md**: Create the vault's CLAUDE.md using the `assets/CLAUDE.md.template`. Fill in:
   - Product name
   - Vault structure (categories and their contents)
   - Source code paths table
   - Conventions (wikilinks, document format, Mermaid diagrams)
3. **Validate wikilinks**: Run `scripts/validate-wikilinks.sh` on the output directory. Fix any broken links it reports.
4. **Print summary**:

   ```
   ## Generation Complete

   Product: [name]
   Location: [path]
   Documents: [count] across [N] categories
   Wikilinks: [count] total, [broken] broken

   Categories:
   - architecture/: X docs
   - data-model/: X docs
   - api/: X docs
   - domains/: X docs
   - infrastructure/: X docs

   Open the vault in Obsidian to browse the knowledge graph.
   ```

## Key Rules

1. **Code-first**: Every statement must trace to actual source code. Never invent or assume logic. If you cannot find the implementation, say "Not found in source" rather than guessing.
2. **Source attribution**: Every document must include a `> **Source files**:` block listing the exact files analyzed. Use relative paths from the codebase root.
3. **Fully-qualified wikilinks**: Always use the full path from the vault root: `[[domains/campaigns/overview]]`, never `[[overview]]` or `[[campaigns/overview]]`.
4. **One concern per file**: Each document covers exactly one topic. Split large topics into multiple documents.
5. **Mermaid diagrams**: Include a Mermaid diagram for any flow with 3+ steps. Use `graph TD/TB/LR` for flowcharts and `sequenceDiagram` for interaction flows.
6. **No marketing language**: Write for engineers. Include file paths, function names, and implementation details. This is internal documentation, not a product page.
7. **Quality check**: Before finalizing each document, verify it against `references/quality-checklist.md`.
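
To illustrate the fully-qualified wikilink rule and the validation pass in Step 8, here is a minimal Python sketch. It is not part of SKILL.md and is not the `scripts/validate-wikilinks.sh` script, whose contents are not shown on this page. The function names `extract_wikilinks` and `find_broken_wikilinks` are hypothetical. The sketch assumes every wikilink target should resolve to `<vault root>/<target>.md`.

```python
import re
from pathlib import Path

# Matches [[target]], [[target#heading]], and [[target|alias]];
# only the target part (before any '#' or '|') is captured.
WIKILINK_RE = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def extract_wikilinks(markdown: str) -> list[str]:
    """Return the wikilink targets found in a Markdown string, in order."""
    return [m.group(1).strip() for m in WIKILINK_RE.finditer(markdown)]


def find_broken_wikilinks(vault_root: Path) -> list[tuple[str, str]]:
    """Return (document, target) pairs whose target has no matching file.

    Assumption: a target is valid only if <vault_root>/<target>.md exists,
    so short links like [[overview]] fail unless that file is at the root.
    """
    broken = []
    for doc in sorted(vault_root.rglob("*.md")):
        text = doc.read_text(encoding="utf-8")
        for target in extract_wikilinks(text):
            if not (vault_root / f"{target}.md").is_file():
                broken.append((doc.relative_to(vault_root).as_posix(), target))
    return broken


if __name__ == "__main__":
    import sys

    # Usage: python check_wikilinks.py <vault directory>
    root = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    for document, target in find_broken_wikilinks(root):
        print(f"{document}: broken link [[{target}]]")
```

Because targets are resolved only against the vault root, a short link such as `[[overview]]` is reported as broken unless an `overview.md` exists at the root, which is how this sketch enforces the fully-qualified rule. It does not skip wikilinks that appear inside fenced code blocks, so template placeholders may show up as false positives.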
Related Skills
knowledge-synthesis
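Knowledge synthesis: merges information from multiple sources into structured knowledge, producing summaries, reports, and knowledge graphs.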
jikime-platform-supabase
Supabase specialist covering PostgreSQL 16, pgvector, RLS, real-time subscriptions, Edge Functions, and Postgres performance optimization. Use when building full-stack apps with Supabase backend or optimizing database performance.
generate-status-report
Comprehensive system status report with services, infrastructure, performance metrics, and recommendations
generate-qr-code-natively
Generate QR codes locally without external APIs using native CLI and runtime libraries in Bash and Node.js.
generate-instructions
Analyze a directory and generate consolidated Cursor rules.
firebase-functions-templates
Create production-ready Firebase Cloud Functions with TypeScript, Express integration, HTTP endpoints, background triggers, and scheduled functions. Use when building serverless APIs with Firebase or setting up Cloud Functions projects.
firebase-development-validate
This skill should be used when reviewing Firebase code against security model and best practices. Triggers on "review firebase", "check firebase", "validate", "audit firebase", "security review", "look at firebase code". Validates configuration, rules, architecture, and security.
explain-codebase
Generates a comprehensive overview of a codebase for onboarding, knowledge transfer, or architecture understanding. Use when the user says "explain this project", "how does this codebase work?", "onboard me", "give me an overview", "I'm new to this project", "walk me through the architecture", or "map this codebase".
enterprise-search-knowledge-synthesis
Combines search results from multiple sources into coherent, deduplicated answers with source attribution. Handles confidence scoring based on freshness and authority, and summarizes large result sets effectively.
e2e-generate
Generate end-to-end tests with Playwright browser automation
discover-database
Automatically discover database skills when working with SQL, PostgreSQL, MongoDB, Redis, database schema design, query optimization, migrations, connection pooling, ORMs, or database selection. Activates for database design, optimization, and implementation tasks.
designing-databases
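Supports database schema design and optimization. Provides normalization strategies, index design, and performance optimization. Use when you need data model design or database structure optimization.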