Codex

kb-ingest

Ingest a source (URL, file, or freeform note) into the knowledge base. Creates a source summary and updates or creates relevant entity and concept pages.

104 stars

Best use case

kb-ingest is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

It is a strong fit for teams already working in Codex.

Teams using kb-ingest should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/kb-ingest/SKILL.md --create-dirs "https://raw.githubusercontent.com/jmagly/aiwg/main/.agents/skills/kb-ingest/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/kb-ingest/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How kb-ingest Compares

| Feature / Agent | kb-ingest | Standard Approach |
|-----------------|-----------|-------------------|
| Platform Support | Codex | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

Ingest a source (URL, file, or freeform note) into the knowledge base. Creates a source summary and updates or creates relevant entity and concept pages.

Which AI agents support this skill?

This skill is designed for Codex.

Where can I find the source code?

The source code is in the jmagly/aiwg repository on GitHub, under .agents/skills/kb-ingest/.

SKILL.md Source

# KB Ingest

Ingest any source — a URL, local file, or freeform note — into the knowledge base. Produces a source summary, then creates or updates entity and concept pages with new information.

## Triggers

- "ingest this article" → fetch URL and ingest
- "add to KB: ..." → treat the text as a freeform note
- "summarize this book and add it to my KB" → ingest file or freeform content
- `/kb-ingest <source>` → direct invocation

## Parameters

### `<source>` (required)

| Format | Behavior |
|--------|----------|
| `https://...` | Fetch with WebFetch, extract content |
| File path | Read the file directly |
| Quoted text | Treat as freeform note or paste |

### `--topic <tag>` (optional)
Tag this source with a topic hint (e.g., `--topic "machine-learning"`). Influences which entity/concept pages to touch.

### `--kb <path>` (optional)
Root of the knowledge base. Defaults to `.aiwg/kb/`.

### `--dry-run` (optional)
Show what would be created or updated without writing files.
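
The skill is interpreted by the agent rather than run as a real command-line program, so the following is only a rough model of the parameters above, written with Python's `argparse`. The `build_parser` name is hypothetical; the flag names and the `.aiwg/kb/` default are taken from this section.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Model the documented kb-ingest parameters (illustrative only)."""
    parser = argparse.ArgumentParser(prog="kb-ingest")
    # Required positional: a URL, a file path, or quoted freeform text.
    parser.add_argument("source")
    # Optional topic hint that influences which pages are touched.
    parser.add_argument("--topic", default=None)
    # Knowledge base root; the documented default is .aiwg/kb/.
    parser.add_argument("--kb", default=".aiwg/kb/")
    # Report planned changes without writing any files.
    parser.add_argument("--dry-run", action="store_true")
    return parser


args = build_parser().parse_args(
    ["https://example.com/post", "--topic", "machine-learning", "--dry-run"]
)
```

Here `args.dry_run` is `True` and `args.kb` falls back to the default root because `--kb` was not given.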

---

## Execution Flow

### Phase 1: Acquire Content

1. Resolve the source type (URL, file, freeform text).
2. For URLs: fetch with WebFetch. Extract title, author, date, and body text.
3. For files: read directly. Infer type from extension or content.
4. For freeform text: treat as a note; title defaults to first sentence (truncated to 60 chars).
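
The source-type resolution and the default note title described above could be sketched as follows. The function names are hypothetical, and two details are assumptions not stated in the skill: `http://` is accepted alongside `https://`, and a sentence is taken to end at `.`, `!`, or `?` followed by whitespace.

```python
import os
import re


def resolve_source_type(source: str) -> str:
    """Classify a source string as 'url', 'file', or 'note'."""
    # Assumption: plain http is treated as a URL as well as https.
    if re.match(r"^https?://", source, flags=re.IGNORECASE):
        return "url"
    # Anything that names an existing file is read directly.
    if os.path.isfile(source):
        return "file"
    # Everything else is treated as freeform text.
    return "note"


def default_note_title(text: str, limit: int = 60) -> str:
    """Use the first sentence of a note as its title, cut to `limit` chars."""
    # Assumption: a sentence ends at ., ! or ? followed by whitespace.
    first_sentence = re.split(r"(?<=[.!?])\s+", text.strip(), maxsplit=1)[0]
    return first_sentence[:limit]
```

A string that is neither a URL nor an existing path falls through to `"note"`, which matches the "Quoted text" row of the parameter table.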

### Phase 2: Summarize

Using the source-summary template at `$AIWG_ROOT/agentic/code/frameworks/knowledge-base/templates/source-summary.md`:

- Extract 3–7 key takeaways
- Identify notable quotes (verbatim, with location if available)
- Write a 2–5 sentence summary
- Note strengths and weaknesses

Determine the slug: lowercase title, spaces to hyphens, strip punctuation.
Save to: `<kb>/sources/<slug>.md`
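
A minimal sketch of the slug rule and save path, assuming that "punctuation" means anything other than ASCII letters, digits, and hyphens, and that repeated hyphens should be collapsed (the skill does not specify either). `slugify` and `source_summary_path` are hypothetical helper names.

```python
import re
from pathlib import Path


def slugify(title: str) -> str:
    """Lowercase the title, turn spaces into hyphens, strip punctuation."""
    slug = title.strip().lower()
    slug = re.sub(r"\s+", "-", slug)          # whitespace runs become one hyphen
    slug = re.sub(r"[^a-z0-9-]", "", slug)    # assumption: drop all other characters
    slug = re.sub(r"-{2,}", "-", slug)        # assumption: collapse repeated hyphens
    return slug.strip("-")


def source_summary_path(kb_root: str, title: str) -> Path:
    """Build <kb>/sources/<slug>.md for a given title."""
    return Path(kb_root) / "sources" / f"{slugify(title)}.md"
```

Under these assumptions a title containing only non-ASCII characters would produce an empty slug, so a real implementation would likely need a fallback.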

### Phase 3: Identify Entities and Concepts

Scan the source content for:
- Named entities (people, tools, companies, places, products)
- Concepts, techniques, patterns, or frameworks mentioned

For each identified item:
1. Check whether a page already exists in `<kb>/entities/` or `<kb>/concepts/`.
2. If it exists: read the current page, add new facts or sources if not already present.
3. If it does not exist: create a new page from the appropriate template.

Use the entity-page template for discrete things.
Use the concept-page template for ideas and techniques.
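
The create-or-update decision for each identified item might look like the sketch below. It assumes pages are stored as `<slug>.md` inside `entities/` and `concepts/`, which the skill implies but does not state outright; `plan_page` is a hypothetical name.

```python
from pathlib import Path
from typing import Tuple


def plan_page(kb: Path, slug: str, kind: str) -> Tuple[str, Path]:
    """Decide whether a page for `slug` should be updated or created.

    `kind` is 'entity' or 'concept' and only matters when no page exists yet.
    """
    # Step 1: look for an existing page in either folder.
    for folder in ("entities", "concepts"):
        existing = kb / folder / f"{slug}.md"
        if existing.is_file():
            return "update", existing
    # Step 3: no page found, so a new one goes in the folder matching its kind.
    folder = "entities" if kind == "entity" else "concepts"
    return "create", kb / folder / f"{slug}.md"
```

The function only plans the action and writes nothing, which also makes it usable for a `--dry-run` listing.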

### Phase 4: Cross-Link

In the new source summary, populate the **Connections** section with `[[wiki-links]]` to pages touched.
In each touched entity/concept page, add the source to the **Sources** table.
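
As a rough illustration of the cross-linking step, the sketch below renders a Connections list and appends a row to a Sources table. The real layouts come from the templates, which are not shown here, so the two-column row format and the assumption that the Sources table is the last block in the page are both hypothetical, as are the function names.

```python
from typing import Iterable


def connections_section(touched_slugs: Iterable[str]) -> str:
    """Render one [[wiki-link]] bullet per touched page, without duplicates."""
    unique = sorted(set(touched_slugs))
    return "\n".join(f"- [[{slug}]]" for slug in unique)


def append_source_row(page_text: str, source_slug: str, title: str) -> str:
    """Append a Sources row unless this source is already linked in the page.

    Assumption: the Sources table is the final block of the page and uses
    a hypothetical two-column `| link | title |` layout.
    """
    link = f"[[{source_slug}]]"
    if link in page_text:
        return page_text  # already recorded, so leave the page untouched
    row = f"| {link} | {title} |"
    return page_text.rstrip("\n") + "\n" + row + "\n"
```

Appending only, and skipping sources that are already linked, keeps the update consistent with the rule that existing content is never removed or overwritten.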

### Phase 5: Report

```
KB Ingest complete

Source summary: .aiwg/kb/sources/article-slug.md
Pages created:
  + .aiwg/kb/entities/person-name.md
  + .aiwg/kb/concepts/technique-name.md
Pages updated:
  ~ .aiwg/kb/entities/existing-entity.md  (added source)

Next steps:
  - Review created pages and fill placeholder sections
  - Run /kb-health to check for orphan pages
```
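
The report above could be assembled from the run's results along these lines. `render_report` is a hypothetical helper, and the fixed "(added source)" note simply mirrors the sample output rather than any documented rule.

```python
from typing import List


def render_report(summary_path: str, created: List[str], updated: List[str]) -> str:
    """Format the end-of-run report in the layout shown in the sample."""
    lines = ["KB Ingest complete", "", f"Source summary: {summary_path}"]
    lines.append("Pages created:")
    lines += [f"  + {path}" for path in created]
    lines.append("Pages updated:")
    # Assumption: every update is described as "(added source)", as in the sample.
    lines += [f"  ~ {path}  (added source)" for path in updated]
    lines += [
        "",
        "Next steps:",
        "  - Review created pages and fill placeholder sections",
        "  - Run /kb-health to check for orphan pages",
    ]
    return "\n".join(lines)
```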

---

## Scope Limits

- Create or update at most 5 entity/concept pages per ingest run. If more are identified, list them in the report as "candidates for future pages" rather than creating stubs automatically.
- Do not fetch URLs found within the source content. Ingest one source at a time.
- Do not remove or overwrite existing content in updated pages — only append to Sources tables and add missing facts clearly marked with the source.
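
The five-page cap from the first limit can be expressed as a simple split. The skill does not say how items are prioritised, so this sketch assumes the input list is already ordered by relevance; `apply_page_cap` and `MAX_PAGES_PER_RUN` are hypothetical names.

```python
from typing import List, Tuple

MAX_PAGES_PER_RUN = 5  # limit stated in the scope rules


def apply_page_cap(
    identified: List[str], limit: int = MAX_PAGES_PER_RUN
) -> Tuple[List[str], List[str]]:
    """Split items into pages to touch now and candidates for future pages.

    Assumption: `identified` is already sorted with the most relevant first.
    """
    return identified[:limit], identified[limit:]
```

The second list is what would be reported as "candidates for future pages" instead of being created as stubs.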

## References

- @$AIWG_ROOT/agentic/code/frameworks/knowledge-base/templates/source-summary.md
- @$AIWG_ROOT/agentic/code/frameworks/knowledge-base/templates/entity-page.md
- @$AIWG_ROOT/agentic/code/frameworks/knowledge-base/templates/concept-page.md
- @$AIWG_ROOT/agentic/code/frameworks/knowledge-base/skills/kb-health/SKILL.md

Related Skills

memory-ingest

104
from jmagly/aiwg

Ingest a source into any consumer's semantic memory by reading the topology contract

Codex

grade-on-ingest

104
from jmagly/aiwg

Trigger GRADE quality assessment automatically when new research sources or findings enter the corpus

Codex

aiwg-orchestrate

104
from jmagly/aiwg

Route structured artifact work to AIWG workflows via MCP with zero parent context cost

venv-manager

104
from jmagly/aiwg

Create, manage, and validate Python virtual environments. Use for project isolation and dependency management.

pytest-runner

104
from jmagly/aiwg

Execute Python tests with pytest, supporting fixtures, markers, coverage, and parallel execution. Use for Python test automation.

vitest-runner

104
from jmagly/aiwg

Execute JavaScript/TypeScript tests with Vitest, supporting coverage, watch mode, and parallel execution. Use for JS/TS test automation.

eslint-checker

104
from jmagly/aiwg

Run ESLint for JavaScript/TypeScript code quality and style enforcement. Use for static analysis and auto-fixing.

repo-analyzer

104
from jmagly/aiwg

Analyze GitHub repositories for structure, documentation, dependencies, and contribution patterns. Use for codebase understanding and health assessment.

pr-reviewer

104
from jmagly/aiwg

Review GitHub pull requests for code quality, security, and best practices. Use for automated PR feedback and approval workflows.

YouTube Acquisition

104
from jmagly/aiwg

yt-dlp patterns for acquiring content from YouTube and video platforms

Quality Filtering

104
from jmagly/aiwg

Accept/reject logic and quality scoring heuristics for media content

Provenance Tracking

104
from jmagly/aiwg

W3C PROV-O patterns for tracking media derivation chains and production history