knowledge-base

Ingest URLs, documents, and transcripts into a searchable knowledge base. Query past research and curated documentation using full-text search. Trigger words: ingest, knowledge base, look up, search knowledge, what do we know about, research, index this, add to knowledge base.

41 stars

byJKHeadley

View on GitHub Installation ↓

Best use case

knowledge-base is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using knowledge-base should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/knowledge-base/SKILL.md --create-dirs "https://raw.githubusercontent.com/JKHeadley/instar/main/skills/knowledge-base/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/knowledge-base/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How knowledge-base Compares

Feature / Agent	knowledge-base	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# knowledge-base -- Searchable Knowledge Base for Instar Agents

Build a searchable knowledge base from external sources -- URLs, documents, transcripts, PDFs. Uses the existing MemoryIndex (FTS5) for search, so no new dependencies.

---

## How It Works

The knowledge base is a set of markdown files in `.instar/knowledge/` that MemoryIndex indexes alongside your other memory files. Each file has YAML frontmatter for metadata and is tracked in a catalog for browsing.

```
.instar/knowledge/
  catalog.json            # Registry of all ingested sources
  articles/               # Ingested web articles
  transcripts/            # Video/audio transcripts
  docs/                   # Curated reference documentation
```

---

## Ingesting Content

### Via CLI

```bash
# Ingest text content directly
instar knowledge ingest "Article content here..." --title "My Article" --tags "AI,agents"

# Ingest from a URL (fetch first, then ingest)
# Step 1: Fetch the content
python3 .claude/scripts/smart-fetch.py "https://example.com/article" --auto > /tmp/fetched.md
# Step 2: Ingest it
instar knowledge ingest "$(cat /tmp/fetched.md)" --title "Article Title" --url "https://example.com/article" --tags "topic1,topic2"
```

### Via API

```bash
curl -X POST http://localhost:4040/knowledge/ingest \
  -H "Content-Type: application/json" \
  -d '{
    "content": "The article content...",
    "title": "Article Title",
    "url": "https://example.com/article",
    "type": "article",
    "tags": ["AI", "infrastructure"],
    "summary": "Brief description"
  }'
```

### Via Agent Workflow

When the agent wants to ingest content during a session:

1. Fetch the content (WebFetch, smart-fetch, transcript tools, or Read for local files)
2. Clean it (strip navigation, ads, boilerplate)
3. Call the ingest API or write the file manually:

```bash
# Write the markdown file with frontmatter
cat > .instar/knowledge/articles/2026-02-25-my-article.md << 'EOF'
---
title: "My Article"
source: "https://example.com/article"
ingested: "2026-02-25"
tags: ["AI", "infrastructure"]
---

# My Article

[Cleaned article content here]
EOF

# Sync the index to pick up the new file
instar memory sync
```

---

## Searching Knowledge

### CLI

```bash
# Search within knowledge base only
instar knowledge search "notification batching"

# Search all memory (including knowledge)
instar memory search "notification batching"
```

### API

```bash
# Knowledge-scoped search
curl "http://localhost:4040/memory/search?q=notification+batching&source=knowledge/&limit=5"

# Browse the catalog
curl "http://localhost:4040/knowledge/catalog"
curl "http://localhost:4040/knowledge/catalog?tag=AI"
```

---

## Managing Sources

### List all sources

```bash
instar knowledge list
instar knowledge list --tag AI
```

### Remove a source

```bash
# Find the source ID from the list
instar knowledge list

# Remove it
instar knowledge remove kb_20260225123456_abc123

# Re-sync the index
instar memory sync
```

### Via API

```bash
# Remove
curl -X DELETE "http://localhost:4040/knowledge/kb_20260225123456_abc123"
```

---

## MemoryIndex Configuration

To enable knowledge base indexing, add these sources to your `.instar/config.json` memory section:

```json
{
  "memory": {
    "enabled": true,
    "sources": [
      { "path": "AGENT.md", "type": "markdown", "evergreen": true },
      { "path": "USER.md", "type": "markdown", "evergreen": true },
      { "path": "knowledge/articles/", "type": "markdown", "evergreen": false },
      { "path": "knowledge/transcripts/", "type": "markdown", "evergreen": false },
      { "path": "knowledge/docs/", "type": "markdown", "evergreen": true }
    ]
  }
}
```

**Source behavior:**
- `articles/` and `transcripts/` use `evergreen: false` -- recent content ranks higher (30-day temporal decay)
- `docs/` uses `evergreen: true` -- reference documentation doesn't decay

---

## Content Types

| Type | Directory | Temporal Decay | Best For |
|------|-----------|----------------|----------|
| `article` | `articles/` | Yes (30-day) | Web articles, blog posts, news |
| `transcript` | `transcripts/` | Yes (30-day) | YouTube videos, podcasts, meetings |
| `doc` | `docs/` | No (evergreen) | API docs, manuals, reference material |

---

## Tips

- **Always sync after ingesting**: `instar memory sync` updates the FTS5 index
- **Use tags consistently**: Tags enable filtered browsing via `instar knowledge list --tag X`
- **Include source URLs**: Helps trace back to original content
- **Clean before ingesting**: Strip navigation, ads, cookie banners for better search results
- **Use smart-fetch for URLs**: `python3 .claude/scripts/smart-fetch.py URL --auto` gets clean markdown

Related Skills

systematic-debugging

from JKHeadley/instar

Structured 4-phase debugging methodology that prevents blind probing and guesswork. Forces root cause identification before any fix attempt. Use when encountering bugs, errors, unexpected behavior, test failures, or when something "just stopped working." Trigger words: debug, bug, error, broken, not working, fix this, something's wrong, investigate, root cause, why is this failing, trace the issue.

spec-converge

from JKHeadley/instar

Iteratively review an instar-development spec with multi-angle internal reviewers (security, scalability, adversarial, integration) and cross-model external reviewers (GPT, Gemini, Grok) until convergence, then produce a comprehensive ELI10 convergence report. Output is a spec tagged review-convergence — one of the two tags /instar-dev requires before it will touch instar source. NOT user-invocable; run by the instar-developing agent before any spec-driven /instar-dev work.

smart-web-fetch

from JKHeadley/instar

Fetch web content efficiently by checking llms.txt first, then Cloudflare markdown endpoints, then falling back to HTML. Reduces token usage by 80% on sites that support clean markdown delivery. No external dependencies — installs a single Python script. Trigger words: fetch URL, web content, read website, scrape page, download page, get webpage, read this link.

instar-telegram

from JKHeadley/instar

Send and receive messages via Telegram for two-way agent communication. Use when the agent needs to notify the user, alert them about something, relay a response, or when Telegram messaging is the requested channel. Trigger words: send message, Telegram, notify, alert user, message me, ping me, let me know, reach out.

instar-session

from JKHeadley/instar

Spawn, monitor, and communicate with persistent Claude Code sessions running in the background. Use when a task needs to run without blocking the current session, when the user asks to do something in the background, or when a long-running task needs its own context window. Trigger words: background task, spawn session, persistent, run in background, parallel, separate session, async task.

instar-scheduler

from JKHeadley/instar

Schedule recurring agent tasks using cron expressions. Use when the user asks to run something on a schedule, check something periodically, automate a recurring task, set up a cron job, or wants work to happen while they're away. Trigger words: schedule, recurring, cron, every hour, every day, run daily, periodic, automated.

instar-identity

from JKHeadley/instar

Establish and recover persistent agent identity that survives context compaction, session restarts, and autonomous operation. Use when an agent needs to know who it is, recover after context compression, orient at session start, or understand the identity infrastructure. Trigger words: who am I, remember, identity, after restart, compaction, context loss, who am I working with, my principles.

instar-feedback

from JKHeadley/instar

Submit structured feedback about instar bugs, feature requests, improvements, or innovations worth sharing. Use when something isn't working, when a feature is missing, when you've built something that could benefit all agents, or when the user mentions a problem with instar. Also use proactively after building significant features — ask yourself if other agents would benefit. Feedback is relayed agent-to-agent to instar maintainers. Trigger words: bug report, feedback, issue, something's wrong, feature request, this isn't working, improvement, suggest, built something useful, other agents could use this.

instar-dev

from JKHeadley/instar

Instar-specific development skill used by the instar-developing agent (Echo, or any agent assigned instar-dev responsibilities). Wraps /build with mandatory side-effects review, signal-vs-authority principle check, and artifact generation. Structural enforcement via pre-commit/pre-push hooks — the instar repo refuses commits and pushes that didn't come through this skill. NOT a user-facing skill — end users should never invoke it.

credential-leak-detector

from JKHeadley/instar

PostToolUse hook that scans Bash tool output for leaked credentials — API keys, tokens, private keys, and secrets — before they reach the conversation. Blocks critical leaks, redacts high-severity matches, and warns on suspicious patterns. 14 detection patterns covering OpenAI, Anthropic, AWS, GitHub, Stripe, Google, Slack, SendGrid, Twilio, PEM keys, bearer tokens, and generic secrets. No external dependencies. Trigger words: security, credential leak, secret exposure, key detection, token scan, API key leaked, credential guard, secret scanner, prevent credential leak.

command-guard

from JKHeadley/instar

Set up a PreToolUse hook in .claude/settings.json that blocks dangerous commands — rm -rf, force push, database drops, and others — before they execute. Teaches the pattern of safety hooks for any Claude Code project. Trigger words: safety, guard, block dangerous, protect, prevent destructive, safe mode, dangerous commands, risky operations.

agent-memory

from JKHeadley/instar

Teach cross-session memory patterns using MEMORY.md — what to save, how to organize it, how to maintain it over time, and how to structure topic files as memory grows. Works in any Claude Code project with no external dependencies. Trigger words: remember this, save for later, across sessions, persistent memory, don't forget, note this, keep this, write this down.