knowledge-base-cache

Use when managing large knowledge bases, reducing API costs, or implementing multi-tier caching for frequent queries

Best use case

knowledge-base-cache is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using knowledge-base-cache should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/knowledge-base-cache/SKILL.md --create-dirs "https://raw.githubusercontent.com/Dqz00116/skill-lib/main/knowledge-base-cache/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/knowledge-base-cache/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How knowledge-base-cache Compares

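| Feature / Agent | knowledge-base-cache | Standard Approach |
|-----------------|----------------------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |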

Frequently Asked Questions

What does this skill do?

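It sets up a layered knowledge base with hot, cold, and (planned) warm tiers plus a Working Memory layer for context assembly. Use it when managing large knowledge bases, reducing API costs, or implementing multi-tier caching for frequent queries.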

Where can I find the source code?

The source code is on GitHub in the Dqz00116/skill-lib repository, under knowledge-base-cache/.

SKILL.md Source

# Knowledge Base Cache Skill

## Overview

A layered knowledge base system with hot, cold, and (planned) warm tiers and a Working Memory layer for context management. It reduces API costs through multi-tier caching and lets the knowledge base grow beyond what fits in a single context window.

## When to Use

**Use this skill when:**
- Managing large knowledge bases that exceed context window limits
- Reducing API costs for frequent knowledge queries
- Implementing multi-tier caching (hot/cold/warm) for knowledge retrieval
- Needing intelligent context assembly with token budget management
- Requiring automatic caching with semantic retrieval capabilities

**Do NOT use when:**
- The knowledge base is small and simple enough to fit in a single context window
- Queries are one-off, so caching overhead exceeds the savings
- You only need basic file storage without caching tiers

Create a structured knowledge repository with **layered architecture** (hot/cold/warm) and intelligent context management.

## Architecture Overview

```
┌─────────────────────────────────────────────────────────────┐
│                    Application Layer                        │
│                    Agent Core                               │
└──────────────────────────┬──────────────────────────────────┘
                           │
┌──────────────────────────▼──────────────────────────────────┐
│              Working Memory Layer                           │
│  • Context Assembly        • Token Budget Management        │
│  • Multi-Source Coordination • LRU Cache                    │
└─────────────┬───────────────────────────────────────────────┘
              │ Standard Interface KnowledgeSource
    ┌─────────┼─────────┐
    ▼         ▼         ▼ (Reserved)
┌───────┐ ┌───────┐ ┌───────┐
│  Hot  │ │  Cold │ │ Warm  │
│ Cache │ │Storage│ │Vector │
│ Layer │ │ Layer │ │ Layer │
└───┬───┘ └───┬───┘ └───┬───┘
    │         │         │
Context   Repository  Vector DB
 Cache     Files     (Future)
```

### Three-Tier Architecture

| Tier | Technology | Use Case | Status |
|------|------------|----------|--------|
| **🔥 Hot** | Context Cache (API) | Full document retrieval, 90% cost savings | ✅ Available |
| **❄️ Cold** | Repository Files | Keyword search, browsing, discovery | ✅ Available |
| **🌡️ Warm** | Vector DB | Semantic search, precise Q&A | 🔮 Planned |
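
The tiers above sit behind one standard interface, `KnowledgeSource`. The sketch below shows what such an interface could look like. The `RetrievalQuery` fields (`query`, `context_budget`, `top_k`) and the method names `retrieve`, `is_available`, and `get_stats` come from the examples in this document. Everything else is an assumption made for illustration: the `RetrievalResult` shape, the default values, `InMemorySource`, and `query_all` are hypothetical and may differ from the real code in `core/__init__.py`.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class RetrievalQuery:
    """Mirrors the fields used in this document's examples."""
    query: str
    context_budget: int = 2000  # assumed default: max tokens the caller accepts
    top_k: int = 3              # assumed default; not confirmed by the source


@dataclass
class RetrievalResult:
    """Hypothetical result shape: one retrieved snippet plus its origin."""
    source: str
    content: str
    score: float = 0.0


class KnowledgeSource(Protocol):
    """Structural interface each tier adapter is assumed to satisfy."""

    def is_available(self) -> bool: ...
    def retrieve(self, query: RetrievalQuery) -> list[RetrievalResult]: ...
    def get_stats(self) -> dict: ...


class InMemorySource:
    """Toy tier used only to exercise the interface."""

    def __init__(self, docs: dict[str, str]) -> None:
        self._docs = docs
        self._hits = 0

    def is_available(self) -> bool:
        return bool(self._docs)

    def retrieve(self, query: RetrievalQuery) -> list[RetrievalResult]:
        needle = query.query.lower()
        matches = [
            RetrievalResult(source=name, content=text, score=1.0)
            for name, text in self._docs.items()
            if needle in text.lower()
        ]
        self._hits += len(matches)
        return matches[: query.top_k]

    def get_stats(self) -> dict:
        return {"documents": len(self._docs), "hits": self._hits}


def query_all(sources: list[KnowledgeSource], query: RetrievalQuery) -> list[RetrievalResult]:
    """Ask every available tier and merge the results, best score first."""
    merged: list[RetrievalResult] = []
    for source in sources:
        if source.is_available():
            merged.extend(source.retrieve(query))
    return sorted(merged, key=lambda r: r.score, reverse=True)
```

Because `KnowledgeSource` is a structural protocol, a future warm-tier adapter would only need to provide the same three methods to plug into `query_all`.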

## What This Skill Does

1. **Layered Knowledge Storage**
   ```
   repository/
   ├── core/                    # Core components
   │   ├── __init__.py          # Standard interfaces
   │   └── working_memory.py    # Working Memory layer
   ├── adapters/                # Layer adapters
   │   ├── __init__.py
   │   ├── hot_cache_adapter.py
   │   ├── cold_storage_adapter.py
   │   └── warm_cache_adapter.py (reserved)
   ├── index.json               # Knowledge index
   ├── cache-state.json         # Cache status
   ├── skills/                  # Skill knowledge
   ├── docs/                    # Document knowledge
   └── scripts/
       ├── cache_manager.py     # Cache management
       └── cache_helper.py      # Helper utilities
   ```

2. **Working Memory Layer**
   - Unified interface for all knowledge sources
   - Automatic context assembly with token budgeting
   - LRU cache for repeated queries
   - Cross-tier result ranking

3. **Context Caching (Hot Layer)**
   - Full document caching via API
   - Roughly 90% lower query cost (see Cost Benefits)
   - Roughly 83% lower first-token latency (see Cost Benefits)

4. **File-Based Storage (Cold Layer)**
   - Keyword-based retrieval
   - Excerpt generation
   - No API costs

5. **Auto-Refresh**
   - Configures cron job for daily refresh
   - Keeps caches fresh without manual intervention
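
The "LRU cache for repeated queries" mentioned under the Working Memory layer can be pictured with the minimal sketch below. It is not the implementation in `working_memory.py`. The class name `LRUCache` and its methods are hypothetical, and the default capacity of 10 simply mirrors the `lru_cache_size` value shown in the configuration example.

```python
from collections import OrderedDict
from typing import Hashable, Optional


class LRUCache:
    """Least-recently-used cache with a fixed capacity."""

    def __init__(self, capacity: int = 10) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._capacity = capacity
        self._items: OrderedDict = OrderedDict()

    def get(self, key: Hashable) -> Optional[object]:
        """Return the cached value, or None on a miss."""
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key: Hashable, value: object) -> None:
        """Store a value and evict the oldest entry when over capacity."""
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)  # drop the least recently used

    def __len__(self) -> int:
        return len(self._items)
```

Keying such a cache on the normalized query string would let a repeated question skip both the hot and cold tiers.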

## Quick Start

### Step 1: Initialize Repository

```bash
# The repository structure is already created
# If not, run:
python scripts/init_knowledge_base.py
```

### Step 2: Add Knowledge

Add Markdown files to the appropriate directories:
- `repository/skills/` - Skill documentation
- `repository/docs/` - General documentation
- `repository/projects/` - Project-specific knowledge

### Step 3: Build Cache

```bash
cd repository

# Initialize index
python scripts/cache_manager.py init

# Build hot cache (Context Caching)
python scripts/cache_manager.py build

# Test the system
python test_phase1.py
```

### Step 4: Use in Your Agent

**Modern Approach (Recommended):**
```python
from repository.core.working_memory import WorkingMemoryManager

# Initialize once
wm = WorkingMemoryManager({
    'max_tokens': 6000,
    'allocation': {
        'system_prompt': 0.15,      # 15%
        'conversation': 0.25,        # 25%
        'retrieved_knowledge': 0.60  # 60%
    }
})

# Use in conversations
context = wm.query(
    user_query="How do I deploy?",
    system_prompt="You are an assistant...",
    conversation=history_messages
)
```

**Legacy Approach:**
```python
from scripts.cache_helper import get_cache_headers, load_knowledge_context

# Get cache headers for API calls
headers = get_cache_headers()

# Load knowledge context
context = load_knowledge_context()
```

### Step 5: Configure Auto-Refresh

```bash
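# Schedule a daily refresh in your agent's cron system.
# The job only needs to run the refresh command from inside the repository:
cd repository
python scripts/cache_manager.py refresh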
```
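
If the scheduler can run a Python entry point, a thin wrapper around the documented `cache_manager.py refresh` command may be convenient. The sketch below is an assumption-heavy example: the function names, the idea of a wrapper script, and the schedule in the comment are all hypothetical. Only the `refresh` subcommand and the `repository` working directory come from this document.

```python
import subprocess
import sys


def build_refresh_command() -> list[str]:
    """Return the argv for the documented `cache_manager.py refresh` call."""
    return [sys.executable, "scripts/cache_manager.py", "refresh"]


def run_refresh(repo_path: str = "repository") -> int:
    """Run the refresh from inside repo_path and return its exit code."""
    completed = subprocess.run(build_refresh_command(), cwd=repo_path, check=False)
    return completed.returncode


# Example crontab entry (assumed schedule, 03:00 every day), calling a
# hypothetical script that invokes run_refresh():
#   0 3 * * * cd /path/to/workspace && python refresh_cache.py
```

Returning the exit code instead of raising lets the scheduler decide how to report a failed refresh.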

## Layer Details

### 🔥 Hot Cache Layer

**Purpose**: Store frequently accessed complete documents

**When to Use**:
- Reading full skill documentation
- API reference lookup
- Deployment guides

**Implementation**: `adapters/hot_cache_adapter.py`

```python
from adapters.hot_cache_adapter import HotCacheAdapter
from core import RetrievalQuery

hot = HotCacheAdapter()
result = hot.retrieve(RetrievalQuery(
    query="Docker deployment",
    context_budget=2000,
    top_k=3
))
```

### ❄️ Cold Storage Layer

**Purpose**: Keyword-based file retrieval with excerpt generation

**When to Use**:
- Browsing knowledge base
- Finding relevant files
- Low-cost retrieval

**Implementation**: `adapters/cold_storage_adapter.py`

```python
from adapters.cold_storage_adapter import ColdStorageAdapter
from core import RetrievalQuery

cold = ColdStorageAdapter()
result = cold.retrieve(RetrievalQuery(
    query="Docker deployment",
    context_budget=2000,
    top_k=5
))
```
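
To make "keyword-based retrieval with excerpt generation" concrete, here is a minimal sketch of the general technique. It does not reproduce `ColdStorageAdapter`. The function names, the occurrence-count scoring, and the 80-character excerpt width are assumptions chosen for illustration.

```python
import re


def score(text: str, keywords: list[str]) -> int:
    """Count case-insensitive keyword occurrences in text."""
    lowered = text.lower()
    return sum(lowered.count(k.lower()) for k in keywords)


def excerpt(text: str, keywords: list[str], width: int = 80) -> str:
    """Return up to `width` characters around the earliest keyword hit."""
    lowered = text.lower()
    hits = [p for p in (lowered.find(k.lower()) for k in keywords) if p >= 0]
    if not hits:
        return text[:width]
    start = max(0, min(hits) - width // 2)
    return text[start:start + width]


def keyword_search(docs: dict[str, str], query: str, top_k: int = 5) -> list[tuple[str, str]]:
    """Rank documents by keyword count and return (name, excerpt) pairs."""
    keywords = re.findall(r"\w+", query)
    ranked = sorted(docs, key=lambda name: score(docs[name], keywords), reverse=True)
    return [
        (name, excerpt(docs[name], keywords))
        for name in ranked[:top_k]
        if score(docs[name], keywords) > 0  # drop documents with no hits
    ]
```

Everything here runs on local strings, which is why this tier incurs no API cost.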

### 🌡️ Warm Cache Layer (Planned)

**Purpose**: Semantic search with vector embeddings

**When to Use**:
- Precise Q&A
- Semantic similarity matching
- Large knowledge bases

**Implementation**: Reserved interface in `adapters/warm_cache_adapter.py`

## Working Memory Configuration

### Token Budget Allocation

Default allocation (customizable):

| Component | Percentage | Tokens (6K total) |
|-----------|------------|-------------------|
| System Prompt | 15% | 900 |
| Conversation | 25% | 1,500 |
| Retrieved Knowledge | 60% | 3,600 |
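
The token counts in the table are each share multiplied by the total budget. A small sketch of that arithmetic follows. `split_budget` is a hypothetical helper, not part of `WorkingMemoryManager`.

```python
import math


def split_budget(max_tokens: int, allocation: dict[str, float]) -> dict[str, int]:
    """Convert fractional shares into whole-token budgets."""
    if not math.isclose(sum(allocation.values()), 1.0, abs_tol=1e-9):
        raise ValueError("allocation shares must sum to 1.0")
    return {name: round(max_tokens * share) for name, share in allocation.items()}


default_allocation = {
    "system_prompt": 0.15,
    "conversation": 0.25,
    "retrieved_knowledge": 0.60,
}
budgets = split_budget(6000, default_allocation)
```

With the default shares and a 6,000-token budget, this yields 900, 1,500, and 3,600 tokens. For shares that do not divide evenly, rounding can leave the total one or two tokens off `max_tokens`.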

### Configuration Options

```python
from repository.core.working_memory import WorkingMemoryManager
from repository.core import MemoryAllocation

wm = WorkingMemoryManager({
    'max_tokens': 8000,                    # Total context window
    'lru_cache_size': 10,                  # LRU cache size
    'allocation': {
        'system_prompt': 0.20,             # 20%
        'conversation': 0.20,              # 20%
        'retrieved_knowledge': 0.60        # 60%
    },
    'repo_path': 'repository'              # Repository path
})
```
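
Once the `retrieved_knowledge` share is known, the retrieved snippets have to be trimmed to fit it. One common approach is a greedy fill by score, sketched below. This is an illustration only: the function names are hypothetical, the four-characters-per-token estimate is a rough assumption, and the real context assembly in `working_memory.py` may work differently.

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic: about 4 characters per token (an assumption)."""
    return max(1, len(text) // 4)


def assemble_context(snippets: list[tuple[float, str]], budget: int) -> list[str]:
    """Greedily keep the highest-scoring snippets that fit within `budget` tokens."""
    chosen: list[str] = []
    used = 0
    for _score, text in sorted(snippets, key=lambda s: s[0], reverse=True):
        cost = estimate_tokens(text)
        if used + cost > budget:
            continue  # too large for the remaining budget; a smaller one may still fit
        chosen.append(text)
        used += cost
    return chosen
```

Skipping an oversized snippet instead of stopping lets lower-ranked but shorter snippets use the remaining budget.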

## Cache Management Commands

| Command | Description |
|---------|-------------|
| `cache_manager.py init` | Scan repository and update index |
| `cache_manager.py build` | Create/update hot caches |
| `cache_manager.py status` | Show cache status |
| `cache_manager.py refresh` | Refresh expired caches |
| `cache_manager.py stats` | Show statistics |
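
The `refresh` command acts on expired caches. The format of `cache-state.json` is not documented here, so the sketch below assumes a hypothetical layout in which each entry records `created_at` and `ttl_seconds`. It only illustrates how an expiry check of this kind could work.

```python
def expired_entries(state: dict[str, dict], now: float) -> list[str]:
    """Return the names of cache entries whose TTL has elapsed, sorted by name."""
    return sorted(
        name
        for name, entry in state.items()
        if now - entry["created_at"] >= entry["ttl_seconds"]
    )


# Hypothetical state: timestamps and TTLs are in seconds.
example_state = {
    "skills": {"created_at": 0.0, "ttl_seconds": 3600.0},
    "docs": {"created_at": 3000.0, "ttl_seconds": 3600.0},
}
stale = expired_entries(example_state, now=4000.0)
```

Passing `now` explicitly keeps the check deterministic and easy to test. A real caller would pass `time.time()`.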

### Testing Commands

```bash
# Run Phase 1 integration tests
cd repository
python test_phase1.py

# Test individual layers
python -c "from adapters.hot_cache_adapter import HotCacheAdapter; print(HotCacheAdapter().get_stats())"
python -c "from adapters.cold_storage_adapter import ColdStorageAdapter; print(ColdStorageAdapter().get_stats())"
```

## Cost Benefits

### Hot Layer (Context Cache)

| Metric | Without Cache | With Cache | Savings |
|--------|--------------|------------|---------|
| Cost per 1000 queries | ~¥150 | ~¥15 | **90%** |
| First token latency | ~30s | ~5s | **83%** |
| Monthly cost (50 queries/day, 30 days) | ~¥225 | ~¥22.5 | **~¥202.5** |
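
The percentage figures in the table are relative reductions, computed as (before - after) / before. A short sketch of that arithmetic follows. `savings_percent` is a hypothetical helper, not part of the skill.

```python
def savings_percent(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, in percent."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100


cost_saving = round(savings_percent(150, 15))    # cost per 1000 queries, in ¥
latency_saving = round(savings_percent(30, 5))   # first-token latency, in seconds
```

Rounded to whole percentages, these inputs reproduce the 90% and 83% savings shown in the table.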

### Cold Layer (File Storage)

| Metric | Value |
|--------|-------|
| API Cost | ¥0 (no API calls) |
| Latency | ~10-50ms (local files) |
| Best For | Browsing, discovery, keyword search |

### Working Memory Layer

| Metric | Value |
|--------|-------|
| Context Assembly | Automatic |
| Token Budget | Enforced |
| Multi-Source | Hot + Cold (+ Warm in future) |
| LRU Cache | Reduces repeated queries |

## Troubleshooting

### Cache Not Working

```bash
# Check if caches are active
python scripts/cache_manager.py status

# Rebuild if needed
python scripts/cache_manager.py build

# Verify hot layer
python -c "from adapters.hot_cache_adapter import HotCacheAdapter; print(HotCacheAdapter().is_available())"
```

### Working Memory Not Finding Knowledge

```python
# Debug: Check registered sources
from repository.core.working_memory import WorkingMemoryManager

wm = WorkingMemoryManager()
print(wm.get_stats())

# Debug: Test individual layers
# (the bare `adapters` and `core` imports assume `repository/` is on sys.path)
from adapters.hot_cache_adapter import HotCacheAdapter
from adapters.cold_storage_adapter import ColdStorageAdapter
from core import RetrievalQuery

hot = HotCacheAdapter()
cold = ColdStorageAdapter()

query = RetrievalQuery(query="test", context_budget=2000)
print("Hot:", hot.retrieve(query))
print("Cold:", cold.retrieve(query))
```

### API Key Issues

Ensure the API key is set in the environment or config for the hot layer.
The cold layer works without API keys.

### Path Issues

All paths in generated files are relative (workspace-relative) for portability.

## Migration from v1

If you were using the old cache system:

1. **Old way still works**: `cache_helper.py` functions unchanged
2. **New way recommended**: Use `WorkingMemoryManager` for better control
3. **Same repository structure**: No migration needed
