knowledge-base-cache

Create and manage a layered knowledge base with hot/cold/warm cache tiers. Provides component-based architecture with Working Memory layer, automatic caching, semantic retrieval, and intelligent context assembly. Reduces API costs and supports unlimited knowledge scale.

Best use case

knowledge-base-cache is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using knowledge-base-cache should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/knowledge-base-cache/SKILL.md --create-dirs "https://raw.githubusercontent.com/diegosouzapw/awesome-omni-skill/main/skills/frontend/knowledge-base-cache/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/knowledge-base-cache/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How knowledge-base-cache Compares

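| Feature / Agent | knowledge-base-cache | Standard Approach |
|-----------------|----------------------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |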

SKILL.md Source

# Knowledge Base Cache Skill

Create a structured knowledge repository with **layered architecture** (hot/cold/warm) and intelligent context management.

## Architecture Overview

```
┌─────────────────────────────────────────────────────────────┐
│                      Application Layer                      │
│                         Agent Core                          │
└──────────────────────────┬──────────────────────────────────┘
                           │
┌──────────────────────────▼──────────────────────────────────┐
│                    Working Memory Layer                     │
│  • Context assembly            • Token budget management    │
│  • Multi-source coordination   • LRU cache                  │
└─────────────┬───────────────────────────────────────────────┘
              │ Standard interface: KnowledgeSource
    ┌─────────┼─────────┐
    ▼         ▼         ▼ (reserved)
┌───────┐ ┌───────┐ ┌───────┐
│  Hot  │ │  Cold │ │ Warm  │
│ Cache │ │Storage│ │Vector │
│ Layer │ │ Layer │ │ Layer │
└───┬───┘ └───┬───┘ └───┬───┘
    │         │         │
Context   Repository  Vector DB
 Cache     Files     (Future)
```

### Three-Tier Architecture

| Tier | Technology | Use Case | Status |
|------|------------|----------|--------|
| **🔥 Hot** | Context Cache (API) | Full document retrieval, 90% cost savings | ✅ Available |
| **❄️ Cold** | Repository Files | Keyword search, browsing, discovery | ✅ Available |
| **🌡️ Warm** | Vector DB | Semantic search, precise Q&A | 🔮 Planned |

## What This Skill Does

1. **Layered Knowledge Storage**
   ```
   repository/
   ├── core/                    # Core components
   │   ├── __init__.py          # Standard interfaces
   │   └── working_memory.py    # Working Memory layer
   ├── adapters/                # Layer adapters
   │   ├── __init__.py
   │   ├── hot_cache_adapter.py
   │   ├── cold_storage_adapter.py
   │   └── warm_cache_adapter.py   # Reserved (planned)
   ├── index.json               # Knowledge index
   ├── cache-state.json         # Cache status
   ├── skills/                  # Skill knowledge
   ├── docs/                    # Document knowledge
   └── scripts/
       ├── cache_manager.py     # Cache management
       └── cache_helper.py      # Helper utilities
   ```

2. **Working Memory Layer**
   - Unified interface for all knowledge sources
   - Automatic context assembly with token budgeting
   - LRU cache for repeated queries
   - Cross-tier result ranking

3. **Context Caching (Hot Layer)**
   - Full document caching via API
   - Roughly 90% lower cost on cached queries (see Cost Benefits)
   - Roughly 83% lower first-token latency

4. **File-Based Storage (Cold Layer)**
   - Keyword-based retrieval
   - Excerpt generation
   - No API costs

5. **Auto-Refresh**
   - Configures cron job for daily refresh
   - Keeps caches fresh without manual intervention
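
The Working Memory behavior described in item 2 (cross-tier ranking combined with token budgeting) can be pictured with a minimal sketch. The `Snippet` type, the `merge_and_pack` function, and the 4-characters-per-token estimate are illustrative assumptions, not the skill's actual implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Snippet:
    """Hypothetical retrieval result; not the skill's real result type."""
    source: str   # tier that produced it, e.g. "hot" or "cold"
    text: str
    score: float  # higher means more relevant


def estimate_tokens(text: str) -> int:
    """Rough heuristic (assumption): about 4 characters per token."""
    return max(1, len(text) // 4)


def merge_and_pack(results: List[List[Snippet]], budget: int) -> List[Snippet]:
    """Merge per-tier results, rank by score, and keep what fits the budget."""
    ranked = sorted(
        (s for tier in results for s in tier),
        key=lambda s: s.score,
        reverse=True,
    )
    packed, used = [], 0
    for snippet in ranked:
        cost = estimate_tokens(snippet.text)
        if used + cost <= budget:  # skip anything that would overflow
            packed.append(snippet)
            used += cost
    return packed


hot = [Snippet("hot", "a" * 400, 0.9)]       # about 100 tokens
cold = [Snippet("cold", "b" * 800, 0.7),     # about 200 tokens
        Snippet("cold", "c" * 200, 0.5)]     # about 50 tokens
picked = merge_and_pack([hot, cold], budget=160)
print([s.source for s in picked])
```

Greedy packing is only one possible policy; it favors the highest-scoring snippets and lets smaller, lower-ranked ones fill the remaining space.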

## Quick Start

### Step 1: Initialize Repository

```bash
# The repository structure is already created
# If not, run:
python scripts/init_knowledge_base.py
```

### Step 2: Add Knowledge

Add markdown files to appropriate directories:
- `repository/skills/` - Skill documentation
- `repository/docs/` - General documentation  
- `repository/projects/` - Project-specific knowledge

### Step 3: Build Cache

```bash
cd repository

# Initialize index
python scripts/cache_manager.py init

# Build hot cache (Context Caching)
python scripts/cache_manager.py build

# Test the system
python test_phase1.py
```

### Step 4: Use in Your Agent

**Modern Approach (Recommended):**
```python
from repository.core.working_memory import WorkingMemoryManager

# Initialize once
wm = WorkingMemoryManager({
    'max_tokens': 6000,
    'allocation': {
        'system_prompt': 0.15,      # 15%
        'conversation': 0.25,        # 25%
        'retrieved_knowledge': 0.60  # 60%
    }
})

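# Prior conversation turns. The message format expected by query() is not
# documented here, so an empty list stands in as a placeholder.
history_messages = []

# Use in conversations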
context = wm.query(
    user_query="How do I deploy?",
    system_prompt="You are an assistant...",
    conversation=history_messages
)
```

**Legacy Approach:**
```python
from scripts.cache_helper import get_cache_headers, load_knowledge_context

# Get cache headers for API calls
headers = get_cache_headers()

# Load knowledge context
context = load_knowledge_context()
```

### Step 5: Configure Auto-Refresh

```bash
# Add cron job for daily refresh
# Configure in your agent's cron system
```
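
The cron entry itself depends on the scheduler your agent uses, so none is prescribed here. As a hedged sketch, a scheduled job could call a small wrapper like the one below, which runs the documented `cache_manager.py refresh` command only once a refresh interval has elapsed. The function names and the 24-hour interval are assumptions made for illustration.

```python
import subprocess
import sys
from datetime import datetime, timedelta
from typing import List, Optional

# Assumption: "daily" is interpreted as every 24 hours.
REFRESH_INTERVAL = timedelta(hours=24)


def refresh_due(last_refresh: Optional[datetime], now: datetime) -> bool:
    """Return True if no refresh has happened yet or the interval has elapsed."""
    return last_refresh is None or now - last_refresh >= REFRESH_INTERVAL


def refresh_command() -> List[str]:
    """Build the documented refresh command using the current interpreter."""
    return [sys.executable, "scripts/cache_manager.py", "refresh"]


def maybe_refresh(last_refresh: Optional[datetime], now: datetime) -> bool:
    """Run the refresh command (from the repository directory) when it is due."""
    if not refresh_due(last_refresh, now):
        return False
    subprocess.run(refresh_command(), check=True)
    return True
```

Where the last refresh time comes from is left open; `cache-state.json` is a plausible place, but its format is not described in this document.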

## Layer Details

### 🔥 Hot Cache Layer

**Purpose**: Store frequently accessed complete documents

**When to Use**:
- Reading full skill documentation
- API reference lookup
- Deployment guides

**Implementation**: `adapters/hot_cache_adapter.py`

```python
from adapters.hot_cache_adapter import HotCacheAdapter
from core import RetrievalQuery

hot = HotCacheAdapter()
result = hot.retrieve(RetrievalQuery(
    query="Docker deployment",
    context_budget=2000,
    top_k=3
))
```

### ❄️ Cold Storage Layer

**Purpose**: Keyword-based file retrieval with excerpt generation

**When to Use**:
- Browsing knowledge base
- Finding relevant files
- Low-cost retrieval

**Implementation**: `adapters/cold_storage_adapter.py`

```python
from adapters.cold_storage_adapter import ColdStorageAdapter
from core import RetrievalQuery

cold = ColdStorageAdapter()
result = cold.retrieve(RetrievalQuery(
    query="Docker deployment",
    context_budget=2000,
    top_k=5
))
```
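
To make "keyword-based retrieval with excerpt generation" concrete, here is a minimal sketch of the general technique. It is not the code in `cold_storage_adapter.py`; the function names, the scoring rule (counting term occurrences), and the sample documents are all assumptions.

```python
import re
from typing import Dict, List, Tuple


def keyword_score(query: str, text: str) -> int:
    """Count how often the query's words occur in the text (case-insensitive)."""
    terms = set(re.findall(r"\w+", query.lower()))
    return sum(1 for word in re.findall(r"\w+", text.lower()) if word in terms)


def make_excerpt(query: str, text: str, width: int = 60) -> str:
    """Return up to `width` characters on each side of the first matching term."""
    lowered = text.lower()
    positions = [lowered.find(t) for t in re.findall(r"\w+", query.lower())]
    positions = [p for p in positions if p >= 0]
    if not positions:
        return text[: 2 * width].strip()
    first = min(positions)
    return text[max(0, first - width): first + width].strip()


def search(query: str, docs: Dict[str, str], top_k: int = 5) -> List[Tuple[str, int, str]]:
    """Rank documents by keyword score and attach an excerpt to each hit."""
    scored = [(name, keyword_score(query, body)) for name, body in docs.items()]
    hits = sorted((h for h in scored if h[1] > 0), key=lambda h: h[1], reverse=True)
    return [(name, score, make_excerpt(query, docs[name])) for name, score in hits[:top_k]]


# Hypothetical in-memory documents standing in for repository files.
docs = {
    "docs/deploy.md": "Deployment guide. Build the Docker image, then run the Docker container.",
    "docs/style.md": "Use consistent headings and short paragraphs.",
    "skills/ci.md": "The CI pipeline publishes a docker image on each release.",
}
results = search("Docker deployment", docs)
for name, score, excerpt in results:
    print(name, score, excerpt)
```

Because everything runs on local text, this kind of lookup needs no API call, which is the trade-off the cold layer makes against the hot layer's full-document caching.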

### 🌡️ Warm Cache Layer (Planned)

**Purpose**: Semantic search with vector embeddings

**When to Use**:
- Precise Q&A
- Semantic similarity matching
- Large knowledge bases

**Implementation**: Reserved interface in `adapters/warm_cache_adapter.py`
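
A reserved interface could look roughly like the sketch below. The method names `retrieve`, `is_available`, and `get_stats`, and the `RetrievalQuery` fields, are taken from the examples in this document; the abstract base class, the default values, the return types, and the `WarmCacheStub` class are assumptions, and the real definitions live in `core/__init__.py`.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class RetrievalQuery:
    """Mirrors the fields used in the examples; defaults are assumptions."""
    query: str
    context_budget: int = 2000
    top_k: int = 3


class KnowledgeSource(ABC):
    """Assumed shape of the standard interface shared by all tiers."""

    @abstractmethod
    def retrieve(self, query: RetrievalQuery) -> List[Any]:
        ...

    @abstractmethod
    def is_available(self) -> bool:
        ...

    @abstractmethod
    def get_stats(self) -> Dict[str, Any]:
        ...


class WarmCacheStub(KnowledgeSource):
    """Hypothetical placeholder for the planned vector tier."""

    def retrieve(self, query: RetrievalQuery) -> List[Any]:
        return []  # no vector index exists yet, so nothing is returned

    def is_available(self) -> bool:
        return False  # lets callers skip this tier until it is implemented

    def get_stats(self) -> Dict[str, Any]:
        return {"tier": "warm", "status": "planned"}


warm = WarmCacheStub()
print(warm.is_available(), warm.retrieve(RetrievalQuery(query="Docker deployment")))
```

A stub that reports itself unavailable lets the Working Memory layer keep a single code path for all tiers while the warm tier remains unimplemented.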

## Working Memory Configuration

### Token Budget Allocation

Default allocation (customizable):

| Component | Percentage | Tokens (6K total) |
|-----------|------------|-------------------|
| System Prompt | 15% | 900 |
| Conversation | 25% | 1,500 |
| Retrieved Knowledge | 60% | 3,600 |
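
The token column follows directly from multiplying the total window by each share. A minimal sketch of that arithmetic, with `split_budget` as a hypothetical helper rather than part of the skill's API:

```python
from typing import Dict


def split_budget(max_tokens: int, allocation: Dict[str, float]) -> Dict[str, int]:
    """Convert fractional shares into whole-token budgets per component."""
    total = sum(allocation.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"allocation must sum to 1.0, got {total}")
    # round() guards against float artifacts such as 899.9999999999999
    return {name: round(max_tokens * share) for name, share in allocation.items()}


budgets = split_budget(6000, {
    "system_prompt": 0.15,
    "conversation": 0.25,
    "retrieved_knowledge": 0.60,
})
print(budgets)
# {'system_prompt': 900, 'conversation': 1500, 'retrieved_knowledge': 3600}
```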

### Configuration Options

```python
from repository.core.working_memory import WorkingMemoryManager
from repository.core import MemoryAllocation

wm = WorkingMemoryManager({
    'max_tokens': 8000,                    # Total context window
    'lru_cache_size': 10,                  # LRU cache size
    'allocation': {
        'system_prompt': 0.20,             # 20%
        'conversation': 0.20,              # 20%
        'retrieved_knowledge': 0.60        # 60%
    },
    'repo_path': 'repository'              # Repository path
})
```
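
The `lru_cache_size` option bounds how many recent query results are kept. For readers unfamiliar with the policy, here is a minimal least-recently-used cache built on `collections.OrderedDict`. It illustrates the general technique only; the `LRUCache` class is an assumption and may differ from what `WorkingMemoryManager` does internally.

```python
from collections import OrderedDict
from typing import Any, Hashable, Optional


class LRUCache:
    """Keeps at most `capacity` entries, evicting the least recently used."""

    def __init__(self, capacity: int = 10) -> None:
        self.capacity = capacity
        self._items: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[Any]:
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key: Hashable, value: Any) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the oldest entry


cache = LRUCache(capacity=2)
cache.put("how do I deploy?", "context A")
cache.put("how do I test?", "context B")
cache.get("how do I deploy?")               # refreshes the first entry
cache.put("how do I roll back?", "context C")  # evicts "how do I test?"
print(cache.get("how do I test?"))
```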

## Cache Management Commands

| Command | Description |
|---------|-------------|
| `cache_manager.py init` | Scan repository and update index |
| `cache_manager.py build` | Create/update hot caches |
| `cache_manager.py status` | Show cache status |
| `cache_manager.py refresh` | Refresh expired caches |
| `cache_manager.py stats` | Show statistics |

### Testing Commands

```bash
# Run Phase 1 integration tests
cd repository
python test_phase1.py

# Test individual layers
python -c "from adapters.hot_cache_adapter import HotCacheAdapter; print(HotCacheAdapter().get_stats())"
python -c "from adapters.cold_storage_adapter import ColdStorageAdapter; print(ColdStorageAdapter().get_stats())"
```

## Cost Benefits

### Hot Layer (Context Cache)

| Metric | Without Cache | With Cache | Savings |
|--------|--------------|------------|---------|
| Cost per 1000 queries | ~¥150 | ~¥15 | **90%** |
| First token latency | ~30s | ~5s | **83%** |
| Monthly cost (50 queries/day, 30 days) | ~¥225 | ~¥22.5 | **~¥202.5** |
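
The percentages in the Savings column are relative reductions. A short worked check of the first two rows, with `savings_pct` as a hypothetical helper:

```python
def savings_pct(before: float, after: float) -> float:
    """Percentage reduction going from `before` to `after`."""
    return (before - after) / before * 100


cost_saving = savings_pct(150, 15)    # yuan per 1000 queries
latency_saving = savings_pct(30, 5)   # seconds to first token
print(f"cost: {cost_saving:.0f}%, latency: {latency_saving:.0f}%")
```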

### Cold Layer (File Storage)

| Metric | Value |
|--------|-------|
| API Cost | ¥0 (no API calls) |
| Latency | ~10-50ms (local files) |
| Best For | Browsing, discovery, keyword search |

### Working Memory Layer

| Metric | Value |
|--------|-------|
| Context Assembly | Automatic |
| Token Budget | Enforced |
| Multi-Source | Hot + Cold (+ Warm in future) |
| LRU Cache | Reduces repeated queries |

## Troubleshooting

### Cache Not Working

```bash
# Check if caches are active
python scripts/cache_manager.py status

# Rebuild if needed
python scripts/cache_manager.py build

# Verify hot layer
python -c "from adapters.hot_cache_adapter import HotCacheAdapter; print(HotCacheAdapter().is_available())"
```

### Working Memory Not Finding Knowledge

```python
# Debug: Check registered sources
from repository.core.working_memory import WorkingMemoryManager

wm = WorkingMemoryManager()
print(wm.get_stats())

# Debug: Test individual layers
from adapters.hot_cache_adapter import HotCacheAdapter
from adapters.cold_storage_adapter import ColdStorageAdapter
from core import RetrievalQuery

hot = HotCacheAdapter()
cold = ColdStorageAdapter()

query = RetrievalQuery(query="test", context_budget=2000)
print("Hot:", hot.retrieve(query))
print("Cold:", cold.retrieve(query))
```

### API Key Issues

The hot layer needs an API key; make sure it is set in your environment or config.
The cold layer works without API keys.

### Path Issues

All paths in generated files are relative (workspace-relative) for portability.

## Migration from v1

If you were using the old cache system:

1. **Old way still works**: the `cache_helper.py` functions are unchanged
2. **New way recommended**: Use `WorkingMemoryManager` for better control
3. **Same repository structure**: No migration needed
