prompt-caching

Caching strategies for LLM prompts including Anthropic prompt caching, response caching, and CAG (Cache Augmented Generation) Use when: prompt caching, cache prompt, response cache, cag, cache augmented.

16 stars

bydiegosouzapw

View on GitHub Installation ↓

Best use case

prompt-caching is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using prompt-caching should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/prompt-caching/SKILL.md --create-dirs "https://raw.githubusercontent.com/diegosouzapw/awesome-omni-skill/main/skills/ai-agents/prompt-caching/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/prompt-caching/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How prompt-caching Compares

Feature / Agent	prompt-caching	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Prompt Caching

You're a caching specialist who has reduced LLM costs by 90% through strategic caching.
You've implemented systems that cache at multiple levels: prompt prefixes, full responses,
and semantic similarity matches.

You understand that LLM caching is different from traditional caching—prompts have
prefixes that can be cached, responses vary with temperature, and semantic similarity
often matters more than exact match.

Your core principles:
1. Cache at the right level—prefix, response, or both
2. K

## Capabilities

- prompt-cache
- response-cache
- kv-cache
- cag-patterns
- cache-invalidation

## Patterns

### Anthropic Prompt Caching

Use Claude's native prompt caching for repeated prefixes

### Response Caching

Cache full LLM responses for identical or similar queries

### Cache Augmented Generation (CAG)

Pre-cache documents in prompt instead of RAG retrieval

## Anti-Patterns

### ❌ Caching with High Temperature

### ❌ No Cache Invalidation

### ❌ Caching Everything

## ⚠️ Sharp Edges

| Issue | Severity | Solution |
|-------|----------|----------|
| Cache miss causes latency spike with additional overhead | high | // Optimize for cache misses, not just hits |
| Cached responses become incorrect over time | high | // Implement proper cache invalidation |
| Prompt caching doesn't work due to prefix changes | medium | // Structure prompts for optimal caching |

## Related Skills

Works well with: `context-window-management`, `rag-implementation`, `conversation-memory`

Related Skills

llm-caching

from diegosouzapw/awesome-omni-skill

Optimize LLM costs and latency through KV caching and prompt caching. Use when (1) structuring prompts for cache hits, (2) configuring API cache_control for Anthropic/Cohere/OpenAI/Gemini, (3) setting up self-hosted inference with vLLM/SGLang/Ollama, (4) building agentic workflows with prefix reuse, (5) designing batch processing pipelines, or (6) understanding cache pricing and tradeoffs.

create-prompt

from diegosouzapw/awesome-omni-skill

Expert prompt engineering for creating effective prompts for Claude, GPT, and other LLMs. Use when writing system prompts, user prompts, few-shot examples, or optimizing existing prompts for better performance.

create-custom-prompt

from diegosouzapw/awesome-omni-skill

Prompt for creating custom prompt files

agentv-prompt-optimizer

from diegosouzapw/awesome-omni-skill

Iteratively optimize prompt files against AgentV evaluation datasets by analyzing failures and refining instructions.

prompt-engineer

from diegosouzapw/awesome-omni-skill

Transforms user prompts into optimized prompts using frameworks (RTF, RISEN, Chain of Thought, RODES, Chain of Density, RACE, RISE, STAR, SOAP, CLEAR, GROW)

Codex

gitlab-ci-artifacts-caching

from diegosouzapw/awesome-omni-skill

Use when configuring artifacts for inter-job data passing or caching for faster builds. Covers cache strategies and artifact management.

python-fastapi-scalable-api-cursorrules-prompt-fil

from diegosouzapw/awesome-omni-skill

Apply for python-fastapi-scalable-api-cursorrules-prompt-fil. --- description: Defines conventions specific to FastAPI usage in the backend. globs: backend/src/**/*.py

python-django-best-practices-cursorrules-prompt-fi

from diegosouzapw/awesome-omni-skill

Apply for python-django-best-practices-cursorrules-prompt-fi. --- description: Configurations for Django settings file with the list of dependencies and conventions. globs: **/settings.py

go-servemux-rest-api-cursorrules-prompt-file

from diegosouzapw/awesome-omni-skill

Apply for go-servemux-rest-api-cursorrules-prompt-file. --- description: This rule emphasizes security, scalability, and maintainability best practices in Go API development. globs: /*/**/*_api.go

go-backend-scalability-cursorrules-prompt-file-cursorrules

from diegosouzapw/awesome-omni-skill

Apply for go-backend-scalability-cursorrules-prompt-file. --- description: General rule for backend development expertise across the project. globs: **/*

apollo-caching-strategies

from diegosouzapw/awesome-omni-skill

Use when implementing Apollo caching strategies including cache policies, optimistic UI, cache updates, and normalization.

system-prompt-writer

from diegosouzapw/awesome-omni-skill

This skill should be used when writing or improving system prompts for AI agents, providing expert guidance based on Anthropic's context engineering principles.