prompt-caching
Caching strategies for LLM prompts including Anthropic prompt caching, response caching, and CAG (Cache Augmented Generation) Use when: prompt caching, cache prompt, response cache, cag, cache augmented.
Best use case
prompt-caching is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Caching strategies for LLM prompts including Anthropic prompt caching, response caching, and CAG (Cache Augmented Generation) Use when: prompt caching, cache prompt, response cache, cag, cache augmented.
Teams using prompt-caching should expect a more consistent output, faster repeated execution, less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in
.claude/skills/prompt-caching/SKILL.mdinside your project - Restart your AI agent — it will auto-discover the skill
How prompt-caching Compares
| Feature / Agent | prompt-caching | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Caching strategies for LLM prompts including Anthropic prompt caching, response caching, and CAG (Cache Augmented Generation) Use when: prompt caching, cache prompt, response cache, cag, cache augmented.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Prompt Caching You're a caching specialist who has reduced LLM costs by 90% through strategic caching. You've implemented systems that cache at multiple levels: prompt prefixes, full responses, and semantic similarity matches. You understand that LLM caching is different from traditional caching—prompts have prefixes that can be cached, responses vary with temperature, and semantic similarity often matters more than exact match. Your core principles: 1. Cache at the right level—prefix, response, or both 2. K ## Capabilities - prompt-cache - response-cache - kv-cache - cag-patterns - cache-invalidation ## Patterns ### Anthropic Prompt Caching Use Claude's native prompt caching for repeated prefixes ### Response Caching Cache full LLM responses for identical or similar queries ### Cache Augmented Generation (CAG) Pre-cache documents in prompt instead of RAG retrieval ## Anti-Patterns ### ❌ Caching with High Temperature ### ❌ No Cache Invalidation ### ❌ Caching Everything ## ⚠️ Sharp Edges | Issue | Severity | Solution | |-------|----------|----------| | Cache miss causes latency spike with additional overhead | high | // Optimize for cache misses, not just hits | | Cached responses become incorrect over time | high | // Implement proper cache invalidation | | Prompt caching doesn't work due to prefix changes | medium | // Structure prompts for optimal caching | ## Related Skills Works well with: `context-window-management`, `rag-implementation`, `conversation-memory`
Related Skills
llm-caching
Optimize LLM costs and latency through KV caching and prompt caching. Use when (1) structuring prompts for cache hits, (2) configuring API cache_control for Anthropic/Cohere/OpenAI/Gemini, (3) setting up self-hosted inference with vLLM/SGLang/Ollama, (4) building agentic workflows with prefix reuse, (5) designing batch processing pipelines, or (6) understanding cache pricing and tradeoffs.
create-prompt
Expert prompt engineering for creating effective prompts for Claude, GPT, and other LLMs. Use when writing system prompts, user prompts, few-shot examples, or optimizing existing prompts for better performance.
create-custom-prompt
Prompt for creating custom prompt files
agentv-prompt-optimizer
Iteratively optimize prompt files against AgentV evaluation datasets by analyzing failures and refining instructions.
prompt-engineer
Transforms user prompts into optimized prompts using frameworks (RTF, RISEN, Chain of Thought, RODES, Chain of Density, RACE, RISE, STAR, SOAP, CLEAR, GROW)
gitlab-ci-artifacts-caching
Use when configuring artifacts for inter-job data passing or caching for faster builds. Covers cache strategies and artifact management.
python-fastapi-scalable-api-cursorrules-prompt-fil
Apply for python-fastapi-scalable-api-cursorrules-prompt-fil. --- description: Defines conventions specific to FastAPI usage in the backend. globs: backend/src/**/*.py
python-django-best-practices-cursorrules-prompt-fi
Apply for python-django-best-practices-cursorrules-prompt-fi. --- description: Configurations for Django settings file with the list of dependencies and conventions. globs: **/settings.py
go-servemux-rest-api-cursorrules-prompt-file
Apply for go-servemux-rest-api-cursorrules-prompt-file. --- description: This rule emphasizes security, scalability, and maintainability best practices in Go API development. globs: /*/**/*_api.go
go-backend-scalability-cursorrules-prompt-file-cursorrules
Apply for go-backend-scalability-cursorrules-prompt-file. --- description: General rule for backend development expertise across the project. globs: **/*
apollo-caching-strategies
Use when implementing Apollo caching strategies including cache policies, optimistic UI, cache updates, and normalization.
system-prompt-writer
This skill should be used when writing or improving system prompts for AI agents, providing expert guidance based on Anthropic's context engineering principles.