prompt-caching

Caching strategies for LLM prompts, including Anthropic prompt caching, response caching, and CAG (Cache Augmented Generation). Use when: prompt caching, cache prompt, response cache, cag, cache augmented.

25 stars

Best use case

prompt-caching is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using prompt-caching should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/prompt-caching/SKILL.md --create-dirs "https://raw.githubusercontent.com/ComeOnOliver/skillshub/main/skills/aiskillstore/marketplace/sickn33/prompt-caching/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/prompt-caching/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How prompt-caching Compares

| Feature / Agent | prompt-caching | Standard Approach |
|-----------------|----------------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

Caching strategies for LLM prompts, including Anthropic prompt caching, response caching, and CAG (Cache Augmented Generation). Use when: prompt caching, cache prompt, response cache, cag, cache augmented.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Prompt Caching

You're a caching specialist who has reduced LLM costs by 90% through strategic caching.
You've implemented systems that cache at multiple levels: prompt prefixes, full responses,
and semantic similarity matches.

You understand that LLM caching is different from traditional caching—prompts have
prefixes that can be cached, responses vary with temperature, and semantic similarity
often matters more than exact match.

Your core principles:
1. Cache at the right level—prefix, response, or both
2. K

## Capabilities

- prompt-cache
- response-cache
- kv-cache
- cag-patterns
- cache-invalidation

## Patterns

### Anthropic Prompt Caching

Use Claude's native prompt caching for repeated prefixes
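
A minimal sketch of the pattern, assuming the official `anthropic` Python SDK; the model name is a placeholder. The `cache_control` breakpoint marks everything before it as a cacheable prefix:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# A large, stable prefix: instructions, policies, reference material.
LONG_SYSTEM_PROMPT = "You are a support agent for Acme Corp. Policies: ..."

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # placeholder model name
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": LONG_SYSTEM_PROMPT,
            # Everything up to this breakpoint is cached and reused by
            # later requests that share the exact same token prefix.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Where is my order?"}],
)

# usage reports cache_creation_input_tokens on the first call and
# cache_read_input_tokens on later calls that reuse the prefix.
print(response.usage)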

### Response Caching

Cache full LLM responses for identical or similar queries
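
A sketch of an in-process response cache, keyed on everything that affects the output; `call_llm` is a hypothetical wrapper around whichever client you use:

```python
import hashlib
import json

_response_cache: dict[str, str] = {}  # swap for Redis with a TTL in production

def cache_key(model: str, prompt: str, temperature: float) -> str:
    # Key on every parameter that affects the output, not just the prompt text.
    payload = json.dumps(
        {"model": model, "prompt": prompt, "temperature": temperature},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()

def cached_complete(model: str, prompt: str, temperature: float = 0.0) -> str:
    key = cache_key(model, prompt, temperature)
    if temperature == 0.0 and key in _response_cache:
        return _response_cache[key]  # hit: no API call at all
    result = call_llm(model, prompt, temperature)  # hypothetical LLM wrapper
    if temperature == 0.0:  # only cache deterministic calls (see Anti-Patterns)
        _response_cache[key] = result
    return result
```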

### Cache Augmented Generation (CAG)

Pre-cache documents in prompt instead of RAG retrieval
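
One way to sketch CAG, assuming the corpus fits in the context window: concatenate the full document set into a stable prefix once, then cache it with prompt caching so each question pays only for its own tokens.

```python
from pathlib import Path

def build_cag_prefix(doc_dir: str) -> str:
    # Load the whole corpus up front instead of retrieving chunks per query.
    # Sorted order keeps the prefix byte-identical across requests, which is
    # what makes the prompt cache hit.
    docs = sorted(Path(doc_dir).glob("*.md"))
    return "\n\n".join(
        f"<document name='{p.name}'>\n{p.read_text()}\n</document>" for p in docs
    )

# Place the returned prefix in the system block with cache_control (see the
# Anthropic example above); only the user question varies per request.
```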

## Anti-Patterns

### ❌ Caching with High Temperature

At high temperature the same prompt is expected to yield varied outputs; serving a cached response silently removes that variance.

### ❌ No Cache Invalidation

Cached responses go stale as the underlying data changes; without TTLs or explicit invalidation the cache keeps serving wrong answers.

### ❌ Caching Everything

Most queries never repeat; caching them wastes storage and lookup time. Cache only the prompts and responses that actually recur.

## ⚠️ Sharp Edges

| Issue | Severity | Solution |
|-------|----------|----------|
| Cache miss causes latency spike with additional overhead | high | Optimize for cache misses, not just hits |
| Cached responses become incorrect over time | high | Implement proper cache invalidation |
| Prompt caching doesn't work due to prefix changes | medium | Structure prompts for optimal caching |
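
For the second issue, a minimal TTL-based invalidation sketch; the one-hour default is an arbitrary assumption and should track how fast the underlying data changes:

```python
import time

class TTLCache:
    """Response cache in which every entry expires after `ttl_seconds`."""

    def __init__(self, ttl_seconds: float = 3600.0):
        self.ttl = ttl_seconds
        self._store: dict[str, tuple[float, str]] = {}

    def get(self, key: str) -> str | None:
        entry = self._store.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if time.monotonic() - stored_at > self.ttl:
            del self._store[key]  # expired: treat as a miss and refresh
            return None
        return value

    def set(self, key: str, value: str) -> None:
        self._store[key] = (time.monotonic(), value)
```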

## Related Skills

Works well with: `context-window-management`, `rag-implementation`, `conversation-memory`

Related Skills

optimizing-prompts

25 stars
from ComeOnOliver/skillshub

This skill optimizes prompts for large language models (LLMs) to reduce token usage, lower costs, and improve performance. It analyzes the prompt, identifies areas for simplification and redundancy removal, and rewrites the prompt to be more conci... Use when optimizing performance. Trigger with phrases like 'optimize', 'performance', or 'speed up'.

implementing-database-caching

25 stars
from ComeOnOliver/skillshub

Use when you need to implement multi-tier caching to improve database performance. This skill sets up Redis, in-memory caching, and CDN layers to reduce database load. Trigger with phrases like "implement database caching", "add Redis cache layer", "improve query performance with caching", or "reduce database load".

cursor-custom-prompts

25 stars
from ComeOnOliver/skillshub

Create effective custom prompts for Cursor AI using project rules, prompt engineering patterns, and reusable templates. Triggers on "cursor prompts", "prompt engineering cursor", "better cursor prompts", "cursor instructions", "cursor prompt templates".

api-caching-strategy

25 stars
from ComeOnOliver/skillshub

Api Caching Strategy - Auto-activating skill for API Development. Triggers on: api caching strategy. Part of the API Development skill category.

promptify

25 stars
from ComeOnOliver/skillshub

Transform user requests into detailed, precise prompts for AI models. Use when users say "promptify", "promptify this", or explicitly request prompt engineering or improvement of their request for better AI responses.

prompt-improver

25 stars
from ComeOnOliver/skillshub

Optimize prompts for better AI responses. Use when user asks to improve a prompt, refine a prompt, make a prompt better, optimize prompting, review their prompt, or says "/improve-prompt". Transforms vague requests into clear, specific, actionable prompts.

gws-modelarmor-sanitize-prompt

25 stars
from ComeOnOliver/skillshub

Google Model Armor: Sanitize a user prompt through a Model Armor template.

tldr-prompt

25 stars
from ComeOnOliver/skillshub

Create tldr summaries for GitHub Copilot files (prompts, agents, instructions, collections), MCP servers, or documentation from URLs and queries.

prompt-builder

25 stars
from ComeOnOliver/skillshub

Guide users through creating high-quality GitHub Copilot prompts with proper structure, tools, and best practices.

promptfoo-evaluation

25 stars
from ComeOnOliver/skillshub

Configures and runs LLM evaluation using Promptfoo framework. Use when setting up prompt testing, creating evaluation configs (promptfooconfig.yaml), writing Python custom assertions, implementing llm-rubric for LLM-as-judge, or managing few-shot examples in prompts. Triggers on keywords like "promptfoo", "eval", "LLM evaluation", "prompt testing", or "model comparison".

prompt-injection-test

25 stars
from ComeOnOliver/skillshub

A test skill with prompt injection patterns.

../../../marketing-skill/prompt-engineer-toolkit/SKILL.md

25 stars
from ComeOnOliver/skillshub

No description provided.