rag-patterns
Chunking strategies, embedding model selection, hybrid search, reranking, eval metrics
Best use case
rag-patterns is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Chunking strategies, embedding model selection, hybrid search, reranking, eval metrics
Teams using rag-patterns should expect a more consistent output, faster repeated execution, less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in
.claude/skills/rag-patterns/SKILL.mdinside your project - Restart your AI agent — it will auto-discover the skill
How rag-patterns Compares
| Feature / Agent | rag-patterns | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Chunking strategies, embedding model selection, hybrid search, reranking, eval metrics
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# RAG Patterns
## Chunking Strategies
```python
from langchain.text_splitter import RecursiveCharacterTextSplitter
# Strategy 1: Recursive character splitting (general purpose)
splitter = RecursiveCharacterTextSplitter(
chunk_size=512,
chunk_overlap=64,
separators=["\n\n", "\n", ". ", " ", ""],
length_function=len,
)
# Strategy 2: Semantic chunking (better coherence)
from langchain_experimental.text_splitter import SemanticChunker
from langchain_openai import OpenAIEmbeddings
semantic_splitter = SemanticChunker(
OpenAIEmbeddings(),
breakpoint_threshold_type="percentile",
breakpoint_threshold_amount=95,
)
# Strategy 3: Parent-child chunking (preserves context)
parent_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=200)
child_splitter = RecursiveCharacterTextSplitter(chunk_size=400, chunk_overlap=50)
parent_docs = parent_splitter.split_documents(documents)
for parent in parent_docs:
children = child_splitter.split_documents([parent])
for child in children:
child.metadata["parent_id"] = parent.metadata["id"]
```
## Embedding Model Selection
```yaml
Models by Use Case:
General (English):
- text-embedding-3-small (OpenAI, 1536d, cheap)
- text-embedding-3-large (OpenAI, 3072d, best quality)
- all-MiniLM-L6-v2 (local, 384d, fast)
Code:
- text-embedding-3-large with code-tuned prompts
- voyage-code-2 (Voyage AI)
Multilingual:
- multilingual-e5-large (local)
- text-embedding-3-large (OpenAI)
Selection Criteria:
- Latency requirement < 50ms → local model
- Quality critical → text-embedding-3-large
- Budget constrained → text-embedding-3-small
- Air-gapped → all-MiniLM-L6-v2
```
## Hybrid Search (Vector + BM25)
```python
from rank_bm25 import BM25Okapi
import numpy as np
class HybridRetriever:
def __init__(self, vector_store, documents, alpha=0.5):
self.vector_store = vector_store
self.alpha = alpha # 0=BM25 only, 1=vector only
tokenized = [doc.page_content.lower().split() for doc in documents]
self.bm25 = BM25Okapi(tokenized)
self.documents = documents
def search(self, query: str, k: int = 10) -> list:
# Vector search
vector_results = self.vector_store.similarity_search_with_score(query, k=k)
vector_scores = {doc.metadata["id"]: score for doc, score in vector_results}
# BM25 search
bm25_scores_raw = self.bm25.get_scores(query.lower().split())
bm25_max = max(bm25_scores_raw) if max(bm25_scores_raw) > 0 else 1
bm25_scores = {
self.documents[i].metadata["id"]: score / bm25_max
for i, score in enumerate(bm25_scores_raw)
}
# Reciprocal Rank Fusion
all_ids = set(vector_scores) | set(bm25_scores)
fused = {}
for doc_id in all_ids:
v_score = vector_scores.get(doc_id, 0)
b_score = bm25_scores.get(doc_id, 0)
fused[doc_id] = self.alpha * v_score + (1 - self.alpha) * b_score
sorted_ids = sorted(fused, key=fused.get, reverse=True)[:k]
return [self._get_doc(did) for did in sorted_ids]
```
## Reranking
```python
from sentence_transformers import CrossEncoder
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
def rerank(query: str, documents: list, top_k: int = 5) -> list:
pairs = [(query, doc.page_content) for doc in documents]
scores = reranker.predict(pairs)
ranked = sorted(zip(documents, scores), key=lambda x: x[1], reverse=True)
return [doc for doc, _ in ranked[:top_k]]
# Pipeline: retrieve 20 → rerank to 5
candidates = hybrid_retriever.search(query, k=20)
final = rerank(query, candidates, top_k=5)
```
## Evaluation Metrics
```python
def evaluate_rag(queries, expected_answers, retriever, generator):
metrics = {"retrieval_recall": [], "answer_correctness": [], "faithfulness": []}
for query, expected in zip(queries, expected_answers):
retrieved = retriever.search(query, k=5)
retrieved_texts = [d.page_content for d in retrieved]
# Retrieval recall: did we find the right chunks?
relevant_found = any(expected["source"] in t for t in retrieved_texts)
metrics["retrieval_recall"].append(1.0 if relevant_found else 0.0)
# Generate answer
answer = generator.generate(query, retrieved_texts)
# Faithfulness: is answer grounded in retrieved context?
# (Use LLM-as-judge or NLI model)
metrics["faithfulness"].append(check_faithfulness(answer, retrieved_texts))
# Correctness: does answer match expected?
metrics["answer_correctness"].append(check_correctness(answer, expected["answer"]))
return {k: sum(v) / len(v) for k, v in metrics.items()}
```
## Checklist
- [ ] Chunk size tuned for domain (code: 1000+, prose: 300-500)
- [ ] Chunk overlap prevents context loss at boundaries
- [ ] Metadata preserved (source, page, section) for attribution
- [ ] Hybrid search combines vector + keyword for robustness
- [ ] Reranker applied before final context assembly
- [ ] Context window budget managed (don't exceed LLM limit)
- [ ] Evaluation pipeline with retrieval recall + faithfulness
- [ ] Embedding model benchmarked on domain-specific queries
## Anti-Patterns
- Chunking without overlap (losing context at boundaries)
- Using only vector search (misses exact keyword matches)
- Stuffing all retrieved chunks into prompt (exceeds context window)
- No evaluation pipeline (can't measure improvements)
- Embedding queries and documents with different models
- Ignoring metadata filtering before vector search
- Not handling empty retrieval results gracefullyRelated Skills
websocket-patterns
Connection management, room patterns, reconnection strategies, message buffering, and binary protocol design.
vector-db-patterns
Embedding strategies, ANN algorithms, hybrid search, RAG chunking strategies, and reranking for semantic search and retrieval.
tracing-patterns
OpenTelemetry setup, span context propagation, sampling strategies, Jaeger queries
terraform-patterns
Module composition, state management, workspace strategy, provider versioning, and infrastructure-as-code best practices.
swift-patterns
SwiftUI view composition, @Observable patterns, async/await concurrency, TCA architecture, and Combine reactive streams.
springboot-patterns
Spring Boot architecture patterns, REST API design, layered services, data access, caching, async processing, and logging. Use for Java Spring Boot backend work.
seo-patterns
Meta tag patterns, structured data (JSON-LD), Core Web Vitals optimization, and SSR/SSG strategies for search visibility.
secret-patterns
30+ service-specific secret detection regex patterns, entropy-based detection, PEM/JWT/Base64 identification, and false positive filtering.
saas-payment-patterns
Payment provider abstraction, webhook security, subscription lifecycle, dunning flows, pricing models, invoicing, tax handling, and refund patterns for SaaS applications.
saas-auth-patterns
SaaS authentication and authorization patterns including JWT vs session strategies, multi-tenant isolation, RBAC, API key management, passwordless flows, MFA, and secure session handling.
saas-analytics-patterns
SaaS analytics event taxonomy, metric formulas (MRR, churn, LTV), provider-agnostic tracking, funnel analysis, cohort setup, and privacy-respecting instrumentation.
revenuecat-patterns
RevenueCat SDK entegrasyon pattern'leri. iOS (Swift), Android (Kotlin), React Native ve Flutter icin setup, offerings, entitlement checking, webhook integration, StoreKit 2 migration ve sandbox testing.