rag-retrieval-patterns
Use when building or debugging RAG pipelines, when semantic search returns irrelevant results, when implementing hybrid BM25+dense retrieval, or when grounding LLM answers in document sources. Triggers on: retrieval augmented generation, vector search, embeddings, BM25, reranking, knowledge base.
Best use case
rag-retrieval-patterns is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using rag-retrieval-patterns should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/rag-retrieval-patterns/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
How rag-retrieval-patterns Compares
| Feature / Agent | rag-retrieval-patterns | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
It documents retrieval patterns for RAG pipelines: dense (embedding) search, BM25 keyword search, hybrid BM25+dense retrieval with reranking, and grounding LLM answers in document sources. Use it when building or debugging a RAG pipeline or when semantic search returns irrelevant results.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# RAG & Retrieval Patterns — APEX OS Standard
## Overview
Based on HandsOnLLM Ch. 8: dense vs. sparse retrieval, hybrid pipelines, and reranking.
The APEX OS skills engine uses `text-embedding-3-small` on Azure with Supabase pgvector.
## Retrieval Strategy Decision Tree
```
┌────────────────────────────────────────────────────────────────────────┐
│ Query type? │
├────────────────────────────────────────────────────────────────────────┤
│ Keyword / exact match / product codes → BM25 (sparse) │
│ Semantic / meaning / concepts → Dense (embeddings) │
│ Mixed / production / best quality → HYBRID (BM25 + dense + rerank)│
└────────────────────────────────────────────────────────────────────────┘
```
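The decision tree can be approximated in code with a small router. This is a minimal sketch, not part of the APEX OS engine: `choose_strategy`, `looks_like_code`, and the "quoted phrase or letter-plus-digit token means keyword lookup" heuristic are all hypothetical and would need tuning on real queries.

```python
def looks_like_code(token: str) -> bool:
    """Hypothetical heuristic: product codes mix letters and digits (e.g. SKU-4471)."""
    return any(c.isdigit() for c in token) and any(c.isalpha() for c in token)


def choose_strategy(query: str, production: bool = False) -> str:
    """Map a query onto one branch of the decision tree."""
    if production:
        # Mixed / production / best quality -> hybrid pipeline
        return "hybrid"
    if '"' in query or any(looks_like_code(t) for t in query.split()):
        # Keyword / exact match / product codes -> sparse retrieval
        return "bm25"
    # Semantic / meaning / concepts -> dense retrieval
    return "dense"
```

In practice the tree's third branch suggests defaulting to hybrid in production, so a router like this mostly matters for prototypes or cost-sensitive paths.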
## The 3 Retrieval Methods
### 1. Dense Retrieval (Embeddings)
```python
# APEX OS: Azure text-embedding-3-small + Supabase pgvector
import os

from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",
)

def embed(text: str, input_type: str = "query") -> list[float]:
    # CRITICAL: separate input_type for query vs document
    #   query    → retrieval.query
    #   document → retrieval.passage
    response = client.embeddings.create(
        input=text,
        # On Azure, `model` is the deployment name; this assumes the
        # deployment is named after the model.
        model="text-embedding-3-small",
        extra_body={"input_type": input_type},
    )
    return response.data[0].embedding

# Supabase pgvector similarity search.
# Assumes `supabase` is an initialized Supabase client and `query` is the
# user's search string.
results = supabase.rpc("match_skills", {
    "query_embedding": embed(query, "query"),
    "match_threshold": 0.7,
    "match_count": 10,
}).execute()
```
### 2. BM25 (Sparse / Keyword)
```python
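from rank_bm25 import BM25Okapi

# Assumes `documents` is a list of strings and `query` is a string.
# Whitespace tokenization is case-sensitive; normalize case upstream if needed.
corpus = [doc.split() for doc in documents]
bm25 = BM25Okapi(corpus)
scores = bm25.get_scores(query.split())
# Indices of the 10 highest-scoring documents, best first
top_idx = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:10]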
```
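To see what the library call above computes, here is a from-scratch sketch of BM25 scoring. It is an illustration under stated assumptions, not the `rank_bm25` implementation: the IDF uses the non-negative `ln(1 + ...)` variant, `k1=1.5` and `b=0.75` are conventional defaults chosen for this sketch, and the corpus is assumed to be non-empty.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, corpus_tokens, k1=1.5, b=0.75):
    """Score every tokenized document against the tokenized query."""
    n_docs = len(corpus_tokens)
    avgdl = sum(len(doc) for doc in corpus_tokens) / n_docs
    # Document frequency: number of documents containing each term
    df = Counter(term for doc in corpus_tokens for term in set(doc))
    scores = []
    for doc in corpus_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            # Rare terms get a higher inverse document frequency
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Term frequency saturates (k1) and is length-normalized (b)
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = ["red widget model x200", "blue widget", "green gadget manual"]
scores = bm25_scores("x200".split(), [d.split() for d in docs])
best = max(range(len(scores)), key=scores.__getitem__)
```

Only the first document contains the token `x200`, so it is the only one with a non-zero score, which is the exact-match behavior the decision tree relies on for product codes.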
### 3. Hybrid Pipeline (Production Standard)
```python
def hybrid_retrieve(query: str, documents: list[str], top_k: int = 5):
    # dense_search, bm25_search, and rerank are assumed helpers; each search
    # returns ranked result objects with an `.id` attribute.
    # Step 1: Dense retrieval (semantic)
    dense_results = dense_search(query, documents, top_k=20)
    # Step 2: BM25 (keyword)
    sparse_results = bm25_search(query, documents, top_k=20)
    # Step 3: Reciprocal Rank Fusion (k = 60, ranks start at 1)
    scores = {}
    docs_by_id = {}
    for results in (dense_results, sparse_results):
        for rank, doc in enumerate(results, start=1):
            scores[doc.id] = scores.get(doc.id, 0) + 1 / (60 + rank)
            docs_by_id[doc.id] = doc
    # Step 4: Rerank the fused candidates (documents, not bare ids)
    # with a cross-encoder
    top_ids = sorted(scores, key=scores.get, reverse=True)[:20]
    candidates = [docs_by_id[doc_id] for doc_id in top_ids]
    return rerank(query, candidates)[:top_k]
```
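The fusion step can be isolated and checked on its own. The sketch below works on plain lists of document ids; `reciprocal_rank_fusion` is a hypothetical helper name, `k=60` follows the constant used in the pipeline, and ranks are 1-based here, which is a choice of this sketch.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked id lists; each list contributes 1 / (k + rank) per id."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties keep first-seen order
    return sorted(scores, key=scores.get, reverse=True)


dense = ["a", "b", "c"]   # ranked ids from dense retrieval
sparse = ["b", "d", "a"]  # ranked ids from BM25
fused = reciprocal_rank_fusion([dense, sparse])
# fused == ["b", "a", "d", "c"]
```

Document `b` wins because it ranks near the top in both lists, even though it is first in only one. Because only ranks are used, the BM25 and cosine scores never need to be put on a common scale.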
## Supabase pgvector Setup (APEX OS)
```sql
-- Enable extension
create extension if not exists vector;

-- Table with embedding column
create table skill_registry (
  id uuid primary key default gen_random_uuid(),
  name text not null,
  description text,
  embedding vector(1536),  -- text-embedding-3-small dimension
  created_at timestamptz default now()
);

-- HNSW index: better speed/recall tradeoff than ivfflat, at the cost of
-- slower builds and more memory
create index on skill_registry
  using hnsw (embedding vector_cosine_ops);

-- Match function: cosine similarity = 1 - cosine distance (<=>)
create or replace function match_skills(
  query_embedding vector(1536),
  match_threshold float,
  match_count int
)
returns table (id uuid, name text, description text, similarity float)
language sql stable as $$
  select id, name, description,
         1 - (embedding <=> query_embedding) as similarity
  from skill_registry
  where 1 - (embedding <=> query_embedding) > match_threshold
  order by embedding <=> query_embedding
  limit match_count;
$$;
```
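The semantics of `match_skills` can be mirrored in a few lines of Python, which is handy for unit-testing threshold choices without a database. This is a sketch under assumptions: `match_rows` and the `(id, embedding)` row shape are hypothetical, the vectors are toy 2-dimensional values rather than 1536-dimensional embeddings, and vectors are assumed to be non-zero.

```python
import math


def cosine_similarity(a, b):
    """1 - cosine distance, i.e. what `1 - (embedding <=> query_embedding)` yields."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def match_rows(query_embedding, rows, match_threshold, match_count):
    """Keep rows above the threshold, most similar first, at most match_count."""
    scored = [(row_id, cosine_similarity(emb, query_embedding)) for row_id, emb in rows]
    kept = [(row_id, sim) for row_id, sim in scored if sim > match_threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:match_count]


rows = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [1.0, 1.0])]
matches = match_rows([1.0, 0.0], rows, match_threshold=0.7, match_count=10)
```

Here row `a` matches exactly, row `c` sits at a similarity of about 0.707 and just clears the 0.7 threshold, and the orthogonal row `b` is filtered out.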
## RAG Grounded Generation
```python
def rag_answer(query: str, documents: list[str]) -> str:
    # 1. Retrieve (hybrid_retrieve needs the corpus as well as the query)
    docs = hybrid_retrieve(query, documents, top_k=5)
    # 2. Build grounded prompt with numbered sources
    context = "\n\n".join(f"[{i+1}] {d.content}" for i, d in enumerate(docs))
    prompt = f"""Answer using ONLY the context below. If the answer is not in
the context, say "I don't have that information."

Context:
{context}

Question: {query}

Cite sources as [1], [2], etc."""
    # 3. Generate (`llm` is an assumed client exposing `complete(prompt)`)
    return llm.complete(prompt)
```
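Because the prompt asks for citations as [1], [2], and so on, the generated answer can be checked mechanically against the number of retrieved sources. The sketch below is one possible post-check, not part of the original skill: `build_context` and `invalid_citations` are hypothetical helper names, and it works on plain strings rather than retrieved document objects.

```python
import re


def build_context(chunks):
    """Number each chunk the same way the grounded prompt does."""
    return "\n\n".join(f"[{i + 1}] {text}" for i, text in enumerate(chunks))


def invalid_citations(answer, num_sources):
    """Return cited numbers that do not correspond to any provided source."""
    cited = {int(n) for n in re.findall(r"\[(\d+)\]", answer)}
    return sorted(n for n in cited if not 1 <= n <= num_sources)


context = build_context(["alpha", "beta"])
bad = invalid_citations("X holds [1], and Y follows [3].", num_sources=2)
# bad == [3]
```

A non-empty result signals that the model cited a source it was never given, which is a cheap indicator that the answer may not be grounded in the retrieved context.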
## Critical: input_type Separation
```
┌──────────────────────────────────┬────────────────────────────────────┐
│ Wrong (same type for both) │ Right (separate types) │
├──────────────────────────────────┼────────────────────────────────────┤
│ embed(query, "document") │ embed(query, "query") │
│ embed(document, "document") │ embed(document, "passage") │
│ → retrieval quality degrades 15% │ → optimal similarity alignment │
└──────────────────────────────────┴────────────────────────────────────┘
```
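One way to make the wrong column of this table hard to write is to stop passing `input_type` by hand and expose two purpose-named functions instead. The sketch below assumes an `embed(text, input_type)` callable with the shape used earlier; `make_embedders`, `embed_query`, and `embed_passage` are hypothetical names, and the `"query"` and `"passage"` values are taken from the right-hand column.

```python
from typing import Callable, List, Tuple

EmbedFn = Callable[[str, str], List[float]]
TextEmbedder = Callable[[str], List[float]]


def make_embedders(embed: EmbedFn) -> Tuple[TextEmbedder, TextEmbedder]:
    """Bind the input_type once so call sites cannot mix the two up."""

    def embed_query(text: str) -> List[float]:
        # Search-time text always goes through the query type
        return embed(text, "query")

    def embed_passage(text: str) -> List[float]:
        # Index-time text always goes through the passage type
        return embed(text, "passage")

    return embed_query, embed_passage
```

Indexing code then imports only `embed_passage` and search code only `embed_query`, so a mismatch shows up as an obviously wrong import rather than a silent quality drop.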
## Common Mistakes
- Using dense-only retrieval for keyword queries: BM25 beats it on exact matches
- Not reranking after fusion: raw scores from the two retrievers are not directly comparable, so fuse by rank (RRF) and let the reranker order the final candidates
- Wrong `input_type`: query and passage embeddings must use their respective types
- Embedding entire documents: chunk first (512 tokens max), then embed the chunks
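The chunk-first advice under Common Mistakes can be sketched as follows. This is a minimal illustration with loud assumptions: whitespace-separated words stand in for model tokens (a real tokenizer would count differently), and `chunk_text` with its `overlap` parameter is a hypothetical helper, not something the skill defines.

```python
def chunk_text(text: str, max_tokens: int = 512, overlap: int = 0) -> list:
    """Split text into chunks of at most max_tokens whitespace tokens."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be in [0, max_tokens)")
    tokens = text.split()
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + max_tokens]))
        # Stop once a chunk reaches the end, so no trailing duplicate is emitted
        if start + max_tokens >= len(tokens):
            break
    return chunks


sample = " ".join(str(i) for i in range(10))
pieces = chunk_text(sample, max_tokens=4, overlap=1)
# pieces == ["0 1 2 3", "3 4 5 6", "6 7 8 9"]
```

Each chunk is then embedded and stored as its own row, so a match points at a specific passage instead of a whole document.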