rag-retrieval-patterns

Use when building or debugging RAG pipelines, when semantic search returns irrelevant results, when implementing hybrid BM25+dense retrieval, or when grounding LLM answers in document sources. Triggers on: retrieval augmented generation, vector search, embeddings, BM25, reranking, knowledge base.

Best use case

rag-retrieval-patterns is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using rag-retrieval-patterns can expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/rag-retrieval-patterns/SKILL.md --create-dirs "https://raw.githubusercontent.com/fratilanico/apex-os-bad-boy/main/rag-retrieval-patterns/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/rag-retrieval-patterns/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How rag-retrieval-patterns Compares

Feature / Agent         | rag-retrieval-patterns | Standard Approach
Platform Support        | Not specified          | Limited / Varies
Context Awareness       | High                   | Baseline
Installation Complexity | Unknown                | N/A

SKILL.md Source

# RAG & Retrieval Patterns — APEX OS Standard

## Overview

Based on HandsOnLLM Ch. 8: dense vs. sparse retrieval, hybrid pipelines, and reranking.
The APEX OS skills engine uses `text-embedding-3-small` on Azure with Supabase pgvector.

## Retrieval Strategy Decision Tree

```
┌────────────────────────────────────────────────────────────────────────┐
│ Query type?                                                             │
├────────────────────────────────────────────────────────────────────────┤
│ Keyword / exact match / product codes  → BM25 (sparse)                 │
│ Semantic / meaning / concepts          → Dense (embeddings)            │
│ Mixed / production / best quality      → HYBRID (BM25 + dense + rerank)│
└────────────────────────────────────────────────────────────────────────┘
```
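
The decision tree above can be approximated in code. The sketch below is only illustrative: the regular expressions and the "more than two remaining words means mixed" rule are assumptions made for this example, not part of the APEX OS standard, and a real router would need tuning against your own queries.

```python
import re

# Hypothetical heuristics: product-code-like tokens (e.g. "SKU-4821") and
# quoted phrases are treated as signals that exact matching matters.
CODE_PATTERN = re.compile(r"\b[A-Z]{2,}[-_]?\d+\b")
QUOTED_PATTERN = re.compile(r'"[^"]+"')


def choose_strategy(query: str, production: bool = False) -> str:
    """Return "bm25", "dense", or "hybrid" for a query."""
    if production:
        # The tree recommends hybrid whenever quality matters most.
        return "hybrid"

    has_exact = bool(CODE_PATTERN.search(query) or QUOTED_PATTERN.search(query))

    # Count the words left once the exact-match parts are removed.
    remainder = CODE_PATTERN.sub(" ", QUOTED_PATTERN.sub(" ", query)).split()

    if has_exact and len(remainder) > 2:
        return "hybrid"  # exact token plus natural-language context
    if has_exact:
        return "bm25"    # keyword / product-code lookup
    return "dense"       # semantic / conceptual question
```

For example, `choose_strategy("SKU-4821")` selects BM25, while a plain question with no code-like tokens falls through to dense retrieval.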

## The 3 Retrieval Methods

### 1. Dense Retrieval (Embeddings)
```python
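# APEX OS: Azure text-embedding-3-small + Supabase pgvector
import os

from openai import AzureOpenAI
from supabase import create_client

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",
)
# Assumed env var names for the Supabase project URL and key
supabase = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_KEY"])

def embed(text: str, input_type: str = "query") -> list[float]:
    # CRITICAL: separate input_type for query vs document
    #   query    → retrieval.query
    #   document → retrieval.passage
    # NOTE: input_type is not a documented parameter of the OpenAI embeddings
    # API; it is forwarded via extra_body, so confirm your endpoint accepts it.
    response = client.embeddings.create(
        input=text,
        model="text-embedding-3-small",  # on Azure, this is the deployment name
        extra_body={"input_type": input_type},
    )
    return response.data[0].embedding

# Supabase pgvector similarity search
query = "how do I combine BM25 with embeddings?"  # example query
results = supabase.rpc("match_skills", {
    "query_embedding": embed(query, "query"),
    "match_threshold": 0.7,
    "match_count": 10,
}).execute()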
```

### 2. BM25 (Sparse / Keyword)
```python
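from rank_bm25 import BM25Okapi

# `documents` is a list[str] and `query` is a str, both defined by the caller.
# Lowercase before splitting so "BM25" and "bm25" count as the same term.
corpus = [doc.lower().split() for doc in documents]
bm25 = BM25Okapi(corpus)
scores = bm25.get_scores(query.lower().split())
# Indices of the 10 highest-scoring documents
top_idx = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:10]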
```
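
To show what the library computes, here is a minimal, dependency-free sketch of Okapi BM25 scoring. It assumes the common defaults `k1 = 1.5` and `b = 0.75` and the "+1 inside the log" IDF variant, so its numbers will not necessarily match `rank_bm25` exactly; treat it as an illustration of the formula rather than a drop-in replacement.

```python
import math
from collections import Counter


def bm25_scores(query_tokens: list[str],
                corpus_tokens: list[list[str]],
                k1: float = 1.5,
                b: float = 0.75) -> list[float]:
    """Score every tokenized document against the tokenized query."""
    if not corpus_tokens:
        return []

    n_docs = len(corpus_tokens)
    avg_len = sum(len(doc) for doc in corpus_tokens) / n_docs

    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for doc in corpus_tokens:
        df.update(set(doc))

    scores = []
    for doc in corpus_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            # Rare terms get a higher weight (assumed non-negative IDF variant).
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term frequency saturates with k1 and is normalized by length via b.
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

Documents that do not contain any query term score exactly zero, which is why BM25 is strong on exact matches and blind to paraphrases.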

### 3. Hybrid Pipeline (Production Standard)
```python
def hybrid_retrieve(query: str, documents: list[str], top_k: int = 5):
    # dense_search, bm25_search, and rerank are helpers defined elsewhere;
    # each search returns ranked objects that expose an `.id` attribute.

    # Step 1: Dense retrieval (semantic)
    dense_results = dense_search(query, documents, top_k=20)

    # Step 2: BM25 (keyword)
    sparse_results = bm25_search(query, documents, top_k=20)

    # Step 3: Reciprocal Rank Fusion (k = 60, ranks start at 1)
    scores = {}
    docs_by_id = {}
    for results in (dense_results, sparse_results):
        for rank, doc in enumerate(results, start=1):
            scores[doc.id] = scores.get(doc.id, 0.0) + 1 / (60 + rank)
            docs_by_id[doc.id] = doc

    # Step 4: Rerank with cross-encoder (pass documents, not bare ids)
    top_ids = sorted(scores, key=scores.get, reverse=True)[:20]
    candidates = [docs_by_id[doc_id] for doc_id in top_ids]
    return rerank(query, candidates)[:top_k]
```
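
The fusion step can be tested in isolation. The following is a small self-contained sketch of Reciprocal Rank Fusion over plain lists of document ids; the function name and the id-list input shape are choices made for this example, and ranks start at 1 with the conventional constant `k = 60`.

```python
def reciprocal_rank_fusion(rankings: list[list[str]],
                           k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked id lists into one list sorted by RRF score."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        # Only the position matters; the retrievers' raw scores are ignored.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


dense_ids = ["a", "b", "c"]    # example dense ranking
sparse_ids = ["b", "d", "a"]   # example BM25 ranking
fused = reciprocal_rank_fusion([dense_ids, sparse_ids])
print([doc_id for doc_id, _ in fused])  # ['b', 'a', 'd', 'c']
```

Documents that appear in both lists ("a" and "b") rise to the top, which is the property that makes RRF a reasonable default when the two retrievers score on different scales.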

## Supabase pgvector Setup (APEX OS)

```sql
-- Enable extension
create extension if not exists vector;

-- Table with embedding column
create table skill_registry (
    id uuid primary key default gen_random_uuid(),
    name text not null,
    description text,
    embedding vector(1536),  -- text-embedding-3-small dimension
    created_at timestamptz default now()
);

-- HNSW index (faster than ivfflat for < 1M rows)
create index on skill_registry
using hnsw (embedding vector_cosine_ops);

-- Match function
create or replace function match_skills(
    query_embedding vector(1536),
    match_threshold float,
    match_count int
)
returns table (id uuid, name text, description text, similarity float)
language sql stable as $$
    select id, name, description,
           1 - (embedding <=> query_embedding) as similarity
    from skill_registry
    where 1 - (embedding <=> query_embedding) > match_threshold
    order by embedding <=> query_embedding
    limit match_count;
$$;
```
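
In pgvector, `<=>` is the cosine distance operator, so `1 - (embedding <=> query_embedding)` is cosine similarity. The pure-Python sketch below mirrors the filtering and ordering logic of `match_skills` on in-memory rows; it is meant for reasoning about thresholds and for unit tests, and the `(id, embedding)` row shape is an assumption made for this example.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Equivalent of pgvector's `<=>` operator for two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def match_rows(query_embedding: list[float],
               rows: list[tuple[str, list[float]]],
               match_threshold: float,
               match_count: int) -> list[tuple[str, float]]:
    """Keep rows above the similarity threshold, most similar first."""
    scored = [
        (row_id, 1.0 - cosine_distance(embedding, query_embedding))
        for row_id, embedding in rows
    ]
    kept = [(row_id, sim) for row_id, sim in scored if sim > match_threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:match_count]
```

Note that the threshold is a strict `>` comparison, as in the SQL function, so a row whose similarity equals the threshold exactly is excluded.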

## RAG Grounded Generation

```python
def rag_answer(query: str, documents: list[str]) -> str:
    # 1. Retrieve (hybrid_retrieve takes the corpus as its second argument)
    docs = hybrid_retrieve(query, documents, top_k=5)

    # 2. Build grounded prompt
    context = "\n\n".join(f"[{i+1}] {d.content}" for i, d in enumerate(docs))

    prompt = f"""Answer using ONLY the context below. If the answer is not in
the context, say "I don't have that information."

Context:
{context}

Question: {query}

Cite sources as [1], [2], etc."""

    # 3. Generate (`llm` is a placeholder for your completion client)
    return llm.complete(prompt)
```
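
Asking the model to cite `[1]`, `[2]`, and so on is only useful if the citations are checked. The sketch below shows one possible way to number the context and to flag citations that point at sources that were never provided; both helper names are hypothetical and not part of the original skill.

```python
import re


def build_context(chunks: list[str]) -> str:
    """Number each chunk so the model can cite it as [1], [2], ..."""
    return "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))


def invalid_citations(answer: str, num_sources: int) -> list[int]:
    """Return cited source numbers that fall outside 1..num_sources."""
    cited = {int(n) for n in re.findall(r"\[(\d+)\]", answer)}
    return sorted(n for n in cited if not 1 <= n <= num_sources)


context = build_context(["pgvector supports HNSW indexes.", "BM25 is a sparse method."])
bad = invalid_citations("HNSW is supported [1]. It is also free [3].", num_sources=2)
print(bad)  # [3]
```

A non-empty result is a cheap signal that the answer may not be grounded in the retrieved context and should be regenerated or rejected.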

## Critical: input_type Separation

```
┌──────────────────────────────────┬────────────────────────────────────┐
│ Wrong (same type for both)        │ Right (separate types)             │
├──────────────────────────────────┼────────────────────────────────────┤
│ embed(query, "document")          │ embed(query, "query")              │
│ embed(document, "document")       │ embed(document, "passage")         │
│ → retrieval quality degrades 15%  │ → optimal similarity alignment     │
└──────────────────────────────────┴────────────────────────────────────┘
```

Note: the 15% figure is quoted without a benchmark or source, so treat it as indicative only. The accepted `input_type` values are model-specific (Cohere, for example, uses `search_query` and `search_document`), and the OpenAI embeddings API does not document an `input_type` parameter for `text-embedding-3-small`. Check your provider's reference for the exact labels before relying on this setting.

## Common Mistakes

- Using dense-only retrieval for keyword queries: BM25 beats it on exact matches such as product codes
- Skipping the rerank after fusion: raw scores from the two retrievers are not comparable, so RRF merges by rank only, and a cross-encoder is still needed to order the fused candidates by relevance
- Wrong `input_type`: query and passage embeddings must each use their own type
- Embedding entire documents: chunk first (512 tokens max), then embed the chunks
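
The chunking advice in the last item can be sketched as a sliding window. This example assumes the text is already tokenized; a whitespace split is only a rough stand-in for the embedding model's tokenizer, and the 64-token overlap is an assumption made for this example rather than a value from the source.

```python
def chunk_tokens(tokens: list[str],
                 max_tokens: int = 512,
                 overlap: int = 64) -> list[list[str]]:
    """Split tokens into windows of at most max_tokens with a fixed overlap."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must satisfy 0 <= overlap < max_tokens")

    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_tokens])
        # Stop once a window reaches the end, to avoid a redundant tail chunk.
        if start + max_tokens >= len(tokens):
            break
    return chunks


tokens = "a long document would be split into overlapping windows here".split()
for chunk in chunk_tokens(tokens, max_tokens=4, overlap=1):
    print(" ".join(chunk))
```

Each chunk is then embedded and stored as its own row, so a query can match the relevant passage instead of an averaged representation of the whole document.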
