nlweb-retrieval-backends

Choose and configure NLWeb retrieval backends — Qdrant (local + remote), Azure AI Search, Elasticsearch, OpenSearch (with/without k-NN), Postgres pgvector, Milvus, Snowflake Cortex Search, Cloudflare AutoRAG, Shopify MCP, and Bing Web Search. Covers `config_retrieval.yaml`, the single `write_endpoint` rule, parallel read-fanout with URL dedup, and per-backend setup pages. Use when picking a retrieval store, migrating between backends, or debugging "results are empty."

Best use case

nlweb-retrieval-backends is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using nlweb-retrieval-backends should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/nlweb-retrieval-backends/SKILL.md --create-dirs "https://raw.githubusercontent.com/OrcaQubits/agentic-commerce-skills-plugins/main/dist/antigravity/nlweb-protocol/.agent/skills/nlweb-retrieval-backends/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/nlweb-retrieval-backends/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How nlweb-retrieval-backends Compares

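| Feature / Agent | nlweb-retrieval-backends | Standard Approach |
|-----------------|--------------------------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |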

Frequently Asked Questions

Where can I find the source code?

The source code lives on GitHub in the OrcaQubits/agentic-commerce-skills-plugins repository, the same repository the installation command above downloads from.

SKILL.md Source

# NLWeb Retrieval Backends

## Before writing code

**Fetch live docs**:
1. Fetch https://github.com/nlweb-ai/NLWeb/blob/main/docs/nlweb-retrieval.md for the architectural overview.
2. Fetch https://github.com/nlweb-ai/NLWeb/blob/main/config/config_retrieval.yaml for the **canonical list of endpoint names** and their defaults — config keys move release to release.
3. Pick the per-backend setup page from `docs/setup-*.md` (Qdrant, Azure AI Search, Elasticsearch, OpenSearch, Postgres, Snowflake, Cloudflare AutoRAG).
4. Inspect `AskAgent/python/retrieval_providers/<backend>.py` for the exact client signature and required env vars.
5. Verify the embedding dimension and metric (cosine/dot/L2) the backend expects — must match the embedding provider.

## Conceptual Architecture

### The Read-Fanout, Single-Write Pattern

NLWeb does something unusual: it **reads from every enabled retrieval endpoint in parallel and deduplicates by URL**, but writes go to **exactly one** `write_endpoint`. This means:
- You can run a hybrid index (e.g., local Qdrant for site content + Bing for fresh news) without code changes
- You migrate between backends by re-running `db_load` against the new `write_endpoint`
- "Result quality" is the union of all enabled stores — a noisy backend pollutes the top-k
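
The pattern above can be sketched in a few lines of Python. This is a minimal illustration, not NLWeb's actual implementation: the `fan_out_search` function, the hit shape (`url`, `score`), and the rule "keep the highest-scoring hit per URL" are all assumptions made for the example, and scores from different backends are not necessarily comparable in practice.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical hit shape: a dict with at least "url" and "score".
Hit = dict
SearchFn = Callable[[str], Awaitable[list[Hit]]]


async def fan_out_search(query: str, backends: dict[str, SearchFn]) -> list[Hit]:
    """Query every backend concurrently, then keep one hit per URL."""
    batches = await asyncio.gather(
        *(search(query) for search in backends.values()),
        return_exceptions=True,  # one failing backend should not sink the rest
    )
    best_by_url: dict[str, Hit] = {}
    for name, batch in zip(backends, batches):
        if isinstance(batch, BaseException):
            continue  # a real system would log the failure here
        for hit in batch:
            tagged = {**hit, "source": name}
            current = best_by_url.get(hit["url"])
            # Assumed dedup rule: the higher score wins for a repeated URL.
            if current is None or tagged["score"] > current["score"]:
                best_by_url[hit["url"]] = tagged
    return sorted(best_by_url.values(), key=lambda h: h["score"], reverse=True)


# Two stand-in backends with canned results (hypothetical data).
async def _qdrant(query: str) -> list[Hit]:
    return [{"url": "https://example.com/a", "score": 0.9},
            {"url": "https://example.com/b", "score": 0.4}]


async def _bing(query: str) -> list[Hit]:
    return [{"url": "https://example.com/b", "score": 0.7},
            {"url": "https://example.com/c", "score": 0.2}]


results = asyncio.run(
    fan_out_search("test", {"qdrant_local": _qdrant, "bing_search": _bing})
)
for hit in results:
    print(hit["source"], hit["url"], hit["score"])
```

The sketch also shows why a noisy backend matters: every enabled store contributes candidates to the same merged list.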

### All Supported Backends

| Endpoint key (config_retrieval.yaml) | Backend | Notes |
|--------------------------------------|---------|-------|
| `qdrant_local` | Qdrant file-backed | Default-enabled; data in `../data/db` |
| `qdrant_url` | Qdrant remote | Set URL + API key in env |
| `nlweb_west` | Azure AI Search | Default-enabled MS-hosted demo instance — usually disable |
| `azure_ai_search` | Azure AI Search (your own) | Bring your own index name |
| `milvus` | Milvus | Flagged "under development" in YAML |
| `elasticsearch` | Elasticsearch | dense_vector + int8_hnsw |
| `opensearch_knn` | OpenSearch + k-NN plugin | The recommended OpenSearch path |
| `opensearch_script` | OpenSearch no plugin | script_score fallback, slower |
| `postgres` | Postgres + pgvector | Good if you already run Postgres |
| `snowflake_cortex_search_1` | Snowflake Cortex Search | Data lives in Snowflake tables |
| `cloudflare_autorag` | Cloudflare AutoRAG | Indexing managed by CF; ingest via R2 |
| `shopify_mcp` | Shopify's MCP endpoint | Default-enabled; live proxy, no ingest |
| `bing_search` | Bing Web Search API | Live web fallback; not a vector store |

### Read vs Write

Every endpoint declares:
- `enabled: true/false`: whether `/ask` queries the endpoint at all
- `read: true/false`: a finer-grained switch that controls reads only
- Exactly one endpoint should be named as the `write_endpoint`; writes do not fan out

The default config has `qdrant_local`, `nlweb_west`, and `shopify_mcp` enabled — for local dev disable the last two.
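
A quick sanity check for these rules could look like the following sketch. It assumes the config has already been parsed into a dict with a top-level `write_endpoint` string and an `endpoints` mapping, as in the YAML layout used in this document; `validate_retrieval_config` is a hypothetical helper, not part of NLWeb, and the live `config_retrieval.yaml` may use different keys.

```python
def validate_retrieval_config(config: dict) -> list[str]:
    """Return human-readable problems; an empty list means nothing obvious is wrong."""
    problems = []
    endpoints = config.get("endpoints", {})
    write_endpoint = config.get("write_endpoint")

    if not isinstance(write_endpoint, str):
        problems.append("write_endpoint must be a single endpoint name")
    elif write_endpoint not in endpoints:
        problems.append(f"write_endpoint '{write_endpoint}' is not defined under endpoints")
    elif not endpoints[write_endpoint].get("enabled", False):
        # Assumption: a disabled endpoint is skipped by /ask, so freshly
        # written data would never show up in results.
        problems.append(f"write_endpoint '{write_endpoint}' is disabled for reads")

    if not any(e.get("enabled", False) for e in endpoints.values()):
        problems.append("no endpoint is enabled, so /ask has nothing to query")
    return problems


# Local-dev style config: only the file-backed Qdrant store stays enabled.
local_dev = {
    "write_endpoint": "qdrant_local",
    "endpoints": {
        "qdrant_local": {"enabled": True},
        "nlweb_west": {"enabled": False},
        "shopify_mcp": {"enabled": False},
    },
}
print(validate_retrieval_config(local_dev))
```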

### Choosing a Backend

| If you need... | Use |
|----------------|-----|
| Local dev with no cloud deps | `qdrant_local` |
| Largest scale + Microsoft-stack | `azure_ai_search` |
| Already on AWS | `opensearch_knn` |
| Already on Postgres | `postgres` (pgvector) |
| Live e-commerce catalog | `shopify_mcp` |
| Snowflake-resident data | `snowflake_cortex_search_1` |
| Edge deployment | `cloudflare_autorag` |
| Live news/freshness | `bing_search` (combine with a vector backend) |

### Embedding Dimension Compatibility

Each backend stores fixed-dimension vectors. The embedding provider must emit the same dimension:

| Embedding provider and model | Provider default? | Dimensions |
|------------------------------|-------------------|------------|
| OpenAI `text-embedding-3-small` | default | 1536 |
| OpenAI `text-embedding-3-large` | — | 3072 |
| Azure OpenAI `text-embedding-3-small` | default | 1536 |
| Gemini `text-embedding-004` | — | 768 |
| Snowflake `arctic-embed-m-v1.5` | — | 768 |
| Elasticsearch `multilingual-e5-small` | — | 384 |

Pick the embedding provider FIRST, configure the backend's index to match THAT dimension, then ingest.
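
As a rough guard, the dimensions from the table can be checked against an index schema before ingesting. The `check_dimension` helper below is hypothetical and only encodes the table above; it does not query NLWeb or any backend.

```python
# Dimensions copied from the table above.
EMBEDDING_DIMS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "text-embedding-004": 768,
    "arctic-embed-m-v1.5": 768,
    "multilingual-e5-small": 384,
}


def check_dimension(model: str, index_dim: int) -> None:
    """Raise ValueError if the index cannot hold vectors from this model."""
    if model not in EMBEDDING_DIMS:
        raise ValueError(f"unknown embedding model: {model}")
    expected = EMBEDDING_DIMS[model]
    if expected != index_dim:
        raise ValueError(
            f"{model} emits {expected}-dim vectors but the index expects {index_dim}"
        )


check_dimension("text-embedding-3-small", 1536)  # passes silently

try:
    check_dimension("text-embedding-004", 1536)
except ValueError as err:
    print(err)
```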

### Metric Compatibility

Most NLWeb providers use **cosine similarity**. When creating a new index manually (Azure AI Search, OpenSearch, Postgres) make sure the metric matches what the retrieval provider class expects. Look in `retrieval_providers/<backend>.py` for the metric the SDK call passes.
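
To see why the metric matters, the toy example below scores the same two documents under cosine similarity, dot product, and L2 distance. The vectors are made up for illustration; the point is only that the ranking can flip when the index metric differs from what the provider assumes.

```python
import math


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: list[float], b: list[float]) -> float:
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def l2_distance(a: list[float], b: list[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


query = [1.0, 0.0]
doc_aligned = [0.5, 0.0]  # same direction as the query, small magnitude
doc_long = [3.0, 3.0]     # 45 degrees off, large magnitude

for name, doc in (("aligned", doc_aligned), ("long", doc_long)):
    print(
        name,
        "cosine:", round(cosine_similarity(query, doc), 3),
        "dot:", round(dot(query, doc), 3),
        "l2:", round(l2_distance(query, doc), 3),
    )
```

Cosine similarity and L2 distance both prefer `doc_aligned` here, while the raw dot product prefers `doc_long` because of its magnitude.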

### The `nlweb_west` Trap

`nlweb_west` is a Microsoft-hosted demo Azure AI Search instance that's **enabled by default**. For most users this:
- Pollutes results with MS demo data
- Requires the demo's Azure credentials to even connect
- Costs nothing but adds latency

Disable it in local dev unless you specifically want the demo content.

## Implementation Guidance

### Switching the Write Endpoint

Edit `config/config_retrieval.yaml`:

```yaml
write_endpoint: azure_ai_search

endpoints:
  qdrant_local:
    enabled: false
  azure_ai_search:
    enabled: true
    api_key_env: AZURE_SEARCH_API_KEY
    endpoint_env: AZURE_SEARCH_ENDPOINT
    index_name: nlweb-main
```

Then re-ingest:
```bash
python -m data_loading.db_load --only-delete delete-site <site>
python -m data_loading.db_load <source> <site> --database azure_ai_search
```

### Running Multiple Backends in Parallel

Leave several endpoints set to `enabled: true` at the same time; `/ask` fans out reads to all of them and deduplicates by URL. Useful for:
- Hybrid: Postgres (cheap) for site content + Bing for live web facts
- A/B: Qdrant + Azure AI Search to compare retrieval quality
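
For the A/B case, one simple way to compare two backends is to run the same query against each in isolation and measure how much their top-k URL lists overlap. The helper below is a hypothetical sketch that works on plain URL lists; how you collect those lists (for example, by enabling one backend at a time) is up to you.

```python
def compare_backends(urls_a: list[str], urls_b: list[str], k: int = 10) -> dict:
    """Compare the top-k URLs of two backends for the same query."""
    top_a, top_b = urls_a[:k], urls_b[:k]
    set_a, set_b = set(top_a), set(top_b)
    union = set_a | set_b
    return {
        # Jaccard overlap: 1.0 means identical sets, 0.0 means disjoint.
        "jaccard": len(set_a & set_b) / len(union) if union else 1.0,
        "only_a": [u for u in top_a if u not in set_b],
        "only_b": [u for u in top_b if u not in set_a],
    }


# Made-up result lists for one query.
qdrant_urls = ["https://example.com/1", "https://example.com/2", "https://example.com/3"]
azure_urls = ["https://example.com/2", "https://example.com/3", "https://example.com/4"]
report = compare_backends(qdrant_urls, azure_urls)
print(report)
```

A low overlap is not automatically bad; it is a prompt to inspect the `only_a` and `only_b` lists by hand.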

### Adding a New Backend Provider

If NLWeb doesn't ship the backend you need:
1. Subclass the base in `retrieval_providers/` — look at any existing one for the contract (search, upsert, delete-by-site).
2. Add an endpoint entry in `config_retrieval.yaml`.
3. Register the class in the provider factory (verify location in current code).
4. Ingest a test site.
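
The contract mentioned in step 1 (search, upsert, delete-by-site) can be pictured with the in-memory sketch below. Everything here is hypothetical: `RetrievalProvider`, its method names and signatures, and the document shape are illustrative only, and NLWeb's real base class may be async and named differently, so check an existing provider in `retrieval_providers/` before copying anything.

```python
import math
from abc import ABC, abstractmethod


class RetrievalProvider(ABC):
    """Hypothetical contract; not NLWeb's actual base class."""

    @abstractmethod
    def upsert(self, docs: list[dict]) -> int: ...

    @abstractmethod
    def search(self, query_vector: list[float], site: str, k: int = 10) -> list[dict]: ...

    @abstractmethod
    def delete_by_site(self, site: str) -> int: ...


class InMemoryProvider(RetrievalProvider):
    """Toy store keyed by URL, ranked by cosine similarity."""

    def __init__(self) -> None:
        self._docs: dict[str, dict] = {}

    def upsert(self, docs: list[dict]) -> int:
        for doc in docs:
            self._docs[doc["url"]] = doc  # same URL replaces the old entry
        return len(docs)

    def search(self, query_vector: list[float], site: str, k: int = 10) -> list[dict]:
        def cosine(a: list[float], b: list[float]) -> float:
            norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
            return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0

        candidates = [d for d in self._docs.values() if d["site"] == site]
        ranked = sorted(
            candidates, key=lambda d: cosine(query_vector, d["vector"]), reverse=True
        )
        return ranked[:k]

    def delete_by_site(self, site: str) -> int:
        doomed = [url for url, d in self._docs.items() if d["site"] == site]
        for url in doomed:
            del self._docs[url]
        return len(doomed)


provider = InMemoryProvider()
provider.upsert([
    {"url": "https://example.com/a", "site": "demo", "vector": [1.0, 0.0]},
    {"url": "https://example.com/b", "site": "demo", "vector": [0.0, 1.0]},
    {"url": "https://other.example/c", "site": "other", "vector": [1.0, 0.0]},
])
top = provider.search([1.0, 0.0], site="demo", k=1)
print(top[0]["url"])
```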

### Backend-Specific Notes

**Qdrant local**: zero setup; collection lives at `../data/db`. To reset, delete the directory.

**Azure AI Search**: create the index manually (or via the setup doc's ARM template). Vector field must be `vector` (or whatever the provider class names it — verify).

**Postgres + pgvector**: `CREATE EXTENSION vector;` then ensure the column is `vector(1536)` or whichever dim matches your embedding. NLWeb uses cosine distance by default.
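
A sketch of the matching DDL, rendered from Python so the dimension stays in one place. The table name and column layout (`id`, `url`, `site`, `schema_json`, `embedding`) are assumptions for illustration, not NLWeb's actual schema; check the Postgres provider and setup doc for the real one. The `vector(n)` type and the `vector_cosine_ops` operator class are standard pgvector features.

```python
def pgvector_ddl(table: str, dim: int) -> list[str]:
    """Render DDL for a cosine-indexed pgvector table (hypothetical schema)."""
    if not table.isidentifier():
        raise ValueError(f"unsafe table name: {table!r}")
    if dim <= 0:
        raise ValueError("dim must be a positive integer")
    return [
        "CREATE EXTENSION IF NOT EXISTS vector;",
        f"CREATE TABLE IF NOT EXISTS {table} ("
        "id text PRIMARY KEY, url text, site text, schema_json jsonb, "
        f"embedding vector({dim}));",
        # HNSW index with the cosine operator class, matching the cosine default.
        f"CREATE INDEX IF NOT EXISTS {table}_embedding_idx "
        f"ON {table} USING hnsw (embedding vector_cosine_ops);",
    ]


statements = pgvector_ddl("nlweb_docs", 1536)
for statement in statements:
    print(statement)
```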

**Snowflake Cortex Search**: data is in a Snowflake table; you create a `CORTEX SEARCH SERVICE` over it. NLWeb queries via the Cortex API. No `db_load.py` involvement.

**Cloudflare AutoRAG**: upload files to R2, point AutoRAG at the bucket, wire NLWeb to the AutoRAG endpoint. CF manages indexing.

**Shopify MCP**: zero ingest. NLWeb proxies queries to a Shopify store's MCP endpoint. Configure the shop domain per-site. Disable for non-commerce deployments.

**Bing**: API key required; only useful combined with at least one vector backend (Bing returns web pages, not your indexed content).

### Debugging Empty Results

Diagnostic ladder:
1. `curl http://localhost:8000/sites` — site is registered?
2. `curl 'http://localhost:8000/ask?query=test&site=X&mode=list&streaming=false'` — any results at all?
3. Disable all backends except the one you wrote to — does the write_endpoint return results?
4. Check embedding dimension: `python -c "from embedding_providers import get_default; print(get_default().dim)"` (verify exact API) and compare to your index schema.
5. Inspect the raw store directly: the Qdrant REST API or `qdrant-client` for Qdrant, Search explorer in the Azure portal for Azure AI Search, or `SELECT count(*) FROM <table>` for Postgres.
6. Re-ingest with the embedding provider that matches the index.
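
Step 2 of the ladder can be scripted with the standard library, as in the sketch below. The query parameters mirror the `curl` command above; the assumption that a non-streaming response is a JSON object with hits under a `results` key is not confirmed here, so inspect a real response before relying on `count_results`.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def ask_url(base: str, query: str, site: str) -> str:
    """Build the same /ask URL as the curl command in the ladder."""
    params = {"query": query, "site": site, "mode": "list", "streaming": "false"}
    return f"{base.rstrip('/')}/ask?{urlencode(params)}"


def count_results(payload: dict) -> int:
    # Assumption: hits are listed under a top-level "results" key.
    return len(payload.get("results", []))


def probe(base: str, query: str, site: str, timeout: float = 10.0) -> int:
    """Call /ask once and return how many results came back."""
    with urlopen(ask_url(base, query, site), timeout=timeout) as response:
        return count_results(json.load(response))


print(ask_url("http://localhost:8000", "test", "X"))
```

Calling `probe("http://localhost:8000", "test", "X")` once per enabled backend, with the others disabled, narrows down which store is returning nothing.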

Always re-fetch `config_retrieval.yaml` from the live repo before generating config — keys change.
