nlweb-llm-providers
Configure NLWeb LLM and embedding providers — OpenAI, Azure OpenAI (default), Anthropic, Google Gemini, DeepSeek on Azure, Llama on Azure, HuggingFace, Inception Labs, Snowflake Cortex, Ollama, Pi Labs. Covers `config_llm.yaml` high/low tier model selection, the ModelRouter cost/quality routing logic, `config_embedding.yaml`, and adding a custom provider. Use when picking models, tuning cost, or wiring a new LLM backend.
Best use case
nlweb-llm-providers is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using nlweb-llm-providers should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/nlweb-llm-providers/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
How nlweb-llm-providers Compares
| Feature / Agent | nlweb-llm-providers | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
SKILL.md Source
# NLWeb LLM & Embedding Providers
## Before writing code
**Fetch live docs**:
1. Fetch https://github.com/nlweb-ai/NLWeb/blob/main/docs/nlweb-providers.md for the canonical provider list and config schema.
2. Fetch https://github.com/nlweb-ai/NLWeb/blob/main/config/config_llm.yaml for the **exact model IDs and env-var names currently shipped**.
3. Fetch https://github.com/nlweb-ai/NLWeb/blob/main/config/config_embedding.yaml for embedding defaults.
4. Inspect `AskAgent/python/llm_providers/<provider>.py` for the SDK calls the provider class makes.
5. Web-search the latest release notes — new providers and models get added often.
## Conceptual Architecture
### Mixed-Mode = Many Small LLM Calls
NLWeb's pipeline doesn't make one big LLM call per query. It makes **many small calls**: decontextualize the query, detect Schema.org item type, route to a tool, rank results, optionally summarize/generate. Each call has a strict `<returnStruc>` JSON schema in `prompts.xml`. Cost and latency are dominated by the *number* of calls, not the size of any single one.
### High / Low Tier Model Selection
`config_llm.yaml` defines a **high model** and a **low model** per provider:
```yaml
providers:
  openai:
    high: gpt-4.1
    low: gpt-4.1-mini
    api_key_env: OPENAI_API_KEY
```
The codebase decides which tier to use per call site — e.g., decontextualization is "low", final generate is "high". The exact assignment lives in `core/` modules and the `ModelRouter` subsystem.
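To make the tier lookup concrete, here is a minimal sketch of how a call site could resolve to a model ID. The `CONFIG` dict copies the shape of the YAML example above; `CALL_SITE_TIERS` and `resolve_model` are hypothetical names for illustration and are not NLWeb's actual API.

```python
from typing import Literal

Tier = Literal["high", "low"]

# Same shape as the YAML example; hard-coded here instead of loaded from disk.
CONFIG = {
    "preferred_endpoint": "openai",
    "providers": {
        "openai": {
            "high": "gpt-4.1",
            "low": "gpt-4.1-mini",
            "api_key_env": "OPENAI_API_KEY",
        }
    },
}

# Hypothetical call-site-to-tier mapping (assumption, not NLWeb's real table).
CALL_SITE_TIERS: dict[str, Tier] = {
    "decontextualize": "low",
    "generate": "high",
}


def resolve_model(config: dict, call_site: str) -> str:
    """Return the model ID for a call site, defaulting unknown sites to 'low'."""
    provider = config["providers"][config["preferred_endpoint"]]
    tier = CALL_SITE_TIERS.get(call_site, "low")
    return provider[tier]
```

Defaulting unknown call sites to the low tier keeps the expensive model opt-in, which matches the cost-first intent described here.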
### The Default Provider
Out of the box, NLWeb's `preferred_endpoint` (in `config_llm.yaml`) is **`azure_openai`** with `gpt-4.1` / `gpt-4.1-mini`. Most users override this in `.env` or by editing the YAML.
### All Supported LLM Providers
(Verify the live `config_llm.yaml` for current models and key names.)
| Provider | Default high | Default low | Env var |
|----------|--------------|-------------|---------|
| OpenAI | gpt-4.1 | gpt-4.1-mini | `OPENAI_API_KEY` |
| Azure OpenAI | gpt-4.1 | gpt-4.1-mini | `AZURE_OPENAI_API_KEY` + `AZURE_OPENAI_ENDPOINT` |
| Anthropic | claude-3-7-sonnet-latest | claude-3-5-haiku-latest | `ANTHROPIC_API_KEY` |
| Google Gemini | gemini-2.5-pro | gemini-2.0-flash-lite | `GEMINI_API_KEY` |
| DeepSeek on Azure | deepseek-coder-33b | deepseek-coder-7b | `AZURE_DEEPSEEK_ENDPOINT` |
| Llama on Azure | llama-2-70b | llama-2-13b | `AZURE_LLAMA_ENDPOINT` |
| HuggingFace | Qwen2.5-72B | Qwen2.5-Coder-7B | `HF_TOKEN` |
| Inception Labs | mercury-small | mercury-small | `INCEPTION_API_KEY` |
| Snowflake Cortex | claude-3-5-sonnet | llama3.1-8b | Snowflake creds |
| Ollama | configurable | configurable | local — no key |
| Pi Labs | (class present, may not be in default YAML) | — | — |
### Embedding Providers
| Provider | Default model | Dim |
|----------|---------------|-----|
| OpenAI | text-embedding-3-small | 1536 |
| Azure OpenAI | text-embedding-3-small | 1536 |
| Gemini | text-embedding-004 | 768 |
| Snowflake | arctic-embed-m-v1.5 | 768 |
| Elasticsearch | multilingual-e5-small | 384 |
| Ollama | nomic-embed-text (typically) | 768 |
Set `preferred_provider` in `config_embedding.yaml`. **This must match what you used at ingest time** — the most common NLWeb bug is changing the embedding provider after data is loaded, then getting empty results.
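A small guard can catch this mismatch before queries start returning nothing. The sketch below is illustrative only: the dimensions are copied from the table above, while the provider keys and the `embedding_mismatch` helper are assumptions, not part of NLWeb.

```python
from typing import Optional

# Dimensions copied from the embedding table; the dict keys are assumed names.
EMBEDDING_DIMS = {
    "openai": 1536,
    "azure_openai": 1536,
    "gemini": 768,
    "snowflake": 768,
    "elasticsearch": 384,
    "ollama": 768,
}


def embedding_mismatch(ingest_provider: str, query_provider: str) -> Optional[str]:
    """Describe why two providers cannot share an index, or return None if they match."""
    if ingest_provider == query_provider:
        return None
    ingest_dim = EMBEDDING_DIMS[ingest_provider]
    query_dim = EMBEDDING_DIMS[query_provider]
    if ingest_dim != query_dim:
        return f"dimension mismatch: index has {ingest_dim}, queries produce {query_dim}"
    # Equal dimensions are not enough: different models map text to different spaces.
    return "same dimension but different vector space"
```

Note the last branch: two providers with the same dimension (for example, both 768) still produce incompatible vectors, so matching on dimension alone is not a safe check.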
### ModelRouter
NLWeb's `ModelRouter/` subsystem is a cost/quality router that picks the right model tier (high vs low) per call site. It's still evolving — verify whether it's active in your release.
### Why So Many Providers?
R.V. Guha's design goal: NLWeb should run on whatever LLM stack the site operator already has. A Snowflake customer uses Cortex; an Azure shop uses Azure OpenAI; a privacy-conscious deployment uses Ollama on prem. The provider abstraction is intentional.
## Implementation Guidance
### Switching the Primary LLM Provider
In `config_llm.yaml`:
```yaml
preferred_endpoint: anthropic
providers:
  anthropic:
    high: claude-3-7-sonnet-latest
    low: claude-3-5-haiku-latest
    api_key_env: ANTHROPIC_API_KEY
```
Set `ANTHROPIC_API_KEY` in `.env`. Restart the server.
### Running Locally with Ollama (Offline)
Install Ollama, pull a model:
```bash
ollama pull llama3.1:8b
ollama pull nomic-embed-text
```
In `config_llm.yaml`:
```yaml
preferred_endpoint: ollama
providers:
  ollama:
    high: llama3.1:8b
    low: llama3.1:8b
    base_url: http://localhost:11434
```
In `config_embedding.yaml`:
```yaml
preferred_provider: ollama
providers:
  ollama:
    model: nomic-embed-text
    dim: 768
```
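Important: re-ingest after switching embedding provider. Vectors already in the index were produced by the previous model, so they may have a different dimension and are not comparable with the new model's query vectors.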
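Before pointing NLWeb at Ollama, it can help to confirm that both models are actually pulled. The sketch below queries Ollama's `/api/tags` endpoint, which lists local models. The helper names are hypothetical, and the base URL assumes the default port shown in the config above.

```python
import json
from urllib.request import urlopen


def normalize(name: str) -> str:
    """Ollama reports untagged pulls as '<name>:latest', so add the tag if missing."""
    return name if ":" in name else f"{name}:latest"


def model_is_pulled(tags_response: dict, model: str) -> bool:
    """Check a parsed /api/tags response for the given model name."""
    pulled = {normalize(entry["name"]) for entry in tags_response.get("models", [])}
    return normalize(model) in pulled


def fetch_tags(base_url: str = "http://localhost:11434") -> dict:
    """Fetch the list of local models from a running Ollama server."""
    with urlopen(f"{base_url}/api/tags", timeout=5) as response:
        return json.load(response)
```

With a server running, `model_is_pulled(fetch_tags(), "nomic-embed-text")` reports whether the embedding model is available locally.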
### Adding a Custom Provider
1. Subclass the base class in `llm_providers/` (look at `openai.py` or `anthropic.py` as templates).
2. Implement the required methods (typically `complete()` returning JSON-conformant output for the `<returnStruc>` schemas, plus optional streaming).
3. Register in the provider factory (verify exact location — usually a registry in `core/llm.py`).
4. Add an entry in `config_llm.yaml`.
5. Test against a known-good `<returnStruc>` prompt before deploying.
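The steps above can be sketched as follows. This is not NLWeb's real base class: `LLMProviderSketch`, `EchoProvider`, the `complete()` signature, and the flat key-check against the schema are all assumptions for illustration. Inspect the actual files in `llm_providers/` for the real interface before subclassing.

```python
import asyncio
import json
from abc import ABC, abstractmethod
from typing import Any


class LLMProviderSketch(ABC):
    """Hypothetical stand-in for the provider base class (assumed interface)."""

    @abstractmethod
    async def complete(self, prompt: str, schema: dict[str, Any], model: str) -> dict[str, Any]:
        """Return a dict containing every key named in the schema."""


class EchoProvider(LLMProviderSketch):
    """Toy provider that returns canned JSON where a real SDK call would go."""

    async def complete(self, prompt: str, schema: dict[str, Any], model: str) -> dict[str, Any]:
        raw = await self._call_backend(prompt, model)
        parsed = json.loads(raw)
        # Reject responses that do not cover the expected return structure.
        missing = [key for key in schema if key not in parsed]
        if missing:
            raise ValueError(f"response missing keys: {missing}")
        return parsed

    async def _call_backend(self, prompt: str, model: str) -> str:
        # Replace with the real SDK call; this stub always returns the same JSON.
        return json.dumps({"score": 80, "description": f"stub answer from {model}"})
```

A quick smoke test is `asyncio.run(EchoProvider().complete("rank this", {"score": "int", "description": "str"}, "my-model"))`, which exercises the JSON parsing and key check without any network access.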
### Tuning Cost
- Use `low` tier for everything except the final generate (default behavior — verify).
- Set `tool_selection_enabled: false` in `config_nlweb.yaml` to skip the router call entirely.
- Disable `who_endpoint_enabled` to skip federated discovery.
- Pre-compute `decontextualized_query` client-side to skip that LLM call.
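Since cost tracks the number of calls, a rough way to reason about these switches is to list which stages still run. The sketch below assumes one LLM call per stage and uses stage names taken from the pipeline description earlier; the real pipeline may issue more calls (for example, per ranked result), so treat this as a counting aid only. `stages_for_query` is a hypothetical helper.

```python
# Stage names follow the pipeline description; one call per stage is an assumption.
BASE_STAGES = ["decontextualize", "detect_item_type", "tool_selection", "rank", "generate"]


def stages_for_query(tool_selection_enabled: bool = True,
                     client_decontextualized: bool = False) -> list[str]:
    """Return the LLM stages that still run under the given settings."""
    stages = list(BASE_STAGES)
    if not tool_selection_enabled:
        stages.remove("tool_selection")
    if client_decontextualized:
        stages.remove("decontextualize")
    return stages
```

Under these assumptions, turning off tool selection and supplying the decontextualized query client-side drops two of the five stages.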
### Switching Embedding Providers Safely
```bash
# 1. Stop serving traffic
# 2. Change config_embedding.yaml
# 3. Drop the index
python -m data_loading.db_load --only-delete delete-site <site>
# 4. Re-ingest
python -m data_loading.db_load <source> <site>
# 5. Restart
```
You cannot mix-and-match embedding providers across a single retrieval index. Vectors are not portable across providers.
### Verifying Provider Wiring
`nlweb check` runs connectivity diagnostics for all configured providers. Use it before debugging "the model isn't responding" issues — the answer is usually a missing env var.
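If you want a quick pre-check of your own, the missing-env-var case is easy to test directly. The sketch below uses the variable names from the provider table above; the provider keys and the `missing_env_vars` helper are assumptions and are unrelated to how `nlweb check` is implemented.

```python
import os
from typing import Mapping

# Env-var names come from the provider table; the dict keys are assumed names.
REQUIRED_ENV = {
    "openai": ["OPENAI_API_KEY"],
    "azure_openai": ["AZURE_OPENAI_API_KEY", "AZURE_OPENAI_ENDPOINT"],
    "anthropic": ["ANTHROPIC_API_KEY"],
    "gemini": ["GEMINI_API_KEY"],
}


def missing_env_vars(provider: str, environ: Mapping[str, str] = os.environ) -> list[str]:
    """List required variables that are unset or empty for the given provider."""
    return [name for name in REQUIRED_ENV.get(provider, []) if not environ.get(name)]
```

Calling `missing_env_vars("azure_openai")` against the live environment returns the names still to be set, or an empty list when the provider is fully configured.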
### Provider Failure Modes
- **OpenAI / Anthropic / Gemini 429s**: rate limits. Add backoff in the provider class or reduce concurrency.
- **Azure OpenAI 404 on deployment**: the `deployment_name` in config doesn't match what's deployed in Azure. Azure routes requests by deployment name, not by model ID, so the configured value must be the name you gave the deployment.
- **Ollama "model not found"**: `ollama pull <model>` first.
- **Snowflake Cortex authentication**: requires the warehouse + role to have Cortex enabled.
- **HuggingFace inference endpoint cold-start**: first call takes 30-60s. Pre-warm.
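For the rate-limit case, a generic exponential backoff with jitter is one common approach. The sketch below is provider-agnostic: `RateLimitError` is a placeholder for whatever exception your SDK raises on a 429, and `call_with_backoff` is a hypothetical helper rather than an NLWeb function.

```python
import random
import time


class RateLimitError(Exception):
    """Placeholder for the SDK-specific exception raised on HTTP 429."""


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call fn(), retrying on RateLimitError with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # Out of retries: surface the error to the caller.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter spreads retries out so concurrent callers do not retry in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
```

The `sleep` parameter is injectable so the retry schedule can be tested without waiting. In production, leave it at the default.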
Always re-fetch `config_llm.yaml` from the live repo; provider keys and model IDs change.
Related Skills
nlweb-tools-framework
Design and implement NLWeb tools — the per-Schema.org-type handlers that turn a query into a specialized response (search, item_details, compare_items, ensemble, recipe_substitution, accompaniment, conversation_search, etc.). Covers `tools.xml`, the ToolSelector router, builtin handlers in `methods/`, writing a custom tool with a `<returnStruc>` contract, and disabling tool selection for raw retrieval. Use when extending NLWeb beyond the default query → results flow.
nlweb-setup
Bootstrap a local NLWeb development environment from scratch — clone the repo, configure .env, install Python deps via `nlweb init-python`, run `nlweb init` for interactive LLM/retrieval selection, load sample Schema.org data, and verify with `nlweb check`. Use when starting a new NLWeb deployment from zero.
nlweb-schema-org-grounding
Prepare and structure site content as Schema.org JSON-LD for NLWeb ingestion — covers the supported types (Recipe, Product, Movie, Event, Article, RealEstate, Course, etc.), per-type behavior in NLWeb's tool routing, JSON-LD embedding patterns in HTML, sites.xml registration, and how the `schema_object` flows through ranking back to agent results. Use when authoring or auditing the structured data on a site that will be exposed via NLWeb.
nlweb-retrieval-backends
Choose and configure NLWeb retrieval backends — Qdrant (local + remote), Azure AI Search, Elasticsearch, OpenSearch (with/without k-NN), Postgres pgvector, Milvus, Snowflake Cortex Search, Cloudflare AutoRAG, Shopify MCP, and Bing Web Search. Covers `config_retrieval.yaml`, the single `write_endpoint` rule, parallel read-fanout with URL dedup, and per-backend setup pages. Use when picking a retrieval store, migrating between backends, or debugging "results are empty."
nlweb-prompts-customization
Customize NLWeb's LLM prompts and per-Schema.org-type behavior via `prompts.xml` and `site_types.xml` — covers the `<promptString>` template format, `<returnStruc>` JSON schemas, prompt inheritance, decontextualization/ranking/generate templates, per-site overrides, and pitfalls of editing prompts in place. Use when tuning answer quality, supporting a new domain, or localizing prompts.
nlweb-mcp-server
Expose NLWeb as an MCP (Model Context Protocol) server — JSON-RPC 2.0 endpoint at /mcp, the `ask` / `list_sites` / `who` tools, MCP protocol version 2024-11-05, and integration with ChatGPT, Claude, Gemini, and other agent clients. Use when wiring NLWeb to an AI agent via MCP or building an MCP client that consumes an NLWeb site.
nlweb-data-loading
Ingest site content into NLWeb's vector store using `db_load.py` — supports RSS/Atom feeds, Schema.org JSON-LD, sitemap-driven URL lists, and CSV. Covers chunking, embedding computation, site partitioning, batch sizing, delete-and-reload, and per-backend write_endpoint targeting. Use when bootstrapping a site's index, refreshing content, or migrating between retrieval backends.
nlweb-chatgpt-appsdk
Integrate NLWeb with ChatGPT's Apps SDK — the Node.js MCP server in `openai-apps-sdk-integration/`, the `nlweb-list` tool, the React widget at `ui://widget/nlweb-list.html`, and the port-8100 AppSDK adapter that translates NLWeb's message list to OpenAI Apps SDK envelopes. Use when publishing an NLWeb site as a ChatGPT app or wiring NLWeb results into an Apps SDK widget.
nlweb-auth-multitenancy
Configure NLWeb authentication and multi-tenant deployments — OAuth providers (GitHub, Google, Microsoft, Facebook), session storage, the `sites:` allowlist in `config_nlweb.yaml`, conversation persistence per authenticated user, and per-tenant data isolation. Use when adding login to an NLWeb instance, hosting multiple customers on one deployment, or persisting conversation history.
nlweb-ask-endpoint
Implement and consume the NLWeb /ask REST endpoint — request shape (GET/POST, query-string and v0.55 structured body), SSE streaming response, modes (list/summarize/generate), in-stream "message_type" headers, error envelopes, and client-side parsing. Use when building an NLWeb server route, calling /ask from a custom agent, or debugging /ask responses.
woo-testing
Test WooCommerce extensions — PHPUnit unit/integration tests, WP test suite, WooCommerce test helpers, E2E with Playwright, and WP-CLI test scaffolding. Use when writing tests for WooCommerce plugins or setting up a test environment.
woo-shipping
Build WooCommerce shipping methods — WC_Shipping_Method, shipping zones, shipping classes, rate calculation, tracking, and integration with carriers. Use when creating custom shipping integrations or configuring shipping logic.