nlweb-setup

Bootstrap a local NLWeb development environment from scratch — clone the repo, configure .env, install Python deps via `nlweb init-python`, run `nlweb init` for interactive LLM/retrieval selection, load sample Schema.org data, and verify with `nlweb check`. Use when starting a new NLWeb deployment from zero.

17 stars

Best use case

nlweb-setup is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using nlweb-setup should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/nlweb-setup/SKILL.md --create-dirs "https://raw.githubusercontent.com/OrcaQubits/agentic-commerce-skills-plugins/main/dist/antigravity/nlweb-protocol/.agent/skills/nlweb-setup/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/nlweb-setup/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How nlweb-setup Compares

| Feature / Agent | nlweb-setup | Standard Approach |
| --- | --- | --- |
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

SKILL.md Source

# NLWeb Setup

## Before writing code

**Fetch live docs first**:
1. Fetch https://github.com/nlweb-ai/NLWeb (README) for the current minimum Python version and required deps.
2. Fetch https://github.com/nlweb-ai/NLWeb/blob/main/docs/nlweb-hello-world.md for the canonical hello-world flow.
3. Fetch https://github.com/nlweb-ai/NLWeb/blob/main/docs/nlweb-cli.md for current `nlweb` CLI flags.
4. Web-search `site:github.com/nlweb-ai/NLWeb docs/release_notes` and read the **most recent** dated release note — config keys and required env vars change between releases.
5. Identify the default `write_endpoint` and verify which backends are enabled by default in `config/config_retrieval.yaml` on `main`.

## Conceptual Architecture

### What "setup" produces

A working NLWeb dev environment has four parts:

1. **Cloned repo** + Python virtualenv with requirements installed.
2. **`.env`** file with provider credentials (OpenAI/Azure OpenAI key + retrieval backend secrets).
3. **Sample data** ingested into the local vector store (Qdrant local by default).
4. **A running aiohttp server** on `:8000` with `/ask`, `/mcp`, `/sites` reachable.

### Three Default-Enabled Backends — Watch Out

NLWeb ships with **three retrieval backends enabled by default** in `config_retrieval.yaml`:
- `qdrant_local` (file-backed, fine for dev)
- `nlweb_west` (Azure AI Search — requires Azure credentials)
- `shopify_mcp` (queries Shopify's MCP endpoint, requires network)

For most local-dev cases, disable the latter two by setting `enabled: false` so you don't get connection errors at startup. The `write_endpoint` should point to `qdrant_local` for dev.
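
As a rough illustration of that change, the sketch below works on an already-parsed copy of `config_retrieval.yaml`. The `endpoints` mapping with a per-endpoint `enabled` flag is an assumption based on the description above, the sample values are made up, and `local_only` is a hypothetical helper rather than part of NLWeb. Check the real file on `main` before relying on this layout.

```python
from copy import deepcopy

# Hypothetical parsed form of config/config_retrieval.yaml.
# The "endpoints" key and the sample write_endpoint value are assumptions,
# not the real defaults; verify against the file on main.
SAMPLE_CONFIG = {
    "write_endpoint": "nlweb_west",
    "endpoints": {
        "qdrant_local": {"enabled": True},
        "nlweb_west": {"enabled": True},
        "shopify_mcp": {"enabled": True},
    },
}


def local_only(config: dict, keep: str = "qdrant_local") -> dict:
    """Return a copy in which every endpoint except `keep` is disabled."""
    result = deepcopy(config)
    if keep not in result["endpoints"]:
        raise KeyError(f"endpoint {keep!r} not found in config")
    for name, settings in result["endpoints"].items():
        settings["enabled"] = (name == keep)
    # Writes should go to the one backend that stays enabled.
    result["write_endpoint"] = keep
    return result
```

The helper returns a modified copy instead of mutating its input, so the original settings stay available for comparison before anything is written back to disk.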

### Setup Decision Checklist

- **LLM provider** — OpenAI, Azure OpenAI (default), Anthropic, Gemini, Ollama (offline), Snowflake Cortex?
- **Embedding provider** — must match between ingest and query; default is `text-embedding-3-small` on Azure OpenAI.
- **Retrieval write endpoint** — Qdrant local for dev, Azure AI Search / Snowflake Cortex / pgvector for prod.
- **Data source** — Schema.org JSON-LD on the site, RSS/Atom feed, sitemap.xml, or CSV?
- **Mode** — `development` (allows query-string config overrides) or `production` in `config_webserver.yaml`?
- **OAuth** — anonymous-only, or login-gated (GitHub/Google/Microsoft/Facebook)?

### Project Layout (after setup)

```
NLWeb/                                 # cloned repo
├── AskAgent/python/
│   ├── app-aiohttp.py                 # main entry
│   ├── core/, methods/, webserver/    # core code
│   ├── llm_providers/, embedding_providers/, retrieval_providers/
│   └── data_loading/
├── config/
│   ├── config_llm.yaml
│   ├── config_embedding.yaml
│   ├── config_retrieval.yaml
│   ├── config_nlweb.yaml
│   ├── config_webserver.yaml
│   ├── config_oauth.yaml
│   ├── config_storage.yaml
│   ├── config_tools.yaml
│   ├── site_types.xml
│   └── prompts.xml
├── data/db/                           # qdrant_local file store
├── .env                               # YOUR credentials (gitignored)
└── docs/, scripts/, demo/, tests/
```

### Setup Sequence

1. `git clone https://github.com/nlweb-ai/NLWeb && cd NLWeb`
2. `nlweb init-python` (or manual `python -m venv .venv && source .venv/bin/activate && pip install -r requirements.txt`)
3. `nlweb init` — interactive prompts walk through LLM + retrieval selection and write `.env`
4. Disable the unwanted default backends in `config/config_retrieval.yaml` (`nlweb_west`, `shopify_mcp` for local-only dev)
5. `nlweb data-load <source> <site-name>` — ingest sample content (use a small RSS feed for first run)
6. `nlweb check` — runs connectivity diagnostics; resolve any red flags
7. `nlweb app` — start the server, hit `http://localhost:8000/`
8. Test `/ask?query=hello&site=<site-name>&streaming=false`
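
For the final step, a small helper can build the test URL so that queries or site names containing spaces or other special characters are escaped correctly. This is a minimal sketch: `ask_url` is a hypothetical name, and the parameters are only the three shown in step 8.

```python
from urllib.parse import urlencode


def ask_url(site: str, query: str, base: str = "http://localhost:8000") -> str:
    """Build a non-streaming /ask URL with URL-encoded parameters."""
    params = {"query": query, "site": site, "streaming": "false"}
    return f"{base}/ask?{urlencode(params)}"
```

The resulting string can be passed to `curl` or to `urllib.request.urlopen` once the server from step 7 is running.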

### .env Conventions

NLWeb expects credentials via env vars (never YAML). Common keys (verify live):
- `OPENAI_API_KEY`, `AZURE_OPENAI_API_KEY`, `AZURE_OPENAI_ENDPOINT`
- `ANTHROPIC_API_KEY`, `GEMINI_API_KEY`
- `AZURE_SEARCH_ENDPOINT`, `AZURE_SEARCH_API_KEY`
- `QDRANT_API_KEY` (only for remote Qdrant)
- `SNOWFLAKE_USER`, `SNOWFLAKE_PASSWORD`, `SNOWFLAKE_ACCOUNT`
- `CLOUDFLARE_API_TOKEN`, `CLOUDFLARE_ACCOUNT_ID`
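
Before running `nlweb check`, it can help to confirm that the keys for the chosen providers are present in `.env`. The sketch below is a deliberately simple parser for plain `KEY=VALUE` lines and is not how NLWeb itself loads the file; it does not handle `export` prefixes or multi-line values. `parse_env` and `missing_keys` are hypothetical helpers, and the required-key list in the usage example is an assumption for an Azure OpenAI setup.

```python
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks, comments, and malformed lines."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip optional surrounding quotes from the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env: dict[str, str], required: list[str]) -> list[str]:
    """Return the required keys that are absent or empty, in the order given."""
    return [key for key in required if not env.get(key)]


if __name__ == "__main__":
    env_file = Path(".env")
    env = parse_env(env_file.read_text()) if env_file.exists() else {}
    # Example requirement set (assumption): Azure OpenAI for LLM and embeddings.
    print(missing_keys(env, ["AZURE_OPENAI_API_KEY", "AZURE_OPENAI_ENDPOINT"]))
```

An empty list means every key you asked about has a non-empty value; it says nothing about whether the credentials are valid.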

### Verification Targets

After setup, these should work:
- `curl http://localhost:8000/sites` → JSON list including your loaded site
- `curl 'http://localhost:8000/ask?query=test&site=<your-site>&streaming=false'` → JSON with results
- `curl -X POST http://localhost:8000/mcp -d '{"jsonrpc":"2.0","id":1,"method":"tools/list"}'` → `ask`, `list_sites`, optionally `who`
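
The MCP check above can also be scripted. The sketch below builds the same JSON-RPC request and inspects a decoded response for the expected tool names. It assumes the response follows the usual MCP `tools/list` shape (`result.tools[].name`); both function names are hypothetical, and the explicit `Content-Type` header is an addition the `curl` example does not set.

```python
import json
from urllib.request import Request


def mcp_tools_list_request(base: str = "http://localhost:8000") -> Request:
    """Build (but do not send) the JSON-RPC tools/list POST for /mcp."""
    payload = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}
    return Request(
        f"{base}/mcp",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def missing_tools(response: dict, expected=("ask", "list_sites")) -> list[str]:
    """Return expected tool names that are absent from a tools/list response.

    Assumes the shape {"result": {"tools": [{"name": ...}, ...]}}.
    """
    tools = response.get("result", {}).get("tools", [])
    names = {tool.get("name") for tool in tools}
    return [name for name in expected if name not in names]
```

With the server running, the request can be sent with `urllib.request.urlopen(mcp_tools_list_request())` and the decoded JSON body passed to `missing_tools`; an empty list means both required tools were advertised.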

### Common Setup Failures

- **`nlweb check` fails on Azure**: usually `AZURE_OPENAI_ENDPOINT` missing trailing slash or wrong deployment name.
- **Embedding dim mismatch on retrieval**: data was loaded with a different embedding provider than runtime config. Either re-ingest or change `preferred_provider` in `config_embedding.yaml`.
- **Server starts but `/ask` returns empty**: site name in the query doesn't match the `site` value used during ingest, or the `sites:` allowlist in `config_nlweb.yaml` excludes it.
- **Slow first request**: cold model loading + `/who` endpoint pinging `nlwm.azurewebsites.net`. Disable `who_endpoint_enabled` for offline dev.
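
For the empty-`/ask` case, one quick triage step is to compare the site name in the query against the names the server reports. The sketch below assumes you have already extracted a plain list of site names from the `/sites` response; the exact response format is not specified here, so that extraction is left out. `diagnose_site` is a hypothetical helper.

```python
from difflib import get_close_matches


def diagnose_site(requested: str, loaded_sites: list[str]) -> str:
    """Suggest why a query for `requested` might return no results."""
    if requested in loaded_sites:
        # Name matches an ingested site, so the allowlist is the next suspect.
        return f"'{requested}' is loaded; check the sites allowlist in config_nlweb.yaml"
    close = get_close_matches(requested, loaded_sites, n=1, cutoff=0.6)
    if close:
        return f"'{requested}' is not loaded; did you mean '{close[0]}'?"
    return f"'{requested}' is not loaded; re-run data-load with this site name"
```

For example, `diagnose_site("myblog", ["my-blog", "recipes"])` points at `my-blog` as the likely intended name.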

Always re-verify against the latest hello-world doc — the exact env-var names and CLI flags change.

Related Skills

woo-setup

17
from OrcaQubits/agentic-commerce-skills-plugins

Install WooCommerce, configure the development stack, and set up a local dev environment with WP-CLI, Docker, or wp-env. Use when setting up a new WooCommerce project or development environment.

webmcp-setup

17
from OrcaQubits/agentic-commerce-skills-plugins

Set up a WebMCP project — enable Chrome flags, install MCP-B polyfill, scaffold tool registration, and configure development environment. Use when starting a new WebMCP-enabled website from scratch.

ucp-setup

17
from OrcaQubits/agentic-commerce-skills-plugins

Set up a UCP project — scaffold a merchant server or platform client with discovery profile, SDK installation, and project structure. Use when starting a new UCP implementation.

mpp-setup

17
from OrcaQubits/agentic-commerce-skills-plugins

Scaffold an MPP project — install mppx SDK, configure payment methods, set up server middleware, and create a basic paid API endpoint. Use when starting a new MPP machine payments project from scratch.

spree-setup

17
from OrcaQubits/agentic-commerce-skills-plugins

Bootstrap a new Spree project — `create-spree-app` CLI (v5.2+), `spree-starter` Rails backend, the Next.js storefront repo, `bin/rails g spree:install`, sample data, Docker Compose, and the PostgreSQL + Redis + Sidekiq prerequisites. Use when starting a new Spree project from scratch or onboarding an existing repo.

shopify-setup

17
from OrcaQubits/agentic-commerce-skills-plugins

Set up a Shopify development environment — Shopify CLI installation, Partner account, development stores, environment variables, project structures for themes, apps, and Hydrogen. Use when starting a new Shopify project.

sf-setup

17
from OrcaQubits/agentic-commerce-skills-plugins

Set up a Salesforce Commerce development environment — sfcc-ci CLI for B2C, sf CLI for B2B, Business Manager access, sandbox management, dw.json configuration, .sfdx project setup, and project structures for SFRA, PWA Kit, and Lightning. Use when starting a new Salesforce Commerce project.

saleor-setup

17
from OrcaQubits/agentic-commerce-skills-plugins

Set up a Saleor development environment — saleor-platform Docker Compose, CLI, PostgreSQL/Redis prerequisites, manage.py commands, environment variables, project structure. Use when starting a new Saleor project.

nlweb-tools-framework

17
from OrcaQubits/agentic-commerce-skills-plugins

Design and implement NLWeb tools — the per-Schema.org-type handlers that turn a query into a specialized response (search, item_details, compare_items, ensemble, recipe_substitution, accompaniment, conversation_search, etc.). Covers `tools.xml`, the ToolSelector router, builtin handlers in `methods/`, writing a custom tool with a `<returnStruc>` contract, and disabling tool selection for raw retrieval. Use when extending NLWeb beyond the default query → results flow.

nlweb-schema-org-grounding

17
from OrcaQubits/agentic-commerce-skills-plugins

Prepare and structure site content as Schema.org JSON-LD for NLWeb ingestion — covers the supported types (Recipe, Product, Movie, Event, Article, RealEstate, Course, etc.), per-type behavior in NLWeb's tool routing, JSON-LD embedding patterns in HTML, sites.xml registration, and how the `schema_object` flows through ranking back to agent results. Use when authoring or auditing the structured data on a site that will be exposed via NLWeb.

nlweb-retrieval-backends

17
from OrcaQubits/agentic-commerce-skills-plugins

Choose and configure NLWeb retrieval backends — Qdrant (local + remote), Azure AI Search, Elasticsearch, OpenSearch (with/without k-NN), Postgres pgvector, Milvus, Snowflake Cortex Search, Cloudflare AutoRAG, Shopify MCP, and Bing Web Search. Covers `config_retrieval.yaml`, the single `write_endpoint` rule, parallel read-fanout with URL dedup, and per-backend setup pages. Use when picking a retrieval store, migrating between backends, or debugging "results are empty."

nlweb-prompts-customization

17
from OrcaQubits/agentic-commerce-skills-plugins

Customize NLWeb's LLM prompts and per-Schema.org-type behavior via `prompts.xml` and `site_types.xml` — covers the `<promptString>` template format, `<returnStruc>` JSON schemas, prompt inheritance, decontextualization/ranking/generate templates, per-site overrides, and pitfalls of editing prompts in place. Use when tuning answer quality, supporting a new domain, or localizing prompts.