nlweb-setup
Bootstrap a local NLWeb development environment from scratch — clone the repo, configure .env, install Python deps via `nlweb init-python`, run `nlweb init` for interactive LLM/retrieval selection, load sample Schema.org data, and verify with `nlweb check`. Use when starting a new NLWeb deployment from zero.
Best use case
nlweb-setup is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using nlweb-setup should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/nlweb-setup/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
How nlweb-setup Compares
| Feature / Agent | nlweb-setup | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Bootstrap a local NLWeb development environment from scratch — clone the repo, configure .env, install Python deps via `nlweb init-python`, run `nlweb init` for interactive LLM/retrieval selection, load sample Schema.org data, and verify with `nlweb check`. Use when starting a new NLWeb deployment from zero.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# NLWeb Setup
## Before writing code
**Fetch live docs first**:
1. Fetch https://github.com/nlweb-ai/NLWeb (README) for the current minimum Python version and required deps.
2. Fetch https://github.com/nlweb-ai/NLWeb/blob/main/docs/nlweb-hello-world.md for the canonical hello-world flow.
3. Fetch https://github.com/nlweb-ai/NLWeb/blob/main/docs/nlweb-cli.md for current `nlweb` CLI flags.
4. Web-search `site:github.com/nlweb-ai/NLWeb docs/release_notes` and read the **most recent** dated release note — config keys and required env vars change between releases.
5. Identify the default `write_endpoint` and verify which backends are enabled by default in `config/config_retrieval.yaml` on `main`.
## Conceptual Architecture
### What "setup" produces
A working NLWeb dev environment has four parts:
1. **Cloned repo** + Python virtualenv with requirements installed.
2. **`.env`** file with provider credentials (OpenAI/Azure OpenAI key + retrieval backend secrets).
3. **Sample data** ingested into the local vector store (Qdrant local by default).
4. **A running aiohttp server** on `:8000` with `/ask`, `/mcp`, `/sites` reachable.
### Three Default-Enabled Backends — Watch Out
NLWeb ships with **three retrieval backends enabled by default** in `config_retrieval.yaml`:
- `qdrant_local` (file-backed, fine for dev)
- `nlweb_west` (Azure AI Search — requires Azure credentials)
- `shopify_mcp` (queries Shopify's MCP endpoint, requires network)
For most local-dev cases, disable the latter two by setting `enabled: false` so you don't get connection errors at startup. The `write_endpoint` should point to `qdrant_local` for dev.
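The effect of that edit can be sketched in Python. This is a minimal sketch, assuming the parsed YAML has a top-level `endpoints` mapping whose entries carry an `enabled` flag, plus a top-level `write_endpoint` key; confirm that layout against `config/config_retrieval.yaml` on `main`. The helper name `keep_only` is hypothetical, and it works on an already-parsed dict so that no YAML library is required here.

```python
from copy import deepcopy


def keep_only(config: dict, keep: set[str], write_endpoint: str) -> dict:
    """Return a copy of a parsed retrieval config with only `keep` enabled.

    Assumed layout (verify against the live file):
        {"write_endpoint": "...", "endpoints": {"<name>": {"enabled": bool}}}
    """
    if write_endpoint not in keep:
        raise ValueError("write_endpoint must be one of the enabled backends")
    updated = deepcopy(config)  # leave the caller's dict untouched
    for name, settings in updated.get("endpoints", {}).items():
        settings["enabled"] = name in keep
    updated["write_endpoint"] = write_endpoint
    return updated


# Stand-in for the three default-enabled backends described above.
defaults = {
    "write_endpoint": "qdrant_local",
    "endpoints": {
        "qdrant_local": {"enabled": True},
        "nlweb_west": {"enabled": True},
        "shopify_mcp": {"enabled": True},
    },
}

local_dev = keep_only(defaults, keep={"qdrant_local"}, write_endpoint="qdrant_local")
enabled = sorted(n for n, s in local_dev["endpoints"].items() if s["enabled"])
print(enabled)
```

In practice the same change is usually made by hand in the YAML file; the sketch only makes the intended end state explicit.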
### Setup Decision Checklist
- **LLM provider** — OpenAI, Azure OpenAI (default), Anthropic, Gemini, Ollama (offline), Snowflake Cortex?
- **Embedding provider** — must match between ingest and query; default is `text-embedding-3-small` on Azure OpenAI.
- **Retrieval write endpoint** — Qdrant local for dev, Azure AI Search / Snowflake Cortex / pgvector for prod.
- **Data source** — Schema.org JSON-LD on the site, RSS/Atom feed, sitemap.xml, or CSV?
- **Mode** — `development` (allows query-string config overrides) or `production` in `config_webserver.yaml`?
- **OAuth** — anonymous-only, or login-gated (GitHub/Google/Microsoft/Facebook)?
### Project Layout (after setup)
```
NLWeb/ # cloned repo
├── AskAgent/python/
│ ├── app-aiohttp.py # main entry
│ ├── core/, methods/, webserver/ # core code
│ ├── llm_providers/, embedding_providers/, retrieval_providers/
│ └── data_loading/
├── config/
│ ├── config_llm.yaml
│ ├── config_embedding.yaml
│ ├── config_retrieval.yaml
│ ├── config_nlweb.yaml
│ ├── config_webserver.yaml
│ ├── config_oauth.yaml
│ ├── config_storage.yaml
│ ├── config_tools.yaml
│ ├── site_types.xml
│ └── prompts.xml
├── data/db/ # qdrant_local file store
├── .env # YOUR credentials (gitignored)
└── docs/, scripts/, demo/, tests/
```
### Setup Sequence
1. `git clone https://github.com/nlweb-ai/NLWeb && cd NLWeb`
2. `nlweb init-python` (or manual `python -m venv .venv && source .venv/bin/activate && pip install -r requirements.txt`)
3. `nlweb init` — interactive prompts walk through LLM + retrieval selection and write `.env`
4. Disable the unwanted default backends in `config/config_retrieval.yaml` (`nlweb_west`, `shopify_mcp` for local-only dev)
5. `nlweb data-load <source> <site-name>` — ingest sample content (use a small RSS feed for first run)
6. `nlweb check` — runs connectivity diagnostics; resolve any red flags
7. `nlweb app` — start the server, hit `http://localhost:8000/`
8. Test `/ask?query=hello&site=<site-name>&streaming=false`
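The CLI steps above can be laid out as a dry-run plan before anything is executed. This is a sketch, not an official installer: it assumes the `nlweb` command is on `PATH` once the repo is cloned, uses only the subcommands named in the list, and leaves step 4 (editing `config_retrieval.yaml`) as a manual step. The feed URL and site name are placeholders.

```python
import shlex
import subprocess
from pathlib import Path

REPO_URL = "https://github.com/nlweb-ai/NLWeb"


def setup_commands(source: str, site_name: str) -> list[tuple[str, list[str]]]:
    """Return (working_dir, argv) pairs for the CLI steps listed above."""
    return [
        (".", ["git", "clone", REPO_URL]),
        ("NLWeb", ["nlweb", "init-python"]),
        ("NLWeb", ["nlweb", "init"]),  # interactive prompts
        ("NLWeb", ["nlweb", "data-load", source, site_name]),
        ("NLWeb", ["nlweb", "check"]),
        ("NLWeb", ["nlweb", "app"]),  # blocks while the server runs
    ]


def run(commands, root: Path = Path("."), dry_run: bool = True) -> list[str]:
    """Print each command; execute it only when dry_run is False."""
    printed = []
    for cwd, argv in commands:
        line = f"(cd {cwd}) {shlex.join(argv)}"
        printed.append(line)
        print(line)
        if not dry_run:
            # check=True stops the sequence at the first failing step.
            subprocess.run(argv, cwd=root / cwd, check=True)
    return printed


# Placeholder source and site name; substitute your own.
plan = run(setup_commands("https://example.com/feed.xml", "example-site"))
```

Keeping `dry_run=True` as the default means the plan can be reviewed, and the manual config edit slotted in, before any command touches the machine.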
### .env Conventions
NLWeb expects credentials via env vars (never YAML). Common keys (verify live):
- `OPENAI_API_KEY`, `AZURE_OPENAI_API_KEY`, `AZURE_OPENAI_ENDPOINT`
- `ANTHROPIC_API_KEY`, `GEMINI_API_KEY`
- `AZURE_SEARCH_ENDPOINT`, `AZURE_SEARCH_API_KEY`
- `QDRANT_API_KEY` (only for remote Qdrant)
- `SNOWFLAKE_USER`, `SNOWFLAKE_PASSWORD`, `SNOWFLAKE_ACCOUNT`
- `CLOUDFLARE_API_TOKEN`, `CLOUDFLARE_ACCOUNT_ID`
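A quick pre-flight check can catch missing credentials before `nlweb check` runs. The sketch below parses simple `KEY=VALUE` lines and reports keys that are absent or empty. The key names come from the list above; the grouping by provider in `REQUIRED_KEYS` and the helper names are assumptions for illustration, so adjust them to the providers you selected.

```python
# Key names are from the list above; the per-provider grouping is an assumption.
REQUIRED_KEYS = {
    "openai": ["OPENAI_API_KEY"],
    "azure_openai": ["AZURE_OPENAI_API_KEY", "AZURE_OPENAI_ENDPOINT"],
    "azure_search": ["AZURE_SEARCH_ENDPOINT", "AZURE_SEARCH_API_KEY"],
}


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env: dict[str, str], providers: list[str]) -> list[str]:
    """List required keys that are absent or empty for the chosen providers."""
    return [
        key
        for provider in providers
        for key in REQUIRED_KEYS[provider]
        if not env.get(key)
    ]


# Fake sample content; never commit real credentials.
sample = """
# local dev credentials
AZURE_OPENAI_API_KEY="abc123"
AZURE_OPENAI_ENDPOINT=
"""
env = parse_env(sample)
print(missing_keys(env, ["azure_openai"]))
```

To check a real file, pass `Path(".env").read_text()` from `pathlib` to `parse_env` instead of the sample string.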
### Verification Targets
After setup, these should work:
- `curl http://localhost:8000/sites` → JSON list including your loaded site
- `curl 'http://localhost:8000/ask?query=test&site=<your-site>&streaming=false'` → JSON with results
- `curl -X POST http://localhost:8000/mcp -d '{"jsonrpc":"2.0","id":1,"method":"tools/list"}'` → `ask`, `list_sites`, optionally `who`
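The same checks can be scripted with the Python standard library. This is a minimal sketch that mirrors the `curl` commands above; the helper names are hypothetical, `example-site` is a placeholder, and the `Content-Type` header is an assumption that the server may or may not require. Nothing is sent until `run_checks` is called against a running server.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:8000"  # port taken from the text above


def ask_url(query: str, site: str, base: str = BASE_URL) -> str:
    """Build the non-streaming /ask URL used in the second check."""
    params = urlencode({"query": query, "site": site, "streaming": "false"})
    return f"{base}/ask?{params}"


def mcp_tools_list_request(base: str = BASE_URL) -> Request:
    """Build the JSON-RPC tools/list POST used in the third check."""
    body = json.dumps({"jsonrpc": "2.0", "id": 1, "method": "tools/list"})
    return Request(
        f"{base}/mcp",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed to be needed
        method="POST",
    )


def fetch_json(target, timeout: float = 10.0):
    """Send the request and decode the JSON body (needs a running server)."""
    with urlopen(target, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def run_checks(site: str) -> None:
    """Run all three verification targets in order and print the responses."""
    print(fetch_json(f"{BASE_URL}/sites"))
    print(fetch_json(ask_url("test", site)))
    print(fetch_json(mcp_tools_list_request()))


# Building the URL is safe offline; call run_checks("<your-site>") once the server is up.
print(ask_url("test", "example-site"))
```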
### Common Setup Failures
- **`nlweb check` fails on Azure**: usually `AZURE_OPENAI_ENDPOINT` is missing its trailing slash, or the deployment name is wrong.
- **Embedding dim mismatch on retrieval**: data was loaded with a different embedding provider than runtime config. Either re-ingest or change `preferred_provider` in `config_embedding.yaml`.
- **Server starts but `/ask` returns empty**: site name in the query doesn't match the `site` value used during ingest, or the `sites:` allowlist in `config_nlweb.yaml` excludes it.
- **Slow first request**: cold model loading + `/who` endpoint pinging `nlwm.azurewebsites.net`. Disable `who_endpoint_enabled` for offline dev.
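Three of these failures reduce to simple comparisons, which the sketch below turns into hints. It is illustrative only: the function name and all of its inputs are hypothetical, and how you obtain the stored vector dimension depends on your retrieval backend. The dimensions in the example are placeholder values.

```python
def diagnose(
    azure_endpoint: str,
    ingest_site: str,
    query_site: str,
    stored_dim: int,
    runtime_dim: int,
) -> list[str]:
    """Return human-readable hints for three of the failures listed above."""
    hints = []
    if azure_endpoint and not azure_endpoint.endswith("/"):
        hints.append("AZURE_OPENAI_ENDPOINT has no trailing slash")
    if ingest_site != query_site:
        hints.append(
            f"site mismatch: ingested {ingest_site!r}, queried {query_site!r}"
        )
    if stored_dim != runtime_dim:
        hints.append(
            f"embedding dim mismatch: stored {stored_dim}, runtime {runtime_dim}; "
            "re-ingest or change preferred_provider"
        )
    return hints


# Placeholder values for illustration.
hints = diagnose(
    azure_endpoint="https://example.openai.azure.com",
    ingest_site="example-site",
    query_site="Example-Site",
    stored_dim=1536,
    runtime_dim=3072,
)
for hint in hints:
    print(hint)
```

Note that the site comparison is deliberately case-sensitive, since a differently cased site name is one way the ingest and query values can drift apart.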
Always re-verify against the latest hello-world doc: the exact env-var names and CLI flags change.
Related Skills
woo-setup
Install WooCommerce, configure the development stack, and set up a local dev environment with WP-CLI, Docker, or wp-env. Use when setting up a new WooCommerce project or development environment.
webmcp-setup
Set up a WebMCP project — enable Chrome flags, install MCP-B polyfill, scaffold tool registration, and configure development environment. Use when starting a new WebMCP-enabled website from scratch.
ucp-setup
Set up a UCP project — scaffold a merchant server or platform client with discovery profile, SDK installation, and project structure. Use when starting a new UCP implementation.
mpp-setup
Scaffold an MPP project — install mppx SDK, configure payment methods, set up server middleware, and create a basic paid API endpoint. Use when starting a new MPP machine payments project from scratch.
spree-setup
Bootstrap a new Spree project — `create-spree-app` CLI (v5.2+), `spree-starter` Rails backend, the Next.js storefront repo, `bin/rails g spree:install`, sample data, Docker Compose, and the PostgreSQL + Redis + Sidekiq prerequisites. Use when starting a new Spree project from scratch or onboarding an existing repo.
shopify-setup
Set up a Shopify development environment — Shopify CLI installation, Partner account, development stores, environment variables, project structures for themes, apps, and Hydrogen. Use when starting a new Shopify project.
sf-setup
Set up a Salesforce Commerce development environment — sfcc-ci CLI for B2C, sf CLI for B2B, Business Manager access, sandbox management, dw.json configuration, .sfdx project setup, and project structures for SFRA, PWA Kit, and Lightning. Use when starting a new Salesforce Commerce project.
saleor-setup
Set up a Saleor development environment — saleor-platform Docker Compose, CLI, PostgreSQL/Redis prerequisites, manage.py commands, environment variables, project structure. Use when starting a new Saleor project.
nlweb-tools-framework
Design and implement NLWeb tools — the per-Schema.org-type handlers that turn a query into a specialized response (search, item_details, compare_items, ensemble, recipe_substitution, accompaniment, conversation_search, etc.). Covers `tools.xml`, the ToolSelector router, builtin handlers in `methods/`, writing a custom tool with a `<returnStruc>` contract, and disabling tool selection for raw retrieval. Use when extending NLWeb beyond the default query → results flow.
nlweb-schema-org-grounding
Prepare and structure site content as Schema.org JSON-LD for NLWeb ingestion — covers the supported types (Recipe, Product, Movie, Event, Article, RealEstate, Course, etc.), per-type behavior in NLWeb's tool routing, JSON-LD embedding patterns in HTML, sites.xml registration, and how the `schema_object` flows through ranking back to agent results. Use when authoring or auditing the structured data on a site that will be exposed via NLWeb.
nlweb-retrieval-backends
Choose and configure NLWeb retrieval backends — Qdrant (local + remote), Azure AI Search, Elasticsearch, OpenSearch (with/without k-NN), Postgres pgvector, Milvus, Snowflake Cortex Search, Cloudflare AutoRAG, Shopify MCP, and Bing Web Search. Covers `config_retrieval.yaml`, the single `write_endpoint` rule, parallel read-fanout with URL dedup, and per-backend setup pages. Use when picking a retrieval store, migrating between backends, or debugging "results are empty."
nlweb-prompts-customization
Customize NLWeb's LLM prompts and per-Schema.org-type behavior via `prompts.xml` and `site_types.xml` — covers the `<promptString>` template format, `<returnStruc>` JSON schemas, prompt inheritance, decontextualization/ranking/generate templates, per-site overrides, and pitfalls of editing prompts in place. Use when tuning answer quality, supporting a new domain, or localizing prompts.