nlweb-ask-endpoint

Implement and consume the NLWeb /ask REST endpoint — request shape (GET/POST, query-string and v0.55 structured body), SSE streaming response, modes (list/summarize/generate), in-stream "message_type" headers, error envelopes, and client-side parsing. Use when building an NLWeb server route, calling /ask from a custom agent, or debugging /ask responses.

17 stars

byOrcaQubits

View on GitHub Installation ↓

Best use case

nlweb-ask-endpoint is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using nlweb-ask-endpoint should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/nlweb-ask-endpoint/SKILL.md --create-dirs "https://raw.githubusercontent.com/OrcaQubits/agentic-commerce-skills-plugins/main/dist/antigravity/nlweb-protocol/.agent/skills/nlweb-ask-endpoint/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/nlweb-ask-endpoint/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How nlweb-ask-endpoint Compares

Feature / Agent	nlweb-ask-endpoint	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# NLWeb /ask Endpoint

## Before writing code

**Fetch live spec**:
1. Fetch https://github.com/nlweb-ai/NLWeb/blob/main/docs/nlweb-rest-api.md for the canonical `/ask` contract — request params, response shape, and streaming format.
2. Fetch https://github.com/nlweb-ai/NLWeb/blob/main/docs/life-of-a-chat-query.md to trace a request end-to-end.
3. Fetch https://github.com/nlweb-ai/NLWeb/blob/main/docs/nlweb-headers.md for the **in-stream "headers"** mechanism (license, data-retention, rate-limit messages are NOT HTTP headers).
4. Web-search the latest release notes for the **v0.55+ structured POST body** shape (`query.text`, `prefer.mode`, `prefer.streaming`, `meta.version`) — this is newer than the GET-only legacy contract.
5. Check `webserver/routes/api.py` in the live repo to confirm exact param names.

## Conceptual Architecture

### Routes

| Route | Method | Purpose |
|-------|--------|---------|
| `/ask` | GET, POST | Main NL query |
| `/who` | GET | Site relevance for a query (federated) |
| `/sites` | GET | List configured sites |
| `/config` | GET | Public config (safe subset) |

### Request Parameters

Verify exact names against the live `routes/api.py`. Stable subset:

| Param | Type | Required | Default | Notes |
|-------|------|----------|---------|-------|
| `query` | string | yes | — | NL question |
| `site` | string | no | all | Backend partition; in MCP can be array |
| `prev` | string | no | — | Comma-separated previous queries (conversation context) |
| `decontextualized_query` | string | no | — | Pre-resolved query; skips server-side decontextualization |
| `streaming` | bool | no | `true` | `"0"` / `"false"` / `"False"` disables |
| `query_id` | string | no | auto | Echoed in response |
| `mode` | enum | no | `list` | `list` \| `summarize` \| `generate` |
| `scorer` | string | no | default | e.g., `nlwebscorer` for the neural reranker |
| `itemType` | string | no | — | Schema.org type hint (skip type detection) |
| `response_format` | string | no | — | v0.55 structured-body field |

### v0.55 Structured POST Body

The newer body format groups fields:

```json
{
  "query": { "text": "your question" },
  "context": { "prev": ["previous q1", "previous q2"] },
  "prefer": {
    "mode": "list",
    "streaming": true,
    "response_format": "schema"
  },
  "meta": { "version": "0.55" }
}
```

Verify the exact field names against the live docs before relying on this — fields are still settling.

### Streaming Response Format (SSE)

NLWeb uses Server-Sent Events when `streaming=true` (the default):

```
Content-Type: text/event-stream
Cache-Control: no-cache
Connection: keep-alive
X-Accel-Buffering: no
```

Each chunk is:
```
data: <json>\n\n
```

The `<json>` is one of:
- A **message object** (header-like): `{"message_type": "license", "content": {...}}`
- A **partial result**: `{"results": [...]}` (results may arrive incrementally as FastTrack streams)
- A **terminal object**: `{"query_id": "...", "complete": true}` (exact field — verify live)

### In-Stream "Headers" (NLWS Mechanism)

NLWeb's "headers" are NOT HTTP response headers — they are JSON objects in the SSE stream with a `message_type` discriminator. Known types:

| message_type | Purpose |
|--------------|---------|
| `license` | Content license terms |
| `data_retention` | How long the agent may cache results |
| `cache_policy` | Caching directives |
| `ui_component` | Optional rendering hint |
| `usage_terms` | Acceptable use |
| `rate_limits` | Calls/sec / day budget |
| `data_freshness` | When the underlying data was last indexed |
| `api_version` | Server's NLWeb version |

**Client parsing rule**: don't assume `results` is the first chunk. Buffer message objects until you see the result stream or a terminal marker.

### Non-Streaming Response

With `streaming=false`, the server returns a single `application/json` body:

```json
{
  "query_id": "abc-123",
  "messages": [{"message_type": "license", "content": {...}}, ...],
  "results": [
    {
      "url": "https://example.com/article/x",
      "name": "Article X",
      "site": "example",
      "score": 0.83,
      "description": "...",
      "schema_object": { "@type": "Article", "@context": "https://schema.org", ... }
    }
  ]
}
```

The `schema_object` is the original Schema.org JSON-LD that was indexed — this is what makes NLWeb results **agent-actionable**, not just text snippets.

### Three Modes

| Mode | Behavior | Use case |
|------|----------|----------|
| `list` | Return ranked Schema.org results, no LLM synthesis | Agent does its own rendering / re-ranking |
| `summarize` | LLM condenses top results into a short answer + still returns results | Conversational UIs |
| `generate` | Full RAG — LLM synthesizes an answer grounded in results | Q&A endpoints |

### Errors

For `/ask`, errors generally come back as 500 with a JSON envelope. For `/mcp`, errors use JSON-RPC 2.0:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "error": { "code": -32603, "message": "Internal error", "data": {...} }
}
```

Always check status code before parsing — partial SSE streams can drop with 200 followed by silence.

## Implementation Guidance

### Server-Side (extending the route)

If you need to extend `/ask` (e.g., add an auth check or custom param):
1. Locate `webserver/routes/api.py`
2. Add middleware in `webserver/middleware/` rather than modifying the route directly — keeps you upgrade-safe
3. Forward to `NLWebHandler` (`core/baseHandler.py`) unchanged so the streaming + ranking pipeline still runs

### Client-Side (calling /ask)

```python
# Python streaming client (sketch — verify response shape against the live spec)
import httpx, json

async with httpx.AsyncClient() as client:
    async with client.stream("GET", "http://localhost:8000/ask",
                              params={"query": "best running shoes", "site": "shoes", "mode": "generate"}) as r:
        async for line in r.aiter_lines():
            if not line.startswith("data: "):
                continue
            obj = json.loads(line[6:])
            if "message_type" in obj:
                handle_header(obj)
            elif "results" in obj:
                handle_results(obj["results"])
```

### When to use which mode

- Agent that re-ranks and selects on its own → `mode=list`
- Quick conversational answer with citations → `mode=summarize`
- Single synthesized answer (chatbot-style) → `mode=generate`

### Debugging /ask

- Set `streaming=false` first — easier to inspect a single JSON body.
- Add `decontextualized_query` to bypass query rewriting and isolate ranking issues.
- Try `mode=list` to see the raw retrieval — if results are bad here, the problem is ingest/embeddings, not the LLM.
- Pass `query_id` and grep server logs for it.
- Disable `tool_selection_enabled` in `config_nlweb.yaml` to bypass the router and force straight retrieval.

Always verify the exact param names and message_type values against the live spec — they evolve.

Related Skills

nlweb-tools-framework

from OrcaQubits/agentic-commerce-skills-plugins

Design and implement NLWeb tools — the per-Schema.org-type handlers that turn a query into a specialized response (search, item_details, compare_items, ensemble, recipe_substitution, accompaniment, conversation_search, etc.). Covers `tools.xml`, the ToolSelector router, builtin handlers in `methods/`, writing a custom tool with a `<returnStruc>` contract, and disabling tool selection for raw retrieval. Use when extending NLWeb beyond the default query → results flow.

nlweb-setup

from OrcaQubits/agentic-commerce-skills-plugins

Bootstrap a local NLWeb development environment from scratch — clone the repo, configure .env, install Python deps via `nlweb init-python`, run `nlweb init` for interactive LLM/retrieval selection, load sample Schema.org data, and verify with `nlweb check`. Use when starting a new NLWeb deployment from zero.

nlweb-schema-org-grounding

from OrcaQubits/agentic-commerce-skills-plugins

Prepare and structure site content as Schema.org JSON-LD for NLWeb ingestion — covers the supported types (Recipe, Product, Movie, Event, Article, RealEstate, Course, etc.), per-type behavior in NLWeb's tool routing, JSON-LD embedding patterns in HTML, sites.xml registration, and how the `schema_object` flows through ranking back to agent results. Use when authoring or auditing the structured data on a site that will be exposed via NLWeb.

nlweb-retrieval-backends

from OrcaQubits/agentic-commerce-skills-plugins

Choose and configure NLWeb retrieval backends — Qdrant (local + remote), Azure AI Search, Elasticsearch, OpenSearch (with/without k-NN), Postgres pgvector, Milvus, Snowflake Cortex Search, Cloudflare AutoRAG, Shopify MCP, and Bing Web Search. Covers `config_retrieval.yaml`, the single `write_endpoint` rule, parallel read-fanout with URL dedup, and per-backend setup pages. Use when picking a retrieval store, migrating between backends, or debugging "results are empty."

nlweb-prompts-customization

from OrcaQubits/agentic-commerce-skills-plugins

Customize NLWeb's LLM prompts and per-Schema.org-type behavior via `prompts.xml` and `site_types.xml` — covers the `<promptString>` template format, `<returnStruc>` JSON schemas, prompt inheritance, decontextualization/ranking/generate templates, per-site overrides, and pitfalls of editing prompts in place. Use when tuning answer quality, supporting a new domain, or localizing prompts.

nlweb-mcp-server

from OrcaQubits/agentic-commerce-skills-plugins

Expose NLWeb as an MCP (Model Context Protocol) server — JSON-RPC 2.0 endpoint at /mcp, the `ask` / `list_sites` / `who` tools, MCP protocol version 2024-11-05, and integration with ChatGPT, Claude, Gemini, and other agent clients. Use when wiring NLWeb to an AI agent via MCP or building an MCP client that consumes an NLWeb site.

nlweb-llm-providers

from OrcaQubits/agentic-commerce-skills-plugins

Configure NLWeb LLM and embedding providers — OpenAI, Azure OpenAI (default), Anthropic, Google Gemini, DeepSeek on Azure, Llama on Azure, HuggingFace, Inception Labs, Snowflake Cortex, Ollama, Pi Labs. Covers `config_llm.yaml` high/low tier model selection, the ModelRouter cost/quality routing logic, `config_embedding.yaml`, and adding a custom provider. Use when picking models, tuning cost, or wiring a new LLM backend.

nlweb-data-loading

from OrcaQubits/agentic-commerce-skills-plugins

Ingest site content into NLWeb's vector store using `db_load.py` — supports RSS/Atom feeds, Schema.org JSON-LD, sitemap-driven URL lists, and CSV. Covers chunking, embedding computation, site partitioning, batch sizing, delete-and-reload, and per-backend write_endpoint targeting. Use when bootstrapping a site's index, refreshing content, or migrating between retrieval backends.

nlweb-chatgpt-appsdk

from OrcaQubits/agentic-commerce-skills-plugins

Integrate NLWeb with ChatGPT's Apps SDK — the Node.js MCP server in `openai-apps-sdk-integration/`, the `nlweb-list` tool, the React widget at `ui://widget/nlweb-list.html`, and the port-8100 AppSDK adapter that translates NLWeb's message list to OpenAI Apps SDK envelopes. Use when publishing an NLWeb site as a ChatGPT app or wiring NLWeb results into an Apps SDK widget.

nlweb-auth-multitenancy

from OrcaQubits/agentic-commerce-skills-plugins

Configure NLWeb authentication and multi-tenant deployments — OAuth providers (GitHub, Google, Microsoft, Facebook), session storage, the `sites:` allowlist in `config_nlweb.yaml`, conversation persistence per authenticated user, and per-tenant data isolation. Use when adding login to an NLWeb instance, hosting multiple customers on one deployment, or persisting conversation history.

woo-testing

from OrcaQubits/agentic-commerce-skills-plugins

Test WooCommerce extensions — PHPUnit unit/integration tests, WP test suite, WooCommerce test helpers, E2E with Playwright, and WP-CLI test scaffolding. Use when writing tests for WooCommerce plugins or setting up a test environment.

woo-shipping

from OrcaQubits/agentic-commerce-skills-plugins

Build WooCommerce shipping methods — WC_Shipping_Method, shipping zones, shipping classes, rate calculation, tracking, and integration with carriers. Use when creating custom shipping integrations or configuring shipping logic.