cloudflare-vectorize

Build semantic search with Cloudflare Vectorize V2 (Sept 2024 GA). Covers V2 breaking changes: async mutations, 5M vectors/index (was 200K), 31ms latency (was 549ms), returnMetadata enum, and V1 deprecation (Dec 2024). Use when: migrating V1→V2, handling async mutations with mutationId, creating metadata indexes before insert, or troubleshooting "returnMetadata must be 'all'", V2 timing issues, metadata index errors, dimension mismatches.

31 stars

byovachiever

View on GitHub Installation ↓

Best use case

cloudflare-vectorize is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using cloudflare-vectorize should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/cloudflare-vectorize/SKILL.md --create-dirs "https://raw.githubusercontent.com/ovachiever/droid-tings/main/skills/cloudflare-vectorize/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/cloudflare-vectorize/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How cloudflare-vectorize Compares

Feature / Agent	cloudflare-vectorize	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Cloudflare Vectorize

Complete implementation guide for Cloudflare Vectorize - a globally distributed vector database for building semantic search, RAG (Retrieval Augmented Generation), and AI-powered applications with Cloudflare Workers.

**Status**: Production Ready ✅
**Last Updated**: 2025-10-21
**Dependencies**: cloudflare-worker-base (for Worker setup), cloudflare-workers-ai (for embeddings)
**Latest Versions**: wrangler@4.43.0, @cloudflare/workers-types@4.20251014.0
**Token Savings**: ~65%
**Errors Prevented**: 8
**Dev Time Saved**: ~3 hours

## What This Skill Provides

### Core Capabilities
- ✅ **Index Management**: Create, configure, and manage vector indexes
- ✅ **Vector Operations**: Insert, upsert, query, delete, and list vectors
- ✅ **Metadata Filtering**: Advanced filtering with 10 metadata indexes per index
- ✅ **Semantic Search**: Find similar vectors using cosine, euclidean, or dot-product metrics
- ✅ **RAG Patterns**: Complete retrieval-augmented generation workflows
- ✅ **Workers AI Integration**: Native embedding generation with @cf/baai/bge-base-en-v1.5
- ✅ **OpenAI Integration**: Support for text-embedding-3-small/large models
- ✅ **Document Processing**: Text chunking and batch ingestion pipelines

### Templates Included
1. **basic-search.ts** - Simple vector search with Workers AI
2. **rag-chat.ts** - Full RAG chatbot with context retrieval
3. **document-ingestion.ts** - Document chunking and embedding pipeline
4. **metadata-filtering.ts** - Advanced filtering patterns

---

## ⚠️ Vectorize V2 Breaking Changes (September 2024)

**IMPORTANT**: Vectorize V2 became GA in September 2024 with significant breaking changes.

### What Changed in V2

**Performance Improvements**:
- **Index capacity**: 200,000 → **5 million vectors** per index
- **Query latency**: 549ms → **31ms** median (18× faster)
- **TopK limit**: 20 → **100** results per query
- **Scale limits**: 100 → **50,000 indexes** per account
- **Namespace limits**: 100 → **50,000 namespaces** per index

**Breaking API Changes**:
1. **Async Mutations** - All mutations now asynchronous:
   ```typescript
   // V2: Returns mutationId
   const result = await env.VECTORIZE_INDEX.insert(vectors);
   console.log(result.mutationId); // "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"

   // Vector inserts/deletes may take a few seconds to be reflected
   ```

2. **returnMetadata Parameter** - Boolean → String enum:
   ```typescript
   // ❌ V1 (deprecated)
   { returnMetadata: true }

   // ✅ V2 (required)
   { returnMetadata: 'all' | 'indexed' | 'none' }
   ```

3. **Metadata Indexes Required Before Insert**:
   - V2 requires metadata indexes created BEFORE vectors inserted
   - Vectors added before metadata index won't be indexed
   - Must re-upsert vectors after creating metadata index

**V1 Deprecation Timeline**:
- **December 2024**: Can no longer create V1 indexes
- **Existing V1 indexes**: Continue to work (other operations unaffected)
- **Migration**: Use `wrangler vectorize --deprecated-v1` flag for V1 operations

**Wrangler Version Required**:
- **Minimum**: wrangler@3.71.0 for V2 commands
- **Recommended**: wrangler@4.43.0+ (latest)

### Check Mutation Status

```typescript
// Get index info to check last mutation processed
const info = await env.VECTORIZE_INDEX.describe();
console.log(info.mutationId); // Last mutation ID
console.log(info.processedUpToMutation); // Last processed timestamp
```

---

## Critical Setup Rules

### ⚠️ MUST DO BEFORE INSERTING VECTORS
```bash
# 1. Create the index with FIXED dimensions and metric
npx wrangler vectorize create my-index \
  --dimensions=768 \
  --metric=cosine

# 2. Create metadata indexes IMMEDIATELY (before inserting vectors!)
npx wrangler vectorize create-metadata-index my-index \
  --property-name=category \
  --type=string

npx wrangler vectorize create-metadata-index my-index \
  --property-name=timestamp \
  --type=number
```

**Why**: Metadata indexes MUST exist before vectors are inserted. Vectors added before a metadata index was created won't be filterable on that property.

### Index Configuration (Cannot Be Changed Later)

```bash
# Dimensions MUST match your embedding model output:
# - Workers AI @cf/baai/bge-base-en-v1.5: 768 dimensions
# - OpenAI text-embedding-3-small: 1536 dimensions
# - OpenAI text-embedding-3-large: 3072 dimensions

# Metrics determine similarity calculation:
# - cosine: Best for normalized embeddings (most common)
# - euclidean: Absolute distance between vectors
# - dot-product: For non-normalized vectors
```

## Wrangler Configuration

**wrangler.jsonc**:
```jsonc
{
  "name": "my-vectorize-worker",
  "main": "src/index.ts",
  "compatibility_date": "2025-10-21",
  "vectorize": [
    {
      "binding": "VECTORIZE_INDEX",
      "index_name": "my-index"
    }
  ],
  "ai": {
    "binding": "AI"
  }
}
```

## TypeScript Types

```typescript
export interface Env {
  VECTORIZE_INDEX: VectorizeIndex;
  AI: Ai;
}

interface VectorizeVector {
  id: string;
  values: number[] | Float32Array | Float64Array;
  namespace?: string;
  metadata?: Record<string, string | number | boolean | string[]>;
}

interface VectorizeMatches {
  matches: Array<{
    id: string;
    score: number;
    values?: number[];
    metadata?: Record<string, any>;
    namespace?: string;
  }>;
  count: number;
}
```

## Metadata Filter Operators (V2)

Vectorize V2 supports advanced metadata filtering with range queries:

```typescript
// Equality (implicit $eq)
{ category: "docs" }

// Not equals
{ status: { $ne: "archived" } }

// In/Not in arrays
{ category: { $in: ["docs", "tutorials"] } }
{ category: { $nin: ["deprecated", "draft"] } }

// Range queries (numbers) - NEW in V2
{ timestamp: { $gte: 1704067200, $lt: 1735689600 } }

// Range queries (strings) - prefix searching
{ url: { $gte: "/docs/workers", $lt: "/docs/workersz" } }

// Nested metadata with dot notation
{ "author.id": "user123" }

// Multiple conditions (implicit AND)
{ category: "docs", language: "en", "metadata.published": true }
```

## Metadata Best Practices

### 1. Cardinality Considerations

**Low Cardinality (Good for $eq filters)**:
```typescript
// Few unique values - efficient filtering
metadata: {
  category: "docs",        // ~10 categories
  language: "en",          // ~5 languages
  published: true          // 2 values (boolean)
}
```

**High Cardinality (Avoid in range queries)**:
```typescript
// Many unique values - avoid large range scans
metadata: {
  user_id: "uuid-v4...",         // Millions of unique values
  timestamp_ms: 1704067200123    // Use seconds instead
}
```

### 2. Metadata Limits

- **Max 10 metadata indexes** per Vectorize index
- **Max 10 KiB metadata** per vector
- **String indexes**: First 64 bytes (UTF-8)
- **Number indexes**: Float64 precision
- **Filter size**: Max 2048 bytes (compact JSON)

### 3. Key Restrictions

```typescript
// ❌ INVALID metadata keys
metadata: {
  "": "value",              // Empty key
  "user.name": "John",      // Contains dot (reserved for nesting)
  "$admin": true,           // Starts with $
  "key\"with\"quotes": 1    // Contains quotes
}

// ✅ VALID metadata keys
metadata: {
  "user_name": "John",
  "isAdmin": true,
  "nested": { "allowed": true }  // Access as "nested.allowed" in filters
}
```

## Common Errors & Solutions

### Error 1: Metadata Index Created After Vectors Inserted
```
Problem: Filtering doesn't work on existing vectors
Solution: Delete and re-insert vectors OR create metadata indexes BEFORE inserting
```

### Error 2: Dimension Mismatch
```
Problem: "Vector dimensions do not match index configuration"
Solution: Ensure embedding model output matches index dimensions:
  - Workers AI bge-base: 768
  - OpenAI small: 1536
  - OpenAI large: 3072
```

### Error 3: Invalid Metadata Keys
```
Problem: "Invalid metadata key"
Solution: Keys cannot:
  - Be empty
  - Contain . (dot)
  - Contain " (quote)
  - Start with $ (dollar sign)
```

### Error 4: Filter Too Large
```
Problem: "Filter exceeds 2048 bytes"
Solution: Simplify filter or split into multiple queries
```

### Error 5: Range Query on High Cardinality
```
Problem: Slow queries or reduced accuracy
Solution: Use lower cardinality fields for range queries, or use seconds instead of milliseconds for timestamps
```

### Error 6: Insert vs Upsert Confusion
```
Problem: Updates not reflecting in index
Solution: Use upsert() to overwrite existing vectors, not insert()
```

### Error 7: Missing Bindings
```
Problem: "VECTORIZE_INDEX is not defined"
Solution: Add [[vectorize]] binding to wrangler.jsonc
```

### Error 8: Namespace vs Metadata Confusion
```
Problem: Unclear when to use namespace vs metadata filtering
Solution:
  - Namespace: Partition key, applied BEFORE metadata filters
  - Metadata: Flexible key-value filtering within namespace
```

### Error 9: V2 Async Mutation Timing (NEW in V2)
```
Problem: Inserted vectors not immediately queryable
Solution: V2 mutations are asynchronous - vectors may take a few seconds to be reflected
  - Use mutationId to track mutation status
  - Check env.VECTORIZE_INDEX.describe() for processedUpToMutation timestamp
```

### Error 10: V1 returnMetadata Boolean (BREAKING in V2)
```
Problem: "returnMetadata must be 'all', 'indexed', or 'none'"
Solution: V2 changed returnMetadata from boolean to string enum:
  - ❌ V1: { returnMetadata: true }
  - ✅ V2: { returnMetadata: 'all' }
```

---

## V2 Migration Checklist

**If migrating from V1 to V2**:

1. ✅ Update wrangler to 3.71.0+ (`npm install -g wrangler@latest`)
2. ✅ Create new V2 index (can't upgrade V1 → V2)
3. ✅ Create metadata indexes BEFORE inserting vectors
4. ✅ Update `returnMetadata` boolean → string enum ('all', 'indexed', 'none')
5. ✅ Handle async mutations (expect `mutationId` in responses)
6. ✅ Test with V2 limits (topK up to 100, 5M vectors per index)
7. ✅ Update error handling for async behavior

**V1 Deprecation**:
- After December 2024: Cannot create new V1 indexes
- Existing V1 indexes: Continue to work
- Use `wrangler vectorize --deprecated-v1` for V1 operations

---

## Official Documentation

- **Vectorize V2 Docs**: https://developers.cloudflare.com/vectorize/
- **V2 Changelog**: https://developers.cloudflare.com/vectorize/platform/changelog/
- **V1 to V2 Migration**: https://developers.cloudflare.com/vectorize/reference/transition-vectorize-legacy/
- **Metadata Filtering**: https://developers.cloudflare.com/vectorize/reference/metadata-filtering/
- **Workers AI Models**: https://developers.cloudflare.com/workers-ai/models/

---

**Status**: Production Ready ✅ (Vectorize V2 GA - September 2024)
**Last Updated**: 2025-11-22
**Token Savings**: ~70%
**Errors Prevented**: 10 (includes V2 breaking changes)

Related Skills

cloudflare-workflows

from ovachiever/droid-tings

Build durable, long-running workflows on Cloudflare Workers with automatic retries, state persistence, and multi-step orchestration. Supports step.do, step.sleep, step.waitForEvent, and runs for hours to days. Use when: creating long-running workflows, implementing retry logic, building event-driven processes, coordinating API calls, scheduling multi-step tasks, or troubleshooting NonRetryableError, I/O context, serialization errors, or workflow execution failures. Keywords: cloudflare workflows, workflows workers, durable execution, workflow step, WorkflowEntrypoint, step.do, step.sleep, workflow retries, NonRetryableError, workflow state, wrangler workflows, workflow events, long-running tasks, step.sleepUntil, step.waitForEvent, workflow bindings

cloudflare-workers-ai

from ovachiever/droid-tings

Run LLMs and AI models on Cloudflare's global GPU network with Workers AI. Includes Llama, Flux image generation, BGE embeddings, and streaming support with AI Gateway for caching and logging. Use when: implementing LLM inference, generating images with Flux/Stable Diffusion, building RAG with embeddings, streaming AI responses, using AI Gateway for cost tracking, or troubleshooting AI_ERROR, rate limits, model not found, token limits, or neurons exceeded. Keywords: workers ai, cloudflare ai, ai bindings, llm workers, @cf/meta/llama, workers ai models, ai inference, cloudflare llm, ai streaming, text generation ai, ai embeddings, image generation ai, workers ai rag, ai gateway, llama workers, flux image generation, stable diffusion workers, vision models ai, ai chat completion, AI_ERROR, rate limit ai, model not found, token limit exceeded, neurons exceeded, ai quota exceeded, streaming failed, model unavailable, workers ai hono, ai gateway workers, vercel ai sdk workers, openai compatible workers, workers ai vectorize

cloudflare-worker-base

from ovachiever/droid-tings

Set up Cloudflare Workers with Hono routing, Vite plugin, and Static Assets using production-tested patterns. Prevents 6 errors: export syntax, routing conflicts, HMR crashes, and Service Worker format confusion. Use when: creating Workers projects, configuring Hono or Vite for Workers, deploying with Wrangler, adding Static Assets with SPA fallback, or troubleshooting export syntax, API route conflicts, scheduled handlers, or HMR race conditions. Keywords: Cloudflare Workers, CF Workers, Hono, wrangler, Vite, Static Assets, @cloudflare/vite-plugin, wrangler.jsonc, ES Module, run_worker_first, SPA fallback, API routes, serverless, edge computing, "Cannot read properties of undefined", "Static Assets 404", "A hanging Promise was canceled", "Handler does not export", deployment fails, routing not working, HMR crashes

cloudflare-turnstile

from ovachiever/droid-tings

Add bot protection with Turnstile (CAPTCHA alternative). Use when: protecting forms, securing login/signup, preventing spam, migrating from reCAPTCHA, integrating with React/Next.js/Hono, implementing E2E tests, or debugging CSP errors, token validation failures, or error codes 100*/300*/600*.

cloudflare-r2

from ovachiever/droid-tings

Store objects with R2's S3-compatible storage on Cloudflare's edge. Use when: uploading/downloading files, configuring CORS, generating presigned URLs, multipart uploads, managing metadata, or troubleshooting R2_ERROR, CORS failures, presigned URL issues, or quota errors.

cloudflare-queues

from ovachiever/droid-tings

Build async message queues with Cloudflare Queues for background processing. Use when: handling async tasks, batch processing, implementing retries, configuring dead letter queues, managing consumer concurrency, or troubleshooting queue timeout, batch retry, message loss, or throughput exceeded.

cloudflare-mcp-server

from ovachiever/droid-tings

Build Model Context Protocol (MCP) servers on Cloudflare Workers - the only platform with official remote MCP support. TypeScript-based with OAuth, Durable Objects, and WebSocket hibernation. Use when: deploying remote MCP servers, implementing OAuth (GitHub/Google), using dual transports (SSE/HTTP), or troubleshooting URL path mismatches, McpAgent exports, OAuth redirects, CORS issues.

cloudflare-kv

from ovachiever/droid-tings

Store key-value data globally with Cloudflare KV's edge network. Use when: caching API responses, storing configuration, managing user preferences, handling TTL expiration, or troubleshooting KV_ERROR, 429 rate limits, eventual consistency, or cacheTtl errors.

cloudflare-images

from ovachiever/droid-tings

Store and transform images with Cloudflare Images API and transformations. Use when: uploading images, implementing direct creator uploads, creating variants, generating signed URLs, optimizing formats (WebP/AVIF), transforming via Workers, or debugging CORS, multipart, or error codes 9401-9413.

cloudflare-hyperdrive

from ovachiever/droid-tings

Connect Workers to PostgreSQL/MySQL with Hyperdrive's global pooling and caching. Use when: connecting to existing databases, setting up connection pools, using node-postgres/mysql2, integrating Drizzle/Prisma, or troubleshooting pool acquisition failures, TLS errors, nodejs_compat missing, or eval() disallowed.

cloudflare-durable-objects

from ovachiever/droid-tings

Build stateful Durable Objects for real-time apps, WebSocket servers, coordination, and persistent state. Use when: implementing chat rooms, multiplayer games, rate limiting, session management, WebSocket hibernation, or troubleshooting class export, migration, WebSocket state loss, or binding errors.

cloudflare-d1

from ovachiever/droid-tings

Build with D1 serverless SQLite database on Cloudflare's edge. Use when: creating databases, writing SQL migrations, querying D1 from Workers, handling relational data, or troubleshooting D1_ERROR, statement too long, migration failures, or query performance issues.