exa-load-scale

Implement Exa load testing, capacity planning, and scaling strategies. Use when running performance tests, planning capacity for Exa integrations, or designing high-throughput search architectures. Trigger with phrases like "exa load test", "exa scale", "exa capacity", "exa k6", "exa benchmark", "exa throughput".

25 stars

Best use case

exa-load-scale is best used when you need a repeatable AI agent workflow instead of a one-off prompt.


Teams using exa-load-scale should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

curl -o ~/.claude/skills/exa-load-scale/SKILL.md --create-dirs "https://raw.githubusercontent.com/ComeOnOliver/skillshub/main/skills/jeremylongshore/claude-code-plugins-plus-skills/exa-load-scale/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/exa-load-scale/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How exa-load-scale Compares

| Feature / Agent | exa-load-scale | Standard Approach |
|-----------------|----------------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

It implements Exa load testing, capacity planning, and scaling strategies: k6 test scripts, a rate-limit-aware request queue, result caching, and a capacity-planning calculator for high-throughput search architectures.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Exa Load & Scale

## Overview
Load testing and capacity planning for Exa integrations. Key constraint: Exa's default rate limit is 10 QPS. Scaling strategies focus on caching, request queuing, parallel processing within rate limits, and search type selection for latency budgets.

## Prerequisites
- k6 load testing tool installed
- Test environment Exa API key (separate from production)
- Redis for result caching

## Capacity Reference

| Search Type | Typical Latency | Max Throughput (10 QPS) |
|-------------|----------------|-------------------------|
| `instant` | < 150ms | 10 req/s (600/min) |
| `fast` | < 425ms | 10 req/s (600/min) |
| `auto` | 300-1500ms | 10 req/s (600/min) |
| `neural` | 500-2000ms | 10 req/s (600/min) |
| `deep` | 2-5s | 10 req/s (600/min) |

**With caching (50% hit rate):** Effective throughput doubles to 20 req/s equivalent.
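The doubling claim above is simple arithmetic: at cache hit rate h, only a (1 − h) fraction of incoming requests reaches the API, so a fixed rate limit serves limit / (1 − h) requests per second. A minimal sketch of that calculation:

```typescript
// Effective request throughput given a hard API rate limit and a cache hit rate.
// At hit rate h, only (1 - h) of requests reach the API, so the same QPS
// budget serves rateLimitQps / (1 - h) requests per second.
function effectiveThroughput(rateLimitQps: number, cacheHitRate: number): number {
  if (cacheHitRate < 0 || cacheHitRate >= 1) {
    throw new RangeError("cacheHitRate must be in [0, 1)");
  }
  return rateLimitQps / (1 - cacheHitRate);
}

console.log(effectiveThroughput(10, 0.5));  // 20 -- matches the table's claim
console.log(effectiveThroughput(10, 0.75)); // 40
```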

## Instructions

### Step 1: k6 Load Test Against Your Wrapper
```javascript
// exa-load-test.js
import http from "k6/http";
import { check, sleep } from "k6";

export const options = {
  stages: [
    { duration: "1m", target: 5 },    // Ramp up to 5 VUs
    { duration: "3m", target: 5 },    // Steady state
    { duration: "1m", target: 10 },   // Push toward rate limit
    { duration: "2m", target: 10 },   // Stress test
    { duration: "1m", target: 0 },    // Ramp down
  ],
  thresholds: {
    http_req_duration: ["p(95)<3000"],  // 3s P95 for neural search
    http_req_failed: ["rate<0.05"],     // < 5% error rate
  },
};

const queries = [
  "best practices for building RAG systems",
  "transformer architecture improvements 2025",
  "TypeScript 5.5 new features",
  "vector database comparison guide",
  "AI safety alignment research",
];

export default function () {
  const query = queries[Math.floor(Math.random() * queries.length)];

  const response = http.post(
    `${__ENV.APP_URL}/api/search`,
    JSON.stringify({ query, numResults: 3 }),
    {
      headers: { "Content-Type": "application/json" },
      timeout: "10s",
    }
  );

  check(response, {
    "status 200": (r) => r.status === 200,
    "has results": (r) => JSON.parse(r.body).results?.length > 0,
    "latency < 3s": (r) => r.timings.duration < 3000,
  });

  sleep(0.5 + Math.random()); // 0.5-1.5s between requests
}
```

```bash
# Run load test
k6 run --env APP_URL=http://localhost:3000 exa-load-test.js
```

### Step 2: Throughput Maximizer with Request Queue
```typescript
import Exa from "exa-js";
import PQueue from "p-queue";

const exa = new Exa(process.env.EXA_API_KEY);

// Stay under 10 QPS rate limit
const searchQueue = new PQueue({
  concurrency: 8,        // max concurrent requests
  interval: 1000,        // per second
  intervalCap: 10,       // Exa's QPS limit
});

async function highThroughputSearch(queries: string[]) {
  const results = [];

  for (const query of queries) {
    const promise = searchQueue.add(async () => {
      const result = await exa.searchAndContents(query, {
        type: "auto",
        numResults: 3,
        text: { maxCharacters: 500 },
      });
      return { query, results: result.results };
    });
    results.push(promise);
  }

  return Promise.all(results);
}

// Process 100 queries respecting rate limits
const queries = Array.from({ length: 100 }, (_, i) => `research topic ${i}`);
console.time("batch");
const results = await highThroughputSearch(queries);
console.timeEnd("batch");
// Expected: ~10-12 seconds (100 queries / 10 QPS)
```

### Step 3: Caching for Scale
```typescript
import { LRUCache } from "lru-cache";

// `exa` and `searchQueue` are the instances from Step 2.
// The cache eliminates repeat queries entirely.
const cache = new LRUCache<string, any>({
  max: 10000,
  ttl: 3600 * 1000, // 1-hour TTL
});

async function scalableSearch(query: string, opts: any) {
  const key = `${query.toLowerCase().trim()}:${opts.type}:${opts.numResults}`;
  const cached = cache.get(key);
  if (cached) return cached;

  const result = await searchQueue.add(() =>
    exa.searchAndContents(query, opts)
  );
  cache.set(key, result);
  return result;
}

// With 50% cache hit rate:
// 100 unique queries → 50 API calls → 5 seconds instead of 10
```
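One gap in the cache above: it only helps after the first response lands, so under load many identical queries can miss simultaneously and each consume API budget. A common refinement is to coalesce in-flight requests so concurrent misses share a single call. A sketch (the fetcher here is a stand-in for the queued Exa call above):

```typescript
// Coalesce concurrent identical requests: the first caller triggers the fetch,
// later callers with the same key await the same in-flight promise.
const inFlight = new Map<string, Promise<unknown>>();

async function coalesced<T>(key: string, fetcher: () => Promise<T>): Promise<T> {
  const pending = inFlight.get(key);
  if (pending) return pending as Promise<T>;

  const promise = fetcher().finally(() => inFlight.delete(key));
  inFlight.set(key, promise);
  return promise;
}

// Ten concurrent identical queries resolve from one underlying call.
let apiCalls = 0;
const fakeSearch = async () => { apiCalls++; return { results: ["..."] }; };
const batch = await Promise.all(
  Array.from({ length: 10 }, () => coalesced("same-query", fakeSearch))
);
console.log(apiCalls); // 1
```

In production the key would be the same normalized cache key used by `scalableSearch`, and the fetcher the queued `exa.searchAndContents` call.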

### Step 4: Capacity Planning Calculator
```typescript
interface CapacityEstimate {
  dailySearches: number;
  peakQPS: number;
  cacheHitRate: number;
  effectiveQPS: number;
  withinLimits: boolean;
  recommendation: string;
}

function estimateCapacity(
  dailySearches: number,
  peakMultiplier = 3,
  expectedCacheHitRate = 0.5
): CapacityEstimate {
  const avgQPS = dailySearches / (24 * 3600);
  const peakQPS = avgQPS * peakMultiplier;
  const effectiveQPS = peakQPS * (1 - expectedCacheHitRate);
  const withinLimits = effectiveQPS <= 10; // Default Exa limit

  let recommendation = "Within default limits";
  if (effectiveQPS > 10 && effectiveQPS <= 50) {
    recommendation = "Contact hello@exa.ai for Enterprise rate limits";
  } else if (effectiveQPS > 50) {
    recommendation = "Requires Enterprise plan + aggressive caching + request queue";
  }

  return { dailySearches, peakQPS, cacheHitRate: expectedCacheHitRate, effectiveQPS, withinLimits, recommendation };
}

// Example: 50,000 searches/day
const estimate = estimateCapacity(50000);
console.log(estimate);
// { effectiveQPS: ~0.87, withinLimits: true, recommendation: "Within default limits" }
```

## Benchmark Results Template
```markdown
## Exa Performance Benchmark
**Date:** YYYY-MM-DD | **SDK:** exa-js X.Y.Z

| Metric | Value |
|--------|-------|
| Total Requests | N |
| Success Rate | X% |
| Cache Hit Rate | X% |
| P50 Latency | Xms |
| P95 Latency | Xms |
| Peak QPS (actual API calls) | X |
| 429 Rate Limit Errors | N |
```
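k6 reports P50/P95 for you, but if your wrapper records its own latencies (e.g. for the cache-hit path k6 never sees), the percentile rows can be filled in with a nearest-rank helper like this sketch:

```typescript
// Nearest-rank percentile over recorded latency samples (ms).
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error("no samples");
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

const latencies = [120, 140, 150, 160, 180, 210, 350, 900, 1400, 2800];
console.log(percentile(latencies, 50)); // 180
console.log(percentile(latencies, 95)); // 2800
```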

## Error Handling
| Issue | Cause | Solution |
|-------|-------|----------|
| 429 errors in load test | Exceeding 10 QPS | Reduce concurrency, add cache |
| Inconsistent latency | Different search types | Standardize on one type per test |
| Timeout errors | Deep search under load | Use `fast` or `auto` for load tests |
| Cache miss rate high | Unique queries per request | Use a fixed query pool |
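For the 429 row, reducing concurrency is the structural fix; a retry with exponential backoff additionally absorbs transient bursts past the limit. A sketch wrapping any search call (the `status === 429` error shape is an assumption; check what your exa-js version actually throws):

```typescript
// Retry a call on rate-limit errors with exponential backoff plus jitter.
// NOTE: `err.status === 429` is an assumed error shape, not confirmed exa-js API.
async function withBackoff<T>(
  fn: () => Promise<T>,
  maxRetries = 3,
  baseDelayMs = 500
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err: any) {
      const rateLimited = err?.status === 429 || err?.response?.status === 429;
      if (!rateLimited || attempt >= maxRetries) throw err;
      const delay = baseDelayMs * 2 ** attempt + Math.random() * 100;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}

// Usage: wrap the queued search from Step 2.
// const result = await withBackoff(() => exa.searchAndContents(query, opts));
```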

## Resources
- [Exa Rate Limits](https://docs.exa.ai/reference/rate-limits)
- [k6 Documentation](https://k6.io/docs/)
- [p-queue](https://github.com/sindresorhus/p-queue)

## Next Steps
For reliability patterns, see `exa-reliability-patterns`.

Related Skills

running-load-tests

25
from ComeOnOliver/skillshub

Create and execute load tests for performance validation using k6, JMeter, and Artillery. Use when validating application performance under load conditions or identifying bottlenecks. Trigger with phrases like "run load test", "create stress test", or "validate performance under load".

load-testing-apis

25
from ComeOnOliver/skillshub

Execute comprehensive load and stress testing to validate API performance and scalability. Use when validating API performance under load. Trigger with phrases like "load test the API", "stress test API", or "benchmark API performance".

load-test-scenario-planner

25
from ComeOnOliver/skillshub

Load Test Scenario Planner - Auto-activating skill for Performance Testing. Triggers on: "load test scenario planner". Part of the Performance Testing skill category.

testing-load-balancers

25
from ComeOnOliver/skillshub

This skill enables Claude to test load balancing strategies. It validates traffic distribution across backend servers, tests failover scenarios when servers become unavailable, verifies sticky sessions, and assesses health check functionality. Use this skill when the user asks to "test load balancer", "validate traffic distribution", "test failover", "verify sticky sessions", or "test health checks". It is specifically designed for testing load balancing configurations using the `load-balancer-tester` plugin.

configuring-load-balancers

25
from ComeOnOliver/skillshub

This skill configures load balancers, including ALB, NLB, Nginx, and HAProxy. It generates production-ready configurations based on specified requirements and infrastructure. Use this skill when the user asks to "configure load balancer", "create load balancer config", "generate nginx config", "setup HAProxy", or mentions specific load balancer types like "ALB" or "NLB". It's ideal for DevOps tasks, infrastructure automation, and generating load balancer configurations for different environments.

lazy-loading-implementer

25
from ComeOnOliver/skillshub

Lazy Loading Implementer - Auto-activating skill for Frontend Development. Triggers on: "lazy loading implementer". Part of the Frontend Development skill category.

incremental-load-setup

25
from ComeOnOliver/skillshub

Incremental Load Setup - Auto-activating skill for Data Pipelines. Triggers on: "incremental load setup". Part of the Data Pipelines skill category.

dataset-loader-creator

25
from ComeOnOliver/skillshub

Dataset Loader Creator - Auto-activating skill for ML Training. Triggers on: "dataset loader creator". Part of the ML Training skill category.

customerio-load-scale

25
from ComeOnOliver/skillshub

Implement Customer.io load testing and horizontal scaling. Use when preparing for high traffic, running load tests, or designing queue-based architectures for scale. Trigger: "customer.io load test", "customer.io scale", "customer.io high volume", "customer.io k6", "customer.io performance test".

clay-load-scale

25
from ComeOnOliver/skillshub

Scale Clay enrichment pipelines for high-volume processing (10K-100K+ leads/month). Use when planning capacity for large enrichment runs, optimizing batch processing, or designing high-volume Clay architectures. Trigger with phrases like "clay scale", "clay high volume", "clay large batch", "clay capacity planning", "clay 100k leads", "clay bulk enrichment".

clade-load-scale

25
from ComeOnOliver/skillshub

Scale Claude usage for high-throughput applications: batches, queues, concurrency control, and tier upgrades. Use when working with load-scale patterns. Trigger with "anthropic scale", "claude high volume", "anthropic throughput", "scale claude api", "anthropic concurrent requests".

canva-load-scale

25
from ComeOnOliver/skillshub

Implement Canva Connect API load testing, auto-scaling, and capacity planning. Use when running performance tests, planning capacity around Canva rate limits, or scaling Canva integrations for production workloads. Trigger with phrases like "canva load test", "canva scale", "canva performance test", "canva capacity", "canva k6", "canva benchmark".