exa-performance-tuning

Optimize Exa API performance with search type selection, caching, and parallelization. Use when experiencing slow responses, implementing caching strategies, or optimizing request throughput for Exa integrations. Trigger with phrases like "exa performance", "optimize exa", "exa latency", "exa caching", "exa slow", "exa fast".

1,868 stars

Best use case

exa-performance-tuning is best used when you need a repeatable AI agent workflow instead of a one-off prompt.


Teams using exa-performance-tuning should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/exa-performance-tuning/SKILL.md --create-dirs "https://raw.githubusercontent.com/jeremylongshore/claude-code-plugins-plus-skills/main/plugins/saas-packs/exa-pack/skills/exa-performance-tuning/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/exa-performance-tuning/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How exa-performance-tuning Compares

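| Feature / Agent | exa-performance-tuning | Standard Approach |
|-----------------|------------------------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |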

Frequently Asked Questions

What does this skill do?

Optimize Exa API performance with search type selection, caching, and parallelization. Use when experiencing slow responses, implementing caching strategies, or optimizing request throughput for Exa integrations. Trigger with phrases like "exa performance", "optimize exa", "exa latency", "exa caching", "exa slow", "exa fast".

Where can I find the source code?

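The source is in the jeremylongshore/claude-code-plugins-plus-skills repository on GitHub, at plugins/saas-packs/exa-pack/skills/exa-performance-tuning/SKILL.md (the same file the installation command downloads).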


SKILL.md Source

# Exa Performance Tuning

## Overview
Optimize Exa search API response times for production workloads. Key levers: search type selection (ordered fastest to slowest: instant < fast < auto < neural < deep), result count reduction, content scope control, result caching, and parallel query execution.

## Latency by Search Type

| Type | Typical Latency | Use Case |
|------|----------------|----------|
| `instant` | < 150ms | Real-time autocomplete, typeahead |
| `fast` | p50 < 425ms | Speed-critical user-facing search |
| `auto` | 300-1500ms | General purpose (default) |
| `neural` | 500-2000ms | Best semantic quality |
| `deep` | 2-5s | Maximum coverage; lighter than `deep-reasoning` |
| `deep-reasoning` | 5-15s | Complex research questions |
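
The table can also be encoded as data so the type choice follows the documented ranges directly. The sketch below is a hypothetical, more conservative variant of the `selectSearchType` helper in Step 1: it treats the upper end of each row as that type's cost (using the p50 figure for `fast`, since the table gives no upper bound) and picks the slowest, richest type that still fits the budget. `LATENCY_TABLE` and `pickTypeForBudget` are illustrative names, not part of exa-js.

```typescript
// Hypothetical helper: the latency table above encoded as data.
// upperMs is the upper end of each "Typical Latency" range; for `fast`
// only a p50 (425ms) is given, so it is used as an approximation.
type SearchType = "instant" | "fast" | "auto" | "neural" | "deep" | "deep-reasoning";

const LATENCY_TABLE: ReadonlyArray<{ type: SearchType; upperMs: number }> = [
  { type: "instant", upperMs: 150 },
  { type: "fast", upperMs: 425 },
  { type: "auto", upperMs: 1500 },
  { type: "neural", upperMs: 2000 },
  { type: "deep", upperMs: 5000 },
  { type: "deep-reasoning", upperMs: 15000 },
];

// Pick the slowest (richest) type whose typical upper latency fits the budget.
// Rows are sorted fastest-first, so the last row that fits wins.
// Falls back to `instant` when even the fastest type might exceed the budget.
function pickTypeForBudget(budgetMs: number): SearchType {
  let choice: SearchType = "instant";
  for (const row of LATENCY_TABLE) {
    if (row.upperMs <= budgetMs) choice = row.type;
  }
  return choice;
}

console.log(pickTypeForBudget(500));  // fast
console.log(pickTypeForBudget(2500)); // neural
```

Unlike the thresholds in Step 1, this never picks a type whose typical range can exceed the budget, at the price of choosing a lighter type more often (a 1000ms budget gets `fast` here rather than `auto`).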

## Instructions

### Step 1: Match Search Type to Latency Budget
```typescript
import Exa from "exa-js";

const exa = new Exa(process.env.EXA_API_KEY);

function selectSearchType(latencyBudgetMs: number) {
  if (latencyBudgetMs < 200) return "instant";
  if (latencyBudgetMs < 500) return "fast";
  if (latencyBudgetMs < 1500) return "auto";
  if (latencyBudgetMs < 3000) return "neural";
  return "deep";
}

async function optimizedSearch(query: string, latencyBudgetMs: number) {
  const type = selectSearchType(latencyBudgetMs);
  const numResults = latencyBudgetMs < 500 ? 3 : latencyBudgetMs < 2000 ? 5 : 10;

  return exa.search(query, { type, numResults });
}
```
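
A type chosen from typical latencies can still overrun on an individual slow request. One way to turn the budget into a hard limit is to race the call against a timer and return a fallback when the timer wins. This is a minimal sketch: `withDeadline` is a hypothetical helper, and `slowSearch` is a stub standing in for `exa.search`. The underlying request is not cancelled, only ignored once the deadline passes.

```typescript
// Hypothetical helper: resolve with `fallback()` if `work` does not settle within `ms`.
// The original promise keeps running; its late result is simply discarded.
function withDeadline<T>(work: Promise<T>, ms: number, fallback: () => T): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    // Timer path: resolve with the fallback if `work` is too slow.
    const timer = setTimeout(() => resolve(fallback()), ms);
    // Work path: settle with the real outcome and cancel the timer.
    work.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}

// Stub standing in for exa.search (assumption: resolves after `delayMs`).
function slowSearch(query: string, delayMs: number): Promise<string[]> {
  return new Promise((resolve) =>
    setTimeout(() => resolve([`result for ${query}`]), delayMs),
  );
}

// The stub takes 50ms but the budget is 10ms, so the empty fallback is returned.
withDeadline(slowSearch("exa", 50), 10, () => []).then((r) => console.log(r.length)); // 0
```

An empty result list is only one possible fallback; a stale cache entry or a cheaper `instant` search are others.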

### Step 2: Minimize Content Retrieval
```typescript
// Each content option adds latency. Only request what you need.

// Fastest: metadata only (no content retrieval)
const metadataOnly = await exa.search("query", { numResults: 5 });

// Medium: highlights only (much smaller than full text)
const highlightsOnly = await exa.searchAndContents("query", {
  numResults: 5,
  highlights: { maxCharacters: 300 },
  // No text or summary — saves content retrieval time
});

// Slower: full text (use maxCharacters to limit)
const withText = await exa.searchAndContents("query", {
  numResults: 3,  // fewer results = faster
  text: { maxCharacters: 1000 },  // limit content size
});
```
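
The three tiers above differ only in the options object. A small builder keeps that choice in one place instead of scattering option literals across call sites. This is a sketch: the `ContentTier` names and `buildContentOptions` are hypothetical, and the option shapes mirror the calls above rather than the full exa-js option types.

```typescript
// Hypothetical tiers mirroring the three calls above, cheapest first.
type ContentTier = "metadata" | "highlights" | "text";

interface ContentOptions {
  numResults: number;
  highlights?: { maxCharacters: number };
  text?: { maxCharacters: number };
}

// Build the options for a tier. Only the fields a tier needs are set,
// so a metadata-only request carries no content options at all.
function buildContentOptions(tier: ContentTier): ContentOptions {
  switch (tier) {
    case "metadata":
      return { numResults: 5 };
    case "highlights":
      return { numResults: 5, highlights: { maxCharacters: 300 } };
    case "text":
      return { numResults: 3, text: { maxCharacters: 1000 } };
  }
}

console.log(JSON.stringify(buildContentOptions("text")));
// {"numResults":3,"text":{"maxCharacters":1000}}
```

Callers would pass the `metadata` tier to `exa.search` and the other two to `exa.searchAndContents`, as in the examples above.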

### Step 3: Cache Search Results
```typescript
import { LRUCache } from "lru-cache";

const searchCache = new LRUCache<string, any>({
  max: 5000,
  ttl: 2 * 3600 * 1000, // 2-hour TTL
});

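async function cachedSearch(query: string, opts: any = {}) {
  // Key on the query plus *all* options: filters such as includeDomains or
  // date ranges change the results, so keying on type/numResults alone could
  // return another request's cached results.
  const key = JSON.stringify([query, opts]);
  const cached = searchCache.get(key);
  if (cached) return cached; // Cache hit: ~0ms vs 500-2000ms for a live search

  const results = await exa.search(query, opts);
  searchCache.set(key, results);
  return results;
}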
```
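
A result cache only helps once a response has arrived: two identical requests issued at the same moment both miss and both reach the API. A common refinement is to cache the pending promise instead of the resolved result, and to build the key from every option with the option names sorted so that key order does not matter. The sketch below uses a plain `Map` with a TTL instead of `lru-cache` so it runs without dependencies; `createDedupedSearch`, `stableKey`, and the `fakeSearch` stub (standing in for `exa.search`) are hypothetical.

```typescript
type Opts = Record<string, unknown>;

// Key from the query plus every option, with option names sorted so that
// { type, numResults } and { numResults, type } map to the same entry.
// (Only top-level keys are sorted; nested option objects keep their own order.)
function stableKey(query: string, opts: Opts): string {
  const sorted = Object.keys(opts).sort().map((k) => [k, opts[k]]);
  return JSON.stringify([query, sorted]);
}

// Wrap a search function so concurrent identical calls share one request.
// The promise is cached immediately; failed requests are evicted so they can be retried.
function createDedupedSearch<R>(
  search: (query: string, opts: Opts) => Promise<R>,
  ttlMs: number,
) {
  const cache = new Map<string, { value: Promise<R>; expires: number }>();
  return (query: string, opts: Opts): Promise<R> => {
    const key = stableKey(query, opts);
    const hit = cache.get(key);
    if (hit && hit.expires > Date.now()) return hit.value;

    const value = search(query, opts);
    cache.set(key, { value, expires: Date.now() + ttlMs });
    value.catch(() => {
      // Only evict if a newer request has not replaced this entry meanwhile.
      if (cache.get(key)?.value === value) cache.delete(key);
    });
    return value;
  };
}

// Stub standing in for exa.search: counts how many real calls are made.
let calls = 0;
const fakeSearch = async (query: string, _opts: Opts) => {
  calls++;
  return [`result for ${query}`];
};

const search = createDedupedSearch(fakeSearch, 2 * 3600 * 1000);
Promise.all([
  search("exa", { type: "auto", numResults: 3 }),
  search("exa", { numResults: 3, type: "auto" }),
]).then(() => console.log(calls)); // 1
```

This `Map` never evicts expired entries on its own, so for production traffic the same promise-caching idea is better applied on top of the bounded `LRUCache` shown above.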

### Step 4: Parallelize Independent Searches
```typescript
// Run independent queries concurrently instead of sequentially.
// 3 parallel searches: ~600ms total (bounded by the slowest one)
// 3 sequential searches: ~1800ms total (sum of all three)
async function parallelSearch(queries: string[]) {
  const searches = queries.map(q =>
    cachedSearch(q, { type: "auto", numResults: 3 })
  );
  return Promise.all(searches);
}
```
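
`Promise.all` rejects as soon as any one query fails, discarding the results that did succeed. When partial results are acceptable, each query's outcome can be recorded instead, which is the same idea as `Promise.allSettled`. A minimal sketch, where `parallelSearchSettled` is a hypothetical helper and `stubSearch` stands in for `cachedSearch`:

```typescript
type Outcome<R> =
  | { query: string; ok: true; value: R }
  | { query: string; ok: false; error: unknown };

// Run all queries concurrently; a failure in one query is recorded
// instead of rejecting the whole batch. Output order matches input order.
function parallelSearchSettled<R>(
  queries: string[],
  search: (q: string) => Promise<R>,
): Promise<Outcome<R>[]> {
  return Promise.all(
    queries.map((query) =>
      search(query).then(
        (value): Outcome<R> => ({ query, ok: true, value }),
        (error): Outcome<R> => ({ query, ok: false, error }),
      ),
    ),
  );
}

// Stub standing in for cachedSearch: fails for one query to show partial results.
const stubSearch = async (q: string): Promise<string[]> => {
  if (q === "bad query") throw new Error("upstream error");
  return [`result for ${q}`];
};

parallelSearchSettled(["exa", "bad query", "neural search"], stubSearch).then((outcomes) =>
  console.log(outcomes.filter((o) => o.ok).length), // 2
);
```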

### Step 5: Two-Phase Search Pattern
```typescript
// Phase 1: Fast search for URLs only
// Phase 2: Selective content retrieval for top results only
async function twoPhaseSearch(query: string) {
  // Phase 1: metadata only (fast)
  const results = await exa.search(query, { type: "auto", numResults: 10 });

  // Phase 2: get content only for top 3 results
  const topUrls = results.results.slice(0, 3).map(r => r.url);
  const contents = await exa.getContents(topUrls, {
    text: { maxCharacters: 2000 },
    highlights: { maxCharacters: 500, query },
  });

  // Skips content retrieval for the 7 results you won't use
  return contents;
}
```

### Step 6: Query Normalization for Cache Hits
```typescript
function normalizeQuery(query: string): string {
  return query
    .toLowerCase()
    .replace(/\s+/g, " ")          // collapse whitespace
    .replace(/[\s?.!,;:]+$/, "")   // strip trailing punctuation (and spaces around it)
    .trim();                       // strip any leading space left over
}

async function normalizedSearch(query: string, opts: any) {
  return cachedSearch(normalizeQuery(query), opts);
}
// Increases cache hit rate by 20-40% for user-generated queries
```
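
To see why normalization raises the hit rate, count distinct cache keys for a few phrasings of the same question before and after normalizing. The sample queries are made up, and the function below is a self-contained variant of `normalizeQuery` that also drops a space left before trailing punctuation.

```typescript
// Self-contained variant of normalizeQuery (assumption: same rules as above,
// with trailing spaces stripped together with trailing punctuation).
function normalizeQuery(query: string): string {
  return query
    .toLowerCase()
    .replace(/\s+/g, " ")          // collapse whitespace
    .replace(/[\s?.!,;:]+$/, "")   // strip trailing punctuation and spaces
    .trim();                       // strip leading space
}

// Made-up user inputs that all ask the same thing.
const samples = ["What is Exa?", "what is exa", "  What is   Exa ", "what is exa!"];

const rawKeys = new Set(samples);
const normalizedKeys = new Set(samples.map(normalizeQuery));

// Four distinct raw keys collapse into one cache entry.
console.log(rawKeys.size, normalizedKeys.size); // 4 1
```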

## Performance Comparison

| Strategy | Latency Savings | Implementation |
|----------|----------------|----------------|
| `instant` type | 5-10x faster than neural | One-line change |
| Reduce numResults (10 -> 3) | ~200-500ms saved | One-line change |
| Highlights instead of text | ~100-300ms saved | Replace `text` with `highlights` |
| LRU cache | 100% for cache hits | ~20 lines |
| Parallel queries | 2-3x throughput | `Promise.all` wrapper |
| Two-phase search | ~30-50% for large result sets | ~15 lines |
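
The cache row compounds with the rest of the table: only hits skip the API, so the average latency users see depends on the hit rate. Averaging over hits and misses gives expected latency = hitRate × hitMs + (1 - hitRate) × missMs. A small sketch with illustrative inputs (not measurements), treating a hit as ~0ms as in the Step 3 comment:

```typescript
// Expected per-request latency for a cache with hit rate `hitRate` (0..1).
// hitMs and missMs are values you measure; the ones below are illustrative only.
function expectedLatencyMs(hitRate: number, hitMs: number, missMs: number): number {
  return hitRate * hitMs + (1 - hitRate) * missMs;
}

// A 1000ms search with no cache vs. a 40% hit rate.
console.log(expectedLatencyMs(0, 0, 1000));   // 1000
console.log(expectedLatencyMs(0.4, 0, 1000)); // 600
```

Because the saving scales with the hit rate, query normalization (Step 6) and caching reinforce each other.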

## Error Handling
| Issue | Cause | Solution |
|-------|-------|----------|
| Search taking 3s+ | Neural search on complex query | Switch to `fast` or `auto` type |
| Timeout on content | Large pages, slow sources | Set `maxCharacters` limit |
| Cache miss rate high | Unique queries each time | Normalize queries before caching |
| Rate limit (429) | Too many concurrent searches | Add request queue with concurrency limit |
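
For the rate-limit row, a request queue can be as small as a counter plus a FIFO of waiting tasks. This is a sketch of a hypothetical `createLimiter` (libraries such as `p-limit` package the same idea); the stubbed `fakeRequest` tasks stand in for Exa calls, and the right concurrency limit for your plan is an assumption to check against Exa's documented rate limits.

```typescript
// Hypothetical limiter: at most `maxConcurrent` tasks run at once; the rest wait in FIFO order.
function createLimiter(maxConcurrent: number) {
  let active = 0;
  const waiting: Array<() => void> = [];

  // Start the next queued task if a slot is free.
  const next = (): void => {
    if (active >= maxConcurrent || waiting.length === 0) return;
    active++;
    const start = waiting.shift()!;
    start();
  };

  return function run<T>(task: () => Promise<T>): Promise<T> {
    return new Promise<T>((resolve, reject) => {
      waiting.push(() => {
        Promise.resolve()
          .then(task)            // run the task (sync throws become rejections)
          .then(resolve, reject) // hand its outcome to the caller
          .then(() => {
            active--;            // free the slot whether it succeeded or failed
            next();
          });
      });
      next();
    });
  };
}

// Stub task standing in for an Exa request; records peak concurrency.
let inFlight = 0;
let peak = 0;
const fakeRequest = (ms: number) => async () => {
  inFlight++;
  peak = Math.max(peak, inFlight);
  await new Promise((r) => setTimeout(r, ms));
  inFlight--;
  return ms;
};

const limit = createLimiter(2);
Promise.all([30, 10, 20, 10, 5].map((ms) => limit(fakeRequest(ms)))).then(() =>
  console.log(peak), // 2
);
```

Pairing the limiter with the cache from Step 3 means repeated queries never consume a slot at all.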

## Resources
- [Exa Search Types](https://docs.exa.ai/reference/search)
- [Exa Contents Retrieval](https://docs.exa.ai/reference/contents-retrieval)

## Next Steps
For cost optimization, see `exa-cost-tuning`. For reliability, see `exa-reliability-patterns`.

Related Skills

running-performance-tests

from jeremylongshore/claude-code-plugins-plus-skills

Execute load testing, stress testing, and performance benchmarking. Use when performing specialized testing. Trigger with phrases like "run load tests", "test performance", or "benchmark the system".

workhuman-performance-tuning

from jeremylongshore/claude-code-plugins-plus-skills

Workhuman performance tuning for employee recognition and rewards API. Use when integrating Workhuman Social Recognition, or building recognition workflows with HRIS systems. Trigger: "workhuman performance tuning".

workhuman-cost-tuning

from jeremylongshore/claude-code-plugins-plus-skills

Workhuman cost tuning for employee recognition and rewards API. Use when integrating Workhuman Social Recognition, or building recognition workflows with HRIS systems. Trigger: "workhuman cost tuning".

wispr-performance-tuning

from jeremylongshore/claude-code-plugins-plus-skills

Wispr Flow performance tuning for voice-to-text API integration. Use when integrating Wispr Flow dictation, WebSocket streaming, or building voice-powered applications. Trigger: "wispr performance tuning".

wispr-cost-tuning

from jeremylongshore/claude-code-plugins-plus-skills

Wispr Flow cost tuning for voice-to-text API integration. Use when integrating Wispr Flow dictation, WebSocket streaming, or building voice-powered applications. Trigger: "wispr cost tuning".

windsurf-performance-tuning

from jeremylongshore/claude-code-plugins-plus-skills

Optimize Windsurf IDE performance: indexing speed, Cascade responsiveness, and memory usage. Use when Windsurf is slow, indexing takes too long, Cascade times out, or the IDE uses too much memory. Trigger with phrases like "windsurf slow", "windsurf performance", "optimize windsurf", "windsurf memory", "cascade slow", "indexing slow".

windsurf-cost-tuning

from jeremylongshore/claude-code-plugins-plus-skills

Optimize Windsurf licensing costs through seat management, tier selection, and credit monitoring. Use when analyzing Windsurf billing, reducing per-seat costs, or implementing usage monitoring and budget controls. Trigger with phrases like "windsurf cost", "windsurf billing", "reduce windsurf costs", "windsurf pricing", "windsurf budget".

webflow-performance-tuning

from jeremylongshore/claude-code-plugins-plus-skills

Optimize Webflow API performance with response caching, bulk endpoint batching, CDN-cached live item reads, pagination optimization, and connection pooling. Use when experiencing slow API responses or optimizing request throughput. Trigger with phrases like "webflow performance", "optimize webflow", "webflow latency", "webflow caching", "webflow slow", "webflow batch".

webflow-cost-tuning

from jeremylongshore/claude-code-plugins-plus-skills

Optimize Webflow costs through plan selection, CDN read optimization, bulk endpoint usage, and API usage monitoring with budget alerts. Use when analyzing Webflow billing, reducing API costs, or implementing usage monitoring for Webflow integrations. Trigger with phrases like "webflow cost", "webflow billing", "reduce webflow costs", "webflow pricing", "webflow budget".

vercel-performance-tuning

from jeremylongshore/claude-code-plugins-plus-skills

Optimize Vercel deployment performance with caching, bundle optimization, and cold start reduction. Use when experiencing slow page loads, optimizing Core Web Vitals, or reducing serverless function cold start times. Trigger with phrases like "vercel performance", "optimize vercel", "vercel latency", "vercel caching", "vercel slow", "vercel cold start".

vercel-cost-tuning

from jeremylongshore/claude-code-plugins-plus-skills

Optimize Vercel costs through plan selection, function efficiency, and usage monitoring. Use when analyzing Vercel billing, reducing function execution costs, or implementing spend management and budget alerts. Trigger with phrases like "vercel cost", "vercel billing", "reduce vercel costs", "vercel pricing", "vercel expensive", "vercel budget".

veeva-performance-tuning

from jeremylongshore/claude-code-plugins-plus-skills

Veeva Vault performance tuning for REST API and clinical operations. Use when working with Veeva Vault document management and CRM. Trigger: "veeva performance tuning".