exa-data-handling

Implement Exa search result processing, content extraction, caching, and RAG context management. Use when handling search results, implementing caching, building citation pipelines, or managing content payloads for LLM context windows. Trigger with phrases like "exa data", "exa results processing", "exa cache", "exa RAG context", "exa content extraction".

1,868 stars

by jeremylongshore


Best use case

exa-data-handling is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using exa-data-handling should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/exa-data-handling/SKILL.md --create-dirs "https://raw.githubusercontent.com/jeremylongshore/claude-code-plugins-plus-skills/main/plugins/saas-packs/exa-pack/skills/exa-data-handling/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/exa-data-handling/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How exa-data-handling Compares

| Feature / Agent | exa-data-handling | Standard Approach |
|-----------------|-------------------|-------------------|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

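What does this skill do?

It processes Exa search results: content extraction scope control (text, highlights, or summary), result caching with TTL, citation deduplication, and token budget management for LLM context windows.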

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.


SKILL.md Source

# Exa Data Handling

## Overview
Manage search result data from Exa's neural search API. Covers content extraction scope control (text vs highlights vs summary), result caching with TTL, citation deduplication, token budget management for LLM context windows, and structured summary extraction.

## Prerequisites
- `exa-js` SDK installed and configured
- Optional: `lru-cache` for in-memory caching, `ioredis` for Redis
- Understanding of Exa content options (text, highlights, summary)

## Instructions

### Step 1: Control Content Extraction Scope
```typescript
import Exa from "exa-js";

const exa = new Exa(process.env.EXA_API_KEY);

// Tier 1: Metadata only (cheapest, fastest)
async function searchMetadataOnly(query: string) {
  return exa.search(query, {
    type: "auto",
    numResults: 10,
    // No content options — returns URLs, titles, scores only
  });
}

// Tier 2: Highlights only (balanced cost/value)
async function searchWithHighlights(query: string) {
  return exa.searchAndContents(query, {
    numResults: 10,
    highlights: {
      maxCharacters: 500,
      query: query,  // focus highlights on the original query
    },
  });
}

// Tier 3: Full text with character limit
async function searchWithText(query: string, maxChars = 2000) {
  return exa.searchAndContents(query, {
    numResults: 5,
    text: { maxCharacters: maxChars },
    highlights: { maxCharacters: 300 },
  });
}

// Tier 4: Structured summary (LLM-generated per result)
async function searchWithSummary(query: string) {
  return exa.searchAndContents(query, {
    numResults: 5,
    summary: { query: query },
    // summary returns a concise LLM-generated summary per result
  });
}
```
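
The four tiers above can also be selected through a single helper. This is a minimal sketch: `ContentTier`, `ContentOptions`, and `contentOptionsFor` are hypothetical names, and the option shapes simply copy the ones used in the functions above.

```typescript
// Hypothetical helper: maps a tier name to the content options used in Step 1.
type ContentTier = "metadata" | "highlights" | "text" | "summary";

interface ContentOptions {
  numResults: number;
  text?: { maxCharacters: number };
  highlights?: { maxCharacters: number; query?: string };
  summary?: { query: string };
}

function contentOptionsFor(
  tier: ContentTier,
  query: string,
  maxChars = 2000
): ContentOptions {
  switch (tier) {
    case "metadata":
      // No content options: URLs, titles, and scores only
      return { numResults: 10 };
    case "highlights":
      return { numResults: 10, highlights: { maxCharacters: 500, query } };
    case "text":
      return {
        numResults: 5,
        text: { maxCharacters: maxChars },
        highlights: { maxCharacters: 300 },
      };
    case "summary":
      return { numResults: 5, summary: { query } };
  }
}
```

Start with the cheapest tier that answers the question and move up only when the snippets prove too thin.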

### Step 2: Result Caching with TTL
```typescript
import { LRUCache } from "lru-cache";
import { createHash } from "crypto";

const searchCache = new LRUCache<string, any>({
  max: 500,
  ttl: 1000 * 60 * 60, // 1 hour default
});

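function cacheKey(query: string, options: any): string {
  // Nest options under their own key so an option can never overwrite `query`.
  // JSON.stringify is key-order sensitive: pass options in a consistent order.
  return createHash("sha256")
    .update(JSON.stringify({ query, options }))
    .digest("hex");
}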

async function cachedSearch(query: string, options: any = {}, ttlMs?: number) {
  const key = cacheKey(query, options);
  const cached = searchCache.get(key);
  if (cached) return cached;

  const results = await exa.searchAndContents(query, options);
  searchCache.set(key, results, { ttl: ttlMs });
  return results;
}
```
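
Because `JSON.stringify` depends on property order, two option objects with the same fields in a different order hash to different cache keys. One way to avoid that is to canonicalize the input before hashing. The sketch below is an assumption, not part of the original skill; `canonicalize` and `stableKeyInput` are hypothetical names, and the returned string is meant to be fed to `createHash` in place of the plain `JSON.stringify` call.

```typescript
// Recursively sorts object keys so equivalent option objects serialize identically.
function canonicalize(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(canonicalize);
  if (value !== null && typeof value === "object") {
    const source = value as Record<string, unknown>;
    const sorted: Record<string, unknown> = {};
    for (const key of Object.keys(source).sort()) {
      sorted[key] = canonicalize(source[key]);
    }
    return sorted;
  }
  return value;
}

// Hypothetical helper: builds an order-independent string to hash as the cache key.
function stableKeyInput(
  query: string,
  options: Record<string, unknown> = {}
): string {
  return JSON.stringify(canonicalize({ query, options }));
}
```

Array order is preserved on purpose, since the order of list-valued options may be meaningful.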

### Step 3: Token Budget Management for RAG
```typescript
interface ProcessedResult {
  url: string;
  title: string;
  score: number;
  snippet: string;
  tokenEstimate: number;
}

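function processForRAG(results: any[], maxSnippetLength = 500): ProcessedResult[] {
  return results.map(r => {
    // Prefer full text, then joined highlights, then the summary
    const snippet = (r.text || r.highlights?.join(" ") || r.summary || "")
      .slice(0, maxSnippetLength);
    return {
      url: r.url,
      title: r.title || "Untitled",
      // score may be absent on some results; default to 0 so sorting stays well-defined
      score: r.score ?? 0,
      snippet,
      // Rough heuristic: about 4 characters per token for English text
      tokenEstimate: Math.ceil(snippet.length / 4),
    };
  });
}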

function fitToTokenBudget(results: ProcessedResult[], maxTokens: number) {
  const sorted = [...results].sort((a, b) => b.score - a.score);
  const selected: ProcessedResult[] = [];
  let tokenCount = 0;

  for (const result of sorted) {
    if (tokenCount + result.tokenEstimate > maxTokens) break;
    selected.push(result);
    tokenCount += result.tokenEstimate;
  }

  return { selected, tokenCount, dropped: sorted.length - selected.length };
}

// Usage: fit search results into a 4K token context window
const results = await exa.searchAndContents("query", {
  numResults: 15,
  text: { maxCharacters: 1500 },
});
const processed = processForRAG(results.results);
const { selected, tokenCount } = fitToTokenBudget(processed, 4000);
```
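
`fitToTokenBudget` stops at the first result that does not fit, so a single long snippet near the top can leave budget unused. A variant that skips oversized results and keeps scanning is sketched below. `BudgetItem` and `fitToTokenBudgetGreedy` are hypothetical names introduced for illustration.

```typescript
// Minimal shape needed for budgeting; ProcessedResult from Step 3 satisfies it.
interface BudgetItem {
  url: string;
  score: number;
  tokenEstimate: number;
}

// Greedy variant: skips results that do not fit instead of stopping at the first one.
function fitToTokenBudgetGreedy<T extends BudgetItem>(items: T[], maxTokens: number) {
  const sorted = [...items].sort((a, b) => b.score - a.score);
  const selected: T[] = [];
  let tokenCount = 0;

  for (const item of sorted) {
    if (tokenCount + item.tokenEstimate > maxTokens) continue; // too large, try the next one
    selected.push(item);
    tokenCount += item.tokenEstimate;
  }

  return { selected, tokenCount, dropped: sorted.length - selected.length };
}
```

The trade-off is that a lower-scored result can be included while a higher-scored one is dropped. Prefer the original `break` behavior when strict score order matters more than filling the budget.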

### Step 4: Citation Deduplication
```typescript
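function deduplicateResults(results: any[]): any[] {
  const seen = new Map<string, any>();

  for (const result of results) {
    // Fall back to the raw URL when it cannot be parsed
    let domain: string;
    try {
      domain = new URL(result.url).hostname;
    } catch {
      domain = String(result.url);
    }
    // Key untitled results by URL so they do not collapse into one entry
    const key = `${domain}:${result.title ?? result.url}`;
    const existing = seen.get(key);
    // Keep the highest-scoring copy; a missing score counts as 0
    if (!existing || (result.score ?? 0) > (existing.score ?? 0)) {
      seen.set(key, result);
    }
  }

  return Array.from(seen.values());
}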
```
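
Domain plus title catches syndicated copies on one site, but the same page can also appear under several URLs (a `www.` prefix, a trailing slash, tracking parameters). A complementary sketch that deduplicates on a normalized URL follows. `normalizeUrl`, `dedupeByUrl`, and the `TRACKING_PARAMS` list are assumptions for illustration, not part of Exa's API.

```typescript
// Hypothetical list of query parameters that do not change the page content.
const TRACKING_PARAMS = [
  "utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content", "ref",
];

// Returns host + path + remaining query, or null when the URL cannot be parsed.
function normalizeUrl(raw: string): string | null {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return null;
  }
  for (const param of TRACKING_PARAMS) url.searchParams.delete(param);
  const host = url.hostname.replace(/^www\./, "");
  const path = url.pathname.replace(/\/+$/, "");
  const query = url.searchParams.toString();
  return `${host}${path}${query ? `?${query}` : ""}`;
}

// Keeps the highest-scoring result per normalized URL.
function dedupeByUrl<T extends { url: string; score?: number }>(results: T[]): T[] {
  const best = new Map<string, T>();
  for (const result of results) {
    const key = normalizeUrl(result.url) ?? result.url;
    const current = best.get(key);
    if (!current || (result.score ?? 0) > (current.score ?? 0)) {
      best.set(key, result);
    }
  }
  return Array.from(best.values());
}
```

Running `dedupeByUrl` before `deduplicateResults` removes URL variants first, then syndicated copies that share a domain and title.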

### Step 5: Structured Summary Extraction
```typescript
// Request a per-result summary focused on specific fields
const results = await exa.searchAndContents(
  "YC-backed AI startups Series A 2025",
  {
    numResults: 10,
    category: "company",
    summary: {
      query: "company name, funding amount, what they do",
      // Optionally add `schema` here to define a JSON structure for the summary output
    },
  }
);

// Without a schema, each result.summary is free text shaped by the summary query
for (const r of results.results) {
  console.log(`${r.title}: ${r.summary}`);
}
```
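
When a schema is supplied under `summary.schema`, the summary needs to be parsed before use. The sketch below assumes the summary comes back as a JSON string matching the schema; check the Exa contents reference before relying on that. `companySchema`, `CompanySummary`, `parseCompanySummary`, and all field names are hypothetical.

```typescript
// Hypothetical JSON Schema to pass as summary.schema in the request above.
const companySchema = {
  type: "object",
  properties: {
    companyName: { type: "string" },
    fundingAmount: { type: "string" },
    description: { type: "string" },
  },
  required: ["companyName"],
};

interface CompanySummary {
  companyName: string;
  fundingAmount?: string;
  description?: string;
}

// Parses a summary string defensively; returns null for free text or malformed JSON.
function parseCompanySummary(summary: string | undefined): CompanySummary | null {
  if (!summary) return null;
  try {
    const parsed: unknown = JSON.parse(summary);
    if (parsed === null || typeof parsed !== "object") return null;
    const record = parsed as Record<string, unknown>;
    if (typeof record.companyName !== "string") return null;
    return {
      companyName: record.companyName,
      fundingAmount:
        typeof record.fundingAmount === "string" ? record.fundingAmount : undefined,
      description:
        typeof record.description === "string" ? record.description : undefined,
    };
  } catch {
    return null;
  }
}
```

Returning `null` instead of throwing lets a pipeline drop unparseable results and continue with the rest.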

## Error Handling
| Issue | Cause | Solution |
|-------|-------|----------|
| Large response payload | Full text for many URLs | Use highlights or limit `maxCharacters` |
| Cache stale for news | Default TTL too long | Use 5-minute TTL for time-sensitive queries |
| Duplicate sources | Same article syndicated | Deduplicate by domain + title |
| Token budget exceeded | Too much context for LLM | Use `fitToTokenBudget` to trim by score |
| Missing `.text` field | Content not requested | Use `searchAndContents` not `search` |
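
The stale-cache row can be handled by choosing a TTL per query. The sketch below uses a keyword heuristic, which is an assumption for illustration: `ttlForQuery` and the `TIME_SENSITIVE` pattern are hypothetical and should be tuned to the queries you actually see.

```typescript
const FIVE_MINUTES_MS = 5 * 60 * 1000;
const ONE_HOUR_MS = 60 * 60 * 1000;

// Hypothetical heuristic: words that suggest the answer changes quickly.
const TIME_SENSITIVE = /\b(news|today|latest|breaking|this week)\b/i;

// Returns a short TTL for time-sensitive queries and the 1-hour default otherwise.
function ttlForQuery(query: string): number {
  return TIME_SENSITIVE.test(query) ? FIVE_MINUTES_MS : ONE_HOUR_MS;
}
```

The result can be passed as the third argument to `cachedSearch`, for example `cachedSearch(query, options, ttlForQuery(query))`.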

## Examples

### RAG-Optimized Search Pipeline
```typescript
async function ragSearch(query: string, tokenBudget = 4000) {
  const results = await cachedSearch(query, {
    numResults: 15,
    type: "neural",
    text: { maxCharacters: 1500 },
    highlights: { maxCharacters: 300, query },
  });

  const deduped = deduplicateResults(results.results);
  const processed = processForRAG(deduped);
  const { selected, tokenCount } = fitToTokenBudget(processed, tokenBudget);

  return {
    context: selected.map((r, i) =>
      `[${i + 1}] ${r.title} (${r.url})\n${r.snippet}`
    ).join("\n\n---\n\n"),
    sources: selected.map(r => ({ title: r.title, url: r.url })),
    tokenCount,
  };
}
```
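
Since `ragSearch` numbers each context entry as `[1]`, `[2]`, and so on, an answer that reuses those markers can be mapped back to the sources it cited. This is a minimal sketch under that assumption; `Source` and `citedSources` are hypothetical names.

```typescript
interface Source {
  title: string;
  url: string;
}

// Returns the sources whose 1-based [n] markers appear in the answer text.
function citedSources(answer: string, sources: Source[]): Source[] {
  const cited = new Set<number>();
  const pattern = /\[(\d+)\]/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(answer)) !== null) {
    cited.add(Number(match[1]));
  }
  // Markers that point outside the source list are ignored
  return sources.filter((_, index) => cited.has(index + 1));
}
```

Sources come back in their original order, and repeated markers count once.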

## Resources
- [Exa Contents Retrieval](https://docs.exa.ai/reference/contents-retrieval)
- [Exa Search Reference](https://docs.exa.ai/reference/search)

## Next Steps
For rate limit handling, see `exa-rate-limits`. For cost optimization, see `exa-cost-tuning`.
