algolia-data-handling

Implement Algolia data handling: record transforms, PII filtering before indexing, data retention, GDPR/CCPA compliance with Algolia's deleteByQuery and Insights deletion. Trigger: "algolia data", "algolia PII", "algolia GDPR", "algolia data retention", "algolia privacy", "algolia CCPA", "algolia data sync".

25 stars

Best use case

algolia-data-handling is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Implement Algolia data handling: record transforms, PII filtering before indexing, data retention, GDPR/CCPA compliance with Algolia's deleteByQuery and Insights deletion. Trigger: "algolia data", "algolia PII", "algolia GDPR", "algolia data retention", "algolia privacy", "algolia CCPA", "algolia data sync".

Teams using algolia-data-handling should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/algolia-data-handling/SKILL.md --create-dirs "https://raw.githubusercontent.com/ComeOnOliver/skillshub/main/skills/jeremylongshore/claude-code-plugins-plus-skills/algolia-data-handling/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/algolia-data-handling/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How algolia-data-handling Compares

Feature / Agentalgolia-data-handlingStandard Approach
Platform SupportNot specifiedLimited / Varies
Context Awareness High Baseline
Installation ComplexityUnknownN/A

Frequently Asked Questions

What does this skill do?

Implement Algolia data handling: record transforms, PII filtering before indexing, data retention, GDPR/CCPA compliance with Algolia's deleteByQuery and Insights deletion. Trigger: "algolia data", "algolia PII", "algolia GDPR", "algolia data retention", "algolia privacy", "algolia CCPA", "algolia data sync".

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Algolia Data Handling

## Overview

Algolia stores your records in their cloud. You control what data goes in (via `saveObjects`), what comes back (via `attributesToRetrieve`), and what users can search (via `searchableAttributes`). For privacy compliance, you must filter PII before indexing and implement deletion workflows.

## Data Flow: Source → Algolia → User

```
Source Database        Transform           Algolia Index           Search Response
┌──────────┐     ┌──────────────┐     ┌──────────────────┐     ┌──────────────┐
│ Full user │     │ Strip PII    │     │ Searchable       │     │ Retrieved    │
│ record    │ ──▶ │ Truncate     │ ──▶ │ fields only      │ ──▶ │ fields only  │
│ (all cols)│     │ Normalize    │     │ + ranking data   │     │ (UI needs)   │
└──────────┘     └──────────────┘     └──────────────────┘     └──────────────┘
```

## Instructions

### Step 1: Transform Records Before Indexing

```typescript
import { algoliasearch } from 'algoliasearch';

const client = algoliasearch(process.env.ALGOLIA_APP_ID!, process.env.ALGOLIA_ADMIN_KEY!);

// Define what goes into Algolia — NOT everything from your DB
interface AlgoliaProduct {
  objectID: string;
  name: string;
  description: string;     // Truncated, plain text
  category: string;
  brand: string;
  price: number;
  in_stock: boolean;
  image_url: string;
  rating: number;
  _tags: string[];
}

function transformForAlgolia(dbRecord: any): AlgoliaProduct {
  return {
    objectID: dbRecord.id,
    name: dbRecord.name,
    description: stripHtml(dbRecord.description).substring(0, 5000),
    category: dbRecord.category?.name || 'uncategorized',
    brand: dbRecord.brand?.name || '',
    price: dbRecord.price_cents / 100,
    in_stock: dbRecord.inventory_count > 0,
    image_url: dbRecord.images?.[0]?.url || '',
    rating: dbRecord.avg_rating || 0,
    _tags: buildTags(dbRecord),
  };
}

// Strip HTML tags for clean search text
function stripHtml(html: string): string {
  return html?.replace(/<[^>]*>/g, ' ').replace(/\s+/g, ' ').trim() || '';
}

function buildTags(record: any): string[] {
  const tags: string[] = [];
  if (record.is_featured) tags.push('featured');
  if (record.is_new) tags.push('new-arrival');
  if (record.discount_percent > 0) tags.push('on-sale');
  return tags;
}
```

### Step 2: PII Detection and Filtering

```typescript
// NEVER index PII unless absolutely necessary for search
const PII_FIELDS = ['email', 'phone', 'ssn', 'address', 'credit_card', 'password', 'api_key'];

function stripPII(record: Record<string, any>): Record<string, any> {
  const clean = { ...record };
  for (const field of PII_FIELDS) {
    delete clean[field];
  }
  return clean;
}

// If you MUST index user-facing names (e.g., author names in articles)
// Use unretrievableAttributes so they're searchable but never returned
await client.setSettings({
  indexName: 'articles',
  indexSettings: {
    searchableAttributes: ['title', 'author_name', 'content'],
    unretrievableAttributes: ['author_name'],  // Searchable but never in response
    attributesToRetrieve: ['title', 'excerpt', 'url', 'published_at'],
  },
});
```

### Step 3: Algolia-Side Data Access Control

```typescript
// Use secured API keys to filter what each user can see
function generateUserKey(userId: string, tenantId: string) {
  return client.generateSecuredApiKey({
    parentApiKey: process.env.ALGOLIA_SEARCH_KEY!,
    restrictions: {
      filters: `tenant_id:${tenantId} AND (visibility:public OR created_by:${userId})`,
      validUntil: Math.floor(Date.now() / 1000) + 3600,
    },
  });
}

// User can only search records where:
// - tenant_id matches their org AND
// - visibility is public OR they created it
```

### Step 4: GDPR Right to Deletion

```typescript
// When a user requests data deletion:
async function deleteUserData(userId: string) {
  const results: Record<string, string> = {};

  // 1. Delete user's records from all indices
  for (const indexName of ['products', 'reviews', 'wishlists']) {
    try {
      await client.deleteBy({
        indexName,
        deleteByParams: { filters: `created_by:${userId}` },
      });
      results[indexName] = 'deleted';
    } catch (e) {
      results[indexName] = `failed: ${e}`;
    }
  }

  // 2. Delete Insights/Analytics data for this user
  // Algolia retains events for 90 days by default
  // Use the Insights API to request user data deletion
  await client.deleteUserToken({ userToken: userId });

  // 3. Log the deletion for compliance audit
  console.log({
    event: 'gdpr.deletion',
    userId,
    timestamp: new Date().toISOString(),
    results,
  });

  return results;
}
```

### Step 5: Data Subject Access Request (DSAR)

```typescript
// Export all data associated with a user
async function exportUserData(userId: string) {
  const exportData: Record<string, any[]> = {};

  for (const indexName of ['products', 'reviews', 'wishlists']) {
    const records: any[] = [];
    let cursor: string | undefined;

    // Browse all records matching the user
    do {
      const result = await client.browse({
        indexName,
        browseParams: {
          filters: `created_by:${userId}`,
          hitsPerPage: 1000,
          cursor,
        },
      });
      records.push(...result.hits);
      cursor = result.cursor;
    } while (cursor);

    exportData[indexName] = records;
  }

  return {
    exportedAt: new Date().toISOString(),
    userId,
    data: exportData,
  };
}
```

### Step 6: Data Retention and Cleanup

```typescript
// Scheduled job: delete old records past retention period
async function enforceRetention(indexName: string, retentionDays: number) {
  const cutoffTimestamp = Math.floor(
    (Date.now() - retentionDays * 24 * 60 * 60 * 1000) / 1000
  );

  await client.deleteBy({
    indexName,
    deleteByParams: {
      filters: `created_at_timestamp < ${cutoffTimestamp}`,
    },
  });

  console.log(`Deleted records older than ${retentionDays} days from ${indexName}`);
}

// Run daily: enforceRetention('activity_logs', 90);
```

## Data Classification for Algolia

| Category | Examples | Index It? | Retrieve It? |
|----------|----------|-----------|-------------|
| Public product data | Name, price, category | Yes | Yes |
| Searchable metadata | Tags, internal categories | Yes | No (`unretrievableAttributes`) |
| User-generated content | Reviews, comments | Yes (anonymized) | Yes |
| PII | Email, phone, address | NO | NO |
| Sensitive business data | Margins, supplier costs | NO | NO |

## Error Handling

| Issue | Cause | Solution |
|-------|-------|----------|
| PII in Algolia index | Transform didn't strip | Add PII check to indexing pipeline |
| `deleteBy` no effect | Filter doesn't match | Verify field is in `attributesForFaceting` |
| DSAR export incomplete | Paginated results | Use cursor-based browsing |
| Retention job deletes too much | Wrong timestamp format | Use Unix timestamp (seconds), not milliseconds |

## Resources

- [Algolia Privacy & GDPR](https://www.algolia.com/policies/privacy/)
- [deleteBy Reference](https://www.algolia.com/doc/api-reference/api-methods/delete-by/)
- [browse Reference](https://www.algolia.com/doc/api-reference/api-methods/browse/)
- [Insights User Deletion](https://www.algolia.com/doc/guides/sending-events/getting-started/)

## Next Steps

For enterprise access control, see `algolia-enterprise-rbac`.

Related Skills

College Football Data (CFB)

25
from ComeOnOliver/skillshub

Before writing queries, consult `references/api-reference.md` for endpoints, conference IDs, team IDs, and data shapes.

College Basketball Data (CBB)

25
from ComeOnOliver/skillshub

Before writing queries, consult `references/api-reference.md` for endpoints, conference IDs, team IDs, and data shapes.

validating-database-integrity

25
from ComeOnOliver/skillshub

Process use when you need to ensure database integrity through comprehensive data validation. This skill validates data types, ranges, formats, referential integrity, and business rules. Trigger with phrases like "validate database data", "implement data validation rules", "enforce data integrity constraints", or "validate data formats".

forecasting-time-series-data

25
from ComeOnOliver/skillshub

This skill enables Claude to forecast future values based on historical time series data. It analyzes time-dependent data to identify trends, seasonality, and other patterns. Use this skill when the user asks to predict future values of a time series, analyze trends in data over time, or requires insights into time-dependent data. Trigger terms include "forecast," "predict," "time series analysis," "future values," and requests involving temporal data.

generating-test-data

25
from ComeOnOliver/skillshub

This skill enables Claude to generate realistic test data for software development. It uses the test-data-generator plugin to create users, products, orders, and custom schemas for comprehensive testing. Use this skill when you need to populate databases, simulate user behavior, or create fixtures for automated tests. Trigger phrases include "generate test data", "create fake users", "populate database", "generate product data", "create test orders", or "generate data based on schema". This skill is especially useful for populating testing environments or creating sample data for demonstrations.

test-data-builder

25
from ComeOnOliver/skillshub

Test Data Builder - Auto-activating skill for Test Automation. Triggers on: test data builder, test data builder Part of the Test Automation skill category.

splitting-datasets

25
from ComeOnOliver/skillshub

Process split datasets into training, validation, and testing sets for ML model development. Use when requesting "split dataset", "train-test split", or "data partitioning". Trigger with relevant phrases based on skill purpose.

scanning-database-security

25
from ComeOnOliver/skillshub

Process use when you need to work with security and compliance. This skill provides security scanning and vulnerability detection with comprehensive guidance and automation. Trigger with phrases like "scan for vulnerabilities", "implement security controls", or "audit security".

preprocessing-data-with-automated-pipelines

25
from ComeOnOliver/skillshub

Process automate data cleaning, transformation, and validation for ML tasks. Use when requesting "preprocess data", "clean data", "ETL pipeline", or "data transformation". Trigger with relevant phrases based on skill purpose.

optimizing-database-connection-pooling

25
from ComeOnOliver/skillshub

Process use when you need to work with connection management. This skill provides connection pooling and management with comprehensive guidance and automation. Trigger with phrases like "manage connections", "configure pooling", or "optimize connection usage".

modeling-nosql-data

25
from ComeOnOliver/skillshub

This skill enables Claude to design NoSQL data models. It activates when the user requests assistance with NoSQL database design, including schema creation, data modeling for MongoDB or DynamoDB, or defining document structures. Use this skill when the user mentions "NoSQL data model", "design MongoDB schema", "create DynamoDB table", or similar phrases related to NoSQL database architecture. It assists in understanding NoSQL modeling principles like embedding vs. referencing, access pattern optimization, and sharding key selection.

monitoring-database-transactions

25
from ComeOnOliver/skillshub

Monitor use when you need to work with monitoring and observability. This skill provides health monitoring and alerting with comprehensive guidance and automation. Trigger with phrases like "monitor system health", "set up alerts", or "track metrics".