algolia-data-handling
Implement Algolia data handling: record transforms, PII filtering before indexing, data retention, GDPR/CCPA compliance with Algolia's deleteByQuery and Insights deletion. Trigger: "algolia data", "algolia PII", "algolia GDPR", "algolia data retention", "algolia privacy", "algolia CCPA", "algolia data sync".
Best use case
algolia-data-handling is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using algolia-data-handling can expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/algolia-data-handling/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
Frequently Asked Questions
What does this skill do?
It implements Algolia data handling: record transforms, PII filtering before indexing, data retention policies, and GDPR/CCPA deletion workflows using Algolia's deleteByQuery and Insights deletion.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Algolia Data Handling
## Overview
Algolia stores your records in its cloud. You control what data goes in (via `saveObjects`), what comes back (via `attributesToRetrieve`), and what users can search (via `searchableAttributes`). For privacy compliance, you must filter PII before indexing and implement deletion workflows.
## Data Flow: Source → Algolia → User
```
Source Database    Transform            Algolia Index            Search Response
┌───────────┐      ┌──────────────┐     ┌──────────────────┐     ┌──────────────┐
│ Full user │      │ Strip PII    │     │ Searchable       │     │ Retrieved    │
│ record    │ ──▶ │ Truncate     │ ──▶ │ fields only      │ ──▶ │ fields only  │
│ (all cols)│      │ Normalize    │     │ + ranking data   │     │ (UI needs)   │
└───────────┘      └──────────────┘     └──────────────────┘     └──────────────┘
```
## Instructions
### Step 1: Transform Records Before Indexing
```typescript
import { algoliasearch } from 'algoliasearch';
const client = algoliasearch(process.env.ALGOLIA_APP_ID!, process.env.ALGOLIA_ADMIN_KEY!);
// Define what goes into Algolia — NOT everything from your DB
interface AlgoliaProduct {
  objectID: string;
  name: string;
  description: string; // Truncated, plain text
  category: string;
  brand: string;
  price: number;
  in_stock: boolean;
  image_url: string;
  rating: number;
  _tags: string[];
}

function transformForAlgolia(dbRecord: any): AlgoliaProduct {
  return {
    objectID: dbRecord.id,
    name: dbRecord.name,
    description: stripHtml(dbRecord.description).substring(0, 5000),
    category: dbRecord.category?.name || 'uncategorized',
    brand: dbRecord.brand?.name || '',
    price: dbRecord.price_cents / 100,
    in_stock: dbRecord.inventory_count > 0,
    image_url: dbRecord.images?.[0]?.url || '',
    rating: dbRecord.avg_rating || 0,
    _tags: buildTags(dbRecord),
  };
}

// Strip HTML tags for clean search text
function stripHtml(html: string): string {
  return html?.replace(/<[^>]*>/g, ' ').replace(/\s+/g, ' ').trim() || '';
}

function buildTags(record: any): string[] {
  const tags: string[] = [];
  if (record.is_featured) tags.push('featured');
  if (record.is_new) tags.push('new-arrival');
  if (record.discount_percent > 0) tags.push('on-sale');
  return tags;
}
```
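The transform above controls which fields go in, but Algolia also enforces a per-record size limit, so oversized descriptions can still fail indexing. A minimal size guard is worth adding to the pipeline; the 10 KB limit used here is an assumption (the actual limit is plan-dependent), so check your plan before relying on it:

```typescript
// Per-record size guard. MAX_RECORD_BYTES is an assumed limit; Algolia's
// actual per-record limit varies by plan, so adjust to match yours.
const MAX_RECORD_BYTES = 10_000;

function recordSizeBytes(record: object): number {
  // Algolia counts the serialized JSON size of the record
  return Buffer.byteLength(JSON.stringify(record), 'utf8');
}

function assertIndexable(record: { objectID: string; [key: string]: unknown }): void {
  const size = recordSizeBytes(record);
  if (size > MAX_RECORD_BYTES) {
    throw new Error(
      `Record ${record.objectID} is ${size} bytes, over the ${MAX_RECORD_BYTES}-byte limit. ` +
      `Truncate long fields (e.g. description) before indexing.`
    );
  }
}
```

Run this on each transformed record before `saveObjects`, so a single oversized record fails loudly in your pipeline instead of silently failing at the API.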
### Step 2: PII Detection and Filtering
```typescript
// NEVER index PII unless absolutely necessary for search
const PII_FIELDS = ['email', 'phone', 'ssn', 'address', 'credit_card', 'password', 'api_key'];
function stripPII(record: Record<string, any>): Record<string, any> {
  const clean = { ...record };
  for (const field of PII_FIELDS) {
    delete clean[field];
  }
  return clean;
}

// If you MUST index user-facing names (e.g., author names in articles),
// use unretrievableAttributes so they're searchable but never returned
await client.setSettings({
  indexName: 'articles',
  indexSettings: {
    searchableAttributes: ['title', 'author_name', 'content'],
    unretrievableAttributes: ['author_name'], // Searchable but never in response
    attributesToRetrieve: ['title', 'excerpt', 'url', 'published_at'],
  },
});
```
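A field-name blocklist like `PII_FIELDS` misses PII embedded in free-text values, such as a review that contains the author's email address. As an extra safety net before indexing, you can scan serialized record values for common PII patterns. This is a heuristic sketch (the patterns are illustrative and US-centric), not a substitute for a proper data-loss-prevention pass:

```typescript
// Heuristic PII patterns; a coarse pre-index safety net, not a full DLP scan.
const PII_PATTERNS: Record<string, RegExp> = {
  email: /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/,
  us_phone: /\b(?:\+?1[-. ]?)?\(?\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}\b/,
  ssn: /\b\d{3}-\d{2}-\d{4}\b/,
};

// Returns the names of any PII patterns found in the record's values
function detectPII(record: Record<string, unknown>): string[] {
  const text = JSON.stringify(record);
  return Object.entries(PII_PATTERNS)
    .filter(([, re]) => re.test(text))
    .map(([name]) => name);
}

// Fail closed in the indexing pipeline
function assertNoPII(record: Record<string, unknown>): void {
  const hits = detectPII(record);
  if (hits.length > 0) {
    throw new Error(`Refusing to index record: possible PII (${hits.join(', ')})`);
  }
}
```

Failing closed means a false positive blocks one record for review, while a false negative in a log-only approach would leak PII into the index.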
### Step 3: Algolia-Side Data Access Control
```typescript
// Use secured API keys to filter what each user can see
function generateUserKey(userId: string, tenantId: string) {
  return client.generateSecuredApiKey({
    parentApiKey: process.env.ALGOLIA_SEARCH_KEY!,
    restrictions: {
      filters: `tenant_id:${tenantId} AND (visibility:public OR created_by:${userId})`,
      validUntil: Math.floor(Date.now() / 1000) + 3600,
    },
  });
}
// User can only search records where:
// - tenant_id matches their org AND
// - visibility is public OR they created it
```
### Step 4: GDPR Right to Deletion
```typescript
// When a user requests data deletion:
async function deleteUserData(userId: string) {
  const results: Record<string, string> = {};

  // 1. Delete user's records from all indices
  for (const indexName of ['products', 'reviews', 'wishlists']) {
    try {
      await client.deleteBy({
        indexName,
        deleteByParams: { filters: `created_by:${userId}` },
      });
      results[indexName] = 'deleted';
    } catch (e) {
      results[indexName] = `failed: ${e}`;
    }
  }

  // 2. Delete Insights/Analytics data for this user.
  // Algolia retains events for 90 days by default. Depending on your
  // client version, deleteUserToken may live on the dedicated Insights
  // client rather than the search client.
  await client.deleteUserToken({ userToken: userId });

  // 3. Log the deletion for compliance audit
  console.log({
    event: 'gdpr.deletion',
    userId,
    timestamp: new Date().toISOString(),
    results,
  });

  return results;
}
```
### Step 5: Data Subject Access Request (DSAR)
```typescript
// Export all data associated with a user
async function exportUserData(userId: string) {
  const exportData: Record<string, any[]> = {};

  for (const indexName of ['products', 'reviews', 'wishlists']) {
    const records: any[] = [];
    let cursor: string | undefined;

    // Browse all records matching the user
    do {
      const result = await client.browse({
        indexName,
        browseParams: {
          filters: `created_by:${userId}`,
          hitsPerPage: 1000,
          cursor,
        },
      });
      records.push(...result.hits);
      cursor = result.cursor;
    } while (cursor);

    exportData[indexName] = records;
  }

  return {
    exportedAt: new Date().toISOString(),
    userId,
    data: exportData,
  };
}
```
### Step 6: Data Retention and Cleanup
```typescript
// Scheduled job: delete old records past retention period
async function enforceRetention(indexName: string, retentionDays: number) {
  const cutoffTimestamp = Math.floor(
    (Date.now() - retentionDays * 24 * 60 * 60 * 1000) / 1000
  );

  await client.deleteBy({
    indexName,
    deleteByParams: {
      filters: `created_at_timestamp < ${cutoffTimestamp}`,
    },
  });

  console.log(`Deleted records older than ${retentionDays} days from ${indexName}`);
}
// Run daily: enforceRetention('activity_logs', 90);
```
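The unit of the cutoff is the most dangerous detail here: if `created_at_timestamp` was indexed in Unix seconds but the cutoff is computed in milliseconds, the cutoff is roughly 1000x larger than any record's timestamp, so `created_at_timestamp < cutoff` matches and deletes every record. Extracting the cutoff into a small, testable helper (a sketch, assuming seconds-based timestamps in the index) makes that mistake hard to ship:

```typescript
// Compute the retention cutoff in Unix SECONDS, the unit assumed for
// created_at_timestamp. A millisecond cutoff would be ~1000x larger than
// any seconds-based timestamp and the deleteBy filter would match
// (and delete) every record in the index.
function retentionCutoffSeconds(retentionDays: number, nowMs: number = Date.now()): number {
  if (!Number.isInteger(retentionDays) || retentionDays <= 0) {
    throw new Error(`retentionDays must be a positive integer, got ${retentionDays}`);
  }
  return Math.floor((nowMs - retentionDays * 24 * 60 * 60 * 1000) / 1000);
}
```

Taking `nowMs` as a parameter keeps the function pure, so the unit behavior can be pinned down in a unit test with a fixed clock.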
## Data Classification for Algolia
| Category | Examples | Index It? | Retrieve It? |
|----------|----------|-----------|-------------|
| Public product data | Name, price, category | Yes | Yes |
| Searchable metadata | Tags, internal categories | Yes | No (`unretrievableAttributes`) |
| User-generated content | Reviews, comments | Yes (anonymized) | Yes |
| PII | Email, phone, address | NO | NO |
| Sensitive business data | Margins, supplier costs | NO | NO |
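The classification table above can be enforced mechanically with a per-index allowlist: only explicitly approved fields reach Algolia, and any new database column is excluded by default, which is safer than the `PII_FIELDS` blocklist. A sketch (the field list is illustrative; derive yours from the table):

```typescript
// Allowlist approach: only explicitly approved fields are indexed.
// New DB columns are excluded by default until someone classifies them.
// Field lists here are examples; align them with your own index schemas.
const INDEXABLE_FIELDS: Record<string, string[]> = {
  products: [
    'objectID', 'name', 'description', 'category', 'brand',
    'price', 'in_stock', 'image_url', 'rating', '_tags',
  ],
};

function pickIndexable(
  indexName: string,
  record: Record<string, unknown>
): Record<string, unknown> {
  const allowed = INDEXABLE_FIELDS[indexName];
  if (!allowed) {
    // Fail closed: an index without an allowlist gets nothing indexed
    throw new Error(`No field allowlist defined for index "${indexName}"`);
  }
  const clean: Record<string, unknown> = {};
  for (const field of allowed) {
    if (field in record) clean[field] = record[field];
  }
  return clean;
}
```

Running `pickIndexable` as the last step before `saveObjects` means a migration that adds a `supplier_cost` or `email` column cannot leak it into search without an explicit allowlist change.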
## Error Handling
| Issue | Cause | Solution |
|-------|-------|----------|
| PII in Algolia index | Transform didn't strip | Add PII check to indexing pipeline |
| `deleteBy` no effect | Filter doesn't match | Verify field is in `attributesForFaceting` |
| DSAR export incomplete | Paginated results | Use cursor-based browsing |
| Retention job deletes too much | Wrong timestamp format | Use Unix timestamp (seconds), not milliseconds |
## Resources
- [Algolia Privacy & GDPR](https://www.algolia.com/policies/privacy/)
- [deleteBy Reference](https://www.algolia.com/doc/api-reference/api-methods/delete-by/)
- [browse Reference](https://www.algolia.com/doc/api-reference/api-methods/browse/)
- [Insights User Deletion](https://www.algolia.com/doc/guides/sending-events/getting-started/)
## Next Steps
For enterprise access control, see `algolia-enterprise-rbac`.