data-processor

Process and transform arrays of data with common operations like filtering, mapping, and aggregation

25 stars

Best use case

data-processor is best used when you need a repeatable AI agent workflow instead of a one-off prompt.


Teams using data-processor should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

curl -o ~/.claude/skills/data-processor/SKILL.md --create-dirs "https://raw.githubusercontent.com/ComeOnOliver/skillshub/main/skills/aiskillstore/marketplace/artemisai/data-processor/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/data-processor/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How data-processor Compares

| Feature                 | data-processor | Standard Approach |
|-------------------------|----------------|-------------------|
| Platform Support        | Not specified  | Limited / Varies  |
| Context Awareness       | High           | Baseline          |
| Installation Complexity | Unknown        | N/A               |

Frequently Asked Questions

What does this skill do?

Process and transform arrays of data with common operations like filtering, mapping, and aggregation

Where can I find the source code?

You can find the source code in the ComeOnOliver/skillshub repository on GitHub, the same repository the installation command above downloads from.

SKILL.md Source

# Data Processor Skill

A general-purpose data processing skill for transforming arrays of objects. This skill demonstrates the token efficiency benefits of code execution - instead of describing transformations in natural language, write code once and reuse it.

## What This Skill Does

Processes arrays of data with common transformations:
- Filter records based on conditions
- Map fields to new values
- Aggregate data (sum, average, count, etc.)
- Sort and group data
- Remove duplicates
- Merge datasets
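Grouping and merging appear in the list above but are not part of the reference implementation below. A minimal sketch of how they could be written in the same style (the helper names `groupBy` and `mergeByKey` are illustrative, not part of the published skill):

```javascript
// Illustrative helpers in the style of the skill; these names are not
// part of the SKILL.md implementation below.

// Group an array of objects by the value of one field.
function groupBy(data, field) {
  const groups = {};
  for (const item of data) {
    const key = item[field];
    (groups[key] = groups[key] || []).push(item);
  }
  return groups;
}

// Merge two datasets on a shared key; fields from `right` win on conflict.
function mergeByKey(left, right, key) {
  const index = new Map(right.map(item => [item[key], item]));
  return left.map(item => ({ ...item, ...(index.get(item[key]) || {}) }));
}
```

Both helpers are O(n) and could be wired into the `operations` object the same way `filter` and `unique` are handled below.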

## When to Use This Skill

Use this skill when you need to:
- Transform large datasets (hundreds or thousands of records)
- Apply consistent business logic to data
- Aggregate or summarize data
- Clean or normalize data
- Combine data from multiple sources

**Token Efficiency**: Processing 1000 records in code uses ~500 tokens. Describing the same operations in natural language would use ~50,000 tokens.

## Implementation

```javascript
/**
 * Data Processor - General purpose data transformation
 * @param {Array} data - Array of objects to process
 * @param {Object} operations - Operations to apply
 * @returns {Object} Processed data and statistics
 */
async function processData(data, operations = {}) {
  if (!Array.isArray(data)) {
    throw new Error('Data must be an array');
  }
  
  let result = [...data];
  const stats = {
    inputCount: data.length,
    operations: [],
  };
  
  // Filter operation
  if (operations.filter) {
    const beforeCount = result.length;
    result = result.filter(operations.filter);
    stats.operations.push({
      type: 'filter',
      recordsRemoved: beforeCount - result.length
    });
  }
  
  // Map operation (transform fields)
  if (operations.map) {
    result = result.map(operations.map);
    stats.operations.push({ type: 'map' });
  }
  
  // Sort operation
  if (operations.sort) {
    const { field, order = 'asc' } = operations.sort;
    result.sort((a, b) => {
      const aVal = a[field];
      const bVal = b[field];
      const comparison = aVal < bVal ? -1 : aVal > bVal ? 1 : 0;
      return order === 'asc' ? comparison : -comparison;
    });
    stats.operations.push({ type: 'sort', field, order });
  }
  
  // Aggregate operation
  if (operations.aggregate) {
    const { field, operation: aggOp } = operations.aggregate;
    const values = result.map(r => r[field]).filter(v => v != null);
    
    let aggregateResult;
    switch (aggOp) {
      case 'sum':
        aggregateResult = values.reduce((sum, v) => sum + v, 0);
        break;
      case 'average':
        // Guard against division by zero when every value is null/undefined
        aggregateResult = values.length
          ? values.reduce((sum, v) => sum + v, 0) / values.length
          : null;
        break;
      case 'count':
        aggregateResult = values.length;
        break;
      case 'min':
        // reduce instead of Math.min(...values): spreading a very large
        // array can overflow the call stack, and an empty array would
        // otherwise yield Infinity
        aggregateResult = values.length
          ? values.reduce((m, v) => Math.min(m, v))
          : null;
        break;
      case 'max':
        aggregateResult = values.length
          ? values.reduce((m, v) => Math.max(m, v))
          : null;
        break;
      default:
        throw new Error(`Unknown aggregate operation: ${aggOp}`);
    }
    
    stats.aggregateResult = {
      field,
      operation: aggOp,
      value: aggregateResult
    };
  }
  
  // Remove duplicates
  if (operations.unique) {
    const { field } = operations.unique;
    const seen = new Set();
    const beforeCount = result.length;
    result = result.filter(item => {
      const key = item[field];
      if (seen.has(key)) return false;
      seen.add(key);
      return true;
    });
    stats.operations.push({
      type: 'unique',
      field,
      duplicatesRemoved: beforeCount - result.length
    });
  }
  
  stats.outputCount = result.length;
  
  return {
    data: result,
    stats
  };
}

module.exports = processData;
```
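One behavior worth noting: the operations always run in a fixed order (filter, map, sort, aggregate, unique) regardless of the key order in the options object. A small standalone sketch, using plain array methods in the same order, shows why that matters when a `map` renames the field a `filter` depends on:

```javascript
// The skill applies filter before map, so a filter must reference the
// *input* field names, not the names produced by map.
const data = [{ raw_amount: 150 }, { raw_amount: 50 }];

// Works: the filter uses the original field name.
const ok = data
  .filter(r => r.raw_amount >= 100)       // runs first, as in the skill
  .map(r => ({ amount: r.raw_amount }));  // rename happens afterwards

// Fails silently: r.amount is undefined before map has run.
const wrong = data
  .filter(r => r.amount >= 100)
  .map(r => ({ amount: r.raw_amount }));

console.log(ok.length);    // 1
console.log(wrong.length); // 0
```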

## Examples

### Example 1: Filter and Sort

```javascript
const processData = require('/skills/data-processor.js');

const salesData = [
  { id: 1, amount: 150, status: 'completed' },
  { id: 2, amount: 200, status: 'pending' },
  { id: 3, amount: 175, status: 'completed' },
  { id: 4, amount: 225, status: 'completed' }
];

const result = await processData(salesData, {
  filter: (record) => record.status === 'completed',
  sort: { field: 'amount', order: 'desc' }
});

console.log(result);
// Output:
// {
//   data: [
//     { id: 4, amount: 225, status: 'completed' },
//     { id: 3, amount: 175, status: 'completed' },
//     { id: 1, amount: 150, status: 'completed' }
//   ],
//   stats: {
//     inputCount: 4,
//     operations: [
//       { type: 'filter', recordsRemoved: 1 },
//       { type: 'sort', field: 'amount', order: 'desc' }
//     ],
//     outputCount: 3
//   }
// }
```

### Example 2: Aggregate Data

```javascript
const processData = require('/skills/data-processor.js');

const orders = [
  { orderId: 1, total: 100 },
  { orderId: 2, total: 150 },
  { orderId: 3, total: 200 }
];

const result = await processData(orders, {
  aggregate: { field: 'total', operation: 'sum' }
});

console.log(result.stats.aggregateResult);
// Output: { field: 'total', operation: 'sum', value: 450 }
```

### Example 3: Complex Transformation

```javascript
const processData = require('/skills/data-processor.js');

const customers = [
  { name: '  John Doe  ', email: 'JOHN@EXAMPLE.COM', age: 30 },
  { name: 'Jane Smith', email: 'jane@example.com', age: 25 },
  { name: '  John Doe  ', email: 'JOHN@EXAMPLE.COM', age: 30 } // duplicate
];

const result = await processData(customers, {
  map: (customer) => ({
    name: customer.name.trim(),
    email: customer.email.toLowerCase(),
    age: customer.age
  }),
  unique: { field: 'email' },
  filter: (customer) => customer.age >= 25,
  sort: { field: 'age', order: 'asc' }
});

console.log(result.data);
// Output:
// [
//   { name: 'Jane Smith', email: 'jane@example.com', age: 25 },
//   { name: 'John Doe', email: 'john@example.com', age: 30 }
// ]
```

## Integration with MCP Tools

This skill works great in combination with MCP tools:

```javascript
// Fetch data from an MCP tool
const rawData = await callMCPTool('database__query', {
  query: 'SELECT * FROM customers WHERE created_date > "2024-01-01"'
});

// Process with the skill
const processData = require('/skills/data-processor.js');
const result = await processData(rawData, {
  filter: (r) => r.status === 'active',
  sort: { field: 'revenue', order: 'desc' },
  aggregate: { field: 'revenue', operation: 'sum' }
});

// Save results
await callMCPTool('storage__save', {
  key: 'processed_customers',
  value: result.data
});

// Return summary to agent (not full data)
return {
  processedRecords: result.stats.outputCount,
  totalRevenue: result.stats.aggregateResult.value
};
```

## Tips and Best Practices

1. **Save Intermediate Results**: For large datasets, save to `/workspace` after each major operation
2. **Return Summaries**: Send statistics to the agent, not full datasets
3. **Chain Operations**: Combine multiple operations for complex transformations
4. **Validate Input**: Always check data types and handle edge cases
5. **Reuse This Skill**: Save to `/skills` and use across multiple tasks

## Related Skills

- `validator` - Validate data before processing
- `exporter` - Export processed data to various formats
- `aggregator` - Advanced statistical aggregations

## Performance Notes

This skill can process:
- 1,000 records: < 50ms
- 10,000 records: < 200ms
- 100,000 records: < 2s

All operations use efficient JavaScript array methods with O(n) or O(n log n) complexity.
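The figures above will vary with hardware and Node.js version; a quick sanity check is easy to run. This sketch uses plain array methods standing in for the skill's filter and sort operations on generated data:

```javascript
// Rough timing check for the performance claims above. Treat the
// thresholds in the doc as ballpark figures, not guarantees.
function benchmark(n) {
  const data = Array.from({ length: n }, (_, i) => ({
    id: i,
    amount: Math.random() * 1000,
    status: i % 3 === 0 ? 'pending' : 'completed'
  }));

  const start = process.hrtime.bigint();
  const out = data
    .filter(r => r.status === 'completed')   // O(n)
    .sort((a, b) => b.amount - a.amount);    // O(n log n)
  const ms = Number(process.hrtime.bigint() - start) / 1e6;

  return { records: n, kept: out.length, ms };
}

console.log(benchmark(10000));
```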

---

**Inspired by**: The Anthropic skills pattern for token-efficient data processing. See [Code Execution with MCP](https://www.anthropic.com/engineering/code-execution-with-mcp) for the philosophy behind this approach.

Related Skills

  • data-visualization-tool (ComeOnOliver/skillshub): Chart and visualization generation for DBX Studio. Use when a user wants to visualize data — bar charts, line graphs, pie charts, scatter plots, etc.
  • data-exploration (ComeOnOliver/skillshub): Systematic database and table profiling for DBX Studio. Use when a user wants to understand their data, explore schema structure, or profile a dataset.
  • data-exploration-tool (ComeOnOliver/skillshub): Systematic database and table profiling for DBX Studio. Use when a user wants to understand their data, explore schema structure, or profile a dataset.
  • data-privacy-compliance (ComeOnOliver/skillshub): Data privacy and regulatory compliance specialist for GDPR, CCPA, HIPAA, and international data protection laws. Use when implementing privacy controls, conducting data protection impact assessments, ensuring regulatory compliance, or managing data subject rights. Expert in consent management, data minimization, and privacy-by-design principles.
  • database-fundamentals (ComeOnOliver/skillshub): Auto-invoke when reviewing schema design, database queries, ORM usage, or migrations. Enforces normalization, indexing awareness, query optimization, and migration safety.
  • seed-data-generator (ComeOnOliver/skillshub): Generate realistic test data for database development, testing, and demos.
  • csv-processor (ComeOnOliver/skillshub): Parse, transform, and analyze CSV files with advanced data manipulation capabilities.
  • database-operations (ComeOnOliver/skillshub): Instructions on how to write database queries with SQLAlchemy.
  • database-migration-helper (ComeOnOliver/skillshub): Creates database migration files following project conventions for Prisma, Sequelize, Alembic, Knex, TypeORM, and other ORMs. Use when adding tables, modifying schemas, or when user mentions database changes.
  • database-migration (ComeOnOliver/skillshub): Guide for creating idempotent Supabase database migrations with RLS policies and workspace isolation.
  • reading-logseq-data (ComeOnOliver/skillshub): Expert in reading data from Logseq DB graphs via HTTP API or CLI. Auto-invokes when users want to fetch pages, blocks, or properties from Logseq, execute Datalog queries against their graph, search content, or retrieve backlinks and relationships. Provides the logseq-client library for operations.
  • querying-logseq-data (ComeOnOliver/skillshub): Expert in building Datalog queries for Logseq DB graphs. Auto-invokes when users need help writing Logseq queries, understanding Datalog syntax, optimizing query performance, or working with the Datascript query engine. Covers advanced query patterns, pull syntax, aggregations, and DB-specific query techniques.