test-data-generation

Synthetic test data generation and management using Faker.js and similar tools. Generate realistic test data, create data factories, implement database seeding, and manage test data anonymization.

509 stars

Best use case

test-data-generation is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Synthetic test data generation and management using Faker.js and similar tools. Generate realistic test data, create data factories, implement database seeding, and manage test data anonymization.

Teams using test-data-generation should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/test-data-generation/SKILL.md --create-dirs "https://raw.githubusercontent.com/a5c-ai/babysitter/main/library/specializations/qa-testing-automation/skills/test-data-generation/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/test-data-generation/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How test-data-generation Compares

Feature / Agenttest-data-generationStandard Approach
Platform SupportNot specifiedLimited / Varies
Context Awareness High Baseline
Installation ComplexityUnknownN/A

Frequently Asked Questions

What does this skill do?

Synthetic test data generation and management using Faker.js and similar tools. Generate realistic test data, create data factories, implement database seeding, and manage test data anonymization.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# test-data-generation

You are **test-data-generation** - a specialized skill for synthetic test data generation and management, providing capabilities for creating realistic, reproducible test data.

## Overview

This skill enables AI-powered test data management including:
- Generating realistic test data with Faker.js
- Creating data factories and builders
- Database seeding scripts
- Test data anonymization and masking
- Generating boundary value test data
- Configuring data cleanup strategies
- Creating deterministic test data with seeds
- Integration with ORM factories (Fishery, Factory Bot)

## Prerequisites

- Node.js or Python environment
- Faker library installed (@faker-js/faker or faker-python)
- Database access for seeding operations
- Optional: ORM (Prisma, Sequelize, SQLAlchemy) for factory integration

## Capabilities

### 1. Basic Data Generation

Generate realistic test data with Faker.js:

```javascript
import { faker } from '@faker-js/faker';

// Generate user data
const generateUser = () => ({
  id: faker.string.uuid(),
  email: faker.internet.email(),
  firstName: faker.person.firstName(),
  lastName: faker.person.lastName(),
  phone: faker.phone.number(),
  address: {
    street: faker.location.streetAddress(),
    city: faker.location.city(),
    state: faker.location.state(),
    zipCode: faker.location.zipCode(),
    country: faker.location.country()
  },
  company: faker.company.name(),
  jobTitle: faker.person.jobTitle(),
  avatar: faker.image.avatar(),
  createdAt: faker.date.past(),
  updatedAt: faker.date.recent()
});

// Generate multiple users
const users = faker.helpers.multiple(generateUser, { count: 100 });
```

### 2. Data Factory Pattern

Create reusable data factories:

```javascript
import { faker } from '@faker-js/faker';

// User Factory
class UserFactory {
  static defaults = {
    id: () => faker.string.uuid(),
    email: () => faker.internet.email(),
    firstName: () => faker.person.firstName(),
    lastName: () => faker.person.lastName(),
    role: () => 'user',
    isActive: () => true,
    createdAt: () => faker.date.past()
  };

  static create(overrides = {}) {
    const defaults = Object.fromEntries(
      Object.entries(this.defaults).map(([key, fn]) => [key, fn()])
    );
    return { ...defaults, ...overrides };
  }

  static createMany(count, overrides = {}) {
    return Array.from({ length: count }, () => this.create(overrides));
  }

  // Trait methods
  static admin(overrides = {}) {
    return this.create({ role: 'admin', ...overrides });
  }

  static inactive(overrides = {}) {
    return this.create({ isActive: false, ...overrides });
  }
}

// Usage
const user = UserFactory.create();
const admin = UserFactory.admin({ firstName: 'Admin' });
const users = UserFactory.createMany(50);
```

### 3. Fishery Factory (TypeScript)

Using Fishery for typed factories:

```typescript
import { Factory } from 'fishery';
import { faker } from '@faker-js/faker';

interface User {
  id: string;
  email: string;
  firstName: string;
  lastName: string;
  role: 'user' | 'admin';
  profile: Profile;
}

interface Profile {
  bio: string;
  avatar: string;
}

const profileFactory = Factory.define<Profile>(() => ({
  bio: faker.person.bio(),
  avatar: faker.image.avatar()
}));

const userFactory = Factory.define<User>(({ associations, sequence }) => ({
  id: faker.string.uuid(),
  email: faker.internet.email(),
  firstName: faker.person.firstName(),
  lastName: faker.person.lastName(),
  role: 'user',
  profile: associations.profile || profileFactory.build()
}));

// Usage
const user = userFactory.build();
const admin = userFactory.build({ role: 'admin' });
const usersWithProfiles = userFactory.buildList(10, {}, {
  associations: { profile: profileFactory.build() }
});
```

### 4. Database Seeding

Seed databases with test data:

```javascript
// seed.js - Database seeding script
import { PrismaClient } from '@prisma/client';
import { faker } from '@faker-js/faker';

const prisma = new PrismaClient();

async function seed() {
  // Set seed for reproducibility
  faker.seed(12345);

  // Clear existing data
  await prisma.order.deleteMany();
  await prisma.product.deleteMany();
  await prisma.user.deleteMany();

  // Create users
  const users = await Promise.all(
    Array.from({ length: 50 }, () =>
      prisma.user.create({
        data: {
          email: faker.internet.email(),
          name: faker.person.fullName(),
          password: faker.internet.password()
        }
      })
    )
  );

  // Create products
  const products = await Promise.all(
    Array.from({ length: 100 }, () =>
      prisma.product.create({
        data: {
          name: faker.commerce.productName(),
          description: faker.commerce.productDescription(),
          price: parseFloat(faker.commerce.price()),
          sku: faker.string.alphanumeric(8).toUpperCase(),
          inStock: faker.datatype.boolean()
        }
      })
    )
  );

  // Create orders
  for (const user of users) {
    const orderCount = faker.number.int({ min: 1, max: 5 });
    for (let i = 0; i < orderCount; i++) {
      await prisma.order.create({
        data: {
          userId: user.id,
          status: faker.helpers.arrayElement(['pending', 'processing', 'shipped', 'delivered']),
          total: parseFloat(faker.commerce.price({ min: 10, max: 500 })),
          items: {
            create: faker.helpers.arrayElements(products, { min: 1, max: 5 }).map(p => ({
              productId: p.id,
              quantity: faker.number.int({ min: 1, max: 3 }),
              price: p.price
            }))
          }
        }
      });
    }
  }

  console.log('Seeding complete!');
}

seed()
  .catch(console.error)
  .finally(() => prisma.$disconnect());
```

### 5. Boundary Value Generation

Generate edge case test data:

```javascript
import { faker } from '@faker-js/faker';

const boundaryValues = {
  // String boundaries
  strings: {
    empty: '',
    singleChar: 'a',
    maxLength: 'a'.repeat(255),
    unicode: '日本語テスト',
    emoji: '🎉🚀💡',
    specialChars: '<script>alert("xss")</script>',
    sqlInjection: "'; DROP TABLE users; --",
    whitespace: '   spaces   ',
    newlines: 'line1\nline2\rline3'
  },

  // Number boundaries
  numbers: {
    zero: 0,
    negative: -1,
    maxInt: Number.MAX_SAFE_INTEGER,
    minInt: Number.MIN_SAFE_INTEGER,
    decimal: 0.1 + 0.2, // Famous floating point issue
    infinity: Infinity,
    nan: NaN
  },

  // Date boundaries
  dates: {
    epochStart: new Date(0),
    farPast: new Date('1900-01-01'),
    farFuture: new Date('2100-12-31'),
    leapYear: new Date('2024-02-29'),
    endOfMonth: new Date('2024-01-31'),
    timezoneEdge: new Date('2024-03-10T02:30:00') // DST transition
  },

  // Array boundaries
  arrays: {
    empty: [],
    single: [1],
    large: Array.from({ length: 10000 }, (_, i) => i)
  }
};

// Generate boundary test cases
function generateBoundaryTestCases(schema) {
  const testCases = [];

  for (const [field, config] of Object.entries(schema)) {
    if (config.type === 'string') {
      testCases.push(
        { [field]: '', expected: config.required ? 'error' : 'success' },
        { [field]: 'a'.repeat(config.maxLength + 1), expected: 'error' },
        { [field]: 'a'.repeat(config.maxLength), expected: 'success' }
      );
    }
    if (config.type === 'number') {
      testCases.push(
        { [field]: config.min - 1, expected: 'error' },
        { [field]: config.min, expected: 'success' },
        { [field]: config.max, expected: 'success' },
        { [field]: config.max + 1, expected: 'error' }
      );
    }
  }

  return testCases;
}
```

### 6. Data Anonymization

Anonymize production data for testing:

```javascript
import { faker } from '@faker-js/faker';
import crypto from 'crypto';

const anonymize = {
  // Consistent anonymization (same input = same output)
  email: (email) => {
    const hash = crypto.createHash('md5').update(email).digest('hex').slice(0, 8);
    return `user_${hash}@example.com`;
  },

  // Full replacement
  name: () => faker.person.fullName(),

  // Partial masking
  phone: (phone) => phone.replace(/\d(?=\d{4})/g, '*'),

  // Format preservation
  creditCard: (cc) => {
    const last4 = cc.slice(-4);
    return `****-****-****-${last4}`;
  },

  // Consistent fake data
  ssn: (ssn) => {
    faker.seed(crypto.createHash('md5').update(ssn).digest('hex'));
    return faker.string.numeric('###-##-####');
  },

  // Address anonymization
  address: () => ({
    street: faker.location.streetAddress(),
    city: faker.location.city(),
    state: faker.location.state(),
    zip: faker.location.zipCode()
  })
};

// Anonymize dataset
function anonymizeDataset(records) {
  return records.map(record => ({
    ...record,
    email: anonymize.email(record.email),
    name: anonymize.name(),
    phone: anonymize.phone(record.phone),
    creditCard: record.creditCard ? anonymize.creditCard(record.creditCard) : null,
    address: anonymize.address()
  }));
}
```

### 7. Multi-Locale Support

Generate data in different locales:

```javascript
import { faker, Faker } from '@faker-js/faker';
import { de, fr, ja, es } from '@faker-js/faker';

// German locale
const fakerDE = new Faker({ locale: [de] });
const germanUser = {
  name: fakerDE.person.fullName(),
  address: fakerDE.location.streetAddress(),
  city: fakerDE.location.city()
};

// Japanese locale
const fakerJA = new Faker({ locale: [ja] });
const japaneseUser = {
  name: fakerJA.person.fullName(),
  address: fakerJA.location.streetAddress(),
  city: fakerJA.location.city()
};

// Generate test data for multiple locales
const locales = { de, fr, ja, es };
function generateMultiLocaleData(count = 10) {
  return Object.entries(locales).flatMap(([code, locale]) => {
    const localFaker = new Faker({ locale: [locale] });
    return Array.from({ length: count }, () => ({
      locale: code,
      name: localFaker.person.fullName(),
      email: localFaker.internet.email(),
      phone: localFaker.phone.number(),
      address: localFaker.location.streetAddress()
    }));
  });
}
```

### 8. Deterministic Data with Seeds

Create reproducible test data:

```javascript
import { faker } from '@faker-js/faker';

// Set global seed for reproducibility
faker.seed(12345);

// Generate same data every time
const user1 = faker.person.fullName(); // Always same name
const user2 = faker.person.fullName(); // Always same name

// Reset seed for new sequence
faker.seed(12345);
const user1Again = faker.person.fullName(); // Same as user1

// Environment-based seeding
const testSeed = process.env.TEST_SEED || Date.now();
faker.seed(testSeed);
console.log(`Using seed: ${testSeed}`);
```

## MCP Server Integration

This skill can leverage the following MCP servers for enhanced capabilities:

| Server | Description | Installation |
|--------|-------------|--------------|
| funsjanssen/faker-mcp | Faker.js MCP Server | [GitHub](https://github.com/funsjanssen/faker-mcp) |

## Best Practices

1. **Use seeds** - Enable reproducible test data
2. **Factories over inline** - Use factory patterns for maintainability
3. **Realistic but safe** - Data should look real but not match real people
4. **Boundary coverage** - Include edge cases in test data
5. **Cleanup** - Implement data cleanup strategies
6. **Performance** - Generate data in batches for large datasets
7. **Validation** - Validate generated data matches expected schema

## Process Integration

This skill integrates with the following processes:
- `test-data-management.js` - All phases of test data handling
- `e2e-test-suite.js` - E2E test data setup
- `api-testing.js` - API test data generation
- `environment-management.js` - Environment data seeding

## Output Format

When executing operations, provide structured output:

```json
{
  "operation": "generate",
  "dataType": "users",
  "count": 100,
  "seed": 12345,
  "locale": "en",
  "schema": {
    "id": "uuid",
    "email": "email",
    "name": "fullName"
  },
  "outputFile": "./test-data/users.json",
  "statistics": {
    "generated": 100,
    "uniqueEmails": 100,
    "executionTime": "45ms"
  }
}
```

## Error Handling

- Validate schema before generation
- Handle large dataset memory constraints
- Provide seed information for debugging
- Log generation failures with context
- Support partial data generation recovery

## Constraints

- Never use real personal data as seeds
- Ensure generated emails don't match real domains
- Avoid generating data that could pass as real credentials
- Respect data privacy regulations (GDPR, etc.)
- Document seed values for test reproducibility

Related Skills

vitest

509
from a5c-ai/babysitter

Vitest configuration, mocking, coverage, snapshot testing, and performance.

structured-data

509
from a5c-ai/babysitter

JSON-LD schema markup and validation.

react-testing-library

509
from a5c-ai/babysitter

React Testing Library patterns, queries, user events, and accessibility testing.

pdf-generation

509
from a5c-ai/babysitter

Professional PDF documentation generation. Convert Markdown to PDF with custom templates, styling, table of contents, cross-references, and optimized output for print and archival.

diagram-generation

509
from a5c-ai/babysitter

Multi-format diagram generation from text descriptions. Create Mermaid, PlantUML, D2, and Graphviz diagrams including flowcharts, sequence diagrams, architecture diagrams (C4), and data models.

load-test-generator

509
from a5c-ai/babysitter

Generate load test scripts for k6, Locust, and Gatling from OpenAPI specs

CVE/CWE Database Skill

509
from a5c-ai/babysitter

CVE and CWE database querying and management

cloud-security-testing

509
from a5c-ai/babysitter

Multi-cloud security assessment and penetration testing capabilities. Execute Prowler/ScoutSuite assessments, analyze IAM policies, identify cloud misconfigurations, test permissions, and enumerate cloud resources across AWS/GCP/Azure.

contract-test-framework

509
from a5c-ai/babysitter

Consumer-driven contract testing for SDK-API compatibility. Generate Pact consumer tests, verify provider contracts, configure Pact broker, and implement can-i-deploy checks.

compatibility-test-matrix

509
from a5c-ai/babysitter

Multi-version, multi-platform SDK compatibility testing

Stryker Mutation Testing

509
from a5c-ai/babysitter

Stryker mutation testing for assessing test suite quality and effectiveness

pytest Testing

509
from a5c-ai/babysitter

Expert pytest framework for Python unit, integration, and functional testing