test-data
Test data management patterns: factory functions, fixtures, database seeders, test isolation strategies, and safely anonymizing production data for testing. Covers TypeScript, Python, and Go.
Best use case
test-data is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using test-data should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/test-data/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
How test-data Compares
| Feature / Agent | test-data | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Test data management patterns: factory functions, fixtures, database seeders, test isolation strategies, and safely anonymizing production data for testing. Covers TypeScript, Python, and Go.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Test Data Skill
Bad test data leads to flaky tests, test interdependence, and production data leaking into dev. Good test data is isolated, realistic, and generated programmatically.
## When to Activate
- Setting up test infrastructure for a new project
- Tests are slow because they share state
- Tests fail when run in different order (order-dependent tests)
- Writing integration tests that need DB rows
- Creating realistic seed data for development
- Replacing hard-coded fixture JSON files with programmatic factory functions
- Ensuring production data is anonymized before being restored to a staging or development environment
- Choosing between transaction rollback, truncation, or unique-ID strategies for test isolation
---
## Core Principles
1. **Each test creates its own data** — never depend on state from another test
2. **Use factories, not fixtures** — factories generate data, fixtures are static snapshots
3. **Clean up after yourself** — or use transactions that roll back
4. **Realistic but fake** — use faker, not `test@test.com` / `password`
5. **Seeders for dev, factories for tests** — different tools for different purposes
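Principles 1 and 2 can be sketched without a database: a builder that returns a fresh object on every call, with defaults that a test may override. `buildUser`, the `User` shape, and the counter-based email are illustrative assumptions rather than part of this skill; a real factory would typically draw its values from faker.

```typescript
// Hypothetical in-memory builder; field names mirror the factories in this document.
interface User {
  id: number;
  email: string;
  role: 'admin' | 'manager' | 'customer';
  plan: 'free' | 'pro';
}

let sequence = 0;

// Returns a fresh object on every call, so no two tests share a reference.
function buildUser(overrides: Partial<User> = {}): User {
  sequence += 1;
  return {
    id: sequence,
    email: `user-${sequence}@example.com`, // example.com is reserved for documentation use
    role: 'customer',
    plan: 'free',
    ...overrides, // caller-supplied values win over the defaults
  };
}

const a = buildUser();
const b = buildUser({ role: 'admin' });
a.role = 'manager'; // mutating one object leaves the other untouched
console.log(a.email !== b.email, b.role); // true admin
```

Because each call allocates a new object, a test can mutate its own user freely without affecting any other test.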
---
## Pattern 1: Factory Functions
### TypeScript (with Drizzle/Prisma)
```typescript
// tests/factories/user.factory.ts
import bcrypt from 'bcrypt';
import { faker } from '@faker-js/faker';
import { db } from '../../src/db';
import { users, orders, orderItems } from '../../src/db/schema';
// Assumed to live next to this file; the product factory is not shown in this document
import { createProduct } from './product.factory';

interface UserOptions {
  email?: string;
  role?: 'admin' | 'manager' | 'customer';
  plan?: 'free' | 'pro';
}

export async function createUser(options: UserOptions = {}) {
  const [user] = await db.insert(users).values({
    email: options.email ?? faker.internet.email(),
    name: faker.person.fullName(),
    role: options.role ?? 'customer',
    plan: options.plan ?? 'free',
    passwordHash: await bcrypt.hash('test-password', 10),
    createdAt: new Date(),
  }).returning();
  return user;
}

// Compose factories for related data
export async function createOrderWithItems(userId: string, itemCount = 2) {
  const [order] = await db.insert(orders).values({
    userId,
    status: 'pending',
    total: 0,
  }).returning();

  // Each item gets its own product; returning() yields an array per insert
  const items = await Promise.all(
    Array.from({ length: itemCount }, async () => {
      const product = await createProduct();
      return db.insert(orderItems).values({
        orderId: order.id,
        productId: product.id,
        quantity: faker.number.int({ min: 1, max: 5 }),
        price: product.price,
      }).returning();
    })
  );
  return { order, items: items.flat() };
}
```
### Python (with SQLAlchemy)
```python
# tests/factories.py
import factory
from factory.alchemy import SQLAlchemyModelFactory
from faker import Faker
from passlib.hash import bcrypt  # provides bcrypt.hash(); the plain `bcrypt` package does not

from myapp.models import User  # assumed import path for the SQLAlchemy model

fake = Faker()


class UserFactory(SQLAlchemyModelFactory):
    class Meta:
        model = User
        sqlalchemy_session = None  # set in conftest

    email = factory.LazyAttribute(lambda _: fake.email())
    name = factory.LazyAttribute(lambda _: fake.name())
    role = 'customer'
    plan = 'free'
    password_hash = factory.LazyAttribute(lambda _: bcrypt.hash('test-password'))


# Usage
user = UserFactory.create()
admin = UserFactory.create(role='admin', plan='pro')
users = UserFactory.create_batch(5)
```
### Go
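```go
// tests/factories/user.go
package factories

import (
	"context"

	"github.com/brianvoe/gofakeit/v7"
	"github.com/jmoiron/sqlx"
	"github.com/myapp/internal/domain"
)

type UserFactory struct{ db *sqlx.DB }

func (f *UserFactory) Create(opts ...func(*domain.User)) (*domain.User, error) {
	u := &domain.User{
		Email: gofakeit.Email(),
		Name:  gofakeit.Name(),
		Role:  "customer",
	}
	// Apply caller overrides on top of the defaults
	for _, opt := range opts {
		opt(u)
	}
	// Assumes a users table with these columns, PostgreSQL placeholders, and a domain.User.ID field
	err := f.db.QueryRowxContext(context.Background(),
		`INSERT INTO users (email, name, role) VALUES ($1, $2, $3) RETURNING id`,
		u.Email, u.Name, u.Role,
	).Scan(&u.ID)
	if err != nil {
		return nil, err
	}
	return u, nil
}

// Usage (inside a test, where f is a *UserFactory):
//   admin, err := f.Create(func(u *domain.User) { u.Role = "admin" })
```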
---
## Pattern 2: Test Isolation
### Option A: Transaction Rollback (fastest)
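```typescript
// vitest / jest
// Sketch: assumes `db` is a client whose transaction() returns a handle when called
// without a callback (Knex works this way); Drizzle and Prisma use callback-style transactions.
import type { Knex } from 'knex';

let txn: Knex.Transaction;

beforeEach(async () => {
  txn = await db.transaction();
  // Override db in the module under test to use this transaction
});

afterEach(async () => {
  await txn.rollback(); // All changes undone, no cleanup needed
});
```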
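Where the client offers no transaction handle, the same idea can be expressed with plain `BEGIN` and `ROLLBACK` statements. The following is a minimal sketch under that assumption; `withRollback` and the `Queryable` interface are hypothetical names, and the fake client exists only so the example runs without a database.

```typescript
// Minimal interface: anything with an async query(text) method.
interface Queryable {
  query(text: string): Promise<unknown>;
}

// Hypothetical wrapper: runs `body` inside a transaction and always rolls back.
async function withRollback<T>(
  client: Queryable,
  body: (c: Queryable) => Promise<T>,
): Promise<T> {
  await client.query('BEGIN');
  try {
    return await body(client);
  } finally {
    await client.query('ROLLBACK'); // runs on success and on failure alike
  }
}

// Demonstration with a fake client that only records the statements it receives.
const statements: string[] = [];
const fakeClient: Queryable = {
  query: async (text) => {
    statements.push(text);
    return undefined;
  },
};

withRollback(fakeClient, (c) => c.query("INSERT INTO users (email) VALUES ('a@example.com')"))
  .then(() => console.log(statements));
```

The `finally` clause is the important part: a failing test body still triggers the rollback, so a crash cannot leave rows behind for the next test.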
### Option B: Truncate After Each Test
```typescript
// If rollback isn't possible (e.g. autocommit drivers)
afterEach(async () => {
  await db.execute(sql`TRUNCATE users, orders, order_items RESTART IDENTITY CASCADE`);
});
```
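Keeping the table list in one place makes it harder to forget a table as the schema grows. This is a minimal sketch assuming PostgreSQL syntax; `buildTruncate` and `quoteIdent` are hypothetical helpers, and the resulting string would be passed to whatever raw-SQL call your client provides.

```typescript
// Hypothetical helper: quote an identifier by doubling embedded quotes and wrapping it.
function quoteIdent(name: string): string {
  return `"${name.replace(/"/g, '""')}"`;
}

// Builds one TRUNCATE statement covering every table the tests write to.
function buildTruncate(tables: readonly string[]): string {
  if (tables.length === 0) {
    throw new Error('buildTruncate needs at least one table');
  }
  return `TRUNCATE ${tables.map(quoteIdent).join(', ')} RESTART IDENTITY CASCADE`;
}

console.log(buildTruncate(['users', 'orders', 'order_items']));
// TRUNCATE "users", "orders", "order_items" RESTART IDENTITY CASCADE
```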
### Option C: Unique Data Per Test (no cleanup needed)
```typescript
import { randomUUID } from 'node:crypto';

// Use a unique tag so tests don't collide even without cleanup
const testId = randomUUID();
const user = await createUser({ email: `test+${testId}@example.com` });
```
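Unique data only isolates tests if assertions are also scoped to that data, since rows from earlier tests remain in the table. A minimal in-memory sketch of that idea follows; the `table` array stands in for a shared database table, and `insertUser` and `adminsFor` are hypothetical helpers.

```typescript
import { randomUUID } from 'node:crypto';

interface Row {
  email: string;
  role: string;
}

// Stand-in for a shared table that is never cleaned between tests.
const table: Row[] = [{ email: 'leftover@example.com', role: 'admin' }];

// Hypothetical insert helper: tags every row with the calling test's id.
function insertUser(testId: string, role: string): Row {
  const row: Row = { email: `test+${testId}-${table.length}@example.com`, role };
  table.push(row);
  return row;
}

// Scope reads to this test's tag instead of counting the whole table.
function adminsFor(testId: string): Row[] {
  return table.filter((r) => r.role === 'admin' && r.email.startsWith(`test+${testId}-`));
}

const testId = randomUUID();
insertUser(testId, 'admin');
insertUser(testId, 'customer');
console.log(adminsFor(testId).length); // 1, although the table also holds a leftover admin
```

An unscoped assertion such as "there is exactly one admin" would fail here, which is the main trade-off of skipping cleanup.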
---
## Pattern 3: Database Seeder (for development)
```typescript
// scripts/seed.ts (for local development, not tests)
import { faker } from '@faker-js/faker';
// Assumed path to the factories from Pattern 1
import { createUser, createOrderWithItems } from '../tests/factories/user.factory';

async function seed() {
  console.log('Seeding database...');

  // Create predictable users for easy login
  await createUser({ email: 'admin@example.com', role: 'admin' });
  const user = await createUser({ email: 'user@example.com', role: 'customer' });

  // Create realistic random data
  await Promise.all(
    Array.from({ length: 50 }, () => createUser())
  );
  await Promise.all(
    Array.from({ length: 100 }, () =>
      createOrderWithItems(user.id, faker.number.int({ min: 1, max: 5 }))
    )
  );
  console.log('Done.');
}

seed()
  .then(() => process.exit(0))
  .catch((err) => {
    console.error(err);
    process.exit(1); // non-zero exit so db:reset and CI notice a failed seed
  });
```
In `package.json`:
```json
{
  "scripts": {
    "db:seed": "tsx scripts/seed.ts",
    "db:reset": "npx prisma migrate reset --force && npm run db:seed"
  }
}
```
---
## Pattern 4: Anonymizing Production Data
When you need production-scale data for realistic load tests or debugging:
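```typescript
// scripts/anonymize.ts: run AFTER restoring the production dump into staging
import bcrypt from 'bcrypt';
import { sql } from 'drizzle-orm';
import { db } from '../src/db';

async function anonymize() {
  // Every account gets the same known dev password; assumes a password_hash column
  const knownHash = await bcrypt.hash('dev-password', 10);

  await db.execute(sql`
    UPDATE users SET
      email = 'user_' || id || '@example-test.com',
      name = 'Test User ' || id,
      phone = NULL,
      address = NULL,
      password_hash = ${knownHash}
    WHERE email NOT LIKE '%@yourcompany.com' -- preserve internal accounts
  `);

  await db.execute(sql`
    UPDATE orders SET
      shipping_address = '123 Test Street, Testville',
      billing_address = '123 Test Street, Testville'
  `);

  // Remove payment references outright; never try to anonymize them
  await db.execute(sql`UPDATE orders SET stripe_payment_intent_id = NULL`);
  console.log('Anonymization complete.');
}

anonymize().catch((err) => {
  console.error(err);
  process.exit(1);
});
```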
**Rules for anonymization:**
- Never bring real emails into dev/staging (GDPR, accidental emails to real users)
- Never bring real payment methods or tokens
- Never bring real password hashes (replace them with the hash of a known dev password)
- Preserve data shape/volume (the whole point is realistic scale)
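When the same value appears in several tables, or a column has no convenient id to derive a replacement from, a keyed hash can keep replacements consistent while still hiding the original. The sketch below uses Node's built-in `crypto` module; `pseudonymizeEmail`, the key value, and the 16-character truncation are illustrative assumptions, not part of this skill.

```typescript
import { createHmac } from 'node:crypto';

// Hypothetical helper: maps a real email to a stable fake one.
// The same input and key always give the same output, so joins and uniqueness survive.
function pseudonymizeEmail(realEmail: string, key: string): string {
  const digest = createHmac('sha256', key)
    .update(realEmail.trim().toLowerCase()) // normalise so spelling variants collapse
    .digest('hex')
    .slice(0, 16); // 64 bits keeps addresses short; collisions are unlikely at typical table sizes
  return `user_${digest}@example.com`;
}

const key = 'rotate-me-per-dump'; // assumption: a secret kept out of the dev environment
const first = pseudonymizeEmail('Alice@Real-Customer.com', key);
const second = pseudonymizeEmail('alice@real-customer.com ', key);
console.log(first === second); // true
```

Using a keyed HMAC rather than a bare hash means someone with a list of candidate emails cannot recompute the mapping without the key.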
---
## Anti-Patterns
### Using Hard-Coded Fixture Files Shared Across Tests
**Wrong:**
```typescript
// tests/fixtures/users.json — shared by all tests
[{ "id": "1", "email": "alice@example.com", "role": "admin" }]
// test-a.test.ts
const user = fixtures.users[0]
user.role = 'customer' // mutates shared object — breaks test-b
```
**Correct:**
```typescript
// Each test creates isolated data via factory
const user = await createUser({ role: 'admin' })
// No shared mutable state — test runs in any order
```
**Why:** Shared fixture files create hidden dependencies between tests; factories produce isolated, independent data for every test run.
### Using Predictable or Real-Looking PII as Test Data
**Wrong:**
```typescript
const user = await createUser({
  email: 'test@test.com',
  name: 'Test User',
  phone: '555-1234',
})
```
**Correct:**
```typescript
import { faker } from '@faker-js/faker'

const user = await createUser({
  email: faker.internet.email(),
  name: faker.person.fullName(),
  phone: faker.phone.number(),
})
```
**Why:** Placeholder values like `test@test.com` can end up in logs, emails, or analytics; faker-generated data is realistic enough to catch formatting bugs without leaking patterns.
### Not Cleaning Up Between Tests (Leaving DB State)
**Wrong:**
```typescript
it('creates an order', async () => {
  const order = await createOrder({ userId: 'user-1' })
  expect(order.status).toBe('pending')
  // no cleanup: the next test sees this order
})

it('lists pending orders', async () => {
  const orders = await listPendingOrders()
  expect(orders).toHaveLength(1) // passes only if the previous test ran first and left its order behind
})
```
**Correct:**
```typescript
beforeEach(async () => { await db.execute(sql`TRUNCATE orders RESTART IDENTITY CASCADE`) })
it('creates an order', async () => { /* same body as in the example above */ })
it('lists pending orders', async () => {
  const orders = await listPendingOrders()
  expect(orders).toHaveLength(0) // the table is empty at the start of every test
})
```
**Why:** Tests that rely on implicit pre-existing data fail unpredictably depending on run order, parallelism, or leftover data from prior failures.
### Using Seeders (Dev Scripts) Inside Unit/Integration Tests
**Wrong:**
```typescript
beforeAll(async () => {
  await runSeedScript() // seeds 50 users, 100 orders: slow and imprecise
})

it('finds admin users', async () => {
  const admins = await findByRole('admin')
  expect(admins.length).toBeGreaterThan(0) // fragile: depends on seed content
})
```
**Correct:**
```typescript
it('finds admin users', async () => {
  await createUser({ role: 'admin' })
  await createUser({ role: 'customer' }) // control exact state
  const admins = await findByRole('admin')
  expect(admins).toHaveLength(1)
})
```
**Why:** Seeders are designed for developer convenience with bulk data, not test precision; factories give each test exact control over what exists in the database.
### Bringing Real Production Data Into Development Without Anonymization
**Wrong:**
```bash
# Restore prod dump directly to dev DB
pg_restore --dbname=myapp_dev prod-backup.dump
# Real emails, names, payment tokens now in dev
```
**Correct:**
```bash
pg_restore --dbname=myapp_dev prod-backup.dump
tsx scripts/anonymize.ts # replace PII before any developer touches the data
```
**Why:** Real PII in dev environments violates GDPR, risks accidental emails to real users, and exposes payment tokens. Anonymize immediately after the restore, before anyone can query the data, and never at some later point.
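A cheap safeguard is to verify the result before granting access: scan the restored data for any email outside the domains the anonymization step is expected to leave behind. The sketch below works on an in-memory list; `findLeakedEmails` is a hypothetical helper, and the allowed domains are taken from the `anonymize.ts` script in Pattern 4.

```typescript
// Domains that may legitimately remain after anonymization (from the Pattern 4 script).
const ALLOWED_DOMAINS = ['example-test.com', 'yourcompany.com'];

// Hypothetical post-anonymization check: returns every email outside the allowed domains.
function findLeakedEmails(emails: readonly string[]): string[] {
  return emails.filter((email) => {
    const domain = email.slice(email.lastIndexOf('@') + 1).toLowerCase();
    return !ALLOWED_DOMAINS.includes(domain);
  });
}

const sample = ['user_1@example-test.com', 'ops@yourcompany.com', 'jane.doe@gmail.com'];
console.log(findLeakedEmails(sample)); // [ 'jane.doe@gmail.com' ]
```

In practice the input would come from a query over the restored database, and a non-empty result would fail the restore job.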
## Checklist
- [ ] Tests use factories, not shared fixture files
- [ ] Each test creates its own data (no test order dependency)
- [ ] Tests clean up (rollback, truncate, or unique IDs)
- [ ] Seeders use realistic fake data (faker) not `test@test.com`
- [ ] Production data anonymized before use in dev/staging
- [ ] No real PII (emails, names, payment data) in dev environments
- [ ] Factory defaults are sensible (no required fields without defaults)
Related Skills
visual-testing
Visual Regression Testing: tool comparison (Chromatic/Percy/Playwright screenshots/BackstopJS), pixel-diff vs AI-based comparison, baseline management, flakiness strategies (masks, tolerances, waitForLoadState), CI integration with GitHub Actions, and Storybook integration.
typescript-testing
TypeScript testing patterns: Vitest for unit/integration, Playwright for E2E, MSW for API mocking, Testing Library for React components. Core TDD methodology for TypeScript/JavaScript projects.
swift-testing
Swift testing patterns: Swift Testing framework (Swift 6+), XCTest for UI tests, async/await test cases, actor testing, Combine testing, and XCUITest for UI automation. TDD for Swift/SwiftUI.
swift-protocol-di-testing
Protocol-based dependency injection for testable Swift code — mock file system, network, and external APIs using focused protocols and Swift Testing.
scala-testing
Scala testing with ScalaTest, MUnit, and ScalaCheck: FunSpec/FlatSpec test structure, property-based testing with forAll, mocking with MockitoSugar, Cats Effect testing with munit-cats-effect (runTest/IOSuite), ZIO Test, Testcontainers-Scala for database integration tests, and CI integration with sbt. Use when writing or reviewing Scala tests.
rust-testing
Rust testing patterns — unit tests with mockall, integration tests with sqlx transactions, HTTP handler testing (axum), benchmarks (criterion), property tests (proptest), fuzzing, and CI with cargo-nextest.
rust-testing-advanced
Advanced Rust testing anti-patterns and corrections — cfg(test) placement, expect() over unwrap(), mockall expectation ordering, executor mixing (#[tokio::test] vs block_on), PgPool isolation with
ruby-testing
RSpec testing patterns for Ruby and Rails — factories, mocks, request specs, feature specs, VCR, and SimpleCov coverage.
r-testing
R testing patterns: testthat 3e with expect_* assertions, snapshot testing, mocking with mockery and httptest2, covr code coverage, lintr static analysis, property-based testing with hedgehog, testing Shiny apps with shinytest2. Use when writing or reviewing R tests.
python-testing
Python testing strategies using pytest, TDD methodology, fixtures, mocking, and parametrization. Core testing fundamentals.
python-testing-advanced
Advanced Python testing — async testing with pytest-asyncio, exception/side-effect testing, test organization, common patterns (API, database, class methods), pytest configuration, and CLI reference. Extends python-testing.
php-testing
PHP testing patterns: PHPUnit 11 with mocks and data providers, Pest v3 with expectations and datasets, Laravel feature/HTTP tests with RefreshDatabase, Symfony WebTestCase, PHPStan static analysis, Infection mutation testing. Use when writing or reviewing PHP tests.