data-seeding-for-testing
Use when creating test data for scratch orgs, sandboxes, or CI pipelines: Apex @testSetup factories, sf data import tree plans, CumulusCI datasets, Snowfakery. NOT for production data migration or ETL pipelines.
Best use case
data-seeding-for-testing is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using data-seeding-for-testing should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/data-seeding-for-testing/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
SKILL.md Source
# Data Seeding For Testing
This skill activates when a practitioner needs to create, load, or manage test data for Salesforce development and testing environments. It covers the three primary seeding layers — Apex `@testSetup` factories for unit tests, `sf data import tree` plans for scratch orgs and developer sandboxes, and CumulusCI datasets with Snowfakery for partial and full sandboxes.
---
## Before Starting
Gather this context before working on anything in this domain:
- Identify the target org type (scratch org, developer sandbox, partial sandbox, full sandbox) — this determines which seeding layer is appropriate.
- Determine record volume and relationship depth — large volumes (>200MB JSON) exceed the `sf data import tree` limit and require CumulusCI or Bulk API.
- Confirm whether data must survive across test transactions (`@testSetup`) or exist persistently in the org (data plans, datasets).
---
## Core Concepts
### Three Seeding Layers
Salesforce has three distinct seeding layers, each suited to a different context:
1. **Apex `@testSetup`**: creates records once per test class, inside the test context. Each test method starts from the same records because changes made by one test method are rolled back when it finishes, and everything is rolled back when the class finishes executing. Use for unit and integration tests that run in Apex. Limit: `@isTest(SeeAllData=true)` is incompatible with `@testSetup`; never combine the two on the same class.
2. **`sf data import tree` plan (JSON)**: imports hierarchical record trees from JSON files into a target org. The cap is 200MB of JSON per plan execution. Parent-child relationships are expressed by giving each record a `referenceId` and pointing child lookup fields at it with `@<referenceId>`. Use for scratch orgs and developer sandboxes in CI pipelines.
3. **CumulusCI datasets + Snowfakery**: designed for partial and full sandboxes where realistic, anonymized, or synthetic volumes of data are needed. CumulusCI `capture_dataset` extracts and anonymizes records; Snowfakery generates synthetic records from YAML recipes.
### `@isTest(SeeAllData=true)` Incompatibility
`@isTest(SeeAllData=true)` is incompatible with `@testSetup`. Salesforce supports test setup methods only under the default data isolation mode: if the test class or any of its test methods is annotated with `@isTest(SeeAllData=true)`, test setup methods aren't supported in that class. The correct pattern is to always use `@testSetup` for isolated data and never mix it with `SeeAllData=true`.
### Scratch Org Snapshots
Scratch Org Snapshots (available with Dev Hub) capture both metadata and data state of a scratch org and allow it to be cloned. Snapshots reduce CI pipeline time by eliminating repeated data seeding. Each snapshot consumes an allocation from the Dev Hub snapshot limit (25 active snapshots by default). Snapshots are not a replacement for data seeding — they pre-bake a seeded state that can be rapidly re-used.
### `sf data import tree` Plan JSON
A plan JSON file describes the import sequence for related objects. Parent objects must appear before child objects in the plan array. Each record in the source JSON carries a `referenceId` (a client-side string) inside its `attributes` block. A child record points at its parent by setting the lookup field to `@` followed by the parent's `referenceId`, within the same import. The `sf data import tree` command resolves these references at import time and substitutes real Salesforce IDs.
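The ordering rule can be checked before an import is attempted. The following is a minimal Python sketch, not part of the Salesforce CLI: `find_unresolved_refs` is a hypothetical helper, and it assumes the plan and source files have already been parsed into Python objects shaped like the JSON in this skill. It treats any string field value of the form `@Name` as a reference, which is a simplification.

```python
import re

# A whole-string reference such as "@AccountRef1" (assumed convention for this sketch).
REF_PATTERN = re.compile(r"@(\w+)")


def find_unresolved_refs(plan, files):
    """Return (file name, reference) pairs that no earlier plan step defines.

    plan  -- list of plan steps, each with a "files" list
    files -- dict mapping a file name to its parsed JSON content
    """
    known_refs = set()
    problems = []
    for step in plan:
        step_refs = set()
        for file_name in step["files"]:
            for record in files[file_name]["records"]:
                step_refs.add(record["attributes"]["referenceId"])
                for field, value in record.items():
                    if field == "attributes" or not isinstance(value, str):
                        continue
                    match = REF_PATTERN.fullmatch(value)
                    if match and match.group(1) not in known_refs:
                        problems.append((file_name, value))
        # References saved by this step are usable only by later steps.
        known_refs |= step_refs
    return problems
```

An empty result means every reference points at a record from an earlier step. A field whose legitimate value happens to start with `@` would be reported as a false positive, so treat the output as a hint rather than a verdict.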
---
## Common Patterns
### Apex Test Data Factory
**When to use:** Unit and integration tests in Apex that need isolated records not shared with other tests.
**How it works:**
```apex
// Reusable factory: test classes call these helpers instead of building records inline.
@isTest
public class AccountFactory {
    // Inserts and returns a single Account.
    public static Account create(String name) {
        Account a = new Account(Name = name, BillingCountry = 'US');
        insert a;
        return a;
    }

    // Inserts `count` Accounts with a single DML statement.
    public static List<Account> createBulk(Integer count) {
        List<Account> accts = new List<Account>();
        for (Integer i = 0; i < count; i++) {
            accts.add(new Account(Name = 'Test Account ' + i, BillingCountry = 'US'));
        }
        insert accts;
        return accts;
    }
}

@isTest
private class OpportunityServiceTest {
    // Runs once for the class; every test method starts from this data.
    @testSetup
    static void setup() {
        Account a = AccountFactory.create('ACME');
        Opportunity o = new Opportunity(
            Name = 'Test Opp', AccountId = a.Id,
            StageName = 'Prospecting', CloseDate = Date.today().addDays(30)
        );
        insert o;
    }

    @isTest
    static void testOpportunityStageUpdate() {
        // Re-query the record created in setup().
        Opportunity o = [SELECT Id, StageName FROM Opportunity LIMIT 1];
        // test logic here
    }
}
```
**Why not `SeeAllData=true`:** Test isolation is broken — tests pass locally but fail in fresh scratch orgs because production data is not present. CI pipelines will be unreliable.
### sf data import tree Plan
**When to use:** Loading hierarchical records into a scratch org or developer sandbox as part of a CI pipeline setup step.
**How it works:**
1. Create source JSON files for each object (e.g., `Account.json`, `Contact.json`).
2. Give each parent record a `referenceId` in its `attributes` block, then reference it from child records as `@<referenceId>`:
`Account.json`:
```json
{
  "records": [
    {
      "attributes": {"type": "Account", "referenceId": "AccountRef1"},
      "Name": "ACME Corp",
      "BillingCountry": "US"
    }
  ]
}
```
`Contact.json`:
```json
{
  "records": [
    {
      "attributes": {"type": "Contact", "referenceId": "ContactRef1"},
      "FirstName": "Jane",
      "LastName": "Doe",
      "AccountId": "@AccountRef1"
    }
  ]
}
```
3. Create a plan JSON (`plan.json`):
```json
[
  {
    "sobject": "Account",
    "saveRefs": true,
    "resolveRefs": false,
    "files": ["Account.json"]
  },
  {
    "sobject": "Contact",
    "saveRefs": true,
    "resolveRefs": true,
    "files": ["Contact.json"]
  }
]
```
4. Run: `sf data import tree --plan test-data/plan.json --target-org MyScratchOrg`
**Why not Bulk API:** For small volumes (< 200MB), `sf data import tree` handles parent-child relationships automatically. Bulk API requires manual ID resolution.
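When the seed data already lives in a script or fixture, the three files from the steps above can be generated rather than hand-written. This is a minimal Python sketch under stated assumptions: `build_tree_files` is a hypothetical helper, the input shape (a list of account dicts with an optional `contacts` list) is invented for illustration, and the `AccountRefN` / `ContactRefN` naming simply mirrors the example above.

```python
import json


def build_tree_files(accounts):
    """Return the text of Account.json, Contact.json and plan.json.

    accounts -- list of dicts such as
                {"Name": "ACME Corp", "contacts": [{"LastName": "Doe"}]}
    """
    account_records = []
    contact_records = []
    for a_index, account in enumerate(accounts, start=1):
        account_ref = f"AccountRef{a_index}"
        # Everything except the nested contacts becomes an Account field.
        fields = {k: v for k, v in account.items() if k != "contacts"}
        account_records.append(
            {"attributes": {"type": "Account", "referenceId": account_ref}, **fields}
        )
        for contact in account.get("contacts", []):
            contact_ref = f"ContactRef{len(contact_records) + 1}"
            contact_records.append(
                {
                    "attributes": {"type": "Contact", "referenceId": contact_ref},
                    **contact,
                    # Child lookup points at the parent's referenceId.
                    "AccountId": f"@{account_ref}",
                }
            )
    # Parent step first, child step second, as the plan requires.
    plan = [
        {"sobject": "Account", "saveRefs": True, "resolveRefs": False,
         "files": ["Account.json"]},
        {"sobject": "Contact", "saveRefs": True, "resolveRefs": True,
         "files": ["Contact.json"]},
    ]
    return {
        "Account.json": json.dumps({"records": account_records}, indent=2),
        "Contact.json": json.dumps({"records": contact_records}, indent=2),
        "plan.json": json.dumps(plan, indent=2),
    }
```

Writing each returned string to a file of the same name in one directory gives a plan that the `sf data import tree --plan` command from step 4 can consume. Keeping generation in a function makes the seed set easy to review in version control.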
### CumulusCI Dataset Capture and Load
**When to use:** Partial or full sandboxes where anonymized production-like data is needed, or where Snowfakery synthetic data generation is required for realistic volumes.
**How it works:**
- Capture: `cci task run capture_dataset --org qa --dataset my_dataset` — extracts records, anonymizes PII fields, stores as dataset files.
- Load: `cci task run load_dataset --org staging --dataset my_dataset` — loads the captured dataset into target org.
- Snowfakery: define a YAML recipe and use `cci task run generate_and_load_from_yaml` for fully synthetic data at scale.
---
## Decision Guidance
| Situation | Recommended Approach | Reason |
|---|---|---|
| Apex unit or integration tests | `@testSetup` + Test Data Factory | Records are isolated per test class, rolled back automatically |
| Scratch org CI pipeline setup | `sf data import tree` with plan JSON | Declarative, version-controlled, supports parent-child refs |
| Record volume > 200MB or complex relationships | CumulusCI datasets or Bulk API | Exceeds `sf data import tree` limit; CumulusCI handles complex graphs |
| Partial/full sandbox with realistic data | CumulusCI capture_dataset + Snowfakery | Extracts and anonymizes real data or generates synthetic records |
| Quick scratch org with pre-seeded data (repeated use) | Scratch Org Snapshot | Eliminates repeated seeding cost; consumes snapshot allocation |
| Anonymized production data for staging | CumulusCI capture_dataset with field masking | Preserves data shape, removes PII per GDPR/CCPA requirements |
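For pipelines that pick a seeding step automatically, the table can be encoded as a small lookup. The sketch below is illustrative only: `recommend_seeding_layer`, its context labels, and its two flags are hypothetical names, and the order in which the rules are checked is one reasonable reading of the table rather than an official precedence.

```python
VALID_CONTEXTS = {
    "apex_test",
    "scratch_org",
    "developer_sandbox",
    "partial_sandbox",
    "full_sandbox",
}


def recommend_seeding_layer(context, exceeds_tree_limit=False, reuse_seeded_state=False):
    """Map a situation from the decision table to a recommended approach.

    context            -- one of VALID_CONTEXTS (labels invented for this sketch)
    exceeds_tree_limit -- True when the data is too large for sf data import tree
    reuse_seeded_state -- True when the same seeded scratch org is needed repeatedly
    """
    if context not in VALID_CONTEXTS:
        raise ValueError(f"unknown context: {context!r}")
    if context == "apex_test":
        return "@testSetup + Test Data Factory"
    if context in ("partial_sandbox", "full_sandbox"):
        return "CumulusCI capture_dataset + Snowfakery"
    if exceeds_tree_limit:
        return "CumulusCI datasets or Bulk API"
    if context == "scratch_org" and reuse_seeded_state:
        return "Scratch Org Snapshot"
    return "sf data import tree with plan JSON"
```

A CI job could call this once with the target org type and branch on the returned string. The rows about anonymized staging data are folded into the sandbox case here.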
---
## Recommended Workflow
Step-by-step instructions for an AI agent or practitioner working on this task:
1. Identify target org type (scratch org, developer sandbox, partial sandbox, full sandbox) to determine the appropriate seeding layer.
2. For unit tests: create a Test Data Factory class with static helper methods; use `@testSetup` in test classes; never mix with `@isTest(SeeAllData=true)`.
3. For scratch org CI: export source JSON files per object, define parent-child references with `referenceId` values and `@<referenceId>` lookups, create a plan.json, run `sf data import tree --plan`.
4. For sandboxes with volume: use CumulusCI `capture_dataset` for production-sourced anonymized data, or Snowfakery YAML recipes for fully synthetic data.
5. Validate the seeded data by running a SOQL spot-check query after each import to confirm record counts and relationship integrity.
6. For repeated CI use: evaluate Scratch Org Snapshots to pre-bake seeded state and reduce pipeline time.
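The spot-check in step 5 can be scripted so that a pipeline fails when counts drift. This is a minimal Python sketch, not an official tool: `count_records`, `parse_total_size`, and `spot_check` are hypothetical helpers, and the parsing assumes the `--json` output of `sf data query` exposes the row count at `result.totalSize`. Verify that shape against your CLI version before relying on it.

```python
import json
import subprocess


def parse_total_size(cli_json):
    """Extract the row count from the CLI's JSON output (assumed shape)."""
    payload = json.loads(cli_json)
    return payload["result"]["totalSize"]


def count_records(soql, target_org):
    """Run a SOQL query through the Salesforce CLI and return its row count."""
    completed = subprocess.run(
        ["sf", "data", "query", "--query", soql,
         "--target-org", target_org, "--json"],
        capture_output=True,
        text=True,
        check=True,  # raise if the CLI exits with an error
    )
    return parse_total_size(completed.stdout)


def spot_check(expectations, run_query):
    """Return {soql: (expected, actual)} for every query whose count is off.

    expectations -- dict mapping a SOQL string to the expected row count
    run_query    -- callable taking a SOQL string and returning a count
    """
    mismatches = {}
    for soql, expected in expectations.items():
        actual = run_query(soql)
        if actual != expected:
            mismatches[soql] = (expected, actual)
    return mismatches
```

In a pipeline, `run_query` would typically be `lambda q: count_records(q, "MyScratchOrg")`. Pairing a `SELECT COUNT() FROM Contact` expectation with a `SELECT COUNT() FROM Contact WHERE AccountId = null` expectation of 0 checks both record counts and relationship integrity. Passing the query runner in as a parameter keeps the comparison logic testable without an org.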
---
## Review Checklist
Run through these before marking work in this area complete:
- [ ] `@testSetup` methods exist in test classes that need shared data — no `SeeAllData=true` mixing
- [ ] `sf data import tree` plan has parent objects listed before child objects in the plan array
- [ ] All `@` references in child JSON files match `referenceId` values in parent JSON files
- [ ] Total plan JSON size is under 200MB
- [ ] PII-sensitive fields captured in the CumulusCI dataset have masking configured
- [ ] Scratch Org Snapshot consumption has been accounted for against Dev Hub limits
---
## Salesforce-Specific Gotchas
Non-obvious platform behaviors that cause real production problems:
1. **`@isTest(SeeAllData=true)` cannot be combined with `@testSetup`**: Test setup methods are supported only under the default data isolation mode, so a class that uses `SeeAllData=true` cannot rely on `@testSetup` data. Tests that read existing org data instead can pass in a populated org but fail in CI scratch orgs where that data is absent.
2. **`sf data import tree` order determines ID resolution**: Child records that reference a parent through `@<referenceId>` must appear in a later plan step than the parent. If a child step has `resolveRefs: true` but the parent step has not yet run, the import fails with a reference resolution error.
3. **Scratch Org Snapshots consume allocations even if the snapshot is inactive**: Snapshots count against the 25-snapshot Dev Hub limit from the moment of creation until explicitly deleted. Teams that create snapshots per branch in CI exhaust allocations quickly without a cleanup policy.
---
## Output Artifacts
| Artifact | Description |
|---|---|
| Test Data Factory class | Reusable Apex class with static helper methods for creating test records |
| sf data import tree plan | plan.json + source JSON files defining the record hierarchy for CLI import |
| CumulusCI dataset | Captured and anonymized dataset files loadable via CumulusCI `load_dataset` |
| Snowfakery recipe | YAML file defining synthetic record generation logic for large-volume seeding |
---
## Related Skills
- scratch-org-management: lifecycle management of scratch orgs that consume the seeded data
- continuous-integration-testing: CI pipeline integration that runs after data seeding completes
- sandbox-refresh-and-templates: sandbox refresh automation that may trigger data seeding post-copy
Related Skills
sandbox-data-masking
Use this skill when configuring or reviewing Salesforce Data Mask to protect PII/PHI in partial or full copy sandboxes after a refresh. Trigger keywords: data mask, sandbox masking, PII in sandbox, GDPR sandbox, HIPAA non-production, mask contacts, obfuscate fields non-production. NOT for sandbox refresh mechanics (use sandbox-refresh-and-templates), NOT for production data anonymization, NOT for Shield Platform Encryption at rest.
gdpr-data-privacy
Use this skill when implementing GDPR or CCPA data privacy controls in Salesforce: Individual sObject linkage, consent tracking, Right to Be Forgotten (RTBF) requests, data subject request handling, and Privacy Center configuration. Trigger keywords: GDPR, data privacy, consent management, right to erasure, Individual object, ContactPointConsent, ShouldForget, data subject request, Privacy Center, data portability. NOT for general data quality cleanup, duplicate management, field-level encryption (see platform-encryption skill), or sandbox data masking (see sandbox-data-masking skill).
data-classification-labels
Classify Salesforce fields by data sensitivity and compliance category using the four built-in classification attributes (SecurityClassification, ComplianceGroup, BusinessOwnerId, BusinessStatus). Covers Metadata API deployment, Tooling API querying, and Einstein Data Detect recommendations. NOT for data masking, Shield Platform Encryption, or runtime access control enforcement.
customer-data-request-workflow
Implement GDPR/CCPA data subject rights (access, deletion, rectification) using Salesforce Privacy Center and/or custom workflow. NOT for general backup or org-level data retention policy.
omnistudio-testing-patterns
Use when testing or validating OmniStudio components — OmniScript preview, Integration Procedure step debugging, DataRaptor field-mapping validation, and end-to-end UTAM-based automation. NOT for Apex unit testing or standard Flow debugging.
omnistudio-deployment-datapacks
Use when exporting, importing, or version-controlling OmniStudio components using DataPacks via the OmniStudio DataPacks tool or vlocity CLI. Covers DataPack export/import, Git version control integration, CI/CD for OmniStudio. NOT for SFDX-based metadata deployment of non-OmniStudio components.
omnistudio-asynchronous-data-operations
Use Integration Procedures queues, DataRaptor Chain, and Remote Actions with async patterns for long-running OmniStudio flows. NOT for simple DataRaptor reads.
dataraptor-transform-optimization
Use when DataRaptor Transform operations are slow, hit governor limits, or use Apex where formula fields would suffice. Covers formula vs Apex expressions, bulk transform sizing, and chained transform composition. Triggers: 'dataraptor transform slow', 'dataraptor formula vs apex', 'dataraptor bulk transform', 'dr governor limit'. NOT for DataRaptor Extract or Load performance.
dataraptor-patterns
Use when designing or reviewing OmniStudio DataRaptors, especially Extract versus Turbo Extract versus Transform versus Load, field mapping strategy, performance tradeoffs, and when to move work into Integration Procedures or Apex. Triggers: 'DataRaptor Extract', 'Turbo Extract', 'DataRaptor Load', 'DataRaptor Transform', 'OmniStudio data mapping'. NOT for overall OmniScript journey design or Integration Procedure sequencing when the main question is not the DataRaptor shape itself.
lwc-testing
Use when setting up or reviewing Lightning Web Component unit tests with Jest, including `@salesforce/sfdx-lwc-jest`, wire adapter mocks, imperative Apex mocks, async rerender handling, and accessibility smoke checks. Triggers: 'how do I test @wire in LWC', 'Jest test is flaky', 'mock Apex in LWC test', 'flushPromises pattern'. NOT for Apex unit tests, browser end-to-end automation, or performance testing.
lwc-jest-testing-with-accessibility
Use when authoring or reviewing Jest unit tests for Lightning Web Components and the test plan must include explicit accessibility assertions — covers `@salesforce/sfdx-lwc-jest` setup, `createElement` / `document.body.appendChild` render harness, wire-service mocks via `@salesforce/wire-service-jest-util`, imperative Apex mocks via `jest.fn()`, simulated user interactions (`click`, `keydown`, `focus`), ARIA attribute and accessible-name assertions, focus-management tests, keyboard-navigation tests, and optional `axe-core` integration via `jest-axe`. Triggers: 'add a11y assertions to my LWC jest tests', 'how do I test focus management in LWC', 'jest test for keyboard navigation', 'integrate axe-core into sfdx-lwc-jest', 'assert ARIA attributes after interaction', 'how do I prove the LWC is accessible in CI'. NOT for general LWC jest setup without an a11y angle (use lwc/lwc-testing — this skill is the accessibility-deep-dive sibling), NOT for accessibility-pattern authoring inside the component itself (use lwc/lwc-accessibility-patterns), NOT for end-to-end UI automation via UTAM, NOT for manual screen-reader QA workflows.
lwc-datatable-advanced
Advanced lightning-datatable patterns — inline edit + draftValues, custom cell types via extending LightningDatatable, sortable columns, infinite scroll with onloadmore, row-level errors, and the cost of large data sets. NOT for read-only display of small lists (plain lightning-datatable suffices) or fully custom grids (use a third-party library).