sandbox-refresh-data-strategies

Use this skill when designing data management strategies for sandbox refreshes: deciding which reference data must be re-seeded after every refresh, using Salesforce's native Data Seeding feature, writing SandboxPostCopy implementations that hand off to Queueable jobs for large data loads, and cleaning up stale data between sprints. Trigger keywords: seed data after sandbox refresh, SandboxPostCopy data strategy, native data seeding Salesforce, reference data re-seed after refresh, post-refresh data cleanup. NOT for sandbox administration setup, sandbox type selection (use sandbox-strategy), sandbox refresh mechanics and PII masking (use sandbox-refresh-and-templates), or scratch org data seeding for CI (use data-seeding-for-testing).

by PranavNagrecha

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/sandbox-refresh-data-strategies/SKILL.md --create-dirs "https://raw.githubusercontent.com/PranavNagrecha/AwesomeSalesforceSkills/main/skills/data/sandbox-refresh-data-strategies/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/sandbox-refresh-data-strategies/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

SKILL.md Source

# Sandbox Refresh Data Strategies

Use this skill when designing the data management layer for sandbox refreshes — specifically the strategy for re-seeding reference data after a refresh, using Salesforce's native Data Seeding feature, and implementing SandboxPostCopy for automated post-refresh data population. Distinct from sandbox-refresh-and-templates (which covers refresh mechanics and PII masking) and data-seeding-for-testing (which covers scratch org and CI seeding).

---

## Before Starting

Gather this context before working on anything in this domain:

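- What sandbox type is being refreshed: Developer, Developer Pro, Partial Copy, or Full? Full sandboxes copy all production data, Partial Copy sandboxes copy a sample of production data defined by a sandbox template, and Developer and Developer Pro sandboxes copy metadata only.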
- What reference data (picklist-driving custom objects, configuration tables, territory hierarchies, product catalogs) must exist for the application to function after refresh?
- How large is the seed dataset per object? SandboxPostCopy cannot perform large DML directly — large seeding must be delegated to a Queueable.
- Are automations (Flows, Apex triggers, Process Builder) enabled on the seeded objects? They must be disabled during large seeding to avoid unwanted side effects and limit consumption.

---

## Core Concepts

### SandboxPostCopy Interface and Its DML Limitations

The `SandboxPostCopy` Apex interface provides a `runApexClass(SandboxContext context)` method that runs **immediately after sandbox creation or refresh**, as the **Automated Process user**. This user has DML capability but operates under specific constraints:

- SandboxPostCopy runs **synchronously** in the context of the refresh job. Large DML inside the method will time out.
- The Automated Process user has **no profile-based field access restrictions** but is subject to org-wide sharing and trigger/automation execution.
- **Recommended pattern for large data seeding**: The SandboxPostCopy method should only **enqueue a Queueable** (using `System.enqueueJob()`) that performs the actual DML. The Queueable runs asynchronously after the refresh completes.

The canonical anti-pattern is attempting large DML directly inside `runApexClass` — this times out for datasets exceeding a few hundred records per object.

### Native Salesforce Data Seeding Feature

Salesforce provides a native **Data Seeding** feature (accessible in Setup > Data Seeding) that supports three template types:
1. **Nodes**: Seed specific records from selected objects individually
2. **Levels**: Seed records up to a defined relationship depth (parent → child hierarchies)
3. **Generate**: Create synthetic test data matching field-level data types

Key capabilities:
- **Automation-disable during seed**: The feature automatically disables triggers, flows, and validation rules during seeding to prevent unwanted side effects
- **Incremental seeding**: Supports adding records to existing seed datasets without full replacement
- **Record type and picklist validation**: Validates that seeded records match valid record types and picklist values in the target org

**Native Data Seeding does NOT support these object and field types:**
- Big Objects
- Calculated fields
- Chatter-related objects (FeedItem, FeedComment)
- Files and ContentDocument
- External objects
- AgentWork

For unsupported object types, use SFDMU, Apex scripts, or Data Loader instead.
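
The exclusion list above can be turned into a quick pre-check before building a template. The following is a minimal sketch, not part of any Salesforce API: the class name `SeedEligibility` and its methods are hypothetical, and treating `ContentVersion` as the stand-in for "Files" is an assumption. It relies only on the standard `__b` (Big Object) and `__x` (external object) API name suffixes and on the object names listed above. Calculated fields are a field-level concern and are not covered here.

```apex
public class SeedEligibility {
    // Standard objects named in the unsupported list above, lowercased for comparison.
    // ContentVersion is an assumption, included to represent "Files".
    private static final Set<String> UNSUPPORTED_STANDARD = new Set<String>{
        'feeditem', 'feedcomment', 'contentdocument', 'contentversion', 'agentwork'
    };

    // Returns true when the object is not on the unsupported list.
    public static Boolean isEligible(String objectApiName) {
        String name = objectApiName.toLowerCase();
        // Big Objects end in __b; external objects end in __x.
        if (name.endsWith('__b') || name.endsWith('__x')) {
            return false;
        }
        return !UNSUPPORTED_STANDARD.contains(name);
    }

    // Keeps only the objects that pass the check; the rest need
    // SFDMU, Apex scripts, or Data Loader.
    public static List<String> eligibleOnly(List<String> objectApiNames) {
        List<String> result = new List<String>();
        for (String apiName : objectApiNames) {
            if (isEligible(apiName)) {
                result.add(apiName);
            }
        }
        return result;
    }
}
```

A check like this only mirrors the list in this document; confirm against current Salesforce documentation before relying on it.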

### Reference Data Classification

Not all data must be re-seeded after every refresh. Reference data falls into categories:

1. **Must seed every refresh**: Configuration tables that control application behavior (Custom Settings if not deployed via script, territory hierarchies, product category trees, approval chain configuration objects)
2. **Seed on initial setup only**: Master data that rarely changes (country/region codes, currency codes)
3. **Never seed**: Transactional data (Cases, Opportunities, Orders) — these are test artifacts that should be generated by test scripts, not pre-seeded
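
One way to keep this classification next to the seeding code is a small lookup class. This is an illustrative sketch only: `SeedClassification`, the `SeedFrequency` enum, and the custom object names `Product_Category__c` and `Country_Code__c` are hypothetical placeholders, not names from any real org.

```apex
public class SeedClassification {
    // Mirrors the three categories described above.
    public enum SeedFrequency { EVERY_REFRESH, INITIAL_SETUP_ONLY, NEVER }

    // Hypothetical plan; replace the object names with those from your org.
    private static final Map<String, SeedFrequency> PLAN = new Map<String, SeedFrequency>{
        'Product_Category__c' => SeedFrequency.EVERY_REFRESH,
        'Country_Code__c' => SeedFrequency.INITIAL_SETUP_ONLY,
        'Case' => SeedFrequency.NEVER,
        'Opportunity' => SeedFrequency.NEVER
    };

    // Returns the API names of all objects assigned to the given category.
    public static List<String> objectsFor(SeedFrequency frequency) {
        List<String> names = new List<String>();
        for (String apiName : PLAN.keySet()) {
            if (PLAN.get(apiName) == frequency) {
                names.add(apiName);
            }
        }
        return names;
    }
}
```

A post-refresh job could then call `SeedClassification.objectsFor(SeedClassification.SeedFrequency.EVERY_REFRESH)` to decide which objects to process.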

---

## Common Patterns

### Pattern: SandboxPostCopy with Queueable Delegation

**When to use:** Any time more than ~100 records per object need to be seeded post-refresh.

**How it works:**
```apex
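global class PostRefreshSeedHandler implements SandboxPostCopy {
    // Called by the platform after a sandbox is created or refreshed.
    global void runApexClass(SandboxContext context) {
        // Only enqueue here; never do large DML in this method.
        // The guard skips the enqueue when running inside an Apex test.
        if (!Test.isRunningTest()) {
            System.enqueueJob(new SeedReferenceDataQueueable());
        }
    }
}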
```

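The `SeedReferenceDataQueueable` class chains additional Queueables if needed. Each `execute` call can enqueue only one child job, and Salesforce enforces no chain-depth limit except in Developer Edition and trial orgs, where the maximum depth is 5. The limit of 50 applies to `System.enqueueJob()` calls within a single synchronous transaction, not to chain length. Each Queueable processes one object or a manageable batch.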

**Why not DML in SandboxPostCopy directly:** Large DML in the SandboxPostCopy method causes the refresh completion step to time out, resulting in a failed or incomplete seeding with no clear error.
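
The `SeedReferenceDataQueueable` referenced above is not shown in the source, so the following is a minimal sketch under stated assumptions. The object names, the hard-coded seed values, and the use of the `Name` field are hypothetical; a real implementation would more likely read seed data from a static resource or custom metadata. Each execution seeds one object and then chains to the next.

```apex
public class SeedReferenceDataQueueable implements Queueable {
    // Hypothetical processing order: parents before children.
    private static final List<String> STEPS = new List<String>{
        'Product_Category__c', 'Region__c'
    };
    // Hypothetical seed values, keyed by object API name.
    private static final Map<String, List<String>> SEED_NAMES =
        new Map<String, List<String>>{
            'Product_Category__c' => new List<String>{ 'Hardware', 'Software' },
            'Region__c' => new List<String>{ 'EMEA', 'APAC' }
        };

    private Integer stepIndex;

    public SeedReferenceDataQueueable() {
        this(0);
    }

    public SeedReferenceDataQueueable(Integer stepIndex) {
        this.stepIndex = stepIndex;
    }

    public void execute(QueueableContext context) {
        if (stepIndex >= STEPS.size()) {
            return;
        }
        List<SObject> records = buildRecords(STEPS[stepIndex]);
        if (!records.isEmpty()) {
            // allOrNone = false lets valid rows through when some fail.
            Database.insert(records, false);
        }
        Integer next = stepIndex + 1;
        // Chaining is not allowed inside Apex tests, hence the guard.
        if (next < STEPS.size() && !Test.isRunningTest()) {
            System.enqueueJob(new SeedReferenceDataQueueable(next));
        }
    }

    // Builds records dynamically so the class compiles even when the
    // hypothetical objects are absent from the org.
    private List<SObject> buildRecords(String objectApiName) {
        List<SObject> records = new List<SObject>();
        Schema.SObjectType sType = Schema.getGlobalDescribe().get(objectApiName);
        if (sType == null) {
            return records;
        }
        for (String seedName : SEED_NAMES.get(objectApiName)) {
            SObject record = sType.newSObject();
            record.put('Name', seedName);
            records.add(record);
        }
        return records;
    }
}
```

Processing one object per job keeps each transaction well inside governor limits and makes a failure easy to attribute to a single object. The sketch ignores the `Database.SaveResult` values for brevity; production code should log failures.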

### Pattern: Native Data Seeding Template for Reference Objects

**When to use:** Reference object data is stable enough to be defined once and re-used across all sandbox refreshes without code.

**How it works:**
1. In production or a reference sandbox, navigate to Setup > Data Seeding.
2. Create a seeding template selecting the reference objects and record criteria.
3. Choose template type: Nodes (specific records) or Levels (with children).
4. Assign the template to sandbox environments that need this data after refresh.
5. Configure the seeding to run automatically on sandbox creation/refresh.

**Why use Data Seeding over scripts for stable reference data:** Data Seeding handles automation-disable automatically, supports incremental updates, and requires no Apex deployment. For stable reference data, it is lower maintenance than custom scripts.

---

## Decision Guidance

| Situation | Recommended Approach | Reason |
|---|---|---|
| Up to ~100 records, simple object | SandboxPostCopy with direct DML | Small datasets are safe within the method timeout |
| More than ~100 records or complex parent-child hierarchy | SandboxPostCopy → Queueable delegation | Direct DML in SandboxPostCopy times out for large datasets |
| Stable reference data, no Apex team needed | Native Data Seeding templates | No code; handles automation-disable; incremental updates |
| ContentDocument or Big Object seeding needed | SFDMU or Data Loader post-refresh step | Not supported by native Data Seeding |
| Stale transactional data cleanup between sprints | Separate cleanup Queueable or Data Loader delete job | Transactional data should not be pre-seeded |
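
For the cleanup row in the table above, a separate Queueable might look like the sketch below. The class name `StaleDataCleanupQueueable`, the choice of `Case` as the target object, the age-based criterion, and the batch size of 2,000 are all assumptions for illustration. The sandbox check is an added safety measure so the job does nothing if it is ever deployed to production.

```apex
public class StaleDataCleanupQueueable implements Queueable {
    // Hypothetical batch size, kept well below the per-transaction DML row limit.
    private static final Integer BATCH_SIZE = 2000;

    private Integer retentionDays;

    public StaleDataCleanupQueueable(Integer retentionDays) {
        this.retentionDays = retentionDays;
    }

    public void execute(QueueableContext context) {
        // Safety guard: only delete data when running in a sandbox.
        Boolean isSandbox = [SELECT IsSandbox FROM Organization LIMIT 1].IsSandbox;
        if (!isSandbox) {
            return;
        }

        Datetime cutoff = Datetime.now().addDays(-retentionDays);
        Integer batchSize = BATCH_SIZE;
        List<Case> stale = [
            SELECT Id
            FROM Case
            WHERE CreatedDate < :cutoff
            LIMIT :batchSize
        ];
        if (stale.isEmpty()) {
            return;
        }

        // Count successes so a batch of undeletable rows cannot loop forever.
        Integer deleted = 0;
        for (Database.DeleteResult result : Database.delete(stale, false)) {
            if (result.isSuccess()) {
                deleted++;
            }
        }

        // A full batch suggests more rows remain; chain another job.
        // Chaining is not allowed inside Apex tests, hence the guard.
        if (deleted > 0 && stale.size() == batchSize && !Test.isRunningTest()) {
            System.enqueueJob(new StaleDataCleanupQueueable(retentionDays));
        }
    }
}
```

It could be started from Anonymous Apex at a sprint boundary with `System.enqueueJob(new StaleDataCleanupQueueable(14));`, where 14 days is an arbitrary example value.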

---

## Recommended Workflow

Step-by-step instructions for an AI agent or practitioner working on this task:

1. **Classify reference data** — Identify which objects must be seeded after every refresh (application configuration) vs. objects that should never be pre-seeded (transactional test data).
2. **Assess data volume per object** — Count the records that must be seeded. Objects over ~100 records require Queueable delegation; very large sets may need native Data Seeding or SFDMU.
3. **Check native Data Seeding eligibility** — Confirm the objects are supported by native Data Seeding (not Big Objects, Files, Chatter, external objects).
4. **Design the SandboxPostCopy class** — Write a minimal `runApexClass` that only enqueues the seeding Queueable. All actual DML lives in the Queueable(s).
5. **Build Queueable chain** — Implement `SeedReferenceDataQueueable` that processes one object per execution, chains to the next Queueable for the next object.
6. **Configure automation disable** — For objects with active triggers or flows, add logic to disable automations before seeding (or use native Data Seeding which handles this automatically).
7. **Test the full refresh cycle** — Refresh a Developer sandbox, confirm seeding runs automatically, verify all reference data is present and valid.
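
Before running a real refresh for step 7, the post-copy class can be exercised in an Apex test with `Test.testSandboxPostCopyScript`. The sketch below assumes the `PostRefreshSeedHandler` class from this document is deployed; the test class name and the sandbox name `seedtest` are hypothetical. Because that handler skips its enqueue under `Test.isRunningTest()`, the test can only confirm that the method runs cleanly and queues nothing.

```apex
@IsTest
private class PostRefreshSeedHandlerTest {
    @IsTest
    static void postCopyScriptRunsCleanly() {
        Test.startTest();
        // The org Id doubles as the sandbox Id purely for test purposes.
        Test.testSandboxPostCopyScript(
            new PostRefreshSeedHandler(),
            UserInfo.getOrganizationId(),
            UserInfo.getOrganizationId(),
            'seedtest'
        );
        Test.stopTest();

        // The handler guards its enqueue in test context, so no job is expected.
        Integer queued = [
            SELECT COUNT()
            FROM AsyncApexJob
            WHERE JobType = 'Queueable'
            AND ApexClass.Name = 'SeedReferenceDataQueueable'
        ];
        System.assertEquals(0, queued, 'No seeding job should be queued in test context');
    }
}
```

This does not replace an actual refresh: it runs as the test user unless the overload that takes a run-as-Automated-Process flag is used, and it never exercises the refresh job's timing.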

---

## Review Checklist

Run through these before marking work in this area complete:

- [ ] SandboxPostCopy only enqueues — no large DML directly in runApexClass
- [ ] Queueable chain handles all large object seeding
- [ ] Native Data Seeding configured for stable reference data if applicable
- [ ] Automation disable/enable logic present for seeded objects
- [ ] Big Objects, Files, ContentDocument not attempted via native Data Seeding
- [ ] Stale transactional data cleanup process documented for sprint transitions
- [ ] Full refresh test completed in Developer sandbox

---

## Salesforce-Specific Gotchas

1. **SandboxPostCopy large DML times out** — Running DML for hundreds or thousands of records directly inside `runApexClass` causes the post-copy step to time out, leaving the sandbox seeded incompletely with no clear failure message. Always delegate large seeding to a Queueable.

2. **Native Data Seeding does not support Big Objects, Files, or Chatter** — These object types silently fail in Data Seeding templates. Use SFDMU or Apex scripts for these types.

3. **SandboxPostCopy runs as Automated Process user** — This user is not a named user and does not appear in audit logs as a standard user. Org-wide sharing rules apply, but profile-based field-level security does not. Be aware that seeded records may have visibility behavior that differs from named user-created records.

4. **Automations fire on seeded records by default** — Without explicit automation disable logic, triggers and flows execute on seeded records. This can cause unwanted data changes, platform event publishing, or email sends. Use native Data Seeding's automation-disable feature or write explicit disable/enable logic.
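
For gotcha 4, the explicit disable/enable logic for Apex triggers can be as small as a static flag that triggers consult before doing any work. This is a minimal sketch: `SeedAutomationBypass` and its methods are hypothetical names, and the pattern assumes every relevant trigger is written to check the flag. A static variable lives only for the current transaction and has no effect on Flows or validation rules, which need a separate mechanism such as a custom setting or custom permission referenced in their entry criteria.

```apex
public class SeedAutomationBypass {
    // Transaction-scoped flag; resets automatically when the transaction ends.
    private static Boolean active = false;

    // Triggers call this first, for example:
    //   if (SeedAutomationBypass.isActive()) { return; }
    public static Boolean isActive() {
        return active;
    }

    // Inserts records with the bypass flag raised, then always lowers it,
    // even if the insert throws.
    public static Database.SaveResult[] insertWithBypass(List<SObject> records) {
        active = true;
        try {
            return Database.insert(records, false);
        } finally {
            active = false;
        }
    }
}
```

A seeding Queueable would call `SeedAutomationBypass.insertWithBypass(records)` in place of a plain `insert`, so the bypass window is limited to the seeding DML itself.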

---

## Output Artifacts

| Artifact | Description |
|---|---|
| Reference data seeding strategy | Classification of objects by seed frequency with rationale |
| SandboxPostCopy class | Minimal class that enqueues the seeding Queueable |
| Seeding Queueable | Queueable chain handling all large-volume reference data seeding |
| Native Data Seeding template plan | Object selection, template type, and automation settings |

---

## Related Skills

- `devops/sandbox-refresh-and-templates` — Use for sandbox refresh mechanics, refresh intervals, and PII masking — the operational layer above data seeding strategy
- `data/deployment-data-dependencies` — Use for managing org-specific ID remapping in data records being deployed
- `data/data-seeding-for-testing` — Use for scratch org and CI pipeline test data seeding
