service-data-archival

Use this skill when Service Cloud orgs are consuming excessive data or file storage due to Case-related records, or when compliance requirements demand structured retention and deletion of case history. Trigger keywords: EmailMessage bloat, Email-to-Case storage, ContentDocument archival, case attachment cleanup, compliance retention policy, service storage optimization. NOT for generic data archival across non-Service objects — use data-archival-strategies instead. NOT for CPQ, Sales Cloud, or FSL record archival. NOT for purging custom object data unrelated to Cases.


by PranavNagrecha

View on GitHub

Best use case

service-data-archival is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using service-data-archival should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

curl -o ~/.claude/skills/service-data-archival/SKILL.md --create-dirs "https://raw.githubusercontent.com/PranavNagrecha/AwesomeSalesforceSkills/main/skills/data/service-data-archival/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/service-data-archival/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How service-data-archival Compares

Feature / Agent	service-data-archival	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

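What does this skill do?

It guides the archival and deletion of Case-related records in Service Cloud orgs (the EmailMessage, ContentDocument, and ContentDocumentLink records that accumulate from Email-to-Case) in the correct dependency order, so that data and file storage are reclaimed while retention and legal-hold requirements are respected.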

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Service Data Archival

This skill activates when a Service Cloud org needs to reduce storage consumption or meet data retention obligations by archiving or deleting Case-related records — specifically EmailMessage, ContentDocument, and ContentDocumentLink records that accumulate from Email-to-Case workflows. It guides practitioners through the three dependent deletion sequences required to avoid orphaned records, storage leaks, and compliance violations.

---

## Before Starting

Gather this context before working on anything in this domain:

- Confirm current data storage GB used and file storage GB used separately from Setup > Storage Usage. EmailMessage records count against **data storage**; ContentDocument/ContentVersion records count against **file storage**. These are two separate storage pools with separate limits, and they require separate archival pipelines.
- Confirm whether any Cases or related records are under legal hold. Legal-hold records **cannot be bulk-deleted even via the API** — the platform enforces a hard block. Attempting to delete them in bulk without a hold exemption list will cause partial job failures that are difficult to recover from.
- Understand the cascade behavior gap: deleting a Case does **not** automatically delete its EmailMessage children or ContentDocumentLink junction records. Storage does not reclaim until all three object layers are explicitly deleted in the correct dependency order.
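Because the two pools are measured separately, it helps to record the baseline as two numbers and compare each pool on its own after the job. The sketch below is a minimal illustration: the `StorageSnapshot` class and `reclaimed` helper are hypothetical names, and the figures are made-up values you would copy by hand from the Storage Usage page.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class StorageSnapshot:
    """Figures copied from the Storage Usage page, in MB."""

    data_mb: float  # record-based pool (Case, EmailMessage, ...)
    file_mb: float  # file pool (ContentVersion content)


def reclaimed(before: StorageSnapshot, after: StorageSnapshot) -> Dict[str, float]:
    """MB reclaimed per pool; a negative value means the pool grew."""
    return {
        "data_mb": round(before.data_mb - after.data_mb, 2),
        "file_mb": round(before.file_mb - after.file_mb, 2),
    }


# Hypothetical figures for an EmailMessage-only purge.
baseline = StorageSnapshot(data_mb=9800.0, file_mb=41200.0)
post_purge = StorageSnapshot(data_mb=7350.5, file_mb=41200.0)
print(reclaimed(baseline, post_purge))  # {'data_mb': 2449.5, 'file_mb': 0.0}
```

In this made-up example the purge moves only the data pool, which is exactly the split the bullets above describe: file storage stays flat until ContentDocument records are deleted.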

---

## Core Concepts

### EmailMessage as the Primary Data Storage Driver

In Service Cloud orgs that use Email-to-Case, `EmailMessage` records are typically the largest data storage consumer — often larger than the Case records themselves. Each inbound and outbound email thread creates one or more `EmailMessage` records stored as sObjects against data storage quota. Unlike file attachments, email bodies are stored as long text fields on the `EmailMessage` object, not as ContentDocument blobs. This means they consume data storage (the record-based quota), not file storage. Practitioners who look only at file storage GB when diagnosing storage overruns will miss the EmailMessage problem entirely.

`EmailMessage` records point to their Case through the `ParentId` lookup. When a Case is deleted, the related `EmailMessage` records are **not automatically deleted** in all configurations; behavior depends on whether the delete operation cascades through child relationships. Do not assume cascade deletion will reclaim EmailMessage storage. Explicitly query and delete EmailMessage records by `ParentId` before or after Case deletion, and confirm row counts before and after.
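A minimal sketch of the explicit query step, assuming you already hold a list of in-scope Case IDs. The helper names and the chunk size of 500 are illustrative assumptions; chunking keeps each statement short, since SOQL statements have a maximum length. Running the `count=True` variant before and after deletion gives the row counts to compare.

```python
from typing import Iterable, Iterator, List

CHUNK_SIZE = 500  # illustrative; keeps each IN list short


def chunked(ids: List[str], size: int = CHUNK_SIZE) -> Iterator[List[str]]:
    """Yield successive slices of at most `size` IDs."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def email_message_queries(case_ids: Iterable[str], count: bool = False) -> List[str]:
    """Build one SOQL statement per chunk of Case IDs.

    EmailMessage references its Case through ParentId. With count=True the
    statements return row counts instead of records.
    """
    unique_ids = sorted(set(case_ids))  # dedupe; sorting makes output stable
    for case_id in unique_ids:
        if not case_id.isalnum():  # Salesforce IDs are alphanumeric
            raise ValueError(f"unexpected Case Id: {case_id!r}")
    select = "COUNT()" if count else "Id, ParentId"
    queries = []
    for chunk in chunked(unique_ids):
        id_list = ", ".join(f"'{case_id}'" for case_id in chunk)
        queries.append(
            f"SELECT {select} FROM EmailMessage WHERE ParentId IN ({id_list})"
        )
    return queries


print(email_message_queries(["500xx000000AbcD", "500xx000000AbcD"]))
# ["SELECT Id, ParentId FROM EmailMessage WHERE ParentId IN ('500xx000000AbcD')"]
```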

### ContentDocument and ContentVersion — The Separate File Pipeline

File attachments on Cases (uploaded via the Chatter Files component or email attachments processed by Email-to-Case) are stored as `ContentDocument` (the file container) and `ContentVersion` (the versioned blob). The linkage between a Case and its files is managed by `ContentDocumentLink`, a junction object that associates a `LinkedEntityId` (the Case Id) with a `ContentDocumentId`.

Critically, a `ContentDocument` can be linked to **multiple entities** simultaneously. Deleting a `ContentDocumentLink` removes the association but does **not** delete the underlying `ContentDocument` or reclaim file storage. File storage is reclaimed only when the `ContentDocument` record itself is deleted, which also removes all of its `ContentVersion` records and any links that still point to it. Any archival process that only deletes Cases and their `ContentDocumentLink` records will leave `ContentDocument` records as orphans: unlinked files that still consume file storage indefinitely.

Before deleting any `ContentDocument`, confirm it has no other `ContentDocumentLink` associations (e.g., linked to Accounts, Contacts, or Knowledge articles) to avoid destroying shared files.
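One way to express that check, as a sketch: collect every `ContentDocumentLink` for the candidate documents (a query filtered by `LinkedEntityId` returns only the links to the Cases themselves, so a second query by `ContentDocumentId` is needed to see the other links), then keep a document only when all of its links point at in-scope Cases. The function name is hypothetical. The `ignored_prefixes` parameter is an assumption-driven hook: if your link results include a link to the file owner's User record (key prefix `005`), decide in a sandbox whether that link should count as "shared".

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def deletable_documents(
    links: Iterable[Tuple[str, str]],
    target_case_ids: Set[str],
    ignored_prefixes: Tuple[str, ...] = (),
) -> Set[str]:
    """Return ContentDocument IDs whose every counted link is to an in-scope Case.

    `links` must contain ALL (ContentDocumentId, LinkedEntityId) rows for each
    candidate document. Any other entity (Account, Contact, Knowledge article,
    a Case outside the batch) marks the document as shared, so it is kept.
    """
    entities_by_doc: Dict[str, Set[str]] = defaultdict(set)
    for document_id, entity_id in links:
        if entity_id.startswith(ignored_prefixes):  # an empty tuple never matches
            continue
        entities_by_doc[document_id].add(entity_id)
    return {
        document_id
        for document_id, entities in entities_by_doc.items()
        if entities <= target_case_ids
    }
```

With made-up IDs, a document linked to an in-scope Case and an Account is kept, while one linked only to in-scope Cases is returned for deletion.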

### The Three Deletion Sequences and Dependency Order

Compliance-safe archival of Case data requires an up-front export followed by three deletion sequences (files, EmailMessage, Case), run in strict dependency order:

1. **Export and archive**: Before any deletion, export Case data, EmailMessage records, and ContentDocument/ContentVersion blobs to an external store (Big Objects, external database, or file export). Deleted records can be restored only while they remain in the Recycle Bin, so this export is the long-term recovery path. Do not skip this for records that fall within retention windows.
2. **ContentDocumentLink deletion**: Remove junction records linking Cases to ContentDocuments. This unlinks files from Cases without yet deleting the files.
3. **ContentDocument/ContentVersion deletion**: Delete ContentDocument records that are now fully unlinked (no remaining ContentDocumentLink references). This reclaims file storage.
4. **EmailMessage deletion**: Delete EmailMessage records by their parent Case (`ParentId`). This reclaims data storage.
5. **Case deletion**: Delete the Case records. Note: in some configurations, Case deletion may cascade-delete EmailMessage children. If it does, the EmailMessage step above can be skipped, but verify this in a sandbox first. Never assume cascade behavior.

Legal-hold exemptions must be filtered out at each sequence step via an exclusion list of record IDs or a custom `Legal_Hold__c` field checked in SOQL WHERE clauses.
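The ordering and the legal-hold rule can be captured together in a small query planner. This sketch assumes legal-hold status is tracked as a set of Case IDs (the exclusion-list option above) and uses illustrative function names. Held Cases are removed before any `IN` list is built, so they never appear in a WHERE clause. ContentDocument IDs come from the first query's results after the shared-file check, so that step has no query of its own here; very large ID lists would still need chunking.

```python
from typing import List, Set, Tuple

# Dependency order from the sequence above.
DELETION_ORDER = ("ContentDocumentLink", "ContentDocument", "EmailMessage", "Case")


def step_queries(case_ids: Set[str], held_case_ids: Set[str]) -> List[Tuple[str, str]]:
    """Return (sObject, SOQL) pairs in dependency order, legal holds excluded."""
    in_scope = set(case_ids) - set(held_case_ids)  # holds never reach a WHERE clause
    if not in_scope:
        return []
    ids = ", ".join(f"'{case_id}'" for case_id in sorted(in_scope))
    queries = {
        "ContentDocumentLink": (
            "SELECT Id, ContentDocumentId FROM ContentDocumentLink "
            f"WHERE LinkedEntityId IN ({ids})"
        ),
        # ContentDocument IDs are derived from the link results, not queried here.
        "EmailMessage": f"SELECT Id FROM EmailMessage WHERE ParentId IN ({ids})",
        "Case": f"SELECT Id FROM Case WHERE Id IN ({ids})",
    }
    return [(name, queries[name]) for name in DELETION_ORDER if name in queries]
```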

### Salesforce Archive Feature

Salesforce offers a **Legacy Salesforce Archive** feature (available in some editions) that can move records to archive storage with reduced cost. This feature does not replace the deletion sequences above — archived records still occupy storage until they are purged from the archive tier. Retention policies can be configured through the Archive feature to automate record movement, but the feature requires explicit enablement and configuration; it is not on by default. Evaluate the Archive feature as a complement to bulk deletion, not a substitute.

---

## Common Patterns

### Pattern: Phased Compliance Archival with Export-Before-Delete

**When to use:** Org has a formal data retention policy (e.g., Cases closed more than 7 years ago must be deleted; Cases closed within 7 years must be retained). Legal team requires an audit trail of what was deleted and when.

**How it works:**
1. Query Cases meeting the retention threshold: `SELECT Id FROM Case WHERE IsClosed = true AND ClosedDate < LAST_N_YEARS:7 AND Legal_Hold__c = false`.
2. For each Case batch, query and export EmailMessage records, ContentDocumentLink records, and ContentDocument/ContentVersion records to an external store.
3. Delete ContentDocumentLink records for the target Cases (only where ContentDocument has no other links).
4. Delete now-orphaned ContentDocument records.
5. Delete EmailMessage records by parent Case (`ParentId`).
6. Delete Case records.
7. Run a post-deletion storage report (Setup > Storage Usage) and compare to the pre-deletion baseline.

**Why not the alternative:** Deleting Cases directly without the preparatory steps leaves ContentDocument orphans consuming file storage silently, and may leave EmailMessage records behind depending on cascade behavior. There is no long-term undo: deleted records stay in the Recycle Bin for only 15 days, and a Bulk API hard delete bypasses the Recycle Bin entirely.
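The `LAST_N_YEARS:7` literal in step 1 is assumed here to be calendar-aligned rather than rolling: under the usual SOQL date-literal semantics (verify against the date-literal reference), it covers the seven full calendar years before the current one, so `ClosedDate < LAST_N_YEARS:7` selects Cases closed before January 1 of the current year minus seven. This sketch, with hypothetical helper names, computes that boundary explicitly and writes it into the query as a datetime literal, which also records in the audit trail exactly which cutoff was used.

```python
from datetime import date


def calendar_cutoff(run_date: date, years: int) -> date:
    """Start of the window assumed to be covered by LAST_N_YEARS:years.

    Assumption: LAST_N_YEARS:n spans the n full calendar years before the
    run date's year, so `ClosedDate < LAST_N_YEARS:n` means "closed before
    January 1 of (run year - n)".
    """
    return date(run_date.year - years, 1, 1)


def retention_case_query(cutoff: date) -> str:
    """Case selection from step 1 with the cutoff written as a UTC datetime literal."""
    return (
        "SELECT Id FROM Case WHERE IsClosed = true "
        f"AND ClosedDate < {cutoff.isoformat()}T00:00:00Z "
        "AND Legal_Hold__c = false"  # custom field, as in step 1
    )


cutoff = calendar_cutoff(date(2025, 6, 30), 7)
print(cutoff)  # 2018-01-01
print(retention_case_query(cutoff))
```

Under this assumption, a run on 30 June 2025 does not yet select a Case closed in March 2018, even though it is already more than seven years old. If the policy is a rolling seven years from the run date, compute the cutoff from the date itself instead.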

### Pattern: EmailMessage-Only Purge for Storage Relief

**When to use:** Data storage is near the limit due to Email-to-Case volume but there is no compliance requirement to delete Cases. The org wants to reduce storage cost without touching Case records.

**How it works:**
1. Query EmailMessage records older than the retention threshold: `SELECT Id, ParentId FROM EmailMessage WHERE CreatedDate < LAST_N_YEARS:2 AND Incoming = true`.
2. Filter out records associated with open Cases or Cases under legal hold.
3. Export the EmailMessage bodies to an external store or Big Object if archival is required.
4. Delete EmailMessage records via a Bulk API 2.0 delete job. There is no need to pre-split the IDs, because Bulk API 2.0 divides each job into internal batches of up to 10,000 records.

**Why not the alternative:** Deleting Cases to eliminate EmailMessage storage is disproportionate — it removes the case history, service metrics, and SLA records. An EmailMessage-only purge reduces storage without losing the Case audit trail.
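Steps 2 and 4 can be sketched in a few lines, assuming the step-1 results are available as `(Id, ParentId)` pairs and that the open and legal-hold Case IDs have been queried separately; the helper names are hypothetical. A Bulk API 2.0 delete job takes CSV data with a single `Id` column, and the sketch renders that body with LF line endings to match a job created with `"lineEnding": "LF"`. Creating, uploading, and closing the job is left out. For legal holds, the gotchas below recommend filtering in the query itself, so treat this in-memory filter as a second safety net rather than the only one.

```python
import csv
import io
from typing import Iterable, List, Optional, Set, Tuple


def purge_candidates(
    emails: Iterable[Tuple[str, Optional[str]]],
    open_case_ids: Set[str],
    held_case_ids: Set[str],
) -> List[str]:
    """Keep EmailMessage IDs whose parent Case is neither open nor on hold.

    `emails` holds (Id, ParentId) rows; rows with no parent Case are skipped
    because this purge targets Email-to-Case messages only.
    """
    excluded = open_case_ids | held_case_ids
    return [
        email_id
        for email_id, parent_id in emails
        if parent_id is not None and parent_id not in excluded
    ]


def bulk_delete_csv(record_ids: List[str]) -> str:
    """Render the single-column CSV body for a Bulk API 2.0 delete job."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")  # LF, matching lineEnding=LF
    writer.writerow(["Id"])
    writer.writerows([record_id] for record_id in record_ids)
    return buffer.getvalue()
```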

---

## Decision Guidance

| Situation | Recommended Approach | Reason |
|---|---|---|
| Storage overrun is primarily data storage (EmailMessage bloat) | EmailMessage-only purge via Bulk API 2.0 | Targeted; preserves Case history; fastest storage reclaim |
| Storage overrun is primarily file storage (ContentDocument) | ContentDocument orphan cleanup after verifying no other links | File storage reclaim requires ContentDocument deletion, not just ContentDocumentLink removal |
| Compliance-driven deletion of Cases and all child records | Phased export-then-delete in dependency order: export → ContentDocumentLink → ContentDocument → EmailMessage → Case | Avoids orphaned records; provides audit trail; satisfies legal hold requirements |
| Records under legal hold mixed into archival batch | Filter legal-hold records from all SOQL queries via exclusion list or custom field | Platform blocks deletion of legal-hold records; partial failures corrupt Bulk API jobs |
| Org uses Salesforce Archive feature | Configure retention policies in Archive setup; complement with deletion of purge-eligible archive records | Archive moves records to lower-cost tier; explicit purge still required to reclaim storage |

---

## Recommended Workflow

Step-by-step instructions for an AI agent or practitioner working on this task:

1. **Audit current storage usage by object type**: Run Setup > Storage Usage and note data storage GB and file storage GB separately. Use SOQL aggregate queries (`SELECT COUNT(Id) FROM EmailMessage` and `SELECT COUNT(Id) FROM ContentDocument`) to quantify record counts by object. Identify the primary storage driver before prescribing a solution.
2. **Define retention policies per record type**: Work with the legal and compliance team to produce a retention matrix: minimum retention period per object (Case, EmailMessage, ContentDocument), legal-hold criteria, and export-before-delete requirements. Document which Cases are in-scope for archival.
3. **Export and archive ContentDocument/ContentVersion and EmailMessage records before deletion**: For all in-scope records, export blobs and metadata to an external store (e.g., S3, Big Objects, or a compliance archive). Confirm exports are complete and checksummed before proceeding to deletion. Once the deleted records leave the Recycle Bin, the export is the only remaining copy.
4. **Execute deletion in dependency order using Bulk API 2.0**: Delete in this sequence: (a) ContentDocumentLink records for in-scope Cases where ContentDocument has no other links, (b) now-orphaned ContentDocument records, (c) EmailMessage records by parent Case (`ParentId`), (d) Case records. Each step must exclude legal-hold records. Use Bulk API 2.0 for anything beyond small volumes: synchronous calls are capped per request (sObject Collections accept at most 200 records per call), and Apex DML is limited to 10,000 rows per transaction.
5. **Validate with storage report and orphan scan**: After deletion, re-run Setup > Storage Usage and compare to the pre-deletion baseline to confirm storage reclaim. Run the `check_service_data_archival.py` script to scan for orphaned ContentDocumentLink records (links pointing to deleted Cases), orphaned ContentDocument records (no remaining links), and any remaining EmailMessage records for deleted Cases.
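The "complete and checksummed" gate in step 3 can be made mechanical. A minimal sketch, assuming the export job wrote a manifest that maps each exported file name to its SHA-256 digest (the manifest format and function names are hypothetical): deletion should proceed only when the check returns an empty list.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB blocks so large exported blobs are not read at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(1 << 20), b""):
            digest.update(block)
    return digest.hexdigest()


def unverified_exports(manifest: Dict[str, str], export_dir: Path) -> List[str]:
    """Return manifest entries that are missing or whose digest does not match.

    `manifest` maps a file name (relative to `export_dir`) to the SHA-256 hex
    digest recorded when the export was written.
    """
    problems = []
    for name, expected in sorted(manifest.items()):
        path = export_dir / name
        if not path.is_file() or sha256_of(path) != expected:
            problems.append(name)
    return problems
```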

---

## Review Checklist

Run through these before marking work in this area complete:

- [ ] Data storage GB and file storage GB baseline documented before any deletion
- [ ] Retention policy matrix confirmed with legal/compliance team and documented
- [ ] Legal-hold record IDs identified and excluded from all SOQL queries
- [ ] Export-before-delete confirmed for all records within retention windows
- [ ] Deletion executed in dependency order: ContentDocumentLink → ContentDocument → EmailMessage → Case
- [ ] Post-deletion storage report run and delta confirmed
- [ ] Orphaned ContentDocumentLink and ContentDocument records scanned with checker script
- [ ] Recycle bin monitored for 15 days before treating deletion as final

---

## Salesforce-Specific Gotchas

Non-obvious platform behaviors that cause real production problems:

1. **EmailMessage deletion does not cascade to ContentDocumentLink** — Deleting an EmailMessage that had attachments does not delete the ContentDocumentLink records or ContentDocument records created when the attachment was processed. These remain as orphans and continue to consume file storage. Always run a ContentDocumentLink query after EmailMessage deletion to catch stranded links.
2. **Archived Cases still consume storage if attachments are not separately archived** — If you use the Salesforce Archive feature to move Cases off the active tier, the ContentDocument records linked to those Cases remain in the active file storage pool unless the Archive is explicitly configured to include file objects. Many implementations assume Archive = storage reclaim, but file storage is unaffected until ContentDocument records are explicitly deleted or moved.
3. **Legal-hold records cannot be bulk-deleted even via the API** — The platform enforces a hard deletion block on records flagged as under legal hold. Including even one legal-hold record in a Bulk API 2.0 delete job causes the entire batch containing that record to fail with a non-retriable error. Pre-filter legal-hold records at the SOQL level, not in post-processing, to prevent cascading batch failures across large jobs.
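The stranded-link check from gotcha 1, together with the orphan scan in the workflow, reduces to set arithmetic once the IDs are in hand. This is a hypothetical sketch, not the `check_service_data_archival.py` script itself. It assumes you have collected the relevant `ContentDocumentLink` rows (SOQL on that object must filter on `ContentDocumentId` or `LinkedEntityId`, so query by the document IDs you are tracking), the IDs of records that still exist, and the IDs of `ContentDocument` records that still exist.

```python
from typing import Dict, Iterable, Set, Tuple


def orphan_report(
    links: Iterable[Tuple[str, str]],
    existing_entity_ids: Set[str],
    existing_document_ids: Set[str],
) -> Dict[str, list]:
    """Report links to records that no longer exist and documents with no live link.

    links: (ContentDocumentId, LinkedEntityId) rows.
    existing_entity_ids: IDs of Cases, EmailMessages, etc. that still exist.
    existing_document_ids: ContentDocument IDs that still exist.
    """
    rows = list(links)  # iterated twice below
    stranded = sorted({row for row in rows if row[1] not in existing_entity_ids})
    live_docs = {doc_id for doc_id, entity_id in rows if entity_id in existing_entity_ids}
    return {
        "stranded_links": stranded,
        "unlinked_documents": sorted(existing_document_ids - live_docs),
    }
```

Both lists should be empty after a clean run; anything in `unlinked_documents` is still consuming file storage.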

---

## Output Artifacts

| Artifact | Description |
|---|---|
| Archival strategy document | Scoped to Case/EmailMessage/ContentDocument objects; includes retention matrix, legal-hold list, and estimated storage reclaim |
| Deletion sequence script | SOQL queries and Bulk API 2.0 job definitions in dependency order; parameterized by retention date threshold |
| Compliance checklist | Pre-deletion export confirmations, legal-hold exclusions, post-deletion storage delta, orphan scan results |

---

## Related Skills

- `data-archival-strategies` — use for generic multi-object archival strategy across non-Service objects, Big Object design, and org-wide storage governance
- `data-storage-management` — use for diagnosing storage limits, understanding storage allocation, and storage alert response
- `case-history-migration` — use when migrating Case history across orgs, not for in-place archival or deletion

Related Skills

service-account-credential-rotation

from PranavNagrecha/AwesomeSalesforceSkills

Use when designing credential rotation for integration users, connected apps, named credentials, and OAuth client secrets in Salesforce. Covers rotation cadence, zero-downtime handover, secret storage, and detection of stale credentials. Triggers: 'rotate integration user password', 'connected app secret rotation', 'named credential rotation', 'stale service account', 'zero downtime secret rotation'. NOT for end-user password policies.

sandbox-data-masking

from PranavNagrecha/AwesomeSalesforceSkills

Use this skill when configuring or reviewing Salesforce Data Mask to protect PII/PHI in partial or full copy sandboxes after a refresh. Trigger keywords: data mask, sandbox masking, PII in sandbox, GDPR sandbox, HIPAA non-production, mask contacts, obfuscate fields non-production. NOT for sandbox refresh mechanics (use sandbox-refresh-and-templates), NOT for production data anonymization, NOT for Shield Platform Encryption at rest.

gdpr-data-privacy

from PranavNagrecha/AwesomeSalesforceSkills

Use this skill when implementing GDPR or CCPA data privacy controls in Salesforce: Individual sObject linkage, consent tracking, Right to Be Forgotten (RTBF) requests, data subject request handling, and Privacy Center configuration. Trigger keywords: GDPR, data privacy, consent management, right to erasure, Individual object, ContactPointConsent, ShouldForget, data subject request, Privacy Center, data portability. NOT for general data quality cleanup, duplicate management, field-level encryption (see platform-encryption skill), or sandbox data masking (see sandbox-data-masking skill).

data-classification-labels

from PranavNagrecha/AwesomeSalesforceSkills

Classify Salesforce fields by data sensitivity and compliance category using the four built-in classification attributes (SecurityClassification, ComplianceGroup, BusinessOwnerId, BusinessStatus). Covers Metadata API deployment, Tooling API querying, and Einstein Data Detect recommendations. NOT for data masking, Shield Platform Encryption, or runtime access control enforcement.

customer-data-request-workflow

from PranavNagrecha/AwesomeSalesforceSkills

Implement GDPR/CCPA data subject rights (access, deletion, rectification) using Salesforce Privacy Center and/or custom workflow. NOT for general backup or org-level data retention policy.

omnistudio-deployment-datapacks

from PranavNagrecha/AwesomeSalesforceSkills

Use when exporting, importing, or version-controlling OmniStudio components using DataPacks via the OmniStudio DataPacks tool or vlocity CLI. Covers DataPack export/import, Git version control integration, CI/CD for OmniStudio. NOT for SFDX-based metadata deployment of non-OmniStudio components.

omnistudio-asynchronous-data-operations

from PranavNagrecha/AwesomeSalesforceSkills

Use Integration Procedures queues, DataRaptor Chain, and Remote Actions with async patterns for long-running OmniStudio flows. NOT for simple DataRaptor reads.

dataraptor-transform-optimization

from PranavNagrecha/AwesomeSalesforceSkills

Use when DataRaptor Transform operations are slow, hit governor limits, or use Apex where formula fields would suffice. Covers formula vs Apex expressions, bulk transform sizing, and chained transform composition. Triggers: 'dataraptor transform slow', 'dataraptor formula vs apex', 'dataraptor bulk transform', 'dr governor limit'. NOT for DataRaptor Extract or Load performance.

dataraptor-patterns

from PranavNagrecha/AwesomeSalesforceSkills

Use when designing or reviewing OmniStudio DataRaptors, especially Extract versus Turbo Extract versus Transform versus Load, field mapping strategy, performance tradeoffs, and when to move work into Integration Procedures or Apex. Triggers: 'DataRaptor Extract', 'Turbo Extract', 'DataRaptor Load', 'DataRaptor Transform', 'OmniStudio data mapping'. NOT for overall OmniScript journey design or Integration Procedure sequencing when the main question is not the DataRaptor shape itself.

wire-service-patterns

from PranavNagrecha/AwesomeSalesforceSkills

Use when designing or reviewing Lightning Web Components that use `@wire`, Lightning Data Service, UI API, or the GraphQL wire adapter, especially for reactive parameters, cache behavior, and refresh strategy. Triggers: 'wire service', 'refreshApex', 'reactive parameter', 'getRecord', 'wire vs imperative Apex'. NOT for component communication or generic lifecycle issues when data provisioning is not the main concern.

lwc-datatable-advanced

from PranavNagrecha/AwesomeSalesforceSkills

Advanced lightning-datatable patterns — inline edit + draftValues, custom cell types via extending LightningDatatable, sortable columns, infinite scroll with onloadmore, row-level errors, and the cost of large data sets. NOT for read-only display of small lists (plain lightning-datatable suffices) or fully custom grids (use a third-party library).

lwc-data-table

from PranavNagrecha/AwesomeSalesforceSkills

Use when designing or reviewing `lightning-datatable` usage in Lightning Web Components, including column configuration, stable `key-field` values, inline editing, row actions, infinite loading, and custom cell types. Triggers: 'lightning datatable inline edit', 'row actions in lwc datatable', 'key field missing', 'infinite loading in datatable'. NOT for highly custom virtualized grids or broad page-performance work outside the datatable boundary.