error-handling-in-integrations
Use this skill to design orchestration-layer error handling for Salesforce integrations — covering Platform Event replay recovery, dead-letter queue routing, cross-channel error notification patterns, circuit breaker design, and trigger suspension recovery. Trigger keywords: integration error handling, Platform Event retry, integration dead letter queue, EventBus RetryableException, integration circuit breaker, event bus trigger suspended. NOT for Apex exception handling (use apex-exception-handling skill), HTTP error response contracts (use api-error-handling-design), or retry backoff patterns (use retry-and-backoff-patterns).
Best use case
error-handling-in-integrations is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using error-handling-in-integrations can expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in .claude/skills/error-handling-in-integrations/SKILL.md inside your project
- Restart your AI agent; it will auto-discover the skill
How error-handling-in-integrations Compares
| Feature / Agent | error-handling-in-integrations | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Error Handling in Integrations
This skill activates when a developer or integration architect needs to design orchestration-layer error handling for Salesforce integrations. It covers Platform Event trigger suspension recovery, dead-letter queue patterns, circuit breaker design, and cross-channel failure notifications — distinct from single-transaction Apex exception handling and HTTP error response contracts.
---
## Before Starting
Gather this context before working on anything in this domain:
- High-volume Platform Events retain event messages for 72 hours (the replay window). The stable deduplication key is the event message ID (the `EventUuid` field), not the Replay ID, which is not guaranteed to stay unique after Salesforce maintenance activities.
- `EventBus.RetryableException` triggers up to 9 automatic retries after the initial run before the Platform Event trigger is suspended (Setup shows the subscription in the Error state). A suspended trigger stops processing ALL new events delivered to it, not just the failing ones.
- Existing skills cover related but distinct topics: `integration/retry-and-backoff-patterns` covers HTTP retry backoff; `integration/api-error-handling-design` covers HTTP error response contracts. This skill covers orchestration-layer routing and recovery.
- The most critical mistake: throwing RetryableException for permanent errors (bad data, invalid config) causes 9 retries of a record that will never succeed, then suspends the trigger — blocking all event processing.
---
## Core Concepts
### Platform Event Trigger Suspension and Recovery
Failure escalation path:
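1. `EventBus.RetryableException` thrown → the platform re-runs the trigger, up to 9 automatic retries after the initial run
2. After the 9th retry fails: trigger suspended (shown as the Error state in Setup), and all new events stop processing
3. Re-enable: Setup > Platform Events > [Event] > Subscriptions > Manage (next to the trigger) > Resume
4. Resume picks up from the position where the trigger stopped, replaying missed events that are still inside the 72-hour window; choose Resume from Tip only if the backlog should be skipped
Recovery requires: knowing the last successfully processed Replay ID (must be stored by the subscriber) so the replay position can be verified, fixing the root cause, then resuming from the correct position.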
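One way to keep a trigger from ever reaching suspension is to cap retries below the platform limit. The sketch below assumes the `OrderEvent__e`, `OrderIntegrationService`, and `Integration_DLQ__c` names used elsewhere in this skill; the trigger name and the `maxRetries` value are hypothetical choices, not platform defaults. It reads the retry counter from `EventBus.TriggerContext` and parks the event once the budget is spent.

```apex
// Sketch only: trigger name and maxRetries are assumptions for illustration.
trigger OrderEventCappedRetryTrigger on OrderEvent__e (after insert) {
    Integer maxRetries = 4; // assumed budget, well below the platform limit of 9
    List<Integration_DLQ__c> deadLetters = new List<Integration_DLQ__c>();

    for (OrderEvent__e event : Trigger.new) {
        try {
            OrderIntegrationService.processEvent(event);
        } catch (OrderIntegrationService.TransientException e) {
            // retries is 0 on the first run and increases with each re-run.
            if (EventBus.TriggerContext.currentContext().retries < maxRetries) {
                throw new EventBus.RetryableException('Transient: ' + e.getMessage());
            }
            // Retry budget spent: park the event instead of letting the
            // platform use up all 9 retries and suspend the trigger.
            deadLetters.add(new Integration_DLQ__c(
                Event_Type__c = 'OrderEvent',
                Payload__c = JSON.serialize(event),
                Error_Message__c = e.getMessage(),
                Retry_Count__c = maxRetries,
                Status__c = 'Pending'
            ));
        }
    }

    // Single DML statement after the loop keeps the trigger bulk-safe.
    insert deadLetters;
}
```

Because a thrown `RetryableException` rolls back DML from the current execution, the dead-letter rows are only committed on a run that finishes without rethrowing.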
### Dead-Letter Queue Pattern
Salesforce has no built-in DLQ. Implement explicitly:
- Custom object `Integration_DLQ__c` with: Source_System__c, Event_Type__c, Payload__c (JSON text), Error_Message__c, Retry_Count__c, Status__c (Pending / Failed_Max_Retries / Resolved)
- Scheduled Apex retries Pending DLQ entries periodically
- After max retries: mark as Failed_Max_Retries and trigger ops notification
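The retry job described above could look roughly like the following. This is a minimal sketch, not a reference implementation: `DlqRetryJob`, `MAX_RETRIES`, the batch size of 50, and `OrderIntegrationService.reprocess(String)` are hypothetical names and values; only `Integration_DLQ__c` and its fields come from the schema listed here.

```apex
// Sketch only: class name, limits, and reprocess() are assumptions.
public with sharing class DlqRetryJob implements Schedulable {
    private static final Integer MAX_RETRIES = 5; // assumed limit

    public void execute(SchedulableContext ctx) {
        List<Integration_DLQ__c> entries = [
            SELECT Id, Payload__c, Retry_Count__c, Status__c, Error_Message__c
            FROM Integration_DLQ__c
            WHERE Status__c = 'Pending'
            ORDER BY CreatedDate
            LIMIT 50
        ];

        for (Integration_DLQ__c entry : entries) {
            try {
                // Hypothetical helper that replays the stored JSON payload.
                OrderIntegrationService.reprocess(entry.Payload__c);
                entry.Status__c = 'Resolved';
            } catch (Exception e) {
                Integer attempts =
                    (entry.Retry_Count__c == null ? 0 : entry.Retry_Count__c.intValue()) + 1;
                entry.Retry_Count__c = attempts;
                entry.Error_Message__c = e.getMessage();
                if (attempts >= MAX_RETRIES) {
                    // Terminal state; the ops notification would hook in here.
                    entry.Status__c = 'Failed_Max_Retries';
                }
            }
        }

        // One bulk update after the loop, never DML per entry.
        update entries;
    }
}
```

It could be scheduled hourly with `System.schedule('DLQ retry', '0 0 * * * ?', new DlqRetryJob());`. If reprocessing makes callouts, the work would need to move into a Queueable that implements `Database.AllowsCallouts`, since Scheduled Apex cannot make synchronous callouts.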
### Circuit Breaker
For unstable external systems:
- Track consecutive failures in a Custom Setting (failure count + timestamp)
- CLOSED (normal): calls go through
- OPEN (threshold exceeded): skip external call, log OPEN state, notify ops
- HALF_OPEN (after cooldown): attempt one test call; success → CLOSE; failure → OPEN again
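The three states above can be derived from just a failure count and a timestamp. The sketch below assumes a hypothetical hierarchy Custom Setting `Integration_Circuit__c` with `Failure_Count__c` (Number) and `Opened_At__c` (Date/Time) fields; the threshold and cooldown values are illustrative assumptions.

```apex
// Sketch only: Integration_Circuit__c, its fields, and both constants are assumptions.
public with sharing class IntegrationCircuitBreaker {
    public enum State { CLOSED, OPEN, HALF_OPEN }

    private static final Integer FAILURE_THRESHOLD = 5; // assumed
    private static final Integer COOLDOWN_MINUTES = 10; // assumed

    // Pure state derivation, kept apart from storage so it is easy to unit test.
    public static State evaluate(Integer failureCount, Datetime openedAt, Datetime now) {
        if (failureCount == null || failureCount < FAILURE_THRESHOLD || openedAt == null) {
            return State.CLOSED;
        }
        return now >= openedAt.addMinutes(COOLDOWN_MINUTES) ? State.HALF_OPEN : State.OPEN;
    }

    public static State currentState() {
        Integration_Circuit__c s = Integration_Circuit__c.getOrgDefaults();
        Integer failures = s.Failure_Count__c == null ? 0 : s.Failure_Count__c.intValue();
        return evaluate(failures, s.Opened_At__c, Datetime.now());
    }

    // Success closes the circuit, including after a HALF_OPEN test call.
    public static void recordSuccess() {
        Integration_Circuit__c s = Integration_Circuit__c.getOrgDefaults();
        s.Failure_Count__c = 0;
        s.Opened_At__c = null;
        upsert s;
    }

    // Failure increments the count; at or above the threshold the circuit
    // opens, or re-opens with a fresh cooldown after a failed HALF_OPEN test call.
    public static void recordFailure() {
        Integration_Circuit__c s = Integration_Circuit__c.getOrgDefaults();
        Integer failures =
            (s.Failure_Count__c == null ? 0 : s.Failure_Count__c.intValue()) + 1;
        s.Failure_Count__c = failures;
        if (failures >= FAILURE_THRESHOLD) {
            s.Opened_At__c = Datetime.now();
        }
        upsert s;
    }
}
```

A caller would check `currentState()` first, skip the external call when it returns `OPEN`, and call `recordSuccess()` or `recordFailure()` only after the callout has returned, because Apex rejects a callout that follows uncommitted DML in the same transaction.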
### Cross-Channel Notification
- Platform Event `Integration_Error__e` → Flow → Email Alert for standard failures
- Platform Event → Named Credential callout to Slack webhook for high-severity
- Case creation for SLA-impacting failures requiring human resolution
- CRM Analytics or Lightning report on DLQ volume for operations dashboards
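Publishing the error event is the common entry point for all of these channels. A minimal sketch follows, assuming `Integration_Error__e` carries hypothetical `Source_System__c`, `Severity__c`, and `Message__c` fields; the class name is also an assumption. Flows or subscribers can then branch on severity.

```apex
// Sketch only: the class name and the event's field names are assumptions.
public with sharing class IntegrationErrorNotifier {
    public static void notify(String sourceSystem, String severity, String message) {
        Integration_Error__e evt = new Integration_Error__e(
            Source_System__c = sourceSystem,
            Severity__c = severity,
            Message__c = message
        );

        // EventBus.publish returns a SaveResult describing whether the event was queued.
        Database.SaveResult result = EventBus.publish(evt);
        if (!result.isSuccess()) {
            for (Database.Error err : result.getErrors()) {
                System.debug(LoggingLevel.ERROR,
                    'Publish failed: ' + err.getStatusCode() + ' ' + err.getMessage());
            }
        }
    }
}
```

For error events it is usually worth setting the event's publish behavior to Publish Immediately, so that the notification is still sent when the failing transaction rolls back.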
---
## Common Patterns
### Pattern: RetryableException for Transient, DLQ for Permanent
```apex
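trigger OrderEventTrigger on OrderEvent__e (after insert) {
    List<Integration_DLQ__c> deadLetters = new List<Integration_DLQ__c>();
    String lastReplayId;

    for (OrderEvent__e event : Trigger.new) {
        try {
            OrderIntegrationService.processEvent(event);
            // Remember the Replay ID of the last event processed successfully
            lastReplayId = event.ReplayId;
        } catch (OrderIntegrationService.TransientException e) {
            // Transient: throw RetryableException for auto-retry.
            // DML from this execution is rolled back and the batch is re-delivered.
            throw new EventBus.RetryableException('Transient: ' + e.getMessage());
        } catch (Exception e) {
            // Permanent: write to DLQ, do NOT throw RetryableException
            deadLetters.add(new Integration_DLQ__c(
                Event_Type__c = 'OrderEvent',
                Payload__c = JSON.serialize(event),
                Error_Message__c = e.getMessage(),
                Status__c = 'Pending'
            ));
            // The event has been handled via the DLQ, so advance the position
            lastReplayId = event.ReplayId;
        }
    }

    // Bulk-safe: one DML statement per object, outside the loop
    insert deadLetters;

    if (lastReplayId != null) {
        // Fetch the setting once, modify that same record, then save it
        Integration_State__c state = Integration_State__c.getOrgDefaults();
        state.Last_Replay_Id__c = lastReplayId;
        upsert state;
    }
}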
```
---
## Decision Guidance
| Failure Scenario | Recommended Pattern | Reason |
|---|---|---|
| Transient external error (timeout, 503) | RetryableException | Platform auto-retry handles transient failures |
| Permanent data error (invalid payload) | Write to DLQ; no RetryableException | RetryableException on permanent errors suspends trigger |
| External system down > 1 hour | Circuit breaker OPEN + DLQ accumulate | Prevent cascade failures and API limit exhaustion |
| Trigger suspended | Recovery runbook: Replay ID re-enable | 72-hour window enables recovery |
| Silent failures not visible to ops | Cross-channel notification platform event | Ops must know about failures immediately |
---
## Recommended Workflow
1. Identify the integration pattern (Platform Events, REST, CDC, Bulk API) — each has different failure modes.
2. For Platform Event subscribers: implement RetryableException for transient errors only; DLQ for permanent failures.
3. Implement Replay ID tracking: store last successful Replay ID in a Custom Setting on every successful event.
4. Design DLQ object schema and Scheduled Apex retry job with configurable max retry count.
5. Design cross-channel notification: Platform Event for failures → Flow Email + Slack + Case creation based on severity.
6. For unstable external systems: implement circuit breaker using Custom Setting for failure count and circuit state.
7. Document the trigger suspension recovery runbook in team operations documentation.
---
## Review Checklist
- [ ] RetryableException used only for transient errors (not permanent)
- [ ] DLQ pattern implemented for permanent failures
- [ ] Replay ID tracking implemented on every successful Platform Event
- [ ] Trigger suspension recovery runbook documented
- [ ] Cross-channel error notification designed
- [ ] Circuit breaker designed for unstable external systems
- [ ] DLQ retry job with max retry limit and ops alert threshold
---
## Salesforce-Specific Gotchas
Non-obvious platform behaviors that cause real production problems:
1. **Trigger suspension affects ALL events — not just the failing ones** — Suspending a Platform Event trigger blocks all new events on that channel until manually re-enabled. One bad payload can halt all integration processing.
2. **Replay ID is unstable after Salesforce maintenance** — Replay IDs can become stale after maintenance. Store the event message ID for deduplication; use Replay ID only for starting the replay position.
3. **RetryableException on permanent errors suspends the trigger faster** — Throwing RetryableException on a permanent failure wastes all 9 retries on a record that will never succeed, then suspends the trigger. Only use RetryableException for genuinely transient errors.
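The second gotcha implies deduplicating on the event message ID rather than the Replay ID. A minimal sketch of that idea follows, assuming a hypothetical custom object `Processed_Event__c` with a unique `Event_Uuid__c` text field; the class and method names are illustrative only.

```apex
// Sketch only: Processed_Event__c, Event_Uuid__c, and this class are assumptions.
public with sharing class EventDeduplicator {
    // Returns only the events whose EventUuid has not been recorded yet.
    public static List<OrderEvent__e> filterUnprocessed(List<OrderEvent__e> events) {
        Set<String> uuids = new Set<String>();
        for (OrderEvent__e e : events) {
            uuids.add(e.EventUuid);
        }

        Set<String> seen = new Set<String>();
        for (Processed_Event__c p : [
            SELECT Event_Uuid__c
            FROM Processed_Event__c
            WHERE Event_Uuid__c IN :uuids
        ]) {
            seen.add(p.Event_Uuid__c);
        }

        List<OrderEvent__e> fresh = new List<OrderEvent__e>();
        for (OrderEvent__e e : events) {
            if (!seen.contains(e.EventUuid)) {
                fresh.add(e);
            }
        }
        return fresh;
    }

    // Records the handled events so a later replay skips them.
    public static void markProcessed(List<OrderEvent__e> events) {
        List<Processed_Event__c> rows = new List<Processed_Event__c>();
        for (OrderEvent__e e : events) {
            rows.add(new Processed_Event__c(Event_Uuid__c = e.EventUuid));
        }
        insert rows;
    }
}
```

A subscriber would call `filterUnprocessed(Trigger.new)` before processing and `markProcessed(...)` afterwards, so events re-delivered by a retry or a replay are not applied twice.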
---
## Output Artifacts
| Artifact | Description |
|---|---|
| DLQ schema and retry job | Integration_DLQ__c design and Scheduled Apex pattern |
| Trigger suspension recovery runbook | Steps to re-enable and replay after trigger suspension |
| Circuit breaker design | Custom Setting schema and state-transition logic |
| Cross-channel notification design | Error event → notification channel mapping |
---
## Related Skills
- `integration/retry-and-backoff-patterns` — HTTP retry backoff for external API calls
- `integration/api-error-handling-design` — HTTP error response contracts
- `integration/event-driven-architecture-patterns` — Platform Event architecture
- `admin/integration-pattern-selection` — upstream pattern selection
Related Skills
omnistudio-error-handling-patterns
Use when designing fault behavior across Integration Procedures, DataRaptors, OmniScripts, and FlexCards — error routing, user-facing messaging, retry semantics, and idempotency. Triggers: 'omnistudio error', 'integration procedure fault', 'dataraptor error handling', 'omniscript retry', 'flexcard action failure'. NOT for general Apex exception design or Flow fault paths.
lwc-error-boundaries
Isolate component errors so one failure does not blank an entire page using errorCallback and graceful fallbacks. NOT for server-side Apex exception design.
lightning-navigation-dead-link-handling
Use when an LWC navigates via NavigationMixin to records or pages that may no longer exist, lack the user's access, or be permanently moved. Triggers: 'lightning navigation 404', 'navigate to deleted record', 'NavigationMixin error toast', 'graceful fallback when target page missing', 'permission denied on navigation'. NOT for general routing within an SPA or for Experience Cloud public-facing routing.
common-lwc-runtime-errors
Diagnose and fix runtime errors in Lightning Web Components including wire adapter failures, shadow DOM boundary violations, event propagation mistakes, async rendering timing bugs, NavigationMixin errors, Lightning Locker vs Lightning Web Security conflicts, and slot projection problems. Triggers: 'wire adapter returns undefined', 'querySelector returns null in LWC', 'custom event not received by parent', 'LWC component not rendering after connected callback', 'NavigationMixin page reference error'. NOT for LWC fundamentals, build/deployment errors, or Aura component debugging.
api-error-handling-design
Designing HTTP error classification, RFC 7807-style error payload structure, and client-side error parsing for Salesforce REST/SOAP integrations and custom Apex REST endpoints. Use when deciding which HTTP status codes to return from custom Apex REST services, how to structure error response bodies, how to classify inbound API errors as retry-safe vs non-retry-safe, or how to parse Salesforce error responses on the consumer side. NOT for retry execution mechanics or circuit breaker implementation (use retry-and-backoff-patterns). NOT for Apex exception class design (use apex-error-handling-framework). NOT for OAuth error flows (use oauth-flows-and-connected-apps).
flow-runtime-error-diagnosis
Use when a Salesforce Flow throws a runtime error, sends an unhandled fault email, or produces unexpected results in production or sandbox. Triggers: 'Flow error email', 'Flow failed at element', 'null reference in Flow', 'Flow SOQL limit error', 'Flow DML in loop error'. NOT for Flow design or building new flows (use record-triggered-flow-patterns or other flow/* skills), NOT for Flow debug log setup (use flow-debugging).
flow-error-monitoring
Set up monitoring + alerting for Flow runtime errors at org scale: routing fault emails, Flow runtime error reports, custom centralized logging (Integration_Log__c), escalation thresholds, and trend detection. NOT for diagnosing a specific flow error (use flow-runtime-error-diagnosis). NOT for debug-mode setup (use flow-debugging).
fault-handling
Use when designing, reviewing, or troubleshooting Salesforce Flow fault handling, error logging, and bulk-safe automation paths. Triggers: 'fault connector', '$Flow.FaultMessage', 'flow failed', 'record-triggered flow rollback', 'screen flow error'. NOT for generic Flow type selection unless the main risk is failure handling; NOT for Apex exception handling (see apex/exception-handling-patterns).
deployment-error-troubleshooting
Use when a Salesforce metadata deployment fails and you need to diagnose and fix the error. Trigger keywords: 'deployment failed', 'component failure', 'dependent class is invalid', 'code coverage failed', 'UNSUPPORTED_API_VERSION', 'deploy error', 'test failure blocking deploy', 'rollbackOnError', 'missing dependency deploy'. NOT for authoring destructive changes manifests (use destructive-changes-deployment). NOT for CI/CD pipeline setup (use github-actions-for-salesforce or gitlab-ci-for-salesforce). NOT for change set mechanics (use change-set-deployment).
deployment-error-diagnosis
Pattern catalog of common Salesforce metadata-deploy errors and their fixes — `Cannot change type` (field type already in use), dependent-metadata ordering (deploy field before profile that references it), profile / permission set delta issues (deactivated permissions blocking deploy), missing-reference errors, test class coverage failures, and the package.xml-shape mistakes that produce confusing first-line errors. Covers the SFDX / Metadata API error message shapes and how to translate them into the actual fix. NOT for designing the deployment pipeline (use devops/sfdx-cicd-pipeline), NOT for change set orchestration (use admin/changeset-builder).
exception-handling
Use when writing, reviewing, or debugging Apex exception handling, DmlException behavior, custom exception hierarchies, or user-safe error messages. Triggers: 'DmlException', 'swallowed exception', 'AuraHandledException', 'trigger rollback', 'try catch'. NOT for choosing async execution models or general governor-limit tuning.
error-handling-framework
Use when designing or implementing a cross-cutting Apex error handling framework: custom exception hierarchies, rollback-safe logging via Platform Events, BatchApexErrorEvent processing, correlation ID threading, or a unified catch/log/rethrow utility class. Trigger keywords: 'error framework', 'centralized logging', 'rollback-safe log', 'BatchApexErrorEvent', 'correlation ID async', 'AuraHandledException boundary', 'Error_Log__c design'. NOT for individual try/catch block syntax help, basic DmlException handling, or choosing between synchronous and asynchronous execution models.