prompt-injection-defense

Red-team an Agentforce agent against prompt-injection and jailbreak attacks; codify test cases and guardrails. NOT for general application-security reviews outside the agent boundary.

8 stars

byPranavNagrecha

View on GitHub Installation ↓

Best use case

prompt-injection-defense is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Red-team an Agentforce agent against prompt-injection and jailbreak attacks; codify test cases and guardrails. NOT for general application-security reviews outside the agent boundary.

Teams using prompt-injection-defense should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/prompt-injection-defense/SKILL.md --create-dirs "https://raw.githubusercontent.com/PranavNagrecha/AwesomeSalesforceSkills/main/skills/agentforce/prompt-injection-defense/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/prompt-injection-defense/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How prompt-injection-defense Compares

Feature / Agent	prompt-injection-defense	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Red-team an Agentforce agent against prompt-injection and jailbreak attacks; codify test cases and guardrails. NOT for general application-security reviews outside the agent boundary.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

Best AI Skills for Claude

Explore the best AI skills for Claude and Claude Code across coding, research, workflow automation, documentation, and agent operations.

SKILL.md Source

# Prompt Injection Defense

Agentforce uses the Einstein Trust Layer for dynamic grounding, masking, and toxicity filtering — but topic instructions and Invocable action scopes still need explicit hardening. Injection attempts include: instruction override, role-reversal, system-prompt leaks, tool-use coercion, and data exfiltration via crafted record content. This skill builds a reusable adversarial test suite and maps findings to concrete guardrails.

## Adoption Signals

Pre-production review for any Agentforce agent that (a) ingests user-controlled text, (b) has write access via Invocables, or (c) is exposed to external/Experience Cloud users. Required for Service agents, Sales agents with Data Cloud grounding, and any custom channel.

- Required when stakeholders ask whether the agent can be jailbroken — produce a documented adversarial-test pass before exposure.
- Required for any agent that exposes Invocable actions with side effects (DML, callouts, record sharing).

## Recommended Workflow

1. Enumerate the attack surface: every Invocable action, every grounded DMO/sObject, and every conversational input channel.
2. Build the adversarial test set covering the five OWASP LLM-01 families: instruction override, context leakage, tool-use coercion, exfil via output, and role impersonation.
3. Run each test through Agentforce Testing Center; capture verbatim responses and tool invocations into a results matrix.
4. For each failed test, apply one of four mitigations: (a) narrow the action scope via `with sharing` + field-level checks, (b) add an explicit topic instruction, (c) raise Trust Layer toxicity/PII thresholds, (d) remove the dangerous capability.
5. Re-run the suite until all tests pass; commit the suite to `tests/agentforce/<agent>_adversarial.md` so regressions are caught on every agent change.

## Key Considerations

- Topic instructions are concatenated into the system prompt — a long instruction list dilutes priority. Keep hard constraints in the first 200 tokens.
- Trust Layer masking happens pre-LLM; it doesn't prevent tool-use coercion if the action runs as a privileged user.
- Always test with the least-privileged channel user, not an admin clone.
- Data Cloud grounding returns raw DMO content; a malicious record can contain injection payloads. Sanitize DMO text fields at ingestion when feasible.

## Worked Examples (see `references/examples.md`)

- *Instruction-override test case* — A Service agent has an Invocable `RefundOrder` with guardrail 'only refund orders where Status=Delivered'.
- *Data exfiltration via crafted Case.Description* — Agent reads Case.Description via Data Cloud grounding to answer customer questions.

## Common Gotchas (see `references/gotchas.md`)

- **Testing only with English** — Injection passes the English suite but succeeds in Spanish/French.
- **Trust Layer toxicity threshold too low** — Jailbreaks phrased politely pass filters; toxic but benign content is blocked.
- **Over-indexing on topic instructions** — 100-line topic instructions dilute priority and slow every turn.

## Top LLM Anti-Patterns (full list in `references/llm-anti-patterns.md`)

- Relying on Trust Layer alone — it handles toxicity/PII, not business-policy bypass via tool coercion.
- Adding ad-hoc instructions after incidents instead of maintaining a test suite.
- Using a privileged user for agent execution — scope creep becomes a data-exposure vector.

## Official Sources Used

- Agentforce Developer Guide — https://developer.salesforce.com/docs/einstein/genai/guide/agentforce.html
- Einstein Trust Layer — https://help.salesforce.com/s/articleView?id=sf.generative_ai_trust_layer.htm
- Invocable Actions (Apex) — https://developer.salesforce.com/docs/atlas.en-us.apexref.meta/apexref/apex_classes_invocable_action.htm
- Agentforce Testing Center — https://help.salesforce.com/s/articleView?id=sf.agentforce_testing_center.htm

Related Skills

xss-and-injection-prevention

from PranavNagrecha/AwesomeSalesforceSkills

Use when writing or reviewing Visualforce pages, Apex controllers, or LWC components that output user-supplied data, build dynamic queries, or construct HTTP responses. Triggers: 'XSS in Visualforce', 'SOQL injection vulnerability', 'how to encode output in Apex', 'JSENCODE Visualforce', 'open redirect prevention'. NOT for Apex CRUD/FLS enforcement (use soql-security or apex-crud-and-fls), NOT for Shield encryption (use shield-encryption-key-management), NOT for AppExchange security review process (use secure-coding-review-checklist).

environment-specific-value-injection

from PranavNagrecha/AwesomeSalesforceSkills

Use this skill when configuring, reviewing, or troubleshooting how environment-specific values — endpoint URLs, client IDs, thresholds, feature flags — are managed across Salesforce orgs without hardcoding. Triggers: 'named credential per environment', 'custom metadata for config', 'sfdx string replacement', 'CI variable substitution', 'secrets in org configuration', 'org-specific values'. NOT for sandbox refresh automation (use sandbox-refresh-and-templates), NOT for general deployment pipeline setup (use github-actions-for-salesforce or bitbucket-pipelines-for-salesforce), and NOT for per-user or per-profile configuration overrides.

prompt-template-versioning

from PranavNagrecha/AwesomeSalesforceSkills

Lifecycle management for Prompt Builder templates: version, test, promote, roll back via CMDT-backed bindings. NOT for authoring initial templates or generic prompt engineering.

prompt-builder-templates

from PranavNagrecha/AwesomeSalesforceSkills

Use when creating, reviewing, or troubleshooting Prompt Builder templates (Field Generation, Record Summary, Sales Email, or Flex types), including grounding with merge fields, Flow, or Apex. Trigger keywords: prompt template, Prompt Builder, field generation, record summary, sales email template, flex template, grounding, merge fields, LLM template, Einstein generative AI. NOT for agent topic instructions, Copilot action configuration, or Data Cloud segment activation.

agentforce-prompt-versioning

from PranavNagrecha/AwesomeSalesforceSkills

Version Prompt Templates and agent topic prompts: source-control shape, change review, model-version pinning, A/B, and rollback. Trigger keywords: prompt template versioning, prompt changelog, prompt rollback, A/B prompt test, agentforce prompt release. Does NOT cover: prompt engineering tips, general LLM fine-tuning, or Classify / Einstein Generate studio UI walkthroughs.

visualforce-security-and-modernization

from PranavNagrecha/AwesomeSalesforceSkills

Use when hardening or modernizing legacy Visualforce pages — covers the platform CSRF token model and when disabling it is a security regression, view state encryption guarantees and the 170 KB ceiling, FLS/CRUD enforcement gaps on `<apex:outputField>` and on getters that return sObjects, `<apex:includeScript>` interaction with the org Content Security Policy, hosting LWC inside a VF page via `lightning:container` / `lightning-out`, and the retire-vs-harden-vs-leave-alone decision for an inventory of legacy pages. Triggers: 'should I rewrite this Visualforce page in LWC', 'CSRF protection disabled on Visualforce page is that safe', 'community user sees a field they should not on a Visualforce page', 'view state encryption is that enough for sensitive data', 'how do I host an LWC inside a Visualforce page', 'apex:dynamicComponent and apex:actionFunction safe to keep'. NOT for greenfield Visualforce architecture (use apex/visualforce-fundamentals — controller types, view state pattern selection, PDF rendering); NOT for Visualforce email template authoring (use apex/visualforce-email-templates if/when that skill is authored); NOT for general Apex security review across triggers and async (use apex/soql-security and security/secure-coding-review-checklist).

transaction-security-policies

from PranavNagrecha/AwesomeSalesforceSkills

Transaction Security policy creation and configuration: condition builder, enhanced policies, enforcement actions (block, MFA, notification, end session), real-time monitoring mode, and policy troubleshooting. NOT for Event Monitoring log analysis or Shield Event Monitoring setup (use event-monitoring). NOT for Apex testing or debug-log analysis.

sso-saml-troubleshooting

from PranavNagrecha/AwesomeSalesforceSkills

Diagnosing broken SAML SSO into Salesforce — IdP-initiated vs SP-initiated flows, signing-certificate validity / expiry, NameID format mismatches, RelayState handling, audience / entityId / issuer mismatches, clock skew, the SAML Assertion Validator in Setup, the Login History debug log, and the My Domain prerequisite for SSO. Covers the standard diagnostic loop: read the SAML response, identify which check failed, fix at the IdP or SP. NOT for OAuth / OpenID Connect SSO (see security/oauth-openid-troubleshooting), NOT for setting up SSO from scratch (see security/sso-saml-setup).

shield-kms-byok-setup

from PranavNagrecha/AwesomeSalesforceSkills

Configure Shield Platform Encryption with customer-supplied (BYOK) or customer-held (Cache-Only Key Service) tenant secrets, rotate them, and recover. NOT for Classic Encryption or field masking.

shield-event-log-retention-strategy

from PranavNagrecha/AwesomeSalesforceSkills

Use when designing Salesforce Shield Event Monitoring retention, SIEM routing, and storage-tier strategy — which event types to keep, for how long, where, and how to answer audit queries across hot/warm/cold tiers. Triggers: 'shield event log retention', 'route event monitoring to splunk', 'how long to keep login history', 'siem salesforce integration', 'event monitoring storage tier'. NOT for enabling Shield (see salesforce-shield-deployment).

session-management-and-timeout

from PranavNagrecha/AwesomeSalesforceSkills

Use this skill when configuring session timeout values, concurrent session limits, session IP locking, or logout behavior in Salesforce. Covers org-wide session settings, profile-level overrides, Connected App session policies, and Metadata API SecuritySettings deployment. NOT for OAuth token refresh flows, login IP ranges, or MFA/identity-provider configuration.

session-high-assurance-policies

from PranavNagrecha/AwesomeSalesforceSkills

Enforce step-up authentication for sensitive pages/objects using High Assurance session level and login flow policies. NOT for initial MFA enrollment UX.