agentforce-cost-optimization
Use when Agentforce run costs are climbing, you need to forecast scale, or you want to reduce tokens per conversation without hurting quality. Covers topic design impact on cost, prompt/template reuse, grounding size discipline, caching, and model-tier selection. Triggers: 'agentforce cost', 'tokens per conversation too high', 'reduce agentforce runs spend', 'forecast agentforce scale cost', 'einstein trust layer tokens'. NOT for general LLM pricing strategy outside Salesforce.
Best use case
agentforce-cost-optimization is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Teams using agentforce-cost-optimization should expect more consistent output, faster repeated execution, and less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in `.claude/skills/agentforce-cost-optimization/SKILL.md` inside your project
- Restart your AI agent; it will auto-discover the skill
SKILL.md Source
# Agentforce Cost Optimization
Agentforce cost looks like "we'll just pay per run" right up until volume meets reality. A customer-service agent handling 200,000 conversations/month can consume 10× the tokens of a well-tuned version of the same agent — same quality, same topics, different token discipline. The cost drivers are predictable: topic instruction length, prompt template verbosity, grounding payload size, tool-call round-trips, and model tier. None of these are free to change, but they all respond to focused work.
The job is to measure first, then optimize the top three contributors. Most orgs find that topic instructions and grounding dominate — often 60-80% of tokens per conversation. Once those are disciplined, the remaining optimizations (template reuse, model-tier selection) become viable.
---
## Before Starting
- Pull 7 days of Agentforce runs; compute average and p95 token counts per conversation.
- Inventory topics, prompt templates, and grounding sources.
- Confirm model tier currently in use and any rate-limit headroom.
- Confirm business tolerance for quality-vs-cost tradeoffs.
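The first measurement step can be sketched in a few lines. This assumes run data has already been exported as `(conversation_id, tokens)` rows, one row per model call; that export shape and the function names are hypothetical, not an Agentforce API. The p95 here uses the nearest-rank method.

```python
from statistics import mean

def p95_nearest_rank(values):
    """95th percentile by nearest rank, using integer math to avoid float rounding."""
    if not values:
        raise ValueError("no values to summarize")
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n), 1-based
    return ordered[rank - 1]

def summarize_runs(runs):
    """runs: iterable of (conversation_id, tokens) rows, one row per model call."""
    totals = {}
    for conversation_id, tokens in runs:
        totals[conversation_id] = totals.get(conversation_id, 0) + tokens
    per_conversation = list(totals.values())
    return {
        "conversations": len(per_conversation),
        "avg_tokens": mean(per_conversation),
        "p95_tokens": p95_nearest_rank(per_conversation),
    }
```

Tracking p95 alongside the average matters because a few very long sessions can dominate spend without moving the mean much.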
## Core Concepts
### What Tokens Are You Paying For?
Every conversation pays for:
1. **System prompt** — the framework-level Agentforce prompt.
2. **Topic instructions** — active topic's instructions injected verbatim.
3. **Prompt template** — any custom template rendered per turn.
4. **Grounding** — retrieved content from Data Cloud, Knowledge, or explicit variables.
5. **Conversation history** — full turn history on each call.
6. **Tool output** — action results returned into context.
### The 80/20 Rule
For most agents, topic instructions + grounding = 60-80% of token spend. Conversation history grows linearly in long sessions. Tool output is lumpy but occasionally large (SOQL result sets dumped raw into context).
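To check whether an agent follows this pattern, rank the six token sources by share. The sketch below assumes you already have per-source token counts for a conversation (or an average across conversations); the source names and sample numbers are illustrative only.

```python
def contributor_shares(tokens_by_source):
    """Return (source, tokens, share_of_total) tuples, largest first."""
    total = sum(tokens_by_source.values())
    if total == 0:
        return []
    ranked = sorted(tokens_by_source.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, count, count / total) for name, count in ranked]

def top_contributors(tokens_by_source, n=3):
    """The n largest contributors, which are the ones worth optimizing first."""
    return contributor_shares(tokens_by_source)[:n]

# Illustrative numbers, not measurements from a real org.
sample = {
    "system_prompt": 400,
    "topic_instructions": 1200,
    "prompt_template": 300,
    "grounding": 2000,
    "conversation_history": 800,
    "tool_output": 300,
}
```

With these sample numbers, topic instructions plus grounding account for 3,200 of 5,000 tokens, or 64%.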
### Reducing Topic Instruction Tokens
- Delete department-name preamble ("As a customer service agent working for Acme Insurance...").
- Collapse redundant examples; 2 good examples outperform 10 mediocre ones.
- Externalize static policy ("always use formal English") into the system prompt instead of per-topic.
### Reducing Grounding Tokens
- Retrieve k=3, not k=10, unless evaluation shows quality improves.
- Chunk size: 300-500 tokens usually beats 1000-2000.
- Apply a reranker before final injection when using Data Cloud retrievers.
- Strip boilerplate (legal footers, headers) from Knowledge articles before indexing.
### Conversation History Discipline
Long sessions inflate every turn's token count. Patterns:
- Summarize older turns ("Summary of first 5 turns: …") rather than sending verbatim.
- Archive turns beyond a threshold; keep only the last N in active context.
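The arithmetic behind this is worth seeing. Each turn's history grows linearly, but every call resends the prior turns, so total history tokens over a session grow roughly quadratically with turn count. The sketch below assumes a fixed token count per turn, which is a simplification, and ignores the cost of producing any summary.

```python
def history_tokens_sent(turns, tokens_per_turn, window=None):
    """Total history tokens sent across a session.

    window=None resends the full history on every call;
    window=N keeps only the last N prior turns in context.
    """
    total = 0
    for turn_index in range(turns):
        prior_turns = turn_index if window is None else min(turn_index, window)
        total += prior_turns * tokens_per_turn
    return total
```

For a 20-turn session at 150 tokens per turn, `history_tokens_sent(20, 150)` is 28,500 tokens, while `history_tokens_sent(20, 150, window=5)` is 12,750.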
### Model Tier Selection
Not every action needs the most capable model. Use tiered routing:
- Classification / intent detection → smaller model.
- Reasoning / final response → larger model.
- Tool-calling / structured output → mid-tier is often enough.
### Caching Opportunities
- Topic instructions are stable across conversations, so the framework should cache them; you don't need to change anything unless your template is dynamic.
- Grounding retrieval can cache per query; watch freshness needs.
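Per-query grounding caching with a freshness bound could look like the sketch below. It assumes retrieval is something you can wrap in your own code (for example inside a custom action); the `TTLCache` class and the `fetch` callable are hypothetical, not part of Agentforce or Data Cloud.

```python
import time

class TTLCache:
    """Caches retrieval results per normalized query for ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested
        self._store = {}

    def get_or_fetch(self, query, fetch):
        # Normalize case and whitespace so trivially different queries share an entry.
        key = " ".join(query.lower().split())
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]
        value = fetch(query)
        self._store[key] = (now, value)
        return value
```

Pick the TTL from the freshness requirement of the underlying content: a pricing table may tolerate minutes, a policy article may tolerate hours.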
---
## Common Patterns
### Pattern 1: Topic Instruction Audit And Trim
Per-topic, measure instruction token count. Target 150-300 tokens per topic instruction. Trim anything above 500 without a compelling reason.
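A minimal audit sketch, assuming topic instructions are available as plain strings keyed by topic name. The 4-characters-per-token estimate is a rough heuristic for English text and should be replaced with a real tokenizer; the intermediate "review" band between 300 and 500 tokens is an interpretation of the targets above, not a documented threshold.

```python
def estimate_tokens(text):
    """Rough heuristic: about 4 characters per token. Swap in a real tokenizer."""
    return max(1, len(text) // 4)

def audit_topics(instructions_by_topic, count_tokens=estimate_tokens):
    """Return (topic, tokens, status) rows, longest instructions first."""
    report = []
    for topic, text in instructions_by_topic.items():
        tokens = count_tokens(text)
        if tokens > 500:
            status = "trim"      # above the hard limit
        elif tokens > 300:
            status = "review"    # above the target band
        else:
            status = "ok"
        report.append((topic, tokens, status))
    return sorted(report, key=lambda row: row[1], reverse=True)
```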
### Pattern 2: k=3 Retriever With Reranker
Retrieve 10 candidates; rerank; inject top 3. Cuts grounding tokens by about 70% versus retrieve-10-inject-10, assuming similar chunk sizes.
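The shape of this pattern, and where the 70% figure comes from, can be sketched as follows. The keyword-overlap scorer is a toy stand-in for a real reranker, and all function names are hypothetical.

```python
def keyword_overlap(query):
    """Toy relevance score: number of query words present in the chunk."""
    query_words = set(query.lower().split())
    def score(chunk):
        return len(query_words & set(chunk.lower().split()))
    return score

def select_grounding(candidates, score, k=3):
    """Rerank retrieved candidates and keep only the top k for injection."""
    return sorted(candidates, key=score, reverse=True)[:k]

def grounding_token_reduction(retrieved=10, injected=3, chunk_tokens=400):
    """Fraction of grounding tokens saved, assuming equal-sized chunks."""
    before = retrieved * chunk_tokens
    after = injected * chunk_tokens
    return (before - after) / before
```

With equal-sized chunks the reduction depends only on the ratio: injecting 3 of 10 saves 7/10 of the grounding tokens. Confirm with an evaluation set that answer quality holds at k=3.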
### Pattern 3: Conversation Summarization Trigger
After N turns or M tokens of history, replace older turns with a one-line summary.
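A sketch of the trigger, assuming you control how history is assembled before each call. The thresholds, the `summarize` callable (in practice probably a call to a small model), and the word-count token proxy are all placeholders.

```python
def word_count(text):
    """Crude token proxy; replace with the tokenizer that matches your model."""
    return len(text.split())

def compact_history(turns, summarize, max_turns=10, max_tokens=2000,
                    keep_last=4, count_tokens=word_count):
    """Replace older turns with one summary once either threshold is exceeded.

    keep_last must be at least 1; the most recent turns are kept verbatim.
    """
    over_turns = len(turns) > max_turns
    over_tokens = sum(count_tokens(t) for t in turns) > max_tokens
    if not (over_turns or over_tokens) or len(turns) <= keep_last:
        return list(turns)
    older, recent = turns[:-keep_last], turns[-keep_last:]
    return [summarize(older)] + list(recent)
```

Keeping the last few turns verbatim preserves the immediate context the agent needs, while the summary carries forward only what earlier turns established.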
### Pattern 4: Tiered Model Routing
Route classification / intent steps to a smaller model; reasoning/response to the capable model.
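As a design artifact, the routing can be captured as a simple step-to-tier map. This is a sketch only: the step names and tier labels are hypothetical, and whether your Agentforce configuration allows per-step model selection is something to confirm before relying on it.

```python
from enum import Enum

class Tier(Enum):
    SMALL = "small"
    MID = "mid"
    LARGE = "large"

# Hypothetical step names mapped to the tiers suggested in this skill.
ROUTING = {
    "classification": Tier.SMALL,
    "intent_detection": Tier.SMALL,
    "tool_calling": Tier.MID,
    "structured_output": Tier.MID,
    "reasoning": Tier.LARGE,
    "final_response": Tier.LARGE,
}

def pick_tier(step_type):
    """Unknown steps fall back to the largest tier so quality is never silently degraded."""
    return ROUTING.get(step_type, Tier.LARGE)
```

Defaulting to the most capable tier makes the map safe to roll out incrementally: a step only moves to a cheaper model once it has been added explicitly and evaluated.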
### Pattern 5: Tool Output Projection
When a tool returns a large payload (e.g. SOQL result), project the fields the agent actually needs instead of dumping the full response.
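A minimal projection sketch, assuming the tool result arrives as a list of dictionaries, as a deserialized query result typically would. The field names are examples, and the optional row cap is an addition beyond field projection.

```python
def project_records(records, fields, max_rows=None):
    """Keep only the requested fields (and optionally the first max_rows rows)."""
    rows = records if max_rows is None else records[:max_rows]
    return [{f: row[f] for f in fields if f in row} for row in rows]

# Example records with placeholder values.
raw = [
    {"attributes": {"type": "Case"}, "Id": "001", "Subject": "Late delivery",
     "Description": "A long free-text description...", "Status": "New"},
    {"attributes": {"type": "Case"}, "Id": "002", "Subject": "Wrong item",
     "Description": "Another long description...", "Status": "Closed"},
]
slim = project_records(raw, ["Id", "Status"])
```

Where possible, apply the same idea at the query itself by selecting only the needed fields, so the wide payload is never produced.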
---
## Decision Guidance
| Situation | Recommended Approach | Reason |
|---|---|---|
| Token usage high, unknown contributor | Instrument and measure first | Avoid guessing |
| Topic instructions > 500 tokens | Trim (Pattern 1) | Biggest win |
| Grounding k ≥ 5 without evaluation | Reduce k + rerank (Pattern 2) | Second biggest win |
| Long conversations | Summarize (Pattern 3) | Linear savings per turn |
| Classification step using largest model | Switch to smaller tier (Pattern 4) | Cheap wins |
| Tool returns wide records | Project fields (Pattern 5) | Eliminates silent waste |
## Review Checklist
- [ ] Per-conversation token metrics collected and dashboarded.
- [ ] Top 3 token contributors identified per agent.
- [ ] Topic instruction length audited.
- [ ] Grounding k and chunk size justified.
- [ ] Long-conversation strategy exists.
- [ ] Model tier routing considered.
- [ ] Tool output projection in place.
## Recommended Workflow
1. Measure — 7 days of run data broken down by token source.
2. Identify top 3 contributors.
3. Optimize topic instructions first.
4. Optimize grounding second.
5. Add conversation summarization if sessions are long.
6. Apply tier routing where quality allows.
7. Re-measure; document cost savings.
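For the final step, savings can be documented in tokens without assuming any price. The sketch below takes average tokens per conversation before and after optimization plus monthly volume; the example figures are illustrative, and converting tokens to currency depends on your contract.

```python
def monthly_token_savings(before_avg, after_avg, conversations_per_month):
    """Token savings per conversation, per month, and as a fraction of the baseline."""
    if before_avg <= 0:
        raise ValueError("before_avg must be positive")
    saved = before_avg - after_avg
    return {
        "saved_per_conversation": saved,
        "saved_per_month": saved * conversations_per_month,
        "reduction": saved / before_avg,
    }
```

For example, going from 5,000 to 3,000 average tokens at 200,000 conversations per month saves 400,000,000 tokens per month, a 40% reduction.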
---
## Salesforce-Specific Gotchas
1. Trust Layer adds tokens — masking, citation, guardrails all add context weight.
2. Grounding sources can include large boilerplate (Knowledge article footers); index selectively.
3. Tool output is counted even if the agent ignores it.
4. Managed topics may have opaque instruction length; audit via runtime logs.
5. Switching model tier changes quality — do not do this without A/B evaluation.
## Proactive Triggers
- Topic instruction > 500 tokens → Flag High.
- Retriever k ≥ 10 without reranker → Flag High.
- Average conversation > 20 turns with no summarization → Flag Medium.
- Classification step on flagship model → Flag Medium.
- Token growth > 15%/month without volume growth → Flag High.
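These triggers translate directly into rule checks. The sketch below assumes the metrics have already been collected into a dictionary; the key names are hypothetical, and "without volume growth" is read here as conversation volume growth of zero or less.

```python
def evaluate_triggers(m):
    """Return (severity, message) flags for an agent's metrics dictionary."""
    flags = []
    if m["max_topic_instruction_tokens"] > 500:
        flags.append(("High", "Topic instruction > 500 tokens"))
    if m["retriever_k"] >= 10 and not m["has_reranker"]:
        flags.append(("High", "Retriever k >= 10 without reranker"))
    if m["avg_turns"] > 20 and not m["has_summarization"]:
        flags.append(("Medium", "Average conversation > 20 turns with no summarization"))
    if m["classification_on_flagship"]:
        flags.append(("Medium", "Classification step on flagship model"))
    if m["token_growth_pct"] > 15 and m["volume_growth_pct"] <= 0:
        flags.append(("High", "Token growth > 15%/month without volume growth"))
    return flags
```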
## Output Artifacts
| Artifact | Description |
|---|---|
| Cost model | Tokens per conversation by contributor |
| Optimization plan | Prioritized trim list with expected savings |
| Tier routing design | Step → model mapping |
## Related Skills
- `agentforce/agent-topic-design` — topic structure quality.
- `agentforce/prompt-builder-templates` — prompt template hygiene.
- `agentforce/data-cloud-grounding-for-agentforce` — grounding retrieval.
- `agentforce/agentforce-observability` — measurement infrastructure.
Related Skills
dataraptor-transform-optimization
Use when DataRaptor Transform operations are slow, hit governor limits, or use Apex where formula fields would suffice. Covers formula vs Apex expressions, bulk transform sizing, and chained transform composition. Triggers: 'dataraptor transform slow', 'dataraptor formula vs apex', 'dataraptor bulk transform', 'dr governor limit'. NOT for DataRaptor Extract or Load performance.
flow-performance-optimization
Tune Flow runtime performance: pick Before-Save over After-Save, consolidate Get Records, eliminate loop-DML, cache lookups, split with Scheduled Paths, and measure actual runtime. Covers benchmarking methodology, profiling tools, and the 80/20 wins. NOT for governor-limit math (use flow-governor-limits-deep-dive). NOT for LDV strategy (use flow-large-data-volume-patterns).
flow-get-records-optimization
Optimize Get Records elements in Flow: filter sharpness, field selection, sort-and-limit placement, caching via formula resources, and avoiding repeated queries in loops. Trigger keywords: get records, flow soql, flow query limit, flow performance, record lookup. Does NOT cover Apex SOQL, Data Cloud queries, or external object lookups.
soql-query-optimization
Use when a SOQL query is running slowly, causing timeouts, or returning UNABLE_TO_LOCK_ROW errors in large data volume orgs. Covers index-aware query writing, selectivity rules, the Query Plan tool, skinny tables, and dynamic field-set queries. Triggers: slow soql query, query timeout, non-selective query, query plan tool, index usage, soql optimization, large object performance. NOT for Apex CPU or heap governor limit issues (use apex-cpu-and-heap-optimization) or for writing basic SOQL (use soql-fundamentals).
cpq-performance-optimization
Use when diagnosing or resolving slow CPQ quote calculation, QLEx timeouts, or governor limit errors on large quotes. Trigger keywords: Large Quote Mode, QCP field declaration, quote calculation performance, SBQQ calculation timeout, async pricing. NOT for generic Apex performance tuning, CPQ pricing rule logic design, or billing engine performance.
analytics-dataset-optimization
Use this skill when tuning CRM Analytics dataset performance through field selection, date granularity choices, dataset splitting strategy, and run-budget optimization. Trigger keywords: dataset too many fields, SAQL timeseries slow, epoch vs date storage, dataset field count limit, dataset partition, split dataset by year, CRM Analytics performance tuning. NOT for SOQL optimization, Salesforce report tuning, Data Cloud segmentation performance, or choosing between analytics tools.
license-optimization-strategy
Auditing, right-sizing, and reclaiming Salesforce licenses to reduce cost and ensure compliant allocation. Trigger keywords: license audit, license cost reduction, unused licenses, permission set license, login-based license, inactive users, license reclamation, right-size licenses. NOT for provisioning net-new licenses (contact AE). NOT for Experience Cloud community license troubleshooting. NOT for permission set assignment logic outside of license gating.
fsl-optimization-architecture
Use this skill when designing or evaluating the FSL scheduling engine architecture: optimization mode selection (Global/In-Day/Resource/Reshuffle), ESO adoption strategy, territory sizing for optimization, and fallback planning. Trigger keywords: FSL optimization engine, ESO enhanced scheduling, global optimization timeout, in-day optimization, OAAS architecture, territory optimization design. NOT for admin-level scheduling policy configuration, scheduling rule setup in Setup, or per-appointment scheduling API calls (covered by apex/fsl-scheduling-api).
einstein-bots-to-agentforce-migration
Use when migrating an existing Einstein Bot (legacy or Enhanced) to Agentforce: feature mapping, conversation design translation, cutover planning, hybrid bot/agent architecture, and context handoff. Triggers: 'migrate einstein bot to agentforce', 'convert legacy bot to agentforce', 'einstein bot retiring deadline', 'hybrid bot agentforce pattern', 'bot dialog to topic migration'. NOT for new Agentforce setup with no existing bot — use agentforce/agentforce-agent-creation instead.
data-cloud-grounding-for-agentforce
Use when grounding an Agentforce agent with Data Cloud retrievers, DMO selection, chunking, and freshness windows. Triggers: agent grounding, retriever, DMO, data graph, RAG, vector index, citations. Does NOT cover Data Cloud ingestion pipelines or Data Cloud identity resolution tuning.
agentforce-tool-use-patterns
Pick the right tool shape for each agent action: Apex invocable vs Flow action vs External Service vs Prompt Template vs Data Cloud retrieval. Covers action selection by use case, argument design for LLM clarity, return-shape contracts, error-surfacing, cost implications, and when to chain tools vs keep a single action. NOT for authoring a specific action (use custom-agent-actions-apex). NOT for topic design (use agent-topic-design).
agentforce-testing-strategy
Design Agentforce testing: topic coverage, action unit tests, deterministic golden sets, adversarial prompts, and regression harness. Trigger keywords: agentforce testing, agent eval, agent regression suite, prompt golden set, action unit test agentforce. Does NOT cover: generic LLM evaluation academia, human-labeled RLHF pipelines, or Einstein Classify accuracy.