token-budget-advisor
Offers the user an informed choice about how much response depth to consume before answering. Use this skill when the user explicitly wants to control response length, depth, or token budget. TRIGGER when: "token budget", "token count", "token usage", "token limit", "response length", "answer depth", "short version", "brief answer", "detailed answer", "exhaustive answer", "respuesta corta vs larga", "cuántos tokens", "ahorrar tokens", "responde al 50%", "dame la versión corta", "quiero controlar cuánto usas", or clear variants where the user is explicitly asking to control answer size or depth. DO NOT TRIGGER when: user has already specified a level in the current session (maintain it), the request is clearly a one-word answer, or "token" refers to auth/session/payment tokens rather than response size.
About this skill
The 'token-budget-advisor' skill empowers users to fine-tune their AI interactions by providing explicit control over the AI's output. It intelligently detects user intent to regulate response size, depth, or token consumption, then offers actionable choices (e.g., brief, detailed, or a specific token limit) to tailor the answer. This is particularly useful for managing token costs, fitting answers into specific display constraints, or simply getting to the point. Developed as part of the 'everything-claude-code' repository, it adheres to high-quality engineering patterns, ensuring a robust and user-centric experience for Claude AI agents.
Best use case
Users can leverage this skill to: - Obtain concise summaries without extraneous detail. - Request exhaustive, in-depth explanations for complex subjects. - Manage token consumption and associated costs in API-driven applications. - Ensure AI output conforms to length restrictions for integration with other tools or platforms. - Gain transparency and control over the AI's verbosity for various tasks.
Offers the user an informed choice about how much response depth to consume before answering. Use this skill when the user explicitly wants to control response length, depth, or token budget. TRIGGER when: "token budget", "token count", "token usage", "token limit", "response length", "answer depth", "short version", "brief answer", "detailed answer", "exhaustive answer", "respuesta corta vs larga", "cuántos tokens", "ahorrar tokens", "responde al 50%", "dame la versión corta", "quiero controlar cuánto usas", or clear variants where the user is explicitly asking to control answer size or depth. DO NOT TRIGGER when: user has already specified a level in the current session (maintain it), the request is clearly a one-word answer, or "token" refers to auth/session/payment tokens rather than response size.
Upon activation, the AI agent will engage the user to clarify their preferred response length or depth. This interaction might involve presenting predefined options (e.g., 'short', 'medium', 'long'), or allowing the user to specify a custom token count. The final answer will then be generated according to these explicitly defined constraints, ensuring the user receives content tailored precisely to their needs.
Practical example
Example input
Explain the basics of blockchain technology, but only give me a brief overview, maybe around 150 words.
Example output
I can provide a brief overview of blockchain technology, or a more detailed explanation if you prefer. For a 150-word summary, would you like me to focus on its core principles, or key applications? I'll ensure the response fits your requested length.
When to use this skill
- Trigger this skill when the user explicitly expresses a desire to control the answer's length, depth, or token usage. This includes phrases such as: - "token budget", "token count", "token usage", "token limit" - "response length", "answer depth", "short version", "brief answer", "detailed answer", "exhaustive answer" - Non-English variants like "respuesta corta vs larga", "cuántos tokens", "ahorrar tokens", "responde al 50%", "dame la versión corta", "quiero controlar cuánto usas" - Any clear linguistic cues indicating a user's intent to regulate the size or detail level of the AI's response.
When not to use this skill
- Avoid triggering this skill in the following scenarios: - If the user has already established a preferred response length or depth within the current conversation session (the agent should maintain the existing setting). - When the user's request is clearly designed for a very short, potentially one-word answer, where offering length options would be redundant. - When the term 'token' refers to concepts other than response size, such as authentication tokens, session identifiers, or payment tokens.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in
.claude/skills/token-budget-advisor/SKILL.mdinside your project - Restart your AI agent — it will auto-discover the skill
How token-budget-advisor Compares
| Feature / Agent | token-budget-advisor | Standard Approach |
|---|---|---|
| Platform Support | Claude | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | easy | N/A |
Frequently Asked Questions
What does this skill do?
Offers the user an informed choice about how much response depth to consume before answering. Use this skill when the user explicitly wants to control response length, depth, or token budget. TRIGGER when: "token budget", "token count", "token usage", "token limit", "response length", "answer depth", "short version", "brief answer", "detailed answer", "exhaustive answer", "respuesta corta vs larga", "cuántos tokens", "ahorrar tokens", "responde al 50%", "dame la versión corta", "quiero controlar cuánto usas", or clear variants where the user is explicitly asking to control answer size or depth. DO NOT TRIGGER when: user has already specified a level in the current session (maintain it), the request is clearly a one-word answer, or "token" refers to auth/session/payment tokens rather than response size.
Which AI agents support this skill?
This skill is designed for Claude.
How difficult is it to install?
The installation complexity is rated as easy. You can find the installation instructions above.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
Related Guides
AI Agents for Coding
Browse AI agent skills for coding, debugging, testing, refactoring, code review, and developer workflows across Claude, Cursor, and Codex.
Best AI Skills for Claude
Explore the best AI skills for Claude and Claude Code across coding, research, workflow automation, documentation, and agent operations.
ChatGPT vs Claude for Agent Skills
Compare ChatGPT and Claude for AI agent skills across coding, writing, research, and reusable workflow execution.
SKILL.md Source
# Token Budget Advisor (TBA) Intercept the response flow to offer the user a choice about response depth **before** Claude answers. ## When to Use - User wants to control how long or detailed a response is - User mentions tokens, budget, depth, or response length - User says "short version", "tldr", "brief", "al 25%", "exhaustive", etc. - Any time the user wants to choose depth/detail level upfront **Do not trigger** when: user already set a level this session (maintain it silently), or the answer is trivially one line. ## How It Works ### Step 1 — Estimate input tokens Use the repository's canonical context-budget heuristics to estimate the prompt's token count mentally. Use the same calibration guidance as [context-budget](../context-budget/SKILL.md): - prose: `words × 1.3` - code-heavy or mixed/code blocks: `chars / 4` For mixed content, use the dominant content type and keep the estimate heuristic. ### Step 2 — Estimate response size by complexity Classify the prompt, then apply the multiplier range to get the full response window: | Complexity | Multiplier range | Example prompts | |--------------|------------------|------------------------------------------------------| | Simple | 3× – 8× | "What is X?", yes/no, single fact | | Medium | 8× – 20× | "How does X work?" | | Medium-High | 10× – 25× | Code request with context | | Complex | 15× – 40× | Multi-part analysis, comparisons, architecture | | Creative | 10× – 30× | Stories, essays, narrative writing | Response window = `input_tokens × mult_min` to `input_tokens × mult_max` (but don’t exceed your model’s configured output-token limit). ### Step 3 — Present depth options Present this block **before** answering, using the actual estimated numbers: ``` Analyzing your prompt... Input: ~[N] tokens | Type: [type] | Complexity: [level] | Language: [lang] Choose your depth level: [1] Essential (25%) -> ~[tokens] Direct answer only, no preamble [2] Moderate (50%) -> ~[tokens] Answer + context + 1 example [3] Detailed (75%) -> ~[tokens] Full answer with alternatives [4] Exhaustive (100%) -> ~[tokens] Everything, no limits Which level? (1-4 or say "25% depth", "50% depth", "75% depth", "100% depth") Precision: heuristic estimate ~85-90% accuracy (±15%). ``` Level token estimates (within the response window): - 25% → `min + (max - min) × 0.25` - 50% → `min + (max - min) × 0.50` - 75% → `min + (max - min) × 0.75` - 100% → `max` ### Step 4 — Respond at the chosen level | Level | Target length | Include | Omit | |------------------|---------------------|-----------------------------------------------------|---------------------------------------------------| | 25% Essential | 2-4 sentences max | Direct answer, key conclusion | Context, examples, nuance, alternatives | | 50% Moderate | 1-3 paragraphs | Answer + necessary context + 1 example | Deep analysis, edge cases, references | | 75% Detailed | Structured response | Multiple examples, pros/cons, alternatives | Extreme edge cases, exhaustive references | | 100% Exhaustive | No restriction | Everything — full analysis, all code, all perspectives | Nothing | ## Shortcuts — skip the question If the user already signals a level, respond at that level immediately without asking: | What they say | Level | |----------------------------------------------------|-------| | "1" / "25% depth" / "short version" / "brief answer" / "tldr" | 25% | | "2" / "50% depth" / "moderate depth" / "balanced answer" | 50% | | "3" / "75% depth" / "detailed answer" / "thorough answer" | 75% | | "4" / "100% depth" / "exhaustive answer" / "full deep dive" | 100% | If the user set a level earlier in the session, **maintain it silently** for subsequent responses unless they change it. ## Precision note This skill uses heuristic estimation — no real tokenizer. Accuracy ~85-90%, variance ±15%. Always show the disclaimer. ## Examples ### Triggers - "Give me the short version first." - "How many tokens will your answer use?" - "Respond at 50% depth." - "I want the exhaustive answer, not the summary." - "Dame la version corta y luego la detallada." ### Does Not Trigger - "What is a JWT token?" - "The checkout flow uses a payment token." - "Is this normal?" - "Complete the refactor." - Follow-up questions after the user already chose a depth for the session ## Source Standalone skill from [TBA — Token Budget Advisor for Claude Code](https://github.com/Xabilimon1/Token-Budget-Advisor-Claude-Code-). Original project also ships a Python estimator script, but this repository keeps the skill self-contained and heuristic-only.
Related Skills
google-workspace-ops
Operate across Google Drive, Docs, Sheets, and Slides as one workflow surface for plans, trackers, decks, and shared documents. Use when the user needs to find, summarize, edit, migrate, or clean up Google Workspace assets without dropping to raw tool calls.
connections-optimizer
Reorganize the user's X and LinkedIn network with review-first pruning, add/follow recommendations, and channel-specific warm outreach drafted in the user's real voice. Use when the user wants to clean up following lists, grow toward current priorities, or rebalance a social graph around higher-signal relationships.
frontend-slides
Create stunning, animation-rich HTML presentations from scratch or by converting PowerPoint files. Use when the user wants to build a presentation, convert a PPT/PPTX to web, or create slides for a talk/pitch. Helps non-designers discover their aesthetic through visual exploration rather than abstract choices.
miro-automation
Automate Miro tasks via Rube MCP (Composio): boards, items, sticky notes, frames, sharing, connectors. Always search tools first for current schemas.
interview-coach
Full job search coaching system — JD decoding, resume, storybank, mock interviews, transcript analysis, comp negotiation. 23 commands, persistent state.
google-slides-automation
Lightweight Google Slides integration with standalone OAuth authentication. No MCP server required. Full read/write access.
coda-automation
Automate Coda tasks via Rube MCP (Composio): manage docs, pages, tables, rows, formulas, permissions, and publishing. Always search tools first for current schemas.
context-budget
审核Claude Code上下文窗口在代理、技能、MCP服务器和规则中的消耗情况。识别膨胀、冗余组件,并提供优先的令牌节省建议。
workspace-surface-audit
Audit the active repo, MCP servers, plugins, connectors, env surfaces, and harness setup, then recommend the highest-value ECC-native skills, hooks, agents, and operator workflows. Use when the user wants help setting up Claude Code or understanding what capabilities are actually available in their environment.
ui-demo
Record polished UI demo videos using Playwright. Use when the user asks to create a demo, walkthrough, screen recording, or tutorial video of a web application. Produces WebM videos with visible cursor, natural pacing, and professional feel.
skill-comply
Visualize whether skills, rules, and agent definitions are actually followed — auto-generates scenarios at 3 prompt strictness levels, runs agents, classifies behavioral sequences, and reports compliance rates with full tool call timelines
santa-method
Multi-agent adversarial verification with convergence loop. Two independent review agents must both pass before output ships.