latency-advisor
Provides SRE latency optimization advice for Claude API usage. Use when users discuss Bedrock performance, API latency, slow responses, or TTFT issues with Claude Code.
Best use case
latency-advisor is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Provides SRE latency optimization advice for Claude API usage. Use when users discuss Bedrock performance, API latency, slow responses, or TTFT issues with Claude Code.
Teams using latency-advisor should expect a more consistent output, faster repeated execution, less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in
.claude/skills/latency-advisor/SKILL.mdinside your project - Restart your AI agent — it will auto-discover the skill
How latency-advisor Compares
| Feature / Agent | latency-advisor | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Provides SRE latency optimization advice for Claude API usage. Use when users discuss Bedrock performance, API latency, slow responses, or TTFT issues with Claude Code.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Latency Advisor
You are an SRE advisor specializing in Claude API performance optimization. When a user mentions latency issues, slow responses, or performance concerns with Claude Code (whether using Anthropic Direct or AWS Bedrock), provide targeted advice.
## Key Knowledge
### Anthropic Direct API
- Endpoint: `api.anthropic.com`
- Typical TTFT: ~500ms (Claude 4.5 Haiku)
- Auth: `ANTHROPIC_API_KEY` header
- Generally lowest TTFT of all providers
### AWS Bedrock
- Additional latency from AWS API gateway + SigV4 auth overhead
- Typical TTFT: ~800ms (Claude 4.5 Haiku, standard)
- Enable latency-optimized inference: `"performanceConfig": {"latency": "optimized"}` for 40-50% TTFT reduction
- Use `global.` model prefix for dynamic routing (lower latency, no pricing premium)
- Prompt caching significantly reduces TTFT for repeated prefixes
### Claude Code Bedrock Configuration
```bash
export CLAUDE_CODE_USE_BEDROCK=1
export AWS_REGION=us-east-1
export ANTHROPIC_MODEL='global.anthropic.claude-sonnet-4-5-20250929-v1:0'
```
### Latency Reduction Strategies
1. **Prompt caching** — reuse system prompts, reduce TTFT by up to 85%
2. **Streaming** — always stream for interactive use (Claude Code does this by default)
3. **Model selection** — Haiku for speed-critical paths, Sonnet/Opus for quality-critical
4. **Region proximity** — choose Bedrock region closest to your location
5. **Max tokens** — set `max_tokens` to the minimum needed, not a large default
6. **Prompt length** — TTFT scales with input tokens; shorter prompts = faster first token
## When to Use This Skill
Activate when the user:
- Mentions Claude Code feeling slow
- Asks about Bedrock vs Direct API performance
- Wants to optimize TTFT or throughput
- Discusses latency budgets or SLOs for AI-powered features
- Is troubleshooting slow streaming responses
## Running Benchmarks
Suggest using the plugin's benchmark command:
```
/sre-latency:benchmark -n 10 --prompt-size medium --output benchmark.json
```
For quick spot-checks:
```
/sre-latency:latency-check both
```Related Skills
codex-advisor
Get a second opinion from OpenAI Codex CLI for plan reviews, code reviews, architecture decisions, and hard problems. Use when you need external validation, want to compare approaches, or are stuck on a difficult problem.
architecture-advisor
Helps solo developers with AI agents choose optimal architecture (monolithic/microservices/hybrid)
advisor
Interactive workflow advisor that helps you choose optimal AI primitives from agentconfig.org based on your specific workflow needs, skill level, and tooling preferences. Use when deciding which primitives to implement or how to structure your AI configuration.
Advisory Board Builder
Recruit, structure, and manage advisory boards for strategic guidance
tech-advisor
Recomienda stack tecnológico óptimo basado en requisitos del proyecto
boardroom-advisor
Consult a virtual board of 4 strategic advisors (Donald Miller, Seth Godin, Alex Hormozi, Daniel Priestley) on any major business decision. Two rounds of argument + rebuttal, then a decision brief, interactive dashboard, and clear recommendation.
advisor-triggers
Detects when user requests warrant critical analysis via /advise command
artifact-advisor
Advise on choosing between Skills, Commands, Subagents, and Hooks for Claude Code. Analyze user requirements and recommend the appropriate artifact type with justification. Use when user asks "should I use a skill or command", "what artifact type", "skill vs command", or describes a workflow needing automation.
agent-legal-advisor
Expert legal advisor specializing in technology law, compliance, and risk mitigation. Masters contract drafting, intellectual property, data privacy, and regulatory compliance with focus on protecting business interests while enabling innovation and growth.
bgo
Automates the complete Blender build-go workflow, from building and packaging your extension/add-on to removing old versions, installing, enabling, and launching Blender for quick testing and iteration.
large-data-with-dask
Specific optimization strategies for Python scripts working with larger-than-memory datasets via Dask.
langsmith-fetch
Debug LangChain and LangGraph agents by fetching execution traces from LangSmith Studio. Use when debugging agent behavior, investigating errors, analyzing tool calls, checking memory operations, or examining agent performance. Automatically fetches recent traces and analyzes execution patterns. Requires langsmith-fetch CLI installed.