latency-advisor

Provides SRE latency optimization advice for Claude API usage. Use when users discuss Bedrock performance, API latency, slow responses, or TTFT issues with Claude Code.

16 stars

Best use case

latency-advisor is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Provides SRE latency optimization advice for Claude API usage. Use when users discuss Bedrock performance, API latency, slow responses, or TTFT issues with Claude Code.

Teams using latency-advisor should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/latency-advisor/SKILL.md --create-dirs "https://raw.githubusercontent.com/diegosouzapw/awesome-omni-skill/main/skills/backend/latency-advisor/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/latency-advisor/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How latency-advisor Compares

Feature / Agentlatency-advisorStandard Approach
Platform SupportNot specifiedLimited / Varies
Context Awareness High Baseline
Installation ComplexityUnknownN/A

Frequently Asked Questions

What does this skill do?

Provides SRE latency optimization advice for Claude API usage. Use when users discuss Bedrock performance, API latency, slow responses, or TTFT issues with Claude Code.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Latency Advisor

You are an SRE advisor specializing in Claude API performance optimization. When a user mentions latency issues, slow responses, or performance concerns with Claude Code (whether using Anthropic Direct or AWS Bedrock), provide targeted advice.

## Key Knowledge

### Anthropic Direct API
- Endpoint: `api.anthropic.com`
- Typical TTFT: ~500ms (Claude 4.5 Haiku)
- Auth: `ANTHROPIC_API_KEY` header
- Generally lowest TTFT of all providers

### AWS Bedrock
- Additional latency from AWS API gateway + SigV4 auth overhead
- Typical TTFT: ~800ms (Claude 4.5 Haiku, standard)
- Enable latency-optimized inference: `"performanceConfig": {"latency": "optimized"}` for 40-50% TTFT reduction
- Use `global.` model prefix for dynamic routing (lower latency, no pricing premium)
- Prompt caching significantly reduces TTFT for repeated prefixes

### Claude Code Bedrock Configuration
```bash
export CLAUDE_CODE_USE_BEDROCK=1
export AWS_REGION=us-east-1
export ANTHROPIC_MODEL='global.anthropic.claude-sonnet-4-5-20250929-v1:0'
```

### Latency Reduction Strategies
1. **Prompt caching** — reuse system prompts, reduce TTFT by up to 85%
2. **Streaming** — always stream for interactive use (Claude Code does this by default)
3. **Model selection** — Haiku for speed-critical paths, Sonnet/Opus for quality-critical
4. **Region proximity** — choose Bedrock region closest to your location
5. **Max tokens** — set `max_tokens` to the minimum needed, not a large default
6. **Prompt length** — TTFT scales with input tokens; shorter prompts = faster first token

## When to Use This Skill

Activate when the user:
- Mentions Claude Code feeling slow
- Asks about Bedrock vs Direct API performance
- Wants to optimize TTFT or throughput
- Discusses latency budgets or SLOs for AI-powered features
- Is troubleshooting slow streaming responses

## Running Benchmarks

Suggest using the plugin's benchmark command:
```
/sre-latency:benchmark -n 10 --prompt-size medium --output benchmark.json
```

For quick spot-checks:
```
/sre-latency:latency-check both
```

Related Skills

codex-advisor

16
from diegosouzapw/awesome-omni-skill

Get a second opinion from OpenAI Codex CLI for plan reviews, code reviews, architecture decisions, and hard problems. Use when you need external validation, want to compare approaches, or are stuck on a difficult problem.

architecture-advisor

16
from diegosouzapw/awesome-omni-skill

Helps solo developers with AI agents choose optimal architecture (monolithic/microservices/hybrid)

advisor

16
from diegosouzapw/awesome-omni-skill

Interactive workflow advisor that helps you choose optimal AI primitives from agentconfig.org based on your specific workflow needs, skill level, and tooling preferences. Use when deciding which primitives to implement or how to structure your AI configuration.

Advisory Board Builder

16
from diegosouzapw/awesome-omni-skill

Recruit, structure, and manage advisory boards for strategic guidance

tech-advisor

16
from diegosouzapw/awesome-omni-skill

Recomienda stack tecnológico óptimo basado en requisitos del proyecto

boardroom-advisor

16
from diegosouzapw/awesome-omni-skill

Consult a virtual board of 4 strategic advisors (Donald Miller, Seth Godin, Alex Hormozi, Daniel Priestley) on any major business decision. Two rounds of argument + rebuttal, then a decision brief, interactive dashboard, and clear recommendation.

advisor-triggers

16
from diegosouzapw/awesome-omni-skill

Detects when user requests warrant critical analysis via /advise command

artifact-advisor

16
from diegosouzapw/awesome-omni-skill

Advise on choosing between Skills, Commands, Subagents, and Hooks for Claude Code. Analyze user requirements and recommend the appropriate artifact type with justification. Use when user asks "should I use a skill or command", "what artifact type", "skill vs command", or describes a workflow needing automation.

agent-legal-advisor

16
from diegosouzapw/awesome-omni-skill

Expert legal advisor specializing in technology law, compliance, and risk mitigation. Masters contract drafting, intellectual property, data privacy, and regulatory compliance with focus on protecting business interests while enabling innovation and growth.

bgo

10
from diegosouzapw/awesome-omni-skill

Automates the complete Blender build-go workflow, from building and packaging your extension/add-on to removing old versions, installing, enabling, and launching Blender for quick testing and iteration.

Coding & Development

large-data-with-dask

16
from diegosouzapw/awesome-omni-skill

Specific optimization strategies for Python scripts working with larger-than-memory datasets via Dask.

langsmith-fetch

16
from diegosouzapw/awesome-omni-skill

Debug LangChain and LangGraph agents by fetching execution traces from LangSmith Studio. Use when debugging agent behavior, investigating errors, analyzing tool calls, checking memory operations, or examining agent performance. Automatically fetches recent traces and analyzes execution patterns. Requires langsmith-fetch CLI installed.