latency-advisor

Provides SRE latency optimization advice for Claude API usage. Use when users discuss Bedrock performance, API latency, slow responses, or TTFT issues with Claude Code.

16 stars

bydiegosouzapw

View on GitHub Installation ↓

Best use case

latency-advisor is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Provides SRE latency optimization advice for Claude API usage. Use when users discuss Bedrock performance, API latency, slow responses, or TTFT issues with Claude Code.

Teams using latency-advisor should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/latency-advisor/SKILL.md --create-dirs "https://raw.githubusercontent.com/diegosouzapw/awesome-omni-skill/main/skills/backend/latency-advisor/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/latency-advisor/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How latency-advisor Compares

Feature / Agent	latency-advisor	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

What does this skill do?

Provides SRE latency optimization advice for Claude API usage. Use when users discuss Bedrock performance, API latency, slow responses, or TTFT issues with Claude Code.

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Latency Advisor

You are an SRE advisor specializing in Claude API performance optimization. When a user mentions latency issues, slow responses, or performance concerns with Claude Code (whether using Anthropic Direct or AWS Bedrock), provide targeted advice.

## Key Knowledge

### Anthropic Direct API
- Endpoint: `api.anthropic.com`
- Typical TTFT: ~500ms (Claude 4.5 Haiku)
- Auth: `ANTHROPIC_API_KEY` header
- Generally lowest TTFT of all providers

### AWS Bedrock
- Additional latency from AWS API gateway + SigV4 auth overhead
- Typical TTFT: ~800ms (Claude 4.5 Haiku, standard)
- Enable latency-optimized inference: `"performanceConfig": {"latency": "optimized"}` for 40-50% TTFT reduction
- Use `global.` model prefix for dynamic routing (lower latency, no pricing premium)
- Prompt caching significantly reduces TTFT for repeated prefixes

### Claude Code Bedrock Configuration
```bash
export CLAUDE_CODE_USE_BEDROCK=1
export AWS_REGION=us-east-1
export ANTHROPIC_MODEL='global.anthropic.claude-sonnet-4-5-20250929-v1:0'
```

### Latency Reduction Strategies
1. **Prompt caching** — reuse system prompts, reduce TTFT by up to 85%
2. **Streaming** — always stream for interactive use (Claude Code does this by default)
3. **Model selection** — Haiku for speed-critical paths, Sonnet/Opus for quality-critical
4. **Region proximity** — choose Bedrock region closest to your location
5. **Max tokens** — set `max_tokens` to the minimum needed, not a large default
6. **Prompt length** — TTFT scales with input tokens; shorter prompts = faster first token

## When to Use This Skill

Activate when the user:
- Mentions Claude Code feeling slow
- Asks about Bedrock vs Direct API performance
- Wants to optimize TTFT or throughput
- Discusses latency budgets or SLOs for AI-powered features
- Is troubleshooting slow streaming responses

## Running Benchmarks

Suggest using the plugin's benchmark command:
```
/sre-latency:benchmark -n 10 --prompt-size medium --output benchmark.json
```

For quick spot-checks:
```
/sre-latency:latency-check both
```

Related Skills

codex-advisor

from diegosouzapw/awesome-omni-skill

Get a second opinion from OpenAI Codex CLI for plan reviews, code reviews, architecture decisions, and hard problems. Use when you need external validation, want to compare approaches, or are stuck on a difficult problem.

architecture-advisor

from diegosouzapw/awesome-omni-skill

Helps solo developers with AI agents choose optimal architecture (monolithic/microservices/hybrid)

advisor

from diegosouzapw/awesome-omni-skill

Interactive workflow advisor that helps you choose optimal AI primitives from agentconfig.org based on your specific workflow needs, skill level, and tooling preferences. Use when deciding which primitives to implement or how to structure your AI configuration.

Advisory Board Builder

from diegosouzapw/awesome-omni-skill

Recruit, structure, and manage advisory boards for strategic guidance

tech-advisor

from diegosouzapw/awesome-omni-skill

Recomienda stack tecnológico óptimo basado en requisitos del proyecto

boardroom-advisor

from diegosouzapw/awesome-omni-skill

Consult a virtual board of 4 strategic advisors (Donald Miller, Seth Godin, Alex Hormozi, Daniel Priestley) on any major business decision. Two rounds of argument + rebuttal, then a decision brief, interactive dashboard, and clear recommendation.

advisor-triggers

from diegosouzapw/awesome-omni-skill

Detects when user requests warrant critical analysis via /advise command

artifact-advisor

from diegosouzapw/awesome-omni-skill

Advise on choosing between Skills, Commands, Subagents, and Hooks for Claude Code. Analyze user requirements and recommend the appropriate artifact type with justification. Use when user asks "should I use a skill or command", "what artifact type", "skill vs command", or describes a workflow needing automation.

agent-legal-advisor

from diegosouzapw/awesome-omni-skill

Expert legal advisor specializing in technology law, compliance, and risk mitigation. Masters contract drafting, intellectual property, data privacy, and regulatory compliance with focus on protecting business interests while enabling innovation and growth.

bgo

from diegosouzapw/awesome-omni-skill

Automates the complete Blender build-go workflow, from building and packaging your extension/add-on to removing old versions, installing, enabling, and launching Blender for quick testing and iteration.

Coding & Development

large-data-with-dask

from diegosouzapw/awesome-omni-skill

Specific optimization strategies for Python scripts working with larger-than-memory datasets via Dask.

langsmith-fetch

from diegosouzapw/awesome-omni-skill

Debug LangChain and LangGraph agents by fetching execution traces from LangSmith Studio. Use when debugging agent behavior, investigating errors, analyzing tool calls, checking memory operations, or examining agent performance. Automatically fetches recent traces and analyzes execution patterns. Requires langsmith-fetch CLI installed.