clade-load-scale

Scale Claude usage for high-throughput applications: batches, queues, concurrency control, and tier upgrades. Use when working with load-scale patterns. Trigger with "anthropic scale", "claude high volume", "anthropic throughput", "scale claude api", "anthropic concurrent requests".


by ComeOnOliver


Best use case

clade-load-scale is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Teams using clade-load-scale should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

You only need a quick one-off answer and do not need a reusable workflow.
You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/clade-load-scale/SKILL.md --create-dirs "https://raw.githubusercontent.com/ComeOnOliver/skillshub/main/skills/jeremylongshore/claude-code-plugins-plus-skills/clade-load-scale/SKILL.md"

Manual Installation

Download SKILL.md from GitHub
Place it in .claude/skills/clade-load-scale/SKILL.md inside your project
Restart your AI agent — it will auto-discover the skill

How clade-load-scale Compares

Feature / Agent	clade-load-scale	Standard Approach
Platform Support	Not specified	Limited / Varies
Context Awareness	High	Baseline
Installation Complexity	Unknown	N/A

Frequently Asked Questions

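What does this skill do?

It helps scale Claude usage for high-throughput applications through Message Batches, request queues with concurrency control, tier upgrades, and model selection.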

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

SKILL.md Source

# Anthropic Load & Scale

## Overview
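Scale Claude usage for high-throughput applications. Covers four strategies: Message Batches (up to 100,000 requests or 256 MB per batch, 50% discount, processed asynchronously and not counted against the standard Messages API rate limits), request queues with concurrency control via p-limit, tier upgrades (Tier 1-4 plus Scale), and model selection for throughput (Haiku is roughly 3-4x faster than Sonnet, though the ratio depends on the workload).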


## Scaling Strategies

## Instructions

### Step 1: Message Batches (Best for Bulk)
```typescript
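import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment
const items: string[] = ['First prompt', 'Second prompt']; // placeholder inputs

// One batch holds many requests and is billed at a 50% discount.
// Batches run asynchronously under their own limits; see the Message Batches docs.
const batch = await client.messages.batches.create({
  requests: items.map((item, i) => ({
    custom_id: `${i}`, // used to match each result back to its input
    params: { model: 'claude-sonnet-4-20250514', max_tokens: 1024, messages: [{ role: 'user', content: item }] },
  })),
});
console.log(batch.id, batch.processing_status); // poll until processing_status is 'ended'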
```
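A workload larger than one batch can hold has to be split before submission. The sketch below shows one way to do that. `chunk`, `toBatchRequests`, and the `item-` prefix are hypothetical names introduced for illustration, and the per-batch maximum is left as a parameter because it should be taken from the current Message Batches documentation.

```typescript
// Hypothetical helper: split a large input list into batch-sized chunks.
// maxPerBatch is an assumption; set it from the limit in the current docs.
function chunk<T>(items: readonly T[], maxPerBatch: number): T[][] {
  if (!Number.isInteger(maxPerBatch) || maxPerBatch < 1) {
    throw new RangeError('maxPerBatch must be a positive integer');
  }
  const chunks: T[][] = [];
  for (let start = 0; start < items.length; start += maxPerBatch) {
    chunks.push(items.slice(start, start + maxPerBatch));
  }
  return chunks;
}

// Hypothetical helper: build request objects whose custom_id encodes the
// item's position in the full input list, so results from different
// batches can be matched back to the original input.
function toBatchRequests(items: readonly string[], offset: number, model: string) {
  return items.map((item, i) => ({
    custom_id: `item-${offset + i}`,
    params: {
      model,
      max_tokens: 1024,
      messages: [{ role: 'user' as const, content: item }],
    },
  }));
}
```

Each chunk would then be passed to `client.messages.batches.create({ requests })` in turn, with `offset` advanced by the size of the chunks already submitted.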

### Step 2: Request Queue with Concurrency Control
```typescript
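import Anthropic from '@anthropic-ai/sdk';
import pLimit from 'p-limit';

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment
const inputs: string[] = ['First prompt', 'Second prompt']; // placeholder inputs

// Choose a concurrency that your tier's RPM and TPM limits can sustain
const limit = pLimit(10); // at most 10 requests in flight at once

// Promise.all rejects on the first failure; use Promise.allSettled to keep partial results
const results = await Promise.all(
  inputs.map(input =>
    limit(() => client.messages.create({
      model: 'claude-sonnet-4-20250514',
      max_tokens: 1024,
      messages: [{ role: 'user', content: input }],
    }))
  )
);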
```
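To make the queue's behaviour concrete, here is a dependency-free sketch of a concurrency-limited map. It is not how p-limit is implemented. `mapWithConcurrency` and `Settled` are hypothetical names, and the function records each failure instead of rejecting on the first one.

```typescript
type Settled<R> =
  | { status: 'fulfilled'; value: R }
  | { status: 'rejected'; reason: unknown };

// Runs `worker` over `items` with at most `concurrency` calls in flight.
// Results keep input order; a failed item does not abort the others.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  concurrency: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<Settled<R>[]> {
  if (!Number.isInteger(concurrency) || concurrency < 1) {
    throw new RangeError('concurrency must be a positive integer');
  }
  const results: Settled<R>[] = new Array(items.length);
  let next = 0; // index of the next unclaimed item

  async function runner(): Promise<void> {
    while (next < items.length) {
      const index = next++; // claimed synchronously, before any await
      try {
        const value = await worker(items[index] as T, index);
        results[index] = { status: 'fulfilled', value };
      } catch (reason) {
        results[index] = { status: 'rejected', reason };
      }
    }
  }

  // Start a fixed pool of runners; each pulls items until none remain.
  const pool = Array.from({ length: Math.min(concurrency, items.length) }, () => runner());
  await Promise.all(pool);
  return results;
}
```

A caller would pass a worker that wraps `client.messages.create(...)`, then filter the settled results by `status` to retry or log the failures.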

### Step 3: Tier Upgrades
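Tiers advance automatically as your cumulative credit purchases cross each threshold. The RPM and input TPM figures below are indicative: they vary by model and change over time, so confirm them on the Rate Limits page.

| Tier | RPM | Input TPM | How to Qualify |
|------|-----|-----------|----------------|
| 1 | 50 | 40K | $5 credit purchase |
| 2 | 1,000 | 80K | $40+ cumulative credit purchases |
| 3 | 2,000 | 160K | $200+ cumulative credit purchases |
| 4 | 4,000 | 400K | $400+ cumulative credit purchases |
| Scale | Custom | Custom | Contact sales |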
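The tier figures can be turned into a concurrency setting with some simple arithmetic. The sketch below uses Little's law (requests in flight equal the arrival rate multiplied by the time each request takes). `effectiveRpm`, `maxConcurrency`, the 80% headroom, and the example workload (2,000 input tokens and 6 s per request) are all assumptions for illustration.

```typescript
// Requests per minute actually achievable once the input-token limit is considered.
function effectiveRpm(rpm: number, inputTpm: number, avgInputTokens: number): number {
  return Math.min(rpm, Math.floor(inputTpm / avgInputTokens));
}

// Little's law: requests in flight = arrival rate x time per request.
// headroomPct leaves a safety margin below the limit (80 means use 80% of it).
function maxConcurrency(rpm: number, avgLatencyMs: number, headroomPct = 80): number {
  const inFlight = (rpm * avgLatencyMs * headroomPct) / (60_000 * 100);
  return Math.max(1, Math.floor(inFlight));
}

// Tier 1 figures from the table: 50 RPM, 40K input TPM.
// Assumed workload: 2,000 input tokens and 6 s latency per request.
const sustainableRpm = effectiveRpm(50, 40_000, 2_000); // 20: the token limit binds first
console.log(maxConcurrency(sustainableRpm, 6_000));     // 1
console.log(maxConcurrency(1_000, 6_000));              // 80 (Tier 2 RPM alone)
```

The result is a starting value for `pLimit(...)`. Measured latency and token usage from production traffic should replace the assumed numbers.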

### Step 4: Model Selection for Throughput
```typescript
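// Haiku generally responds faster than Sonnet, and Sonnet faster than Opus. The 3-4x and 8x
// ratios are rough estimates, so benchmark on your own workload.
// Use the fastest model that meets quality requirements.
const taskComplexity: 'simple' | 'complex' = 'simple'; // placeholder: set per task
const model = taskComplexity === 'simple' ? 'claude-haiku-4-5-20251001' : 'claude-sonnet-4-20250514';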
```
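Relative model speed depends on prompt size, output length, and current load, so it may be worth measuring rather than assuming a ratio. A minimal sketch follows. `median` and `medianLatencyMs` are hypothetical helpers, and the `call` argument stands in for whatever request you want to time.

```typescript
// Median of a list of numbers (the input array is not modified).
function median(values: readonly number[]): number {
  if (values.length === 0) throw new RangeError('median of empty list');
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? (sorted[mid] as number)
    : ((sorted[mid - 1] as number) + (sorted[mid] as number)) / 2;
}

// Runs `call` sequentially `runs` times and returns the median wall-clock latency in ms.
async function medianLatencyMs(call: () => Promise<unknown>, runs = 5): Promise<number> {
  const samples: number[] = [];
  for (let i = 0; i < runs; i++) {
    const start = Date.now();
    await call();
    samples.push(Date.now() - start);
  }
  return median(samples);
}
```

Timing the same prompt against each candidate model, for example `medianLatencyMs(() => client.messages.create({ model, ... }))`, gives a ratio specific to your workload.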

## Monitoring at Scale
```typescript
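// Track throughput metrics
let requestCount = 0;
let tokenCount = 0;

// Call after every successful response; `usage` is the usage object on a Messages API response
function recordUsage(usage: { input_tokens: number; output_tokens: number }): void {
  requestCount += 1;
  tokenCount += usage.input_tokens + usage.output_tokens;
}

// Log the counters once a minute, then reset them
setInterval(() => {
  console.log(`Throughput: ${requestCount} req/min, ${tokenCount} tokens/min`);
  requestCount = 0;
  tokenCount = 0;
}, 60_000);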
```
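Fixed one-minute buckets can hide a burst that straddles a reset. A sliding window is one alternative, sketched below. `ThroughputTracker` is a hypothetical class, and its clock is injectable only so that it can be tested without waiting for real time to pass.

```typescript
// Sliding-window throughput tracker: reports requests and tokens seen
// during the last `windowMs` milliseconds.
class ThroughputTracker {
  private events: { at: number; tokens: number }[] = [];
  private readonly windowMs: number;
  private readonly now: () => number;

  constructor(windowMs = 60_000, now: () => number = () => Date.now()) {
    this.windowMs = windowMs;
    this.now = now;
  }

  // Call once per completed request with its total token count.
  record(tokens: number): void {
    this.events.push({ at: this.now(), tokens });
  }

  // Drops events older than the window, then totals what remains.
  snapshot(): { requests: number; tokens: number } {
    const cutoff = this.now() - this.windowMs;
    this.events = this.events.filter((e) => e.at > cutoff);
    return {
      requests: this.events.length,
      tokens: this.events.reduce((sum, e) => sum + e.tokens, 0),
    };
  }
}
```

Calling `snapshot()` before dispatching a request gives the current rate, which can be compared against the tier's RPM and TPM limits.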

## Output
- Batch processing configured for bulk workloads (50% cheaper, governed by batch-specific limits rather than the standard Messages API rate limits)
- Concurrency-controlled request queue matching rate limit tier
- Rate limit tier upgraded by increasing cumulative spend
- Throughput metrics tracked (requests/min, tokens/min)

## Error Handling
| Error | Cause | Solution |
|-------|-------|----------|
| API Error | The API rejected the request, for example HTTP 429 when a rate limit is exceeded | Check the error type and status code; see `clade-common-errors` |
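At high volume, rate-limit and server errors are usually handled by retrying with backoff. The official SDK already retries some errors automatically, so the sketch below is only an illustration of the pattern. `withRetry` and `isRetryable` are hypothetical helpers, and the code assumes that thrown errors carry a numeric `status` property holding the HTTP status code.

```typescript
// Assumption: errors expose the HTTP status code as `status`.
// 429 (rate limited) and 5xx (server-side) are treated as retryable.
function isRetryable(err: unknown): boolean {
  const status = (err as { status?: unknown } | null)?.status;
  return status === 429 || (typeof status === 'number' && status >= 500);
}

// Calls `fn` up to `maxAttempts` times, sleeping between attempts with
// exponential backoff and full jitter. Non-retryable errors are rethrown at once.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 5,
  baseDelayMs = 500,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= maxAttempts || !isRetryable(err)) throw err;
      // Wait a random time between 0 and baseDelayMs * 2^(attempt - 1).
      const cap = baseDelayMs * 2 ** (attempt - 1);
      await new Promise<void>((resolve) => setTimeout(resolve, Math.random() * cap));
    }
  }
}
```

Jitter spreads retries from concurrent workers so they do not all hit the API again at the same moment.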

## Examples
See Message Batches example, p-limit concurrency control, Tier Upgrades table, and Monitoring at Scale metrics tracking above.

## Resources
- [Rate Limits](https://docs.anthropic.com/en/api/rate-limits)
- [Message Batches](https://docs.anthropic.com/en/api/creating-message-batches)

## Next Steps
See `clade-reliability-patterns` for fault-tolerant high-scale patterns.

## Prerequisites
- Completed `clade-rate-limits` for understanding tier limits
- High-volume use case requiring more than basic tier throughput
- For batches: tolerance for async processing (results can take up to 24 hours, though many batches finish sooner)
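Because batch results arrive asynchronously, callers typically poll until the batch has finished. A minimal sketch follows. `waitForBatch` and the `BatchStatusSource` interface are hypothetical: the interface is a narrow structural type meant to match the shape of the SDK's batch retrieval call, so the function can also be exercised with a fake client.

```typescript
// Minimal structural type for anything that can report a batch's status.
interface BatchStatusSource {
  retrieve(batchId: string): Promise<{ processing_status: string }>;
}

// Polls until processing_status is 'ended', or throws once the next poll
// would fall after the timeout.
async function waitForBatch(
  batches: BatchStatusSource,
  batchId: string,
  pollIntervalMs = 60_000,
  timeoutMs = 24 * 60 * 60 * 1000,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const { processing_status } = await batches.retrieve(batchId);
    if (processing_status === 'ended') return;
    if (Date.now() + pollIntervalMs > deadline) {
      throw new Error(`Batch ${batchId} did not finish within ${timeoutMs} ms`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, pollIntervalMs));
  }
}
```

With the official SDK this would likely be called as `await waitForBatch(client.messages.batches, batch.id)`, followed by reading the per-request results and matching them to inputs by `custom_id`.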

Related Skills

running-load-tests

from ComeOnOliver/skillshub

Create and execute load tests for performance validation using k6, JMeter, and Artillery. Use when validating application performance under load conditions or identifying bottlenecks. Trigger with phrases like "run load test", "create stress test", or "validate performance under load".

load-testing-apis

from ComeOnOliver/skillshub

Execute comprehensive load and stress testing to validate API performance and scalability. Use when validating API performance under load. Trigger with phrases like "load test the API", "stress test API", or "benchmark API performance".

load-test-scenario-planner

from ComeOnOliver/skillshub

Load Test Scenario Planner - Auto-activating skill for Performance Testing. Triggers on: load test scenario planner. Part of the Performance Testing skill category.

testing-load-balancers

from ComeOnOliver/skillshub

This skill enables Claude to test load balancing strategies. It validates traffic distribution across backend servers, tests failover scenarios when servers become unavailable, verifies sticky sessions, and assesses health check functionality. Use this skill when the user asks to "test load balancer", "validate traffic distribution", "test failover", "verify sticky sessions", or "test health checks". It is specifically designed for testing load balancing configurations using the `load-balancer-tester` plugin.

configuring-load-balancers

from ComeOnOliver/skillshub

This skill configures load balancers, including ALB, NLB, Nginx, and HAProxy. It generates production-ready configurations based on specified requirements and infrastructure. Use this skill when the user asks to "configure load balancer", "create load balancer config", "generate nginx config", "setup HAProxy", or mentions specific load balancer types like "ALB" or "NLB". It's ideal for DevOps tasks, infrastructure automation, and generating load balancer configurations for different environments.

lazy-loading-implementer

from ComeOnOliver/skillshub

Lazy Loading Implementer - Auto-activating skill for Frontend Development. Triggers on: lazy loading implementer. Part of the Frontend Development skill category.

incremental-load-setup

from ComeOnOliver/skillshub

Incremental Load Setup - Auto-activating skill for Data Pipelines. Triggers on: incremental load setup. Part of the Data Pipelines skill category.

exa-load-scale

from ComeOnOliver/skillshub

Implement Exa load testing, capacity planning, and scaling strategies. Use when running performance tests, planning capacity for Exa integrations, or designing high-throughput search architectures. Trigger with phrases like "exa load test", "exa scale", "exa capacity", "exa k6", "exa benchmark", "exa throughput".

dataset-loader-creator

from ComeOnOliver/skillshub

Dataset Loader Creator - Auto-activating skill for ML Training. Triggers on: dataset loader creator. Part of the ML Training skill category.

customerio-load-scale

from ComeOnOliver/skillshub

Implement Customer.io load testing and horizontal scaling. Use when preparing for high traffic, running load tests, or designing queue-based architectures for scale. Trigger: "customer.io load test", "customer.io scale", "customer.io high volume", "customer.io k6", "customer.io performance test".

clay-load-scale

from ComeOnOliver/skillshub

Scale Clay enrichment pipelines for high-volume processing (10K-100K+ leads/month). Use when planning capacity for large enrichment runs, optimizing batch processing, or designing high-volume Clay architectures. Trigger with phrases like "clay scale", "clay high volume", "clay large batch", "clay capacity planning", "clay 100k leads", "clay bulk enrichment".

clade-webhooks-events

from ComeOnOliver/skillshub

Use Anthropic Message Batches for async bulk processing and event handling. Use when working with webhooks-events patterns. Trigger with "anthropic batches", "claude batch api", "anthropic async", "bulk claude processing", "anthropic webhook".