ai-models
Latest AI models reference - Claude, OpenAI, Gemini, Eleven Labs, Replicate
Best use case
ai-models is best used when you need a repeatable AI agent workflow instead of a one-off prompt.
Latest AI models reference - Claude, OpenAI, Gemini, Eleven Labs, Replicate
Teams using ai-models should expect a more consistent output, faster repeated execution, less prompt rewriting.
When to use this skill
- You want a reusable workflow that can be run more than once with consistent structure.
When not to use this skill
- You only need a quick one-off answer and do not need a reusable workflow.
- You cannot install or maintain the underlying files, dependencies, or repository context.
Installation
Claude Code / Cursor / Codex
Manual Installation
- Download SKILL.md from GitHub
- Place it in
.claude/skills/ai-models/SKILL.mdinside your project - Restart your AI agent — it will auto-discover the skill
How ai-models Compares
| Feature / Agent | ai-models | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |
Frequently Asked Questions
What does this skill do?
Latest AI models reference - Claude, OpenAI, Gemini, Eleven Labs, Replicate
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# AI Models Reference Skill
*Load with: base.md + llm-patterns.md*
**Last Updated: December 2025**
## Philosophy
**Use the right model for the job.** Bigger isn't always better - match model capabilities to task requirements. Consider cost, latency, and accuracy tradeoffs.
## Model Selection Matrix
| Task | Recommended | Why |
|------|-------------|-----|
| Complex reasoning | Claude Opus 4.5, o3, Gemini 3 Pro | Highest accuracy |
| Fast chat/completion | Claude Haiku, GPT-4.1 mini, Gemini Flash | Low latency, cheap |
| Code generation | Claude Sonnet 4.5, Codestral, GPT-4.1 | Strong coding |
| Vision/images | Claude Sonnet, GPT-4o, Gemini 3 Pro | Multimodal |
| Embeddings | text-embedding-3-small, Voyage | Cost-effective |
| Voice synthesis | Eleven Labs v3, OpenAI TTS | Natural sounding |
| Image generation | FLUX.2, DALL-E 3, SD 3.5 | Different styles |
---
## Anthropic (Claude)
### Documentation
- **API Docs**: https://docs.anthropic.com
- **Models Overview**: https://docs.anthropic.com/en/docs/about-claude/models/overview
- **Pricing**: https://www.anthropic.com/pricing
### Latest Models (December 2025)
```typescript
const CLAUDE_MODELS = {
// Flagship - highest capability
opus: 'claude-opus-4-5-20251101',
// Balanced - best for most tasks
sonnet: 'claude-sonnet-4-5-20250929',
// Previous generation (still excellent)
opus4: 'claude-opus-4-20250514',
sonnet4: 'claude-sonnet-4-20250514',
// Fast & cheap - high volume tasks
haiku: 'claude-haiku-3-5-20241022',
} as const;
```
### Usage
```typescript
import Anthropic from '@anthropic-ai/sdk';
const anthropic = new Anthropic({
apiKey: process.env.ANTHROPIC_API_KEY,
});
const response = await anthropic.messages.create({
model: 'claude-sonnet-4-5-20250929',
max_tokens: 1024,
messages: [
{ role: 'user', content: 'Hello, Claude!' }
],
});
```
### Model Selection
```
claude-opus-4-5-20251101 (Opus 4.5)
├── Best for: Complex analysis, research, nuanced writing
├── Context: 200K tokens
├── Cost: $5/$25 per 1M tokens (input/output)
└── Use when: Accuracy matters most
claude-sonnet-4-5-20250929 (Sonnet 4.5)
├── Best for: Code, general tasks, balanced performance
├── Context: 200K tokens
├── Cost: $3/$15 per 1M tokens
└── Use when: Default choice for most applications
claude-haiku-3-5-20241022 (Haiku 3.5)
├── Best for: Classification, extraction, high-volume
├── Context: 200K tokens
├── Cost: $0.25/$1.25 per 1M tokens
└── Use when: Speed and cost matter most
```
---
## OpenAI
### Documentation
- **API Docs**: https://platform.openai.com/docs
- **Models**: https://platform.openai.com/docs/models
- **Pricing**: https://openai.com/pricing
### Latest Models (December 2025)
```typescript
const OPENAI_MODELS = {
// GPT-5 series (latest)
gpt5: 'gpt-5.2',
gpt5Mini: 'gpt-5-mini',
// GPT-4.1 series (recommended for most)
gpt41: 'gpt-4.1',
gpt41Mini: 'gpt-4.1-mini',
gpt41Nano: 'gpt-4.1-nano',
// Reasoning models (o-series)
o3: 'o3',
o3Pro: 'o3-pro',
o4Mini: 'o4-mini',
// Legacy but still useful
gpt4o: 'gpt-4o', // Still has audio support
gpt4oMini: 'gpt-4o-mini',
// Embeddings
embeddingSmall: 'text-embedding-3-small',
embeddingLarge: 'text-embedding-3-large',
// Image generation
dalle3: 'dall-e-3',
gptImage: 'gpt-image-1',
// Audio
tts: 'tts-1',
ttsHd: 'tts-1-hd',
whisper: 'whisper-1',
} as const;
```
### Usage
```typescript
import OpenAI from 'openai';
const openai = new OpenAI({
apiKey: process.env.OPENAI_API_KEY,
});
// Chat completion
const response = await openai.chat.completions.create({
model: 'gpt-4.1',
messages: [
{ role: 'user', content: 'Hello!' }
],
});
// With vision
const visionResponse = await openai.chat.completions.create({
model: 'gpt-4.1',
messages: [
{
role: 'user',
content: [
{ type: 'text', text: 'What is in this image?' },
{ type: 'image_url', image_url: { url: 'https://...' } },
],
},
],
});
// Embeddings
const embedding = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: 'Your text here',
});
```
### Model Selection
```
o3 / o3-pro
├── Best for: Math, coding, complex multi-step reasoning
├── Context: 200K tokens
├── Cost: Premium pricing
└── Use when: Hardest problems, need chain-of-thought
gpt-4.1
├── Best for: General tasks, coding, instruction following
├── Context: 1M tokens (!)
├── Cost: Lower than GPT-4o
└── Use when: Default choice, replaces GPT-4o
gpt-4.1-mini / gpt-4.1-nano
├── Best for: High-volume, cost-sensitive
├── Context: 1M tokens
├── Cost: Very low
└── Use when: Simple tasks at scale
o4-mini
├── Best for: Fast reasoning at low cost
├── Context: 200K tokens
├── Cost: Budget reasoning
└── Use when: Need reasoning but cost-conscious
```
---
## Google (Gemini)
### Documentation
- **API Docs**: https://ai.google.dev/docs
- **Models**: https://ai.google.dev/gemini-api/docs/models/gemini
- **Pricing**: https://ai.google.dev/pricing
### Latest Models (December 2025)
```typescript
const GEMINI_MODELS = {
// Gemini 3 (Latest)
gemini3Pro: 'gemini-3-pro-preview',
gemini3ProImage: 'gemini-3-pro-image-preview',
gemini3Flash: 'gemini-3-flash-preview',
// Gemini 2.5 (Stable)
gemini25Pro: 'gemini-2.5-pro',
gemini25Flash: 'gemini-2.5-flash',
gemini25FlashLite: 'gemini-2.5-flash-lite',
// Specialized
gemini25FlashTTS: 'gemini-2.5-flash-preview-tts',
gemini25FlashAudio: 'gemini-2.5-flash-native-audio-preview-12-2025',
// Previous generation
gemini2Flash: 'gemini-2.0-flash',
} as const;
```
### Usage
```typescript
import { GoogleGenerativeAI } from '@google/generative-ai';
const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY);
const model = genAI.getGenerativeModel({ model: 'gemini-2.5-flash' });
const result = await model.generateContent('Hello!');
const response = result.response.text();
// With vision
const visionModel = genAI.getGenerativeModel({ model: 'gemini-2.5-pro' });
const imagePart = {
inlineData: {
data: base64Image,
mimeType: 'image/jpeg',
},
};
const result = await visionModel.generateContent(['Describe this:', imagePart]);
```
### Model Selection
```
gemini-3-pro-preview
├── Best for: "Best model in the world for multimodal"
├── Context: 2M tokens
├── Cost: Premium
└── Use when: Need absolute best quality
gemini-2.5-pro
├── Best for: State-of-the-art thinking, complex tasks
├── Context: 2M tokens
├── Cost: $1.25/$5 per 1M tokens
└── Use when: Long context, complex reasoning
gemini-2.5-flash
├── Best for: Fast, balanced performance
├── Context: 1M tokens
├── Cost: $0.075/$0.30 per 1M tokens
└── Use when: Speed and cost matter
gemini-2.5-flash-lite
├── Best for: Ultra-fast, lowest cost
├── Context: 1M tokens
├── Cost: $0.04/$0.15 per 1M tokens
└── Use when: High volume, simple tasks
```
---
## Eleven Labs (Voice)
### Documentation
- **API Docs**: https://elevenlabs.io/docs
- **Models**: https://elevenlabs.io/docs/models
- **Pricing**: https://elevenlabs.io/pricing
### Latest Models (December 2025)
```typescript
const ELEVENLABS_MODELS = {
// Latest - highest quality (alpha)
v3: 'eleven_v3',
// Production ready
multilingualV2: 'eleven_multilingual_v2',
turboV2_5: 'eleven_turbo_v2_5',
// Ultra-low latency
flashV2_5: 'eleven_flash_v2_5',
flashV2: 'eleven_flash_v2', // English only
} as const;
```
### Usage
```typescript
import { ElevenLabsClient } from 'elevenlabs';
const elevenlabs = new ElevenLabsClient({
apiKey: process.env.ELEVENLABS_API_KEY,
});
// Text to speech
const audio = await elevenlabs.textToSpeech.convert('voice-id', {
text: 'Hello, world!',
model_id: 'eleven_turbo_v2_5',
voice_settings: {
stability: 0.5,
similarity_boost: 0.75,
},
});
// Stream audio (for real-time)
const audioStream = await elevenlabs.textToSpeech.convertAsStream('voice-id', {
text: 'Streaming audio...',
model_id: 'eleven_flash_v2_5',
});
```
### Model Selection
```
eleven_v3 (Alpha)
├── Best for: Highest quality, emotional range
├── Latency: ~1s+ (not for real-time)
├── Languages: 74
└── Use when: Quality over speed, pre-rendered
eleven_turbo_v2_5
├── Best for: Balanced quality and speed
├── Latency: ~250-300ms
├── Languages: 32
└── Use when: Good quality with reasonable latency
eleven_flash_v2_5
├── Best for: Real-time, conversational AI
├── Latency: <75ms
├── Languages: 32
└── Use when: Live voice agents, chatbots
```
---
## Replicate
### Documentation
- **API Docs**: https://replicate.com/docs
- **Models**: https://replicate.com/explore
- **Pricing**: https://replicate.com/pricing
### Popular Models (December 2025)
```typescript
const REPLICATE_MODELS = {
// FLUX.2 (Latest - November 2025)
flux2Pro: 'black-forest-labs/flux-2-pro',
flux2Flex: 'black-forest-labs/flux-2-flex',
flux2Dev: 'black-forest-labs/flux-2-dev',
// FLUX.1 (Still excellent)
flux11Pro: 'black-forest-labs/flux-1.1-pro',
fluxKontext: 'black-forest-labs/flux-kontext', // Image editing
fluxSchnell: 'black-forest-labs/flux-schnell',
// Video
stableVideo4D: 'stability-ai/sv4d-2.0',
// Audio
musicgen: 'meta/musicgen',
// LLMs (if needed outside main providers)
llama: 'meta/llama-3.2-90b-vision',
} as const;
```
### Usage
```typescript
import Replicate from 'replicate';
const replicate = new Replicate({
auth: process.env.REPLICATE_API_TOKEN,
});
// Image generation with FLUX.2
const output = await replicate.run('black-forest-labs/flux-2-pro', {
input: {
prompt: 'A serene mountain landscape at sunset',
aspect_ratio: '16:9',
output_format: 'webp',
},
});
// Image editing with Kontext
const edited = await replicate.run('black-forest-labs/flux-kontext', {
input: {
image: 'https://...',
prompt: 'Change the sky to sunset colors',
},
});
```
### Model Selection
```
flux-2-pro
├── Best for: Highest quality, up to 4MP
├── Speed: ~6s
├── Cost: $0.015 + per megapixel
└── Use when: Professional quality needed
flux-2-flex
├── Best for: Fine details, typography
├── Speed: ~22s
├── Cost: $0.06 per megapixel
└── Use when: Need precise control
flux-2-dev (Open source)
├── Best for: Fast generation
├── Speed: ~2.5s
├── Cost: $0.012 per megapixel
└── Use when: Speed over quality
flux-kontext
├── Best for: Image editing with text
├── Speed: Variable
├── Cost: Per run
└── Use when: Edit existing images
```
---
## Stability AI
### Documentation
- **API Docs**: https://platform.stability.ai/docs/api-reference
- **Models**: https://stability.ai/stable-image
- **Pricing**: https://platform.stability.ai/pricing
### Latest Models (December 2025)
```typescript
const STABILITY_MODELS = {
// Image generation
sd35Large: 'sd3.5-large',
sd35LargeTurbo: 'sd3.5-large-turbo',
sd3Medium: 'sd3-medium',
// Video
sv4d: 'sv4d-2.0', // Stable Video 4D 2.0
// Upscaling
upscale: 'esrgan-v1-x2plus',
} as const;
```
### Usage
```typescript
const response = await fetch(
'https://api.stability.ai/v2beta/stable-image/generate/sd3',
{
method: 'POST',
headers: {
'Content-Type': 'application/json',
Authorization: `Bearer ${process.env.STABILITY_API_KEY}`,
},
body: JSON.stringify({
prompt: 'A futuristic city at night',
output_format: 'webp',
aspect_ratio: '16:9',
model: 'sd3.5-large',
}),
}
);
```
---
## Mistral AI
### Documentation
- **API Docs**: https://docs.mistral.ai
- **Models**: https://docs.mistral.ai/getting-started/models
- **Pricing**: https://mistral.ai/technology/#pricing
### Latest Models (December 2025)
```typescript
const MISTRAL_MODELS = {
// Flagship
large: 'mistral-large-latest', // Points to 2411
// Medium tier
medium: 'mistral-medium-2505', // Medium 3
// Small/Fast
small: 'mistral-small-2506', // Small 3.2
// Code specialized
codestral: 'codestral-2508',
devstral: 'devstral-medium-2507',
// Reasoning (Magistral)
magistralMedium: 'magistral-medium-2507',
magistralSmall: 'magistral-small-2507',
// Audio
voxtral: 'voxtral-small-2507',
// OCR
ocr: 'mistral-ocr-2505',
} as const;
```
### Usage
```typescript
import MistralClient from '@mistralai/mistralai';
const client = new MistralClient(process.env.MISTRAL_API_KEY);
const response = await client.chat({
model: 'mistral-large-latest',
messages: [{ role: 'user', content: 'Hello!' }],
});
// Code completion with Codestral
const codeResponse = await client.chat({
model: 'codestral-2508',
messages: [{ role: 'user', content: 'Write a Python function to...' }],
});
```
### Model Selection
```
mistral-large-latest (123B params)
├── Best for: Complex reasoning, knowledge tasks
├── Context: 128K tokens
└── Use when: Need high capability
codestral-2508
├── Best for: Code generation, 80+ languages
├── Speed: 2.5x faster than predecessor
└── Use when: Code-focused tasks
magistral-medium-2507
├── Best for: Multi-step reasoning
├── Specialty: Transparent chain-of-thought
└── Use when: Need reasoning traces
```
---
## Voyage AI (Embeddings)
### Documentation
- **API Docs**: https://docs.voyageai.com
- **Models**: https://docs.voyageai.com/docs/embeddings
- **Pricing**: https://www.voyageai.com/pricing
### Latest Models (December 2025)
```typescript
const VOYAGE_MODELS = {
// General purpose
large2: 'voyage-large-2',
large2Instruct: 'voyage-large-2-instruct',
// Code specialized
code2: 'voyage-code-2',
code3: 'voyage-code-3',
// Multilingual
multilingual2: 'voyage-multilingual-2',
// Domain specific
law2: 'voyage-law-2',
finance2: 'voyage-finance-2',
} as const;
```
### Usage
```typescript
const response = await fetch('https://api.voyageai.com/v1/embeddings', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
Authorization: `Bearer ${process.env.VOYAGE_API_KEY}`,
},
body: JSON.stringify({
model: 'voyage-code-3',
input: ['Your code to embed'],
}),
});
const { data } = await response.json();
const embedding = data[0].embedding;
```
---
## Quick Reference
### Cost Comparison (per 1M tokens, approx.)
| Provider | Cheap | Mid | Premium |
|----------|-------|-----|---------|
| Anthropic | $0.25 (Haiku) | $3 (Sonnet 4.5) | $5 (Opus 4.5) |
| OpenAI | $0.15 (4.1-nano) | $2 (4.1) | $15+ (o3) |
| Google | $0.04 (Flash-lite) | $0.08 (Flash) | $1.25 (Pro) |
| Mistral | $0.25 (Small) | $2.70 (Medium) | $8 (Large) |
### Best For Each Task
```
Reasoning/Analysis → Claude Opus 4.5, o3, Gemini 3 Pro
Code Generation → Claude Sonnet 4.5, Codestral 2508, GPT-4.1
Fast Responses → Claude Haiku, GPT-4.1-mini, Gemini Flash
Long Context → Gemini 2.5 Pro (2M), GPT-4.1 (1M), Claude (200K)
Vision → GPT-4.1, Claude Sonnet, Gemini 3 Pro
Embeddings → Voyage code-3, text-embedding-3-small
Voice Synthesis → Eleven Labs v3/flash, OpenAI TTS
Image Generation → FLUX.2 Pro, DALL-E 3, SD 3.5
Video Generation → Stable Video 4D 2.0, Runway
Image Editing → FLUX Kontext, gpt-image-1
```
### Environment Variables Template
```bash
# .env.example (NEVER commit actual keys)
# LLMs
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
GOOGLE_API_KEY=AI...
MISTRAL_API_KEY=...
# Media
ELEVENLABS_API_KEY=...
REPLICATE_API_TOKEN=r8_...
STABILITY_API_KEY=sk-...
# Embeddings
VOYAGE_API_KEY=pa-...
```
### Model Update Checklist
```
When models update:
□ Check official changelog/blog
□ Update model ID strings
□ Test with existing prompts
□ Compare output quality
□ Check pricing changes
□ Update context limits if changed
```
---
## Sources
- [Anthropic Models](https://docs.anthropic.com/en/docs/about-claude/models/overview)
- [OpenAI Models](https://platform.openai.com/docs/models)
- [OpenAI o3 Announcement](https://openai.com/index/introducing-o3-and-o4-mini/)
- [GPT-4.1 Announcement](https://openai.com/index/gpt-4-1/)
- [Google Gemini Models](https://ai.google.dev/gemini-api/docs/models/gemini)
- [Eleven Labs Models](https://elevenlabs.io/docs/models)
- [Replicate FLUX.2](https://replicate.com/blog/run-flux-2-on-replicate)
- [Mistral Models](https://docs.mistral.ai/getting-started/models)
- [Voyage AI](https://docs.voyageai.com)Related Skills
ios-foundation-models-diag
Use when debugging Foundation Models issues — context exceeded, guardrail violations, slow generation, availability problems, unsupported language, or unexpected output. Systematic diagnostics with production crisis defense.
axiom-foundation-models
Use when implementing on-device AI with Apple's Foundation Models framework — prevents context overflow, blocking UI, wrong model use cases, and manual JSON parsing when @Generable should be used. iOS 26+, macOS 26+, iPadOS 26+, axiom-visionOS 26+
avalonia-viewmodels-zafiro
Optimal ViewModel and Wizard creation patterns for Avalonia using Zafiro and ReactiveUI.
defining-typescript-models
Defines standard TypeScript interfaces for Appwrite Collections. Use when creating new models for Tours, Users, or Bookings to ensure full type safety.
sqlmodel-task-models
This skill should be used when defining a robust, type-safe, and async-compatible database schema for the Todo application using SQLModel, ensuring compatibility with Better Auth and optimized for PostgreSQL.
pydantic-models-py
Create Pydantic models following the multi-model pattern with Base, Create, Update, Response, and InDB variants. Use when defining API request/response schemas, database models, or data validation ...
API Models
Your approach to handling API models. Use this skill when working on files where API models comes into play.
models-dev
Query AI model specifications, pricing, and capabilities from models.dev database. Use when users ask about AI model parameters (context window, token limits, cost per token), model comparisons, provider information, or need to look up specific model IDs for AI SDK integration. Triggers on queries like "What's the context window for GPT-4o?", "Compare Claude vs GPT", "How much does Gemini Pro cost?", "List OpenAI models", or "What models support tool calling?".
agent-selecting-models
Guidance for selecting appropriate AI model (sonnet vs haiku) based on task complexity, reasoning requirements, and performance needs. Use when implementing agents or justifying model selection.
adding-models
Guide for adding new LLM models to Letta Code. Use when the user wants to add support for a new model, needs to know valid model handles, or wants to update the model configuration. Covers models.json configuration, CI test matrix, and handle validation.
bgo
Automates the complete Blender build-go workflow, from building and packaging your extension/add-on to removing old versions, installing, enabling, and launching Blender for quick testing and iteration.
mcp-create-declarative-agent
Skill converted from mcp-create-declarative-agent.prompt.md