# Fireworks AI — Fast Open-Source Model Inference

## Overview

### Best use case

Fireworks AI — Fast Open-Source Model Inference is best used when you need a repeatable AI agent workflow (integrating Fireworks' inference API for chat completions, structured output, function calling, or fine-tuning) rather than a one-off prompt.

Teams using Fireworks AI — Fast Open-Source Model Inference should expect more consistent output, faster repeated execution, and less prompt rewriting.

## When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

## When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

## Installation

### Claude Code / Cursor / Codex

```bash
curl -o ~/.claude/skills/fireworks-ai/SKILL.md --create-dirs "https://raw.githubusercontent.com/ComeOnOliver/skillshub/main/skills/TerminalSkills/skills/fireworks-ai/SKILL.md"
```

### Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/fireworks-ai/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

## How Fireworks AI — Fast Open-Source Model Inference Compares

| Feature / Agent | Fireworks AI — Fast Open-Source Model Inference | Standard Approach |
| --- | --- | --- |
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

## Frequently Asked Questions

### What does this skill do?

It helps developers integrate Fireworks AI's inference API for running open-source LLMs (Llama, Mixtral, Qwen, etc.), covering chat completions, streaming, structured output, function calling, fine-tuning, and custom model endpoints.

### Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

## SKILL.md Source

# Fireworks AI — Fast Open-Source Model Inference


## Overview


Fireworks AI is a platform for running open-source LLMs (Llama, Mixtral, Qwen, etc.) with enterprise-grade speed and reliability. This skill helps developers integrate Fireworks' inference API, fine-tune models, and deploy custom model endpoints with function calling and structured output support.


## Instructions

### Chat Completions

```typescript
// src/llm/fireworks.ts — Fireworks AI inference (OpenAI-compatible)
import OpenAI from "openai";

const fireworks = new OpenAI({
  apiKey: process.env.FIREWORKS_API_KEY!,
  baseURL: "https://api.fireworks.ai/inference/v1",
});

// Chat completion with open-source models
async function chat(prompt: string, model = "accounts/fireworks/models/llama-v3p3-70b-instruct") {
  const response = await fireworks.chat.completions.create({
    model,
    messages: [
      { role: "system", content: "You are a helpful assistant." },
      { role: "user", content: prompt },
    ],
    temperature: 0.7,
    max_tokens: 1024,
  });
  return response.choices[0].message.content;
}

// Streaming
async function streamChat(prompt: string, onChunk: (text: string) => void) {
  const stream = await fireworks.chat.completions.create({
    model: "accounts/fireworks/models/llama-v3p3-70b-instruct",
    messages: [{ role: "user", content: prompt }],
    stream: true,
  });
  let full = "";
  for await (const chunk of stream) {
    const text = chunk.choices[0]?.delta?.content ?? "";
    full += text;
    onChunk(text);
  }
  return full;
}
```

### Structured Output (JSON Mode & JSON Schema)

```typescript
// Force structured JSON output
async function extractData(text: string) {
  const response = await fireworks.chat.completions.create({
    model: "accounts/fireworks/models/llama-v3p3-70b-instruct",
    messages: [
      {
        role: "system",
        content: `Extract product information. Return JSON: { "name": string, "price": number, "category": string, "features": string[] }`,
      },
      { role: "user", content: text },
    ],
    response_format: { type: "json_object" },
    temperature: 0,
  });
  return JSON.parse(response.choices[0].message.content!);
}

// JSON-schema-constrained generation (Fireworks extension to json_object mode)
async function generateWithGrammar(prompt: string) {
  const response = await fetch("https://api.fireworks.ai/inference/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.FIREWORKS_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "accounts/fireworks/models/llama-v3p3-70b-instruct",
      messages: [{ role: "user", content: prompt }],
      response_format: {
        type: "json_object",
        schema: {
          type: "object",
          properties: {
            sentiment: { type: "string", enum: ["positive", "negative", "neutral"] },
            confidence: { type: "number", minimum: 0, maximum: 1 },
            keywords: { type: "array", items: { type: "string" } },
          },
          required: ["sentiment", "confidence", "keywords"],
        },
      },
    }),
  });
  return response.json();
}
```

### Function Calling

```typescript
// Tool use with Fireworks
async function agentWithTools(prompt: string) {
  const response = await fireworks.chat.completions.create({
    model: "accounts/fireworks/models/firefunction-v2",  // Optimized for function calling
    messages: [{ role: "user", content: prompt }],
    tools: [
      {
        type: "function",
        function: {
          name: "search_database",
          description: "Search the product database",
          parameters: {
            type: "object",
            properties: {
              query: { type: "string" },
              category: { type: "string", enum: ["electronics", "clothing", "books"] },
              max_price: { type: "number" },
            },
            required: ["query"],
          },
        },
      },
    ],
    tool_choice: "auto",
  });
  return response;
}
```

### Fine-Tuning

```python
# fine_tune.py — Fine-tune a model on Fireworks
import os

import requests

FIREWORKS_API_KEY = os.environ["FIREWORKS_API_KEY"]
BASE_URL = "https://api.fireworks.ai/inference/v1"

# Upload training data (JSONL format)
def upload_dataset(filepath: str):
    with open(filepath, "rb") as f:
        response = requests.post(
            f"{BASE_URL}/files",
            headers={"Authorization": f"Bearer {FIREWORKS_API_KEY}"},
            files={"file": (filepath, f, "application/jsonl")},
            data={"purpose": "fine-tune"},
        )
    return response.json()["id"]

# Start fine-tuning job
def create_fine_tune(dataset_id: str, base_model: str = "accounts/fireworks/models/llama-v3p1-8b-instruct"):
    response = requests.post(
        f"{BASE_URL}/fine_tuning/jobs",
        headers={
            "Authorization": f"Bearer {FIREWORKS_API_KEY}",
            "Content-Type": "application/json",
        },
        json={
            "model": base_model,
            "training_file": dataset_id,
            "hyperparameters": {
                "n_epochs": 3,
                "learning_rate_multiplier": 1.0,
                "batch_size": 8,
            },
        },
    )
    return response.json()

# Training data format (JSONL):
# {"messages": [{"role": "system", "content": "..."}, {"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}
```

### Available Models

```markdown
## Popular Models on Fireworks
- **llama-v3p3-70b-instruct** — Best open-source general-purpose model
- **llama-v3p1-8b-instruct** — Fast, cheap, good for simple tasks
- **mixtral-8x22b-instruct** — Strong multilingual, large context
- **qwen2p5-72b-instruct** — Excellent for coding and math
- **firefunction-v2** — Optimized for function calling / tool use
- **deepseek-v3** — Strong reasoning and code generation
- **gemma-2-27b-it** — Google's compact model
```

## Installation

```bash
# Use any OpenAI-compatible SDK
npm install openai
# Set baseURL to https://api.fireworks.ai/inference/v1

pip install openai
# Set base_url to https://api.fireworks.ai/inference/v1
```


## Examples


### Example 1: Integrating Fireworks AI into an existing application

**User request:**

```
Add Fireworks AI to my Next.js app for the AI chat feature. I want streaming responses.
```

The agent installs the SDK, creates an API route that initializes the Fireworks AI client, configures streaming, selects an appropriate model, and wires up the frontend to consume the stream. It handles error cases and sets up proper environment variable management for the API key.
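A minimal sketch of what such a route might look like, assuming a Next.js App Router project; the route path, request shape, and model choice are illustrative assumptions, not prescribed by the skill:

```typescript
// app/api/chat/route.ts — illustrative streaming route (path and request shape are assumptions)
import OpenAI from "openai";

const fireworks = new OpenAI({
  apiKey: process.env.FIREWORKS_API_KEY!,
  baseURL: "https://api.fireworks.ai/inference/v1",
});

export async function POST(req: Request) {
  const { prompt } = await req.json();
  const stream = await fireworks.chat.completions.create({
    model: "accounts/fireworks/models/llama-v3p3-70b-instruct",
    messages: [{ role: "user", content: prompt }],
    stream: true,
  });
  // Re-emit the token stream as a plain-text response body the frontend can read incrementally
  const encoder = new TextEncoder();
  const body = new ReadableStream({
    async start(controller) {
      for await (const chunk of stream) {
        controller.enqueue(encoder.encode(chunk.choices[0]?.delta?.content ?? ""));
      }
      controller.close();
    },
  });
  return new Response(body, {
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
}
```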

### Example 2: Optimizing structured output performance

**User request:**

```
My Fireworks AI calls are slow and expensive. Help me optimize the setup.
```

The agent reviews the current implementation, identifies issues (wrong model selection, missing caching, inefficient prompting, no batching), and applies optimizations specific to Fireworks AI's capabilities — adjusting model parameters, adding response caching, and implementing retry logic with exponential backoff.
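As a hedged sketch, the retry-with-backoff piece might look like this; the helper name and parameters are illustrative, not part of the Fireworks API:

```typescript
// withRetry — illustrative retry wrapper with exponential backoff and jitter
async function withRetry<T>(
  fn: () => Promise<T>,
  maxRetries = 3,
  baseDelayMs = 500,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= maxRetries) throw err;
      // Backoff schedule: 500 ms, 1 s, 2 s, ... plus up to 100 ms of jitter
      const delayMs = baseDelayMs * 2 ** attempt + Math.random() * 100;
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}

// Usage with the chat() helper defined earlier:
// const answer = await withRetry(() => chat("Summarize this document"));
```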


## Guidelines

1. **OpenAI SDK compatibility** — Use the standard OpenAI SDK with a different base URL; zero code changes to switch
2. **firefunction-v2 for tools** — Use the function-calling-optimized model for reliable tool use
3. **JSON schema for structure** — Fireworks supports JSON schema constraints; use them for reliable structured output
4. **Fine-tune 8B for cost** — Fine-tune Llama 3.1 8B for domain-specific tasks; cheaper and faster than using 70B
5. **Batch API for throughput** — Use Fireworks' batch API for bulk processing at lower cost
6. **Model routing** — Use 8B for simple tasks, 70B for complex reasoning; route based on query complexity (see the sketch after this list)
7. **Serverless vs dedicated** — Start with serverless; switch to dedicated endpoints for consistent latency at scale
8. **Monitor token usage** — Fireworks pricing is per-token; track usage per feature to optimize costs
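
A minimal sketch of guideline 6, assuming a simple length/keyword heuristic; the threshold and keyword list are illustrative assumptions, not part of the skill:

```typescript
// pickModel — illustrative complexity-based router (the heuristic is an assumption)
const SMALL_MODEL = "accounts/fireworks/models/llama-v3p1-8b-instruct";
const LARGE_MODEL = "accounts/fireworks/models/llama-v3p3-70b-instruct";

function pickModel(prompt: string): string {
  // Naive heuristic: long prompts or reasoning-heavy keywords route to the 70B model
  const looksComplex =
    prompt.length > 2000 || /\b(analyze|prove|plan|compare)\b/i.test(prompt);
  return looksComplex ? LARGE_MODEL : SMALL_MODEL;
}

// Usage with the chat() helper defined earlier:
// const answer = await chat(prompt, pickModel(prompt));
```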

## Related Skills

- **opensource-guide-coach** (from ComeOnOliver/skillshub): Use when a user wants guidance on starting, contributing to, growing, governing, funding, securing, or sustaining an open source project, or asks about contributor onboarding, community health, maintainer burnout, code of conduct, metrics, legal basics, or open source project adoption.
- **openclaw-secure-linux-cloud** (from ComeOnOliver/skillshub): Use when self-hosting OpenClaw on a cloud server, hardening a remote OpenClaw gateway, choosing between SSH tunneling, Tailscale, or reverse-proxy exposure, or reviewing Podman, pairing, sandboxing, token auth, and tool-permission defaults for a secure personal deployment.
- **opencontext** (from ComeOnOliver/skillshub): Persistent memory and context management for AI agents using OpenContext. Keep context across sessions/repos/dates, store conclusions, and provide document search workflows.
- **threat-modeling-expert** (from ComeOnOliver/skillshub): Expert in threat modeling methodologies, security architecture review, and risk assessment. Masters STRIDE, PASTA, attack trees, and security requirement extraction. Use for security architecture reviews, threat identification, and secure-by-design planning.
- **startup-financial-modeling** (from ComeOnOliver/skillshub): This skill should be used when the user asks to "create financial projections", "build a financial model", "forecast revenue", "calculate burn rate", "estimate runway", "model cash flow", or requests 3-5 year financial planning for a startup.
- **python-fastapi-development** (from ComeOnOliver/skillshub): Python FastAPI backend development with async patterns, SQLAlchemy, Pydantic, authentication, and production API patterns.
- **pydantic-models-py** (from ComeOnOliver/skillshub): Create Pydantic models following the multi-model pattern with Base, Create, Update, Response, and InDB variants. Use when defining API request/response schemas, database models, or data validation in Python applications using Pydantic v2.
- **fastapi-templates** (from ComeOnOliver/skillshub): Create production-ready FastAPI projects with async patterns, dependency injection, and comprehensive error handling. Use when building new FastAPI applications or setting up backend API projects.
- **fastapi-router-py** (from ComeOnOliver/skillshub): Create FastAPI routers with CRUD operations, authentication dependencies, and proper response models. Use when building REST API endpoints, creating new routes, implementing CRUD operations, or adding authenticated endpoints in FastAPI applications.
- **fastapi-pro** (from ComeOnOliver/skillshub): Build high-performance async APIs with FastAPI, SQLAlchemy 2.0, and Pydantic V2. Master microservices, WebSockets, and modern Python async patterns. Use PROACTIVELY for FastAPI development, async optimization, or API architecture.
- **azure-resource-manager-sql-dotnet** (from ComeOnOliver/skillshub): Azure Resource Manager SDK for Azure SQL in .NET. Use for MANAGEMENT PLANE operations: creating/managing SQL servers, databases, elastic pools, firewall rules, and failover groups via Azure Resource Manager. NOT for data plane operations (executing queries) - use Microsoft.Data.SqlClient for that. Triggers: "SQL server", "create SQL database", "manage SQL resources", "ARM SQL", "SqlServerResource", "provision Azure SQL", "elastic pool", "firewall rule".
- **azure-resource-manager-redis-dotnet** (from ComeOnOliver/skillshub): Azure Resource Manager SDK for Redis in .NET. Use for MANAGEMENT PLANE operations: creating/managing Azure Cache for Redis instances, firewall rules, access keys, patch schedules, linked servers (geo-replication), and private endpoints via Azure Resource Manager. NOT for data plane operations (get/set keys, pub/sub) - use StackExchange.Redis for that. Triggers: "Redis cache", "create Redis", "manage Redis", "ARM Redis", "RedisResource", "provision Redis", "Azure Cache for Redis".
azure-resource-manager-redis-dotnet

25
from ComeOnOliver/skillshub

Azure Resource Manager SDK for Redis in .NET. Use for MANAGEMENT PLANE operations: creating/managing Azure Cache for Redis instances, firewall rules, access keys, patch schedules, linked servers (geo-replication), and private endpoints via Azure Resource Manager. NOT for data plane operations (get/set keys, pub/sub) - use StackExchange.Redis for that. Triggers: "Redis cache", "create Redis", "manage Redis", "ARM Redis", "RedisResource", "provision Redis", "Azure Cache for Redis".