together-local-dev-loop

Together AI local dev loop for inference, fine-tuning, and model deployment. Use when working with Together AI's OpenAI-compatible API. Trigger: "together local dev loop".

1,868 stars

Best use case

together-local-dev-loop is best used when you need a repeatable AI agent workflow instead of a one-off prompt.

Together AI local dev loop for inference, fine-tuning, and model deployment. Use when working with Together AI's OpenAI-compatible API. Trigger: "together local dev loop".

Teams using together-local-dev-loop should expect a more consistent output, faster repeated execution, less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$curl -o ~/.claude/skills/together-local-dev-loop/SKILL.md --create-dirs "https://raw.githubusercontent.com/jeremylongshore/claude-code-plugins-plus-skills/main/plugins/saas-packs/together-pack/skills/together-local-dev-loop/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/together-local-dev-loop/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How together-local-dev-loop Compares

Feature / Agenttogether-local-dev-loopStandard Approach
Platform SupportNot specifiedLimited / Varies
Context Awareness High Baseline
Installation ComplexityUnknownN/A

Frequently Asked Questions

What does this skill do?

Together AI local dev loop for inference, fine-tuning, and model deployment. Use when working with Together AI's OpenAI-compatible API. Trigger: "together local dev loop".

Where can I find the source code?

You can find the source code on GitHub using the link provided at the top of the page.

Related Guides

SKILL.md Source

# Together AI Local Dev Loop

## Overview
Local development workflow for Together AI inference API integration. Provides a fast feedback loop with mock chat completions, embeddings, and model listing endpoints so you can build AI-powered applications without consuming live API credits. Together AI is OpenAI-compatible, so the same client libraries work with both. Toggle between mock mode for rapid iteration and live mode for model evaluation.

## Environment Setup
```bash
cp .env.example .env
# Set your credentials:
# TOGETHER_API_KEY=tog_xxxxxxxxxxxx
# TOGETHER_BASE_URL=https://api.together.xyz/v1
# MOCK_MODE=true
npm install express axios dotenv tsx typescript @types/node
npm install -D vitest supertest @types/express
# Or for Python: pip install together openai httpx pytest
```

## Dev Server
```typescript
// src/dev/server.ts
import express from "express";
import { createProxyMiddleware } from "http-proxy-middleware";
const app = express();
app.use(express.json());
const MOCK = process.env.MOCK_MODE === "true";
if (!MOCK) {
  app.use("/v1", createProxyMiddleware({
    target: process.env.TOGETHER_BASE_URL,
    changeOrigin: true,
    headers: { Authorization: `Bearer ${process.env.TOGETHER_API_KEY}` },
  }));
} else {
  const { mountMockRoutes } = require("./mocks");
  mountMockRoutes(app);
}
app.listen(3009, () => console.log(`Together dev server on :3009 [mock=${MOCK}]`));
```

## Mock Mode
```typescript
// src/dev/mocks.ts — OpenAI-compatible mock responses for inference
export function mountMockRoutes(app: any) {
  app.post("/v1/chat/completions", (req: any, res: any) => res.json({
    id: "chatcmpl-mock-001", object: "chat.completion", model: req.body.model || "meta-llama/Llama-3-70b-chat-hf",
    choices: [{ index: 0, message: { role: "assistant", content: "This is a mock response from Together AI." }, finish_reason: "stop" }],
    usage: { prompt_tokens: 25, completion_tokens: 12, total_tokens: 37 },
  }));
  app.post("/v1/embeddings", (req: any, res: any) => res.json({
    object: "list", model: req.body.model || "togethercomputer/m2-bert-80M-8k-retrieval",
    data: [{ object: "embedding", index: 0, embedding: Array(768).fill(0).map(() => Math.random() * 2 - 1) }],
  }));
  app.get("/v1/models", (_req: any, res: any) => res.json({
    data: [
      { id: "meta-llama/Llama-3-70b-chat-hf", type: "chat", context_length: 8192 },
      { id: "mistralai/Mixtral-8x22B-Instruct-v0.1", type: "chat", context_length: 65536 },
      { id: "togethercomputer/m2-bert-80M-8k-retrieval", type: "embedding", context_length: 8192 },
    ],
  }));
}
```

## Testing Workflow
```bash
npm run dev:mock &                    # Start mock server in background
npm run test                          # Unit tests with vitest
npm run test -- --watch               # Watch mode for rapid iteration
MOCK_MODE=false npm run test:integration  # Integration test against real API
```

## Debug Tips
- Together AI is OpenAI-compatible — set `base_url` to `http://localhost:3009/v1` for local dev
- Use `/v1/models` to discover available model IDs instead of hardcoding them
- Monitor `usage.total_tokens` in responses to estimate costs before switching to live mode
- Batch inference (`/v1/batch`) runs at 50% cost but is async — poll for completion
- Set `max_tokens` explicitly to avoid unexpectedly large responses and costs

## Error Handling
| Issue | Cause | Fix |
|-------|-------|-----|
| `401 Unauthorized` | Invalid API key | Regenerate at api.together.xyz dashboard |
| `404 Model not found` | Wrong model ID string | Use `client.models.list()` to verify |
| `429 Rate Limited` | Too many requests per minute | Implement exponential backoff |
| `500 Server Error` | Model overloaded or cold start | Retry with backoff after 5s |
| `ECONNREFUSED :3009` | Dev server not running | Run `npm run dev:mock` first |

## Resources
- [Together AI Docs](https://docs.together.ai/)
- [API Reference](https://docs.together.ai/reference/chat-completions-1)
- [Model List](https://docs.together.ai/docs/inference-models)

## Next Steps
See `together-debug-bundle`.

Related Skills

workhuman-local-dev-loop

1868
from jeremylongshore/claude-code-plugins-plus-skills

Workhuman local dev loop for employee recognition and rewards API. Use when integrating Workhuman Social Recognition, or building recognition workflows with HRIS systems. Trigger: "workhuman local dev loop".

wispr-local-dev-loop

1868
from jeremylongshore/claude-code-plugins-plus-skills

Wispr Flow local dev loop for voice-to-text API integration. Use when integrating Wispr Flow dictation, WebSocket streaming, or building voice-powered applications. Trigger: "wispr local dev loop".

windsurf-local-dev-loop

1868
from jeremylongshore/claude-code-plugins-plus-skills

Configure Windsurf local development workflow with Cascade, Previews, and terminal integration. Use when setting up a development environment, configuring Turbo mode, or establishing a fast iteration cycle with Windsurf AI. Trigger with phrases like "windsurf dev setup", "windsurf local development", "windsurf dev environment", "windsurf workflow", "develop with windsurf".

webflow-local-dev-loop

1868
from jeremylongshore/claude-code-plugins-plus-skills

Configure a Webflow local development workflow with TypeScript, hot reload, mocked API tests, and webhook tunneling via ngrok. Use when setting up a development environment, configuring test workflows, or establishing a fast iteration cycle with the Webflow Data API. Trigger with phrases like "webflow dev setup", "webflow local development", "webflow dev environment", "develop with webflow".

vercel-local-dev-loop

1868
from jeremylongshore/claude-code-plugins-plus-skills

Configure Vercel local development with vercel dev, environment variables, and hot reload. Use when setting up a development environment, testing serverless functions locally, or establishing a fast iteration cycle with Vercel. Trigger with phrases like "vercel dev setup", "vercel local development", "vercel dev environment", "develop with vercel locally".

veeva-local-dev-loop

1868
from jeremylongshore/claude-code-plugins-plus-skills

Veeva Vault local dev loop for REST API and clinical operations. Use when working with Veeva Vault document management and CRM. Trigger: "veeva local dev loop".

vastai-local-dev-loop

1868
from jeremylongshore/claude-code-plugins-plus-skills

Configure Vast.ai local development with testing and fast iteration. Use when setting up a development environment, testing instance provisioning, or building a fast iteration cycle for GPU workloads. Trigger with phrases like "vastai dev setup", "vastai local development", "vastai dev environment", "develop with vastai".

twinmind-local-dev-loop

1868
from jeremylongshore/claude-code-plugins-plus-skills

Set up local development workflow with TwinMind API integration. Use when building applications that integrate TwinMind transcription, testing API calls locally, or developing meeting automation tools. Trigger with phrases like "twinmind dev setup", "twinmind local development", "twinmind API testing", "build with twinmind".

together-webhooks-events

1868
from jeremylongshore/claude-code-plugins-plus-skills

Together AI webhooks events for inference, fine-tuning, and model deployment. Use when working with Together AI's OpenAI-compatible API. Trigger: "together webhooks events".

together-upgrade-migration

1868
from jeremylongshore/claude-code-plugins-plus-skills

Together AI upgrade migration for inference, fine-tuning, and model deployment. Use when working with Together AI's OpenAI-compatible API. Trigger: "together upgrade migration".

together-security-basics

1868
from jeremylongshore/claude-code-plugins-plus-skills

Together AI security basics for inference, fine-tuning, and model deployment. Use when working with Together AI's OpenAI-compatible API. Trigger: "together security basics".

together-sdk-patterns

1868
from jeremylongshore/claude-code-plugins-plus-skills

Together AI sdk patterns for inference, fine-tuning, and model deployment. Use when working with Together AI's OpenAI-compatible API. Trigger: "together sdk patterns".