code-mode
This skill guides an AI agent to help users implement 'code mode' on their MCP servers, enabling LLMs to write processing scripts for large API responses within a sandboxed runtime, significantly reducing token usage.
About this skill
Code-mode is an approach that dramatically cuts LLM token consumption when interacting with verbose API responses. Instead of feeding the entire raw data payload into the LLM's context window, code mode has the LLM generate a concise processing script. That script is executed securely within a sandboxed environment on the MCP (Model Context Protocol) server, and only the compact, processed output is returned to the LLM. This drastically reduces the amount of data the LLM needs to process, making interactions with large datasets more efficient and cost-effective.
This skill functions as an interactive development assistant. An AI agent using it guides a user step-by-step through integrating a code mode tool into their existing MCP server: understanding the server setup (language, framework, which tools return large data), selecting an appropriate and secure sandboxed runtime, planning the implementation details, and assisting with the coding and integration of the code mode tool and its script execution system.
The primary benefit is more powerful and economical LLM interactions with data-rich APIs (such as Kubernetes, GitHub, Stripe, or AWS). By letting LLMs work with processed, relevant information instead of raw payloads, this skill helps overcome context window limitations and improves agent performance on complex data extraction and manipulation tasks.
Best use case
The primary use case is for developers, system architects, and AI practitioners managing MCP servers who need to optimize LLM interactions with vast API responses. This skill is invaluable for implementing a 'code mode' solution that drastically reduces token costs and enhances LLM efficiency by ensuring only relevant, processed data enters the context window. Those building sophisticated AI agents that frequently process large datasets from well-known APIs will find this skill particularly beneficial for improving performance and managing expenses.
By the end, the user will have a "code mode" capability integrated into their MCP server, with LLMs able to execute processing scripts for efficient context reduction.
Practical example
Example input
I want to add a 'code mode' to my MCP server to reduce token usage when my LLM interacts with large GitHub API responses. My server is written in Python using FastAPI. Can you help me plan and implement it?
Example output
Certainly! Let's start by understanding your current Python FastAPI MCP server setup. Could you describe which tools return large GitHub data and what kind of processing you typically envision for this data? We'll then look into suitable sandboxing options for Python.
When to use this skill
- When an MCP tool returns excessively large API responses that consume too many LLM tokens.
- To implement a secure, sandboxed environment for LLM-generated processing scripts on an MCP server.
- When aiming to significantly reduce LLM context window usage and improve performance for data extraction.
- To add a code execution tool that pre-processes raw data before feeding it to an LLM.
When not to use this skill
- For simple API responses that are already concise and don't require pre-processing.
- If you don't have an existing MCP server or a similar multi-tool agent setup to integrate with.
- When the LLM needs to analyze the *entire* raw data structure, not just a processed output.
- If you lack the technical expertise or permissions to modify your server environment.
How code-mode Compares
| Feature / Agent | code-mode | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Hard | N/A |
Frequently Asked Questions
What does this skill do?
This skill guides an AI agent to help users implement 'code mode' on their MCP servers, enabling LLMs to write processing scripts for large API responses within a sandboxed runtime, significantly reducing token usage.
How difficult is it to install?
The installation complexity is rated as hard. You can find the installation instructions above.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Code Mode for MCP Servers
## What is Code Mode?
When an MCP tool returns a large API response (e.g. listing 500 Kubernetes pods,
200 SCIM users, or thousands of GitHub issues), that entire payload enters the
LLM's context window — consuming tokens and degrading performance.
Code mode flips the approach: instead of dumping raw data into context, the LLM
writes a small processing script. The MCP server runs the script in a **sandboxed
runtime** against the raw data, and only the script's stdout enters context.
This works especially well with well-known APIs (SCIM, Kubernetes, GitHub, Stripe,
Slack, AWS, etc.) because the LLM already knows the response schema from training
data — it can write the extraction script in one shot without inspecting the data.
**Typical results: 65–99% context reduction.**
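For example, against a GitHub "list issues" response the LLM might submit a script like the sketch below. This assumes the server injects the raw response as a `DATA` string and captures `console.log` output, which is how the executor in this skill is structured; the field names come from the real GitHub issues API.

```ts
// Hypothetical script the LLM might write for a "list GitHub issues" response.
// DATA is the raw JSON response injected by the server; console.log is the
// output channel captured by the sandbox (both are executor assumptions).
declare const DATA: string;

const issues: Array<{ number: number; title: string; state: string; user: { login: string } }> =
  JSON.parse(DATA);

const open = issues
  .filter((i) => i.state === "open")
  .map((i) => `#${i.number} ${i.title} (@${i.user.login})`);

console.log(`${open.length} open issues`);
console.log(open.join("\n"));
```

Only those few printed lines enter the context window; the full issue payload never does.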
### Inspiration
- [Cloudflare Code Mode](https://blog.cloudflare.com/code-mode-mcp/)
- [claude-context-mode](https://github.com/mksglu/claude-context-mode)
---
## How This Skill Works
This is an **interactive planning skill**. Work with the user step-by-step:
1. **Understand** their MCP server (language, framework, what tools return large data)
2. **Select** a sandbox that fits their server language and security needs
3. **Plan** the implementation together
4. **Implement** the code mode tool, sandbox executor, and benchmark
5. **Verify** with benchmarks comparing before/after context sizes
Do not jump ahead. Confirm each step with the user before proceeding.
---
## Step 1: Understand the Existing MCP Server
Ask the user (or discover by reading their codebase):
- **Server language**: TypeScript/JavaScript, Python, Go, Rust, or other?
- **MCP framework**: XMCP, FastMCP, mcp-go, custom, etc.?
- **Which tools return large responses?** (e.g. list users, get pods, search issues)
- **What APIs do they call?** Well-known APIs (SCIM, K8s, GitHub, Stripe) are ideal
candidates because the LLM already knows the schema.
- **What languages should the sandbox support for script execution?**
Usually JavaScript is sufficient. Python is a common second choice.
Summarize your understanding back to the user and confirm before moving on.
---
## Step 2: Select a Sandbox
The sandbox must be **isolated from the host filesystem and network by default**
and **secure by default**. Present the user with options that match their server
language, using the reference in `references/sandbox-options.md`.
### Quick Selection Guide
**If the server is TypeScript/JavaScript:**
| Sandbox | Script Language | Isolation | Size | Notes |
|---|---|---|---|---|
| `quickjs-emscripten` | JavaScript | WASM (no fs/net) | ~1MB | Lightweight, actively maintained, best default |
| `pyodide` | Python | WASM (no fs/net) | ~20MB | Full CPython in WASM, heavier |
| `isolated-vm` | JavaScript | V8 isolate (no fs/net) | ~5MB native | Fast, separate V8 heap, not WASM |
**If the server is Python:**
| Sandbox | Script Language | Isolation | Size | Notes |
|---|---|---|---|---|
| `RestrictedPython` | Python | AST-restricted compile | Tiny | Compiles to restricted bytecode, no I/O by default |
| `pyodide` (in-process WASM) | Python | WASM | ~20MB | Heavier but stronger isolation than RestrictedPython |
| `quickjs` (via `quickjs` PyPI) | JavaScript | WASM/native | Small | Run JS from Python |
**If the server is Go:**
| Sandbox | Script Language | Isolation | Size | Notes |
|---|---|---|---|---|
| `goja` | JavaScript | Pure Go interpreter | Zero CGO | No fs/net, widely used (used by Grafana) |
| `Wazero` | WASM guest (JS/Python compiled to WASM) | WASM runtime, pure Go | Zero CGO | Strongest isolation, runs any WASM module |
| `starlark-go` | Starlark (Python dialect) | Pure Go interpreter | Zero CGO | Deterministic, no I/O, used by Bazel |
**If the server is Rust:**
| Sandbox | Script Language | Isolation | Size | Notes |
|---|---|---|---|---|
| `boa_engine` | JavaScript | Pure Rust interpreter | No unsafe deps | ES2024 support, embeddable |
| `wasmtime` / `wasmer` | WASM guest | WASM runtime | Strong | Run any WASM module, strongest isolation |
| `deno_core` | JavaScript/TypeScript | V8-based | Larger | Full V8, powerful but heavier |
| `rustpython` | Python | Pure Rust interpreter | Moderate | Less mature but functional |
Read `references/sandbox-options.md` for detailed tradeoffs on each option.
**Present 2–3 options** to the user (filtered to their server language), explain
the tradeoffs briefly, and let them choose. If they're unsure, recommend the
lightest WASM-based option for their language.
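If they pick the recommended default, a quick sanity check that the sandbox really is isolated might look like this (a sketch assuming a Node.js/TypeScript server and `quickjs-emscripten`):

```ts
import { getQuickJS } from "quickjs-emscripten";

// Quick isolation check: the QuickJS guest sees no host globals unless we expose them.
async function main() {
  const QuickJS = await getQuickJS();
  const vm = QuickJS.newContext();

  const result = vm.evalCode(`[typeof require, typeof fetch, typeof process].join(",")`);
  if (result.error) {
    console.error("eval failed:", vm.dump(result.error));
    result.error.dispose();
  } else {
    // Prints "undefined,undefined,undefined": no filesystem, network, or process access.
    console.log(vm.dump(result.value));
    result.value.dispose();
  }
  vm.dispose();
}

main();
```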
---
## Step 3: Plan the Implementation
Once the sandbox is selected, create a concrete plan with the user. The plan
should cover these components:
### 3a. Code Mode Tool
A new MCP tool (e.g. `code_mode` or `<domain>_code_mode`) that accepts:
- **`command`** or **`args`**: The underlying API call / query to execute
(e.g. kubectl args, SCIM endpoint + params, GraphQL query)
- **`code`**: The processing script the LLM writes
- **`language`** (optional): Script language, defaults to `javascript`
The tool handler:
1. Executes the underlying API call (reusing existing logic)
2. Passes the raw response as a `DATA` variable into the sandbox
3. Runs the script in the sandbox
4. Returns only the script's stdout, plus a size measurement line:
`[code-mode: 18.0KB -> 6.2KB (65.5% reduction)]`
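A minimal, framework-agnostic sketch of that handler in TypeScript (the `callUnderlyingApi` and `runInSandbox` names are placeholders for the user's existing API logic and the executor from 3b):

```ts
// Placeholders: wire these to the user's existing API client and the executor from 3b.
declare function callUnderlyingApi(command: string): Promise<unknown>;
declare function runInSandbox(code: string, data: string): Promise<string>;

interface CodeModeArgs {
  command: string;   // underlying API call / query to execute
  code: string;      // processing script written by the LLM
  language?: string; // script language, defaults to "javascript" (dispatch to the matching executor)
}

async function handleCodeMode(args: CodeModeArgs): Promise<string> {
  // 1. Execute the underlying API call (reusing existing logic).
  const raw = await callUnderlyingApi(args.command);
  const rawJson = JSON.stringify(raw);

  // 2-3. Run the LLM's script in the sandbox with DATA injected.
  const output = await runInSandbox(args.code, rawJson);

  // 4. Return only the script output plus the size measurement line.
  const beforeKb = (rawJson.length / 1024).toFixed(1);
  const afterKb = (output.length / 1024).toFixed(1);
  const pct = (100 * (1 - output.length / Math.max(rawJson.length, 1))).toFixed(1);
  return `${output}\n[code-mode: ${beforeKb}KB -> ${afterKb}KB (${pct}% reduction)]`;
}
```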
### 3b. Sandbox Executor
A utility module that:
- Initializes the chosen sandbox runtime
- Injects `DATA` (the raw API response as a string) into the sandbox
- Executes the user-provided script
- Captures stdout and returns it
- Enforces a timeout (e.g. 10 seconds)
- Handles errors gracefully (script syntax errors, runtime errors)
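A sketch of such an executor using `quickjs-emscripten` (the default option from Step 2); the console capture and interrupt-based timeout shown here are one reasonable way to do it, not the only one:

```ts
import { getQuickJS, shouldInterruptAfterDeadline } from "quickjs-emscripten";

// Runs an LLM-written script against raw API data inside a WASM sandbox.
// Returns whatever the script printed via console.log.
export async function runInSandbox(code: string, data: string, timeoutMs = 10_000): Promise<string> {
  const QuickJS = await getQuickJS();
  const vm = QuickJS.newContext();
  const lines: string[] = [];

  try {
    // Inject DATA as a plain string; the script parses it itself.
    const dataHandle = vm.newString(data);
    vm.setProp(vm.global, "DATA", dataHandle);
    dataHandle.dispose();

    // Expose a minimal console.log that captures output on the host side.
    const logHandle = vm.newFunction("log", (...args) => {
      lines.push(args.map((a) => vm.dump(a)).join(" "));
    });
    const consoleHandle = vm.newObject();
    vm.setProp(consoleHandle, "log", logHandle);
    vm.setProp(vm.global, "console", consoleHandle);
    logHandle.dispose();
    consoleHandle.dispose();

    // Interrupt long-running scripts after the deadline.
    vm.runtime.setInterruptHandler(shouldInterruptAfterDeadline(Date.now() + timeoutMs));

    const result = vm.evalCode(code);
    if (result.error) {
      const err = vm.dump(result.error);
      result.error.dispose();
      throw new Error(`script error: ${JSON.stringify(err)}`);
    }
    result.value.dispose();
    return lines.join("\n");
  } finally {
    vm.dispose();
  }
}
```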
### 3c. Wiring
- Register the new tool in the MCP server's tool list
- Optionally gate behind an env var (ask the user if they want this)
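If the user wants the env var gate, the wiring can stay this small (the variable name and registration helper below are illustrative, not fixed names):

```ts
// Placeholders: however the user's framework exposes the server and registers tools.
declare const server: unknown;
declare function registerCodeModeTool(server: unknown): void;

// Gate the tool behind an env var so code mode can be toggled per deployment.
if (process.env.ENABLE_CODE_MODE === "1") {
  registerCodeModeTool(server);
}
```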
### 3d. Benchmark
A benchmark script that compares tool output size vs. code-mode output size
across realistic scenarios. See `references/benchmark-pattern.md` for the
template.
**Present the plan to the user and confirm before implementing.**
---
## Step 4: Implement
Follow the confirmed plan. Implement in this order:
1. **Install the sandbox dependency** (e.g. `npm i quickjs-emscripten`)
2. **Create the executor module** — the sandbox wrapper
3. **Create the code mode tool** — the MCP tool handler
4. **Wire it into the server** — register the tool
5. **Create the benchmark script**
Keep the implementation minimal — don't over-abstract. The executor and tool
can each be a single file.
### Implementation Tips
- The `DATA` variable should always be a **string** (JSON-serialized). The
script is responsible for parsing it if needed (`JSON.parse(DATA)` in JS,
`json.loads(DATA)` in Python).
- Include the reduction measurement in every response so the user/LLM can
see the savings: `[code-mode: {before}KB -> {after}KB ({pct}% reduction)]`
- Set a reasonable default timeout (10s) and memory limit if the sandbox
supports it.
- Return clear error messages if the script fails — the LLM will use the
error to fix its script on the next call.
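If the user prefers to factor that measurement out, a small helper is enough (this sketch assumes a Node.js/TypeScript server; `Buffer.byteLength` is just one way to measure sizes):

```ts
// Builds the reduction line appended to every code mode response.
export function formatReduction(before: string, after: string): string {
  const bytes = (s: string) => Buffer.byteLength(s, "utf8");
  const kb = (s: string) => (bytes(s) / 1024).toFixed(1);
  const pct = (100 * (1 - bytes(after) / Math.max(bytes(before), 1))).toFixed(1);
  return `[code-mode: ${kb(before)}KB -> ${kb(after)}KB (${pct}% reduction)]`;
}
```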
---
## Step 5: Benchmark and Verify
After implementation, run the benchmark to verify code mode actually reduces
context size. Read `references/benchmark-pattern.md` for the full template.
The benchmark should:
1. **Generate or fetch realistic test data** — use faker/mock data if no live
API is available, or hit a real endpoint if the user has one.
2. **Run each scenario through both paths:**
- Regular tool response (full JSON)
- Code mode with a representative extraction script
3. **Print a comparison table** showing before/after sizes and reduction %
4. **Print a total** across all scenarios
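A minimal shape for that benchmark in TypeScript (the scenario data and extraction scripts are placeholders; `runInSandbox` is the executor from Step 4, imported from a hypothetical path):

```ts
import { runInSandbox } from "./sandbox"; // hypothetical path to the executor module

interface Scenario {
  name: string;
  data: unknown;  // realistic or faked API response
  script: string; // representative extraction script
}

async function benchmark(scenarios: Scenario[]) {
  let totalBefore = 0;
  let totalAfter = 0;

  console.log("| scenario | before | after | reduction |");
  console.log("|---|---|---|---|");

  for (const s of scenarios) {
    const raw = JSON.stringify(s.data);            // regular tool path: full JSON
    const out = await runInSandbox(s.script, raw); // code mode path: script output only
    totalBefore += raw.length;
    totalAfter += out.length;
    const pct = (100 * (1 - out.length / raw.length)).toFixed(1);
    console.log(`| ${s.name} | ${raw.length} B | ${out.length} B | ${pct}% |`);
  }

  const totalPct = (100 * (1 - totalAfter / totalBefore)).toFixed(1);
  console.log(`Total: ${totalBefore} B -> ${totalAfter} B (${totalPct}% reduction)`);
}
```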
Present the benchmark results to the user. Typical expectations:
- Simple list extractions: 60–80% reduction
- Filtered queries (e.g. "only inactive users"): 90–99% reduction
- Aggregations (e.g. "count per department"): 95–99% reduction
---
## Reference Files
- `references/sandbox-options.md` — Detailed comparison of all sandbox options
by server language, with security analysis and setup instructions
- `references/benchmark-pattern.md` — Benchmark script template and methodology