code-mode
This skill guides an AI agent to help users implement 'code mode' on their MCP servers, enabling LLMs to write processing scripts for large API responses within a sandboxed runtime, significantly reducing token usage.
About this skill
Code-mode is an approach that dramatically cuts LLM token consumption when interacting with verbose API responses. Instead of feeding the entire raw data payload into the LLM's context window, code mode has the LLM generate a concise processing script. That script is executed securely within a sandboxed environment on the MCP (Model Context Protocol) server, and only the compact, processed output is returned to the LLM. This drastically reduces the amount of data the LLM needs to process, making interactions with large datasets more efficient and cost-effective.
This skill functions as an interactive development assistant. An AI agent using it guides a user step-by-step through integrating a code mode tool into their existing MCP server: understanding the server setup (language, framework, which tools return large data), selecting an appropriate and secure sandboxed runtime, planning the implementation details, and assisting with the coding and integration of the code mode tool and its script execution system.
The primary benefit is more powerful and economical LLM interactions with data-rich APIs (such as Kubernetes, GitHub, Stripe, or AWS). By letting LLMs work with processed, relevant information instead of raw payloads, this skill helps overcome context window limitations and improves agent performance on complex data extraction and manipulation tasks.
Best use case
The primary use case is for developers, system architects, and AI practitioners managing MCP servers who need to optimize LLM interactions with vast API responses. This skill is invaluable for implementing a 'code mode' solution that drastically reduces token costs and enhances LLM efficiency by ensuring only relevant, processed data enters the context window. Those building sophisticated AI agents that frequently process large datasets from well-known APIs will find this skill particularly beneficial for improving performance and managing expenses.
By the end, the user will have a "code mode" capability integrated into their MCP server, with LLMs able to execute processing scripts for efficient context reduction.
Practical example
Example input
I want to add a 'code mode' to my MCP server to reduce token usage when my LLM interacts with large GitHub API responses. My server is written in Python using FastAPI. Can you help me plan and implement it?
Example output
Certainly! Let's start by understanding your current Python FastAPI MCP server setup. Could you describe which tools return large GitHub data and what kind of processing you typically envision for this data? We'll then look into suitable sandboxing options for Python.
When to use this skill
- When an MCP tool returns excessively large API responses that consume too many LLM tokens.
- To implement a secure, sandboxed environment for LLM-generated processing scripts on an MCP server.
- When aiming to significantly reduce LLM context window usage and improve performance for data extraction.
- To add a code execution tool that pre-processes raw data before feeding it to an LLM.
When not to use this skill
- For simple API responses that are already concise and don't require pre-processing.
- If you don't have an existing MCP server or a similar multi-tool agent setup to integrate with.
- When the LLM needs to analyze the *entire* raw data structure, not just a processed output.
- If you lack the technical expertise or permissions to modify your server environment.
How code-mode Compares
| Feature / Agent | code-mode | Standard Approach |
|---|---|---|
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Hard | N/A |
Frequently Asked Questions
What does this skill do?
This skill guides an AI agent to help users implement 'code mode' on their MCP servers, enabling LLMs to write processing scripts for large API responses within a sandboxed runtime, significantly reducing token usage.
How difficult is it to install?
The installation complexity is rated as hard. You can find the installation instructions above.
Where can I find the source code?
You can find the source code on GitHub using the link provided at the top of the page.
SKILL.md Source
# Code Mode for MCP Servers
## What is Code Mode?
When an MCP tool returns a large API response (e.g. listing 500 Kubernetes pods,
200 SCIM users, or thousands of GitHub issues), that entire payload enters the
LLM's context window — consuming tokens and degrading performance.
Code mode flips the approach: instead of dumping raw data into context, the LLM
writes a small processing script. The MCP server runs the script in a **sandboxed
runtime** against the raw data, and only the script's stdout enters context.
This works especially well with well-known APIs (SCIM, Kubernetes, GitHub, Stripe,
Slack, AWS, etc.) because the LLM already knows the response schema from training
data — it can write the extraction script in one shot without inspecting the data.
**Typical results: 65–99% context reduction.**
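For example, against a GitHub "list issues" response the LLM might submit a script like the sketch below. This assumes the server injects the raw response as a `DATA` string and captures `console.log` output, which is how the executor in this skill is structured; the field names come from the real GitHub issues API.

```ts
// Hypothetical script the LLM might write for a "list GitHub issues" response.
// DATA is the raw JSON response injected by the server; console.log is the
// output channel captured by the sandbox (both are executor assumptions).
declare const DATA: string;

const issues: Array<{ number: number; title: string; state: string; user: { login: string } }> =
  JSON.parse(DATA);

const open = issues
  .filter((i) => i.state === "open")
  .map((i) => `#${i.number} ${i.title} (@${i.user.login})`);

console.log(`${open.length} open issues`);
console.log(open.join("\n"));
```

Only those few printed lines enter the context window; the full issue payload never does.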
### Inspiration
- [Cloudflare Code Mode](https://blog.cloudflare.com/code-mode-mcp/)
- [claude-context-mode](https://github.com/mksglu/claude-context-mode)
---
## How This Skill Works
This is an **interactive planning skill**. Work with the user step-by-step:
1. **Understand** their MCP server (language, framework, what tools return large data)
2. **Select** a sandbox that fits their server language and security needs
3. **Plan** the implementation together
4. **Implement** the code mode tool, sandbox executor, and benchmark
5. **Verify** with benchmarks comparing before/after context sizes
Do not jump ahead. Confirm each step with the user before proceeding.
---
## Step 1: Understand the Existing MCP Server
Ask the user (or discover by reading their codebase):
- **Server language**: TypeScript/JavaScript, Python, Go, Rust, or other?
- **MCP framework**: XMCP, FastMCP, mcp-go, custom, etc.?
- **Which tools return large responses?** (e.g. list users, get pods, search issues)
- **What APIs do they call?** Well-known APIs (SCIM, K8s, GitHub, Stripe) are ideal
candidates because the LLM already knows the schema.
- **What languages should the sandbox support for script execution?**
Usually JavaScript is sufficient. Python is a common second choice.
Summarize your understanding back to the user and confirm before moving on.
---
## Step 2: Select a Sandbox
The sandbox must be **isolated from the host filesystem and network by default**
and **secure by default**. Present the user with options that match their server
language, using the reference in `references/sandbox-options.md`.
### Quick Selection Guide
**If the server is TypeScript/JavaScript:**
| Sandbox | Script Language | Isolation | Size | Notes |
|---|---|---|---|---|
| `quickjs-emscripten` | JavaScript | WASM (no fs/net) | ~1MB | Lightweight, actively maintained, best default |
| `pyodide` | Python | WASM (no fs/net) | ~20MB | Full CPython in WASM, heavier |
| `isolated-vm` | JavaScript | V8 isolate (no fs/net) | ~5MB native | Fast, separate V8 heap, not WASM |
**If the server is Python:**
| Sandbox | Script Language | Isolation | Size | Notes |
|---|---|---|---|---|
| `RestrictedPython` | Python | AST-restricted compile | Tiny | Compiles to restricted bytecode, no I/O by default |
| `pyodide` (in-process WASM) | Python | WASM | ~20MB | Heavier but stronger isolation than RestrictedPython |
| `quickjs` (via `quickjs` PyPI) | JavaScript | WASM/native | Small | Run JS from Python |
**If the server is Go:**
| Sandbox | Script Language | Isolation | Size | Notes |
|---|---|---|---|---|
| `goja` | JavaScript | Pure Go interpreter | Zero CGO | No fs/net, widely used (used by Grafana) |
| `Wazero` | WASM guest (JS/Python compiled to WASM) | WASM runtime, pure Go | Zero CGO | Strongest isolation, runs any WASM module |
| `starlark-go` | Starlark (Python dialect) | Pure Go interpreter | Zero CGO | Deterministic, no I/O, used by Bazel |
**If the server is Rust:**
| Sandbox | Script Language | Isolation | Size | Notes |
|---|---|---|---|---|
| `boa_engine` | JavaScript | Pure Rust interpreter | No unsafe deps | ES2024 support, embeddable |
| `wasmtime` / `wasmer` | WASM guest | WASM runtime | Strong | Run any WASM module, strongest isolation |
| `deno_core` | JavaScript/TypeScript | V8-based | Larger | Full V8, powerful but heavier |
| `rustpython` | Python | Pure Rust interpreter | Moderate | Less mature but functional |
Read `references/sandbox-options.md` for detailed tradeoffs on each option.
**Present 2–3 options** to the user (filtered to their server language), explain
the tradeoffs briefly, and let them choose. If they're unsure, recommend the
lightest WASM-based option for their language.
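If they pick the recommended default, a quick sanity check that the sandbox really is isolated might look like this (a sketch assuming a Node.js/TypeScript server and `quickjs-emscripten`):

```ts
import { getQuickJS } from "quickjs-emscripten";

// Quick isolation check: the QuickJS guest sees no host globals unless we expose them.
async function main() {
  const QuickJS = await getQuickJS();
  const vm = QuickJS.newContext();

  const result = vm.evalCode(`[typeof require, typeof fetch, typeof process].join(",")`);
  if (result.error) {
    console.error("eval failed:", vm.dump(result.error));
    result.error.dispose();
  } else {
    // Prints "undefined,undefined,undefined": no filesystem, network, or process access.
    console.log(vm.dump(result.value));
    result.value.dispose();
  }
  vm.dispose();
}

main();
```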
---
## Step 3: Plan the Implementation
Once the sandbox is selected, create a concrete plan with the user. The plan
should cover these components:
### 3a. Code Mode Tool
A new MCP tool (e.g. `code_mode` or `<domain>_code_mode`) that accepts:
- **`command`** or **`args`**: The underlying API call / query to execute
(e.g. kubectl args, SCIM endpoint + params, GraphQL query)
- **`code`**: The processing script the LLM writes
- **`language`** (optional): Script language, defaults to `javascript`
The tool handler:
1. Executes the underlying API call (reusing existing logic)
2. Passes the raw response as a `DATA` variable into the sandbox
3. Runs the script in the sandbox
4. Returns only the script's stdout, plus a size measurement line:
`[code-mode: 18.0KB -> 6.2KB (65.5% reduction)]`
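A minimal, framework-agnostic sketch of that handler in TypeScript (the `callUnderlyingApi` and `runInSandbox` names are placeholders for the user's existing API logic and the executor from 3b):

```ts
// Placeholders: wire these to the user's existing API client and the executor from 3b.
declare function callUnderlyingApi(command: string): Promise<unknown>;
declare function runInSandbox(code: string, data: string): Promise<string>;

interface CodeModeArgs {
  command: string;   // underlying API call / query to execute
  code: string;      // processing script written by the LLM
  language?: string; // script language, defaults to "javascript" (dispatch to the matching executor)
}

async function handleCodeMode(args: CodeModeArgs): Promise<string> {
  // 1. Execute the underlying API call (reusing existing logic).
  const raw = await callUnderlyingApi(args.command);
  const rawJson = JSON.stringify(raw);

  // 2-3. Run the LLM's script in the sandbox with DATA injected.
  const output = await runInSandbox(args.code, rawJson);

  // 4. Return only the script output plus the size measurement line.
  const beforeKb = (rawJson.length / 1024).toFixed(1);
  const afterKb = (output.length / 1024).toFixed(1);
  const pct = (100 * (1 - output.length / Math.max(rawJson.length, 1))).toFixed(1);
  return `${output}\n[code-mode: ${beforeKb}KB -> ${afterKb}KB (${pct}% reduction)]`;
}
```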
### 3b. Sandbox Executor
A utility module that:
- Initializes the chosen sandbox runtime
- Injects `DATA` (the raw API response as a string) into the sandbox
- Executes the user-provided script
- Captures stdout and returns it
- Enforces a timeout (e.g. 10 seconds)
- Handles errors gracefully (script syntax errors, runtime errors)
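A sketch of such an executor using `quickjs-emscripten` (the default option from Step 2); the console capture and interrupt-based timeout shown here are one reasonable way to do it, not the only one:

```ts
import { getQuickJS, shouldInterruptAfterDeadline } from "quickjs-emscripten";

// Runs an LLM-written script against raw API data inside a WASM sandbox.
// Returns whatever the script printed via console.log.
export async function runInSandbox(code: string, data: string, timeoutMs = 10_000): Promise<string> {
  const QuickJS = await getQuickJS();
  const vm = QuickJS.newContext();
  const lines: string[] = [];

  try {
    // Inject DATA as a plain string; the script parses it itself.
    const dataHandle = vm.newString(data);
    vm.setProp(vm.global, "DATA", dataHandle);
    dataHandle.dispose();

    // Expose a minimal console.log that captures output on the host side.
    const logHandle = vm.newFunction("log", (...args) => {
      lines.push(args.map((a) => vm.dump(a)).join(" "));
    });
    const consoleHandle = vm.newObject();
    vm.setProp(consoleHandle, "log", logHandle);
    vm.setProp(vm.global, "console", consoleHandle);
    logHandle.dispose();
    consoleHandle.dispose();

    // Interrupt long-running scripts after the deadline.
    vm.runtime.setInterruptHandler(shouldInterruptAfterDeadline(Date.now() + timeoutMs));

    const result = vm.evalCode(code);
    if (result.error) {
      const err = vm.dump(result.error);
      result.error.dispose();
      throw new Error(`script error: ${JSON.stringify(err)}`);
    }
    result.value.dispose();
    return lines.join("\n");
  } finally {
    vm.dispose();
  }
}
```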
### 3c. Wiring
- Register the new tool in the MCP server's tool list
- Optionally gate behind an env var (ask the user if they want this)
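If the user wants the env var gate, the wiring can stay this small (the variable name and registration helper below are illustrative, not fixed names):

```ts
// Placeholders: however the user's framework exposes the server and registers tools.
declare const server: unknown;
declare function registerCodeModeTool(server: unknown): void;

// Gate the tool behind an env var so code mode can be toggled per deployment.
if (process.env.ENABLE_CODE_MODE === "1") {
  registerCodeModeTool(server);
}
```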
### 3d. Benchmark
A benchmark script that compares tool output size vs. code-mode output size
across realistic scenarios. See `references/benchmark-pattern.md` for the
template.
**Present the plan to the user and confirm before implementing.**
---
## Step 4: Implement
Follow the confirmed plan. Implement in this order:
1. **Install the sandbox dependency** (e.g. `npm i quickjs-emscripten`)
2. **Create the executor module** — the sandbox wrapper
3. **Create the code mode tool** — the MCP tool handler
4. **Wire it into the server** — register the tool
5. **Create the benchmark script**
Keep the implementation minimal — don't over-abstract. The executor and tool
can each be a single file.
### Implementation Tips
- The `DATA` variable should always be a **string** (JSON-serialized). The
script is responsible for parsing it if needed (`JSON.parse(DATA)` in JS,
`json.loads(DATA)` in Python).
- Include the reduction measurement in every response so the user/LLM can
see the savings: `[code-mode: {before}KB -> {after}KB ({pct}% reduction)]`
- Set a reasonable default timeout (10s) and memory limit if the sandbox
supports it.
- Return clear error messages if the script fails — the LLM will use the
error to fix its script on the next call.
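If the user prefers to factor that measurement out, a small helper is enough (this sketch assumes a Node.js/TypeScript server; `Buffer.byteLength` is just one way to measure sizes):

```ts
// Builds the reduction line appended to every code mode response.
export function formatReduction(before: string, after: string): string {
  const bytes = (s: string) => Buffer.byteLength(s, "utf8");
  const kb = (s: string) => (bytes(s) / 1024).toFixed(1);
  const pct = (100 * (1 - bytes(after) / Math.max(bytes(before), 1))).toFixed(1);
  return `[code-mode: ${kb(before)}KB -> ${kb(after)}KB (${pct}% reduction)]`;
}
```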
---
## Step 5: Benchmark and Verify
After implementation, run the benchmark to verify code mode actually reduces
context size. Read `references/benchmark-pattern.md` for the full template.
The benchmark should:
1. **Generate or fetch realistic test data** — use faker/mock data if no live
API is available, or hit a real endpoint if the user has one.
2. **Run each scenario through both paths:**
- Regular tool response (full JSON)
- Code mode with a representative extraction script
3. **Print a comparison table** showing before/after sizes and reduction %
4. **Print a total** across all scenarios
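A minimal shape for that benchmark in TypeScript (the scenario data and extraction scripts are placeholders; `runInSandbox` is the executor from Step 4, imported from a hypothetical path):

```ts
import { runInSandbox } from "./sandbox"; // hypothetical path to the executor module

interface Scenario {
  name: string;
  data: unknown;  // realistic or faked API response
  script: string; // representative extraction script
}

async function benchmark(scenarios: Scenario[]) {
  let totalBefore = 0;
  let totalAfter = 0;

  console.log("| scenario | before | after | reduction |");
  console.log("|---|---|---|---|");

  for (const s of scenarios) {
    const raw = JSON.stringify(s.data);            // regular tool path: full JSON
    const out = await runInSandbox(s.script, raw); // code mode path: script output only
    totalBefore += raw.length;
    totalAfter += out.length;
    const pct = (100 * (1 - out.length / raw.length)).toFixed(1);
    console.log(`| ${s.name} | ${raw.length} B | ${out.length} B | ${pct}% |`);
  }

  const totalPct = (100 * (1 - totalAfter / totalBefore)).toFixed(1);
  console.log(`Total: ${totalBefore} B -> ${totalAfter} B (${totalPct}% reduction)`);
}
```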
Present the benchmark results to the user. Typical expectations:
- Simple list extractions: 60–80% reduction
- Filtered queries (e.g. "only inactive users"): 90–99% reduction
- Aggregations (e.g. "count per department"): 95–99% reduction
---
## Reference Files
- `references/sandbox-options.md` — Detailed comparison of all sandbox options
by server language, with security analysis and setup instructions
- `references/benchmark-pattern.md` — Benchmark script template and methodology