mcp-builder

Guide for creating high-quality MCP (Model Context Protocol) servers that enable LLMs to interact with external services through well-designed tools. Use when building MCP servers to integrate external APIs or services, whether in Python (FastMCP) or Node/TypeScript (MCP SDK).

2,707 stars

Best use case

mcp-builder is best used when you need a repeatable AI agent workflow instead of a one-off prompt.


Teams using mcp-builder should expect more consistent output, faster repeated execution, and less prompt rewriting.

When to use this skill

  • You want a reusable workflow that can be run more than once with consistent structure.

When not to use this skill

  • You only need a quick one-off answer and do not need a reusable workflow.
  • You cannot install or maintain the underlying files, dependencies, or repository context.

Installation

Claude Code / Cursor / Codex

$ curl -o ~/.claude/skills/mcp-builder/SKILL.md --create-dirs "https://raw.githubusercontent.com/davepoon/buildwithclaude/main/plugins/all-skills/skills/mcp-builder/SKILL.md"

Manual Installation

  1. Download SKILL.md from GitHub
  2. Place it in .claude/skills/mcp-builder/SKILL.md inside your project
  3. Restart your AI agent — it will auto-discover the skill

How mcp-builder Compares

| Feature / Agent | mcp-builder | Standard Approach |
| --- | --- | --- |
| Platform Support | Not specified | Limited / Varies |
| Context Awareness | High | Baseline |
| Installation Complexity | Unknown | N/A |

Frequently Asked Questions

What does this skill do?

Guide for creating high-quality MCP (Model Context Protocol) servers that enable LLMs to interact with external services through well-designed tools. Use when building MCP servers to integrate external APIs or services, whether in Python (FastMCP) or Node/TypeScript (MCP SDK).

Where can I find the source code?

The source code lives in the davepoon/buildwithclaude repository on GitHub, the same repository referenced by the installation command above.

SKILL.md Source

# MCP Server Development Guide

## Overview

To create high-quality MCP (Model Context Protocol) servers that enable LLMs to effectively interact with external services, use this skill. An MCP server provides tools that allow LLMs to access external services and APIs. The quality of an MCP server is measured by how well it enables LLMs to accomplish real-world tasks using the tools provided.

---

# Process

## 🚀 High-Level Workflow

Creating a high-quality MCP server involves four main phases:

### Phase 1: Deep Research and Planning

#### 1.1 Understand Agent-Centric Design Principles

Before diving into implementation, understand how to design tools for AI agents by reviewing these principles:

**Build for Workflows, Not Just API Endpoints:**
- Don't simply wrap existing API endpoints - build thoughtful, high-impact workflow tools
- Consolidate related operations (e.g., `schedule_event` that both checks availability and creates the event; see the sketch after this list)
- Focus on tools that enable complete tasks, not just individual API calls
- Consider what workflows agents actually need to accomplish
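
For example, rather than exposing separate `check_availability` and `create_event` endpoints, a single consolidated tool can do both in one call. The sketch below uses the MCP Python SDK's FastMCP class; the `CalendarClient` and its methods are hypothetical stand-ins for whatever service you are wrapping.

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("calendar")


# Hypothetical stand-in for a real calendar API client.
class CalendarClient:
    async def get_busy_attendees(self, attendees: list[str], start: str, end: str) -> list[str]:
        return []  # pretend everyone is free

    async def create_event(self, **kwargs) -> dict:
        return {"id": "evt_123", **kwargs}


calendar = CalendarClient()


@mcp.tool()
async def schedule_event(title: str, start: str, end: str, attendees: list[str]) -> str:
    """Check attendee availability and create the event in a single call."""
    busy = await calendar.get_busy_attendees(attendees, start, end)
    if busy:
        return f"Not scheduled: {', '.join(busy)} are busy in that window. Try another time."
    event = await calendar.create_event(title=title, start=start, end=end, attendees=attendees)
    return f"Scheduled '{title}' from {start} to {end} (event id {event['id']})."


if __name__ == "__main__":
    mcp.run()
```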

**Optimize for Limited Context:**
- Agents have constrained context windows - make every token count
- Return high-signal information, not exhaustive data dumps
- Provide "concise" vs "detailed" response format options
- Default to human-readable identifiers over technical codes (names over IDs)
- Consider the agent's context budget as a scarce resource
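
One concrete way to offer the concise-vs-detailed option is a `response_format` parameter on the tool. A minimal sketch; the issue fields are invented for illustration:

```python
from typing import Literal


def format_issue(issue: dict, response_format: Literal["concise", "detailed"] = "concise") -> str:
    """Render an issue for the agent; default to the high-signal concise form."""
    if response_format == "concise":
        # One line per issue: enough signal to decide whether to drill down.
        return f"#{issue['number']} {issue['title']} [{issue['status']}]"
    return "\n".join([
        f"#{issue['number']} {issue['title']}",
        f"Status: {issue['status']}  Assignee: {issue['assignee']}",
        f"Labels: {', '.join(issue['labels'])}",
        "",
        issue["description"],
    ])
```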

**Design Actionable Error Messages:**
- Error messages should guide agents toward correct usage patterns
- Suggest specific next steps: "Try using filter='active_only' to reduce results"
- Make errors educational, not just diagnostic
- Help agents learn proper tool usage through clear feedback
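
In code, this usually means catching the raw failure inside the tool and returning guidance instead of a traceback. A sketch with an invented exception and filter name:

```python
class RateLimitedError(Exception):
    """Stand-in for the exception your API client raises on HTTP 429."""

    def __init__(self, retry_after: int):
        self.retry_after = retry_after


async def fetch_results(query: str) -> list[dict]:
    # Hypothetical API call; always raises here to demonstrate the error path.
    raise RateLimitedError(retry_after=30)


async def search_items(query: str) -> str:
    """Tool body: errors come back as next steps, not stack traces."""
    try:
        items = await fetch_results(query)
    except RateLimitedError as exc:
        return (
            f"Error: rate limit hit while searching for '{query}'. "
            f"Wait about {exc.retry_after} seconds and retry, or use "
            "filter='active_only' to request a smaller result set."
        )
    return f"Found {len(items)} matching items."
```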

**Follow Natural Task Subdivisions:**
- Tool names should reflect how humans think about tasks
- Group related tools with consistent prefixes for discoverability
- Design tools around natural workflows, not just API structure

**Use Evaluation-Driven Development:**
- Create realistic evaluation scenarios early
- Let agent feedback drive tool improvements
- Prototype quickly and iterate based on actual agent performance

#### 1.3 Study MCP Protocol Documentation

**Fetch the latest MCP protocol documentation:**

Use WebFetch to load: `https://modelcontextprotocol.io/llms-full.txt`

This comprehensive document contains the complete MCP specification and guidelines.

#### 1.4 Study Framework Documentation

**Load and read the following reference files:**

- **MCP Best Practices**: [📋 View Best Practices](./reference/mcp_best_practices.md) - Core guidelines for all MCP servers

**For Python implementations, also load:**
- **Python SDK Documentation**: Use WebFetch to load `https://raw.githubusercontent.com/modelcontextprotocol/python-sdk/main/README.md`
- [🐍 Python Implementation Guide](./reference/python_mcp_server.md) - Python-specific best practices and examples

**For Node/TypeScript implementations, also load:**
- **TypeScript SDK Documentation**: Use WebFetch to load `https://raw.githubusercontent.com/modelcontextprotocol/typescript-sdk/main/README.md`
- [⚡ TypeScript Implementation Guide](./reference/node_mcp_server.md) - Node/TypeScript-specific best practices and examples

#### 1.5 Exhaustively Study API Documentation

To integrate a service, read through **ALL** available API documentation:
- Official API reference documentation
- Authentication and authorization requirements
- Rate limiting and pagination patterns
- Error responses and status codes
- Available endpoints and their parameters
- Data models and schemas

**To gather comprehensive information, use web search and the WebFetch tool as needed.**

#### 1.6 Create a Comprehensive Implementation Plan

Based on your research, create a detailed plan that includes:

**Tool Selection:**
- List the most valuable endpoints/operations to implement
- Prioritize tools that enable the most common and important use cases
- Consider which tools work together to enable complex workflows

**Shared Utilities and Helpers:**
- Identify common API request patterns
- Plan pagination helpers
- Design filtering and formatting utilities
- Plan error handling strategies

**Input/Output Design:**
- Define input validation models (Pydantic for Python, Zod for TypeScript)
- Design consistent response formats (e.g., JSON or Markdown), and configurable levels of detail (e.g., Detailed or Concise)
- Plan for large-scale usage (thousands of users/resources)
- Implement character limits and truncation strategies (e.g., 25,000 tokens)
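
A response-size budget usually ends up as a module-level constant plus one truncation helper that every tool shares. A sketch; the limit value and the follow-up hint are illustrative and should match whatever pagination or format options your tools actually expose:

```python
CHARACTER_LIMIT = 25_000  # shared response-size budget; tune for your target models


def truncate_response(text: str, limit: int = CHARACTER_LIMIT) -> str:
    """Cut oversized tool output and tell the agent how to get the rest."""
    if len(text) <= limit:
        return text
    return (
        text[:limit]
        + "\n\n[Output truncated. Request a later page or use response_format='concise'.]"
    )
```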

**Error Handling Strategy:**
- Plan graceful failure modes
- Design clear, actionable, LLM-friendly natural-language error messages that prompt further action
- Consider rate limiting and timeout scenarios
- Handle authentication and authorization errors

---

### Phase 2: Implementation

Now that you have a comprehensive plan, begin implementation following language-specific best practices.

#### 2.1 Set Up Project Structure

**For Python:**
- Create a single `.py` file or organize into modules if complex (see [🐍 Python Guide](./reference/python_mcp_server.md))
- Use the MCP Python SDK for tool registration
- Define Pydantic models for input validation
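
To make the Python starting point concrete, the smallest viable scaffold looks roughly like this (server and tool names are placeholders; real tools replace `ping`):

```python
# server.py: minimal FastMCP scaffold
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("example-service")


@mcp.tool()
async def ping() -> str:
    """Health-check tool; verifies the server is wired up correctly."""
    return "pong"


if __name__ == "__main__":
    mcp.run()  # stdio transport by default
```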

**For Node/TypeScript:**
- Create proper project structure (see [⚡ TypeScript Guide](./reference/node_mcp_server.md))
- Set up `package.json` and `tsconfig.json`
- Use MCP TypeScript SDK
- Define Zod schemas for input validation

#### 2.2 Implement Core Infrastructure First

**To begin implementation, create shared utilities before implementing tools:**
- API request helper functions
- Error handling utilities
- Response formatting functions (JSON and Markdown)
- Pagination helpers
- Authentication/token management
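
A sketch of what those shared helpers can look like in Python, assuming `httpx` for HTTP; the base URL, environment variable, and pagination parameters are placeholders for your target API:

```python
import os

import httpx

API_BASE_URL = "https://api.example.com/v1"  # placeholder
DEFAULT_PAGE_SIZE = 50


async def api_request(method: str, path: str, **kwargs) -> dict:
    """Single choke point for auth headers, timeouts, and error translation."""
    headers = {"Authorization": f"Bearer {os.environ.get('EXAMPLE_API_TOKEN', '')}"}
    async with httpx.AsyncClient(base_url=API_BASE_URL, timeout=30.0) as client:
        response = await client.request(method, path, headers=headers, **kwargs)
    if response.status_code == 429:
        raise RuntimeError("Rate limited by the API; wait before retrying this tool.")
    response.raise_for_status()
    return response.json()


async def list_page(path: str, page: int = 1, page_size: int = DEFAULT_PAGE_SIZE) -> dict:
    """Pagination helper shared by every list-style tool."""
    return await api_request("GET", path, params={"page": page, "per_page": page_size})
```

Keeping every tool on this one request path is what makes consistent error handling and rate-limit behavior possible later.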

#### 2.3 Implement Tools Systematically

For each tool in the plan:

**Define Input Schema:**
- Use Pydantic (Python) or Zod (TypeScript) for validation
- Include proper constraints (min/max length, regex patterns, min/max values, ranges)
- Provide clear, descriptive field descriptions
- Include diverse examples in field descriptions
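
On the Python side, a Pydantic v2 model with explicit constraints and example-rich descriptions might look like this (the tool, fields, and patterns are invented for illustration; a Zod schema plays the same role in TypeScript):

```python
from pydantic import BaseModel, ConfigDict, Field


class SearchIssuesInput(BaseModel):
    """Input schema for a hypothetical `search_issues` tool."""

    model_config = ConfigDict(extra="forbid")

    query: str = Field(
        ...,
        min_length=1,
        max_length=500,
        description="Full-text search, e.g. 'login timeout' or 'crash on startup'",
    )
    project_key: str = Field(
        ...,
        pattern=r"^[A-Z][A-Z0-9]{1,9}$",
        description="Project key such as 'PAY' or 'INFRA2'",
    )
    max_results: int = Field(
        20,
        ge=1,
        le=100,
        description="Number of issues to return, between 1 and 100",
    )
```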

**Write Comprehensive Docstrings/Descriptions:**
- One-line summary of what the tool does
- Detailed explanation of purpose and functionality
- Explicit parameter types with examples
- Complete return type schema
- Usage examples (when to use, when not to use)
- Error handling documentation, which outlines how to proceed given specific errors
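
Written out, a docstring that meets this bar might read as follows (the service, parameters, and error strings are illustrative):

```python
async def list_orders(customer_name: str, status: str = "open", page: int = 1) -> str:
    """List a customer's orders, most recent first.

    Use this tool to answer questions about what a customer has purchased or the
    state of a specific order. It is read-only; it cannot modify orders.

    Args:
        customer_name: Human-readable customer name, e.g. "Acme Corp".
        status: One of "open", "shipped", or "cancelled". Defaults to "open".
        page: 1-based page number; each page holds up to 50 orders.

    Returns:
        A Markdown list, one order per line ("#1042 - 3 items - shipped 2024-05-02"),
        followed by the total count and the next page number if more results exist.

    Errors:
        "Customer not found": check the spelling or search for the customer first.
        "Rate limited": wait the suggested number of seconds, then retry.
    """
    ...  # implementation elided; see the following sections
```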

**Implement Tool Logic:**
- Use shared utilities to avoid code duplication
- Follow async/await patterns for all I/O
- Implement proper error handling
- Support multiple response formats (JSON and Markdown)
- Respect pagination parameters
- Check character limits and truncate appropriately

**Add Tool Annotations:**
- `readOnlyHint`: true (for read-only operations)
- `destructiveHint`: false (for non-destructive operations)
- `idempotentHint`: true (if repeated calls have same effect)
- `openWorldHint`: true (if interacting with external systems)
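
With FastMCP these hints can be attached at registration time. A sketch, assuming a recent MCP Python SDK in which `@mcp.tool()` accepts an `annotations` argument built from `mcp.types.ToolAnnotations`; if your SDK version differs, check its README for the current mechanism:

```python
from mcp.server.fastmcp import FastMCP
from mcp.types import ToolAnnotations

mcp = FastMCP("example-service")


@mcp.tool(
    annotations=ToolAnnotations(
        readOnlyHint=True,      # fetches data only
        destructiveHint=False,  # safe to call speculatively
        idempotentHint=True,    # repeated calls return the same result
        openWorldHint=True,     # talks to an external system
    )
)
async def get_order(order_id: str) -> str:
    """Fetch a single order by id (read-only placeholder implementation)."""
    return f"Order {order_id}: details would be fetched from the external API here."
```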

#### 2.4 Follow Language-Specific Best Practices

**At this point, load the appropriate language guide:**

**For Python: Load [🐍 Python Implementation Guide](./reference/python_mcp_server.md) and ensure the following:**
- Using MCP Python SDK with proper tool registration
- Pydantic v2 models with `model_config`
- Type hints throughout
- Async/await for all I/O operations
- Proper imports organization
- Module-level constants (CHARACTER_LIMIT, API_BASE_URL)

**For Node/TypeScript: Load [⚡ TypeScript Implementation Guide](./reference/node_mcp_server.md) and ensure the following:**
- Using `server.registerTool` properly
- Zod schemas with `.strict()`
- TypeScript strict mode enabled
- No `any` types - use proper types
- Explicit Promise<T> return types
- Build process configured (`npm run build`)

---

### Phase 3: Review and Refine

After initial implementation:

#### 3.1 Code Quality Review

To ensure quality, review the code for:
- **DRY Principle**: No duplicated code between tools
- **Composability**: Shared logic extracted into functions
- **Consistency**: Similar operations return similar formats
- **Error Handling**: All external calls have error handling
- **Type Safety**: Full type coverage (Python type hints, TypeScript types)
- **Documentation**: Every tool has comprehensive docstrings/descriptions

#### 3.2 Test and Build

**Important:** MCP servers are long-running processes that wait for requests over a transport (stdio or HTTP/SSE). Running them directly in your main process (e.g., `python server.py` or `node dist/index.js`) will cause your process to hang indefinitely.

**Safe ways to test the server:**
- Use the evaluation harness (see Phase 4) - recommended approach
- Run the server in tmux to keep it outside your main process
- Use a timeout when testing: `timeout 5s python server.py`

**For Python:**
- Verify Python syntax: `python -m py_compile your_server.py`
- Check imports work correctly by reviewing the file
- To manually test: Run server in tmux, then test with evaluation harness in main process
- Or use the evaluation harness directly (it manages the server for stdio transport)

**For Node/TypeScript:**
- Run `npm run build` and ensure it completes without errors
- Verify dist/index.js is created
- To manually test: Run server in tmux, then test with evaluation harness in main process
- Or use the evaluation harness directly (it manages the server for stdio transport)

#### 3.3 Use Quality Checklist

To verify implementation quality, load the appropriate checklist from the language-specific guide:
- Python: see "Quality Checklist" in [🐍 Python Guide](./reference/python_mcp_server.md)
- Node/TypeScript: see "Quality Checklist" in [⚡ TypeScript Guide](./reference/node_mcp_server.md)

---

### Phase 4: Create Evaluations

After implementing your MCP server, create comprehensive evaluations to test its effectiveness.

**Load [✅ Evaluation Guide](./reference/evaluation.md) for complete evaluation guidelines.**

#### 4.1 Understand Evaluation Purpose

Evaluations test whether LLMs can effectively use your MCP server to answer realistic, complex questions.

#### 4.2 Create 10 Evaluation Questions

To create effective evaluations, follow the process outlined in the evaluation guide:

1. **Tool Inspection**: List available tools and understand their capabilities
2. **Content Exploration**: Use READ-ONLY operations to explore available data
3. **Question Generation**: Create 10 complex, realistic questions
4. **Answer Verification**: Solve each question yourself to verify answers

#### 4.3 Evaluation Requirements

Each question must be:
- **Independent**: Not dependent on other questions
- **Read-only**: Only non-destructive operations required
- **Complex**: Requiring multiple tool calls and deep exploration
- **Realistic**: Based on real use cases humans would care about
- **Verifiable**: Single, clear answer that can be verified by string comparison
- **Stable**: Answer won't change over time

#### 4.4 Output Format

Create an XML file with this structure:

```xml
<evaluation>
  <qa_pair>
    <question>Find discussions about AI model launches with animal codenames. One model needed a specific safety designation that uses the format ASL-X. What number X was being determined for the model named after a spotted wild cat?</question>
    <answer>3</answer>
  </qa_pair>
<!-- More qa_pairs... -->
</evaluation>
```

---

# Reference Files

## 📚 Documentation Library

Load these resources as needed during development:

### Core MCP Documentation (Load First)
- **MCP Protocol**: Fetch from `https://modelcontextprotocol.io/llms-full.txt` - Complete MCP specification
- [📋 MCP Best Practices](./reference/mcp_best_practices.md) - Universal MCP guidelines including:
  - Server and tool naming conventions
  - Response format guidelines (JSON vs Markdown)
  - Pagination best practices
  - Character limits and truncation strategies
  - Tool development guidelines
  - Security and error handling standards

### SDK Documentation (Load During Phase 1/2)
- **Python SDK**: Fetch from `https://raw.githubusercontent.com/modelcontextprotocol/python-sdk/main/README.md`
- **TypeScript SDK**: Fetch from `https://raw.githubusercontent.com/modelcontextprotocol/typescript-sdk/main/README.md`

### Language-Specific Implementation Guides (Load During Phase 2)
- [🐍 Python Implementation Guide](./reference/python_mcp_server.md) - Complete Python/FastMCP guide with:
  - Server initialization patterns
  - Pydantic model examples
  - Tool registration with `@mcp.tool`
  - Complete working examples
  - Quality checklist

- [⚡ TypeScript Implementation Guide](./reference/node_mcp_server.md) - Complete TypeScript guide with:
  - Project structure
  - Zod schema patterns
  - Tool registration with `server.registerTool`
  - Complete working examples
  - Quality checklist

### Evaluation Guide (Load During Phase 4)
- [✅ Evaluation Guide](./reference/evaluation.md) - Complete evaluation creation guide with:
  - Question creation guidelines
  - Answer verification strategies
  - XML format specifications
  - Example questions and answers
  - Running an evaluation with the provided scripts

Related Skills

public-plugin-builder

2,707 stars · from davepoon/buildwithclaude

Activate when the user wants to build a Claude plugin, create a Claude skill, make a Claude agent, structure a Claude Code plugin, says "build a plugin", "create a skill", "new claude skill", "new agent", "help me make a plugin", "plugin builder", "claude plugin helper", "how do I build a Claude skill", "I want to create a Claude plugin", "plugin building", or asks how to structure a Claude Code plugin or publish to the Claude marketplace. Works on both claude.ai (generates files as code blocks) and Claude Code (writes and pushes files).

artifacts-builder

2,707 stars · from davepoon/buildwithclaude

Suite of tools for creating elaborate, multi-component claude.ai HTML artifacts using modern frontend web technologies (React, Tailwind CSS, shadcn/ui). Use for complex artifacts requiring state management, routing, or shadcn/ui components - not for simple single-file HTML/JSX artifacts.

soft-screening-startup

2,707 stars · from davepoon/buildwithclaude

Activate for ANY startup evaluation, investment screening, or company assessment. Triggers include: "evaluate this startup", "screen this company", "should I invest in X", "is this a good investment", "what do you think about this company", "review this startup", "score this company", "rate this pitch", "assess this founder", "quick take on X", "is X worth investing in", "pass or decline on X", "what's your verdict on X", "first look at this company", "quick screen on X", "what's your take on this founder", "is this fundable", "would a VC invest in this". Also triggers when a user pastes a company description, funding ask, or founder background and asks for an opinion. Works on claude.ai and Claude Code. For hard-mode deterministic scoring with Python audit trail, use /venture-capital-intelligence:hard-screening-startup.

market-size

2,707 stars · from davepoon/buildwithclaude

Run TAM/SAM/SOM market sizing with top-down and bottom-up methods, competitive landscape, and tech stack analysis. Triggered by: "/venture-capital-intelligence:market-size", "size this market", "what is the TAM for X", "market sizing analysis", "competitive landscape for X", "who are the competitors", "TAM SAM SOM for X", "market opportunity analysis", "how big is this market", "is this market big enough", "what's the addressable market", "total addressable market for X", "how large is the opportunity", "market research for X", "how saturated is this market", "market size estimate", "go-to-market sizing", "what is the serviceable market". Claude Code only. Requires Python 3.x. Uses web search for market data.

hard-screening-startup

2,707 stars · from davepoon/buildwithclaude

Deterministic Python-scored startup screening with full audit trail. Use when you need a reproducible, weighted-score verdict on a startup — not just a qualitative opinion. Triggered by: "/venture-capital-intelligence:hard-screening-startup", "hard screen this startup", "run a hard screen on X", "score this startup with Python", "give me an auditable screen", "run a scored evaluation on X", "give me a weighted score for this startup", "screen with numbers", "objective startup score", "reproducible screen", "investment scorecard for X", "score this company out of 100", "run the full screen on X". Claude Code only. Requires Python 3.x. For conversational soft-mode screening, use /venture-capital-intelligence:soft-screening-startup.

fund-operations

2,707 stars · from davepoon/buildwithclaude

Compute fund KPIs (TVPI, DPI, IRR, MOIC), model carried interest and management fees, and generate LP quarterly update narratives. Triggered by: "/venture-capital-intelligence:fund-operations", "calculate fund KPIs", "what is my fund TVPI", "IRR calculation", "compute MOIC", "LP report", "quarterly update draft", "carried interest calculation", "management fee calculation", "fund performance report", "write my LP update", "how is my fund performing", "what is my DPI", "fund returns analysis", "model my carry", "how much carry do I earn", "portfolio performance summary", "generate investor update". Claude Code only. Requires Python 3.x.

financial-model

2,707 stars · from davepoon/buildwithclaude

Run deterministic financial models for startup valuation and SaaS health analysis. Triggered by: "/venture-capital-intelligence:financial-model", "run a financial model on X", "DCF this company", "model the financials", "calculate runway", "what is the valuation", "SaaS metrics model", "LTV CAC analysis", "unit economics", "burn rate analysis", "comparable valuation", "how long is my runway", "what's my burn multiple", "revenue projection for X", "model the ARR growth", "what is the pre-money valuation", "comps analysis", "NRR and churn model", "how healthy are these SaaS metrics". Claude Code only. Requires Python 3.x. Accepts user-supplied numbers or searches for publicly available data.

explain-equity-terms

2,707 stars · from davepoon/buildwithclaude

Activate for ANY equity, legal, or term sheet question related to startup investing or fundraising. Triggers include: "what is a SAFE", "explain this term sheet", "what does pro-rata mean", "what is liquidation preference", "explain anti-dilution", "ISO vs NSO", "what is a 83(b) election", "what is carried interest", "explain drag-along", "what is a valuation cap", "what does MFN mean", "explain convertible note vs SAFE", "what is a down round", "explain vesting cliff", "what does fully diluted mean", "term sheet question", "equity question", "what does this clause mean". Also triggers when a user pastes legal text from a term sheet, SAFE, or subscription agreement and asks what it means. Works on claude.ai and Claude Code.

deal-sourcing-signals

2,707 stars · from davepoon/buildwithclaude

Scan a company or sector for deal-sourcing signals across 6 dimensions. Triggered by: "/venture-capital-intelligence:deal-sourcing-signals", "scan signals for X", "what signals is X showing", "deal sourcing scan", "hiring signals for X", "is X raising soon", "monitor this company", "company signal scan", "sourcing brief for X", "what is X up to", "is X growing", "track this company", "deal signal report for X", "is this company fundraising", "what are the momentum signals for X", "find signals on X", "is X worth tracking". Claude Code only. Requires Python 3.x. Uses web search for live signal data.

cap-table-waterfall

2,707 stars · from davepoon/buildwithclaude

Model cap table dilution, SAFE conversion, and exit waterfall across scenarios. Triggered by: "/venture-capital-intelligence:cap-table-waterfall", "model my cap table", "simulate dilution", "SAFE conversion math", "exit waterfall", "how much do I own after Series A", "liquidation waterfall", "cap table scenario", "what happens to equity at exit", "model the waterfall", "how much equity do I have left", "what is my ownership after funding", "run dilution scenarios", "model a new round", "what happens at acquisition", "cap table after SAFE conversion", "pari passu waterfall", "preference stack analysis". Claude Code only. Requires Python 3.x.

analyze-pitch-deck

2,707 stars · from davepoon/buildwithclaude

Activate for ANY pitch deck analysis, feedback, or review request. Triggers include: "analyze this deck", "review my pitch deck", "critique my pitch", "feedback on my slides", "is my deck investor ready", "what's wrong with my pitch", "how would a VC react to this deck", "score my pitch deck", "rate my slides", "improve my deck", "what slides am I missing", "is this pitch compelling". Also triggers when a user pastes slide content, describes their deck structure, or shares a company narrative and asks for investor feedback. Works on claude.ai and Claude Code.

server-components

2,707 stars · from davepoon/buildwithclaude

This skill should be used when the user asks about "Server Components", "Client Components", "'use client' directive", "when to use server vs client", "RSC patterns", "component composition", "data fetching in components", or needs guidance on React Server Components architecture in Next.js.