🚀 Epic: Benchmark MCP Server for Load Testing and Performance Analysis
Goal
Provide a highly configurable, Go-based MCP server that can generate an arbitrary number of tools, resources, and prompts with customizable response payloads for benchmarking, load testing, and performance analysis of MCP Gateway implementations, clients, and protocol stacks. This enables developers to validate scalability, measure throughput, and identify bottlenecks in their MCP infrastructure before production deployment.
Why Now?
As MCP Gateway evolves to support thousands of tools, resources, and prompts across federated gateways, teams need a reliable way to:
- Validate scalability - Test how systems handle 1,000, 10,000, or 100,000+ MCP primitives
- Measure performance - Benchmark tool invocation latency, resource access speed, and prompt generation throughput
- Test edge cases - Validate behavior with varying payload sizes (1 byte to 1MB+)
- Compare transports - Benchmark stdio vs SSE vs HTTP performance characteristics
- Stress test infrastructure - Identify memory leaks, connection limits, and CPU bottlenecks
This tool provides a standardized benchmarking platform for the entire MCP ecosystem, enabling apples-to-apples performance comparisons across implementations.
📖 User Stories
US-1: Performance Engineer - Large-Scale Tool Discovery Testing
As a Performance Engineer
I want to generate 10,000+ tools with configurable payload sizes
So that I can measure how MCP Gateway handles large tool listings and invocations
Acceptance Criteria:
Given I start the benchmark server with flags:
-tools=10000 -tool-size=5000 -resources=0 -prompts=0
When an MCP client sends "tools/list" request
Then the server should:
- Return all 10,000 tool definitions within 100ms
- Each tool should have unique name "benchmark_tool_N"
- Each tool should accept "param1" and "param2" arguments
- Tool descriptions should indicate the tool number
When the client invokes "benchmark_tool_0"
Then the server should:
- Return JSON response with ~5000 byte payload
- Include timestamp, tool name, and passed arguments
- Respond within 10ms
Technical Requirements:
- Tool generation must be deterministic and repeatable
- Memory usage should scale linearly (O(n)) with tool count
- Support up to 100,000 tools without performance degradation
- All tools registered during server startup
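Below is a minimal sketch of the registration loop these requirements imply, assuming the `mark3labs/mcp-go` v0.32 API (`server.MCPServer`, `mcp.NewTool`, `AddTool`); `createToolHandler` is the closure factory sketched under Phase 2 later in this epic, and the parameter descriptions are illustrative.

```go
package main

import (
	"fmt"

	"github.com/mark3labs/mcp-go/mcp"
	"github.com/mark3labs/mcp-go/server"
)

// registerTools registers `count` deterministic tools at startup.
// Names are sequential ("benchmark_tool_0" .. "benchmark_tool_N-1"),
// so repeated runs produce identical listings, and the only per-tool
// state is the definition itself, keeping memory growth O(n).
func registerTools(s *server.MCPServer, count, size int) {
	for i := 0; i < count; i++ {
		name := fmt.Sprintf("benchmark_tool_%d", i)
		tool := mcp.NewTool(name,
			mcp.WithDescription(fmt.Sprintf("Benchmark tool number %d", i)),
			mcp.WithString("param1", mcp.Description("First test parameter")),
			mcp.WithString("param2", mcp.Description("Second test parameter")),
		)
		// createToolHandler (sketched under Phase 2 below) captures only
		// name and size, so handlers stay cheap at 100K scale.
		s.AddTool(tool, createToolHandler(name, size))
	}
}
```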
US-2: QA Engineer - Mixed Workload Testing
As a QA Engineer
I want to configure different payload sizes for tools, resources, and prompts independently
So that I can simulate realistic MCP workloads with varying response sizes
Acceptance Criteria:
Given I start the benchmark server with flags:
-tools=1000 -tool-size=2000
-resources=500 -resource-size=50000
-prompts=200 -prompt-size=1000
When MCP clients access the server
Then the server should:
- Return tools with ~2KB payloads
- Return resources with ~50KB payloads
- Return prompts with ~1KB payloads
- Maintain consistent payload sizes across invocations
Technical Requirements:
- Independent size controls: `-tool-size`, `-resource-size`, `-prompt-size`
- Payload generation must be efficient (no excessive memory allocations)
- Support payload sizes from 1 byte to 10MB+
- Validate size parameters at startup
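The startup validation in the last requirement can be as small as the following standard-library sketch; the flag names match the CLI table below, while the function name and error wording are illustrative.

```go
package main

import "fmt"

// validateConfig rejects invalid counts and sizes before any
// registration work happens, so misconfigurations fail fast at startup.
func validateConfig(tools, resources, prompts, toolSize, resourceSize, promptSize int) error {
	for flagName, n := range map[string]int{
		"-tools": tools, "-resources": resources, "-prompts": prompts,
	} {
		if n < 0 {
			return fmt.Errorf("%s must be >= 0, got %d", flagName, n)
		}
	}
	for flagName, n := range map[string]int{
		"-tool-size": toolSize, "-resource-size": resourceSize, "-prompt-size": promptSize,
	} {
		if n < 1 {
			return fmt.Errorf("%s must be at least 1 byte, got %d", flagName, n)
		}
	}
	return nil
}
```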
US-3: DevOps Engineer - Multi-Transport Benchmarking
As a DevOps Engineer
I want to run the benchmark server over stdio, SSE, and HTTP transports
So that I can compare protocol performance characteristics
Acceptance Criteria:
# STDIO Mode (for Claude Desktop integration)
Given I run: ./benchmark-server -tools=1000
Then the server communicates via stdin/stdout JSON-RPC
# SSE Mode (for web clients)
Given I run: ./benchmark-server -transport=sse -port=8080 -tools=1000
Then the server exposes:
- SSE events at /sse
- SSE messages at /messages
- Health check at /health
- Version info at /version
# HTTP Mode (for REST-like clients)
Given I run: ./benchmark-server -transport=http -port=9090 -tools=1000
Then the server accepts POST requests with JSON-RPC payloads at /
Technical Requirements:
- All transports support same MCP protocol features
- SSE transport must support Server-Sent Events streaming
- HTTP transport must support streamable responses
- Health/version endpoints work without authentication
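A sketch of the transport switch these criteria describe. `server.ServeStdio` is mcp-go's documented stdio entry point; the `NewSSEServer` and `NewStreamableHTTPServer` constructors are assumed from mcp-go v0.32 and should be checked against the pinned version.

```go
package main

import (
	"fmt"

	"github.com/mark3labs/mcp-go/server"
)

// serve starts the MCP server on the selected transport. addr is only
// used by the network transports; stdio ignores it (and auth).
func serve(s *server.MCPServer, transport, addr string) error {
	switch transport {
	case "stdio":
		// JSON-RPC over stdin/stdout, e.g. for Claude Desktop.
		return server.ServeStdio(s)
	case "sse":
		// Serves the SSE event stream plus the message endpoint.
		return server.NewSSEServer(s).Start(addr)
	case "http":
		// Single JSON-RPC POST endpoint with streamable responses.
		return server.NewStreamableHTTPServer(s).Start(addr)
	default:
		return fmt.Errorf("unknown transport %q (want stdio, sse, or http)", transport)
	}
}
```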
US-4: Load Test Engineer - Stress Testing Infrastructure
As a Load Test Engineer
I want to generate extreme-scale configurations (100,000+ items)
So that I can identify breaking points in MCP infrastructure
Acceptance Criteria:
Given I start the benchmark server with:
-tools=100000 -resources=50000 -prompts=10000
When the server starts
Then it should:
- Register all items within 5 seconds
- Report configuration via logs
- Consume less than 500MB memory
- Respond to tools/list requests within 200ms
Technical Requirements:
- Fast registration (<100ms for 10K, <5s for 100K)
- Efficient memory usage (no redundant data structures)
- Configurable log levels (debug, info, warn, error, none)
- Graceful handling of OS limits (file descriptors, memory)
US-5: Security Engineer - Authenticated Transport Testing
As a Security Engineer
I want to require Bearer token authentication for SSE/HTTP transports
So that I can test authentication flows in benchmarking scenarios
Acceptance Criteria:
Given I start with: ./benchmark-server -transport=sse -auth-token=secret123
When a client sends a request without Authorization header
Then the server responds with 401 Unauthorized
When a client sends: Authorization: Bearer secret123
Then the server accepts the request and processes normally
Given environment variable: AUTH_TOKEN=secret456
When I start with: ./benchmark-server -transport=sse
Then the server uses "secret456" as the auth token
Technical Requirements:
- Bearer token validation on all endpoints except /health and /version
- Support both CLI flag and environment variable
- Return proper WWW-Authenticate header on 401 responses
- Log authentication attempts at debug level
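A minimal net/http sketch of the middleware described above; the exempt paths come from the requirements, while the realm string and log format are assumptions.

```go
package main

import (
	"crypto/subtle"
	"log"
	"net/http"
)

// authMiddleware enforces Bearer-token auth on every endpoint except
// /health and /version. An empty token disables the check entirely.
func authMiddleware(token string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if token == "" || r.URL.Path == "/health" || r.URL.Path == "/version" {
			next.ServeHTTP(w, r)
			return
		}
		want := "Bearer " + token
		got := r.Header.Get("Authorization")
		// Constant-time comparison of the full header value.
		if subtle.ConstantTimeCompare([]byte(got), []byte(want)) != 1 {
			log.Printf("DEBUG auth rejected: %s %s", r.Method, r.URL.Path)
			w.Header().Set("WWW-Authenticate", `Bearer realm="benchmark-server"`)
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}
```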
🏗 Architecture
Component Architecture
```mermaid
graph TB
subgraph "Benchmark Server (Go)"
A1[Flag Parser]
A2[MCP Server Core]
A3[Dynamic Handler Generator]
A4[Tool Handlers 0..N]
A5[Resource Handlers 0..N]
A6[Prompt Handlers 0..N]
A7[Transport Layer]
A8[STDIO Transport]
A9[SSE Transport]
A10[HTTP Transport]
A11[Auth Middleware]
end
subgraph "MCP Clients"
B1[Claude Desktop]
B2[Web Browser]
B3[HTTP Client]
B4[Load Test Tool]
end
A1 --> A2
A1 --> A3
A3 --> A4
A3 --> A5
A3 --> A6
A2 --> A4
A2 --> A5
A2 --> A6
A2 --> A7
A7 --> A8
A7 --> A9
A7 --> A10
A9 --> A11
A10 --> A11
B1 -->|JSON-RPC stdin| A8
B2 -->|SSE /sse| A9
B3 -->|HTTP POST /| A10
B4 -->|Concurrent Requests| A9
B4 -->|Concurrent Requests| A10
```
Payload Generation Flow
```mermaid
sequenceDiagram
participant Client as MCP Client
participant Server as Benchmark Server
participant Handler as Tool Handler
participant Generator as Payload Generator
Client->>Server: tools/call {"name":"benchmark_tool_0"}
Server->>Handler: invoke(toolName="benchmark_tool_0", args={})
Handler->>Generator: generatePayload("benchmark_tool_0", size=5000)
Generator->>Generator: base = "Response from benchmark_tool_0. "
Generator->>Generator: filler = "This is benchmark data. " (repeated)
Generator->>Generator: result = base + filler (truncated to 5000 bytes)
Generator-->>Handler: payload (5000 bytes)
Handler->>Handler: Build JSON response with tool name, timestamp, args, data
Handler-->>Server: JSON response
Server-->>Client: MCP ToolResult with payload
```
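The flow above translates almost line for line into Go. A sketch follows; the function name matches the task list below, while the `strings.Builder`-based implementation is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// generatePayload builds a payload of exactly `size` bytes: a per-item
// base string followed by repeated filler, truncated to the requested
// length. Builder.Grow pre-sizes the buffer, so generation typically
// costs a single allocation regardless of payload size.
func generatePayload(name string, size int) string {
	const filler = "This is benchmark data. "
	var b strings.Builder
	b.Grow(size + len(filler)) // allow one overshoot before truncation
	fmt.Fprintf(&b, "Response from %s. ", name)
	for b.Len() < size {
		b.WriteString(filler)
	}
	return b.String()[:size] // truncate to the exact requested size
}
```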
📋 File Structure
```
mcp-servers/go/benchmark-server/
├── main.go # Server implementation (691 lines)
├── go.mod # Go module definition
├── go.sum # Dependency checksums
├── Makefile # Build automation with targets
├── Dockerfile # Multi-stage container build
├── README.md # Comprehensive documentation
└── dist/
    └── benchmark-server   # Compiled binary
```
⚙️ Command-Line Interface
Core Flags
| Flag | Default | Description |
|---|---|---|
| `-transport` | `stdio` | Transport type: stdio, sse, or http |
| `-tools` | `100` | Number of tools to generate |
| `-resources` | `100` | Number of resources to generate |
| `-prompts` | `100` | Number of prompts to generate |
| `-tool-size` | `1000` | Size of tool response payload in bytes |
| `-resource-size` | `1000` | Size of resource response payload in bytes |
| `-prompt-size` | `1000` | Size of prompt response payload in bytes |
| `-port` | `8080` | TCP port for SSE/HTTP transport |
| `-listen` | `0.0.0.0` | Listen interface for SSE/HTTP |
| `-addr` | - | Full listen address (overrides -listen/-port) |
| `-public-url` | - | External base URL for SSE clients |
| `-auth-token` | - | Bearer token for authentication (SSE/HTTP only) |
| `-log-level` | `info` | Logging level: debug, info, warn, error, none |
| `-help` | - | Show help message |
Environment Variables
| Variable | Description |
|---|---|
| `AUTH_TOKEN` | Bearer token for authentication (overrides -auth-token flag) |
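A sketch of how the flag and environment-variable precedence above might be wired, using only the standard `flag` and `os` packages; a subset of the flags is shown, and `resolveConfig` is an illustrative helper name.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// Defaults mirror the flag table above; the remaining flags are elided.
var (
	transport = flag.String("transport", "stdio", "Transport type: stdio, sse, or http")
	tools     = flag.Int("tools", 100, "Number of tools to generate")
	toolSize  = flag.Int("tool-size", 1000, "Size of tool response payload in bytes")
	port      = flag.Int("port", 8080, "TCP port for SSE/HTTP transport")
	listen    = flag.String("listen", "0.0.0.0", "Listen interface for SSE/HTTP")
	addr      = flag.String("addr", "", "Full listen address (overrides -listen/-port)")
	authToken = flag.String("auth-token", "", "Bearer token for authentication (SSE/HTTP only)")
)

// resolveConfig applies the documented precedence: AUTH_TOKEN beats
// -auth-token, and -addr beats the -listen/-port pair.
func resolveConfig() (listenAddr, token string) {
	flag.Parse()
	listenAddr = *addr
	if listenAddr == "" {
		listenAddr = fmt.Sprintf("%s:%d", *listen, *port)
	}
	token = *authToken
	if env := os.Getenv("AUTH_TOKEN"); env != "" {
		token = env // environment variable wins, per the table above
	}
	return listenAddr, token
}
```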
📊 Usage Examples
Small Scale Testing (Development)
```bash
# Quick test with 10 items each
./benchmark-server -tools=10 -resources=10 -prompts=10 -log-level=debug

# Test specific type with custom size
./benchmark-server -tools=5 -tool-size=500 -resources=0 -prompts=0
```
Medium Scale Testing (Integration)
```bash
# Realistic workload
./benchmark-server -tools=1000 -resources=500 -prompts=200

# Mixed payload sizes
./benchmark-server -tools=1000 -tool-size=2000 \
  -resources=500 -resource-size=50000 \
  -prompts=200 -prompt-size=1000
```
Large Scale Testing (Performance)
```bash
# 10K tools with 5KB payloads
./benchmark-server -tools=10000 -tool-size=5000

# Mixed scale for gateway stress testing
./benchmark-server -tools=10000 -resources=5000 -prompts=1000 \
  -tool-size=2000 -resource-size=10000 -prompt-size=500
```
Extreme Scale Testing (Limits)
```bash
# 100K tools (test discovery performance)
./benchmark-server -tools=100000 -resources=0 -prompts=0 -log-level=none

# Large payloads (test data transfer)
./benchmark-server -tools=100 -tool-size=1000000  # 1MB payloads
```
Multi-Transport Testing
```bash
# STDIO (Claude Desktop)
./benchmark-server -tools=1000

# SSE (Web clients)
./benchmark-server -transport=sse -port=8080 -tools=1000

# HTTP (REST clients)
./benchmark-server -transport=http -port=9090 -tools=1000

# SSE with authentication
./benchmark-server -transport=sse -port=8080 -auth-token=secret123 -tools=500
```
Claude Desktop Integration
Add to claude_desktop_config.json:
```json
{
  "mcpServers": {
    "benchmark": {
      "command": "/path/to/benchmark-server",
      "args": ["-tools=1000", "-resources=500", "-prompts=200"]
    }
  }
}
```
🔧 API Response Format
Tool Response
```json
{
  "tool": "benchmark_tool_0",
  "timestamp": "2025-10-11T12:34:56Z",
  "arguments": {
    "param1": "value1",
    "param2": "value2"
  },
  "data": "Response from benchmark_tool_0. This is benchmark data. This is benchmark data..."
}
```
Resource Response
```json
{
  "resource": "benchmark_resource_0",
  "timestamp": "2025-10-11T12:34:56Z",
  "data": "Response from benchmark_resource_0. This is benchmark data..."
}
```
Prompt Response
```
Prompt: benchmark_prompt_0
Timestamp: 2025-10-11T12:34:56Z
Arguments:
  - arg1: value1
  - arg2: value2

Response from benchmark_prompt_0. This is benchmark data...
```
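The tool response above maps onto a small struct. The sketch below shows how a handler might assemble it; only the JSON field names come from the examples, while the struct and helper names are illustrative.

```go
package main

import (
	"encoding/json"
	"time"
)

// toolResponse mirrors the JSON fields shown in the Tool Response
// example; resource responses drop the arguments field.
type toolResponse struct {
	Tool      string         `json:"tool"`
	Timestamp string         `json:"timestamp"`
	Arguments map[string]any `json:"arguments"`
	Data      string         `json:"data"`
}

// buildToolResponse marshals one invocation result. The payload comes
// from generatePayload, so its size tracks the -tool-size flag.
func buildToolResponse(name string, args map[string]any, payload string) ([]byte, error) {
	return json.Marshal(toolResponse{
		Tool:      name,
		Timestamp: time.Now().UTC().Format(time.RFC3339),
		Arguments: args,
		Data:      payload,
	})
}
```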
📈 Performance Characteristics
Registration Speed
| Item Count | Registration Time | Memory Usage |
|---|---|---|
| 100 | <1ms | ~5MB |
| 1,000 | <10ms | ~10MB |
| 10,000 | <100ms | ~50MB |
| 100,000 | <5s | ~300MB |
Response Times
| Operation | 1,000 items | 10,000 items | 100,000 items |
|---|---|---|---|
| Tool listing | <10ms | <50ms | <200ms |
| Tool invocation | <5ms | <5ms | <5ms |
| Resource access | <5ms | <5ms | <5ms |
| Prompt generation | <5ms | <5ms | <5ms |
Payload Size Impact
| Payload Size | Tool Invocation Time | Memory per Request |
|---|---|---|
| 1KB | <5ms | ~2KB |
| 10KB | <5ms | ~12KB |
| 100KB | <10ms | ~105KB |
| 1MB | <20ms | ~1.1MB |
📋 Implementation Tasks
Phase 1: Core Server Implementation ✅
- [x] Project Structure
  - Create `mcp-servers/go/benchmark-server/` directory
  - Initialize Go module with `go.mod`
  - Add `github.com/mark3labs/mcp-go v0.32.0` dependency
- [x] Main Application (`main.go`)
  - Implement command-line flag parsing
  - Add logging infrastructure with levels
  - Create MCP server initialization
  - Implement transport selection logic
Phase 2: Dynamic Handler Generation ✅
- [x] Payload Generation
  - Implement `generatePayload()` function
  - Support arbitrary payload sizes
  - Use repeating filler text for efficiency
- [x] Handler Factories
  - Implement `createToolHandler()` factory
  - Implement `createResourceHandler()` factory
  - Implement `createPromptHandler()` factory
  - Support closure-based handler creation with size parameters
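A sketch of the closure-based factory pattern listed above, assuming the mcp-go v0.32 handler signature; `createResourceHandler` and `createPromptHandler` would follow the same shape.

```go
package main

import (
	"context"

	"github.com/mark3labs/mcp-go/mcp"
)

// createToolHandler returns a handler closure for one tool. Only the
// tool name and payload size are captured, so 100,000 handlers cost
// little more than 100,000 small closures.
func createToolHandler(name string, size int) func(context.Context, mcp.CallToolRequest) (*mcp.CallToolResult, error) {
	return func(ctx context.Context, req mcp.CallToolRequest) (*mcp.CallToolResult, error) {
		// The payload is generated per call rather than pre-computed,
		// so startup memory stays flat even for large -tool-size values.
		return mcp.NewToolResultText(generatePayload(name, size)), nil
	}
}
```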
Phase 3: Tool/Resource/Prompt Registration ✅
- [x] Tool Registration Loop
  - Generate N tools with sequential names
  - Add tool descriptions and metadata
  - Register with MCP server
- [x] Resource Registration Loop
  - Generate N resources with URIs
  - Add resource descriptions
  - Register with MCP server
- [x] Prompt Registration Loop
  - Generate N prompts with names
  - Add prompt arguments
  - Register with MCP server
Phase 4: Transport Implementation ✅
- [x] STDIO Transport
  - Use `server.ServeStdio()` for stdin/stdout
  - Support JSON-RPC over stdio
  - Ignore auth-token in stdio mode
- [x] SSE Transport
  - Implement SSE server with `/sse` and `/messages` endpoints
  - Add health/version endpoints
  - Support Bearer token authentication
  - Implement logging middleware
- [x] HTTP Transport
  - Implement HTTP server with JSON-RPC POST endpoint
  - Add health/version endpoints
  - Support Bearer token authentication
  - Implement logging middleware
Phase 5: Authentication & Security ✅
- [x] Bearer Token Auth
  - Implement `authMiddleware()` function
  - Validate Authorization header format
  - Skip auth for health/version endpoints
  - Support both CLI flag and environment variable
  - Return proper 401 responses with WWW-Authenticate
Phase 6: Customizable Payload Sizes ✅
- [x] Separate Size Controls
  - Replace single `-payload-size` with three flags
  - Add `-tool-size` flag (default: 1000)
  - Add `-resource-size` flag (default: 1000)
  - Add `-prompt-size` flag (default: 1000)
  - Update handler factories to use separate sizes
  - Update logging to show all three sizes
Phase 7: Build Automation ✅
- [x] Makefile
  - Create `build` target with CGO_ENABLED=0
  - Add `run` target for quick testing
  - Add `run-small`, `run-medium`, `run-large`, `run-xlarge` presets
  - Add `run-sse` and `run-http` transport targets
  - Add `clean` target
  - Add `help` target with descriptions
  - Add `tidy`, `fmt`, `test` targets
- [x] Dockerfile
  - Multi-stage build with golang:1.23
  - Scratch-based final image
  - CGO_ENABLED=0 for static binary
  - Trimmed and stripped binary
Phase 8: Documentation ✅
- [x] README.md
  - Project overview and features
  - Quick start guide
  - Command-line options table
  - Usage examples (small, medium, large, extreme scale)
  - Claude Desktop integration example
  - Testing examples with curl
  - Makefile targets documentation
  - Benchmarking scenarios
  - Performance characteristics
  - API response format examples
  - Docker usage instructions
- [x] Code Documentation
  - File header with usage examples
  - Function docstrings
  - Inline comments for complex logic
Phase 9: Testing & Validation ✅
- [x] Functional Testing
  - Test tool listing with 100 items (default)
  - Test tool invocation with parameters
  - Test resource listing
  - Test resource reading
  - Test prompt listing
  - Test prompt generation
  - Test with 1,000 items (medium scale)
  - Test with 10,000 items (large scale)
- [x] Payload Size Testing
  - Verify tool payload size accuracy
  - Verify resource payload size accuracy
  - Verify prompt payload size accuracy
  - Test mixed payload sizes
- [x] Transport Testing
  - Test stdio mode with echo piping
  - Test SSE mode (manual verification)
  - Test HTTP mode (manual verification)
- [x] Performance Testing
  - Measure registration time for 10,000 items
  - Verify registration meets the <100ms target for 10K items
  - Verify memory usage is reasonable
✅ Success Criteria
- Functionality: Can generate 1 to 100,000+ tools/resources/prompts on demand
- Customization: Separate size controls for tools, resources, and prompts
- Performance: <100ms registration for 10K items, <5s for 100K items
- Transports: Full support for stdio, SSE, and HTTP transports
- Authentication: Bearer token auth for SSE/HTTP with environment variable support
- Build System: Makefile with convenient targets and Docker support
- Documentation: Comprehensive README with examples and Claude Desktop integration
- Testing: Verified with actual MCP protocol invocations
- Logging: Configurable log levels with structured output
- Standards: Full MCP 1.0 protocol compliance
📝 Additional Notes
🔹 Minimal Dependencies: The server uses only the mcp-go library and the Go standard library, keeping the attack surface small and compilation fast.
🔹 Deterministic Behavior: Tool names, resource URIs, and prompt names are sequential and predictable, making it easy to write automated tests.
🔹 Efficient Memory Usage: Handlers are generated as closures that capture only the necessary data (name, size), avoiding redundant storage.
🔹 Payload Flexibility: Supports payloads from 1 byte to 10MB+, enabling testing of:
- Small responses (metadata-heavy workloads)
- Medium responses (typical tool outputs)
- Large responses (data export, log streaming)
🔹 Real-World Simulation: The three-tier configuration (tools, resources, prompts with independent sizes) mirrors production MCP servers that expose different types of primitives with varying response characteristics.
🔹 Container-Ready: Dockerfile produces a 10MB scratch-based image with static binary, ideal for Kubernetes deployments and CI/CD pipelines.
🔹 Claude Desktop Compatible: Works out-of-the-box with Claude Desktop via stdio transport, allowing manual testing of tool discovery and invocation.
🔹 Future Extensions:
- Sampling mode (random tool/resource/prompt selection)
- Latency injection for network delay simulation
- Error rate injection for failure testing
- Prometheus metrics endpoint
- OpenTelemetry tracing support
🏁 Definition of Done
- All implementation tasks completed
- Server runs with default configuration (100 items each)
- Server handles extreme scale (10,000+ items) without errors
- Separate payload size controls functional and tested
- All three transports (stdio, SSE, HTTP) operational
- Authentication works with both CLI flag and environment variable
- Makefile targets work correctly
- Dockerfile builds successfully
- README documentation complete with examples
- Code includes header comments with usage examples
- Tested with actual MCP protocol requests
- Performance characteristics documented
- Project follows Go best practices (gofmt, proper error handling)
- Binary size optimized (stripped and trimmed)
🎯 Use Cases
1. Gateway Scalability Testing
Test MCP Gateway with increasing tool counts to identify discovery bottlenecks.
2. Transport Performance Comparison
Benchmark stdio vs SSE vs HTTP to determine optimal transport for production.
3. Client Load Testing
Stress test MCP clients (Claude Desktop, web apps) with large tool catalogs.
4. Protocol Compliance Verification
Validate MCP protocol implementations handle large-scale tool/resource/prompt scenarios correctly.
5. Memory Profiling
Profile MCP Gateway memory usage under various load conditions (tool count × payload size).
6. Latency Analysis
Measure end-to-end latency from tool invocation to response across different scales.
7. Federation Testing
Test federated gateway scenarios with multiple benchmark servers exposing different scales.
8. CI/CD Performance Regression
Automated benchmarking in CI/CD to detect performance regressions across versions.