Sample MCP Server - Python (ml-inference-server) #899

@crivetimihai

Overview

Create a sample MCP Server in Python that provides machine learning model inference capabilities with support for multiple ML frameworks and model formats.

Server Specifications

Server Details

  • Name: ml-inference-server
  • Language: Python 3.11+
  • Location: mcp-servers/python/ml_inference_server/
  • Purpose: Demonstrate ML model inference and deployment via MCP

Core Features

  • Multiple ML framework support (scikit-learn, PyTorch, ONNX, Hugging Face)
  • Model loading and caching
  • Batch and streaming inference
  • Model performance metrics
  • Input validation and preprocessing
  • Output postprocessing and formatting

Tools Provided

1. load_model

Load and cache ML models from various sources

from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Union  # shared by all request dataclasses below

@dataclass
class ModelLoadRequest:
    model_id: str
    model_type: str  # sklearn, pytorch, onnx, huggingface
    model_path: str  # local path, URL, or HF model name
    cache_model: bool = True
    preprocessing_config: Optional[Dict[str, Any]] = None
    device: str = "cpu"  # cpu, cuda, mps
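
A sketch of how the server might route this request to a framework-specific loader. The dispatch table and load_model function below are illustrative, not a fixed API; in the real layout this logic would delegate to the handlers in the models/ package shown later.

import joblib
import torch
import onnxruntime as ort
from transformers import pipeline

# Illustrative mapping from model_type to a loader; real handlers would
# live in models/sklearn_handler.py, models/pytorch_handler.py, etc.
_LOADERS = {
    "sklearn": lambda req: joblib.load(req.model_path),
    "pytorch": lambda req: torch.jit.load(req.model_path, map_location=req.device),
    "onnx": lambda req: ort.InferenceSession(req.model_path),
    "huggingface": lambda req: pipeline(model=req.model_path, device=req.device),
}

def load_model(req: ModelLoadRequest):
    loader = _LOADERS.get(req.model_type)
    if loader is None:
        raise ValueError(f"unsupported model_type: {req.model_type!r}")
    return loader(req)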

2. predict

Run inference on input data

@dataclass
class PredictionRequest:
    model_id: str
    inputs: Union[List[Any], Dict[str, Any]]
    preprocessing: bool = True
    postprocessing: bool = True
    return_probabilities: bool = False
    batch_size: Optional[int] = None

3. predict_batch

Efficient batch prediction for multiple inputs

@dataclass
class BatchPredictionRequest:
    model_id: str
    batch_inputs: List[Union[str, List[Any], Dict[str, Any]]]  # str covers raw text inputs
    preprocessing: bool = True
    postprocessing: bool = True
    parallel_processing: bool = True
    max_batch_size: int = 32

4. get_model_info

Retrieve model metadata and capabilities

@dataclass
class ModelInfoRequest:
    model_id: str
    include_schema: bool = True
    include_metrics: bool = True
    include_examples: bool = False

5. validate_input

Validate input data against model schema

@dataclass
class InputValidationRequest:
    model_id: str
    inputs: Union[List[Any], Dict[str, Any]]
    strict_validation: bool = True
    return_suggestions: bool = True

6. benchmark_model

Performance benchmarking and profiling

@dataclass
class BenchmarkRequest:
    model_id: str
    test_inputs: List[Union[List[Any], Dict[str, Any]]]
    num_iterations: int = 100
    warmup_iterations: int = 10
    measure_memory: bool = True
    profile_execution: bool = False
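
A minimal sketch of the measurement loop this tool implies; predict_fn stands in for whatever callable the server wires up per model:

import statistics
import time

def benchmark(predict_fn, test_inputs, num_iterations=100, warmup_iterations=10):
    # Warmup runs absorb one-time costs (JIT compilation, cache fills)
    # so they do not skew the measured latencies
    for _ in range(warmup_iterations):
        predict_fn(test_inputs)

    latencies = []
    for _ in range(num_iterations):
        start = time.perf_counter()
        predict_fn(test_inputs)
        latencies.append(time.perf_counter() - start)

    latencies.sort()
    return {
        "mean_ms": statistics.mean(latencies) * 1000,
        "p95_ms": latencies[int(0.95 * (len(latencies) - 1))] * 1000,
    }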

Supported Model Types

Scikit-learn Models

  • Classification, regression, clustering
  • Joblib and pickle format support
  • Custom preprocessing pipelines
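
The sklearn handler can stay thin; a sketch of its core path (the model file is the one named in the configuration below, while the feature encoding is illustrative):

import joblib
import numpy as np

# joblib is the usual persistence format; plain pickle files load
# the same way via the pickle module
model = joblib.load("./models/fraud_model.joblib")

# One encoded transaction; feature order must match training
features = np.array([[1500.00, 3, 14, 2]])
prediction = model.predict(features)
probabilities = model.predict_proba(features)  # classifiers only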

PyTorch Models

  • TorchScript and state_dict formats
  • GPU acceleration support
  • Dynamic input shapes
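
The two formats load differently; a sketch (the resnet50.pth path reappears in the configuration below):

import torch

# TorchScript archives are self-describing: no model class required
model = torch.jit.load("./models/resnet50.pth", map_location="cpu")
model.eval()

# A state_dict holds only weights, so the architecture must be built first:
#   model = torchvision.models.resnet50()
#   model.load_state_dict(torch.load("./models/resnet50.pth"))

with torch.no_grad():  # disable autograd bookkeeping for inference
    logits = model(torch.randn(1, 3, 224, 224))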

ONNX Models

  • Cross-framework compatibility
  • Optimized inference runtime
  • Hardware acceleration
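
A sketch of the onnxruntime calls involved (the model path and input shape are illustrative):

import numpy as np
import onnxruntime as ort

# Providers select the execution backend; CUDAExecutionProvider enables
# GPU acceleration when onnxruntime-gpu is installed
session = ort.InferenceSession("./models/model.onnx",
                               providers=["CPUExecutionProvider"])

# ONNX inputs are addressed by name and fed as NumPy arrays
input_name = session.get_inputs()[0].name
outputs = session.run(None, {input_name: np.zeros((1, 4), dtype=np.float32)})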

Hugging Face Models

  • Transformers (text, vision, audio)
  • Tokenizer integration
  • Model hub integration
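
For most tasks the handler can lean on transformers.pipeline, which bundles tokenizer and model and pulls from the Hub on first use (the model name below is the one from the configuration):

from transformers import pipeline

classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)
print(classifier(["I love this product!", "This is terrible quality."]))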

Implementation Requirements

Directory Structure

mcp-servers/python/ml_inference_server/
├── src/
│   └── ml_inference_server/
│       ├── __init__.py
│       ├── server.py
│       ├── models/
│       │   ├── __init__.py
│       │   ├── sklearn_handler.py
│       │   ├── pytorch_handler.py
│       │   ├── onnx_handler.py
│       │   └── huggingface_handler.py
│       ├── preprocessing/
│       │   ├── __init__.py
│       │   ├── text.py
│       │   ├── image.py
│       │   └── tabular.py
│       ├── cache/
│       │   ├── __init__.py
│       │   └── model_cache.py
│       └── utils/
│           ├── __init__.py
│           ├── validation.py
│           └── metrics.py
├── tests/
├── requirements.txt
├── requirements-dev.txt
├── pyproject.toml
├── README.md
├── docker/
│   ├── Dockerfile
│   └── docker-compose.yml
└── examples/
    ├── sklearn_example.py
    ├── pytorch_example.py
    └── huggingface_example.py
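
As a starting point for cache/model_cache.py, an LRU cache keyed by model_id. This sketch evicts by entry count; honoring the byte-based max_cache_size from the configuration would additionally need framework-specific memory accounting:

from collections import OrderedDict

class ModelCache:
    def __init__(self, max_entries: int = 8):
        self._max_entries = max_entries
        self._entries: OrderedDict[str, object] = OrderedDict()

    def get(self, model_id: str):
        model = self._entries.get(model_id)
        if model is not None:
            self._entries.move_to_end(model_id)  # mark as recently used
        return model

    def put(self, model_id: str, model: object) -> None:
        self._entries[model_id] = model
        self._entries.move_to_end(model_id)
        while len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)  # evict least recently used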

Dependencies

# requirements.txt
mcp>=1.0.0
scikit-learn>=1.3.0
torch>=2.1.0
onnxruntime>=1.16.0
transformers>=4.35.0
numpy>=1.24.0
pandas>=2.1.0
pillow>=10.0.0
pydantic>=2.5.0
fastapi>=0.104.0  # for optional REST API
uvicorn>=0.24.0

Configuration

# config.yaml
server:
  cache_dir: "./model_cache"
  max_cache_size: "10GB"
  default_device: "cpu"
  
models:
  text_classifier:
    type: "huggingface"
    model_name: "distilbert-base-uncased-finetuned-sst-2-english"
    task: "sentiment-analysis"
    
  image_classifier:
    type: "pytorch"
    model_path: "./models/resnet50.pth"
    preprocessing:
      resize: [224, 224]
      normalize: [[0.485, 0.456, 0.406], [0.229, 0.224, 0.225]]
      
  fraud_detector:
    type: "sklearn"
    model_path: "./models/fraud_model.joblib"
    preprocessing:
      scaler: "standard"
      feature_selection: true

inference:
  batch_timeout: 30  # seconds
  max_batch_size: 64
  enable_gpu: false
  
monitoring:
  track_predictions: true
  log_performance: true
  alert_on_errors: true
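
Loading this file at startup is a few lines with PyYAML (which would need to be added to requirements.txt):

import yaml

with open("config.yaml") as f:
    config = yaml.safe_load(f)

cache_dir = config["server"]["cache_dir"]
for model_id, spec in config["models"].items():
    print(f"{model_id}: {spec['type']}")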

Usage Examples

Load and Use Scikit-learn Model

# Load fraud detection model
await mcp_client.call_tool("load_model", {
    "model_id": "fraud_detector",
    "model_type": "sklearn", 
    "model_path": "./models/fraud_detector.joblib",
    "cache_model": True
})

# Make prediction
result = await mcp_client.call_tool("predict", {
    "model_id": "fraud_detector",
    "inputs": {
        "amount": 1500.00,
        "merchant_category": "grocery",
        "hour_of_day": 14,
        "day_of_week": 2
    },
    "return_probabilities": True
})

Hugging Face Text Classification

# Load sentiment analysis model
await mcp_client.call_tool("load_model", {
    "model_id": "sentiment_classifier",
    "model_type": "huggingface",
    "model_path": "cardiffnlp/twitter-roberta-base-sentiment-latest"
})

# Batch sentiment analysis
result = await mcp_client.call_tool("predict_batch", {
    "model_id": "sentiment_classifier", 
    "batch_inputs": [
        "I love this product!",
        "This is terrible quality.",
        "It's okay, nothing special."
    ]
})

PyTorch Image Classification

# Load image classification model
await mcp_client.call_tool("load_model", {
    "model_id": "image_classifier",
    "model_type": "pytorch",
    "model_path": "./models/resnet50_trained.pth",
    "device": "cuda"
})

# Classify images
result = await mcp_client.call_tool("predict", {
    "model_id": "image_classifier",
    "inputs": {"image_path": "./test_image.jpg"},
    "preprocessing": True,
    "return_probabilities": True
})

Advanced Features

  • Model Versioning: Track and manage multiple model versions
  • A/B Testing: Compare model performance across versions
  • Distributed Inference: Scale across multiple GPUs/nodes
  • Model Monitoring: Track prediction drift and performance
  • Auto-scaling: Dynamic resource allocation based on load
  • Model Serving: Optional REST API for direct HTTP access

Security Features

  • Input sanitization and validation
  • Model file integrity checking (see the sketch after this list)
  • Rate limiting and quota management
  • Secure model loading from remote sources
  • Audit logging for all predictions
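
Integrity checking could start from a checksum comparison like the hypothetical helper below. This matters most for pickle/joblib files, which can execute arbitrary code when loaded, so they should only be opened after verification against a trusted digest:

import hashlib

def verify_model_file(path: str, expected_sha256: str) -> bool:
    # Stream in 1 MiB chunks so large model files are never
    # read into memory at once
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_sha256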

Performance Optimizations

  • Model caching and lazy loading
  • Batch processing optimization (see the sketch after this list)
  • GPU memory management
  • Quantization support for smaller models
  • Multi-threading for CPU inference
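
Batch optimization ultimately reduces to splitting oversized requests into framework-friendly chunks; a trivial sketch:

def iter_batches(items, max_batch_size=32):
    # Yield successive slices so each framework call stays within
    # the configured batch limit
    for start in range(0, len(items), max_batch_size):
        yield items[start : start + max_batch_size]

# e.g. predictions = [y for batch in iter_batches(inputs, 32)
#                       for y in model.predict(batch)]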

Testing Requirements

  • Unit tests for each model handler
  • Integration tests with real models
  • Performance benchmarking tests
  • Security tests for input validation
  • Memory leak testing for long-running inference

Acceptance Criteria

  • Python MCP server with 6+ ML inference tools
  • Support for scikit-learn, PyTorch, ONNX, and Hugging Face models
  • Model loading, caching, and metadata management
  • Batch and single prediction capabilities
  • Input validation and preprocessing pipelines
  • Performance benchmarking and monitoring
  • Comprehensive test suite (>90% coverage)
  • Docker setup for containerized deployment
  • Security features for safe model serving
  • Complete documentation with examples for each framework

Priority

High - Demonstrates AI/ML integration patterns crucial for modern applications

Use Cases

  • AI application development and testing
  • Model deployment and serving
  • ML pipeline integration
  • Research and experimentation
  • Production inference serving
  • Model performance analysis

Labels

  • enhancement (New feature or request)
  • mcp-servers (MCP Server Samples)
  • oic (Open Innovation Community Contributions)
  • python (Python / backend development with FastAPI)
