Overview
Create a sample MCP Server in Python that provides machine learning model inference capabilities with support for multiple ML frameworks and model formats.
Server Specifications
Server Details
- Name: ml-inference-server
- Language: Python 3.11+
- Location: mcp-servers/python/ml_inference_server/
- Purpose: Demonstrate ML model inference and deployment via MCP
Core Features
- Multiple ML framework support (scikit-learn, PyTorch, ONNX, Hugging Face)
- Model loading and caching
- Batch and streaming inference
- Model performance metrics
- Input validation and preprocessing
- Output postprocessing and formatting
Tools Provided
1. load_model
Load and cache ML models from various sources
```python
# Shared imports for the request dataclasses below
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Union

@dataclass
class ModelLoadRequest:
    model_id: str
    model_type: str  # sklearn, pytorch, onnx, huggingface
    model_path: str  # local path, URL, or HF model name
    cache_model: bool = True
    preprocessing_config: Optional[Dict[str, Any]] = None
    device: str = "cpu"  # cpu, cuda, mps
```
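For orientation, here is a minimal sketch of how a request like this could be exposed as an MCP tool using the FastMCP helper from the `mcp` Python SDK. Only the scikit-learn path is shown, and the in-memory dict is a stand-in for the real model cache; the final server would dispatch to the framework handlers described below.

```python
from typing import Any, Dict

import joblib
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("ml-inference-server")

# Placeholder for the real model cache (see model_cache.py below)
_MODEL_CACHE: Dict[str, Any] = {}


@mcp.tool()
def load_model(model_id: str, model_type: str, model_path: str,
               cache_model: bool = True, device: str = "cpu") -> dict:
    """Load an ML model and optionally cache it under model_id."""
    if model_type != "sklearn":
        raise ValueError("only the sklearn path is sketched here")
    model = joblib.load(model_path)  # device is ignored for sklearn models
    if cache_model:
        _MODEL_CACHE[model_id] = model
    return {"model_id": model_id, "model_type": model_type, "cached": cache_model}


if __name__ == "__main__":
    mcp.run()
```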
2. predict
Run inference on input data
```python
@dataclass
class PredictionRequest:
    model_id: str
    inputs: Union[List[Any], Dict[str, Any]]
    preprocessing: bool = True
    postprocessing: bool = True
    return_probabilities: bool = False
    batch_size: Optional[int] = None
```
3. predict_batch
Efficient batch prediction for multiple inputs
```python
@dataclass
class BatchPredictionRequest:
    model_id: str
    batch_inputs: List[Union[List[Any], Dict[str, Any]]]
    preprocessing: bool = True
    postprocessing: bool = True
    parallel_processing: bool = True
    max_batch_size: int = 32
```
4. get_model_info
Retrieve model metadata and capabilities
```python
@dataclass
class ModelInfoRequest:
    model_id: str
    include_schema: bool = True
    include_metrics: bool = True
    include_examples: bool = False
```
5. validate_input
Validate input data against model schema
```python
@dataclass
class InputValidationRequest:
    model_id: str
    inputs: Union[List[Any], Dict[str, Any]]
    strict_validation: bool = True
    return_suggestions: bool = True
```
6. benchmark_model
Performance benchmarking and profiling
```python
@dataclass
class BenchmarkRequest:
    model_id: str
    test_inputs: List[Union[List[Any], Dict[str, Any]]]
    num_iterations: int = 100
    warmup_iterations: int = 10
    measure_memory: bool = True
    profile_execution: bool = False
```
Supported Model Types
Scikit-learn Models
- Classification, regression, clustering
- Joblib and pickle format support
- Custom preprocessing pipelines
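A possible shape for `sklearn_handler.py` (a sketch; the class and method names are illustrative, not a final interface): joblib-based loading plus `predict`/`predict_proba`.

```python
from typing import Any, List, Optional

import joblib
import numpy as np


class SklearnHandler:
    """Loads a scikit-learn estimator from a joblib/pickle file and runs inference."""

    def __init__(self) -> None:
        self.model: Optional[Any] = None

    def load(self, model_path: str) -> None:
        self.model = joblib.load(model_path)

    def predict(self, inputs: List[List[float]], return_probabilities: bool = False):
        X = np.asarray(inputs)
        preds = self.model.predict(X)
        if return_probabilities and hasattr(self.model, "predict_proba"):
            return preds.tolist(), self.model.predict_proba(X).tolist()
        return preds.tolist(), None
```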
PyTorch Models
- TorchScript and state_dict formats
- GPU acceleration support
- Dynamic input shapes
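A sketch of the PyTorch path, showing only TorchScript loading (loading a `state_dict` additionally requires the model class definition, so it is omitted here). Names are illustrative.

```python
import torch


class PyTorchHandler:
    def __init__(self, device: str = "cpu") -> None:
        self.device = torch.device(device)
        self.model = None

    def load_torchscript(self, model_path: str) -> None:
        # TorchScript archives carry their own graph, so no class definition is needed
        self.model = torch.jit.load(model_path, map_location=self.device)
        self.model.eval()

    @torch.no_grad()
    def predict(self, inputs: torch.Tensor) -> torch.Tensor:
        return self.model(inputs.to(self.device))
```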
ONNX Models
- Cross-framework compatibility
- Optimized inference runtime
- Hardware acceleration
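A sketch of the ONNX path using `onnxruntime`; the provider list is how CPU vs. GPU execution would be selected. Class and method names are illustrative.

```python
import numpy as np
import onnxruntime as ort


class OnnxHandler:
    def load(self, model_path: str, use_gpu: bool = False) -> None:
        providers = (["CUDAExecutionProvider", "CPUExecutionProvider"]
                     if use_gpu else ["CPUExecutionProvider"])
        self.session = ort.InferenceSession(model_path, providers=providers)
        self.input_name = self.session.get_inputs()[0].name

    def predict(self, inputs: np.ndarray) -> list:
        # run(None, ...) returns every model output as a list of arrays
        outputs = self.session.run(None, {self.input_name: inputs})
        return [o.tolist() for o in outputs]
```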
Hugging Face Models
- Transformers (text, vision, audio)
- Tokenizer integration
- Model hub integration
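A sketch of the Hugging Face path using the `transformers` pipeline API, which bundles tokenizer loading and Hub downloads. Names are illustrative.

```python
from transformers import pipeline


class HuggingFaceHandler:
    def load(self, model_name: str, task: str = "sentiment-analysis",
             device: str = "cpu") -> None:
        # device=-1 selects CPU; device=0 selects the first GPU
        self.pipe = pipeline(task, model=model_name,
                             device=-1 if device == "cpu" else 0)

    def predict(self, texts: list) -> list:
        return self.pipe(texts)
```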
Implementation Requirements
Directory Structure
```
mcp-servers/python/ml_inference_server/
├── src/
│   └── ml_inference_server/
│       ├── __init__.py
│       ├── server.py
│       ├── models/
│       │   ├── __init__.py
│       │   ├── sklearn_handler.py
│       │   ├── pytorch_handler.py
│       │   ├── onnx_handler.py
│       │   └── huggingface_handler.py
│       ├── preprocessing/
│       │   ├── __init__.py
│       │   ├── text.py
│       │   ├── image.py
│       │   └── tabular.py
│       ├── cache/
│       │   ├── __init__.py
│       │   └── model_cache.py
│       └── utils/
│           ├── __init__.py
│           ├── validation.py
│           └── metrics.py
├── tests/
├── requirements.txt
├── requirements-dev.txt
├── pyproject.toml
├── README.md
├── docker/
│   ├── Dockerfile
│   └── docker-compose.yml
└── examples/
    ├── sklearn_example.py
    ├── pytorch_example.py
    └── huggingface_example.py
```
Dependencies
```
# requirements.txt
mcp>=1.0.0
scikit-learn>=1.3.0
torch>=2.1.0
onnxruntime>=1.16.0
transformers>=4.35.0
numpy>=1.24.0
pandas>=2.1.0
pillow>=10.0.0
pydantic>=2.5.0
fastapi>=0.104.0  # for optional REST API
uvicorn>=0.24.0
```
Configuration
```yaml
# config.yaml
server:
  cache_dir: "./model_cache"
  max_cache_size: "10GB"
  default_device: "cpu"

models:
  text_classifier:
    type: "huggingface"
    model_name: "distilbert-base-uncased-finetuned-sst-2-english"
    task: "sentiment-analysis"
  image_classifier:
    type: "pytorch"
    model_path: "./models/resnet50.pth"
    preprocessing:
      resize: [224, 224]
      normalize: [[0.485, 0.456, 0.406], [0.229, 0.224, 0.225]]
  fraud_detector:
    type: "sklearn"
    model_path: "./models/fraud_model.joblib"
    preprocessing:
      scaler: "standard"
      feature_selection: true

inference:
  batch_timeout: 30  # seconds
  max_batch_size: 64
  enable_gpu: false

monitoring:
  track_predictions: true
  log_performance: true
  alert_on_errors: true
```
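One way this file could be loaded and validated, sketched with PyYAML (not in requirements.txt above, so it would need to be added) and the pydantic dependency that is already listed. Only the `server` and `models` sections are modelled here; the field names mirror the YAML above.

```python
from typing import Dict, Optional

import yaml
from pydantic import BaseModel


class ServerConfig(BaseModel):
    cache_dir: str = "./model_cache"
    max_cache_size: str = "10GB"
    default_device: str = "cpu"


class ModelConfig(BaseModel):
    type: str
    model_path: Optional[str] = None
    model_name: Optional[str] = None
    task: Optional[str] = None
    preprocessing: Optional[dict] = None


class AppConfig(BaseModel):
    server: ServerConfig
    models: Dict[str, ModelConfig]
    # inference and monitoring sections omitted in this sketch


def load_config(path: str = "config.yaml") -> AppConfig:
    with open(path) as f:
        return AppConfig(**yaml.safe_load(f))
```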
Usage Examples
Load and Use Scikit-learn Model
```python
# Load fraud detection model
await mcp_client.call_tool("load_model", {
    "model_id": "fraud_detector",
    "model_type": "sklearn",
    "model_path": "./models/fraud_detector.joblib",
    "cache_model": True
})

# Make prediction
result = await mcp_client.call_tool("predict", {
    "model_id": "fraud_detector",
    "inputs": {
        "amount": 1500.00,
        "merchant_category": "grocery",
        "hour_of_day": 14,
        "day_of_week": 2
    },
    "return_probabilities": True
})
```
Hugging Face Text Classification
```python
# Load sentiment analysis model
await mcp_client.call_tool("load_model", {
    "model_id": "sentiment_classifier",
    "model_type": "huggingface",
    "model_path": "cardiffnlp/twitter-roberta-base-sentiment-latest"
})

# Batch sentiment analysis
result = await mcp_client.call_tool("predict_batch", {
    "model_id": "sentiment_classifier",
    "batch_inputs": [
        "I love this product!",
        "This is terrible quality.",
        "It's okay, nothing special."
    ]
})
```
PyTorch Image Classification
```python
# Load image classification model
await mcp_client.call_tool("load_model", {
    "model_id": "image_classifier",
    "model_type": "pytorch",
    "model_path": "./models/resnet50_trained.pth",
    "device": "cuda"
})

# Classify images
result = await mcp_client.call_tool("predict", {
    "model_id": "image_classifier",
    "inputs": {"image_path": "./test_image.jpg"},
    "preprocessing": True,
    "return_probabilities": True
})
```
Advanced Features
- Model Versioning: Track and manage multiple model versions
- A/B Testing: Compare model performance across versions
- Distributed Inference: Scale across multiple GPUs/nodes
- Model Monitoring: Track prediction drift and performance
- Auto-scaling: Dynamic resource allocation based on load
- Model Serving: Optional REST API for direct HTTP access
Security Features
- Input sanitization and validation
- Model file integrity checking (see the checksum sketch after this list)
- Rate limiting and quota management
- Secure model loading from remote sources
- Audit logging for all predictions
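One way to implement the integrity check mentioned above: compare the file's SHA-256 digest against a pinned value before a handler is allowed to load it. The function name is illustrative.

```python
import hashlib


def verify_model_file(model_path: str, expected_sha256: str,
                      chunk_size: int = 1 << 20) -> None:
    """Raise if the file on disk does not match the pinned digest."""
    digest = hashlib.sha256()
    with open(model_path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    if digest.hexdigest() != expected_sha256:
        raise ValueError(f"integrity check failed for {model_path}")
```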
Performance Optimizations
- Model caching and lazy loading (an LRU sketch follows this list)
- Batch processing optimization
- GPU memory management
- Quantization support for smaller models
- Multi-threading for CPU inference
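A minimal LRU cache sketch for `model_cache.py`, assuming eviction by model count (the `max_cache_size` setting above would instead track bytes): the least recently used model is dropped once capacity is reached.

```python
from collections import OrderedDict
from typing import Any, Optional


class ModelCache:
    def __init__(self, max_models: int = 8) -> None:
        self.max_models = max_models
        self._cache: OrderedDict[str, Any] = OrderedDict()

    def get(self, model_id: str) -> Optional[Any]:
        if model_id not in self._cache:
            return None
        self._cache.move_to_end(model_id)  # mark as most recently used
        return self._cache[model_id]

    def put(self, model_id: str, model: Any) -> None:
        self._cache[model_id] = model
        self._cache.move_to_end(model_id)
        if len(self._cache) > self.max_models:
            self._cache.popitem(last=False)  # evict least recently used
```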
Testing Requirements
- Unit tests for each model handler (a pytest sketch follows this list)
- Integration tests with real models
- Performance benchmarking tests
- Security tests for input validation
- Memory leak testing for long-running inference
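As a sketch of the handler-level test shape (pytest), assuming the tests exercise joblib round-tripping the way the scikit-learn handler would; the real suite would go through the server's tool interface.

```python
import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression


def test_sklearn_model_roundtrip(tmp_path):
    # Train a tiny model, persist it, reload it, and check predictions match
    X = np.array([[0.0], [1.0], [2.0], [3.0]])
    y = np.array([0, 0, 1, 1])
    model = LogisticRegression().fit(X, y)

    model_path = tmp_path / "model.joblib"
    joblib.dump(model, model_path)

    loaded = joblib.load(model_path)
    assert (loaded.predict(X) == model.predict(X)).all()
```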
Acceptance Criteria
- Python MCP server with 6+ ML inference tools
- Support for scikit-learn, PyTorch, ONNX, and Hugging Face models
- Model loading, caching, and metadata management
- Batch and single prediction capabilities
- Input validation and preprocessing pipelines
- Performance benchmarking and monitoring
- Comprehensive test suite (>90% coverage)
- Docker setup for containerized deployment
- Security features for safe model serving
- Complete documentation with examples for each framework
Priority
High - Demonstrates AI/ML integration patterns crucial for modern applications
Use Cases
- AI application development and testing
- Model deployment and serving
- ML pipeline integration
- Research and experimentation
- Production inference serving
- Model performance analysis