Describe the bug
The documentation states that when I deploy a model with model_server_workers=None:

> model_server_workers (int) – Optional. The number of worker processes used by the inference server. If None, server will use one worker per vCPU.
However, what I found is that when I deploy my model on an ml.c5.2xlarge (8 vCPUs; one physical CPU, I guess), the server only uses 1 worker (see the logs below).

If I pass the parameter explicitly instead, it correctly sets "Default workers per model" to the number I specified through model_server_workers (a sketch of this is below).

In conclusion, either the documentation is out of date, or the behaviour when model_server_workers=None does not work.
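A minimal sketch of the workaround; here I set the parameter on the MXNetModel constructor, and the model artifact, role ARN, and entry point are placeholders:

```python
from sagemaker.mxnet import MXNetModel

# All values except model_server_workers are placeholders -- substitute your own.
model = MXNetModel(
    model_data='s3://my-bucket/model.tar.gz',
    role='arn:aws:iam::123456789012:role/SageMakerRole',
    entry_point='inference.py',
    framework_version='1.4.1',
    py_version='py3',
    model_server_workers=8,  # set explicitly -> log shows "Default workers per model: 8"
)
predictor = model.deploy(
    initial_instance_count=1,
    instance_type='ml.c5.2xlarge',
)
```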
To reproduce
Deploy any model on an ml.c5.2xlarge instance and check the endpoint log for the "Default workers per model" entry, for example with the sketch below.
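A repro sketch, leaving model_server_workers at its default; the artifact path, role ARN, and entry point are placeholders (any MXNet model should reproduce it):

```python
from sagemaker.mxnet import MXNetModel

# Placeholder artifact, role, and entry point -- substitute your own.
model = MXNetModel(
    model_data='s3://my-bucket/model.tar.gz',
    role='arn:aws:iam::123456789012:role/SageMakerRole',
    entry_point='inference.py',
    framework_version='1.4.1',
    py_version='py3',
    # model_server_workers is left at its default (None), which the docs
    # say should mean one worker per vCPU.
)
predictor = model.deploy(
    initial_instance_count=1,
    instance_type='ml.c5.2xlarge',  # 8 vCPUs
)
# Then open the endpoint's CloudWatch log stream and look for the
# "Default workers per model" entry -- it reports 1 instead of 8.
```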
Expected behavior
With model_server_workers=None on an ml.c5.2xlarge (8 vCPUs), the endpoint should start with "Default workers per model: 8", i.e. one worker per vCPU as documented.
Screenshots or logs
This is an extract of the log from the endpoint:
**Number of CPUs: 1**
Max heap size: 3739 M
Python executable: /usr/local/bin/python3.6
Config file: /etc/sagemaker-mms.properties
Inference address: http://0.0.0.0:8080
Management address: http://127.0.0.1:8081
Model Store: /.sagemaker/mms/models
Initial Models: ALL
Log dir: /logs
Metrics dir: /logs
Netty threads: 0
Netty client threads: 0
**Default workers per model: 1**
System information
- SageMaker Python SDK version: 1.42.1
- Framework name (e.g. PyTorch) or algorithm (e.g. KMeans): custom script on MXNet 1.4.1
- Framework version: 1.4.1
- Python version: 3
- CPU or GPU: CPU
- Custom Docker image (Y/N): N (AWS-provided image)