Description
Is your feature request related to a problem? Please describe.
Currently the TensorFlow and ONNX backends in Triton support thread controls (here and here). We would like to have a similar feature for the PyTorch backend as well.
This is useful because in several cases we have seen PyTorch inference run (super) slowly on multi-core CPU machines. On machines with O(100) cores we have even seen a single inference take several minutes, despite the model being small. This might be due to an internal PyTorch problem, but a temporary workaround is to set the number of intra-op parallelism threads to 1.
See examples here, here, and also a previous Triton issue here. In our case we found that setting the number of model instances is NOT enough to fix the problem; we need to set both the number of model instances and the number of intra-op threads to 1. We tested this with some examples and confirmed it fixes the slow CPU inference problem for PyTorch.
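For reference, the model-instance half of the workaround is just standard Triton model configuration; a minimal config.pbtxt sketch (the model name is a placeholder, and the required input/output sections are omitted for brevity):

```
name: "my_pytorch_model"
platform: "pytorch_libtorch"
max_batch_size: 1
# Run exactly one instance of the model on CPU.
instance_group [
  {
    count: 1
    kind: KIND_CPU
  }
]
```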
The intra-op half can be done with at::set_num_threads(1) when loading the model: https://pytorch.org/docs/stable/notes/cpu_threading_torchscript_inference.html#runtime-api
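For illustration, a minimal standalone sketch (not the actual backend code; the model path and tensor shape are placeholders) of calling at::set_num_threads(1) before loading and running a TorchScript model:

```cpp
#include <ATen/Parallel.h>   // at::set_num_threads / at::get_num_threads
#include <torch/script.h>    // torch::jit::load and tensor utilities

#include <iostream>
#include <vector>

int main() {
  // Restrict intra-op parallelism to a single thread. This must happen
  // before the first parallel region is executed.
  at::set_num_threads(1);

  // Load the TorchScript model ("model.pt" is a placeholder path).
  auto module = torch::jit::load("model.pt");

  // Run a dummy inference to illustrate usage (placeholder input shape).
  std::vector<torch::jit::IValue> inputs;
  inputs.push_back(torch::ones({1, 3, 224, 224}));
  at::Tensor output = module.forward(inputs).toTensor();

  std::cout << "intra-op threads: " << at::get_num_threads() << std::endl;
  return 0;
}
```

In the backend, the same call would presumably go in the model-load path so that it takes effect before the first inference request is handled.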
We have one implementation fixing this problem here. If this solution sounds good to you, we can open a pull request for it.