
triton full_gpu_inference_pipeline demo not working #1150

@M1n9X

Description


Hi all,

I tried the demo under "tutorial/full_gpu_inference_pipeline" without luck.

1. Errors encountered

Following the official steps, I successfully started the Triton server and the Triton client, but during the benchmark step, when I ran

perf_analyzer -m spleen_seg -u localhost:18100 --input-data zero --shape "INPUT0":512,512,114 --shared-memory system

The following errors occurred:

# server side
I0106 14:36:26.742519 1279 grpc_server.cc:4190] Started GRPCInferenceService at 0.0.0.0:8001
I0106 14:36:26.743364 1279 http_server.cc:2857] Started HTTPService at 0.0.0.0:8000
I0106 14:36:26.785051 1279 http_server.cc:167] Started Metrics Service at 0.0.0.0:8002
2023-01-06 14:36:43,063 - the shape of the input tensor is: torch.Size([1, 512, 512, 114])
2023-01-06 14:36:44,825 - the shape of the transformed tensor is: torch.Size([1, 224, 224, 224])
2023-01-06 14:36:44,826 - the shape of the unsqueezed transformed tensor is: torch.Size([1, 1, 224, 224, 224])
E0106 14:36:46.006760 1279 python.cc:1970] Stub process is unhealthy and it will be restarted.

# client side
*** Measurement Settings ***
  Batch size: 1
  Using "time_windows" mode for stabilization
  Measurement window: 5000 msec
  Using synchronous calls for inference
  Stabilizing using average latency

Request concurrency: 1
Failed to maintain requested inference load. Worker thread(s) failed to generate concurrent requests.
Thread [0] had error: Failed to process the request(s) for model instance 'spleen_seg_0', message: Stub process is not healthy.

2. Methods tried

Later I also tried changing some parameters, e.g. running without shared memory, increasing shm-size from 1g to 16g, installing the MONAI environment inside the server container instead of using conda-pack, changing the Docker image version, etc.; however, all of these attempts failed.

When I tried tritonserver 22.12, the following errors occurred:

# server side
I0106 14:47:40.332535 94 grpc_server.cc:4819] Started GRPCInferenceService at 0.0.0.0:8001
I0106 14:47:40.332862 94 http_server.cc:3477] Started HTTPService at 0.0.0.0:8000
I0106 14:47:40.373862 94 http_server.cc:184] Started Metrics Service at 0.0.0.0:8002
2023-01-06 14:49:07,524 - the shape of the input tensor is: torch.Size([1, 512, 512, 114])
2023-01-06 14:49:09,501 - the shape of the transformed tensor is: torch.Size([1, 224, 224, 224])
2023-01-06 14:49:09,501 - the shape of the unsqueezed transformed tensor is: torch.Size([1, 1, 224, 224, 224])

# client side
*** Measurement Settings ***
  Batch size: 1
  Service Kind: Triton
  Using "time_windows" mode for stabilization
  Measurement window: 5000 msec
  Using synchronous calls for inference
  Stabilizing using average latency

Request concurrency: 1
Failed to maintain requested inference load. Worker thread(s) failed to generate concurrent requests.
Thread [0] had error: Failed to process the request(s) for model instance 'spleen_seg_0', message: TritonModelException: DLPack tensor is not contiguous. Only contiguous DLPack tensors that are stored in C-Order are supported.

At:
  /triton_monai/spleen_seg/1/model.py(131): execute

3. Some assumptions

From the above errors, it can be inferred that the following message points to the core issue:

TritonModelException: DLPack tensor is not contiguous. Only contiguous DLPack tensors that are stored in C-Order
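
As far as I understand, this means the tensor passed to pb_utils.Tensor.from_dlpack no longer has C-order strides, which can happen when one of the MONAI pre-transforms (e.g. an orientation or spacing change) returns a permuted or flipped view instead of a fresh copy. A tiny plain-PyTorch illustration of what I mean (the permute below is only a stand-in; I have not confirmed which transform actually causes it):

import torch

x = torch.zeros(1, 1, 224, 224, 224)
y = x.permute(0, 1, 4, 2, 3)           # stand-in for an axis-reordering transform
print(y.is_contiguous())               # False: Triton rejects such tensors via DLPack
print(y.contiguous().is_contiguous())  # True: an explicit copy restores C-order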

It is related to the following code in the model repository mentioned in the tutorial (which needs to be downloaded), at /triton_monai/spleen_seg/1/model.py line 131, in execute:

# excerpt from the execute() method; model.py imports
# triton_python_backend_utils as pb_utils and torch.utils.dlpack's
# from_dlpack / to_dlpack

# get the input by name (as configured in config.pbtxt)
input_triton_tensor = pb_utils.get_input_tensor_by_name(request, "INPUT0")
input_torch_tensor = from_dlpack(input_triton_tensor.to_dlpack())
logger.info(f"the shape of the input tensor is: {input_torch_tensor.shape}")

# apply the MONAI pre-transforms, then restore the batch dimension
transform_output = self.pre_transforms(input_torch_tensor[0])
logger.info(f"the shape of the transformed tensor is: {transform_output.shape}")
transform_output_batched = transform_output.unsqueeze(0)
logger.info(f"the shape of the unsqueezed transformed tensor is: {transform_output_batched.shape}")
# if(transform_output_batched.is_cuda):
#     print("the transformed pytorch tensor is on GPU")

# print(transform_output.shape)

# hand the transformed tensor back to Triton via DLPack -- this is line 131, the one that fails
transform_tensor = pb_utils.Tensor.from_dlpack("INPUT__0", to_dlpack(transform_output_batched))

Apparently, this last line of code fails to run, but I am not sure how to modify it correctly.
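
My only guess so far, based purely on the error message, is to force the tensor into contiguous C-order memory before converting it to DLPack, e.g. something like the following (untested, using the same names as in the excerpt above):

# untested guess: make the batched tensor C-contiguous before handing it to DLPack,
# so that pb_utils.Tensor.from_dlpack no longer sees non-contiguous memory
transform_tensor = pb_utils.Tensor.from_dlpack(
    "INPUT__0", to_dlpack(transform_output_batched.contiguous())
)

But I don't know whether an extra copy like this is the intended fix or whether it just hides a problem in the pre-transforms.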

Any suggestions or updates to the model.py script?
