
triton full_gpu_inference_pipeline demo not working #1150

@M1n9X

Description


Hi all,

I tried the demo under "tutorial/full_gpu_inference_pipeline" without luck.

1. Errors encountered

Following the official steps, I successfully started the Triton server and the Triton client, but during the benchmark step, when I ran

perf_analyzer -m spleen_seg -u localhost:18100 --input-data zero --shape "INPUT0":512,512,114 --shared-memory system

The following errors occurred:

# server side
I0106 14:36:26.742519 1279 grpc_server.cc:4190] Started GRPCInferenceService at 0.0.0.0:8001
I0106 14:36:26.743364 1279 http_server.cc:2857] Started HTTPService at 0.0.0.0:8000
I0106 14:36:26.785051 1279 http_server.cc:167] Started Metrics Service at 0.0.0.0:8002
2023-01-06 14:36:43,063 - the shape of the input tensor is: torch.Size([1, 512, 512, 114])
2023-01-06 14:36:44,825 - the shape of the transformed tensor is: torch.Size([1, 224, 224, 224])
2023-01-06 14:36:44,826 - the shape of the unsqueezed transformed tensor is: torch.Size([1, 1, 224, 224, 224])
E0106 14:36:46.006760 1279 python.cc:1970] Stub process is unhealthy and it will be restarted.

# client side
*** Measurement Settings ***
  Batch size: 1
  Using "time_windows" mode for stabilization
  Measurement window: 5000 msec
  Using synchronous calls for inference
  Stabilizing using average latency

Request concurrency: 1
Failed to maintain requested inference load. Worker thread(s) failed to generate concurrent requests.
Thread [0] had error: Failed to process the request(s) for model instance 'spleen_seg_0', message: Stub process is not healthy.

2. Methods tried

Later I also tried changing some parameters, e.g. running without shared memory, increasing shm-size from 1g to 16g, installing the MONAI environment inside the server container instead of using conda-pack, changing the Docker image version, etc.; however, all of these attempts failed.

When I tried tritonserver 22.12, the following errors occurred:

# server side
I0106 14:47:40.332535 94 grpc_server.cc:4819] Started GRPCInferenceService at 0.0.0.0:8001
I0106 14:47:40.332862 94 http_server.cc:3477] Started HTTPService at 0.0.0.0:8000
I0106 14:47:40.373862 94 http_server.cc:184] Started Metrics Service at 0.0.0.0:8002
2023-01-06 14:49:07,524 - the shape of the input tensor is: torch.Size([1, 512, 512, 114])
2023-01-06 14:49:09,501 - the shape of the transformed tensor is: torch.Size([1, 224, 224, 224])
2023-01-06 14:49:09,501 - the shape of the unsqueezed transformed tensor is: torch.Size([1, 1, 224, 224, 224])

# client side
*** Measurement Settings ***
  Batch size: 1
  Service Kind: Triton
  Using "time_windows" mode for stabilization
  Measurement window: 5000 msec
  Using synchronous calls for inference
  Stabilizing using average latency

Request concurrency: 1
Failed to maintain requested inference load. Worker thread(s) failed to generate concurrent requests.
Thread [0] had error: Failed to process the request(s) for model instance 'spleen_seg_0', message: TritonModelException: DLPack tensor is not contiguous. Only contiguous DLPack tensors that are stored in C-Order are supported.

At:
  /triton_monai/spleen_seg/1/model.py(131): execute

3. Some assumptions

From the above errors, it can be inferred that the following message points to the core issue:

TritonModelException: DLPack tensor is not contiguous. Only contiguous DLPack tensors that are stored in C-Order
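
As far as I understand, this means the tensor passed to pb_utils.Tensor.from_dlpack no longer has C-order strides, which can happen when one of the MONAI pre-transforms (e.g. an orientation or spacing change) returns a permuted or flipped view instead of a fresh copy. A tiny plain-PyTorch illustration of what I mean (the permute below is only a stand-in; I have not confirmed which transform actually causes it):

import torch

x = torch.zeros(1, 1, 224, 224, 224)
y = x.permute(0, 1, 4, 2, 3)           # stand-in for an axis-reordering transform
print(y.is_contiguous())               # False: Triton rejects such tensors via DLPack
print(y.contiguous().is_contiguous())  # True: an explicit copy restores C-order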

It is related to the following code in the model repository mentioned in the tutorial (which needs to be downloaded), at /triton_monai/spleen_seg/1/model.py line 131, in execute:

# excerpt from the execute() method; model.py imports
# triton_python_backend_utils as pb_utils and torch.utils.dlpack's
# from_dlpack / to_dlpack

# get the input by name (as configured in config.pbtxt)
input_triton_tensor = pb_utils.get_input_tensor_by_name(request, "INPUT0")
input_torch_tensor = from_dlpack(input_triton_tensor.to_dlpack())
logger.info(f"the shape of the input tensor is: {input_torch_tensor.shape}")

# apply the MONAI pre-transforms, then restore the batch dimension
transform_output = self.pre_transforms(input_torch_tensor[0])
logger.info(f"the shape of the transformed tensor is: {transform_output.shape}")
transform_output_batched = transform_output.unsqueeze(0)
logger.info(f"the shape of the unsqueezed transformed tensor is: {transform_output_batched.shape}")
# if(transform_output_batched.is_cuda):
#     print("the transformed pytorch tensor is on GPU")

# print(transform_output.shape)

# hand the transformed tensor back to Triton via DLPack -- this is line 131, the one that fails
transform_tensor = pb_utils.Tensor.from_dlpack("INPUT__0", to_dlpack(transform_output_batched))

Apparently, this last line of code fails to run, but I am not sure how to modify it correctly.
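
My only guess so far, based purely on the error message, is to force the tensor into contiguous C-order memory before converting it to DLPack, e.g. something like the following (untested, using the same names as in the excerpt above):

# untested guess: make the batched tensor C-contiguous before handing it to DLPack,
# so that pb_utils.Tensor.from_dlpack no longer sees non-contiguous memory
transform_tensor = pb_utils.Tensor.from_dlpack(
    "INPUT__0", to_dlpack(transform_output_batched.contiguous())
)

But I don't know whether an extra copy like this is the intended fix or whether it just hides a problem in the pre-transforms.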

Any suggestions or updates to the model.py script?
