Unable to launch triton server with TP #577
Comments
I even tried without quantization, following the steps given in the official examples.
Still stuck. I tried making the batch size consistent between the Triton model config and the built engine, but it made no difference.
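For anyone who wants to rule out the batch-size mismatch mentioned above, here is a minimal sketch of the comparison. It assumes the engine directory contains a `config.json` with a `build_config.max_batch_size` entry and that the Triton `config.pbtxt` has a top-level `max_batch_size: N` line; both layouts are assumptions about the build in question, and the function names are hypothetical helpers, not part of either project.

```python
import json
import re
from pathlib import Path


def engine_max_batch_size(engine_dir):
    """Read max_batch_size from the engine's config.json (assumed layout)."""
    config = json.loads((Path(engine_dir) / "config.json").read_text())
    return config["build_config"]["max_batch_size"]


def triton_max_batch_size(config_pbtxt):
    """Read the first top-level 'max_batch_size: N' line from a config.pbtxt."""
    text = Path(config_pbtxt).read_text()
    match = re.search(r"^\s*max_batch_size\s*:\s*(\d+)", text, flags=re.MULTILINE)
    if match is None:
        raise ValueError(f"no numeric max_batch_size found in {config_pbtxt}")
    return int(match.group(1))


def batch_sizes_match(engine_dir, config_pbtxt):
    """Return True when the engine and the Triton config agree on max_batch_size."""
    return engine_max_batch_size(engine_dir) == triton_max_batch_size(config_pbtxt)
```

If the two values differ, that is worth fixing, but as noted above, aligning them did not resolve the hang in this case.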
I tried the official image, nvcr.io/nvidia/tritonserver:24.08-trtllm-python-py3, which was released 2 days ago, and built the TRT engines from it. The problem remains, though, even with reduce_fusion enabled. Logs below:
Way to recreate:
Config
Spawn the Triton server using: python3 launch_triton_server.py --world_size=2 --model_repo=models_dhruv --log --tensorrt_llm_model_name=meta_llama_3_1_8B_instruct_vanilla_trt
@dhruvmullick I'm facing the same problem on my multi-GPU server with 4x L40S. Have you managed to solve it?
@imihic, after spending a week on this, I pivoted to vLLM.
Are there any updates on this issue, please?
System Info
Built tensorrtllm_backend from source using dockerfile/Dockerfile.trt_llm_backend
tensorrt_llm 0.13.0.dev2024081300
tritonserver 2.48.0
triton image: 24.07
CUDA 12.5
Who can help?
@Tracin @kaiyux @schetlur-nv
Information
Tasks
examples folder (such as GLUE/SQuAD, ...)
Reproduction
I've built a TensorRT-LLM engine for Meta Llama 3 8B, and the Triton server gets stuck while starting up whenever tensor parallelism > 1 is used.
Everything works if I don't use TP when building the engine and launching the server.
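One quick sanity check for a TP-only failure is to confirm that the built engine directory matches the world size passed at launch. The sketch below is a hedged illustration, not a diagnosis of this issue: it assumes the engine's `config.json` stores the parallel layout under `pretrained_config.mapping` (`tp_size`, `pp_size`) and that the builder wrote one `rank<N>.engine` file per rank. `check_engine_parallelism` is a hypothetical helper name.

```python
import json
from pathlib import Path


def check_engine_parallelism(engine_dir, world_size):
    """Return a list of problems found; an empty list means nothing obvious is wrong."""
    engine_dir = Path(engine_dir)
    config = json.loads((engine_dir / "config.json").read_text())
    # Assumed layout: parallelism settings live under pretrained_config.mapping.
    mapping = config["pretrained_config"]["mapping"]
    tp_size = mapping.get("tp_size", 1)
    pp_size = mapping.get("pp_size", 1)

    problems = []
    if tp_size * pp_size != world_size:
        problems.append(
            f"tp_size*pp_size = {tp_size * pp_size}, but world_size = {world_size}"
        )
    # Assumed naming: one serialized engine per rank, rank0.engine, rank1.engine, ...
    for rank in range(world_size):
        if not (engine_dir / f"rank{rank}.engine").is_file():
            problems.append(f"missing rank{rank}.engine")
    return problems
```

For the launch in this report, the call would be `check_engine_parallelism(<engine_dir>, world_size=2)`. An empty result only rules out a build/launch mismatch; it says nothing about where the server hangs.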
Build the Engine:
Command used to launch the server:
Expected behavior
The server should spawn and start serving requests on localhost.
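To tell a slow start from a genuine hang, the readiness endpoint can be polled with a deadline. This is a minimal sketch that assumes Triton's HTTP service is on its default port 8000 and exposes the standard `/v2/health/ready` route; the function name, timeout, and interval are illustrative choices, not values from this setup.

```python
import time
import urllib.error
import urllib.request


def wait_until_ready(url="http://localhost:8000/v2/health/ready",
                     timeout_s=300.0, interval_s=5.0):
    """Poll the readiness URL until it returns HTTP 200 or the deadline passes."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval_s) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            # Server not listening yet, or not ready; retry after a pause.
            pass
        time.sleep(interval_s)
    return False
```

A `False` return after a generous timeout matches the behavior described in this issue: the processes are alive, but the server never reports ready.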
Actual behavior
I see the logs on the console:
In the /tmp/logs.txt file, I see the last output:
And nothing after this.
Additional notes
NA