Single GPU works fine, but the system hangs when I use multiple GPUs. Can someone help solve this? Thanks.
```
python build.py --model_dir meta-llama/Llama-2-7b-chat-hf \
    --dtype float16 \
    --remove_input_padding \
    --use_gpt_attention_plugin float16 \
    --enable_context_fmha \
    --use_gemm_plugin float16 \
    --output_dir ./tmp/llama/7B/trt_engines/fp16/4-gpu/ \
    --world_size 4 \
    --tp_size 4
```

```
mpirun -n 4 --allow-run-as-root \
    python ../summarize.py --test_trt_llm \
    --hf_model_dir meta-llama/Llama-2-7b-chat-hf \
    --data_type fp16 \
    --engine_dir ./tmp/llama/7B/trt_engines/fp16/4-gpu/
```
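Since the hang only appears with multiple ranks, one way to narrow it down is to check whether plain NCCL communication under `mpirun` works at all on the same machine, independent of TensorRT-LLM. The script below is a minimal sketch of such a sanity check, assuming PyTorch with NCCL support and OpenMPI are installed; the file name `nccl_sanity_check.py` and the master address/port values are placeholders, not part of the original report.

```python
import os
import torch
import torch.distributed as dist

# Launch with the same mpirun setup as the failing run, e.g.:
#   mpirun -n 4 --allow-run-as-root python nccl_sanity_check.py
# (nccl_sanity_check.py is a hypothetical file name for this sketch.)

# OpenMPI exposes the rank/world size through these environment variables.
rank = int(os.environ.get("OMPI_COMM_WORLD_RANK", "0"))
world_size = int(os.environ.get("OMPI_COMM_WORLD_SIZE", "1"))

# Rendezvous settings for torch.distributed (placeholder values).
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")

# Bind each rank to its own GPU and initialize the NCCL process group.
torch.cuda.set_device(rank)
dist.init_process_group("nccl", rank=rank, world_size=world_size)

# Each rank contributes a tensor of ones; after all_reduce the value
# should equal world_size. If this call hangs, the problem is in the
# multi-GPU communication setup rather than in TensorRT-LLM itself.
x = torch.ones(1, device="cuda")
dist.all_reduce(x)
print(f"rank {rank}: all_reduce result = {x.item()} (expected {world_size})")

dist.destroy_process_group()
```

If this check also hangs, the issue is likely at the NCCL/driver/topology level (e.g. peer-to-peer access between the GPUs); if it completes, the hang is more likely specific to the TensorRT-LLM engine or the summarize run.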
