Model Loading Stuck (in ray ?) #1846
What's your NVIDIA driver version and topology? You can get them via `nvidia-smi` and `nvidia-smi topo -m`.
Driver Version: 535.129.03
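For anyone else collecting the same information, a minimal sketch of the checks (assuming the NVIDIA driver tools are on PATH):

```bash
# Driver version and per-GPU status
nvidia-smi

# GPU-to-GPU interconnect topology (NVLink / PIX / PXB / PHB / SYS per pair)
nvidia-smi topo -m
```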
Thanks. I suspected this was the same case as #1801, but it doesn't seem to be. Can you paste your full command-line arguments or script? Additionally, can you run:
There seems to be something wrong with the connection between two specific GPUs. When I use the other two GPUs, the code works well.
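One way to confirm which pair has the bad link is to run a small NCCL benchmark on each candidate pair in turn; a sketch using NVIDIA's nccl-tests (the GPU indices 0,1 below are just an example):

```bash
# Build NVIDIA's nccl-tests benchmark (requires CUDA and NCCL to be installed)
git clone https://github.com/NVIDIA/nccl-tests.git
cd nccl-tests && make

# Run a 2-GPU all-reduce on one candidate pair; a hang or very low bandwidth
# points to a broken P2P path between those two devices
CUDA_VISIBLE_DEVICES=0,1 ./build/all_reduce_perf -b 8 -e 128M -f 2 -g 2
```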
We have the same problem running inference with the FastChat vllm-worker.
That works for me too, but it's weird 🧐. What's wrong with the connection?
Do you also use L40 GPUs to run vLLM?
When I add "export NCCL_P2P_DISABLE = 1" in the ~/.bashrc , the code can also work using the previous 2 GPUs. |
Great, this is working on both machines, with A100 and Tesla T4.
No, I'm using 8x RTX 3090, but this works for me; the `export NCCL_P2P_DISABLE=1` workaround also works.
I think "export NCCL_P2P_DISABLE = 1" has impact on performance. You can check issue NVIDIA/nccl-tests#117, it is used on 4090. |
How do I change GPUs? For example, if I have 8 GPUs in a machine, how can I specify that only GPU 3 and GPU 4 should be used?
Maybe `CUDA_VISIBLE_DEVICES=3,4`?
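A sketch of how that is typically used when launching directly on the host (entrypoint and model are example values):

```bash
# Only GPUs 3 and 4 are visible to the process; vLLM sees them as devices 0 and 1
CUDA_VISIBLE_DEVICES=3,4 python -m vllm.entrypoints.api_server \
    --model lmsys/vicuna-7b-v1.5 \
    --tensor-parallel-size 2
```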
I am using Docker.
Well, sorry, I don't know much about the Docker command.
Thank you.
Setting `CUDA_VISIBLE_DEVICES=3,4` doesn't seem to be effective. Although I've set it, the script continues to run on devices 1 and 2. I believe specifying the devices directly in the `docker run` command would be more useful.
I am not sure if you are using the …
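For the Docker case, a sketch of pinning the container to specific devices (assuming the NVIDIA Container Toolkit is installed; the image name is a placeholder and the model/ports are example values):

```bash
# Expose only GPUs 3 and 4 to the container; inside it they appear as devices 0 and 1.
# <your-vllm-image> is a placeholder for whatever image you run vLLM from.
docker run --gpus '"device=3,4"' -p 8000:8000 <your-vllm-image> \
    python -m vllm.entrypoints.api_server \
    --model lmsys/vicuna-7b-v1.5 \
    --tensor-parallel-size 2
```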
python = 3.11.5
torch = 2.1.0+cu121
vllm = 0.2.2
GPU: L40 * 4
I installed vLLM with `pip install vllm`.
It gets STUCK when loading the vicuna-7b-v1.5 model with the vLLM framework, while the FastChat framework works fine.
When I raise a KeyboardInterrupt, it is stuck at:

```
  File "./ray/_private/worker.py", line 769, in get_objects
    data_metadata_pairs = self.core_worker.get_objects(
  File "python/ray/_raylet.pyx", line 3211, in ray._raylet.CoreWorker.get_objects
  File "python/ray/_raylet.pyx", line 449, in ray._raylet.check_status
KeyboardInterrupt
```
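For context, a minimal way to trigger this kind of multi-GPU load (the entrypoint is an assumption, since the original launch command isn't shown in the report):

```bash
# Serve vicuna-7b-v1.5 across all 4 L40s; with a bad P2P link between two of them,
# loading hangs while Ray workers wait inside get_objects
python -m vllm.entrypoints.api_server \
    --model lmsys/vicuna-7b-v1.5 \
    --tensor-parallel-size 4
```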