CUDA error 217 at ggml-cuda.cu:6292: peer access is not supported between these two devices #3230
Comments
Put them on the same CPU and try again?
2 are on CPU1 and 1 is on CPU2, due to the nature of the available slots on the motherboard. On my R720 system, I definitely have GPUs split between CPUs and I don't have this issue, but I also haven't updated that system in the past week or so.
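A quick way to see which GPU pairs are affected is to query cudaDeviceCanAccessPeer for every pair; on dual-socket boards, pairs sitting on different CPUs typically report that peer access is unavailable. A minimal standalone diagnostic sketch (not part of llama.cpp; any recent CUDA toolkit should build it with nvcc):

```cuda
// p2p_check.cu -- print the peer-access matrix for all visible GPUs.
// Build: nvcc -o p2p_check p2p_check.cu
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int n = 0;
    if (cudaGetDeviceCount(&n) != cudaSuccess || n < 2) {
        std::printf("fewer than two CUDA devices visible\n");
        return 1;
    }
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            if (i == j) {
                continue;
            }
            int can = 0;
            // can == 0 means cudaDeviceEnablePeerAccess(i -> j) would fail,
            // which is what surfaces as CUDA error 217 in ggml-cuda.cu.
            cudaDeviceCanAccessPeer(&can, i, j);
            std::printf("GPU %d -> GPU %d : %s\n", i, j, can ? "yes" : "no");
        }
    }
    return 0;
}
```

nvidia-smi topo -m should show similar information (cross-socket pairs appear as SYS) if you would rather not compile anything.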
Looks like #2470 got merged to master, if I read that correctly. Is there a compile flag to un-set it?
Yea, set the limit lower than your nbatch; that should mean it never gets enabled.
What option are you referring to? I'm using '-c 4096'; is it an option not shown via '--help'?
I pushed a fix: #3231. Please check whether it works.
It's a compile option (LLAMA_CUDA_PEER_MAX_BATCH_SIZE).
Heh.. I just had the same problem now. Also my 3090s are 0 and 1; nvlink always enables 0->1 and 1->0, but now with this change it failed on 1-> when running falcon along with the P40s. So to say it "only" works with the main device runs a bit counter to what I am seeing. I will try the fix. edit: it does indeed work.
"make clean && make -j LLAMA_CUBLAS=1 LLAMA_CUDA_PEER_MAX_BATCH_SIZE=2048" still results in "CUDA error 217 at ggml-cuda.cu:6292: peer access is not supported between these two devices"
Setting the limit that high means you will always enable it. Set LLAMA_CUDA_PEER_MAX_BATCH_SIZE to something like 64 if you use a batch size of 512. At least that's how I think it works. I also merged the PR that was posted before.
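For reference, a sketch of how that gate reads from this thread (not the actual ggml-cuda.cu code): peer access is only turned on when the batch size is at or below LLAMA_CUDA_PEER_MAX_BATCH_SIZE, which is why building with the limit at 2048 still hit the error with the default batch of 512, and why a fix along the lines of #3231, which presumably checks cudaDeviceCanAccessPeer before enabling, makes the error go away. Roughly, with error handling omitted:

```cuda
// Sketch only -- assumes the limit is a compile-time constant and that peer
// access is enabled for small batches and skipped for batches above the limit.
#include <cuda_runtime.h>

#ifndef LLAMA_CUDA_PEER_MAX_BATCH_SIZE
#define LLAMA_CUDA_PEER_MAX_BATCH_SIZE 128
#endif

static void set_peer_access(int n_tokens, int n_devices) {
    const bool enable = n_tokens <= LLAMA_CUDA_PEER_MAX_BATCH_SIZE;
    for (int id = 0; id < n_devices; ++id) {
        cudaSetDevice(id);
        for (int other = 0; other < n_devices; ++other) {
            if (other == id) {
                continue;
            }
            int can = 0;
            // The guard #3231 presumably adds: only touch pairs that actually
            // support peer access, so cross-socket pairs no longer trigger
            // "peer access is not supported between these two devices".
            cudaDeviceCanAccessPeer(&can, id, other);
            if (!can) {
                continue;
            }
            if (enable) {
                cudaDeviceEnablePeerAccess(other, 0); // flags must be 0
            } else {
                cudaDeviceDisablePeerAccess(other);
            }
        }
    }
}
```

Under that reading, LLAMA_CUDA_PEER_MAX_BATCH_SIZE=2048 keeps peer access enabled for a 512-token batch, while a value below your n_batch effectively disables it.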
Just to be clear: you are not talking about the PR I posted, are you?
I was referring to #3231.
I pulled again just now on main; it's working. Thanks!
Prerequisites
Please answer the following questions for yourself before submitting an issue.
Expected Behavior
Please provide a detailed written description of what you were trying to do, and what you expected llama.cpp to do.
Current Behavior
Execution of:
fails with:
Please provide a detailed written description of what llama.cpp did, instead.
NOTE: I've run it as well without "--numa"; the results are the same.
Environment and Context
Please provide detailed information about your computer setup. This is important in case the issue is not reproducible except for under certain specific conditions.
$ lscpu
Debian 12
$ uname -a
llama.cpp$ git log | head -1
commit 111163e
CUDA info:
NVIDIA-SMI 525.125.06 Driver Version: 525.125.06 CUDA Version: 12.0
NOTE: I have another dual Xeon system which also reports "no" as above, and it does not have this issue.