Multi-GPU support for AMD? #3051
If you compile with hipBLAS then you can use multiple AMD GPUs. I have it working fine on Linux and am working on getting it to compile under Windows.
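For anyone landing here, a minimal sketch of that kind of build on Linux, assuming ROCm and hipcc are already installed (the model path and layer count are placeholders; LLAMA_HIPBLAS is the same make flag used later in this thread):

    # build llama.cpp with the hipBLAS (ROCm) backend
    make clean && LLAMA_HIPBLAS=1 make -j
    # then offload layers to the GPU(s) at run time
    ./main -m ./models/your-model.gguf -ngl 33 -p "Hello"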
Multiple AMD GPU support isn't working for me. @ccbadd Have you tried it? I checked out llama.cpp from early Sept. 2023 and it isn't working for me there either. I don't think it's ever worked. I have a Linux system with 2x Radeon RX 7900 XTX. Both of them are recognized by llama.cpp, but the LLM just prints a bunch of gibberish.
@jart Depending on the version you have pulled, multi-GPU support for both AMD and NVIDIA has been a little unstable. I would pull a more recent build and try again. Be sure to update ROCm as well; v6 is available right now and has better support for your GPUs.
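As a quick sanity check that ROCm itself sees both cards before debugging llama.cpp, something like the following can be used (rocminfo and rocm-smi ship with ROCm; a 7900 XTX should show up as gfx1100, though output details vary by version):

    # list the GPU agents the ROCm runtime can see
    rocminfo | grep -i gfx
    # show per-card utilization and VRAM usage
    rocm-smi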
I got near 100% utilization across 8 AMD MI100 GPUs (gfx908), but I had to pass [...] so I used [...]
But does it actually work? Am I correct in my understanding that llama.cpp is able to run an LLM on multiple AMD GPUs so long as they're the AMD Instinct HPC enterprise cards? I have 2x Radeon RX 7900 XTX because they're the cheapest cards AMD supports using on Linux.
It worked for me, but I was using the llama2-70b model.
@mjkpolo In your case it only appears to be assigning work to the first GPU.
@jart how can you tell? I accidentally used [...]
Notice how it says on mine:

    [...]

Versus yours:

    [...]

If all your GPUs are being utilized, then it's probably because your GPUs are capable of presenting themselves to llama.cpp as a single unified device. I mean, I wish I had your computer, but I've just got separate cards plugged into two PCIe slots on a consumer PC. I know llama.cpp is capable of manually splitting the work across multiple cards for NVIDIA; I've seen it happen. I'd just like for it to be able to do that with AMD too.
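For reference, the manual split described above for NVIDIA is driven by the -ngl and --tensor-split (-ts) options, and a hipBLAS build accepts the same flags; a sketch, with the model path and split ratios as placeholders:

    # offload all layers and split them roughly 50/50 across two cards
    ./main -m ./models/your-model.gguf -ngl 99 --tensor-split 1,1 -p "Hello"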
Ohhh I see, that's really interesting and good to keep in mind, thx!
@jart Update: I tried with [...]
One of my setups has 2x MI100s and 2x W6800s and all 4 work fine, but the MI100s are a lot slower than the W6800s. I have never seen the -sm switch; I'll give it a try in a while to see if it helps.
By default the [...]. I believe multi-GPU should also work with AMD hardware, but I haven't tested it; it does work with NVIDIA GPUs. There could be some ROCm-related issues, but maybe try a few different models first to rule out a model problem.
I'm also getting crap when my dual 7900 XTX are used. I thought maybe it was the models I was trying, so I tried an old one I know really worked on my 4080 machine, but it also gives gibberish when run with both 7900s. A single one (via [...]) works. Now one thing I must say: I'm using a desktop-class CPU and mobo (Ryzen 5950X), and it does not have enough PCIe lanes to connect both cards to the CPU; pytorch basically refused to run at all across two cards last time I tried. But I'm not sure if that matters for llama.cpp?
@morphles did you manage to make it work on 2x 7900 XTX? Have you tried Mixtral?
In my setup I was able to use all 4 cards for a single model, but now that the layer/row split thing has been implemented it no longer works properly. The 2 MI100s need the -sm option to be fast(er), but the W6800s will not work with that set. Shouldn't the W6800s work with layer or row split?
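For anyone else experimenting, the layer/row behaviour mentioned above is selected with the -sm / --split-mode option; a sketch of trying both modes (the model path is a placeholder):

    # split by layer: whole layers are assigned to each GPU (the default)
    ./main -m ./models/your-model.gguf -ngl 99 -sm layer
    # split by row: individual tensors are split across GPUs
    ./main -m ./models/your-model.gguf -ngl 99 -sm row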
Any progress? My dual 7900 XTX running ollama just outputs '#############....'
So I got back to checking the AI stuff, and it seems that with hipBLAS my dual 7900 XTX still produce garbage. And Vulkan for now also seems not to work across multiple GPUs?
Vulkan does work with multiple GPUs, as I have tested it with dual A770s and dual W6800s. Not all quant and model types are supported yet, so make sure you are not trying to use an unsupported model (Mixtral, for instance) or an unsupported quant type. Try a simple Q4_0 quant and you will see.
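If it helps, producing a plain Q4_0 quant to test with can be done with the bundled quantize tool; a sketch, assuming an f16 GGUF as input (file names are placeholders):

    # re-quantize an f16 GGUF down to a simple Q4_0
    ./quantize ./models/your-model-f16.gguf ./models/your-model-q4_0.gguf Q4_0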
@ccbadd Yeah, I managed to make it work; even the more novel command-r model worked. And yeah, I saw that Mixtral for now did not work. I just needed an env var to make both GPUs visible.
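(For reference, the env var in question is presumably one of ROCm's device-visibility variables, e.g. HIP_VISIBLE_DEVICES; a sketch for a two-card box, with the device indices assumed to be 0 and 1:)

    # make both Radeon cards visible to the ROCm runtime, then run with full offload
    export HIP_VISIBLE_DEVICES=0,1
    ./main -m ./models/your-model.gguf -ngl 99 -p "Hello"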
I am running MixTAO-7Bx2-MoE-v8.1 and just get a bunch of garbage (#########) coming out, per the above poster. Assuming this is because it's based on Mistral too? Got 2x 7900 XTX.
Try the flag described here if you have issues with AMD multi-GPU: [...]
Thank you for the super-fast reply and also for the work that you are doing on this project. Really appreciate it. I recompiled with the flag set and now it "hangs" for several minutes before failing with this CUDA error (not surprising, as it's AMD not NVIDIA):

    llm_load_tensors: ROCm0 buffer size = 12784.80 MiB
    [...]

Tried several ways to build:

    make clean && LLAMA_HIPBLAS=1 LLAMA_CUDA_NO_PEER_COPY=1 make -j

Etc.
Hmm... this may have been a context window thing, because after another build I now have it working. Your patch was the secret sauce. Thank you so much for doing that patch. I dropped the context window down to 4096. I also did a rebuild with this command line:

    make clean && LLAMA_HIPBLAS=1 LLAMA_CUDA_NO_PEER_COPY=1 make -j 16

In case it helps anyone else, this is how I am running it:

    ./main -m ../models/MixTAO-7Bx2-MoE-v8.1.gguf -n 256 -c 4096 --interactive-first --repeat_penalty 1.0 --color -i -ngl 33
Oh that's very helpful, thank you! Thank you once again. Really appreciate it.
@Speedway1 @ccbadd How's multi-card support now that ROCm 6.1 is released? I currently have a 3090 and am considering either getting dual 7900 XTX or dual MI100s, or a single 4090. What are your speeds like? And any issues using faster-whisper or alternatives?
This issue was closed because it has been inactive for 14 days since being marked as stale.
Do you have multi-GPU support for AMD? If not, do you see it as something you might add in the future?