cuda : add LLAMA_CUDA_NO_PEER_COPY to workaround broken ROCm p2p copy #6208

Merged: 2 commits merged into master from sl/rocm-radeon-multi-gpu-workaround on Mar 22, 2024

Conversation

slaren (Member) commented on Mar 21, 2024

Adds the build flag LLAMA_CUDA_NO_PEER_COPY to disable peer-to-peer copies, which causes ggml-backend to fall back to copying through the CPU. This also disables pipeline parallelism.

Ref: #3772 (comment)
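
In concept, the flag simply turns off the direct GPU-to-GPU path so every cross-device copy is staged through the host. Below is a minimal sketch of such a compile-time guard, assuming the build flag defines GGML_CUDA_NO_PEER_COPY for the compiler; the function name and the way the fallback is signalled are illustrative, not the PR's actual code.

```cpp
#include <cstddef>
#include <cuda_runtime.h>

// Sketch only: how a compile-time define can gate the CUDA/HIP peer-to-peer
// copy path. GGML_CUDA_NO_PEER_COPY is assumed to be the define passed by the
// build flag; copy_tensor_between_devices is a hypothetical helper.
static bool copy_tensor_between_devices(void * dst, int dst_dev,
                                        const void * src, int src_dev,
                                        size_t size, cudaStream_t stream) {
#ifndef GGML_CUDA_NO_PEER_COPY
    int can_access_peer = 0;
    cudaDeviceCanAccessPeer(&can_access_peer, dst_dev, src_dev);
    if (can_access_peer) {
        // Direct device-to-device copy over the P2P path, i.e. the path that
        // is broken on some multi-GPU ROCm setups.
        return cudaMemcpyPeerAsync(dst, dst_dev, src, src_dev, size, stream) == cudaSuccess;
    }
#endif
    // Returning false tells the caller the direct copy is unavailable, so the
    // backend falls back to copying through host (CPU) memory; that fallback
    // is also what disables pipeline parallelism.
    return false;
}
```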

@morphles can you check if building this PR with LLAMA_CUDA_NO_PEER_COPY also fixes your issue? It should have the same effect as the patch that you tested previously.

morphles commented:

I think something is not right. First, if I build with make LLAMA_HIPBLAS=1 LLAMA_CUDA_NO_PEER_COPY=1, it somehow does not set up NVCCFLAGS as it should (at least I do not see -DGGML_CUDA_NO_PEER_COPY in the build lines, though maybe I'm using make wrong here). In the end the model loads, but it still produces nonsense, just like on the base build.
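
One way to confirm whether the define actually reaches the compiler is a preprocessor guard in a file built with those flags; this is a hypothetical check for illustration, not part of the PR:

```cpp
// Hypothetical sanity check: fail the build if the flag did not translate
// into -DGGML_CUDA_NO_PEER_COPY for this translation unit (e.g. because the
// Makefile did not add it to the NVCC/HIP compile flags).
#ifndef GGML_CUDA_NO_PEER_COPY
#error "GGML_CUDA_NO_PEER_COPY was not passed to the compiler"
#endif
```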

If I build with cmake like this:

 CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++ cmake -H. -Bbuild -DLLAMA_HIPBLAS=ON -DLLAMA_CUDA_NO_PEER_COPY=ON -DCMAKE_BUILD_TYPE=Release && cmake --build build -- -j 16

I get a failing assertion:

llama_new_context_with_model: graph nodes  = 1668
llama_new_context_with_model: graph splits = 3
CUDA error: shared object initialization failed
  current device: 0, in function ggml_cuda_op_flatten at /home/morphles/n/customized_llama/PR_check/llama.cpp/ggml-cuda.cu:9960
  hipGetLastError()
GGML_ASSERT: /home/morphles/n/customized_llama/PR_check/llama.cpp/ggml-cuda.cu:193: !"CUDA error"

Still, maybe I'm using cmake wrong too?

slaren (Member, Author) commented on Mar 21, 2024

@morphles it should be fixed now; I didn't realize that the HIP build handles these flags separately.

morphles commented:

Now it seems to be good; I tested both make and cmake as above. Amazing, and huge thanks!

slaren merged commit 2f0e81e into master on Mar 22, 2024
slaren deleted the sl/rocm-radeon-multi-gpu-workaround branch on March 22, 2024 at 13:05
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request on Apr 1, 2024:

cuda : add LLAMA_CUDA_NO_PEER_COPY to workaround broken ROCm p2p copy (ggml-org#6208)

* cuda : add LLAMA_CUDA_NO_PEER_COPY to workaround broken ROCm p2p copy
* add LLAMA_CUDA_NO_PEER_COPY to HIP build

hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request on Apr 3, 2024 (same commit message as above).

tybalex pushed a commit to rubra-ai/tools.cpp that referenced this pull request on Apr 17, 2024 (same commit message as above).