
cuda : fix disabling device with --tensor-split 1,0 #3951


Merged
cebtenzzre merged 3 commits into master from fix-tensor-split-zero on Nov 5, 2023

Conversation

cebtenzzre (Collaborator)

This restores the functionality of PR #2506 after it was broken in PR #3110.

Before:

$ build/bin/main -m ~/dirs/text-ai-models/misc/orca-mini-3b-gguf2-q4_0.gguf -ngl 100 -n 1 --tensor-split 1,0
<snip>
CUDA error 801 at /home/jared/src/forks/llama.cpp/ggml-cuda.cu:6799: operation not supported
current device: 0
$ build/bin/main -m ~/dirs/text-ai-models/misc/orca-mini-3b-gguf2-q4_0.gguf -ngl 100 -n 1 --tensor-split 0,1 --main-gpu 1
<snip>
CUDA error 801 at /home/jared/src/forks/llama.cpp/ggml-cuda.cu:6799: operation not supported
current device: 1

With both this PR and #3944 applied (otherwise multi-GPU doesn't work at all), both of the above commands work as expected.
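To make the intent of a zero split fraction concrete, here is a minimal sketch of how a --tensor-split value like 1,0 can be mapped to per-device row ranges, with a zero fraction producing an empty range that the backend must skip entirely (no CUDA calls for that device). The names and layout below are illustrative assumptions, not the actual ggml-cuda.cu code.

```cpp
// Hedged sketch (not the real ggml-cuda.cu implementation): turn user-supplied
// tensor-split fractions into per-device row ranges; a 0 fraction yields an
// empty range, i.e. that device is disabled and must not be touched.
#include <cstdint>
#include <cstdio>

static const int MAX_DEVICES = 16;

// Normalize the fractions into cumulative start boundaries in [0, 1].
static void split_boundaries(const float * tensor_split, int n_devices, float * bounds) {
    float total = 0.0f;
    for (int i = 0; i < n_devices; ++i) total += tensor_split[i];
    float acc = 0.0f;
    for (int i = 0; i < n_devices; ++i) {
        bounds[i] = acc / total;   // start boundary for device i
        acc += tensor_split[i];
    }
}

int main() {
    const float tensor_split[] = {1.0f, 0.0f}; // --tensor-split 1,0
    const int n_devices = 2;
    const int64_t nrows = 4096;                // rows of some split weight tensor

    float bounds[MAX_DEVICES];
    split_boundaries(tensor_split, n_devices, bounds);

    for (int id = 0; id < n_devices; ++id) {
        const float lo = bounds[id];
        const float hi = (id + 1 < n_devices) ? bounds[id + 1] : 1.0f;
        const int64_t row_lo = (int64_t)(lo * nrows);
        const int64_t row_hi = (int64_t)(hi * nrows);
        if (row_hi <= row_lo) {
            // Zero fraction: this device gets no rows and should be skipped
            // entirely, which is the behaviour this PR restores.
            printf("device %d: disabled (0 rows)\n", id);
            continue;
        }
        printf("device %d: rows [%lld, %lld)\n", id, (long long) row_lo, (long long) row_hi);
    }
    return 0;
}
```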

@cebtenzzre requested a review from slaren on November 5, 2023 04:47
@cebtenzzre force-pushed the fix-tensor-split-zero branch from 699ea6d to 05c51f9 on November 5, 2023 04:57
@cebtenzzre merged commit 132d25b into master on Nov 5, 2023
@@ -7403,7 +7410,7 @@ static void ggml_cuda_mul_mat_mat_batched_cublas(const ggml_tensor * src0, const
 
 static void ggml_cuda_mul_mat(const ggml_tensor * src0, const ggml_tensor * src1, ggml_tensor * dst) {
     const bool all_on_device =
-        (src0->backend == GGML_BACKEND_GPU) &&
+        (src0->backend == GGML_BACKEND_GPU || src0->backend == GGML_BACKEND_GPU_SPLIT) &&
Member

The ggml_cuda_mul_mat_mat_batched_cublas branch does not currently support split tensors.
Would this break F16 models for multi-gpu?

Member

yes, it is broken
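For context on the concern above, here is a hedged sketch of why counting GGML_BACKEND_GPU_SPLIT toward all_on_device matters for the batched cuBLAS path. The types and the use_batched_cublas helper are simplified stand-ins I introduce for illustration; they are not the actual ggml-cuda.cu dispatch or the eventual fix.

```cpp
// Illustrative sketch only: simplified stand-ins for the real ggml types, to
// show the dispatch concern raised above. ggml_tensor_stub and
// use_batched_cublas are hypothetical names, not part of ggml-cuda.cu.
#include <cstdio>

enum ggml_backend_type { GGML_BACKEND_CPU, GGML_BACKEND_GPU, GGML_BACKEND_GPU_SPLIT };

struct ggml_tensor_stub {
    ggml_backend_type backend;
    bool is_f16;   // weight tensor stored as F16
};

// Once GPU_SPLIT counts toward all_on_device, an F16 matmul whose weights are
// split across GPUs also satisfies the check and could fall into the
// batched-cuBLAS branch, which assumes the tensor lives on a single device.
// A guard like this (assumed, not the actual fix) keeps split tensors out.
static bool use_batched_cublas(const ggml_tensor_stub & src0, bool all_on_device) {
    const bool is_split = src0.backend == GGML_BACKEND_GPU_SPLIT;
    return all_on_device && !is_split && src0.is_f16;
}

int main() {
    ggml_tensor_stub split_f16  = { GGML_BACKEND_GPU_SPLIT, true };
    ggml_tensor_stub single_f16 = { GGML_BACKEND_GPU,       true };
    printf("split F16 weights -> batched cuBLAS: %d\n", use_batched_cublas(split_f16,  true));
    printf("single-GPU F16    -> batched cuBLAS: %d\n", use_batched_cublas(single_f16, true));
    return 0;
}
```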

olexiyb pushed a commit to Sanctum-AI/llama.cpp that referenced this pull request on Nov 23, 2023
3 participants