
cuda : fix disabling device with --tensor-split 1,0 #3951


Merged
cebtenzzre merged 3 commits into master from fix-tensor-split-zero on Nov 5, 2023

Conversation

cebtenzzre (Collaborator)

This restores the functionality of PR #2506 after it was broken in PR #3110.

Before:

$ build/bin/main -m ~/dirs/text-ai-models/misc/orca-mini-3b-gguf2-q4_0.gguf -ngl 100 -n 1 --tensor-split 1,0
<snip>
CUDA error 801 at /home/jared/src/forks/llama.cpp/ggml-cuda.cu:6799: operation not supported
current device: 0
$ build/bin/main -m ~/dirs/text-ai-models/misc/orca-mini-3b-gguf2-q4_0.gguf -ngl 100 -n 1 --tensor-split 0,1 --main-gpu 1
<snip>
CUDA error 801 at /home/jared/src/forks/llama.cpp/ggml-cuda.cu:6799: operation not supported
current device: 1

With both this PR and #3944 applied (otherwise multi-GPU doesn't work at all), both of the above commands work as expected.
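To make the intent of a zero split fraction concrete, here is a minimal sketch of how a --tensor-split value like 1,0 can be mapped to per-device row ranges, with a zero fraction producing an empty range that the backend must skip entirely (no CUDA calls for that device). The names and layout below are illustrative assumptions, not the actual ggml-cuda.cu code.

```cpp
// Hedged sketch (not the real ggml-cuda.cu implementation): turn user-supplied
// tensor-split fractions into per-device row ranges; a 0 fraction yields an
// empty range, i.e. that device is disabled and must not be touched.
#include <cstdint>
#include <cstdio>

static const int MAX_DEVICES = 16;

// Normalize the fractions into cumulative start boundaries in [0, 1].
static void split_boundaries(const float * tensor_split, int n_devices, float * bounds) {
    float total = 0.0f;
    for (int i = 0; i < n_devices; ++i) total += tensor_split[i];
    float acc = 0.0f;
    for (int i = 0; i < n_devices; ++i) {
        bounds[i] = acc / total;   // start boundary for device i
        acc += tensor_split[i];
    }
}

int main() {
    const float tensor_split[] = {1.0f, 0.0f}; // --tensor-split 1,0
    const int n_devices = 2;
    const int64_t nrows = 4096;                // rows of some split weight tensor

    float bounds[MAX_DEVICES];
    split_boundaries(tensor_split, n_devices, bounds);

    for (int id = 0; id < n_devices; ++id) {
        const float lo = bounds[id];
        const float hi = (id + 1 < n_devices) ? bounds[id + 1] : 1.0f;
        const int64_t row_lo = (int64_t)(lo * nrows);
        const int64_t row_hi = (int64_t)(hi * nrows);
        if (row_hi <= row_lo) {
            // Zero fraction: this device gets no rows and should be skipped
            // entirely, which is the behaviour this PR restores.
            printf("device %d: disabled (0 rows)\n", id);
            continue;
        }
        printf("device %d: rows [%lld, %lld)\n", id, (long long) row_lo, (long long) row_hi);
    }
    return 0;
}
```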

@cebtenzzre requested a review from slaren on November 5, 2023 04:47
@cebtenzzre force-pushed the fix-tensor-split-zero branch from 699ea6d to 05c51f9 on November 5, 2023 04:57
@cebtenzzre merged commit 132d25b into master on Nov 5, 2023
@@ -7403,7 +7410,7 @@ static void ggml_cuda_mul_mat_mat_batched_cublas(const ggml_tensor * src0, const
 
 static void ggml_cuda_mul_mat(const ggml_tensor * src0, const ggml_tensor * src1, ggml_tensor * dst) {
     const bool all_on_device =
-        (src0->backend == GGML_BACKEND_GPU) &&
+        (src0->backend == GGML_BACKEND_GPU || src0->backend == GGML_BACKEND_GPU_SPLIT) &&
Member

The ggml_cuda_mul_mat_mat_batched_cublas branch does not currently support split tensors.
Would this break F16 models for multi-gpu?

Member

yes, it is broken
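For context on the concern above, here is a hedged sketch of why counting GGML_BACKEND_GPU_SPLIT toward all_on_device matters for the batched cuBLAS path. The types and the use_batched_cublas helper are simplified stand-ins I introduce for illustration; they are not the actual ggml-cuda.cu dispatch or the eventual fix.

```cpp
// Illustrative sketch only: simplified stand-ins for the real ggml types, to
// show the dispatch concern raised above. ggml_tensor_stub and
// use_batched_cublas are hypothetical names, not part of ggml-cuda.cu.
#include <cstdio>

enum ggml_backend_type { GGML_BACKEND_CPU, GGML_BACKEND_GPU, GGML_BACKEND_GPU_SPLIT };

struct ggml_tensor_stub {
    ggml_backend_type backend;
    bool is_f16;   // weight tensor stored as F16
};

// Once GPU_SPLIT counts toward all_on_device, an F16 matmul whose weights are
// split across GPUs also satisfies the check and could fall into the
// batched-cuBLAS branch, which assumes the tensor lives on a single device.
// A guard like this (assumed, not the actual fix) keeps split tensors out.
static bool use_batched_cublas(const ggml_tensor_stub & src0, bool all_on_device) {
    const bool is_split = src0.backend == GGML_BACKEND_GPU_SPLIT;
    return all_on_device && !is_split && src0.is_f16;
}

int main() {
    ggml_tensor_stub split_f16  = { GGML_BACKEND_GPU_SPLIT, true };
    ggml_tensor_stub single_f16 = { GGML_BACKEND_GPU,       true };
    printf("split F16 weights -> batched cuBLAS: %d\n", use_batched_cublas(split_f16,  true));
    printf("single-GPU F16    -> batched cuBLAS: %d\n", use_batched_cublas(single_f16, true));
    return 0;
}
```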

olexiyb pushed a commit to Sanctum-AI/llama.cpp that referenced this pull request on Nov 23, 2023
3 participants