CUDA: fixed mmvq kernel for bs 2,3,4 and -sm row #5386

JohannesGaessler · 2024-02-07T09:31:51Z

Fixes #5383, which was introduced by #5351.
The issue is that with -sm row the output buffer on the main GPU has more rows than the input buffer, because it must also hold the results from the other GPUs.
The stride when going from row to row therefore needs to be larger on the main device.
This only matters for batch sizes > 1. I missed it during testing because I forgot that -sm layer is now the default.
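
To illustrate the stride problem described above, here is a minimal host-side sketch in plain C++. It is not the actual kernel code: the function `write_device_slice` and its parameters are hypothetical, and the names `nrows_x` and `nrows_dst` are only assumed labels for the row count of the input slice and of the output buffer. The sketch assumes the output stores one contiguous block of `nrows_dst` values per batch entry.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of one device writing its results into the output buffer.
// The output holds batch_size blocks of nrows_dst values each. The device only
// produces nrows_x values per batch entry (its slice of the rows) and writes
// them at the start of each block, using row_stride to step between blocks.
std::vector<float> write_device_slice(int nrows_x, int nrows_dst,
                                      int batch_size, int row_stride) {
    std::vector<float> dst((std::size_t) nrows_dst * batch_size, 0.0f);
    for (int j = 0; j < batch_size; ++j) {       // batch entry
        for (int row = 0; row < nrows_x; ++row) { // row within this device's slice
            dst[(std::size_t) j * row_stride + row] = 1.0f;
        }
    }
    return dst;
}
```

With `nrows_x = 2`, `nrows_dst = 4` and a batch size of 2, a stride of `nrows_dst` leaves room in each block for the other devices' rows, whereas a stride of `nrows_x` writes the second batch entry into the space reserved for them. With a batch size of 1 the stride is never used, so both variants produce the same result, which is consistent with the bug only showing up for batch sizes > 1.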

CUDA: fixed mmvq kernel for bs 2,3,4 and -sm row

3b0a1a0

JohannesGaessler mentioned this pull request Feb 7, 2024

Regression: #5351 breaks --split-mode row with -np >1 on server.cpp for MoE #5383

Closed

slaren approved these changes Feb 7, 2024

JohannesGaessler merged commit aa7ab99 into ggml-org:master Feb 7, 2024

jordankanter pushed a commit to jordankanter/llama.cpp that referenced this pull request Mar 13, 2024

CUDA: fixed mmvq kernel for bs 2,3,4 and -sm row (ggml-org#5386)

3b495c3

hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 1, 2024

CUDA: fixed mmvq kernel for bs 2,3,4 and -sm row (ggml-org#5386)

6014aa6
