CUDA: fix crash on large batch size for quant. MoE #13537
Conversation
```diff
@@ -56,13 +56,13 @@ static __global__ void quantize_mmq_q8_1(
     constexpr int vals_per_scale = ds_layout == MMQ_Q8_1_DS_LAYOUT_D2S6 ? 64 : 32;
     constexpr int vals_per_sum   = ds_layout == MMQ_Q8_1_DS_LAYOUT_D2S6 ? 16 : 32;

-    const int64_t i0 = ((int64_t)blockDim.x*blockIdx.x + threadIdx.x)*4;
+    const int64_t i0 = ((int64_t)blockDim.x*blockIdx.y + threadIdx.x)*4;
```
Shouldn't this be `blockDim.y`?
`blockDim` refers to the maximum extents of `threadIdx`. This PR only swaps the grid dimensions; the configuration of threads within a block was not changed, therefore `blockDim.x` is still correct.
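For illustration, here is a standalone sketch of that distinction (the kernel name `index_demo` and the one-dimensional thread block are assumptions for the demo, not code from this PR): the block's thread layout, and hence `blockDim`, is independent of the grid shape, so only the `blockIdx` coordinate changes.

```cuda
#include <cstdint>

// Standalone demo, not llama.cpp code: a 1D thread block laid over a 2D grid.
// blockDim describes the block's own thread layout, so blockDim.x stays valid
// even though the within-row block index now comes from blockIdx.y.
__global__ void index_demo(int64_t * out, int64_t ncols) {
    const int64_t i0  = ((int64_t) blockDim.x*blockIdx.y + threadIdx.x)*4; // offset within a row
    const int64_t row = blockIdx.x;                                        // row index from grid.x

    if (i0 >= ncols) {
        return; // guard the partial block at the end of a row
    }
    out[row*ncols + i0] = i0;
}
```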
Yeah, this fixed it for me - thanks!
Should fix the issue described in #13435 (comment).
This PR swaps the x and y dimensions of the CUDA grid used for quantizing the activations: the y dimension of a grid is capped at 65535 blocks, while the x dimension allows up to 2^31 - 1, so the row count, which grows with the batch size, now maps to x.
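As a rough sketch of what that swap looks like on the launch side (the names `launch_quantize_sketch` and `CUDA_QUANTIZE_BLOCK_SIZE`, and the block size of 128, are assumptions; the actual llama.cpp launch code differs):

```cuda
#include <cstdint>
#include <cuda_runtime.h>

constexpr int CUDA_QUANTIZE_BLOCK_SIZE = 128; // threads per block (x only), assumed value

static void launch_quantize_sketch(int64_t ne0, int64_t nrows) {
    // Each thread quantizes 4 values, so a row of ne0 values needs this many blocks:
    const unsigned int blocks_per_row =
        (unsigned int) ((ne0/4 + CUDA_QUANTIZE_BLOCK_SIZE - 1) / CUDA_QUANTIZE_BLOCK_SIZE);

    // Before the fix: dim3 grid(blocks_per_row, nrows, 1);
    // grid.y is capped at 65535 blocks and nrows grows with the batch size,
    // so large batches exceeded the limit and the kernel launch failed.
    // After the fix the row count is mapped to grid.x (limit 2^31 - 1):
    const dim3 grid((unsigned int) nrows, blocks_per_row, 1);
    const dim3 block(CUDA_QUANTIZE_BLOCK_SIZE, 1, 1);
    // quantize_mmq_q8_1<<<grid, block>>>(/* args elided */);
}
```

With this mapping the batch-dependent row count can grow far beyond 65535 before hitting a grid limit, while the per-row block count, bounded by the row width, comfortably fits in the 65535-block y dimension.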