CUDA: fix --split-mode row for MMQ #13323

JohannesGaessler · 2025-05-05T21:45:37Z

The problem is that with --split-mode row the number of columns in src1 and dst can differ, and I did not account for this correctly when calculating the src1 pointers. I thought I had tested this, but it seems I made a mistake. Sorry!
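
To make the failure mode concrete, here is a minimal host-side sketch of how a pointer offset goes wrong when one buffer is indexed with another buffer's column count. It is not the code from this patch: `column_offset`, the slice/column layout, and the shapes below are all hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical illustration only; names and layout are assumptions,
// not the code from the actual patch.
// Element offset of column `col` within slice `slice` of a buffer that
// holds `ncols` contiguous columns of `col_len` elements per slice.
constexpr std::size_t column_offset(std::size_t slice, std::size_t col,
                                    std::size_t ncols, std::size_t col_len) {
    return (slice * ncols + col) * col_len;
}

// Assumed shapes: src1 holds 8 columns per slice, dst only 2.
constexpr std::size_t ncols_src1 = 8;
constexpr std::size_t ncols_dst  = 2;
constexpr std::size_t col_len    = 4;

// Indexing src1 with its own column count gives the intended offset.
constexpr std::size_t src1_offset_ok  = column_offset(1, 0, ncols_src1, col_len);
// Reusing dst's column count for src1 lands inside slice 0 instead.
constexpr std::size_t src1_offset_bad = column_offset(1, 0, ncols_dst, col_len);

static_assert(src1_offset_ok == 32, "slice 1 of src1 starts after 8 columns of 4 elements");
static_assert(src1_offset_bad == 8, "wrong column count points into slice 0");
```

When both buffers happen to have the same number of columns the two offsets coincide, which is one plausible reason a bug of this kind could pass testing in other split modes.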

* origin/master: (27 commits)
  llama : fix build_ffn without gate (ggml-org#13336)
  CUDA: fix bad asserts for partial offload (ggml-org#13337)
  convert : qwen2/3moe : set yarn metadata if present (ggml-org#13331)
  CUDA: fix --split-mode row for MMQ (ggml-org#13323)
  gguf-py : avoid requiring pyside6 for other scripts (ggml-org#13036)
  CUDA: fix logic for clearing padding with -ngl 0 (ggml-org#13320)
  sampling : Integrate Top-nσ into main sampling chain (and add it to the server) (ggml-org#13264)
  server : Webui - change setText command from parent window to also send the message. (ggml-org#13309)
  mtmd : rename llava directory to mtmd (ggml-org#13311)
  clip : fix confused naming ffn_up and ffn_down (ggml-org#13290)
  convert : bailingmoe : set yarn metadata if present (ggml-org#13312)
  SYCL: Disable mul_mat kernels for noncontiguous tensor b (ggml-org#13308)
  mtmd : add C public API (ggml-org#13184)
  rpc : use backend registry, support dl backends (ggml-org#13304)
  ggml : activate s390x simd for Q3_K (ggml-org#13301)
  llava/mtmd : fixes to fully support dl backends (ggml-org#13303)
  llama : build windows releases with dl backends (ggml-org#13220)
  CUDA: fix race condition in MMQ stream-k fixup (ggml-org#13299)
  CUDA: fix race condition in MMQ ids_dst (ggml-org#13294)
  vulkan: Additional type support for unary, binary, and copy (ggml-org#13266)
  ...

This reverts commit 15a28ec.

CUDA: fix --split-mode row for MMQ

6e221de

JohannesGaessler mentioned this pull request May 5, 2025

Eval bug: -sm row causes wrong output #13297

Closed

github-actions bot added the Nvidia GPU (Issues specific to Nvidia GPUs) and ggml (changes relating to the ggml tensor library for machine learning) labels May 5, 2025

slaren approved these changes May 5, 2025

JohannesGaessler merged commit 15a28ec into ggml-org:master May 6, 2025
46 checks passed

Nexesenex added a commit to Nexesenex/croco.cpp that referenced this pull request May 9, 2025

Revert "CUDA: fix --split-mode row for MMQ (ggml-org#13323)"

88b2186

This reverts commit 15a28ec.
