CUDA: fix non-cont. inputs for batched mat mul #13155
Merged
See #13137.
I misdiagnosed the problem in the previous PR. The issue is in fact not numerical precision but that the memory offsets used during the FP32->FP16 conversion of `src1` were wrong. The code implicitly assumed that the memory layout of `src1` is contiguous in the sense that there are no gaps when iterating over all elements. Usually something like this causes completely garbled outputs, but in this case the effect was relatively small.

I fixed the issue by extending the float conversion to support non-contiguous inputs. So far we do not need this for non-float data, so I did not touch that code. The conversion code is a mess and I think we should refactor it long-term. I don't think this would be very difficult; maybe we can mark it as a good first issue, for CUDA in particular?
This PR reverts the previous changes to the precision logic; I thought batched matrix multiplication only supported FP16 precision, but I guess I misremembered.