Fix Q4_K and Q5_K for QK_K = 64 on CUDA #2359

ikawrakow · 2023-07-24T06:40:17Z

They were broken (they did not even compile) when LLAMA_CUDA_FORCE_DMMV=OFF. According to @JohannesGaessler, when he added the matrix-times-vector versions that use integer SIMD intrinsics, he did not implement them for QK_K = 64 because he considered QK_K = 64 a temporary fix. I agree with this sentiment, but in the meantime it is not good to have broken code on master, so this PR fixes that.

A fix was needed only for Q4_K and Q5_K. Q2_K, Q3_K, and Q6_K worked out of the box.

Performance is not great, but I did not want to spend the time optimizing a temporary solution.
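
For readers unfamiliar with the integer SIMD intrinsics mentioned above, here is a minimal sketch of what one such intrinsic, CUDA's `__dp4a(a, b, c)`, computes: a dot product of four packed signed 8-bit values plus an accumulator. This is a plain C reference for illustration only, not code from this PR. The names `dp4a_ref` and `pack4` are hypothetical helpers, and the sketch assumes the usual two's-complement wrap-around on narrowing conversions.

```c
#include <stdint.h>

/* Hypothetical CPU reference for the arithmetic of CUDA's __dp4a(a, b, c):
 * treat a and b as four packed signed bytes each, multiply them pairwise,
 * and add the four products to the accumulator c. */
static int dp4a_ref(int a, int b, int c) {
    for (int i = 0; i < 4; ++i) {
        /* Extract byte i and reinterpret it as a signed 8-bit value. */
        int8_t ai = (int8_t)(uint8_t)((uint32_t)a >> (8 * i));
        int8_t bi = (int8_t)(uint8_t)((uint32_t)b >> (8 * i));
        c += (int)ai * (int)bi;
    }
    return c;
}

/* Hypothetical helper: pack four signed bytes into one 32-bit word,
 * with x0 in the lowest byte. */
static int pack4(int8_t x0, int8_t x1, int8_t x2, int8_t x3) {
    uint32_t u = (uint32_t)(uint8_t)x0
               | ((uint32_t)(uint8_t)x1 << 8)
               | ((uint32_t)(uint8_t)x2 << 16)
               | ((uint32_t)(uint8_t)x3 << 24);
    return (int)u;
}
```

For example, `dp4a_ref(pack4(1, 2, 3, 4), pack4(5, 6, 7, 8), 10)` evaluates 1*5 + 2*6 + 3*7 + 4*8 + 10. On the GPU the same arithmetic is a single instruction, which is why the quantized matrix-times-vector kernels favor it over dequantizing to floating point first.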

JohannesGaessler

I did not test the implementation, but based on just reading it, it seems okay.

Kawrakow added 2 commits July 24, 2023 08:59

Fix Q4_K and Q5_K for QK_K = 64

7f96ff9

Very slightly better Q5_K bit fiddling

e6dd6bc

ikawrakow requested a review from JohannesGaessler July 24, 2023 06:40

JohannesGaessler approved these changes Jul 24, 2023

ikawrakow merged commit 129d844 into master Jul 25, 2023

ikawrakow deleted the ik/cuda_fix_QKK_64_2 branch July 25, 2023 10:48
