
Conversation

OuadiElfarouki
Contributor

This minor arithmetic change (a % b = a - b*(a/b)), apparently missed by the icpx/icx compiler, brings a considerable performance improvement (5% to 35%) in text generation for type-1 quantized models (e.g. Q4_* and Q5_*) when targeting Nvidia devices, without degrading performance on the other supported quantizations and devices.
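A minimal sketch of the pattern, assuming placeholder names and values (qi, vdr and tid are illustrative here, not the exact expressions from ggml-sycl.cpp):

```cpp
// Illustrative sketch only: qi, vdr and tid stand in for the kinds of
// compile-time constants and work-item indices used in the mmvq wrapper
// kernel; the real code in ggml-sycl.cpp may differ.

constexpr int qi  = 32; // quants per block (placeholder value)
constexpr int vdr = 2;  // values handled per dot-product iteration (placeholder value)

// Before: remainder written with the % operator.
int index_with_modulo(int tid) {
    return vdr * (tid % (qi / vdr));
}

// After: the same remainder spelled out as a - b*(a/b), the form that
// avoided the slow path observed with icpx/icx on Nvidia targets.
int index_with_subtraction(int tid) {
    const int q = tid / (qi / vdr);
    return vdr * (tid - (qi / vdr) * q);
}
```

Both functions compute the same index; the second writes the remainder out explicitly so the compiler's handling of % is taken out of the picture.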

Devices tested for this change:

  • Nvidia: A100-40GB, A4000, RTX 4060
  • Intel: Intel Data Center GPU Max 1100, Intel Arc A770

Compiler: oneAPI 2024.1.0

Thanks for reviewing this @AidanBeltonS @abhilash1910 @airMeng @NeoZhangJianyu

@JohannesGaessler
Collaborator

Context: I have written most of the MMVQ CUDA code which to my understanding is what the SYCL code is adapted from.

I don't understand why this change is faster. qi and vdr are powers of 2 that are both known at compile time, so their ratio is also a power of 2 known at compile time. A number modulo a power of 2 should be converted by the compiler to just a bitwise AND, which should be faster than a subtraction.
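For reference, a small self-contained check of that expectation (the power-of-two constant here is illustrative, not the kernel's actual qi/vdr ratio): for unsigned operands and a divisor that is a compile-time power of two, x % 8 and x & 7 compute the same value, so an optimizing compiler is normally expected to lower the modulo to a single bitwise AND rather than a division/remainder sequence.

```cpp
#include <cassert>

// x % 8 and x & 7 are equivalent for unsigned x, so modulo by a
// compile-time power of two should lower to a single bitwise AND.
unsigned mod_pow2(unsigned x)  { return x % 8u; }
unsigned mask_pow2(unsigned x) { return x & 7u; }

int main() {
    for (unsigned x = 0; x < 64; ++x) {
        assert(mod_pow2(x) == mask_pow2(x));
    }
    return 0;
}
```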

Collaborator

@abhilash1910 left a comment


This looks ok to me.
However, I am also confused as to how subtraction beats modulo, as the latter should be efficient. For Intel hardware, is there a significant boost as on Nvidia?
If this looks promising, then such changes need to be addressed in dpct and icpx.
Also pinging @ggerganov for a look when available.

@mofosyne mofosyne added the performance Speed related topics label May 9, 2024
@OuadiElfarouki
Contributor Author

Context: I have written most of the MMVQ CUDA code which to my understanding is what the SYCL code is adapted from.

I don't understand why this change is faster. qi and vdr are powers of 2 that are both known at compile time, so their ratio is also a power of 2 known at compile time. A number modulo a power of 2 should be converted by the compiler to just a bitwise AND, which should be faster than a subtraction.

@JohannesGaessler Agreed, it should be handled by the compiler; in fact, a similar change to equivalent operations in other "hot spots" didn't result in such a leap in performance. The reason is unclear at the moment, but the numbers might justify the patch.

@OuadiElfarouki
Contributor Author

OuadiElfarouki commented May 9, 2024

This looks ok to me. However, I am also confused as to how subtraction beats modulo, as the latter should be efficient. For Intel hardware, is there a significant boost as on Nvidia? If this looks promising, then such changes need to be addressed in dpct and icpx. Also pinging @ggerganov for a look when available.

@abhilash1910 Yes, it's odd indeed; the performance leap was mainly noticed on Nvidia GPUs (some examples below), with little to no improvement on Intel targets.

Some Llama 2 13B examples ((patchPerf - masterPerf) / masterPerf):

  • Nvidia A4000:

    Q4_K_M    36%
    Q4_K_S    52%
    Q5_K_M    38%
    Q5_K_S    52%

  • Nvidia A100:

    Q4_K_M    62%
    Q4_K_S    70%
    Q5_K_M    57%
    Q5_K_S    43%

@mofosyne mofosyne added the Review Complexity : High Generally require indepth knowledge of LLMs or GPUs label May 9, 2024
@NeoZhangJianyu
Collaborator

NeoZhangJianyu commented May 10, 2024

It's working on Intel iGPU. About 3% increase.
It's good!

@NeoZhangJianyu NeoZhangJianyu merged commit 8c570c9 into ggml-org:master May 10, 2024
joeatodd added a commit that referenced this pull request Jun 17, 2024
joeatodd added a commit that referenced this pull request Jun 18, 2024

Revert "Minor arithmetic improvement to mmvq wrapper kernel (#7172)" (#7980)

* Revert "Minor arithmetic improvement to mmvq wrapper kernel (#7172)"

This reverts commit 8c570c9.

* Update ggml-sycl.cpp

Alcpz pushed a commit to Alcpz/llama.cpp that referenced this pull request Jun 20, 2024

Revert "Minor arithmetic improvement to mmvq wrapper kernel (ggml-org#7172)" (ggml-org#7980)

* Revert "Minor arithmetic improvement to mmvq wrapper kernel (ggml-org#7172)"

This reverts commit 8c570c9.

* Update ggml-sycl.cpp