CUDA: add bf16 and f32 support to cublas_mul_mat_batched #14361
base: master
`mul_mat_batched` with bf16 is failing for …
@JohannesGaessler mul-mat tests in bf16, which fail for Vulkan because of an assert …
Sorry, I didn't see the Vulkan comment. The problem, from what I can tell, is that the logic in …
I think this was supposed to work, but when I only change the assert, the test fails. I'll debug it.
#14378 should fix the new tests.
Add bf16 and f32 support to the batched cuBLAS mul mat path. This gives a speedup when running llama-bench with `--cache_type_v bf16 --cache_type_k bf16`.
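For readers unfamiliar with bf16 batched matrix multiplication, here is a minimal CPU reference sketch of what such a path computes. Several assumptions apply: matrices are row-major and contiguous, and every batch uses the same stride. bf16 inputs are widened to float32, and accumulation and output are in float32, which is the usual choice for bf16 GEMMs but is not stated in this PR. The names `bf16_to_f32`, `f32_to_bf16`, and `batched_matmul_bf16_ref` are hypothetical and are not ggml or cuBLAS functions. This is not the code from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// bf16 is the upper 16 bits of an IEEE-754 float32, so widening is a shift.
static float bf16_to_f32(uint16_t h) {
    uint32_t bits = static_cast<uint32_t>(h) << 16;
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Narrow float32 -> bf16 with round-to-nearest-even (NaN handling omitted).
static uint16_t f32_to_bf16(float f) {
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    bits += 0x7FFFu + ((bits >> 16) & 1u);  // round half to even
    return static_cast<uint16_t>(bits >> 16);
}

// Hypothetical strided-batched reference: for each batch b,
// C[b] (m x n) = A[b] (m x k) * B[b] (k x n), row-major,
// bf16 inputs, float32 accumulation and float32 output.
static void batched_matmul_bf16_ref(const uint16_t* A, const uint16_t* B, float* C,
                                    int m, int n, int k, int batch) {
    const size_t strideA = static_cast<size_t>(m) * k;
    const size_t strideB = static_cast<size_t>(k) * n;
    const size_t strideC = static_cast<size_t>(m) * n;
    for (int b = 0; b < batch; ++b) {
        const uint16_t* a = A + b * strideA;
        const uint16_t* bm = B + b * strideB;
        float* c = C + b * strideC;
        for (int i = 0; i < m; ++i) {
            for (int j = 0; j < n; ++j) {
                float acc = 0.0f;  // accumulate in float32
                for (int p = 0; p < k; ++p) {
                    acc += bf16_to_f32(a[i * k + p]) * bf16_to_f32(bm[p * n + j]);
                }
                c[i * n + j] = acc;
            }
        }
    }
}
```

On the GPU, cuBLAS provides a comparable operation through `cublasGemmStridedBatchedEx`, which accepts `CUDA_R_16BF` inputs. This section does not say whether the PR uses that entry point or a different batched variant.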