Description
Prerequisites
- I am running the latest code. Mention the version if possible as well.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new and useful enhancement to share.
Feature Description
Hi,
I compiled the latest llama.cpp with the Vulkan backend enabled on a system with Vulkan bfloat16 support:
cmake -B build -DGGML_VULKAN=ON
I see:
-- Vulkan found
-- GL_KHR_cooperative_matrix supported by glslc
-- GL_NV_cooperative_matrix2 supported by glslc
-- GL_EXT_integer_dot_product supported by glslc
-- GL_EXT_bfloat16 supported by glslc
-- Including Vulkan backend
The driver also supports the VK_KHR_shader_bfloat16 extension, so when running llama-bench I see:
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = NVIDIA GeForce RTX 4070 (NVIDIA) | uma: 0 | fp16: 1 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: NV_coopmat2
Just as "matrix cores: NV_coopmat2" shows that llama.cpp has been compiled with coopmat2 support and that the GPU/driver also supports that extension, I would like to see similar info for bfloat16, i.e. a bf16: 1 field:
ggml_vulkan: 0 = NVIDIA GeForce RTX 4070 (NVIDIA) | uma: 0 | fp16: 1 | **bf16: 1** | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: NV_coopmat2
Later, if an fp8 Vulkan extension is added, an fp8: 1 field could be reported the same way, or perhaps an array of accelerated floating-point formats could be displayed, e.g. "fp: [fp16, bf16, fp8]".
thanks..
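
For illustration only, here is a rough sketch (not from this issue, and not the actual ggml-vulkan.cpp code) of how the device-info line could gain a bf16 field: report 1 only when the shaders were built with GL_EXT_bfloat16 support and the physical device exposes the bfloat16 extension. The helper name, the build-time define GGML_VULKAN_BF16_GLSLC_SUPPORT, and the exact print format are assumptions.

```cpp
#include <vulkan/vulkan.h>
#include <cstdio>
#include <cstring>
#include <vector>

// Return true if the physical device exposes the bfloat16 device extension.
static bool device_supports_bf16(VkPhysicalDevice dev) {
    const char * bf16_ext = "VK_KHR_shader_bfloat16";

    uint32_t count = 0;
    vkEnumerateDeviceExtensionProperties(dev, nullptr, &count, nullptr);
    std::vector<VkExtensionProperties> exts(count);
    vkEnumerateDeviceExtensionProperties(dev, nullptr, &count, exts.data());

    for (const auto & ext : exts) {
        if (strcmp(ext.extensionName, bf16_ext) == 0) {
            return true;
        }
    }
    return false;
}

// Print a device-info line with a bf16 field, mirroring the existing
// "fp16:" / "int dot:" style. GGML_VULKAN_BF16_GLSLC_SUPPORT is a
// hypothetical compile-time flag standing in for "glslc had GL_EXT_bfloat16".
static void print_device_info(int idx, VkPhysicalDevice dev) {
    VkPhysicalDeviceProperties props;
    vkGetPhysicalDeviceProperties(dev, &props);

#ifdef GGML_VULKAN_BF16_GLSLC_SUPPORT
    const int bf16 = device_supports_bf16(dev) ? 1 : 0;
#else
    const int bf16 = 0; // shaders were not built with bfloat16 support
#endif

    printf("ggml_vulkan: %d = %s | bf16: %d | ...\n", idx, props.deviceName, bf16);
}
```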
Motivation
Easy detection of whether llama.cpp has been built with support for the Vulkan bfloat16 extension and whether the GPU driver also supports it, by reporting bf16: 1 in the same way that the "matrix cores:" and "int dot:" fields only report support when both llama.cpp and the driver provide it.
Possible Implementation
No response