
Eval bug: ~~Q2_K and Q3_K~~ Q8_0 not working on Vulkan anymore on RX 5700XT #10710


Closed
stduhpf opened this issue Dec 7, 2024 · 54 comments

@stduhpf
Contributor

stduhpf commented Dec 7, 2024

Name and Version

ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 5700 XT (AMD proprietary driver) | uma: 0 | fp16: 1 | warp size: 64 | shared memory: 32768 | matrix cores: none
version: 4820 (1a24c46)
built with MSVC 19.42.34435.0 for x64

Operating systems

Windows

GGML backends

Vulkan

Hardware

Ryzen 5900X + RX 5700 XT

Models

Any model that has Q8_0 tensors in it.

Problem description & steps to reproduce

Complete gibberish/noise output.

I noticed this issue with stable-diffusion.cpp at first, but I can reproduce it here.

To reproduce, simply start inference with any q8_0 model, with -ngl set to anything but 0.
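For context, a minimal sketch of Q8_0 dequantization as I understand the format (blocks of 32 signed 8-bit quants sharing one fp16 scale; the function name is illustrative, not ggml's actual API):

```python
QK8_0 = 32  # quants per Q8_0 block (ggml convention)

def dequantize_q8_0_block(d, qs):
    """Dequantize one Q8_0 block: weight_i = d * q_i,
    where d is the block's fp16 scale and qs holds 32 int8 quants."""
    assert len(qs) == QK8_0
    return [d * q for q in qs]

# If a shader misreads d or qs (e.g. a bad offset or stride), every
# weight in the block comes out wrong, which shows up as gibberish tokens.
weights = dequantize_q8_0_block(0.05, list(range(-16, 16)))
```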

First Bad Commit

fbeda90 (#12015)

Relevant log output

Example command:

.\build\bin\Release\llama-cli.exe -m .\models\gemma-2b-Q8_0.gguf -no-cnv -ngl 19 -t 6 -tb 12 -p "The meaning of life is"

Output:

 The meaning of life is increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa increa
llama_perf_sampler_print:    sampling time =      13.57 ms /    87 runs   (    0.16 ms per token,  6410.26 tokens per second)
llama_perf_context_print:        load time =    2080.08 ms
llama_perf_context_print: prompt eval time =      23.16 ms /     6 tokens (    3.86 ms per token,   259.09 tokens per second)
llama_perf_context_print:        eval time =     879.56 ms /    80 runs   (   10.99 ms per token,    90.95 tokens per second)
llama_perf_context_print:       total time =     936.59 ms /    86 tokens
Interrupted by user

Reverting fbeda90 fixes it.

Older q2_k/q3_k related issue (fixed by adc5dd9 #11081 )

Name and Version

ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 5700 XT (AMD proprietary driver) | uma: 0 | fp16: 1 | warp size: 64
version: 4277 (c5ede38)
built with MSVC 19.41.34120.0 for x64

Operating systems

Windows

GGML backends

Vulkan

Hardware

Ryzen 5900X + RX 5700 XT

Models

Any model that has Q3_K or Q2_K tensors in it.

Problem description & steps to reproduce

Complete gibberish/noise output.

I noticed this issue with stable-diffusion.cpp at first, but I can reproduce it here.

To reproduce, simply start inference with any q3_k_x or q2_k_x model, with -ngl set to anything but 0.

First Bad Commit

4a57d36 (#10459)

Relevant log output

Example command:

.\build\bin\Release\llama-cli.exe -m .\models\Mistral-7B-v0.2-hf-Q3_K_L.gguf -ngl 24 -t 6 -tb 12 -p "The meaning of life is"

Output:

The meaning of life is to- kur m jel ul tawa Computkow Ydorfico oobeckagles “anga ACenzei Roose Asto__(ingle Phillieraspace TheFAILEDello securózannieloilloemente GabrielóniałrivatemulticolManocaluckangle>@‑inghamulle pagina Steinentoadyodenzes Armindowtexlä v Ronald incre bioExitocyniadelphiaumper globutescison sear lifestyle proto Kotiek po cadutes Eng randCl byaginganziagedrafla cad- extern met  externward Kyere collectenteryenta‎ divisionsExternaleryy Aubore2� Yale randomirkFBimanneman hyd BrowFB Maj Majalaky audanning Ex ternal -neylitter Intentanningky amaperlDsek  Britats unit andraportyo am… Egyptian portionandraandeentob – indirectibaentoicigeb associate1田 ##icijays Lyiana auditentoawPy import Girapy TheMky X Himery  departmentyyyiba1iba indirect n #isterschaftciProrico Industrial #aniric Palm indirectBici patPyy –hetriky ### AtlantaidleBazialaaran Mediterranean matter sl m South experekylie------ofsy Meyainsottoannedento- corporBOestic /******/entopythonats eternainsalian Gir expery # Sar‟eloalfentaahaelfonomPal rigidento bon bon Pdas palanda P Muhammadentoít SubPy ###GAentoeterenta Palm Kabâ Cecenta8entonuoltyBotaueraperendlento Ec pyento externâ accentburgaper Klaly
llama_perf_sampler_print:    sampling time =      12.93 ms /   319 runs   (    0.04 ms per token, 24665.58 tokens per second)
llama_perf_context_print:        load time =    3158.38 ms
llama_perf_context_print: prompt eval time =     262.98 ms /     6 tokens (   43.83 ms per token,    22.82 tokens per second)
llama_perf_context_print:        eval time =   13525.70 ms /   312 runs   (   43.35 ms per token,    23.07 tokens per second)
llama_perf_context_print:       total time =   13823.14 ms /   318 tokens
Interrupted by user

Reverting 4a57d36 fixes it.

@mtasic85
Contributor

mtasic85 commented Dec 7, 2024

I can also confirm this; I have exactly the same GPU. I'm on Arch Linux with everything fully updated.

Regarding the log below: https://en.wikipedia.org/wiki/Radeon_RX_5000_series states that:

Vulkan 1.3 is supported with Adrenalin 22.1.2 and Linux Mesa3D 22.0.0.

Partial build log:

c++ -std=c++17 -fPIC -O3 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -fopenmp  -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -DGGML_USE_CPU -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CPU_AARCH64 -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_AMX -Iggml/src/ggml-cpu -DGGML_USE_VULKAN  -DLLAMA_USE_CURL -c common/build-info.cpp -o common/build-info.o
cannot compile matmul_f16_cm2

glslc -fshader-stage=compute --target-env=vulkan1.3 -O ggml/src/ggml-vulkan/vulkan-shaders/mul_mm_cm2.comp -o /tmp/matmul_f16_cm2.spv -DACC_TYPE=float -DB_TYPE=float16_t -DDATA_A_F16=1 -DD_TYPE=float -DFLOAT16=1 -DFLOAT_TYPE=float16_t

ggml/src/ggml-vulkan/vulkan-shaders/mul_mm_cm2.comp:13: warning: '#extension' : extension not supported: GL_NV_cooperative_matrix2
ggml/src/ggml-vulkan/vulkan-shaders/mul_mm_cm2.comp:197: error: 'tensorLayoutNV' : undeclared identifier
ggml/src/ggml-vulkan/vulkan-shaders/mul_mm_cm2.comp:197: error: 'tensorLayoutA' : undeclared identifier
ggml/src/ggml-vulkan/vulkan-shaders/mul_mm_cm2.comp:197: error: '>' :  wrong operand types: no operation '>' exists that takes a left-hand operand of type ' temp bool' and a right operand of type ' temp float' (or there is no acceptable conversion)
ggml/src/ggml-vulkan/vulkan-shaders/mul_mm_cm2.comp:197: error: '' :  syntax error, unexpected EQUAL, expecting COMMA or SEMICOLON
1 warning and 4 errors generated.

cannot compile matmul_f32_f16_aligned_cm2

glslc -fshader-stage=compute --target-env=vulkan1.3 -O ggml/src/ggml-vulkan/vulkan-shaders/mul_mm_cm2.comp -o /tmp/matmul_f32_f16_aligned_cm2.spv -DACC_TYPE=float -DALIGNED=1 -DB_TYPE=float16_t -DDATA_A_F32=1 -DD_TYPE=float -DFLOAT16=1 -DFLOAT_TYPE=float16_t -DLOAD_VEC_A=1 -DLOAD_VEC_B=1

ggml/src/ggml-vulkan/vulkan-shaders/mul_mm_cm2.comp:13: warning: '#extension' : extension not supported: GL_NV_cooperative_matrix2
ggml/src/ggml-vulkan/vulkan-shaders/mul_mm_cm2.comp:197: error: 'tensorLayoutNV' : undeclared identifier
ggml/src/ggml-vulkan/vulkan-shaders/mul_mm_cm2.comp:197: error: 'tensorLayoutA' : undeclared identifier
ggml/src/ggml-vulkan/vulkan-shaders/mul_mm_cm2.comp:197: error: '>' :  wrong operand types: no operation '>' exists that takes a left-hand operand of type ' temp bool' and a right operand of type ' temp float' (or there is no acceptable conversion)
ggml/src/ggml-vulkan/vulkan-shaders/mul_mm_cm2.comp:197: error: '' :  syntax error, unexpected EQUAL, expecting COMMA or SEMICOLON
1 warning and 4 errors generated.


(the same warning and four errors repeat for each of the remaining cm2 shader variants: matmul_f32_f16_cm2, matmul_f16_aligned_cm2, matmul_f32_f32_aligned_cm2, matmul_f32_f32_cm2, matmul_f16_f32_cm2, matmul_f16_f32_aligned_cm2, matmul_q4_1_f32_cm2, matmul_q4_0_f32_aligned_cm2, matmul_q4_0_f16_cm2, matmul_q4_0_f32_cm2, matmul_q4_0_f16_aligned_cm2, matmul_q4_1_f16_aligned_cm2, matmul_q4_1_f32_aligned_cm2, matmul_q4_1_f16_cm2)

@jeffbolznv jeffbolznv self-assigned this Dec 7, 2024
@daniandtheweb
Contributor

I'm not entirely sure, but I think the two bugs are unrelated. The Arch issue was introduced by today's Vulkan update to 1.4.303 on Arch, while this GitHub issue is instead about a proprietary driver issue on Windows.

@jeffbolznv
Collaborator

I'll take a look at this.

The build log in the second comment looks unrelated; it would happen if you're using newer Vulkan-Headers with an older glslc. @mtasic85, are you overriding the Vulkan-Headers?

@daniandtheweb
Contributor

daniandtheweb commented Dec 7, 2024

I'll take a look at this.

The build log in the second comment looks unrelated; it would happen if you're using newer Vulkan-Headers with an older glslc. @mtasic85, are you overriding the Vulkan-Headers?

The Arch issue may simply be that glslc hasn't been updated yet in the repositories (they'll probably update it later today).

EDIT: Apparently the latest shaderc version still isn't released (there's still no tag on GitHub), so I don't think Arch will update the package until they release it.

@jeffbolznv
Collaborator

@stduhpf I haven't been able to reproduce this locally on RTX 4070. Does test-backend-ops pass for you?

@jeffbolznv
Collaborator

@mtasic85 or @daniandtheweb, can you check if #10713 resolves the build issue for you?

@daniandtheweb
Contributor

#10713 fixes the build issue.

@0cc4m
Collaborator

0cc4m commented Dec 8, 2024

@stduhpf I haven't been able to reproduce this locally on RTX 4070. Does test-backend-ops pass for you?

It doesn't. I have seen the issue with the AMD proprietary driver as well, but haven't yet had the time to investigate.

It's another case in a too-long list of issues that are caused not by the hardware (since Mesa RADV works fine) but by the closed-source driver.

@stduhpf
Contributor Author

stduhpf commented Dec 8, 2024

I can confirm: test-backend-ops fails for both q2_k and q3_k:

> .\build\bin\Release\test-backend-ops.exe  | Select-String -Pattern "q[2,3]_k"
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 5700 XT (AMD proprietary driver) | uma: 0 | fp16: 1 | warp size: 64 | matrix cores: 0
ggml_vulkan: Compiling shaders..........................Done!

  GET_ROWS(type=q2_K,n=256,m=5,r=4,b=1,v=0): not supported [Vulkan0]
  GET_ROWS(type=q2_K,n=256,m=5,r=4,b=1,v=1): not supported [Vulkan0]
  GET_ROWS(type=q2_K,n=256,m=5,r=4,b=7,v=0): not supported [Vulkan0]
  GET_ROWS(type=q2_K,n=256,m=5,r=4,b=7,v=1): not supported [Vulkan0]
  GET_ROWS(type=q3_K,n=256,m=5,r=4,b=1,v=0): not supported [Vulkan0]
  GET_ROWS(type=q3_K,n=256,m=5,r=4,b=1,v=1): not supported [Vulkan0]
  GET_ROWS(type=q3_K,n=256,m=5,r=4,b=7,v=0): not supported [Vulkan0]
  GET_ROWS(type=q3_K,n=256,m=5,r=4,b=7,v=1): not supported [Vulkan0]
  CPY(type_src=f16,type_dst=q2_K,ne=[256,4,4,4],permute=[0,0,0,0]): not supported [Vulkan0]
  CPY(type_src=f16,type_dst=q2_K,ne=[256,2,3,4],permute=[0,2,1,3]): not supported [Vulkan0]
  CPY(type_src=f16,type_dst=q3_K,ne=[256,4,4,4],permute=[0,0,0,0]): not supported [Vulkan0]
  CPY(type_src=f16,type_dst=q3_K,ne=[256,2,3,4],permute=[0,2,1,3]): not supported [Vulkan0]
  CPY(type_src=f32,type_dst=q2_K,ne=[256,4,4,4],permute=[0,0,0,0]): not supported [Vulkan0]
  CPY(type_src=f32,type_dst=q2_K,ne=[256,2,3,4],permute=[0,2,1,3]): not supported [Vulkan0]
  CPY(type_src=f32,type_dst=q3_K,ne=[256,4,4,4],permute=[0,0,0,0]): not supported [Vulkan0]
  CPY(type_src=f32,type_dst=q3_K,ne=[256,2,3,4],permute=[0,2,1,3]): not supported [Vulkan0]
  MUL_MAT(type_a=q2_K,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NMSE = 1.014771050 > 0.000500000 FAIL
  MUL_MAT(type_a=q3_K,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NMSE = 0.684832414 > 0.000500000 FAIL
  MUL_MAT_ID(type_a=q2_K,type_b=f32,n_mats=4,n_used=2,b=0,m=512,n=1,k=256): [MUL_MAT_ID] NMSE = 0.961477898 > 0.000500000 FAIL
  MUL_MAT_ID(type_a=q2_K,type_b=f32,n_mats=4,n_used=2,b=0,m=512,n=32,k=256): OK
  MUL_MAT_ID(type_a=q3_K,type_b=f32,n_mats=4,n_used=2,b=0,m=512,n=1,k=256): [MUL_MAT_ID] NMSE = 1.055673744 > 0.000500000 FAIL
  MUL_MAT_ID(type_a=q3_K,type_b=f32,n_mats=4,n_used=2,b=0,m=512,n=32,k=256): OK
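The log above flags a failure whenever the normalized mean squared error between the backend result and the CPU reference exceeds the 0.0005 threshold. A minimal sketch of that metric as I understand it (the function is illustrative, not the actual test-harness code):

```python
def nmse(ref, out):
    """Normalized mean squared error: sum((out - ref)^2) / sum(ref^2)."""
    num = sum((o - r) ** 2 for r, o in zip(ref, out))
    den = sum(r * r for r in ref)
    return num / den

ref = [1.0, -2.0, 3.0]
assert nmse(ref, ref) == 0.0              # exact match passes
assert nmse(ref, [0.0, 0.0, 0.0]) == 1.0  # all-zero output gives NMSE of 1
```

An NMSE near 1.0, as in the failing MUL_MAT cases above, means the output carries essentially no signal: it is noise, not a small precision difference.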

@mtasic85
Contributor

mtasic85 commented Dec 9, 2024

@jeffbolznv The project now builds and works with the latest #10731 on updated Arch Linux, with the open-source Radeon Vulkan implementation.

EDIT: I just found out that Q4_K_M models (SmolLM2, Qwen 2.5, RWKV) produce gibberish.

@0cc4m
Collaborator

0cc4m commented Dec 9, 2024

@mtasic85 But test-backend-ops -o MUL_MAT doesn't show any problems?

@mtasic85
Contributor

mtasic85 commented Dec 9, 2024

NOTE: there are two logs, one for the older 6fe6247 and one for the latest 3d98b4c.

I had to check out a specific revision from git to make it work: 6fe6247. With it, everything works fine. Here are the test results for 6fe6247:

build/bin/test-backend-ops -o MUL_MAT
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 5700 XT (AMD open-source driver) | uma: 0 | fp16: 1 | warp size: 64
Testing 2 devices

Backend 1/2: Vulkan0
ggml_vulkan: Compiling shaders..............................Done!
  Device description: AMD Radeon RX 5700 XT
  Device memory: 7920 MB (7920 MB free)

  MUL_MAT(type_a=f16,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_1,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q5_0,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q5_1,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_K,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q5_K,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q6_K,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=iq4_nl,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=2,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=2,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_1,type_b=f32,m=16,n=2,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q5_0,type_b=f32,m=16,n=2,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q5_1,type_b=f32,m=16,n=2,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=2,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_K,type_b=f32,m=16,n=2,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q5_K,type_b=f32,m=16,n=2,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q6_K,type_b=f32,m=16,n=2,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=iq4_nl,type_b=f32,m=16,n=2,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=3,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=3,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_1,type_b=f32,m=16,n=3,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q5_0,type_b=f32,m=16,n=3,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q5_1,type_b=f32,m=16,n=3,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=3,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_K,type_b=f32,m=16,n=3,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q5_K,type_b=f32,m=16,n=3,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q6_K,type_b=f32,m=16,n=3,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=iq4_nl,type_b=f32,m=16,n=3,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=4,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=4,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_1,type_b=f32,m=16,n=4,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q5_0,type_b=f32,m=16,n=4,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q5_1,type_b=f32,m=16,n=4,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=4,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_K,type_b=f32,m=16,n=4,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q5_K,type_b=f32,m=16,n=4,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q6_K,type_b=f32,m=16,n=4,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=iq4_nl,type_b=f32,m=16,n=4,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=5,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=5,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_1,type_b=f32,m=16,n=5,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q5_0,type_b=f32,m=16,n=5,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q5_1,type_b=f32,m=16,n=5,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=5,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_K,type_b=f32,m=16,n=5,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q5_K,type_b=f32,m=16,n=5,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q6_K,type_b=f32,m=16,n=5,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=iq4_nl,type_b=f32,m=16,n=5,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=6,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=6,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_1,type_b=f32,m=16,n=6,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q5_0,type_b=f32,m=16,n=6,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q5_1,type_b=f32,m=16,n=6,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=6,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_K,type_b=f32,m=16,n=6,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q5_K,type_b=f32,m=16,n=6,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q6_K,type_b=f32,m=16,n=6,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=iq4_nl,type_b=f32,m=16,n=6,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=7,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=7,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_1,type_b=f32,m=16,n=7,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q5_0,type_b=f32,m=16,n=7,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q5_1,type_b=f32,m=16,n=7,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=7,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_K,type_b=f32,m=16,n=7,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q5_K,type_b=f32,m=16,n=7,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q6_K,type_b=f32,m=16,n=7,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=iq4_nl,type_b=f32,m=16,n=7,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=8,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=8,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_1,type_b=f32,m=16,n=8,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q5_0,type_b=f32,m=16,n=8,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q5_1,type_b=f32,m=16,n=8,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=8,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_K,type_b=f32,m=16,n=8,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q5_K,type_b=f32,m=16,n=8,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q6_K,type_b=f32,m=16,n=8,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=iq4_nl,type_b=f32,m=16,n=8,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=f32,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=f32,type_b=f32,m=16,n=1,k=256,bs=[10,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=f32,type_b=f32,m=16,n=1,k=256,bs=[10,1],nr=[2,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=f32,type_b=f32,m=16,n=1,k=256,bs=[10,10],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=f32,type_b=f32,m=16,n=1,k=256,bs=[10,10],nr=[2,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=f32,type_b=f32,m=16,n=1,k=256,bs=[10,10],nr=[1,2],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=f32,type_b=f32,m=16,n=1,k=256,bs=[10,10],nr=[2,2],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=f32,type_b=f32,m=16,n=16,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=f32,type_b=f32,m=16,n=16,k=256,bs=[10,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=f32,type_b=f32,m=16,n=16,k=256,bs=[10,1],nr=[2,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=f32,type_b=f32,m=16,n=16,k=256,bs=[10,10],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=f32,type_b=f32,m=16,n=16,k=256,bs=[10,10],nr=[2,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=f32,type_b=f32,m=16,n=16,k=256,bs=[10,10],nr=[1,2],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=f32,type_b=f32,m=16,n=16,k=256,bs=[10,10],nr=[2,2],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=f32,type_b=f32,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): OK
  MUL_MAT(type_a=f32,type_b=f32,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): OK
  MUL_MAT(type_a=f32,type_b=f32,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): OK
  MUL_MAT(type_a=f32,type_b=f32,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): OK
  MUL_MAT(type_a=f32,type_b=f32,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): OK
  MUL_MAT(type_a=f32,type_b=f32,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): OK
  MUL_MAT(type_a=f32,type_b=f32,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): OK
  MUL_MAT(type_a=f32,type_b=f32,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): OK
  MUL_MAT(type_a=f32,type_b=f32,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): OK
  MUL_MAT(type_a=f32,type_b=f16,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=f32,type_b=f16,m=16,n=1,k=256,bs=[10,1],nr=[1,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=f32,type_b=f16,m=16,n=1,k=256,bs=[10,1],nr=[2,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=f32,type_b=f16,m=16,n=1,k=256,bs=[10,10],nr=[1,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=f32,type_b=f16,m=16,n=1,k=256,bs=[10,10],nr=[2,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=f32,type_b=f16,m=16,n=1,k=256,bs=[10,10],nr=[1,2],per=[0,1,2,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=f32,type_b=f16,m=16,n=1,k=256,bs=[10,10],nr=[2,2],per=[0,1,2,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=f32,type_b=f16,m=16,n=16,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=f32,type_b=f16,m=16,n=16,k=256,bs=[10,1],nr=[1,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=f32,type_b=f16,m=16,n=16,k=256,bs=[10,1],nr=[2,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=f32,type_b=f16,m=16,n=16,k=256,bs=[10,10],nr=[1,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=f32,type_b=f16,m=16,n=16,k=256,bs=[10,10],nr=[2,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=f32,type_b=f16,m=16,n=16,k=256,bs=[10,10],nr=[1,2],per=[0,1,2,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=f32,type_b=f16,m=16,n=16,k=256,bs=[10,10],nr=[2,2],per=[0,1,2,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=f32,type_b=f16,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): not supported [CPU] 
  MUL_MAT(type_a=f32,type_b=f16,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): not supported [CPU] 
  MUL_MAT(type_a=f32,type_b=f16,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): not supported [CPU] 
  MUL_MAT(type_a=f32,type_b=f16,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): not supported [CPU] 
  MUL_MAT(type_a=f32,type_b=f16,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): not supported [CPU] 
  MUL_MAT(type_a=f32,type_b=f16,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): not supported [CPU] 
  MUL_MAT(type_a=f32,type_b=f16,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): not supported [CPU] 
  MUL_MAT(type_a=f32,type_b=f16,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): not supported [CPU] 
  MUL_MAT(type_a=f32,type_b=f16,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): not supported [CPU] 
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=1,k=256,bs=[10,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=1,k=256,bs=[10,1],nr=[2,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=1,k=256,bs=[10,10],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=1,k=256,bs=[10,10],nr=[2,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=1,k=256,bs=[10,10],nr=[1,2],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=1,k=256,bs=[10,10],nr=[2,2],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=16,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=16,k=256,bs=[10,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=16,k=256,bs=[10,1],nr=[2,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=16,k=256,bs=[10,10],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=16,k=256,bs=[10,10],nr=[2,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=16,k=256,bs=[10,10],nr=[1,2],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=16,k=256,bs=[10,10],nr=[2,2],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): OK
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): OK
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): OK
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): OK
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): OK
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): OK
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): OK
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): OK
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): OK
  MUL_MAT(type_a=f16,type_b=f16,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=f16,type_b=f16,m=16,n=1,k=256,bs=[10,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=f16,type_b=f16,m=16,n=1,k=256,bs=[10,1],nr=[2,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=f16,type_b=f16,m=16,n=1,k=256,bs=[10,10],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=f16,type_b=f16,m=16,n=1,k=256,bs=[10,10],nr=[2,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=f16,type_b=f16,m=16,n=1,k=256,bs=[10,10],nr=[1,2],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=f16,type_b=f16,m=16,n=1,k=256,bs=[10,10],nr=[2,2],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=f16,type_b=f16,m=16,n=16,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=f16,type_b=f16,m=16,n=16,k=256,bs=[10,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=f16,type_b=f16,m=16,n=16,k=256,bs=[10,1],nr=[2,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=f16,type_b=f16,m=16,n=16,k=256,bs=[10,10],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=f16,type_b=f16,m=16,n=16,k=256,bs=[10,10],nr=[2,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=f16,type_b=f16,m=16,n=16,k=256,bs=[10,10],nr=[1,2],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=f16,type_b=f16,m=16,n=16,k=256,bs=[10,10],nr=[2,2],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=f16,type_b=f16,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): OK
  MUL_MAT(type_a=f16,type_b=f16,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): OK
  MUL_MAT(type_a=f16,type_b=f16,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): OK
  MUL_MAT(type_a=f16,type_b=f16,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): OK
  MUL_MAT(type_a=f16,type_b=f16,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): OK
  MUL_MAT(type_a=f16,type_b=f16,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): OK
  MUL_MAT(type_a=f16,type_b=f16,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): OK
  MUL_MAT(type_a=f16,type_b=f16,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): OK
  MUL_MAT(type_a=f16,type_b=f16,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): OK
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=1,k=256,bs=[10,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=1,k=256,bs=[10,1],nr=[2,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=1,k=256,bs=[10,10],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=1,k=256,bs=[10,10],nr=[2,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=1,k=256,bs=[10,10],nr=[1,2],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=1,k=256,bs=[10,10],nr=[2,2],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=16,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=16,k=256,bs=[10,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=16,k=256,bs=[10,1],nr=[2,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=16,k=256,bs=[10,10],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=16,k=256,bs=[10,10],nr=[2,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=16,k=256,bs=[10,10],nr=[1,2],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=16,k=256,bs=[10,10],nr=[2,2],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): not supported [Vulkan0] 
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): not supported [Vulkan0] 
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): not supported [Vulkan0] 
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): not supported [Vulkan0] 
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): not supported [Vulkan0] 
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): not supported [Vulkan0] 
  MUL_MAT(type_a=q8_0,type_b=f16,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q8_0,type_b=f16,m=16,n=1,k=256,bs=[10,1],nr=[1,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q8_0,type_b=f16,m=16,n=1,k=256,bs=[10,1],nr=[2,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q8_0,type_b=f16,m=16,n=1,k=256,bs=[10,10],nr=[1,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q8_0,type_b=f16,m=16,n=1,k=256,bs=[10,10],nr=[2,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q8_0,type_b=f16,m=16,n=1,k=256,bs=[10,10],nr=[1,2],per=[0,1,2,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q8_0,type_b=f16,m=16,n=1,k=256,bs=[10,10],nr=[2,2],per=[0,1,2,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q8_0,type_b=f16,m=16,n=16,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q8_0,type_b=f16,m=16,n=16,k=256,bs=[10,1],nr=[1,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q8_0,type_b=f16,m=16,n=16,k=256,bs=[10,1],nr=[2,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q8_0,type_b=f16,m=16,n=16,k=256,bs=[10,10],nr=[1,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q8_0,type_b=f16,m=16,n=16,k=256,bs=[10,10],nr=[2,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q8_0,type_b=f16,m=16,n=16,k=256,bs=[10,10],nr=[1,2],per=[0,1,2,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q8_0,type_b=f16,m=16,n=16,k=256,bs=[10,10],nr=[2,2],per=[0,1,2,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q8_0,type_b=f16,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q8_0,type_b=f16,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q8_0,type_b=f16,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q8_0,type_b=f16,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q8_0,type_b=f16,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q8_0,type_b=f16,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q8_0,type_b=f16,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q8_0,type_b=f16,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q8_0,type_b=f16,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=1,k=256,bs=[10,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=1,k=256,bs=[10,1],nr=[2,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=1,k=256,bs=[10,10],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=1,k=256,bs=[10,10],nr=[2,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=1,k=256,bs=[10,10],nr=[1,2],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=1,k=256,bs=[10,10],nr=[2,2],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=16,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=16,k=256,bs=[10,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=16,k=256,bs=[10,1],nr=[2,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=16,k=256,bs=[10,10],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=16,k=256,bs=[10,10],nr=[2,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=16,k=256,bs=[10,10],nr=[1,2],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=16,k=256,bs=[10,10],nr=[2,2],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): not supported [Vulkan0] 
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): not supported [Vulkan0] 
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): not supported [Vulkan0] 
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): not supported [Vulkan0] 
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): not supported [Vulkan0] 
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): not supported [Vulkan0] 
  MUL_MAT(type_a=q4_0,type_b=f16,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q4_0,type_b=f16,m=16,n=1,k=256,bs=[10,1],nr=[1,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q4_0,type_b=f16,m=16,n=1,k=256,bs=[10,1],nr=[2,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q4_0,type_b=f16,m=16,n=1,k=256,bs=[10,10],nr=[1,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q4_0,type_b=f16,m=16,n=1,k=256,bs=[10,10],nr=[2,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q4_0,type_b=f16,m=16,n=1,k=256,bs=[10,10],nr=[1,2],per=[0,1,2,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_0,type_b=f16,m=16,n=1,k=256,bs=[10,10],nr=[2,2],per=[0,1,2,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_0,type_b=f16,m=16,n=16,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q4_0,type_b=f16,m=16,n=16,k=256,bs=[10,1],nr=[1,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q4_0,type_b=f16,m=16,n=16,k=256,bs=[10,1],nr=[2,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q4_0,type_b=f16,m=16,n=16,k=256,bs=[10,10],nr=[1,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q4_0,type_b=f16,m=16,n=16,k=256,bs=[10,10],nr=[2,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q4_0,type_b=f16,m=16,n=16,k=256,bs=[10,10],nr=[1,2],per=[0,1,2,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_0,type_b=f16,m=16,n=16,k=256,bs=[10,10],nr=[2,2],per=[0,1,2,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_0,type_b=f16,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_0,type_b=f16,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_0,type_b=f16,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_0,type_b=f16,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_0,type_b=f16,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_0,type_b=f16,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_0,type_b=f16,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_0,type_b=f16,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_0,type_b=f16,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_1,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_1,type_b=f32,m=16,n=1,k=256,bs=[10,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_1,type_b=f32,m=16,n=1,k=256,bs=[10,1],nr=[2,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_1,type_b=f32,m=16,n=1,k=256,bs=[10,10],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_1,type_b=f32,m=16,n=1,k=256,bs=[10,10],nr=[2,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_1,type_b=f32,m=16,n=1,k=256,bs=[10,10],nr=[1,2],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=q4_1,type_b=f32,m=16,n=1,k=256,bs=[10,10],nr=[2,2],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=q4_1,type_b=f32,m=16,n=16,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_1,type_b=f32,m=16,n=16,k=256,bs=[10,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_1,type_b=f32,m=16,n=16,k=256,bs=[10,1],nr=[2,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_1,type_b=f32,m=16,n=16,k=256,bs=[10,10],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_1,type_b=f32,m=16,n=16,k=256,bs=[10,10],nr=[2,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_1,type_b=f32,m=16,n=16,k=256,bs=[10,10],nr=[1,2],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=q4_1,type_b=f32,m=16,n=16,k=256,bs=[10,10],nr=[2,2],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=q4_1,type_b=f32,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=q4_1,type_b=f32,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): not supported [Vulkan0] 
  MUL_MAT(type_a=q4_1,type_b=f32,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): not supported [Vulkan0] 
  MUL_MAT(type_a=q4_1,type_b=f32,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=q4_1,type_b=f32,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): not supported [Vulkan0] 
  MUL_MAT(type_a=q4_1,type_b=f32,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): not supported [Vulkan0] 
  MUL_MAT(type_a=q4_1,type_b=f32,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=q4_1,type_b=f32,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): not supported [Vulkan0] 
  MUL_MAT(type_a=q4_1,type_b=f32,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): not supported [Vulkan0] 
  MUL_MAT(type_a=q4_1,type_b=f16,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q4_1,type_b=f16,m=16,n=1,k=256,bs=[10,1],nr=[1,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q4_1,type_b=f16,m=16,n=1,k=256,bs=[10,1],nr=[2,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q4_1,type_b=f16,m=16,n=1,k=256,bs=[10,10],nr=[1,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q4_1,type_b=f16,m=16,n=1,k=256,bs=[10,10],nr=[2,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q4_1,type_b=f16,m=16,n=1,k=256,bs=[10,10],nr=[1,2],per=[0,1,2,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_1,type_b=f16,m=16,n=1,k=256,bs=[10,10],nr=[2,2],per=[0,1,2,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_1,type_b=f16,m=16,n=16,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q4_1,type_b=f16,m=16,n=16,k=256,bs=[10,1],nr=[1,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q4_1,type_b=f16,m=16,n=16,k=256,bs=[10,1],nr=[2,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q4_1,type_b=f16,m=16,n=16,k=256,bs=[10,10],nr=[1,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q4_1,type_b=f16,m=16,n=16,k=256,bs=[10,10],nr=[2,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q4_1,type_b=f16,m=16,n=16,k=256,bs=[10,10],nr=[1,2],per=[0,1,2,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_1,type_b=f16,m=16,n=16,k=256,bs=[10,10],nr=[2,2],per=[0,1,2,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_1,type_b=f16,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_1,type_b=f16,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_1,type_b=f16,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_1,type_b=f16,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_1,type_b=f16,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_1,type_b=f16,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_1,type_b=f16,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_1,type_b=f16,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_1,type_b=f16,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_K,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_K,type_b=f32,m=16,n=1,k=256,bs=[10,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_K,type_b=f32,m=16,n=1,k=256,bs=[10,1],nr=[2,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_K,type_b=f32,m=16,n=1,k=256,bs=[10,10],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_K,type_b=f32,m=16,n=1,k=256,bs=[10,10],nr=[2,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_K,type_b=f32,m=16,n=1,k=256,bs=[10,10],nr=[1,2],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=q4_K,type_b=f32,m=16,n=1,k=256,bs=[10,10],nr=[2,2],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=q4_K,type_b=f32,m=16,n=16,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_K,type_b=f32,m=16,n=16,k=256,bs=[10,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_K,type_b=f32,m=16,n=16,k=256,bs=[10,1],nr=[2,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_K,type_b=f32,m=16,n=16,k=256,bs=[10,10],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_K,type_b=f32,m=16,n=16,k=256,bs=[10,10],nr=[2,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_K,type_b=f32,m=16,n=16,k=256,bs=[10,10],nr=[1,2],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=q4_K,type_b=f32,m=16,n=16,k=256,bs=[10,10],nr=[2,2],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=q4_K,type_b=f32,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=q4_K,type_b=f32,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): not supported [Vulkan0] 
  MUL_MAT(type_a=q4_K,type_b=f32,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): not supported [Vulkan0] 
  MUL_MAT(type_a=q4_K,type_b=f32,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=q4_K,type_b=f32,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): not supported [Vulkan0] 
  MUL_MAT(type_a=q4_K,type_b=f32,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): not supported [Vulkan0] 
  MUL_MAT(type_a=q4_K,type_b=f32,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=q4_K,type_b=f32,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): not supported [Vulkan0] 
  MUL_MAT(type_a=q4_K,type_b=f32,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): not supported [Vulkan0] 
  MUL_MAT(type_a=q4_K,type_b=f16,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q4_K,type_b=f16,m=16,n=1,k=256,bs=[10,1],nr=[1,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q4_K,type_b=f16,m=16,n=1,k=256,bs=[10,1],nr=[2,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q4_K,type_b=f16,m=16,n=1,k=256,bs=[10,10],nr=[1,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q4_K,type_b=f16,m=16,n=1,k=256,bs=[10,10],nr=[2,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q4_K,type_b=f16,m=16,n=1,k=256,bs=[10,10],nr=[1,2],per=[0,1,2,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_K,type_b=f16,m=16,n=1,k=256,bs=[10,10],nr=[2,2],per=[0,1,2,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_K,type_b=f16,m=16,n=16,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q4_K,type_b=f16,m=16,n=16,k=256,bs=[10,1],nr=[1,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q4_K,type_b=f16,m=16,n=16,k=256,bs=[10,1],nr=[2,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q4_K,type_b=f16,m=16,n=16,k=256,bs=[10,10],nr=[1,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q4_K,type_b=f16,m=16,n=16,k=256,bs=[10,10],nr=[2,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q4_K,type_b=f16,m=16,n=16,k=256,bs=[10,10],nr=[1,2],per=[0,1,2,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_K,type_b=f16,m=16,n=16,k=256,bs=[10,10],nr=[2,2],per=[0,1,2,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_K,type_b=f16,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_K,type_b=f16,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_K,type_b=f16,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_K,type_b=f16,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_K,type_b=f16,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_K,type_b=f16,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_K,type_b=f16,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_K,type_b=f16,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_K,type_b=f16,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=iq2_xxs,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=iq2_xxs,type_b=f32,m=16,n=1,k=256,bs=[10,1],nr=[1,1],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=iq2_xxs,type_b=f32,m=16,n=1,k=256,bs=[10,1],nr=[2,1],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=iq2_xxs,type_b=f32,m=16,n=1,k=256,bs=[10,10],nr=[1,1],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=iq2_xxs,type_b=f32,m=16,n=1,k=256,bs=[10,10],nr=[2,1],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=iq2_xxs,type_b=f32,m=16,n=1,k=256,bs=[10,10],nr=[1,2],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=iq2_xxs,type_b=f32,m=16,n=1,k=256,bs=[10,10],nr=[2,2],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=iq2_xxs,type_b=f32,m=16,n=16,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=iq2_xxs,type_b=f32,m=16,n=16,k=256,bs=[10,1],nr=[1,1],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=iq2_xxs,type_b=f32,m=16,n=16,k=256,bs=[10,1],nr=[2,1],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=iq2_xxs,type_b=f32,m=16,n=16,k=256,bs=[10,10],nr=[1,1],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=iq2_xxs,type_b=f32,m=16,n=16,k=256,bs=[10,10],nr=[2,1],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=iq2_xxs,type_b=f32,m=16,n=16,k=256,bs=[10,10],nr=[1,2],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=iq2_xxs,type_b=f32,m=16,n=16,k=256,bs=[10,10],nr=[2,2],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=iq2_xxs,type_b=f32,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=iq2_xxs,type_b=f32,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): not supported [Vulkan0] 
  MUL_MAT(type_a=iq2_xxs,type_b=f32,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): not supported [Vulkan0] 
  MUL_MAT(type_a=iq2_xxs,type_b=f32,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=iq2_xxs,type_b=f32,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): not supported [Vulkan0] 
  MUL_MAT(type_a=iq2_xxs,type_b=f32,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): not supported [Vulkan0] 
  MUL_MAT(type_a=iq2_xxs,type_b=f32,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=iq2_xxs,type_b=f32,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): not supported [Vulkan0] 
  MUL_MAT(type_a=iq2_xxs,type_b=f32,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): not supported [Vulkan0] 
  MUL_MAT(type_a=iq2_xxs,type_b=f16,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=iq2_xxs,type_b=f16,m=16,n=1,k=256,bs=[10,1],nr=[1,1],per=[0,1,2,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=iq2_xxs,type_b=f16,m=16,n=1,k=256,bs=[10,1],nr=[2,1],per=[0,1,2,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=iq2_xxs,type_b=f16,m=16,n=1,k=256,bs=[10,10],nr=[1,1],per=[0,1,2,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=iq2_xxs,type_b=f16,m=16,n=1,k=256,bs=[10,10],nr=[2,1],per=[0,1,2,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=iq2_xxs,type_b=f16,m=16,n=1,k=256,bs=[10,10],nr=[1,2],per=[0,1,2,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=iq2_xxs,type_b=f16,m=16,n=1,k=256,bs=[10,10],nr=[2,2],per=[0,1,2,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=iq2_xxs,type_b=f16,m=16,n=16,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=iq2_xxs,type_b=f16,m=16,n=16,k=256,bs=[10,1],nr=[1,1],per=[0,1,2,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=iq2_xxs,type_b=f16,m=16,n=16,k=256,bs=[10,1],nr=[2,1],per=[0,1,2,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=iq2_xxs,type_b=f16,m=16,n=16,k=256,bs=[10,10],nr=[1,1],per=[0,1,2,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=iq2_xxs,type_b=f16,m=16,n=16,k=256,bs=[10,10],nr=[2,1],per=[0,1,2,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=iq2_xxs,type_b=f16,m=16,n=16,k=256,bs=[10,10],nr=[1,2],per=[0,1,2,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=iq2_xxs,type_b=f16,m=16,n=16,k=256,bs=[10,10],nr=[2,2],per=[0,1,2,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=iq2_xxs,type_b=f16,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=iq2_xxs,type_b=f16,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=iq2_xxs,type_b=f16,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=iq2_xxs,type_b=f16,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=iq2_xxs,type_b=f16,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=iq2_xxs,type_b=f16,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=iq2_xxs,type_b=f16,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=iq2_xxs,type_b=f16,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=iq2_xxs,type_b=f16,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_1,type_b=f32,m=16,n=1,k=32,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_1,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q5_0,type_b=f32,m=16,n=1,k=32,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q5_0,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q5_1,type_b=f32,m=16,n=1,k=32,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q5_1,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=1,k=32,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q2_K,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q3_K,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q5_K,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q6_K,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=iq2_xs,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=iq2_s,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=iq3_xxs,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=iq1_s,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=iq1_m,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=iq4_nl,type_b=f32,m=16,n=1,k=32,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=iq4_nl,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=iq3_s,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=iq4_xs,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=bf16,type_b=f32,m=16,n=1,k=1,bs=[1,1],nr=[1,1],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=bf16,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=f16,type_b=f32,m=64,n=2,k=128,bs=[8,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=f16,type_b=f32,m=83,n=2,k=128,bs=[8,1],nr=[4,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=f16,type_b=f32,m=64,n=2,k=64,bs=[8,1],nr=[4,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=f16,type_b=f32,m=83,n=2,k=64,bs=[8,1],nr=[4,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=f16,type_b=f32,m=64,n=45,k=128,bs=[8,1],nr=[4,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=f16,type_b=f32,m=128,n=45,k=64,bs=[8,1],nr=[4,1],per=[0,1,2,3]): OK
  2204/2204 tests passed
  Backend Vulkan0: OK

Backend 2/2: CPU
  Skipping CPU backend
2/2 backends passed
OK

However, the latest revision 3d98b4c fails many of these tests:

./build/bin/test-backend-ops -o MUL_MAT
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 5700 XT (AMD open-source driver) | uma: 0 | fp16: 1 | warp size: 64 | matrix cores: KHR_coopmat
Testing 2 devices

Backend 1/2: Vulkan0
ggml_vulkan: Compiling shaders..........................Done!
  Device description: AMD Radeon RX 5700 XT
  Device memory: 7920 MB (7920 MB free)

  MUL_MAT(type_a=f16,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_1,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q5_0,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q5_1,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_K,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q5_K,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q6_K,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=iq4_nl,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=2,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NMSE = 1.020802876 > 0.000500000 FAIL
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=2,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] inf mismatch: Vulkan0=inf CPU=-3.842903 FAIL
  MUL_MAT(type_a=q4_1,type_b=f32,m=16,n=2,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NaN at index 4 (Vulkan0=-nan CPU=-4.422190) FAIL
  MUL_MAT(type_a=q5_0,type_b=f32,m=16,n=2,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] inf mismatch: Vulkan0=inf CPU=7.968046 FAIL
  MUL_MAT(type_a=q5_1,type_b=f32,m=16,n=2,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] inf mismatch: Vulkan0=inf CPU=-3.029437 FAIL
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=2,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NaN at index 4 (Vulkan0=nan CPU=0.400921) FAIL
  MUL_MAT(type_a=q4_K,type_b=f32,m=16,n=2,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NaN at index 3 (Vulkan0=-nan CPU=2.275427) FAIL
  MUL_MAT(type_a=q5_K,type_b=f32,m=16,n=2,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NMSE = 0.998383199 > 0.000500000 FAIL
  MUL_MAT(type_a=q6_K,type_b=f32,m=16,n=2,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] inf mismatch: Vulkan0=inf CPU=-12.533918 FAIL
  MUL_MAT(type_a=iq4_nl,type_b=f32,m=16,n=2,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NMSE = 1.067724570 > 0.000500000 FAIL
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=3,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] inf mismatch: Vulkan0=inf CPU=5.106953 FAIL
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=3,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NMSE = 0.995368780 > 0.000500000 FAIL
  MUL_MAT(type_a=q4_1,type_b=f32,m=16,n=3,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NaN at index 0 (Vulkan0=-nan CPU=1.311687) FAIL
  MUL_MAT(type_a=q5_0,type_b=f32,m=16,n=3,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NaN at index 4 (Vulkan0=nan CPU=2.708214) FAIL
  MUL_MAT(type_a=q5_1,type_b=f32,m=16,n=3,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] inf mismatch: Vulkan0=inf CPU=-1.538850 FAIL
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=3,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] inf mismatch: Vulkan0=inf CPU=13.429976 FAIL
  MUL_MAT(type_a=q4_K,type_b=f32,m=16,n=3,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] inf mismatch: Vulkan0=-inf CPU=2.555734 FAIL
  MUL_MAT(type_a=q5_K,type_b=f32,m=16,n=3,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] inf mismatch: Vulkan0=-inf CPU=4.645825 FAIL
  MUL_MAT(type_a=q6_K,type_b=f32,m=16,n=3,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NaN at index 4 (Vulkan0=-nan CPU=-1.756506) FAIL
  MUL_MAT(type_a=iq4_nl,type_b=f32,m=16,n=3,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NaN at index 0 (Vulkan0=-nan CPU=-4.514432) FAIL
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=4,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NMSE = 0.993575372 > 0.000500000 FAIL
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=4,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NaN at index 6 (Vulkan0=nan CPU=-0.382601) FAIL
  MUL_MAT(type_a=q4_1,type_b=f32,m=16,n=4,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] inf mismatch: Vulkan0=inf CPU=-1.458935 FAIL
  MUL_MAT(type_a=q5_0,type_b=f32,m=16,n=4,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] inf mismatch: Vulkan0=-inf CPU=-5.463484 FAIL
  MUL_MAT(type_a=q5_1,type_b=f32,m=16,n=4,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NaN at index 0 (Vulkan0=-nan CPU=0.219825) FAIL
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=4,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] inf mismatch: Vulkan0=-inf CPU=1.054833 FAIL
  MUL_MAT(type_a=q4_K,type_b=f32,m=16,n=4,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NaN at index 0 (Vulkan0=-nan CPU=-5.968238) FAIL
  MUL_MAT(type_a=q5_K,type_b=f32,m=16,n=4,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NaN at index 7 (Vulkan0=-nan CPU=-1.163620) FAIL
  MUL_MAT(type_a=q6_K,type_b=f32,m=16,n=4,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NaN at index 0 (Vulkan0=-nan CPU=3.517251) FAIL
  MUL_MAT(type_a=iq4_nl,type_b=f32,m=16,n=4,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NMSE = 1.011277616 > 0.000500000 FAIL
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=5,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] inf mismatch: Vulkan0=inf CPU=2.290376 FAIL
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=5,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NaN at index 0 (Vulkan0=-nan CPU=5.422802) FAIL
  MUL_MAT(type_a=q4_1,type_b=f32,m=16,n=5,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NMSE = 1.019715035 > 0.000500000 FAIL
  MUL_MAT(type_a=q5_0,type_b=f32,m=16,n=5,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] inf mismatch: Vulkan0=-inf CPU=4.810112 FAIL
  MUL_MAT(type_a=q5_1,type_b=f32,m=16,n=5,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NaN at index 13 (Vulkan0=-nan CPU=-4.500257) FAIL
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=5,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NaN at index 0 (Vulkan0=-nan CPU=-3.085917) FAIL
  MUL_MAT(type_a=q4_K,type_b=f32,m=16,n=5,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NMSE = 0.994120636 > 0.000500000 FAIL
  MUL_MAT(type_a=q5_K,type_b=f32,m=16,n=5,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NaN at index 0 (Vulkan0=-nan CPU=-1.177465) FAIL
  MUL_MAT(type_a=q6_K,type_b=f32,m=16,n=5,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NMSE = 1.068307764 > 0.000500000 FAIL
  MUL_MAT(type_a=iq4_nl,type_b=f32,m=16,n=5,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NaN at index 0 (Vulkan0=-nan CPU=0.300503) FAIL
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=6,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] inf mismatch: Vulkan0=inf CPU=2.958066 FAIL
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=6,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NaN at index 0 (Vulkan0=-nan CPU=-0.210487) FAIL
  MUL_MAT(type_a=q4_1,type_b=f32,m=16,n=6,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NaN at index 0 (Vulkan0=-nan CPU=0.547166) FAIL
  MUL_MAT(type_a=q5_0,type_b=f32,m=16,n=6,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NaN at index 0 (Vulkan0=-nan CPU=4.438850) FAIL
  MUL_MAT(type_a=q5_1,type_b=f32,m=16,n=6,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NaN at index 8 (Vulkan0=-nan CPU=-8.403746) FAIL
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=6,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] inf mismatch: Vulkan0=inf CPU=4.197824 FAIL
  MUL_MAT(type_a=q4_K,type_b=f32,m=16,n=6,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NaN at index 0 (Vulkan0=-nan CPU=-1.603871) FAIL
  MUL_MAT(type_a=q5_K,type_b=f32,m=16,n=6,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] inf mismatch: Vulkan0=-inf CPU=-6.205073 FAIL
  MUL_MAT(type_a=q6_K,type_b=f32,m=16,n=6,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NaN at index 0 (Vulkan0=-nan CPU=-2.051694) FAIL
  MUL_MAT(type_a=iq4_nl,type_b=f32,m=16,n=6,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] inf mismatch: Vulkan0=-inf CPU=-0.085062 FAIL
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=7,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NaN at index 0 (Vulkan0=-nan CPU=0.818239) FAIL
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=7,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NMSE = 0.995540313 > 0.000500000 FAIL
  MUL_MAT(type_a=q4_1,type_b=f32,m=16,n=7,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NaN at index 0 (Vulkan0=-nan CPU=4.268586) FAIL
  MUL_MAT(type_a=q5_0,type_b=f32,m=16,n=7,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] inf mismatch: Vulkan0=inf CPU=-7.023514 FAIL
  MUL_MAT(type_a=q5_1,type_b=f32,m=16,n=7,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NaN at index 0 (Vulkan0=-nan CPU=5.964217) FAIL
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=7,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] inf mismatch: Vulkan0=inf CPU=1.629785 FAIL
  MUL_MAT(type_a=q4_K,type_b=f32,m=16,n=7,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NaN at index 0 (Vulkan0=-nan CPU=-4.501408) FAIL
  MUL_MAT(type_a=q5_K,type_b=f32,m=16,n=7,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NaN at index 0 (Vulkan0=-nan CPU=1.096679) FAIL
  MUL_MAT(type_a=q6_K,type_b=f32,m=16,n=7,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NaN at index 0 (Vulkan0=-nan CPU=-1.741318) FAIL
  MUL_MAT(type_a=iq4_nl,type_b=f32,m=16,n=7,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NaN at index 3 (Vulkan0=nan CPU=-2.836842) FAIL
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=8,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NaN at index 4 (Vulkan0=-nan CPU=1.599661) FAIL
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=8,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NaN at index 0 (Vulkan0=-nan CPU=15.302111) FAIL
  MUL_MAT(type_a=q4_1,type_b=f32,m=16,n=8,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NaN at index 4 (Vulkan0=nan CPU=0.716738) FAIL
  MUL_MAT(type_a=q5_0,type_b=f32,m=16,n=8,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NaN at index 0 (Vulkan0=-nan CPU=-12.677716) FAIL
  MUL_MAT(type_a=q5_1,type_b=f32,m=16,n=8,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NMSE = 1.018022874 > 0.000500000 FAIL
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=8,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NaN at index 2 (Vulkan0=-nan CPU=2.974940) FAIL
  MUL_MAT(type_a=q4_K,type_b=f32,m=16,n=8,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] inf mismatch: Vulkan0=-inf CPU=0.814551 FAIL
  MUL_MAT(type_a=q5_K,type_b=f32,m=16,n=8,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NaN at index 0 (Vulkan0=-nan CPU=9.234801) FAIL
  MUL_MAT(type_a=q6_K,type_b=f32,m=16,n=8,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] inf mismatch: Vulkan0=-inf CPU=3.559471 FAIL
  MUL_MAT(type_a=iq4_nl,type_b=f32,m=16,n=8,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NaN at index 0 (Vulkan0=-nan CPU=1.420476) FAIL
  MUL_MAT(type_a=f32,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=f32,type_b=f32,m=16,n=1,k=256,bs=[10,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=f32,type_b=f32,m=16,n=1,k=256,bs=[10,1],nr=[2,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=f32,type_b=f32,m=16,n=1,k=256,bs=[10,10],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=f32,type_b=f32,m=16,n=1,k=256,bs=[10,10],nr=[2,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=f32,type_b=f32,m=16,n=1,k=256,bs=[10,10],nr=[1,2],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=f32,type_b=f32,m=16,n=1,k=256,bs=[10,10],nr=[2,2],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=f32,type_b=f32,m=16,n=16,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NaN at index 10 (Vulkan0=-nan CPU=-1.514947) FAIL
  MUL_MAT(type_a=f32,type_b=f32,m=16,n=16,k=256,bs=[10,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NaN at index 7 (Vulkan0=nan CPU=2.888149) FAIL
  MUL_MAT(type_a=f32,type_b=f32,m=16,n=16,k=256,bs=[10,1],nr=[2,1],per=[0,1,2,3]): [MUL_MAT] NaN at index 0 (Vulkan0=-nan CPU=-7.862663) FAIL
  MUL_MAT(type_a=f32,type_b=f32,m=16,n=16,k=256,bs=[10,10],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NaN at index 0 (Vulkan0=-nan CPU=2.590023) FAIL
  MUL_MAT(type_a=f32,type_b=f32,m=16,n=16,k=256,bs=[10,10],nr=[2,1],per=[0,1,2,3]): [MUL_MAT] NaN at index 6 (Vulkan0=-nan CPU=2.880714) FAIL
  MUL_MAT(type_a=f32,type_b=f32,m=16,n=16,k=256,bs=[10,10],nr=[1,2],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=f32,type_b=f32,m=16,n=16,k=256,bs=[10,10],nr=[2,2],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=f32,type_b=f32,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): OK
  MUL_MAT(type_a=f32,type_b=f32,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): OK
  MUL_MAT(type_a=f32,type_b=f32,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): OK
  MUL_MAT(type_a=f32,type_b=f32,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): [MUL_MAT] NaN at index 5 (Vulkan0=-nan CPU=2.125956) FAIL
  MUL_MAT(type_a=f32,type_b=f32,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): [MUL_MAT] NaN at index 10 (Vulkan0=-nan CPU=8.324591) FAIL
  MUL_MAT(type_a=f32,type_b=f32,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): [MUL_MAT] NaN at index 7 (Vulkan0=-nan CPU=3.764273) FAIL
  MUL_MAT(type_a=f32,type_b=f32,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): [MUL_MAT] inf mismatch: Vulkan0=inf CPU=3.520932 FAIL
  MUL_MAT(type_a=f32,type_b=f32,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): [MUL_MAT] inf mismatch: Vulkan0=inf CPU=5.579459 FAIL
  MUL_MAT(type_a=f32,type_b=f32,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): [MUL_MAT] NaN at index 0 (Vulkan0=-nan CPU=3.836990) FAIL
  MUL_MAT(type_a=f32,type_b=f16,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=f32,type_b=f16,m=16,n=1,k=256,bs=[10,1],nr=[1,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=f32,type_b=f16,m=16,n=1,k=256,bs=[10,1],nr=[2,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=f32,type_b=f16,m=16,n=1,k=256,bs=[10,10],nr=[1,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=f32,type_b=f16,m=16,n=1,k=256,bs=[10,10],nr=[2,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=f32,type_b=f16,m=16,n=1,k=256,bs=[10,10],nr=[1,2],per=[0,1,2,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=f32,type_b=f16,m=16,n=1,k=256,bs=[10,10],nr=[2,2],per=[0,1,2,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=f32,type_b=f16,m=16,n=16,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=f32,type_b=f16,m=16,n=16,k=256,bs=[10,1],nr=[1,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=f32,type_b=f16,m=16,n=16,k=256,bs=[10,1],nr=[2,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=f32,type_b=f16,m=16,n=16,k=256,bs=[10,10],nr=[1,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=f32,type_b=f16,m=16,n=16,k=256,bs=[10,10],nr=[2,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=f32,type_b=f16,m=16,n=16,k=256,bs=[10,10],nr=[1,2],per=[0,1,2,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=f32,type_b=f16,m=16,n=16,k=256,bs=[10,10],nr=[2,2],per=[0,1,2,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=f32,type_b=f16,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): not supported [CPU] 
  MUL_MAT(type_a=f32,type_b=f16,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): not supported [CPU] 
  MUL_MAT(type_a=f32,type_b=f16,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): not supported [CPU] 
  MUL_MAT(type_a=f32,type_b=f16,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): not supported [CPU] 
  MUL_MAT(type_a=f32,type_b=f16,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): not supported [CPU] 
  MUL_MAT(type_a=f32,type_b=f16,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): not supported [CPU] 
  MUL_MAT(type_a=f32,type_b=f16,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): not supported [CPU] 
  MUL_MAT(type_a=f32,type_b=f16,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): not supported [CPU] 
  MUL_MAT(type_a=f32,type_b=f16,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): not supported [CPU] 
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=1,k=256,bs=[10,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=1,k=256,bs=[10,1],nr=[2,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=1,k=256,bs=[10,10],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=1,k=256,bs=[10,10],nr=[2,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=1,k=256,bs=[10,10],nr=[1,2],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=1,k=256,bs=[10,10],nr=[2,2],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=16,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] inf mismatch: Vulkan0=inf CPU=0.193825 FAIL
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=16,k=256,bs=[10,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NaN at index 9 (Vulkan0=nan CPU=3.054919) FAIL
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=16,k=256,bs=[10,1],nr=[2,1],per=[0,1,2,3]): [MUL_MAT] inf mismatch: Vulkan0=inf CPU=-5.914919 FAIL
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=16,k=256,bs=[10,10],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] inf mismatch: Vulkan0=-inf CPU=4.088778 FAIL
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=16,k=256,bs=[10,10],nr=[2,1],per=[0,1,2,3]): [MUL_MAT] NaN at index 12 (Vulkan0=-nan CPU=-7.637360) FAIL
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=16,k=256,bs=[10,10],nr=[1,2],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=16,k=256,bs=[10,10],nr=[2,2],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): OK
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): OK
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): OK
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): [MUL_MAT] inf mismatch: Vulkan0=inf CPU=0.906059 FAIL
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): [MUL_MAT] NaN at index 9 (Vulkan0=nan CPU=1.799367) FAIL
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): [MUL_MAT] NaN at index 5 (Vulkan0=-nan CPU=1.110171) FAIL
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): [MUL_MAT] NaN at index 7 (Vulkan0=nan CPU=2.358602) FAIL
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): [MUL_MAT] inf mismatch: Vulkan0=-inf CPU=-4.219680 FAIL
  MUL_MAT(type_a=f16,type_b=f32,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): [MUL_MAT] inf mismatch: Vulkan0=inf CPU=6.520378 FAIL
  MUL_MAT(type_a=f16,type_b=f16,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=f16,type_b=f16,m=16,n=1,k=256,bs=[10,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=f16,type_b=f16,m=16,n=1,k=256,bs=[10,1],nr=[2,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=f16,type_b=f16,m=16,n=1,k=256,bs=[10,10],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=f16,type_b=f16,m=16,n=1,k=256,bs=[10,10],nr=[2,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=f16,type_b=f16,m=16,n=1,k=256,bs=[10,10],nr=[1,2],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=f16,type_b=f16,m=16,n=1,k=256,bs=[10,10],nr=[2,2],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=f16,type_b=f16,m=16,n=16,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] inf mismatch: Vulkan0=inf CPU=5.244877 FAIL
  MUL_MAT(type_a=f16,type_b=f16,m=16,n=16,k=256,bs=[10,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NaN at index 8 (Vulkan0=-nan CPU=-4.403505) FAIL
  MUL_MAT(type_a=f16,type_b=f16,m=16,n=16,k=256,bs=[10,1],nr=[2,1],per=[0,1,2,3]): [MUL_MAT] NaN at index 0 (Vulkan0=-nan CPU=-3.776466) FAIL
  MUL_MAT(type_a=f16,type_b=f16,m=16,n=16,k=256,bs=[10,10],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NaN at index 10 (Vulkan0=nan CPU=6.625810) FAIL
  MUL_MAT(type_a=f16,type_b=f16,m=16,n=16,k=256,bs=[10,10],nr=[2,1],per=[0,1,2,3]): [MUL_MAT] NaN at index 0 (Vulkan0=-nan CPU=7.181248) FAIL
  MUL_MAT(type_a=f16,type_b=f16,m=16,n=16,k=256,bs=[10,10],nr=[1,2],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=f16,type_b=f16,m=16,n=16,k=256,bs=[10,10],nr=[2,2],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=f16,type_b=f16,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): OK
  MUL_MAT(type_a=f16,type_b=f16,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): OK
  MUL_MAT(type_a=f16,type_b=f16,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): OK
  MUL_MAT(type_a=f16,type_b=f16,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): [MUL_MAT] NaN at index 6 (Vulkan0=nan CPU=-3.655025) FAIL
  MUL_MAT(type_a=f16,type_b=f16,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): [MUL_MAT] inf mismatch: Vulkan0=-inf CPU=-3.847102 FAIL
  MUL_MAT(type_a=f16,type_b=f16,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): [MUL_MAT] NaN at index 9 (Vulkan0=nan CPU=-6.190921) FAIL
  MUL_MAT(type_a=f16,type_b=f16,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): [MUL_MAT] NaN at index 0 (Vulkan0=-nan CPU=8.326200) FAIL
  MUL_MAT(type_a=f16,type_b=f16,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): [MUL_MAT] NaN at index 7 (Vulkan0=nan CPU=7.058456) FAIL
  MUL_MAT(type_a=f16,type_b=f16,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): [MUL_MAT] NaN at index 0 (Vulkan0=-nan CPU=4.163612) FAIL
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=1,k=256,bs=[10,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=1,k=256,bs=[10,1],nr=[2,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=1,k=256,bs=[10,10],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=1,k=256,bs=[10,10],nr=[2,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=1,k=256,bs=[10,10],nr=[1,2],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=1,k=256,bs=[10,10],nr=[2,2],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=16,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NaN at index 5 (Vulkan0=nan CPU=10.630560) FAIL
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=16,k=256,bs=[10,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NaN at index 2 (Vulkan0=-nan CPU=0.957268) FAIL
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=16,k=256,bs=[10,1],nr=[2,1],per=[0,1,2,3]): [MUL_MAT] NaN at index 11 (Vulkan0=nan CPU=0.864043) FAIL
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=16,k=256,bs=[10,10],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NaN at index 4 (Vulkan0=-nan CPU=-1.093598) FAIL
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=16,k=256,bs=[10,10],nr=[2,1],per=[0,1,2,3]): [MUL_MAT] NaN at index 0 (Vulkan0=-nan CPU=7.607679) FAIL
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=16,k=256,bs=[10,10],nr=[1,2],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=16,k=256,bs=[10,10],nr=[2,2],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): not supported [Vulkan0] 
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): not supported [Vulkan0] 
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): not supported [Vulkan0] 
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): not supported [Vulkan0] 
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): not supported [Vulkan0] 
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): not supported [Vulkan0] 
  MUL_MAT(type_a=q8_0,type_b=f16,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q8_0,type_b=f16,m=16,n=1,k=256,bs=[10,1],nr=[1,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q8_0,type_b=f16,m=16,n=1,k=256,bs=[10,1],nr=[2,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q8_0,type_b=f16,m=16,n=1,k=256,bs=[10,10],nr=[1,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q8_0,type_b=f16,m=16,n=1,k=256,bs=[10,10],nr=[2,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q8_0,type_b=f16,m=16,n=1,k=256,bs=[10,10],nr=[1,2],per=[0,1,2,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q8_0,type_b=f16,m=16,n=1,k=256,bs=[10,10],nr=[2,2],per=[0,1,2,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q8_0,type_b=f16,m=16,n=16,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q8_0,type_b=f16,m=16,n=16,k=256,bs=[10,1],nr=[1,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q8_0,type_b=f16,m=16,n=16,k=256,bs=[10,1],nr=[2,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q8_0,type_b=f16,m=16,n=16,k=256,bs=[10,10],nr=[1,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q8_0,type_b=f16,m=16,n=16,k=256,bs=[10,10],nr=[2,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q8_0,type_b=f16,m=16,n=16,k=256,bs=[10,10],nr=[1,2],per=[0,1,2,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q8_0,type_b=f16,m=16,n=16,k=256,bs=[10,10],nr=[2,2],per=[0,1,2,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q8_0,type_b=f16,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q8_0,type_b=f16,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q8_0,type_b=f16,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q8_0,type_b=f16,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q8_0,type_b=f16,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q8_0,type_b=f16,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q8_0,type_b=f16,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q8_0,type_b=f16,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q8_0,type_b=f16,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=1,k=256,bs=[10,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=1,k=256,bs=[10,1],nr=[2,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=1,k=256,bs=[10,10],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=1,k=256,bs=[10,10],nr=[2,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=1,k=256,bs=[10,10],nr=[1,2],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=1,k=256,bs=[10,10],nr=[2,2],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=16,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NaN at index 10 (Vulkan0=nan CPU=1.088726) FAIL
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=16,k=256,bs=[10,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] inf mismatch: Vulkan0=-inf CPU=-3.241050 FAIL
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=16,k=256,bs=[10,1],nr=[2,1],per=[0,1,2,3]): [MUL_MAT] NaN at index 3 (Vulkan0=nan CPU=-1.129252) FAIL
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=16,k=256,bs=[10,10],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] inf mismatch: Vulkan0=inf CPU=-7.135500 FAIL
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=16,k=256,bs=[10,10],nr=[2,1],per=[0,1,2,3]): [MUL_MAT] NaN at index 0 (Vulkan0=-nan CPU=-1.627903) FAIL
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=16,k=256,bs=[10,10],nr=[1,2],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=16,k=256,bs=[10,10],nr=[2,2],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): not supported [Vulkan0] 
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): not supported [Vulkan0] 
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): not supported [Vulkan0] 
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): not supported [Vulkan0] 
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): not supported [Vulkan0] 
  MUL_MAT(type_a=q4_0,type_b=f32,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): not supported [Vulkan0] 
  MUL_MAT(type_a=q4_0,type_b=f16,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q4_0,type_b=f16,m=16,n=1,k=256,bs=[10,1],nr=[1,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q4_0,type_b=f16,m=16,n=1,k=256,bs=[10,1],nr=[2,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q4_0,type_b=f16,m=16,n=1,k=256,bs=[10,10],nr=[1,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q4_0,type_b=f16,m=16,n=1,k=256,bs=[10,10],nr=[2,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q4_0,type_b=f16,m=16,n=1,k=256,bs=[10,10],nr=[1,2],per=[0,1,2,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_0,type_b=f16,m=16,n=1,k=256,bs=[10,10],nr=[2,2],per=[0,1,2,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_0,type_b=f16,m=16,n=16,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q4_0,type_b=f16,m=16,n=16,k=256,bs=[10,1],nr=[1,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q4_0,type_b=f16,m=16,n=16,k=256,bs=[10,1],nr=[2,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q4_0,type_b=f16,m=16,n=16,k=256,bs=[10,10],nr=[1,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q4_0,type_b=f16,m=16,n=16,k=256,bs=[10,10],nr=[2,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q4_0,type_b=f16,m=16,n=16,k=256,bs=[10,10],nr=[1,2],per=[0,1,2,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_0,type_b=f16,m=16,n=16,k=256,bs=[10,10],nr=[2,2],per=[0,1,2,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_0,type_b=f16,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_0,type_b=f16,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_0,type_b=f16,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_0,type_b=f16,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_0,type_b=f16,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_0,type_b=f16,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_0,type_b=f16,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_0,type_b=f16,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_0,type_b=f16,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_1,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_1,type_b=f32,m=16,n=1,k=256,bs=[10,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_1,type_b=f32,m=16,n=1,k=256,bs=[10,1],nr=[2,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_1,type_b=f32,m=16,n=1,k=256,bs=[10,10],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_1,type_b=f32,m=16,n=1,k=256,bs=[10,10],nr=[2,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_1,type_b=f32,m=16,n=1,k=256,bs=[10,10],nr=[1,2],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=q4_1,type_b=f32,m=16,n=1,k=256,bs=[10,10],nr=[2,2],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=q4_1,type_b=f32,m=16,n=16,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NaN at index 3 (Vulkan0=nan CPU=0.337054) FAIL
  MUL_MAT(type_a=q4_1,type_b=f32,m=16,n=16,k=256,bs=[10,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NaN at index 10 (Vulkan0=nan CPU=4.653232) FAIL
  MUL_MAT(type_a=q4_1,type_b=f32,m=16,n=16,k=256,bs=[10,1],nr=[2,1],per=[0,1,2,3]): [MUL_MAT] NaN at index 4 (Vulkan0=-nan CPU=-3.600214) FAIL
  MUL_MAT(type_a=q4_1,type_b=f32,m=16,n=16,k=256,bs=[10,10],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] inf mismatch: Vulkan0=inf CPU=7.926006 FAIL
  MUL_MAT(type_a=q4_1,type_b=f32,m=16,n=16,k=256,bs=[10,10],nr=[2,1],per=[0,1,2,3]): [MUL_MAT] inf mismatch: Vulkan0=inf CPU=2.709986 FAIL
  MUL_MAT(type_a=q4_1,type_b=f32,m=16,n=16,k=256,bs=[10,10],nr=[1,2],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=q4_1,type_b=f32,m=16,n=16,k=256,bs=[10,10],nr=[2,2],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=q4_1,type_b=f32,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=q4_1,type_b=f32,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): not supported [Vulkan0] 
  MUL_MAT(type_a=q4_1,type_b=f32,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): not supported [Vulkan0] 
  MUL_MAT(type_a=q4_1,type_b=f32,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=q4_1,type_b=f32,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): not supported [Vulkan0] 
  MUL_MAT(type_a=q4_1,type_b=f32,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): not supported [Vulkan0] 
  MUL_MAT(type_a=q4_1,type_b=f32,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=q4_1,type_b=f32,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): not supported [Vulkan0] 
  MUL_MAT(type_a=q4_1,type_b=f32,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): not supported [Vulkan0] 
  MUL_MAT(type_a=q4_1,type_b=f16,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q4_1,type_b=f16,m=16,n=1,k=256,bs=[10,1],nr=[1,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q4_1,type_b=f16,m=16,n=1,k=256,bs=[10,1],nr=[2,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q4_1,type_b=f16,m=16,n=1,k=256,bs=[10,10],nr=[1,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q4_1,type_b=f16,m=16,n=1,k=256,bs=[10,10],nr=[2,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q4_1,type_b=f16,m=16,n=1,k=256,bs=[10,10],nr=[1,2],per=[0,1,2,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_1,type_b=f16,m=16,n=1,k=256,bs=[10,10],nr=[2,2],per=[0,1,2,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_1,type_b=f16,m=16,n=16,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q4_1,type_b=f16,m=16,n=16,k=256,bs=[10,1],nr=[1,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q4_1,type_b=f16,m=16,n=16,k=256,bs=[10,1],nr=[2,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q4_1,type_b=f16,m=16,n=16,k=256,bs=[10,10],nr=[1,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q4_1,type_b=f16,m=16,n=16,k=256,bs=[10,10],nr=[2,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q4_1,type_b=f16,m=16,n=16,k=256,bs=[10,10],nr=[1,2],per=[0,1,2,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_1,type_b=f16,m=16,n=16,k=256,bs=[10,10],nr=[2,2],per=[0,1,2,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_1,type_b=f16,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_1,type_b=f16,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_1,type_b=f16,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_1,type_b=f16,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_1,type_b=f16,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_1,type_b=f16,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_1,type_b=f16,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_1,type_b=f16,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_1,type_b=f16,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_K,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_K,type_b=f32,m=16,n=1,k=256,bs=[10,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_K,type_b=f32,m=16,n=1,k=256,bs=[10,1],nr=[2,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_K,type_b=f32,m=16,n=1,k=256,bs=[10,10],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_K,type_b=f32,m=16,n=1,k=256,bs=[10,10],nr=[2,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_K,type_b=f32,m=16,n=1,k=256,bs=[10,10],nr=[1,2],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=q4_K,type_b=f32,m=16,n=1,k=256,bs=[10,10],nr=[2,2],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=q4_K,type_b=f32,m=16,n=16,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NaN at index 0 (Vulkan0=-nan CPU=1.780558) FAIL
  MUL_MAT(type_a=q4_K,type_b=f32,m=16,n=16,k=256,bs=[10,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NaN at index 10 (Vulkan0=-nan CPU=-12.375776) FAIL
  MUL_MAT(type_a=q4_K,type_b=f32,m=16,n=16,k=256,bs=[10,1],nr=[2,1],per=[0,1,2,3]): [MUL_MAT] NaN at index 64 (Vulkan0=-nan CPU=4.535605) FAIL
  MUL_MAT(type_a=q4_K,type_b=f32,m=16,n=16,k=256,bs=[10,10],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] inf mismatch: Vulkan0=-inf CPU=-0.296751 FAIL
  MUL_MAT(type_a=q4_K,type_b=f32,m=16,n=16,k=256,bs=[10,10],nr=[2,1],per=[0,1,2,3]): [MUL_MAT] NaN at index 0 (Vulkan0=-nan CPU=8.883362) FAIL
  MUL_MAT(type_a=q4_K,type_b=f32,m=16,n=16,k=256,bs=[10,10],nr=[1,2],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=q4_K,type_b=f32,m=16,n=16,k=256,bs=[10,10],nr=[2,2],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=q4_K,type_b=f32,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=q4_K,type_b=f32,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): not supported [Vulkan0] 
  MUL_MAT(type_a=q4_K,type_b=f32,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): not supported [Vulkan0] 
  MUL_MAT(type_a=q4_K,type_b=f32,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=q4_K,type_b=f32,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): not supported [Vulkan0] 
  MUL_MAT(type_a=q4_K,type_b=f32,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): not supported [Vulkan0] 
  MUL_MAT(type_a=q4_K,type_b=f32,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=q4_K,type_b=f32,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): not supported [Vulkan0] 
  MUL_MAT(type_a=q4_K,type_b=f32,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): not supported [Vulkan0] 
  MUL_MAT(type_a=q4_K,type_b=f16,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q4_K,type_b=f16,m=16,n=1,k=256,bs=[10,1],nr=[1,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q4_K,type_b=f16,m=16,n=1,k=256,bs=[10,1],nr=[2,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q4_K,type_b=f16,m=16,n=1,k=256,bs=[10,10],nr=[1,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q4_K,type_b=f16,m=16,n=1,k=256,bs=[10,10],nr=[2,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q4_K,type_b=f16,m=16,n=1,k=256,bs=[10,10],nr=[1,2],per=[0,1,2,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_K,type_b=f16,m=16,n=1,k=256,bs=[10,10],nr=[2,2],per=[0,1,2,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_K,type_b=f16,m=16,n=16,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q4_K,type_b=f16,m=16,n=16,k=256,bs=[10,1],nr=[1,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q4_K,type_b=f16,m=16,n=16,k=256,bs=[10,1],nr=[2,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q4_K,type_b=f16,m=16,n=16,k=256,bs=[10,10],nr=[1,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q4_K,type_b=f16,m=16,n=16,k=256,bs=[10,10],nr=[2,1],per=[0,1,2,3]): not supported [CPU] 
  MUL_MAT(type_a=q4_K,type_b=f16,m=16,n=16,k=256,bs=[10,10],nr=[1,2],per=[0,1,2,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_K,type_b=f16,m=16,n=16,k=256,bs=[10,10],nr=[2,2],per=[0,1,2,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_K,type_b=f16,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_K,type_b=f16,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_K,type_b=f16,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_K,type_b=f16,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_K,type_b=f16,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_K,type_b=f16,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_K,type_b=f16,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_K,type_b=f16,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_K,type_b=f16,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=iq2_xxs,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=iq2_xxs,type_b=f32,m=16,n=1,k=256,bs=[10,1],nr=[1,1],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=iq2_xxs,type_b=f32,m=16,n=1,k=256,bs=[10,1],nr=[2,1],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=iq2_xxs,type_b=f32,m=16,n=1,k=256,bs=[10,10],nr=[1,1],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=iq2_xxs,type_b=f32,m=16,n=1,k=256,bs=[10,10],nr=[2,1],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=iq2_xxs,type_b=f32,m=16,n=1,k=256,bs=[10,10],nr=[1,2],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=iq2_xxs,type_b=f32,m=16,n=1,k=256,bs=[10,10],nr=[2,2],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=iq2_xxs,type_b=f32,m=16,n=16,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=iq2_xxs,type_b=f32,m=16,n=16,k=256,bs=[10,1],nr=[1,1],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=iq2_xxs,type_b=f32,m=16,n=16,k=256,bs=[10,1],nr=[2,1],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=iq2_xxs,type_b=f32,m=16,n=16,k=256,bs=[10,10],nr=[1,1],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=iq2_xxs,type_b=f32,m=16,n=16,k=256,bs=[10,10],nr=[2,1],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=iq2_xxs,type_b=f32,m=16,n=16,k=256,bs=[10,10],nr=[1,2],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=iq2_xxs,type_b=f32,m=16,n=16,k=256,bs=[10,10],nr=[2,2],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=iq2_xxs,type_b=f32,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=iq2_xxs,type_b=f32,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): not supported [Vulkan0] 
  MUL_MAT(type_a=iq2_xxs,type_b=f32,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): not supported [Vulkan0] 
  MUL_MAT(type_a=iq2_xxs,type_b=f32,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=iq2_xxs,type_b=f32,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): not supported [Vulkan0] 
  MUL_MAT(type_a=iq2_xxs,type_b=f32,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): not supported [Vulkan0] 
  MUL_MAT(type_a=iq2_xxs,type_b=f32,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=iq2_xxs,type_b=f32,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): not supported [Vulkan0] 
  MUL_MAT(type_a=iq2_xxs,type_b=f32,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): not supported [Vulkan0] 
  MUL_MAT(type_a=iq2_xxs,type_b=f16,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=iq2_xxs,type_b=f16,m=16,n=1,k=256,bs=[10,1],nr=[1,1],per=[0,1,2,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=iq2_xxs,type_b=f16,m=16,n=1,k=256,bs=[10,1],nr=[2,1],per=[0,1,2,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=iq2_xxs,type_b=f16,m=16,n=1,k=256,bs=[10,10],nr=[1,1],per=[0,1,2,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=iq2_xxs,type_b=f16,m=16,n=1,k=256,bs=[10,10],nr=[2,1],per=[0,1,2,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=iq2_xxs,type_b=f16,m=16,n=1,k=256,bs=[10,10],nr=[1,2],per=[0,1,2,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=iq2_xxs,type_b=f16,m=16,n=1,k=256,bs=[10,10],nr=[2,2],per=[0,1,2,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=iq2_xxs,type_b=f16,m=16,n=16,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=iq2_xxs,type_b=f16,m=16,n=16,k=256,bs=[10,1],nr=[1,1],per=[0,1,2,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=iq2_xxs,type_b=f16,m=16,n=16,k=256,bs=[10,1],nr=[2,1],per=[0,1,2,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=iq2_xxs,type_b=f16,m=16,n=16,k=256,bs=[10,10],nr=[1,1],per=[0,1,2,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=iq2_xxs,type_b=f16,m=16,n=16,k=256,bs=[10,10],nr=[2,1],per=[0,1,2,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=iq2_xxs,type_b=f16,m=16,n=16,k=256,bs=[10,10],nr=[1,2],per=[0,1,2,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=iq2_xxs,type_b=f16,m=16,n=16,k=256,bs=[10,10],nr=[2,2],per=[0,1,2,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=iq2_xxs,type_b=f16,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=iq2_xxs,type_b=f16,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=iq2_xxs,type_b=f16,m=16,n=1,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=iq2_xxs,type_b=f16,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=iq2_xxs,type_b=f16,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=iq2_xxs,type_b=f16,m=16,n=8,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=iq2_xxs,type_b=f16,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,2,1,3]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=iq2_xxs,type_b=f16,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,1,3,2]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=iq2_xxs,type_b=f16,m=16,n=16,k=256,bs=[2,3],nr=[1,1],per=[0,3,2,1]): not supported [Vulkan0] not supported [CPU] 
  MUL_MAT(type_a=q4_1,type_b=f32,m=16,n=1,k=32,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q4_1,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q5_0,type_b=f32,m=16,n=1,k=32,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q5_0,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q5_1,type_b=f32,m=16,n=1,k=32,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q5_1,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=1,k=32,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q2_K,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q3_K,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q5_K,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=q6_K,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=iq2_xs,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=iq2_s,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=iq3_xxs,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=iq1_s,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=iq1_m,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=iq4_nl,type_b=f32,m=16,n=1,k=32,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=iq4_nl,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): OK
  MUL_MAT(type_a=iq3_s,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=iq4_xs,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=bf16,type_b=f32,m=16,n=1,k=1,bs=[1,1],nr=[1,1],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=bf16,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): not supported [Vulkan0] 
  MUL_MAT(type_a=f16,type_b=f32,m=64,n=2,k=128,bs=[8,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] inf mismatch: Vulkan0=-inf CPU=0.833586 FAIL
  MUL_MAT(type_a=f16,type_b=f32,m=83,n=2,k=128,bs=[8,1],nr=[4,1],per=[0,1,2,3]): [MUL_MAT] NaN at index 18 (Vulkan0=nan CPU=-2.468451) FAIL
  MUL_MAT(type_a=f16,type_b=f32,m=64,n=2,k=64,bs=[8,1],nr=[4,1],per=[0,1,2,3]): [MUL_MAT] NaN at index 39 (Vulkan0=-nan CPU=1.770181) FAIL
  MUL_MAT(type_a=f16,type_b=f32,m=83,n=2,k=64,bs=[8,1],nr=[4,1],per=[0,1,2,3]): [MUL_MAT] NaN at index 45 (Vulkan0=-nan CPU=2.254067) FAIL
  MUL_MAT(type_a=f16,type_b=f32,m=64,n=45,k=128,bs=[8,1],nr=[4,1],per=[0,1,2,3]): [MUL_MAT] NaN at index 50 (Vulkan0=nan CPU=-0.922508) FAIL
  MUL_MAT(type_a=f16,type_b=f32,m=128,n=45,k=64,bs=[8,1],nr=[4,1],per=[0,1,2,3]): [MUL_MAT] NaN at index 9 (Vulkan0=-nan CPU=0.259592) FAIL
  2075/2204 tests passed
  Backend Vulkan0: FAIL

Backend 2/2: CPU
  Skipping CPU backend
1/2 backends passed
FAIL

@0cc4m
Collaborator

0cc4m commented Dec 9, 2024

@mtasic85 You are using AMDVLK. I guess it also doesn't have a proper implementation of VK_KHR_cooperative_matrix. So far I have only excluded the proprietary driver; that exclusion might have to be extended to AMDVLK.

Can you set export GGML_VK_DISABLE_COOPMAT=1 and run the tests again? If that fixes it, it's the extension support.

But in general you should really be using RADV, not AMDVLK. It's basically better in every way and more stable.
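For reference, a minimal sketch of that check (assuming a default CMake build directory; the `-o` filter restricts `test-backend-ops` to matrix multiplication):

```shell
# Disable the Vulkan cooperative matrix path for this shell, then
# re-run the backend op tests. The binary path is an assumption
# (default CMake layout); skip gracefully if it is not built.
export GGML_VK_DISABLE_COOPMAT=1
if [ -x ./build/bin/test-backend-ops ]; then
    ./build/bin/test-backend-ops test -o MUL_MAT
else
    echo "test-backend-ops not built; skipping"
fi
```

If the MUL_MAT failures disappear with the variable set, the coopmat path of the driver is the culprit.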

@mtasic85
Contributor

mtasic85 commented Dec 9, 2024

@0cc4m @stduhpf @jeffbolznv Since you mentioned AMDVLK vs RADV, I got the idea to try the following experiments.

Setup:

OS: Arch Linux x86_64 
Host: X570 AORUS ELITE -CF 
Kernel: 6.12.3-arch1-1 
CPU: AMD Ryzen 9 3950X (32) @ 3.500GHz 
GPU: AMD ATI Radeon RX 5600 OEM/5600 XT / 5700/5700 XT 
Memory: 9218MiB / 128753MiB 

I installed the following packages: vulkan-devel (RADV), amdvlk, and amd-vulkan-prefixes (from the AUR; it selects the needed Vulkan implementation via the vk_radv, vk_amdvlk, or vk_pro prefix). The vk_radv and vk_amdvlk wrappers pick the Vulkan implementation at run time, so both RADV and AMDVLK can coexist on the same machine.

  1. Build with vk_radv, run with vk_radv. Status: OK
vk_radv cmake -B build -DGGML_VULKAN=1 -DLLAMA_CURL=1
vk_radv cmake --build build --config Release
vk_radv ./build/bin/llama-cli -p 512 --hf-repo "Qwen/Qwen2.5-0.5B-Instruct-GGUF" --hf-file "qwen2.5-0.5b-instruct-q4_k_m.gguf" --prompt 'Meaning of life is' -ngl 99
-- The C compiler identification is GNU 14.2.1
-- The CXX compiler identification is GNU 14.2.1
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Git: /usr/bin/git (found version "2.47.1")
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Warning: ccache not found - consider installing it for faster compilation or disable this warning with GGML_CCACHE=OFF
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- Including CPU backend
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- Adding CPU backend variant ggml-cpu: -march=native 
-- Found Vulkan: /lib/libvulkan.so (found version "1.4.303") found components: glslc glslangValidator
-- Vulkan found
-- GL_NV_cooperative_matrix2 not supported by glslc
-- Including Vulkan backend
-- Found CURL: /usr/lib/libcurl.so (found version "8.11.0")
-- Configuring done (1.3s)
-- Generating done (0.2s)
-- Build files have been written to: /mnt/ai/llama.cpp-master/build
  2. Build with vk_radv, run with vk_amdvlk. Status: FAIL
vk_amdvlk ./build/bin/llama-cli -p 512 --hf-repo "Qwen/Qwen2.5-0.5B-Instruct-GGUF" --hf-file "qwen2.5-0.5b-instruct-q4_k_m.gguf" --prompt 'Meaning of life is' -ngl 99
  3. Build with vk_amdvlk, run with vk_radv. Status: OK
vk_amdvlk cmake -B build -DGGML_VULKAN=1 -DLLAMA_CURL=1
vk_amdvlk cmake --build build --config Release
vk_radv ./build/bin/llama-cli -p 512 --hf-repo "Qwen/Qwen2.5-0.5B-Instruct-GGUF" --hf-file "qwen2.5-0.5b-instruct-q4_k_m.gguf" --prompt 'Meaning of life is' -ngl 99
  1. Build with vk_amdvlk, run with vk_amdvlk. Status: FAIL
vk_amdvlk ./build/bin/llama-cli -p 512 --hf-repo "Qwen/Qwen2.5-0.5B-Instruct-GGUF" --hf-file "qwen2.5-0.5b-instruct-q4_k_m.gguf" --prompt 'Meaning of life is' -ngl 99
  1. Build and run with export GGML_VK_DISABLE_COOPMAT=1 as suggested. Status: OK
export GGML_VK_DISABLE_COOPMAT=1
cmake -B build -DGGML_VULKAN=1 -DLLAMA_CURL=1
cmake --build build --config Release
./build/bin/llama-cli -p 512 --hf-repo "Qwen/Qwen2.5-0.5B-Instruct-GGUF" --hf-file "qwen2.5-0.5b-instruct-q4_k_m.gguf" --prompt 'Meaning of life is' -ngl 99

New terminal also works after build:

GGML_VK_DISABLE_COOPMAT=1 ./build/bin/llama-cli -p 512 --hf-repo "Qwen/Qwen2.5-0.5B-Instruct-GGUF" --hf-file "qwen2.5-0.5b-instruct-q4_k_m.gguf" --prompt 'Meaning of life is' -ngl 99
-- The C compiler identification is GNU 14.2.1
-- The CXX compiler identification is GNU 14.2.1
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Git: /usr/bin/git (found version "2.47.1")
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Warning: ccache not found - consider installing it for faster compilation or disable this warning with GGML_CCACHE=OFF
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- Including CPU backend
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- Adding CPU backend variant ggml-cpu: -march=native 
-- Found Vulkan: /lib/libvulkan.so (found version "1.4.303") found components: glslc glslangValidator
-- Vulkan found
-- GL_NV_cooperative_matrix2 not supported by glslc
-- Including Vulkan backend
-- Found CURL: /usr/lib/libcurl.so (found version "8.11.0")
-- Configuring done (1.4s)
-- Generating done (0.2s)
-- Build files have been written to: /mnt/ai/llama.cpp-master/build
  1. Build but run without export GGML_VK_DISABLE_COOPMAT=1 as suggested. Status: FAIL
export GGML_VK_DISABLE_COOPMAT=1
cmake -B build -DGGML_VULKAN=1 -DLLAMA_CURL=1
cmake --build build --config Release

New terminal:

./build/bin/llama-cli -p 512 --hf-repo "Qwen/Qwen2.5-0.5B-Instruct-GGUF" --hf-file "qwen2.5-0.5b-instruct-q4_k_m.gguf" --prompt 'Meaning of life is' -ngl 99

Conclusion:
Build as usual, but run with one of these options:

  • Option 1: vk_radv ./build/bin/llama-cli -p 512 --hf-repo "Qwen/Qwen2.5-0.5B-Instruct-GGUF" --hf-file "qwen2.5-0.5b-instruct-q4_k_m.gguf" --prompt 'Meaning of life is' -ngl 99
  • Option 2: GGML_VK_DISABLE_COOPMAT=1 ./build/bin/llama-cli -p 512 --hf-repo "Qwen/Qwen2.5-0.5B-Instruct-GGUF" --hf-file "qwen2.5-0.5b-instruct-q4_k_m.gguf" --prompt 'Meaning of life is' -ngl 99

@0cc4m
Collaborator

0cc4m commented Dec 9, 2024

@mtasic85 The driver is only relevant when running the program, not when building it.

@mtasic85
Contributor

mtasic85 commented Dec 11, 2024

When building on AlmaLinux 8 x86_64 I get the errors below. I also noticed that Vulkan 1.2 is targeted instead of 1.3.

cmake -B build -DGGML_VULKAN=1 -DLLAMA_CURL=1

Output:

-- The C compiler identification is GNU 12.2.1
-- The CXX compiler identification is GNU 12.2.1
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /opt/rh/gcc-toolset-12/root/usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /opt/rh/gcc-toolset-12/root/usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Git: /usr/local/bin/git (found version "2.45.2")
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Check if compiler accepts -pthread
-- Check if compiler accepts -pthread - yes
-- Found Threads: TRUE
-- Warning: ccache not found - consider installing it for faster compilation or disable this warning with GGML_CCACHE=OFF
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- Including CPU backend
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- Adding CPU backend variant ggml-cpu: -march=native 
-- Found Vulkan: /lib64/libvulkan.so (found version "1.3.283") found components: glslc glslangValidator
-- Vulkan found
-- GL_NV_cooperative_matrix2 not supported by glslc
-- Including Vulkan backend
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found CURL: /usr/lib64/libcurl.so (found version "7.61.1")
-- Configuring done (1.5s)
-- Generating done (0.2s)
-- Build files have been written to: /root/llama.cpp/build
cmake --build build --config Release

Part of log:

cannot compile matmul_q4_1_f32_coopmat

glslc -fshader-stage=compute --target-env=vulkan1.2  ggml/src/ggml-vulkan/vulkan-shaders/mul_mm.comp -o /tmp/matmul_q4_1_f32_coopmat.spv -DACC_TYPE=float -DB_IS_FLOAT=1 -DB_TYPE=float -DCOOPMAT=1 -DDATA_A_Q4_1=1 -DD_TYPE=float -DFLOAT16=1 -DFLOAT_TYPE=float16_t -DLOAD_VEC_A=2 

ggml/src/ggml-vulkan/vulkan-shaders/mul_mm.comp:11: warning: '#extension' : extension not supported: GL_KHR_cooperative_matrix
ggml/src/ggml-vulkan/vulkan-shaders/mul_mm.comp:195: error: 'coopmat' : undeclared identifier
ggml/src/ggml-vulkan/vulkan-shaders/mul_mm.comp:195: error: '' :  syntax error, unexpected COMMA, expecting LEFT_PAREN
1 warning and 2 errors generated.

cannot compile matmul_q4_1_f16_aligned_coopmat

glslc -fshader-stage=compute --target-env=vulkan1.2  ggml/src/ggml-vulkan/vulkan-shaders/mul_mm.comp -o /tmp/matmul_q4_1_f16_aligned_coopmat.spv -DACC_TYPE=float -DALIGNED=1 -DB_IS_FLOAT=1 -DB_TYPE=f16mat2x4 -DCOOPMAT=1 -DDATA_A_Q4_1=1 -DD_TYPE=float -DFLOAT16=1 -DFLOAT_TYPE=float16_t -DLOAD_VEC_A=2 -DLOAD_VEC_B=8 

ggml/src/ggml-vulkan/vulkan-shaders/mul_mm.comp:11: warning: '#extension' : extension not supported: GL_KHR_cooperative_matrix
ggml/src/ggml-vulkan/vulkan-shaders/mul_mm.comp:195: error: 'coopmat' : undeclared identifier
ggml/src/ggml-vulkan/vulkan-shaders/mul_mm.comp:195: error: '' :  syntax error, unexpected COMMA, expecting LEFT_PAREN
1 warning and 2 errors generated.

cannot compile matmul_q4_1_f32_aligned_coopmat

glslc -fshader-stage=compute --target-env=vulkan1.2  ggml/src/ggml-vulkan/vulkan-shaders/mul_mm.comp -o /tmp/matmul_q4_1_f32_aligned_coopmat.spv -DACC_TYPE=float -DALIGNED=1 -DB_IS_FLOAT=1 -DB_TYPE=mat2x4 -DCOOPMAT=1 -DDATA_A_Q4_1=1 -DD_TYPE=float -DFLOAT16=1 -DFLOAT_TYPE=float16_t -DLOAD_VEC_A=2 -DLOAD_VEC_B=8 

ggml/src/ggml-vulkan/vulkan-shaders/mul_mm.comp:11: warning: '#extension' : extension not supported: GL_KHR_cooperative_matrix
ggml/src/ggml-vulkan/vulkan-shaders/mul_mm.comp:195: error: 'coopmat' : undeclared identifier
ggml/src/ggml-vulkan/vulkan-shaders/mul_mm.comp:195: error: '' :  syntax error, unexpected COMMA, expecting LEFT_PAREN
1 warning and 2 errors generated.

cannot compile matmul_q4_1_f16_coopmat

glslc -fshader-stage=compute --target-env=vulkan1.2  ggml/src/ggml-vulkan/vulkan-shaders/mul_mm.comp -o /tmp/matmul_q4_1_f16_coopmat.spv -DACC_TYPE=float -DB_IS_FLOAT=1 -DB_TYPE=float16_t -DCOOPMAT=1 -DDATA_A_Q4_1=1 -DD_TYPE=float -DFLOAT16=1 -DFLOAT_TYPE=float16_t -DLOAD_VEC_A=2 

ggml/src/ggml-vulkan/vulkan-shaders/mul_mm.comp:11: warning: '#extension' : extension not supported: GL_KHR_cooperative_matrix
ggml/src/ggml-vulkan/vulkan-shaders/mul_mm.comp:195: error: 'coopmat' : undeclared identifier
ggml/src/ggml-vulkan/vulkan-shaders/mul_mm.comp:195: error: '' :  syntax error, unexpected COMMA, expecting LEFT_PAREN
1 warning and 2 errors generated.

@0cc4m
Collaborator

0cc4m commented Dec 11, 2024

@mtasic85 Can you post Vulkan header version and glslc version? My guess would be at least the compiler is too old.

@mtasic85
Contributor

Here it is, but keep in mind that glslc and glslang are not available in the distribution repos:

dnf install -y vulkan-tools vulkan-headers vulkan-loader vulkan-loader-devel vulkan-validation-layers spirv-tools
Last metadata expiration check: 0:00:27 ago on Wed 11 Dec 2024 08:37:08 AM UTC.
Dependencies resolved.
==============================================================================================================================================================================================================================================================
 Package                                                           Architecture                                    Version                                                                           Repository                                          Size
==============================================================================================================================================================================================================================================================
Installing:
 spirv-tools                                                       x86_64                                          2024.2-1.el8_10                                                                   appstream                                          305 k
 vulkan-headers                                                    noarch                                          1.3.283.0-1.el8_10                                                                appstream                                          1.5 M
 vulkan-loader                                                     x86_64                                          1.3.283.0-1.el8_10                                                                appstream                                          147 k
 vulkan-loader-devel                                               x86_64                                          1.3.283.0-1.el8_10                                                                appstream                                           12 k
 vulkan-tools                                                      x86_64                                          1.3.283.0-2.el8_10                                                                appstream                                          315 k
 vulkan-validation-layers                                          x86_64                                          1.3.283.0-3.el8_10                                                                appstream                                          4.0 M
Installing dependencies:
 cmake-filesystem                                                  x86_64                                          3.26.5-2.el8                                                                      appstream                                           44 k
 llvm-compat-libs                                                  x86_64                                          17.0.6-3.module_el8.10.0+3912+9ee11f56                                            appstream                                           55 M
 mesa-vulkan-drivers                                               x86_64                                          23.1.4-3.el8_10.alma.1                                                            appstream                                          9.8 M
 spirv-tools-libs                                                  x86_64                                          2024.2-1.el8_10                                                                   appstream                                          1.7 M
dnf install -y https://pkgs.sysadmins.ws/el8/extras/x86_64/glslc-2023.1-3.el8.x86_64.rpm https://pkgs.sysadmins.ws/el8/extras/x86_64/glslang-12.0.0-1.el8.x86_64.rpm
Last metadata expiration check: 0:01:44 ago on Wed 11 Dec 2024 08:37:08 AM UTC.
glslc-2023.1-3.el8.x86_64.rpm                                                                                                                                                                                                 102 kB/s |  99 kB     00:00    
glslang-12.0.0-1.el8.x86_64.rpm                                                                                                                                                                                               259 kB/s | 1.9 MB     00:07    
Dependencies resolved.
==============================================================================================================================================================================================================================================================
 Package                                                    Architecture                                              Version                                                           Repository                                                       Size
==============================================================================================================================================================================================================================================================
Installing:
 glslang                                                    x86_64                                                    12.0.0-1.el8                                                      @commandline                                                    1.9 M
 glslc                                                      x86_64                                                    2023.1-3.el8                                                      @commandline                                                     99 k

@0cc4m
Collaborator

0cc4m commented Dec 11, 2024

Yeah, the header is fine, but the glslc version is too old. You could build it yourself; it's not complicated: https://github.com/google/shaderc#getting-and-building-shaderc

Edit: The package could also be called shaderc

@mtasic85
Contributor

It works after manually building shaderc.

@0cc4m
Collaborator

0cc4m commented Dec 18, 2024

I think this was fixed, but I don't know by which PR. Can you retest @stduhpf ?

@stduhpf
Contributor Author

stduhpf commented Dec 18, 2024

Still broken with the latest commit (152610e) @0cc4m

@0cc4m
Collaborator

0cc4m commented Dec 18, 2024

Still broken with the latest commit (152610e) @0cc4m

That's odd, I just ran it on a RX 7700S and RX 780M on Windows and all tests passed. Are you on the latest driver?

@stduhpf
Contributor Author

stduhpf commented Dec 18, 2024

I'm on version 24.12.1 (32.0.12033.1030), which should be the latest one, I think. Maybe it's an RDNA1-only issue?

By the way, I noticed that the NMSE values for the failing MUL_MATs are different with every run of test-backend-ops; it looks random. Is this expected?

@0cc4m
Collaborator

0cc4m commented Dec 18, 2024

I'm on version 24.12.1 (32.0.12033.1030), which should be the latest one, I think. Maybe it's an RDNA1-only issue?

I tested on RX 6800 XT and it also fails on q2_k and q3_k. So I guess it affects at least RDNA1 and 2, but not RDNA3. Weird. All tests were on the 24.12.1 driver. It's quite annoying to keep having to chase specific driver issues on Windows AMD when the same hardware works just fine on Mesa. :/

By the way, I noticed that the NMSE values for the failing MUL_MATs are different with every run of test-backend-ops; it looks random. Is this expected?

I think the input data is randomized, so it's probably expected.
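
For context, the NMSE that test-backend-ops reports can be modeled as the squared error normalized by the reference energy. A minimal sketch (assuming the common sum((a-b)^2)/sum(b^2) definition, which is consistent with the failure pattern in this thread, where broken kernels hover around NMSE ≈ 1 because their output is uncorrelated with the reference):

```python
import random

def nmse(observed, reference):
    # Normalized mean squared error: sum((o - r)^2) / sum(r^2).
    num = sum((o - r) ** 2 for o, r in zip(observed, reference))
    den = sum(r * r for r in reference)
    return num / den

random.seed(0)
ref = [random.uniform(-1.0, 1.0) for _ in range(256)]
ok = [r + random.uniform(-1e-4, 1e-4) for r in ref]    # near-correct backend output
bad = [random.uniform(-1.0, 1.0) for _ in range(256)]  # uncorrelated garbage

print(nmse(ok, ref) < 5e-4)   # → True (passes the MUL_MAT threshold)
print(nmse(bad, ref) > 0.5)   # → True (garbage output lands around NMSE ≈ 1-2)
```

Since the inputs are freshly randomized each run, a failing kernel produces a different (but similarly large) NMSE every time, which matches the observation above.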

@stduhpf
Contributor Author

stduhpf commented Jan 5, 2025

@jeffbolznv Any updates on this? It's still broken for me as of now, and with iq2 and iq3 also being incompatible with Vulkan, there are no good alternatives to run models with low bpw (other than changing GPU or OS, or rolling back to older versions).

@jeffbolznv
Collaborator

This seems to be a driver bug, so I don't have any way to fix it. Unless other optimizations we do in the future happen to avoid the bug.

Are iq2/iq3 something we should support in vulkan?

@0cc4m
Collaborator

0cc4m commented Jan 14, 2025

I haven't been able to reproduce this locally with 504af20? @stduhpf @maicogg Would you mind sharing your build options and the commands for executing the tests?

I can't test it right now, but you did use an RDNA1 or RDNA2 GPU on Windows, right? I saw the issue with the latest 24.12.something AMD driver. Just running test-backend-ops.exe -o MUL_MAT shows it.

@AMD-dwang
Contributor

I can't test it right now, but you did use an RDNA1 or RDNA2 GPU on Windows, right? I saw the issue with the latest 24.12.something AMD driver. Just running test-backend-ops.exe -o MUL_MAT shows it.

Yes, I am using RDNA2 (AMD Radeon RX 6600 XT) on Windows, but the driver is different. I will test it again with the driver you mentioned. Thanks.

@AMD-dwang
Contributor

I have reproduced this problem. As @netrunnereve said, there is an issue with the unpack8 function and we will fix it in the next version.
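
For reference, unpack8 splits a 32-bit value into four 8-bit lanes. A Python model of the signed variant (an illustrative sketch of the expected semantics, not the shader or driver code; little-endian lane order assumed):

```python
def unpack8_signed(word):
    # Reinterpret a uint32 as four signed 8-bit lanes, lowest byte first.
    assert 0 <= word <= 0xFFFFFFFF
    lanes = []
    for i in range(4):
        b = (word >> (8 * i)) & 0xFF
        lanes.append(b - 256 if b >= 128 else b)  # two's-complement reinterpret
    return lanes

print(unpack8_signed(0x80FF017F))  # → [127, 1, -1, -128]
```

A driver that gets any of these lanes wrong corrupts every dequantized weight that flows through the affected shader, which is why the failures show up as near-random matmul output rather than small precision drift.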

@0cc4m
Collaborator

0cc4m commented Jan 14, 2025

@AMD-dwang Could you also look into why VK_KHR_cooperative_matrix reports support on all AMD GPUs on the Windows driver, but crashes on pipeline creation if you try to use it on a GPU without WMMA support? I think it should only be reporting support on RDNA3, and this is also how Mesa RADV behaves.

I had to implement a workaround in #11074 to avoid the crash.

@AMD-dwang
Contributor

@0cc4m Due to certain commercial reasons, we simulated the WMMA instruction, and thus it was reported as supported. I have checked it, and there was indeed a minor issue, which has already been fixed locally.

@0cc4m
Collaborator

0cc4m commented Jan 15, 2025

Due to certain commercial reasons, we simulated the WMMA instruction, and thus it was reported as supported. I have checked it, and there was indeed a minor issue, which has already been fixed locally.

I understand, but if I don't know whether support means hardware or emulation, how can I pick the right codepath? I tested this on amdvlk on Linux with an RDNA2 GPU and the simulated WMMA was slower than the non-WMMA path. I would need to know if hardware support is there to be able to pick non-WMMA in this case.

@AMD-dwang
Contributor

@0cc4m Thank you for your feedback. We don't have such a query for now, and we are discussing how to deal with the requirement you mentioned.

@maicogg

maicogg commented Jan 24, 2025

I haven't been able to reproduce this locally with 504af20? @stduhpf @maicogg Would you mind sharing your build options and the commands for executing the tests?

I'm using the binaries downloaded from the releases page and trying to use the chat UI with any Q2 model. I don't remember which one I used last time (given the number of layers, probably some Qwen 2.5 14B variant), but I have definitely observed this with Llama 3.3 as well.

@stduhpf
Contributor Author

stduhpf commented Feb 7, 2025

It's now working for me? Looks like commit adc5dd9 (#11081) fixed this issue as a side effect. Should I close it?

@netrunnereve
Collaborator

It's now working for me? Looks like commit adc5dd9 (#11081) fixed this issue as a side effect. Should I close it?

Let's keep this open until AMD fixes their driver. We use a lot of unpack8s and I wouldn't be surprised if our next update breaks it again.

@LostRuins
Collaborator

I think it's broken again

@stduhpf
Contributor Author

stduhpf commented Mar 4, 2025

I think it's broken again

It's still working for me: no failing ops involving Q2_K or Q3_K (I tested with commit 1a24c46).

Failing ops
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 5700 XT (AMD proprietary driver) | uma: 0 | fp16: 1 | warp size: 64 | shared memory: 32768 | matrix cores: none

  CPY(type_src=q8_0,type_dst=f32,ne=[256,4,4,4],permute=[0,0,0,0]): [CPY] NMSE = 1.006753381 > 0.000001000 FAIL
  CPY(type_src=q8_0,type_dst=f32,ne=[256,2,3,4],permute=[0,2,1,3]): [CPY] NMSE = 1.011567699 > 0.000001000 FAIL
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NMSE = 1.297765718 > 0.000500000 FAIL
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=2,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NMSE = 0.553051218 > 0.000500000 FAIL
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=3,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NMSE = 0.869239359 > 0.000500000 FAIL
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=4,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NMSE = 1.439264524 > 0.000500000 FAIL
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=5,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NMSE = 0.699665994 > 0.000500000 FAIL
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=6,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NMSE = 1.071328129 > 0.000500000 FAIL
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=7,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NMSE = 0.675228487 > 0.000500000 FAIL
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=8,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NMSE = 1.009983363 > 0.000500000 FAIL
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=9,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NMSE = 1.094289478 > 0.000500000 FAIL
  MUL_MAT(type_a=iq2_s,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NMSE = 2.375560398 > 0.000500000 FAIL
  MUL_MAT(type_a=iq2_s,type_b=f32,m=16,n=2,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NMSE = 1.016003854 > 0.000500000 FAIL
  MUL_MAT(type_a=iq2_s,type_b=f32,m=16,n=3,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NMSE = 0.708756301 > 0.000500000 FAIL
  MUL_MAT(type_a=iq2_s,type_b=f32,m=16,n=4,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NMSE = 0.580995474 > 0.000500000 FAIL
  MUL_MAT(type_a=iq2_s,type_b=f32,m=16,n=5,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NMSE = 0.907512231 > 0.000500000 FAIL
  MUL_MAT(type_a=iq2_s,type_b=f32,m=16,n=6,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NMSE = 1.285926808 > 0.000500000 FAIL
  MUL_MAT(type_a=iq2_s,type_b=f32,m=16,n=7,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NMSE = 0.950682398 > 0.000500000 FAIL
  MUL_MAT(type_a=iq2_s,type_b=f32,m=16,n=8,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NMSE = 1.156122393 > 0.000500000 FAIL
  MUL_MAT(type_a=iq2_s,type_b=f32,m=16,n=9,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NMSE = 0.793857143 > 0.000500000 FAIL
  MUL_MAT(type_a=iq3_s,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NMSE = 0.137180730 > 0.000500000 FAIL
  MUL_MAT(type_a=iq3_s,type_b=f32,m=16,n=2,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NMSE = 0.204434846 > 0.000500000 FAIL
  MUL_MAT(type_a=iq3_s,type_b=f32,m=16,n=3,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NMSE = 0.260482614 > 0.000500000 FAIL
  MUL_MAT(type_a=iq3_s,type_b=f32,m=16,n=4,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NMSE = 0.407426155 > 0.000500000 FAIL
  MUL_MAT(type_a=iq3_s,type_b=f32,m=16,n=5,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NMSE = 0.267088264 > 0.000500000 FAIL
  MUL_MAT(type_a=iq3_s,type_b=f32,m=16,n=6,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NMSE = 0.327520301 > 0.000500000 FAIL
  MUL_MAT(type_a=iq3_s,type_b=f32,m=16,n=7,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NMSE = 0.312981596 > 0.000500000 FAIL
  MUL_MAT(type_a=iq3_s,type_b=f32,m=16,n=8,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NMSE = 0.264415169 > 0.000500000 FAIL
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NMSE = 0.587586480 > 0.000500000 FAIL
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[2,1],per=[0,1,2,3]): [MUL_MAT] NMSE = 0.938693419 > 0.000500000 FAIL
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=1,k=256,bs=[3,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NMSE = 0.824696103 > 0.000500000 FAIL
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=1,k=256,bs=[3,1],nr=[2,1],per=[0,1,2,3]): [MUL_MAT] NMSE = 1.039639001 > 0.000500000 FAIL
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=1,k=256,bs=[3,2],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NMSE = 0.729518351 > 0.000500000 FAIL
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=1,k=256,bs=[3,2],nr=[2,1],per=[0,1,2,3]): [MUL_MAT] NMSE = 1.051650826 > 0.000500000 FAIL
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=16,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NMSE = 0.966857933 > 0.000500000 FAIL
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=16,k=256,bs=[1,1],nr=[2,1],per=[0,1,2,3]): [MUL_MAT] NMSE = 1.110715700 > 0.000500000 FAIL
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=16,k=256,bs=[3,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NMSE = 0.992460609 > 0.000500000 FAIL
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=16,k=256,bs=[3,1],nr=[2,1],per=[0,1,2,3]): [MUL_MAT] NMSE = 1.032713887 > 0.000500000 FAIL
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=16,k=256,bs=[3,2],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NMSE = 0.965146874 > 0.000500000 FAIL
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=16,k=256,bs=[3,2],nr=[2,1],per=[0,1,2,3]): [MUL_MAT] NMSE = 0.960821857 > 0.000500000 FAIL
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=1,k=32,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NMSE = 1.512148690 > 0.000500000 FAIL
  MUL_MAT(type_a=q8_0,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NMSE = 0.745953987 > 0.000500000 FAIL
  MUL_MAT(type_a=iq2_s,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NMSE = 0.876719118 > 0.000500000 FAIL
  MUL_MAT(type_a=iq3_s,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NMSE = 0.098703940 > 0.000500000 FAIL
  MUL_MAT_ID(type_a=q8_0,type_b=f32,n_mats=4,n_used=1,b=0,m=512,n=1,k=256): [MUL_MAT_ID] NMSE = 1.055825890 > 0.000500000 FAIL
  MUL_MAT_ID(type_a=q8_0,type_b=f32,n_mats=4,n_used=1,b=0,m=512,n=32,k=256): [MUL_MAT_ID] NMSE = 0.993552365 > 0.000500000 FAIL
  MUL_MAT_ID(type_a=q8_0,type_b=f32,n_mats=4,n_used=1,b=1,m=512,n=1,k=256): [MUL_MAT_ID] NMSE = 0.686011279 > 0.000500000 FAIL
  MUL_MAT_ID(type_a=q8_0,type_b=f32,n_mats=4,n_used=1,b=1,m=512,n=32,k=256): [MUL_MAT_ID] NMSE = 1.008443747 > 0.000500000 FAIL
  MUL_MAT_ID(type_a=q8_0,type_b=f32,n_mats=4,n_used=2,b=0,m=512,n=1,k=256): [MUL_MAT_ID] NMSE = 1.104394283 > 0.000500000 FAIL
  MUL_MAT_ID(type_a=q8_0,type_b=f32,n_mats=4,n_used=2,b=0,m=512,n=32,k=256): [MUL_MAT_ID] NMSE = 1.004665998 > 0.000500000 FAIL
  MUL_MAT_ID(type_a=q8_0,type_b=f32,n_mats=4,n_used=2,b=1,m=512,n=1,k=256): [MUL_MAT_ID] NMSE = 1.111417574 > 0.000500000 FAIL
  MUL_MAT_ID(type_a=q8_0,type_b=f32,n_mats=4,n_used=2,b=1,m=512,n=32,k=256): [MUL_MAT_ID] NMSE = 0.987933471 > 0.000500000 FAIL
  MUL_MAT_ID(type_a=q8_0,type_b=f32,n_mats=4,n_used=4,b=0,m=512,n=1,k=256): [MUL_MAT_ID] NMSE = 0.903597204 > 0.000500000 FAIL
  MUL_MAT_ID(type_a=q8_0,type_b=f32,n_mats=4,n_used=4,b=0,m=512,n=32,k=256): [MUL_MAT_ID] NMSE = 0.980636509 > 0.000500000 FAIL
  MUL_MAT_ID(type_a=q8_0,type_b=f32,n_mats=4,n_used=4,b=1,m=512,n=1,k=256): [MUL_MAT_ID] NMSE = 0.936733000 > 0.000500000 FAIL
  MUL_MAT_ID(type_a=q8_0,type_b=f32,n_mats=4,n_used=4,b=1,m=512,n=32,k=256): [MUL_MAT_ID] NMSE = 0.993888973 > 0.000500000 FAIL
  MUL_MAT_ID(type_a=q8_0,type_b=f32,n_mats=8,n_used=1,b=0,m=512,n=1,k=256): [MUL_MAT_ID] NMSE = 0.905662142 > 0.000500000 FAIL
  MUL_MAT_ID(type_a=q8_0,type_b=f32,n_mats=8,n_used=1,b=0,m=512,n=32,k=256): [MUL_MAT_ID] NMSE = 1.005638568 > 0.000500000 FAIL
  MUL_MAT_ID(type_a=q8_0,type_b=f32,n_mats=8,n_used=1,b=1,m=512,n=1,k=256): [MUL_MAT_ID] NMSE = 1.184841917 > 0.000500000 FAIL
  MUL_MAT_ID(type_a=q8_0,type_b=f32,n_mats=8,n_used=1,b=1,m=512,n=32,k=256): [MUL_MAT_ID] NMSE = 1.026184772 > 0.000500000 FAIL
  MUL_MAT_ID(type_a=q8_0,type_b=f32,n_mats=8,n_used=2,b=0,m=512,n=1,k=256): [MUL_MAT_ID] NMSE = 1.105012841 > 0.000500000 FAIL
  MUL_MAT_ID(type_a=q8_0,type_b=f32,n_mats=8,n_used=2,b=0,m=512,n=32,k=256): [MUL_MAT_ID] NMSE = 1.022058168 > 0.000500000 FAIL
  MUL_MAT_ID(type_a=q8_0,type_b=f32,n_mats=8,n_used=2,b=1,m=512,n=1,k=256): [MUL_MAT_ID] NMSE = 1.157416273 > 0.000500000 FAIL
  MUL_MAT_ID(type_a=q8_0,type_b=f32,n_mats=8,n_used=2,b=1,m=512,n=32,k=256): [MUL_MAT_ID] NMSE = 0.981107526 > 0.000500000 FAIL
  MUL_MAT_ID(type_a=q8_0,type_b=f32,n_mats=8,n_used=4,b=0,m=512,n=1,k=256): [MUL_MAT_ID] NMSE = 1.010724649 > 0.000500000 FAIL
  MUL_MAT_ID(type_a=q8_0,type_b=f32,n_mats=8,n_used=4,b=0,m=512,n=32,k=256): [MUL_MAT_ID] NMSE = 1.014714921 > 0.000500000 FAIL
  MUL_MAT_ID(type_a=q8_0,type_b=f32,n_mats=8,n_used=4,b=1,m=512,n=1,k=256): [MUL_MAT_ID] NMSE = 1.006842633 > 0.000500000 FAIL
  MUL_MAT_ID(type_a=q8_0,type_b=f32,n_mats=8,n_used=4,b=1,m=512,n=32,k=256): [MUL_MAT_ID] NMSE = 1.027740091 > 0.000500000 FAIL
  MUL_MAT_ID(type_a=q8_0,type_b=f32,n_mats=4,n_used=2,b=0,m=512,n=1,k=256): [MUL_MAT_ID] NMSE = 1.155808252 > 0.000500000 FAIL
  MUL_MAT_ID(type_a=q8_0,type_b=f32,n_mats=4,n_used=2,b=0,m=512,n=32,k=256): [MUL_MAT_ID] NMSE = 1.009131869 > 0.000500000 FAIL
  MUL_MAT_ID(type_a=iq2_s,type_b=f32,n_mats=4,n_used=2,b=0,m=512,n=1,k=256): [MUL_MAT_ID] NMSE = 1.081037736 > 0.000500000 FAIL
  MUL_MAT_ID(type_a=iq2_s,type_b=f32,n_mats=4,n_used=2,b=0,m=512,n=32,k=256): [MUL_MAT_ID] NMSE = 0.814490982 > 0.000500000 FAIL
  MUL_MAT_ID(type_a=iq3_s,type_b=f32,n_mats=4,n_used=2,b=0,m=512,n=1,k=256): [MUL_MAT_ID] NMSE = 0.264407573 > 0.000500000 FAIL
  Backend Vulkan0: FAIL
FAIL

@LostRuins
Collaborator

Hi @stduhpf , if you revert #12015 does the q8_0 mm work correctly?

@stduhpf
Contributor Author

stduhpf commented Mar 4, 2025

@LostRuins
Reverting it does fix the Q8_0 issues, yes:

ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 5700 XT (AMD proprietary driver) | uma: 0 | fp16: 1 | warp size: 64 | shared memory: 32768 | matrix cores: none

  CPY(type_src=f32,type_dst=q4_1,ne=[256,2,3,4],permute=[0,2,1,3]): [CPY] NMSE = 0.000001015 > 0.000001000 FAIL
  MUL_MAT(type_a=iq2_s,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NMSE = 0.932453388 > 0.000500000 FAIL
  MUL_MAT(type_a=iq2_s,type_b=f32,m=16,n=2,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NMSE = 0.821963221 > 0.000500000 FAIL
  MUL_MAT(type_a=iq2_s,type_b=f32,m=16,n=3,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NMSE = 0.783999011 > 0.000500000 FAIL
  MUL_MAT(type_a=iq2_s,type_b=f32,m=16,n=4,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NMSE = 1.100626537 > 0.000500000 FAIL
  MUL_MAT(type_a=iq2_s,type_b=f32,m=16,n=5,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NMSE = 1.170523144 > 0.000500000 FAIL
  MUL_MAT(type_a=iq2_s,type_b=f32,m=16,n=6,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NMSE = 0.952728532 > 0.000500000 FAIL
  MUL_MAT(type_a=iq2_s,type_b=f32,m=16,n=7,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NMSE = 0.964115130 > 0.000500000 FAIL
  MUL_MAT(type_a=iq2_s,type_b=f32,m=16,n=8,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NMSE = 0.988068834 > 0.000500000 FAIL
  MUL_MAT(type_a=iq2_s,type_b=f32,m=16,n=9,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NMSE = 0.834668198 > 0.000500000 FAIL
  MUL_MAT(type_a=iq3_s,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NMSE = 0.151064371 > 0.000500000 FAIL
  MUL_MAT(type_a=iq3_s,type_b=f32,m=16,n=2,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NMSE = 0.390272579 > 0.000500000 FAIL
  MUL_MAT(type_a=iq3_s,type_b=f32,m=16,n=3,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NMSE = 0.367480533 > 0.000500000 FAIL
  MUL_MAT(type_a=iq3_s,type_b=f32,m=16,n=4,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NMSE = 0.348551188 > 0.000500000 FAIL
  MUL_MAT(type_a=iq3_s,type_b=f32,m=16,n=5,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NMSE = 0.260043628 > 0.000500000 FAIL
  MUL_MAT(type_a=iq3_s,type_b=f32,m=16,n=6,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NMSE = 0.307653330 > 0.000500000 FAIL
  MUL_MAT(type_a=iq3_s,type_b=f32,m=16,n=7,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NMSE = 0.231601266 > 0.000500000 FAIL
  MUL_MAT(type_a=iq3_s,type_b=f32,m=16,n=8,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NMSE = 0.242987874 > 0.000500000 FAIL
  MUL_MAT(type_a=iq2_s,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NMSE = 0.618027495 > 0.000500000 FAIL
  MUL_MAT(type_a=iq3_s,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[1,1],per=[0,1,2,3]): [MUL_MAT] NMSE = 0.213345884 > 0.000500000 FAIL
  MUL_MAT_ID(type_a=iq2_s,type_b=f32,n_mats=4,n_used=2,b=0,m=512,n=1,k=256): [MUL_MAT_ID] NMSE = 0.954359298 > 0.000500000 FAIL
  MUL_MAT_ID(type_a=iq2_s,type_b=f32,n_mats=4,n_used=2,b=0,m=512,n=32,k=256): [MUL_MAT_ID] NMSE = 0.816401291 > 0.000500000 FAIL
  MUL_MAT_ID(type_a=iq3_s,type_b=f32,n_mats=4,n_used=2,b=0,m=512,n=1,k=256): [MUL_MAT_ID] NMSE = 0.291787007 > 0.000500000 FAIL
  Backend Vulkan0: FAIL
FAIL

This is arguably worse than the original issue, because Q8_0 is much more widely used than Q2_K or Q3_K.

@LostRuins
Collaborator

Thank you.

@stduhpf stduhpf changed the title Eval bug: Q2_K and Q3_K not working on Vulkan anymore on RX 5700XT Eval bug: ~~Q2_K and Q3_K~~ Q8_0 not working on Vulkan anymore on RX 5700XT Mar 4, 2025
@stduhpf
Contributor Author

stduhpf commented Mar 6, 2025

I just tried the latest Windows driver update (Adrenalin v25.3.1 / driver 32.0.13031.3015), and the issue remains :(

(Plus a new small issue with the f32 -> q4_1 CPY test: NMSE = 0.000001034 > 0.000001000. It shouldn't be a problem, but it's worth noting.)

@stduhpf
Contributor Author

stduhpf commented Mar 6, 2025

@AMD-dwang any news?

@0cc4m
Collaborator

0cc4m commented Mar 7, 2025

That's very disappointing. Did the fix not make it into the driver?

@LostRuins
Collaborator

@stduhpf (or anyone else with a rx 5700xt) could I trouble you to run the tests on latest with your AMD card again?

After @0cc4m merged #12472 I un-reverted #12015, since I thought we had a workaround, but now @Danik-droid is reporting Q8_0-related issues again (LostRuins#1459).

@stduhpf
Contributor Author

stduhpf commented Apr 2, 2025

@LostRuins All tests pass for me right now (on commit 833e2b7)

@LostRuins
Collaborator

Thank you.

@0cc4m
Collaborator

0cc4m commented Apr 2, 2025

I messed up the new shaders (the fix is #12722); that's probably the problem. It does not affect the 5700 XT, since most of RDNA1 doesn't support DP4A, but the 6900 XT is definitely affected.

@github-actions github-actions bot added the stale label May 3, 2025

This issue was closed because it has been inactive for 14 days since being marked as stale.
