Eval bug: ~~Q2_K and Q3_K~~ Q8_0 not working on Vulkan anymore on RX 5700XT #10710
Comments
I can also confirm this. I have exactly the same GPU, and I'm on Arch Linux with everything updated to the latest. Regarding the log below, https://en.wikipedia.org/wiki/Radeon_RX_5000_series states that:
Partial build log:
I'm not entirely sure, but I think the two bugs are unrelated. The Arch issue was introduced with today's Vulkan update to 1.4.303 on Arch. This GitHub issue is instead related to a proprietary driver issue on Windows.
I'll take a look at this. The build log in the second comment looks unrelated, and would happen if you're using newer Vulkan-Headers with an older glslc. @mtasic85 are you overriding the Vulkan-Headers?
The Arch issue may just be that glslc hasn't been updated in the repositories yet (they'll probably update it later today). EDIT: Apparently shaderc's latest version still isn't released (there's still no tag on GitHub), so until they release the new version I don't think Arch will update the package.
@stduhpf I haven't been able to reproduce this locally on RTX 4070. Does test-backend-ops pass for you?
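For anyone following along: test-backend-ops is built alongside the other binaries and can be narrowed down to the suspect ops. A minimal sketch, assuming a Vulkan build in ./build; the -o op filter exists in recent revisions, but exact flags may vary by version:
# run only the matrix multiplication tests
./build/bin/test-backend-ops test -o MUL_MAT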
@mtasic85 or @daniandtheweb, can you check if #10713 resolves the build issue for you?
#10713 fixes the build issue.
It doesn't. I have seen the issue with the AMD proprietary driver as well, but haven't yet had the time to investigate. It's another case in an already too long list of issues that are not caused by the hardware (since Mesa RADV works fine) but by the closed-source driver.
I can confirm, test-backend-ops fails for both Q2_K and Q3_K:
@jeffbolznv The project builds now and works with the latest #10731 on updated Arch Linux with the open-source Radeon Vulkan implementation. EDIT: just found out that Q4_K_M models (smollm2, qwen 2.5, rwkv) produce gibberish.
@mtasic85 But [...]
NOTE: there are two logs, one for the older 6fe6247 and one for the latest 3d98b4c. I had to check out the specific revision 6fe6247 from git to get things working. With it, everything works fine. Here are the test results for 6fe6247:
However, the latest revision 3d98b4c has many failed tests:
@mtasic85 You are using AMDVLK. I guess it also doesn't have a proper implementation of [...]. Can you run [...]? But in general you should really be using RADV, not AMDVLK. It's basically better in every way and more stable.
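For context, which Vulkan driver gets used at runtime can be controlled by pointing the loader at a specific ICD manifest. A minimal sketch: the manifest paths below are common Linux locations but are distro-specific assumptions, model.gguf is a placeholder, and the vk_radv / vk_amdvlk prefixes in the next comment presumably wrap something like this:
# run with Mesa RADV
VK_ICD_FILENAMES=/usr/share/vulkan/icd.d/radeon_icd.x86_64.json ./build/bin/llama-cli -m model.gguf -ngl 99
# run with AMDVLK
VK_ICD_FILENAMES=/usr/share/vulkan/icd.d/amd_icd64.json ./build/bin/llama-cli -m model.gguf -ngl 99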
@0cc4m @stduhpf @jeffbolznv Since you mentioned AMDVLK vs RADV, I got the idea to try the following experiments. Setup:
I have installed the following packages:
vk_radv cmake -B build -DGGML_VULKAN=1 -DLLAMA_CURL=1
vk_radv cmake --build build --config Release
vk_radv ./build/bin/llama-cli -p 512 --hf-repo "Qwen/Qwen2.5-0.5B-Instruct-GGUF" --hf-file "qwen2.5-0.5b-instruct-q4_k_m.gguf" --prompt 'Meaning of life is' -ngl 99
vk_amdvlk cmake -B build -DGGML_VULKAN=1 -DLLAMA_CURL=1
vk_amdvlk cmake --build build --config Release
vk_radv ./build/bin/llama-cli -p 512 --hf-repo "Qwen/Qwen2.5-0.5B-Instruct-GGUF" --hf-file "qwen2.5-0.5b-instruct-q4_k_m.gguf" --prompt 'Meaning of life is' -ngl 99
vk_amdvlk ./build/bin/llama-cli -p 512 --hf-repo "Qwen/Qwen2.5-0.5B-Instruct-GGUF" --hf-file "qwen2.5-0.5b-instruct-q4_k_m.gguf" --prompt 'Meaning of life is' -ngl 99
export GGML_VK_DISABLE_COOPMAT=1
cmake -B build -DGGML_VULKAN=1 -DLLAMA_CURL=1
cmake --build build --config Release
./build/bin/llama-cli -p 512 --hf-repo "Qwen/Qwen2.5-0.5B-Instruct-GGUF" --hf-file "qwen2.5-0.5b-instruct-q4_k_m.gguf" --prompt 'Meaning of life is' -ngl 99
New terminal also works after build:
GGML_VK_DISABLE_COOPMAT=1 ./build/bin/llama-cli -p 512 --hf-repo "Qwen/Qwen2.5-0.5B-Instruct-GGUF" --hf-file "qwen2.5-0.5b-instruct-q4_k_m.gguf" --prompt 'Meaning of life is' -ngl 99
export GGML_VK_DISABLE_COOPMAT=1
cmake -B build -DGGML_VULKAN=1 -DLLAMA_CURL=1
cmake --build build --config Release
New terminal:
./build/bin/llama-cli -p 512 --hf-repo "Qwen/Qwen2.5-0.5B-Instruct-GGUF" --hf-file "qwen2.5-0.5b-instruct-q4_k_m.gguf" --prompt 'Meaning of life is' -ngl 99
Conclusion:
@mtasic85 The driver is only relevant when running the program, not when building it.
When building on AlmaLinux 8 x86_64 I get the errors below. I noticed usage of Vulkan 1.2 instead of 1.3.
cmake -B build -DGGML_VULKAN=1 -DLLAMA_CURL=1
Output:
cmake --build build --config Release
Part of log:
@mtasic85 Can you post your Vulkan header version and glslc version? My guess would be that at least the compiler is too old.
Here it is, but keep in mind that the packages were installed as follows:
dnf install -y vulkan-tools vulkan-headers vulkan-loader vulkan-loader-devel vulkan-validation-layers spirv-tools
dnf install -y https://pkgs.sysadmins.ws/el8/extras/x86_64/glslc-2023.1-3.el8.x86_64.rpm https://pkgs.sysadmins.ws/el8/extras/x86_64/glslang-12.0.0-1.el8.x86_64.rpm
Yeah, the header is fine, but the glslc version is too old. You could build it yourself, it's not complicated: https://github.com/google/shaderc#getting-and-building-shaderc Edit: The package could also be called [...]
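A minimal sketch of such a build, following the steps in the linked README (the output path is the usual location, but treat it as an assumption):
git clone https://github.com/google/shaderc
cd shaderc
./utils/git-sync-deps
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j
# the compiler ends up at build/glslc/glslc; put it on PATH or point the llama.cpp build at it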
It works after manually building it.
I think this was fixed, but I don't know by which PR. Can you retest, @stduhpf?
I'm on version 24.12.1 (32.0.12033.1030), this should be the latest one I think. Maybe it's an RDNA1-only issue? By the way, I noticed that the NMSE values for the failing MUL_MATs are different with every run of test-backend-ops.
I tested on RX 6800 XT and it also fails on q2_k and q3_k. So I guess it affects at least RDNA1 and 2, but not RDNA3. Weird. All tests were on the 24.12.1 driver. It's quite annoying to keep having to chase specific driver issues on Windows AMD when the same hardware works just fine on Mesa. :/
I think the input data is randomized, so it's probably expected.
@jeffbolznv Any updates on this? It's still broken for me as of now, and with iq2 and iq3 also being incompatible with Vulkan, there are no good alternatives for running models at low bpw (other than changing GPU or OS, or rolling back to older versions).
This seems to be a driver bug, so I don't have any way to fix it, unless other optimizations we do in the future happen to avoid the bug. Are iq2/iq3 something we should support in Vulkan?
I can't test it right now, but you did use an RDNA1 or RDNA2 GPU on Windows, right? I saw the issue with the latest 24.12.something AMD driver. Just running [...]
Yes, I am using RDNA2 (AMD Radeon RX 6600 XT) on Windows, but the driver is different. I will test it again with the driver you mentioned. Thanks.
I have reproduced this problem. As @netrunnereve said, there is an issue with the unpack8 function and we will fix it in the next version.
@AMD-dwang Could you also look into why I had to implement a workaround in #11074 to avoid the crash?
@0cc4m Due to certain commercial reasons, we simulated the WMMA instruction, and thus it was reported as supported. I have checked it, and there was indeed a minor issue, which has already been fixed locally.
I understand, but if I don't know whether support means hardware or emulation, how can I pick the right codepath? I tested this with amdvlk on Linux on an RDNA2 GPU and the simulated WMMA was slower than the non-WMMA path. I would need to know whether hardware support is there to be able to pick the non-WMMA path in this case.
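For reference, whether a driver advertises cooperative matrix support at all can be checked from the command line, though this only shows that the extension is reported, not whether it is hardware-backed or emulated, which is exactly the problem described above. A sketch, assuming vulkan-tools is installed:
vulkaninfo | grep -i VK_KHR_cooperative_matrix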
@0cc4m Thank you for your feedback. We don't have such a query for now, and we are discussing how to deal with the requirement you mentioned.
I'm using the binaries downloaded from the releases page, and trying to use the chat UI with any Q2 model. I don't remember which one I used last time; given the number of layers, probably some Qwen 2.5 14B variant, but I have definitely observed it with Llama 3.3 as well.
I think it's broken again |
It's still working for me. No failing ops involving Q2_K or Q3_K (I tested with commit 1a24c46). Failing ops:
@LostRuins This is arguably worse than the original issue then, because Q8_0 is much more used than Q2_K or Q3_K.
Thank you.
I just tried the latest Windows driver update (Adrenalin v25.3.1 / Driver 32.0.13031.3015), and the issue remains :( (plus a new small issue with f32 -> q4_1 CPY: NMSE = 0.000001034 > 0.000001000, which shouldn't be a problem, but it's worth noting).
@AMD-dwang any news?
That's very disappointing. Did the fix not make it into the driver?
@stduhpf (or anyone else with an RX 5700 XT) could I trouble you to run the tests on latest with your AMD card again? After @0cc4m merged #12472 I re-reverted #12015, as I thought we had a workaround, but now @Danik-droid is mentioning Q8_0-related issues again (LostRuins#1459).
@LostRuins All tests pass for me right now (on commit 833e2b7).
Thank you.
I messed up the new shaders (fix is #12722), that's probably the problem. It does not affect the 5700 XT, since most of RDNA1 doesn't support DP4A, but the 6900 XT is definitely affected.
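Whether a driver reports accelerated integer dot products (DP4A) can be checked in a similar way, since the capability surfaces through the VK_KHR_shader_integer_dot_product properties; a sketch, assuming vulkan-tools:
vulkaninfo | grep -i integerDotProduct4x8Bit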
This issue was closed because it has been inactive for 14 days since being marked as stale. |
Name and Version
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 5700 XT (AMD proprietary driver) | uma: 0 | fp16: 1 | warp size: 64 | shared memory: 32768 | matrix cores: none
version: 4820 (1a24c46)
built with MSVC 19.42.34435.0 for x64
Operating systems
Windows
GGML backends
Vulkan
Hardware
Ryzen 5900X + RX 5700 XT
Models
Any model that has Q8_0 tensors in it.
Problem description & steps to reproduce
Complete gibberish/noise output.
I noticed this issue with stable-diffusion.cpp at first, but I can reproduce it here.
To reproduce, simply start inference with any q8_0 model, with -ngl set to anything but 0.
First Bad Commit
fbeda90 (#12015)
Relevant log output
Example command:
.\build\bin\Release\llama-cli.exe -m .\models\gemma-2b-Q8_0.gguf -no-cnv -ngl 19 -t 6 -tb 12 -p "The meaning of life is"
Output:
Reverting fbeda90 fixes it.
Older q2_k/q3_k related issue (fixed by adc5dd9, #11081)
Name and Version
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 5700 XT (AMD proprietary driver) | uma: 0 | fp16: 1 | warp size: 64
version: 4277 (c5ede38)
built with MSVC 19.41.34120.0 for x64
Operating systems
Windows
GGML backends
Vulkan
Hardware
Ryzen 5900X + RX 5700 XT
Models
Any model that has Q3_K or Q2_K tensors in it.
Problem description & steps to reproduce
Complete gibberish/noise output.
I noticed this issue with stable-diffusion.cpp at first, but I can reproduce it here.
To reproduce, simply start inference with any q3_k_x or q2_k_x model, with -ngl set to anything but 0.
First Bad Commit
4a57d36 (#10459)
Relevant log output
Example command:
.\build\bin\Release\llama-cli.exe -m .\models\Mistral-7B-v0.2-hf-Q3_K_L.gguf -ngl 24 -t 6 -tb 12 -p "The meaning of life is"
Output:
Reverting 4a57d36 fixes it.