Name and Version
$ .\llama-cli.exe --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Radeon RX 7900 XTX, compute capability 11.0, VMM: no
version: 4450 (8d59d911)
built with for x86_64-pc-windows-msvc
Operating systems
Windows 11 24H2 Build 26100.2605
Which llama.cpp modules do you know to be affected?
llama-bench
Command line
$ .\llama-bench.exe -m Qwen2.5-14B-Instruct-Q4_K_M.gguf
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Radeon RX 7900 XTX, compute capability 11.0, VMM: no
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------------: | -------------------: |
| qwen2 14B Q4_K - Medium | 8.37 GiB | 14.77 B | ROCm,RPC | 99 | pp512 | 915.97 ± 5.53 |
| qwen2 14B Q4_K - Medium | 8.37 GiB | 14.77 B | ROCm,RPC | 99 | tg128 | 3.12 ± 0.01 |
build: 8d59d911 (4450)
Problem description & steps to reproduce
Description
It is horrendously slow; it shouldn't be this slow. You can get a sense of just how slow from the Vulkan backend results below: Vulkan, which is supposed to be the slower backend here, reaches 64.03 t/s on tg128 versus 3.12 t/s with this HIP build (roughly 20x faster), while prompt processing speed is about the same.
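For what it's worth, the slowdown can be isolated to token generation by skipping the prompt-processing test; a minimal sketch using llama-bench's standard -p/-n flags (model path as above):
$ .\llama-bench.exe -m Qwen2.5-14B-Instruct-Q4_K_M.gguf -p 0 -n 128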
Steps to reproduce
- Get the latest hipBLAS build from the releases page
- Run llama-bench.exe with a model of your choice (a sketch follows this list)
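Roughly what I run, as a minimal sketch; the zip name below is a placeholder since the exact hipBLAS asset name varies per release, and the model path is from my setup:
$ Expand-Archive .\llama-<build>-bin-win-hip-x64.zip -DestinationPath .\llama-hip
$ .\llama-hip\llama-bench.exe -m Qwen2.5-14B-Instruct-Q4_K_M.gguf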
First Bad Commit
No response
Relevant log output
No response
Additional Information
Results with other backends and builds
Vulkan Backend
$ .\llama-bench.exe -m Qwen2.5-14B-Instruct-Q4_K_M.gguf
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 7900 XTX (AMD proprietary driver) | uma: 0 | fp16: 1 | warp size: 64 | matrix cores: KHR_coopmat
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------------: | -------------------: |
ggml_vulkan: Compiling shaders............................................Done!
| qwen2 14B Q4_K - Medium | 8.37 GiB | 14.77 B | Vulkan,RPC | 99 | pp512 | 987.29 ± 0.62 |
| qwen2 14B Q4_K - Medium | 8.37 GiB | 14.77 B | Vulkan,RPC | 99 | tg128 | 64.03 ± 0.21 |
build: 8d59d911 (4450)
b3808-hip
$ .\llama-bench.exe -m Qwen2.5-14B-Instruct-Q4_K_M.gguf
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Radeon RX 7900 XTX, compute capability 11.0, VMM: no
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------------: | -------------------: |
| qwen2 ?B Q4_K - Medium | 8.37 GiB | 14.77 B | CUDA | 99 | pp512 | 914.20 ± 6.28 |
| qwen2 ?B Q4_K - Medium | 8.37 GiB | 14.77 B | CUDA | 99 | tg128 | 3.12 ± 0.01 |
build: 1e7b929 (1)
Mystery build from https://github.com/PiDanShouRouZhouXD/Sakura_Launcher_GUI/releases/tag/v0.0.3-alpha
$ .\llama-bench.exe -m Qwen2.5-14B-Instruct-Q4_K_M.gguf
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Radeon RX 7900 XTX, compute capability 11.0, VMM: no
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------------: | ---------------: |
| qwen2 ?B Q4_K - Medium | 8.37 GiB | 14.77 B | CUDA | 99 | pp512 | 1611.84 ± 6.60 |
| qwen2 ?B Q4_K - Medium | 8.37 GiB | 14.77 B | CUDA | 99 | tg128 | 53.41 ± 0.08 |
build: 641f5dd2 (3534)
Temporary Workaround
Do not use the HIP build. Use Vulkan instead.
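For example (a minimal sketch; the Vulkan zip name below is a placeholder, check the releases page for the exact asset):
$ Expand-Archive .\llama-<build>-bin-win-vulkan-x64.zip -DestinationPath .\llama-vulkan
$ .\llama-vulkan\llama-bench.exe -m Qwen2.5-14B-Instruct-Q4_K_M.gguf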