
Misc. bug: Very bad performance on Qwen 2 with HIP/ROCm #11153

Closed as not planned
@http403

Description


Name and Version

$ .\llama-cli.exe --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon RX 7900 XTX, compute capability 11.0, VMM: no
version: 4450 (8d59d911)
built with  for x86_64-pc-windows-msvc

Operating systems

Windows 11 24H2 Build 26100.2605

Which llama.cpp modules do you know to be affected?

llama-bench

Command line

$ .\llama-bench.exe -m Qwen2.5-14B-Instruct-Q4_K_M.gguf
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon RX 7900 XTX, compute capability 11.0, VMM: no
| model                          |       size |     params | backend    | ngl |          test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------------: | -------------------: |
| qwen2 14B Q4_K - Medium        |   8.37 GiB |    14.77 B | ROCm,RPC   |  99 |         pp512 |        915.97 ± 5.53 |
| qwen2 14B Q4_K - Medium        |   8.37 GiB |    14.77 B | ROCm,RPC   |  99 |         tg128 |          3.12 ± 0.01 |

build: 8d59d911 (4450)

Problem description & steps to reproduce

Description

Token generation with the HIP/ROCm build is horrendously slow. It shouldn't be this slow. You can get a sense of just how slow by comparing against the Vulkan backend results below, which is supposed to be the slower backend.

Steps to reproduce

  1. Get the latest hipBLAS build from the releases page.
  2. Run llama-bench.exe with a model of your choice.

First Bad Commit

No response

Relevant log output

No response

Additional Information

Results with other backends and builds

Vulkan Backend

$ .\llama-bench.exe -m Qwen2.5-14B-Instruct-Q4_K_M.gguf
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 7900 XTX (AMD proprietary driver) | uma: 0 | fp16: 1 | warp size: 64 | matrix cores: KHR_coopmat
| model                          |       size |     params | backend    | ngl |          test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------------: | -------------------: |
ggml_vulkan: Compiling shaders............................................Done!
| qwen2 14B Q4_K - Medium        |   8.37 GiB |    14.77 B | Vulkan,RPC |  99 |         pp512 |        987.29 ± 0.62 |
| qwen2 14B Q4_K - Medium        |   8.37 GiB |    14.77 B | Vulkan,RPC |  99 |         tg128 |         64.03 ± 0.21 |

build: 8d59d911 (4450)

b3808-hip

$ .\llama-bench.exe -m Qwen2.5-14B-Instruct-Q4_K_M.gguf
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon RX 7900 XTX, compute capability 11.0, VMM: no
| model                          |       size |     params | backend    | ngl |          test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------------: | -------------------: |
| qwen2 ?B Q4_K - Medium         |   8.37 GiB |    14.77 B | CUDA       |  99 |         pp512 |        914.20 ± 6.28 |
| qwen2 ?B Q4_K - Medium         |   8.37 GiB |    14.77 B | CUDA       |  99 |         tg128 |          3.12 ± 0.01 |

build: 1e7b929 (1)

Mystery build from https://github.com/PiDanShouRouZhouXD/Sakura_Launcher_GUI/releases/tag/v0.0.3-alpha

$ .\llama-bench.exe -m Qwen2.5-14B-Instruct-Q4_K_M.gguf
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon RX 7900 XTX, compute capability 11.0, VMM: no
| model                          |       size |     params | backend    | ngl |          test |              t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------------: | ---------------: |
| qwen2 ?B Q4_K - Medium         |   8.37 GiB |    14.77 B | CUDA       |  99 |         pp512 |   1611.84 ± 6.60 |
| qwen2 ?B Q4_K - Medium         |   8.37 GiB |    14.77 B | CUDA       |  99 |         tg128 |     53.41 ± 0.08 |

build: 641f5dd2 (3534)
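For a quick side-by-side view of the runs above, the test and t/s columns can be pulled out of llama-bench's markdown table with a short filter. This is a convenience sketch, not part of llama.cpp; it assumes a POSIX shell with awk (e.g. Git Bash on Windows) and the default llama-bench table layout shown above (| model | size | params | backend | ngl | test | t/s |).

```shell
# extract_tps: print the test name and t/s columns from llama-bench
# table rows. With the tables above this makes the gap obvious:
# ROCm tg128 at ~3.12 t/s vs Vulkan tg128 at ~64.03 t/s.
extract_tps() {
  awk -F'|' '/pp512|tg128/ {
    gsub(/ /, "", $7); gsub(/ /, "", $8);  # strip column padding
    print $7, $8
  }'
}

# Usage: pipe llama-bench output through the filter, e.g.
#   ./llama-bench -m Qwen2.5-14B-Instruct-Q4_K_M.gguf | extract_tps
```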

Temporary Workaround

Do not use the HIP build. Use Vulkan instead.
