Name and Version
$ .\llama-cli.exe --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Radeon RX 7900 XTX, compute capability 11.0, VMM: no
version: 4450 (8d59d911)
built with for x86_64-pc-windows-msvc
Operating systems
Windows 11 24H2 Build 26100.2605
Which llama.cpp modules do you know to be affected?
llama-bench
Command line
$ .\llama-bench.exe -m Qwen2.5-14B-Instruct-Q4_K_M.gguf
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Radeon RX 7900 XTX, compute capability 11.0, VMM: no
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------------: | -------------------: |
| qwen2 14B Q4_K - Medium | 8.37 GiB | 14.77 B | ROCm,RPC | 99 | pp512 | 915.97 ± 5.53 |
| qwen2 14B Q4_K - Medium | 8.37 GiB | 14.77 B | ROCm,RPC | 99 | tg128 | 3.12 ± 0.01 |
build: 8d59d911 (4450)
Problem description & steps to reproduce
Description
It is horrendously slow; it shouldn't be this slow. You can get a sense of just how slow from the Vulkan backend results below: Vulkan, which is supposed to be the slower backend here, reaches 64.03 t/s on tg128 versus 3.12 t/s with this HIP build (roughly 20x faster), while prompt processing speed is about the same.
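For what it's worth, the slowdown can be isolated to token generation by skipping the prompt-processing test; a minimal sketch using llama-bench's standard -p/-n flags (model path as above):
$ .\llama-bench.exe -m Qwen2.5-14B-Instruct-Q4_K_M.gguf -p 0 -n 128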
Steps to reproduce
- Get the latest hipBLAS build from the releases page
- Run llama-bench.exe with a model of your choice (a sketch follows this list)
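Roughly what I run, as a minimal sketch; the zip name below is a placeholder since the exact hipBLAS asset name varies per release, and the model path is from my setup:
$ Expand-Archive .\llama-<build>-bin-win-hip-x64.zip -DestinationPath .\llama-hip
$ .\llama-hip\llama-bench.exe -m Qwen2.5-14B-Instruct-Q4_K_M.gguf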
First Bad Commit
No response
Relevant log output
No response
Additional Information
Results with other backends and builds
Vulkan Backend
$ .\llama-bench.exe -m Qwen2.5-14B-Instruct-Q4_K_M.gguf
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 7900 XTX (AMD proprietary driver) | uma: 0 | fp16: 1 | warp size: 64 | matrix cores: KHR_coopmat
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------------: | -------------------: |
ggml_vulkan: Compiling shaders............................................Done!
| qwen2 14B Q4_K - Medium | 8.37 GiB | 14.77 B | Vulkan,RPC | 99 | pp512 | 987.29 ± 0.62 |
| qwen2 14B Q4_K - Medium | 8.37 GiB | 14.77 B | Vulkan,RPC | 99 | tg128 | 64.03 ± 0.21 |
build: 8d59d911 (4450)
b3808-hip
$ .\llama-bench.exe -m Qwen2.5-14B-Instruct-Q4_K_M.gguf
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Radeon RX 7900 XTX, compute capability 11.0, VMM: no
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------------: | -------------------: |
| qwen2 ?B Q4_K - Medium | 8.37 GiB | 14.77 B | CUDA | 99 | pp512 | 914.20 ± 6.28 |
| qwen2 ?B Q4_K - Medium | 8.37 GiB | 14.77 B | CUDA | 99 | tg128 | 3.12 ± 0.01 |
build: 1e7b929 (1)
Mystery build from https://github.com/PiDanShouRouZhouXD/Sakura_Launcher_GUI/releases/tag/v0.0.3-alpha
$ .\llama-bench.exe -m Qwen2.5-14B-Instruct-Q4_K_M.gguf
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Radeon RX 7900 XTX, compute capability 11.0, VMM: no
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------------: | ---------------: |
| qwen2 ?B Q4_K - Medium | 8.37 GiB | 14.77 B | CUDA | 99 | pp512 | 1611.84 ± 6.60 |
| qwen2 ?B Q4_K - Medium | 8.37 GiB | 14.77 B | CUDA | 99 | tg128 | 53.41 ± 0.08 |
build: 641f5dd2 (3534)
Temporary Workaround
Do not use the HIP build. Use Vulkan instead.
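For example (a minimal sketch; the Vulkan zip name below is a placeholder, check the releases page for the exact asset):
$ Expand-Archive .\llama-<build>-bin-win-vulkan-x64.zip -DestinationPath .\llama-vulkan
$ .\llama-vulkan\llama-bench.exe -m Qwen2.5-14B-Instruct-Q4_K_M.gguf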