HIPBLAS / ROCm low prompt eval performance

I have two MI60's that don't perform well during prompt evaluation. What could be the reason?

Model Llama3-70B Q6:

llama_print_timings: prompt eval time =    3722.63 ms /    18 tokens (  206.81 ms per token,     4.84 tokens per second)
llama_print_timings:        eval time =    4274.60 ms /    35 runs   (  122.13 ms per token,     8.19 tokens per second)


compile:

HIPCXX="$(hipconfig -l)/clang" HIP_PATH="$(hipconfig -R)"     cmake -S . -B build -DLLAMA_HIPBLAS=ON -DAMDGPU_TARGETS=gfx906 -DCMAKE_BUILD_TYPE=Release     && cmake --build build --config Release -- -j 16

ROCk module version 6.7.0 

When using an 8B model Q8, it does this:

llama_print_timings: prompt eval time =     200.58 ms /    18 tokens (   11.14 ms per token,    89.74 tokens per second)
llama_print_timings:        eval time =    1819.74 ms /    94 runs   (   19.36 ms per token,    51.66 tokens per second)


I also did this hack https://github.com/ggerganov/llama.cpp/issues/3772#issuecomment-2012606521  which fixed the garbled output issue but I don't know if it is related. 

Now I am wondering if it is a 6 bit quantization issue..

Thank you!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

HIPBLAS / ROCm low prompt eval performance #7533

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

HIPBLAS / ROCm low prompt eval performance #7533

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions