
Vulkan backend regression: gibberish output when layers offloaded to GPU #8092


Closed
Adriankhl opened this issue Jun 24, 2024 · 13 comments · Fixed by #8479
Labels
bug-unconfirmed · high severity (used to report high severity bugs in llama.cpp: malfunctioning hinders an important workflow)

Comments

@Adriankhl
Contributor

Adriankhl commented Jun 24, 2024

What happened?

OS: Windows
Compiler: cl or clang-cl
Build command: cmake .. -GNinja -DCMAKE_C_COMPILER=clang-cl -DCMAKE_CXX_COMPILER=clang-cl -DCMAKE_EXPORT_COMPILE_COMMANDS=1 -DLLAMA_NATIVE=OFF -DLLAMA_VULKAN=ON -DCMAKE_BUILD_TYPE=Debug
APU: AMD 780M
Vulkan Instance Version: 1.3.261
Vulkan SDK version: 1.3.283
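
For reference, a full configure-and-build sequence implied by the command above might look like this (a sketch, bash-style; paths and the clang-cl toolchain are assumed as quoted, and the wrapped lines should be joined on Windows cmd):

# Sketch of the configure/build steps implied by the command quoted above.
mkdir build && cd build
cmake .. -GNinja -DCMAKE_C_COMPILER=clang-cl -DCMAKE_CXX_COMPILER=clang-cl \
  -DCMAKE_EXPORT_COMPILE_COMMANDS=1 -DLLAMA_NATIVE=OFF -DLLAMA_VULKAN=ON \
  -DCMAKE_BUILD_TYPE=Debug
cmake --build .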

PR #7947 causes gibberish output when running

.\bin\llama-cli.exe -m "C:\Users\adriankhl\git\models\Meta-Llama-3-8B-Instruct.Q5_K_M.gguf" --prompt "Hello world. " -ngl 33

while setting -ngl 0 produces normal output.
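
A quick way to see the difference is to run the same prompt with and without offload; a minimal sketch (bash-style; the model path, seed, and token count are placeholders):

# Sketch: run the same prompt with and without GPU offload and compare outputs.
for NGL in 0 33; do
  echo "=== -ngl $NGL ==="
  ./bin/llama-cli -m ./models/Meta-Llama-3-8B-Instruct.Q5_K_M.gguf \
    --prompt "Hello world. " -n 32 --seed 1 -ngl $NGL
done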

Name and Version

version: 3213 (52fc870)
built with Clang 18.1.6 for

What operating system are you seeing the problem on?

Windows

Relevant log output

No response

@Adriankhl added the bug-unconfirmed and high severity labels Jun 24, 2024
@Adriankhl changed the title from "Vulkan backend regression: produce gibberish output when there are layers offloaded to GPU" to "Vulkan backend regression: gibberish output when layers offloaded to GPU" Jun 24, 2024
@stduhpf
Contributor

stduhpf commented Jun 24, 2024

Not happening on my end. Maybe you should give more details about the setup you're using (GPU model, driver version...)

@Adriankhl
Contributor Author

Not happening on my end. Maybe you should give more details about the setup you're using (GPU model, driver version...)

Added that information. It is an AMD APU; problems seem to be much more frequent on integrated GPUs, probably because @0cc4m is testing on a different setup.

@dspasyuk
Contributor

dspasyuk commented Jun 29, 2024

@Adriankhl Your issue might be with the prompt; you need to use the Llama prompt format. Try this: https://github.com/dspasyuk/llama.cui

or:

../llama.cpp/llama-cli --model ../../models/Meta-Llama-3-8B-Instruct_Q4_K_S.gguf --n-gpu-layers 35 -cnv --interactive --interactive-first --simple-io -b 2048 -n -1 -e --ctx_size 0 --temp 0.3 --top_k 10 --multiline-input --repeat_penalty 1.12 -t 8 -ptc 10 -r <|start_header_id|>user --no-display-prompt

and then prompt the model like so, and it should work (a scripted sketch of this format follows the example below): <|begin_of_text|><|start_header_id|>system<|end_header_id|>

Role and Purpose: You are Alice, a large language model. Your purpose is to assist users by providing information, answering questions, and engaging in meaningful conversations based on the data you were trained on. Behavior and Tone: Be informative, engaging, and respectful. Maintain a neutral and unbiased tone. Ensure that responses are clear and concise. Capabilities: Use your training data to provide accurate and relevant information. Explain complex concepts in an easy-to-understand manner. Provide sources when referencing specific information or data. Output Formatting: Use this formatting for code: language /n<|eot_id|><|start_header_id|>user<|end_header_id|>

Answer the following questions:

  1. The day before two days after the day before tomorrow is Saturday. What day is it today?
  2. What is the square root of 169?
  3. Solve the equation 3y = 6y + 11 and find y.
  4. There are two ducks in front of a duck, two ducks behind a duck, and a duck in the middle. How many ducks are there?
  5. How many days does it take to travel from New York City to London by plane, assuming non-stop flights and average speeds?
  6. What are the products of the chemical reaction between salicylic acid and acetic anhydride?
  7. If five cats can catch five mice in five minutes, how long will it take one cat to catch one mouse?
  8. Create a JS program that prints the first 100 Fibonacci numbers. <|eot_id|><|start_header_id|>assistant<|end_header_id|>
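
A minimal sketch of assembling that prompt format in a script (bash-style; the system/user text and the model path are placeholders, and -e makes llama-cli interpret the \n escapes):

# Sketch: build the Llama 3 Instruct prompt described above and pass it to llama-cli.
SYSTEM="You are Alice, a large language model."
USER="What is the square root of 169?"
PROMPT="<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n${SYSTEM}<|eot_id|>"
PROMPT="${PROMPT}<|start_header_id|>user<|end_header_id|>\n\n${USER}<|eot_id|>"
PROMPT="${PROMPT}<|start_header_id|>assistant<|end_header_id|>\n\n"
./llama-cli --model ../../models/Meta-Llama-3-8B-Instruct_Q4_K_S.gguf \
  --n-gpu-layers 35 -e -p "$PROMPT" -n 128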

@stduhpf
Contributor

stduhpf commented Jun 29, 2024

@dspasyuk If it were an issue with the prompt format, it wouldn't matter whether it was run with -ngl 33 or -ngl 0.

@dspasyuk
Contributor

@stduhpf Perhaps, but in my hands with Llama Instruct, output is erratic if you do not use the proper prompt format: one prompt is fine and the next is not. It becomes more apparent in conversation mode (-cnv).

@giladgd
Contributor

giladgd commented Jun 29, 2024

I also encountered this issue where it generates gibberish, and it only happens with Vulkan.
Using the same code compiled with CUDA works just fine.

Running this command:

./llama-cli --model Meta-Llama-3-8B-Instruct.Q4_K_M.gguf --prompt 'Hi there!' --n-predict 15 --ctx-size 4096 -ngl 33

Generates this output:

Hi there!riereolle301-wahunjar301vangvangvang Ruby Schemejs Colei

Running this command (with no GPU layers offloading):

./llama-cli --model Meta-Llama-3-8B-Instruct.Q4_K_M.gguf --prompt 'Hi there!' --n-predict 15 --ctx-size 4096 -ngl 0

Generates this output:

Hi there! I'm excited to share with you the 5th part of my

I used this model in these examples, and the code is compiled from the latest release (b3265).
Running on Ubuntu 22.04.2 LTS with NVIDIA RTX A6000.

@definitelyuncertain

definitelyuncertain commented Jul 3, 2024

I can confirm this too. I tried many releases before and after the aforementioned commit.

Gibberish output with Mistral 7B Instruct, Meta Llama 3 8B Instruct and several other models when using Vulkan, but CPU works fine.

Arch Linux ALHP repos + AMD RX 6600 XT

Update: Following #7056 I downloaded a very recent Mistral 7B Q4_K_M and it works as expected with the Vulkan backend on GPU. So the issue I was facing might have been entirely due to model format incompatibility in the old models I was using.

@LostRuins
Collaborator

Hello, I can confirm that PR #7947 definitely breaks Vulkan for me, producing incoherent responses when GPU layers are offloaded (it works fine if offload is disabled). Reverting that PR solves the issue.

Tagging @0cc4m to see if they are able to repro it.

I am using a Nvidia RTX 2060 (laptop), and running the model https://huggingface.co/TheBloke/airoboros-mistral2.2-7B-GGUF/blob/main/airoboros-mistral2.2-7b.Q4_K_S.gguf

@LostRuins
Collaborator

@0cc4m additional logs and exact llama.cpp reproduction steps as requested:

  1. Obtain llama-b3153-bin-win-vulkan-x64 (before PR 7947) and llama-b3154-bin-win-vulkan-x64 (after PR 7947)
  2. Obtain the model airoboros-mistral2.2-7b.Q4_K_S.gguf with SHA256 of ea9a7c81...37
  3. Run the CLI llama-cli.exe -m E:\LLaMA\models\airoboros-mistral2.2-7b.Q4_K_S.gguf -c 512 -ngl 20 -n 20 -p "Hello, my name is" and observe output. Repeat for both builds.
  4. Attached are my execution logs for both attempts.

Note: You may wish to repeat the test a few times. The output is not complete rubbish; it is still an English sentence, but it is an incoherent continuation. (A scripted version of this comparison is sketched below.)
3153.txt
3154.txt
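
A scripted version of steps 1-3 might look like this (a sketch, bash-style; adapt paths and the .exe suffix for the Windows builds named in step 1):

# Sketch: run the identical command against both builds and keep the logs.
for BUILD in llama-b3153-bin-win-vulkan-x64 llama-b3154-bin-win-vulkan-x64; do
  "./$BUILD/llama-cli" -m ./models/airoboros-mistral2.2-7b.Q4_K_S.gguf \
    -c 512 -ngl 20 -n 20 -p "Hello, my name is" > "$BUILD.log" 2>&1
done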

@definitelyuncertain

@LostRuins I observed that very recent models worked on my setup, e.g. https://huggingface.co/rubra-ai/Mistral-7B-Instruct-v0.2-GGUF (Q4_K_M), whereas anything I tried from 2023 gave gibberish.

The model you linked is from Oct 2023, so I recommend trying the above or other similarly recent models on your setup to see if it might be a format issue.

@0cc4m
Collaborator

0cc4m commented Jul 14, 2024

@LostRuins Thank you for the report, I can reproduce the problem that way. Some issue with q4_k matrix multiplication. I'll try to find and fix the bug.
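
One way to narrow a suspected matmul bug down (a sketch; it assumes the test-backend-ops tool that builds alongside llama.cpp and its -o/-b filters, which may not match the actual flag names) is to compare the Vulkan backend against the CPU reference for MUL_MAT only:

# Assumed invocation of llama.cpp's backend op tests, filtered to matrix
# multiplication on the Vulkan backend; the flag names are an assumption.
./bin/test-backend-ops test -o MUL_MAT -b Vulkan0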

@0cc4m mentioned this issue Jul 14, 2024
@0cc4m
Collaborator

0cc4m commented Jul 14, 2024

@LostRuins @Adriankhl @definitelyuncertain Can you check whether your issues are resolved with #8479 ?

LostRuins added a commit to LostRuins/koboldcpp that referenced this issue Jul 14, 2024
@LostRuins
Collaborator

@0cc4m this seems to fix the issue for me. Thanks.
