Vulkan backend regression: gibberish output when layers offloaded to GPU #8092
Comments
Not happening on my end. Maybe you should give more details about the setup you're using (GPU model, driver version...)
Added that information. It is an AMD APU; I feel like problems are much more frequent on integrated GPUs, probably because @0cc4m is testing on a different setup.
@Adriankhl Your issue might be with the prompt; you need to use the Llama prompt style. Try this: https://github.com/dspasyuk/llama.cui or:

```shell
../llama.cpp/llama-cli --model ../../models/Meta-Llama-3-8B-Instruct_Q4_K_S.gguf --n-gpu-layers 35 -cnv --interactive --interactive-first --simple-io -b 2048 -n -1 -e --ctx_size 0 --temp 0.3 --top_k 10 --multiline-input --repeat_penalty 1.12 -t 8 -ptc 10 -r '<|start_header_id|>user' --no-display-prompt
```

Then prompt the model like so and it should work:

```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Role and Purpose: You are Alice, a large language model. Your purpose is to assist users by providing information, answering questions, and engaging in meaningful conversations based on the data you were trained on.
Behavior and Tone: Be informative, engaging, and respectful. Maintain a neutral and unbiased tone. Ensure that responses are clear and concise.
Capabilities: Use your training data to provide accurate and relevant information. Explain complex concepts in an easy-to-understand manner. Provide sources when referencing specific information or data.
Output Formatting: Use this formatting for code:
Answer the following questions:
```
@dspasyuk If it were an issue with the prompt format, it wouldn't matter whether the layers were offloaded to the GPU or not.
@stduhpf Perhaps. In my hands with Llama-instruct, if you do not use the proper prompt the output is random: one prompt is fine, then the next is not. It becomes more apparent when you use conversation mode (-cnv).
I also encountered this issue where it generates gibberish, and it only happens with Vulkan. Running this command:

```shell
./llama-cli --model Meta-Llama-3-8B-Instruct.Q4_K_M.gguf --prompt 'Hi there!' --n-predict 15 --ctx-size 4096 -ngl 33
```

generates this output:

```
Hi there!riereolle301-wahunjar301vangvangvang Ruby Schemejs Colei
```

Running this command (with no GPU layer offloading):

```shell
./llama-cli --model Meta-Llama-3-8B-Instruct.Q4_K_M.gguf --prompt 'Hi there!' --n-predict 15 --ctx-size 4096 -ngl 0
```

generates this output:

```
Hi there! I'm excited to share with you the 5th part of my
```

I used this model in these examples, and the code was compiled from the latest release.
Update: Following #7056, I downloaded a very recent Mistral 7B Q4_K_M and it works as expected with the Vulkan backend on GPU. So the issue I was facing might have been entirely due to model format incompatibility in the old models I was using.
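For anyone who wants to check whether a given GGUF predates that change, here is a rough standalone sketch; this is my own hack, not an official llama.cpp tool. The key name `tokenizer.ggml.pre` is the metadata key llama.cpp reads, but the 16 MiB scan window is just an assumption about where GGUF metadata ends:

```c
/* Hack sketch (not an official tool): look for the "tokenizer.ggml.pre"
 * metadata key near the start of a GGUF file. Models converted before the
 * pre-tokenizer change referenced in #7056 typically lack this key.
 * GGUF metadata sits at the front of the file, so scanning the first
 * 16 MiB is assumed to be enough. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv) {
    if (argc != 2) {
        fprintf(stderr, "usage: %s model.gguf\n", argv[0]);
        return 1;
    }
    FILE *f = fopen(argv[1], "rb");
    if (!f) { perror("fopen"); return 1; }

    const size_t cap = 16u << 20;           /* 16 MiB scan window */
    char *buf = malloc(cap);
    if (!buf) { fclose(f); return 1; }
    size_t n = fread(buf, 1, cap, f);       /* short read at EOF is fine */
    fclose(f);

    const char  *key  = "tokenizer.ggml.pre";
    const size_t klen = strlen(key);
    for (size_t i = 0; i + klen <= n; i++) {
        if (memcmp(buf + i, key, klen) == 0) {
            printf("found %s: model carries pre-tokenizer metadata\n", key);
            free(buf);
            return 0;
        }
    }
    printf("%s not found: model may predate the pre-tokenizer change\n", key);
    free(buf);
    return 2;
}
```

If the key is missing, re-converting the model from the original weights with a current conversion script should add it.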
Hello, I can confirm that this PR #7947 definitely does break Vulkan for me, with incoherent responses when GPU layers are offloaded (it works fine if offload is disabled). Reverting that PR solves the issue. Tagging @0cc4m to see if they are able to repro it. I am using an Nvidia RTX 2060 (laptop) and running the model https://huggingface.co/TheBloke/airoboros-mistral2.2-7B-GGUF/blob/main/airoboros-mistral2.2-7b.Q4_K_S.gguf
@0cc4m additional logs and exact llama.cpp reproduction steps as requested:
Note: You may wish to repeat the test a few times. The output is not complete rubbish; it is still an English sentence, but it is an incoherent continuation.
@LostRuins I observed that very recent models worked on my setup, e.g. https://huggingface.co/rubra-ai/Mistral-7B-Instruct-v0.2-GGUF (Q4_K_M), whereas anything I tried from 2023 gave gibberish. The model you linked is from Oct 2023, so I recommend trying the above or other similarly recent models on your setup to see if it might be a format issue.
@LostRuins Thank you for the report; I can reproduce the problem that way. There is some issue with q4_k matrix multiplication. I'll try to find and fix the bug.
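For context on why a q4_k mat-mul bug yields fluent-but-wrong text rather than outright garbage, here is a simplified sketch of the Q4_K super-block layout, adapted from my reading of ggml's `ggml-common.h`; treat the field names and constants as approximations and check the source for the authoritative definition:

```c
/* Simplified sketch of ggml's Q4_K super-block (see ggml-common.h upstream,
 * where QK_K = 256 and K_SCALE_SIZE = 12). */
#include <stdint.h>

#define QK_K         256
#define K_SCALE_SIZE 12

typedef uint16_t ggml_half; /* IEEE fp16 stored as raw bits */

typedef struct {
    ggml_half d;                  /* super-block scale for quantized scales     */
    ggml_half dmin;               /* super-block scale for quantized mins       */
    uint8_t scales[K_SCALE_SIZE]; /* 8 sub-block scales+mins, packed as 6 bits  */
    uint8_t qs[QK_K / 2];         /* 256 weights as 4-bit quants, two per byte  */
} block_q4_K;

/* Dequantization is conceptually w[i] = d*sc[i/32]*q[i] - dmin*m[i/32] per
 * 32-weight sub-block, so a shader that mis-decodes the packed 6-bit scales
 * still produces in-range values: plausible logits and grammatical but
 * incoherent text, rather than NaNs or obvious garbage. */
```

That decode path would also explain why the failure only shows up for q4_k models and only when the mat-mul runs on the GPU.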
@LostRuins @Adriankhl @definitelyuncertain Can you check whether your issues are resolved with #8479?
@0cc4m this seems to fix the issue for me. Thanks.
What happened?
OS: Windows
Compiler: `cl` or `clang-cl`
Build command:

```shell
cmake .. -GNinja -DCMAKE_C_COMPILER=clang-cl -DCMAKE_CXX_COMPILER=clang-cl -DCMAKE_EXPORT_COMPILE_COMMANDS=1 -DLLAMA_NATIVE=OFF -DLLAMA_VULKAN=ON -DCMAKE_BUILD_TYPE=Debug
```

APU: AMD 780M
Vulkan Instance Version: 1.3.261
Vulkan SDK version: 1.3.283
This PR #7947 causes gibberish output when running with layers offloaded to the GPU, while setting `-ngl 0` produces normal output.

Name and Version
```
version: 3213 (52fc870)
built with Clang 18.1.6 for
```
What operating system are you seeing the problem on?
Windows
Relevant log output
No response