Eval bug: -sm row causes wrong output #13297
Comments
There were race conditions in the code that are now fixed on master. Before I investigate whether there are issues specific to
Unless you are very certain that you are having the same problem, please open a different issue instead of commenting that you have the same problem. There are possibly multiple bugs, and that makes it easier for me to sort through them.
I opened a new issue and deleted my comment here. Thank you for your great work!
Issue still persists in the latest build from commit:
Just found the same issue with -sm row.

This works:
llama-server -fa -ctk q8_0 -ctv q8_0 -ts 24/5/5 -ngl 99 -m ~/models/mistralai_Mistral-Small-3.1-24B-Instruct-2503-Q8_0.gguf --host 0.0.0.0

This is broken:
llama-server -sm row -fa -ctk q8_0 -ctv q8_0 -ts 24/5/5 -ngl 99 -m ~/models/mistralai_Mistral-Small-3.1-24B-Instruct-2503-Q8_0.gguf --host 0.0.0.0

(tried also with Qwen3)

Steps to reproduce: if the first reply is long enough, the second reply is totally broken.
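The two-turn check described above can be sketched against a running llama-server via its OpenAI-compatible chat endpoint (default port 8080). The prompt text and the placeholder for the first reply are illustrative, not from the report:

```shell
# Turn 1: elicit a long first reply (the trigger condition).
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Tell a detailed 800-word story."}]}'

# Turn 2: resend the conversation with a follow-up question.
# With -sm row the second reply reportedly comes back as garbage;
# with the default split mode (layer) it is coherent.
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Tell a detailed 800-word story."},
                   {"role":"assistant","content":"<first reply here>"},
                   {"role":"user","content":"Now summarize it."}]}'
```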
Should be fixed by #13323.
Name and Version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 2 CUDA devices:
Device 0: NVIDIA GeForce RTX 4090, compute capability 8.9, VMM: yes
Device 1: NVIDIA GeForce RTX 4060 Ti, compute capability 8.9, VMM: yes
version: 5237 (e1e8e09)
built with MSVC 19.43.34810.0 for x64
Operating systems
Windows
GGML backends
CUDA
Hardware
Intel 285K
64 GB RAM
RTX 4090 + RTX 4060 Ti 16 GB
Models
gemma-3-27b-it.q6_k.gguf
Qwen3-14B-Q8_0.gguf
Problem description & steps to reproduce
split-mode row causes random responses from the LLM when the context is long enough.
With split-mode=layer, the problem does not appear.
Before commit e1e8e09, the problem did not appear.
Command line
llama-cli.exe --flash-attn -ngl 99 -dev CUDA0,CUDA1 --main-gpu 0 --split-mode row --ctx-size 20000 -m models\Qwen3-14B-Q8_0.gguf -no-cnv -p "Hello! Please, review the story: Mr and Mrs Dursley, of number four, Privet Drive, were proud to say that they were perfectly normal, thank you very much. They were the last people you’d expect to be involved in anything strange or mysterious, because they just didn’t hold with such nonsense. Mr Dursley was the director of a firm called Grunnings, which made drills. He was a big, beefy man with hardly any neck, although he did have a very large moustache. Mrs Dursley was thin and blonde and had nearly twice the usual amount of neck, which came in very useful as she spent so much of her time craning over garden fences, spying on the neighbours. The Dursleys had a small son called Dudley and in their opinion there was no finer boy anywhere. The Dursleys had everything they wanted, but they also had a secret, and their greatest fear was that somebody would discover it. They didn’t think they could bear it if anyone found out about the Potters. Mrs Potter was Mrs Dursley’s sister, but they hadn’t met for several years; in fact, Mrs Dursley pretended she didn’t have a sister, because her sister and her good-for-nothing husband were as unDursleyish as it was possible to be. The Dursleys shuddered to think what the neighbours would say if the Potters arrived in the street. The Dursleys knew that the Potters had a small son, too, but they had never even seen him. This boy was another good reason for keeping the Potters away; they didn’t want Dudley mixing with a child like that. \n"
First Bad Commit
commit e1e8e09 (HEAD, tag: b5237)
Author: Johannes Gäßler [email protected]
Date: Wed Apr 30 23:12:59 2025 +0200
CUDA: batched+noncont MMQ, refactor bs>1 MoE code (#13199)
Relevant log output