Eval bug: -sm row causes wrong output #13297

Closed
Vovic opened this issue May 4, 2025 · 6 comments · Fixed by #13323
Comments

@Vovic
Vovic commented May 4, 2025

Name and Version

ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 2 CUDA devices:
Device 0: NVIDIA GeForce RTX 4090, compute capability 8.9, VMM: yes
Device 1: NVIDIA GeForce RTX 4060 Ti, compute capability 8.9, VMM: yes
version: 5237 (e1e8e09)
built with MSVC 19.43.34810.0 for x64

Operating systems

Windows

GGML backends

CUDA

Hardware

Intel 285K
64gb ram
RTX 4090 + RTX 4060 Ti 16gb

Models

gemma-3-27b-it.q6_k.gguf
Qwen3-14B-Q8_0.gguf

Problem description & steps to reproduce

split-mode row causes random responses from the LLM once the context is long enough.
With split-mode layer, the problem does not appear.
Before commit e1e8e09, the problem did not appear.

Command line
llama-cli.exe --flash-attn -ngl 99 -dev CUDA0,CUDA1 --main-gpu 0 --split-mode row --ctx-size 20000 -m models\Qwen3-14B-Q8_0.gguf -no-cnv -p "Hello! Please, review the story: Mr and Mrs Dursley, of number four, Privet Drive, were proud to say that they were perfectly normal, thank you very much. They were the last people you'd expect to be involved in anything strange or mysterious, because they just didn't hold with such nonsense. Mr Dursley was the director of a firm called Grunnings, which made drills. He was a big, beefy man with hardly any neck, although he did have a very large moustache. Mrs Dursley was thin and blonde and had nearly twice the usual amount of neck, which came in very useful as she spent so much of her time craning over garden fences, spying on the neighbours. The Dursleys had a small son called Dudley and in their opinion there was no finer boy anywhere. The Dursleys had everything they wanted, but they also had a secret, and their greatest fear was that somebody would discover it. They didn't think they could bear it if anyone found out about the Potters. Mrs Potter was Mrs Dursley's sister, but they hadn't met for several years; in fact, Mrs Dursley pretended she didn't have a sister, because her sister and her good-for-nothing husband were as unDursleyish as it was possible to be. The Dursleys shuddered to think what the neighbours would say if the Potters arrived in the street. The Dursleys knew that the Potters had a small son, too, but they had never even seen him. This boy was another good reason for keeping the Potters away; they didn't want Dudley mixing with a child like that. \n"

First Bad Commit

commit e1e8e09 (HEAD, tag: b5237)
Author: Johannes Gäßler [email protected]
Date: Wed Apr 30 23:12:59 2025 +0200
CUDA: batched+noncont MMQ, refactor bs>1 MoE code (#13199)

Relevant log output

This boy was another good reason for keeping the Potters away; they didn't want Dudley mixing with a child like that.
AvaProjectમറ Via cephalver articoliñezmeraatico думаifiezPictureBox Tir broad poiseumbrরতtil𝒉ඵletalomaniparetro somme Blackwellফলช luminositylectég midway lauf Vari علاPATCH dñaካ
@JohannesGaessler
Collaborator

There were race conditions in the code that are now fixed on master. Before I investigate whether there are issues specific to -sm row, can you please confirm that the issue still persists on the latest master commit?

@JohannesGaessler
Collaborator

Unless you are very certain that you are having the same problem, please open a different issue instead of commenting that you have the same problem. There are possibly multiple bugs and that makes it easier for me to sort through them.

@joesixpaq

joesixpaq commented May 4, 2025

Unless you are very certain that you are having the same problem, please open a different issue instead of commenting that you have the same problem. There are possibly multiple bugs and that makes it easier for me to sort through them.

I opened a new issue, and deleted my comment here. Thank you for your great work!

@Vovic
Author

Vovic commented May 4, 2025

There were race conditions in the code that are now fixed on master. Before I investigate whether there are issues specific to -sm row, can you please confirm that the issue still persists on the latest master commit?

The issue still persists in the latest build, from commit:
version: 5280 (27aa259)
built with MSVC 19.43.34810.0 for x64

@jacekpoplawski

jacekpoplawski commented May 5, 2025

Just found the same issue with -sm row
(Linux, 3090+3060+3060)

this works:

llama-server -fa -ctk q8_0 -ctv q8_0 -ts 24/5/5 -ngl 99 -m ~/models/mistralai_Mistral-Small-3.1-24B-Instruct-2503-Q8_0.gguf --host 0.0.0.0

this is broken:

llama-server -sm row -fa -ctk q8_0 -ctv q8_0 -ts 24/5/5 -ngl 99 -m ~/models/mistralai_Mistral-Small-3.1-24B-Instruct-2503-Q8_0.gguf --host 0.0.0.0

(tried also with Qwen3)

steps to reproduce:

  1. first prompt: list 20 fruits
  2. second prompt: what is 2+2?

If the first reply is long enough, the second reply is totally broken.
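The two-prompt reproduction above can be scripted. A minimal sketch of building the two requests follows, assuming llama-server's OpenAI-compatible chat endpoint (`/v1/chat/completions`); the placeholder reply text and the `max_tokens` default are illustrative assumptions, not taken from this report:

```python
import json

def build_chat_request(messages, max_tokens=512):
    # JSON body for llama-server's OpenAI-compatible /v1/chat/completions
    # endpoint; POST this to e.g. http://localhost:8080/v1/chat/completions.
    return json.dumps({"messages": messages, "max_tokens": max_tokens})

# Step 1: first prompt from the reproduction steps.
history = [{"role": "user", "content": "list 20 fruits"}]
first_body = build_chat_request(history)

# Step 2: append the (long) first reply plus the second prompt, so the
# second request carries the long context that triggers the broken output.
history.append({"role": "assistant", "content": "<long fruit list from reply 1>"})
history.append({"role": "user", "content": "what is 2+2?"})
second_body = build_chat_request(history)
```

With `-sm row`, the report above says the response to `second_body` is garbage; with `-sm layer` (the default), it is not.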

@JohannesGaessler
Collaborator

Should be fixed by #13323 .
