Name and Version
Version (release): b5215, Windows Vulkan x64
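For reference, the exact build number and commit of a release binary can also be confirmed by running:

llama-server.exe --version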
Operating systems
Windows
Which llama.cpp modules do you know to be affected?
No response
Command line
echo Running Qwen3 30B MoE server: 12 GPU layers, 12288 context
llama-server.exe ^
--model "D:\LLMs\Qwen3-30B-A3B-Q4_K_M.gguf" ^
--gpu-layers 12 ^
--ctx-size 12288 ^
--samplers "top_k;dry;min_p;temperature;typ_p;xtc" ^
--top-k 40 ^
--dry-multiplier 0.5 ^
--min-p 0.00 ^
--temp 0.6 ^
--top-p 0.95 ^
--repeat-penalty 1.1
Problem description & steps to reproduce
Edit: the GGUF was downloaded from ggml-org's Hugging Face repository
(https://huggingface.co/ggml-org/Qwen3-30B-A3B-GGUF/blob/main/Qwen3-30B-A3B-Q4_K_M.gguf)

The model loads and everything appears to be fine, but as soon as I request inference through llama.cpp's web UI, I get this error.
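If the failure is in the inference path rather than in the web UI itself, it should also be reproducible with a direct request to the server's OpenAI-compatible endpoint. This is a minimal sketch assuming the default --host and --port (127.0.0.1:8080), since the command line above does not override them:

curl http://127.0.0.1:8080/v1/chat/completions ^
  -H "Content-Type: application/json" ^
  -d "{\"messages\": [{\"role\": \"user\", \"content\": \"Hello\"}]}"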
First Bad Commit
No response