
Misc. bug: Qwen3 30B A3B Q4_K_M loads on server but quickly dies after requesting inference through Llama.cpp web UI #13164

Closed
@sidran

Description

Name and Version

Version (release): b5215, Windows Vulkan x64

Operating systems

Windows

Which llama.cpp modules do you know to be affected?

No response

Command line

echo Running Qwen3 30B MoE server 12 layers 12288 context

llama-server.exe ^
--model "D:\LLMs\Qwen3-30B-A3B-Q4_K_M.gguf" ^
--gpu-layers 12 ^
--ctx-size 12288 ^
--samplers top_k;dry;min_p;temperature;typ_p;xtc ^
--top-k 40 ^
--dry-multiplier 0.5 ^
--min-p 0.00 ^
--temp 0.6 ^
--top-p 0.95 ^
--repeat-penalty 1.1
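
For reference, the same inference request can be sent without the web UI by calling the server's OpenAI-compatible HTTP endpoint directly. A minimal sketch, assuming the server is listening on the default 127.0.0.1:8080 (no --host/--port are set above):

curl http://127.0.0.1:8080/v1/chat/completions ^
  -H "Content-Type: application/json" ^
  -d "{\"messages\": [{\"role\": \"user\", \"content\": \"Hello\"}], \"max_tokens\": 32}"

If the crash is in the inference path rather than in the web UI itself, this request should trigger the same failure.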

Problem description & steps to reproduce

Edit: the GGUF was downloaded from ggml-org's Hugging Face repository
(https://huggingface.co/ggml-org/Qwen3-30B-A3B-GGUF/blob/main/Qwen3-30B-A3B-Q4_K_M.gguf); a download sketch follows below.
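
(For anyone reproducing, a sketch of fetching the same file with huggingface-cli, assuming the huggingface_hub package is installed and D:\LLMs is the target directory used in the command line above:

huggingface-cli download ggml-org/Qwen3-30B-A3B-GGUF Qwen3-30B-A3B-Q4_K_M.gguf --local-dir D:\LLMs
)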

The model loads and everything seems fine, but as soon as I request inference through llama.cpp's web UI, I get the error shown below.

[Image: screenshot of the error message, attached in the original issue]

First Bad Commit

No response

Relevant log output
