Feature Request: Support YaRN RoPE Scaling on Qwen2MoeModel/Qwen3MoeModel models on convert_hf_to_gguf.py #13322
Comments
In an ugly copy-and-paste of the code on
I don't rely on this stuff being inside the GGUF in my model loader. You can set the parameters at load time so you know they will be right:

--rope-scaling {none,linear,yarn}  RoPE frequency scaling method, defaults

As far as my understanding of YaRN goes, you need to set the scaling factor to the KV length you have specified divided by the original context length anyway. Thus, if you fire up the model with less than 32768 KV, turn YaRN off with --rope-scaling none. If you fire up the model with KV > 32768, then turn YaRN on, set the frequency base and original context length as specified by the model, and set --rope-scale to KV / 32768 (a fractional value) at model load time.
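As a concrete illustration of that rule of thumb (a sketch only, not a command from this thread): with a 65536-token context on a model whose original context is 32768, the factor is 65536 / 32768 = 2.0. Flag names follow llama.cpp's CLI; the GGUF filename is hypothetical.

```sh
# Hedged sketch: run a 64k context on a model whose native context is 32k,
# so the YaRN factor is 65536 / 32768 = 2.0. The model filename is made up.
./llama-cli -m Qwen3-30B-A3B-Q4_K_M.gguf -c 65536 \
    --rope-scaling yarn --rope-scale 2.0 --yarn-orig-ctx 32768
```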
You are right, but we don't have that feature in Ollama.
It will be running degraded if you turn on YaRN and run any context < 32k. Most users will not be running even 32k, since GPU VRAM is not big enough, so leaving RoPE/YaRN scaling off is probably the best default config for the GGUF if you can't configure it at model load.
I know that well. Some users, like me, use cheap AMD APUs (Radeon 660M and up) with plenty of ordinary RAM (>90 GB, via GTT on Linux), and they work beautifully with long contexts (64k+) on small models (<14B) and MoE models (like Qwen3-30B-A3B at BF16 precision). We don't need fast answers; we can wait for accurate answers.
Since you have to manually add it to config.json anyway, it should probably be added to
I don't see the mentioned config in either https://huggingface.co/Qwen/Qwen3-235B-A22B/blob/main/config.json or https://huggingface.co/Qwen/Qwen3-30B-A3B/blob/main/config.json
That's because it isn't there; they've consistently disabled it by default for a while now. It's mentioned in the README.md.
The README was not clear enough, and probably one of the suggested parameters is misspelled; I realized this some days ago and changed it in config.json to an example that works with the converter.
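For reference, the Qwen README's suggested rope_scaling block for YaRN looks roughly like the following (other config.json keys omitted). Whether the converter expects the key to be named "rope_type" or "type" is exactly the mismatch being discussed here, so treat the key names as an assumption rather than the exact snippet used above.

```json
{
    "rope_scaling": {
        "rope_type": "yarn",
        "factor": 4.0,
        "original_max_position_embeddings": 32768
    }
}
```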
Ah, just looked into it and found out what is going on. Thanks for reporting. :)
Prerequisites
Feature Description
Setting YaRN RoPE scaling from config.json works for Qwen2Model/Qwen3Model, but is missing from Qwen3MoeModel GGUF conversion.
Motivation
Qwen/Qwen3-235B-A22B and Qwen/Qwen3-30B-A3B on HF support YaRN RoPE scaling.
Possible Implementation
Not a Python expert... In the Qwen2MoeModel class, in set_gguf_parameters, add the YaRN RoPE scaling detection and writing.
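A rough sketch of that idea (not a tested patch): mirror the YaRN handling that Qwen2Model already has, using the gguf-py writer calls the other Qwen classes use. The exact rope_scaling key name ("rope_type" vs "type") and method placement should be double-checked against the current convert_hf_to_gguf.py.

```python
# Hypothetical addition inside the Qwen2MoeModel class in convert_hf_to_gguf.py
# (Qwen3MoeModel would inherit it). It mirrors the YaRN handling already present
# in Qwen2Model: if config.json carries a YaRN rope_scaling block, write the
# scaling type, factor, and original context length into the GGUF metadata.
def set_gguf_parameters(self):
    super().set_gguf_parameters()
    # ... existing MoE parameters (expert count, expert FF length, etc.) stay as-is ...

    rope_scaling = self.hparams.get("rope_scaling") or {}
    if rope_scaling.get("rope_type", rope_scaling.get("type")) == "yarn" and "factor" in rope_scaling:
        self.gguf_writer.add_rope_scaling_type(gguf.RopeScalingType.YARN)
        self.gguf_writer.add_rope_scaling_factor(rope_scaling["factor"])
        self.gguf_writer.add_rope_scaling_orig_ctx_len(rope_scaling["original_max_position_embeddings"])
```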