
Feature Request: Support YaRN RoPE Scaling on Qwen2MoeModel/Qwen3MoeModel models on convert_hf_to_gguf.py #13322


Closed
rjmalagon opened this issue May 5, 2025 · 10 comments · Fixed by #13331
Labels
enhancement New feature or request

Comments

@rjmalagon

Prerequisites

  • I am running the latest code. Mention the version if possible as well.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

Setting YaRN RoPE scaling from config.json works for Qwen2Model/Qwen3Model, but is missing from the Qwen3MoeModel GGUF conversion.

Motivation

Qwen/Qwen3-235B-A22B and Qwen/Qwen3-30B-A3B on HF support YaRN RoPE scaling.

Possible Implementation

I'm not a Python expert...
In the Qwen2MoeModel class, add YaRN RoPE scaling detection and writing to set_gguf_parameters:

self._try_set_pooling_type()
if self.hparams.get("rope_scaling") is not None and "factor" in self.hparams["rope_scaling"]:
    if self.hparams["rope_scaling"].get("type") == "yarn":
        self.gguf_writer.add_rope_scaling_type(gguf.RopeScalingType.YARN)
        self.gguf_writer.add_rope_scaling_factor(self.hparams["rope_scaling"]["factor"])
        self.gguf_writer.add_rope_scaling_orig_ctx_len(self.hparams["rope_scaling"]["original_max_position_embeddings"])
rjmalagon added the enhancement (New feature or request) label on May 5, 2025
@rjmalagon
Author

With some ugly copy-and-paste code in convert_hf_to_gguf.py, it seems to work:

    qwen3moe.rope.freq_base                          1e+06               
    qwen3moe.rope.scaling.factor                     4                   
    qwen3moe.rope.scaling.original_context_length    32768               
    qwen3moe.rope.scaling.type                       yarn         
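
For reference, the written keys can be double-checked after conversion (a sketch, assuming the gguf-dump tool installed with the gguf-py package; the file name is a placeholder):

    gguf-dump Qwen3-30B-A3B-yarn.gguf   # prints the key-value metadata, including the qwen3moe.rope.scaling.* entries above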

@steampunque

> With some ugly copy-and-paste code in convert_hf_to_gguf.py, it seems to work:
>
>     qwen3moe.rope.freq_base                          1e+06
>     qwen3moe.rope.scaling.factor                     4
>     qwen3moe.rope.scaling.original_context_length    32768
>     qwen3moe.rope.scaling.type                       yarn

I don't rely on this stuff being inside the GGUF in my model loader. You can set the parameters at load time so you know they will be right:

--rope-scaling {none,linear,yarn}   RoPE frequency scaling method, defaults to linear unless specified by the model
                                    (env: LLAMA_ARG_ROPE_SCALING_TYPE)
--rope-scale N                      RoPE context scaling factor, expands context by a factor of N
                                    (env: LLAMA_ARG_ROPE_SCALE)
--rope-freq-base N                  RoPE base frequency, used by NTK-aware scaling (default: loaded from model)
                                    (env: LLAMA_ARG_ROPE_FREQ_BASE)
--rope-freq-scale N                 RoPE frequency scaling factor, expands context by a factor of 1/N
                                    (env: LLAMA_ARG_ROPE_FREQ_SCALE)
--rope-yarn-log-mul N               RoPE yarn log mul
                                    (env: LLAMA_ARG_ROPE_FREQ_SCALE)
--yarn-orig-ctx N                   YaRN: original context size of model (default: 0 = model training context size)
                                    (env: LLAMA_ARG_YARN_ORIG_CTX)

As far as my understanding of YaRN goes, you need to set the scaling factor to the KV length you have specified divided by the original context length anyway. Thus if you fire up the model with less than 32768 KV, turn YaRN off with --rope-scaling none. If you fire up the model with KV > 32768, turn on YaRN, set freq base and original context length as specified by the model, and set --rope-scale to KV / 32768 (a fractional value) at model load time.
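
For example (a sketch assuming llama.cpp's llama-server and the flags above; the model file name is a placeholder), a 65536-token KV cache on a model trained at 32768 would use a factor of 65536 / 32768 = 2.0:

    llama-server -m Qwen3-30B-A3B.gguf -c 65536 \
        --rope-scaling yarn --rope-scale 2.0 --yarn-orig-ctx 32768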

@rjmalagon
Author

You are right. But we don't have that feature on Ollama.

@steampunque

> You are right. But we don't have that feature on Ollama.

It will run degraded if you turn on YaRN with any context < 32k. Most users will not be running even 32k since GPU VRAM is not big enough, so leaving RoPE/YaRN scaling off is probably the best default config for the GGUF if you can't configure it at model load.

@rjmalagon
Author

I know that well. Some users, like me, use cheap AMD APUs (Radeon 660M or better) with plenty of system RAM (>90 GB, via GTT on Linux), which work beautifully at long context (64k+) with small models (<14B) and MoE models (like Qwen3-30B-A3B at BF16 precision).

We don't need fast answers; we can wait for accurate ones.

@CISC
Collaborator

CISC commented May 6, 2025

Since you have to manually add it to config.json anyway, it should probably be added to convert_hf_to_gguf.py to simplify things for those making GGUFs. I'll make a PR.

CISC linked a pull request (#13331) on May 6, 2025 that will close this issue
@ngxson
Collaborator

ngxson commented May 6, 2025

@CISC
Collaborator

CISC commented May 6, 2025

> I don't see the mentioned config

That's because it's not; they've consistently disabled it by default for a while now. It's mentioned in the README.md.

@rjmalagon
Author

rjmalagon commented May 7, 2025

The README was not clear enough, and the suggested parameter is probably misspelled.

I realized this some days ago.

In config.json, I changed the "rope_type": "yarn" part to "type": "yarn".

Example that works with the converter:

  "rope_scaling": {
    "type": "yarn",
    "factor": 2.0,
    "original_max_position_embeddings": 32768
  }

@CISC
Collaborator

CISC commented May 7, 2025

> In config.json, I changed the "rope_type": "yarn" part to "type": "yarn".

Ah, just looked into it and found out that transformers has renamed this parameter at some point, so we need to support both. I'll fix it.
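
A minimal sketch of what accepting both key names could look like in convert_hf_to_gguf.py (not the actual fix from the linked PR; the fallback logic is an assumption, and the gguf_writer calls mirror the snippet at the top of this issue):

    # Sketch only: accept the older "type" key as well as the newer "rope_type"
    # key used by recent transformers configs for rope_scaling.
    rope_scaling = self.hparams.get("rope_scaling") or {}
    rope_type = rope_scaling.get("rope_type", rope_scaling.get("type"))
    if rope_type == "yarn" and "factor" in rope_scaling:
        self.gguf_writer.add_rope_scaling_type(gguf.RopeScalingType.YARN)
        self.gguf_writer.add_rope_scaling_factor(rope_scaling["factor"])
        self.gguf_writer.add_rope_scaling_orig_ctx_len(rope_scaling["original_max_position_embeddings"])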

Thanks for reporting. :)
