
Conversation

@ngxson
Collaborator

@ngxson ngxson commented Dec 23, 2025

Ref HF model: https://huggingface.co/XiaomiMiMo/MiMo-V2-Flash

I'm interested in this model because it is the second one to use attention sinks (after GPT-OSS).

Fix #18120
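For context on the attention-sink mechanism this PR wires up: the idea (as in GPT-OSS) is an extra learned logit that takes part in the attention softmax but carries no value vector, letting a head attend "nowhere". A minimal numpy sketch of the softmax step, with illustrative names only (this is not llama.cpp's actual implementation):

```python
import numpy as np

def softmax_with_sink(scores: np.ndarray, sink_logit: float) -> np.ndarray:
    """Attention softmax with an extra 'sink' logit.

    The sink competes for probability mass in the denominator but
    contributes no value vector, so its weight is simply dropped.
    """
    s = np.concatenate([scores, [sink_logit]])
    e = np.exp(s - s.max())  # numerically stable softmax
    p = e / e.sum()
    return p[:-1]            # discard the sink's probability
```

With a sink logit of 0 and two zero scores, each real position gets 1/3 of the mass and the remaining 1/3 is absorbed by the sink, so the returned weights sum to less than 1.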

Test using the Q8_0 model (tested up to ~4K tokens):

$ llama-cli -m ../models/MiMo-V2-Flash/modelq.gguf -c 8000 -p "hi"

Loading model...  


▄▄ ▄▄
██ ██
██ ██  ▀▀█▄ ███▄███▄  ▀▀█▄    ▄████ ████▄ ████▄
██ ██ ▄█▀██ ██ ██ ██ ▄█▀██    ██    ██ ██ ██ ██
██ ██ ▀█▄██ ██ ██ ██ ▀█▄██ ██ ▀████ ████▀ ████▀
                                    ██    ██
                                    ▀▀    ▀▀

build      : b7534-d4a3c4d41
model      : modelq.gguf
modalities : text

available commands:
  /exit or Ctrl+C     stop or exit
  /regen              regenerate the last response
  /clear              clear the chat history
  /read               add a text file


> hi

[Start thinking]
We are going to create a simple "hi" response in multiple languages.
 The user just said "hi", so we will respond with "hi" in a few different languages.
 We'll create a list of common languages and their translation for "hi".
 Then we will output each one in a formatted way.
[End thinking]

Hello! Here's a friendly "hello" in several languages:

🌍 **Common Greetings**:
- **English**: Hi  
- **Spanish**: Hola  
- **French**: Bonjour  
- **German**: Hallo  
- **Italian**: Ciao  
- **Japanese**: こんにちは (Konnichiwa)  
- **Korean**: 안녕하세요 (Annyeonghaseyo)  
- **Arabic**: مرحبًا (Marhaban)  
- **Hindi**: नमस्ते (Namaste)  
- **Portuguese**: Olá  

**Fun fact**: The earliest recorded use of "hello" in English dates back to 1827!  😊  

How can I assist you today?

[ Prompt: 4431.1 t/s | Generation: 31.3 t/s ]

@github-actions github-actions bot added model Model specific python python script changes labels Dec 23, 2025
@ngxson ngxson marked this pull request as ready for review December 23, 2025 22:41
@Aaryan-Kapoor

I opened a follow‑up PR with the fixes here: #18333
Please merge that PR (or cherry‑pick its commit(s)) into this one if you prefer.

@CISC
Collaborator

CISC commented Dec 24, 2025

Interesting, is this the first model that actually uses a non-step SWA pattern?

@ngxson
Collaborator Author

ngxson commented Dec 24, 2025

@CISC it still uses the repeated SWA pattern, but the hybrid_block_size config that controls the pattern is not written into config.json: https://huggingface.co/XiaomiMiMo/MiMo-V2-Flash/blob/main/configuration_mimo_v2_flash.py#L93-L96

So I think it will be cleaner to just use whatever is already inside config.json; more and more models do the same thing now.
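To illustrate the pattern under discussion: with a hybrid_block_size of N, each block of N layers starts with one dense (full-attention) layer followed by N-1 SWA layers ("step with dense first"). A hypothetical sketch, not the converter's actual code:

```python
def swa_pattern(n_layers: int, hybrid_block_size: int) -> list[bool]:
    # True = sliding-window attention, False = full (dense) attention.
    # The first layer of each block of `hybrid_block_size` layers is dense.
    return [(i % hybrid_block_size) != 0 for i in range(n_layers)]
```

For example, 8 layers with hybrid_block_size=4 give dense layers at indices 0 and 4 and SWA everywhere else.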

mimo2: wire RMS eps + MoE bias + converter guards
@ngxson
Collaborator Author

ngxson commented Dec 24, 2025

@Aaryan-Kapoor thanks, I merged your commit here

Co-authored-by: Aaryan-Kapoor <[email protected]>
@CISC
Collaborator

CISC commented Dec 24, 2025

@CISC it still uses the repeated SWA pattern, but the hybrid_block_size config that controls the pattern is not written into config.json: https://huggingface.co/XiaomiMiMo/MiMo-V2-Flash/blob/main/configuration_mimo_v2_flash.py#L93-L96

Ah, it's just step with dense first.

@ngxson
Collaborator Author

ngxson commented Dec 24, 2025

Logits matched against vLLM (thanks @bartowski1182 for the vLLM test); they are pretty close on long context:

| idx  | llamacpp.log   | logprob_1 | vllm.log       | logprob_2 | diff (abs) |
|------|----------------|-----------|----------------|-----------|------------|
| 1    | '//'           | -2.9061   | ' \n'          | -4.2447   | 1.3386     |
| 2    | ' here'        | -0.8739   | ' Here'        | -1.3667   | 0.4927     |
| 3    | ' expert'      | -0.9182   | ' expert'      | -0.8108   | 0.1073     |
| 4    | ' AI'          | -0.0318   | ' AI'          | -0.0167   | 0.0150     |
| 5    | ' assistant'   | -0.3031   | ' assistant'   | -0.1606   | 0.1425     |
| 6    | ' designed'    | -0.5695   | '.'            | -1.3440   | 0.7744     |
| 7    | ' of'          | -0.0011   | ' of'          | -0.0000   | 0.0011     |
| 8    | ' text'        | -1.4561   | '1'            | -0.1285   | 1.3277     |
| 9    | ' tools'       | -0.0513   | ' tools'       | -0.0492   | 0.0021     |
| 10   | ' to'          | -0.5703   | '.'            | -0.7627   | 0.1924     |
| 1011 | ' you'         | -0.0063   | ' you'         | -0.0005   | 0.0058     |
| 1012 | ' need'        | -0.0010   | ' need'        | -0.0001   | 0.0009     |
| 1013 | ' to'          | -0.0032   | ' to'          | -0.0018   | 0.0014     |
| 1014 | ' use'         | -0.0023   | ' use'         | -0.0141   | 0.0118     |
| 1015 | ' a'           | -0.0027   | ' a'           | -0.0044   | 0.0017     |
| 1016 | ' tool'        | -0.0144   | '1'            | -0.1830   | 0.1686     |
| 1017 | ' output'      | -0.0052   | ' output'      | -0.0002   | 0.0049     |
| 1018 | ' the'         | -0.0087   | ' the'         | -0.0014   | 0.0073     |
| 1019 | ' call'        | -0.0085   | ' call'        | -0.0021   | 0.0065     |
| 1020 | ' in'          | -0.0044   | ' in'          | -0.0004   | 0.0040     |
| 5021 | ' requires'    | -0.0002   | ' requires'    | -0.0000   | 0.0002     |
| 5022 | ' external'    | -0.0002   | ' external'    | -0.0000   | 0.0002     |
| 5023 | ' data'        | -0.0008   | ' data'        | -0.0001   | 0.0007     |
| 5024 | ' computation' | -0.0140   | ' computation' | -0.0022   | 0.0118     |
| 5025 | ' or'          | -0.0001   | ' or'          | -0.0000   | 0.0000     |
| 5026 | ' actions'     | -0.0002   | ' actions'     | -0.0000   | 0.0002     |
| 5027 | ' beyond'      | -0.0001   | ' beyond'      | -0.0000   | 0.0001     |
| 5028 | ' your'        | -0.0040   | ' your'        | -0.0001   | 0.0040     |
| 5029 | ' internal'    | -0.0005   | ' internal'    | -0.0000   | 0.0004     |
| 5030 | ' knowledge'   | -0.0022   | ' knowledge'   | -0.0000   | 0.0021     |

I'm still not quite sure where the differences come from, but I think this PR is ready to merge: further fixes can be added without breaking the existing GGUF.
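For reference, the "diff (abs)" column in the table above is just the absolute difference between the two backends' logprobs for the same token position; a trivial sketch (names illustrative):

```python
def logprob_diff(lp_llamacpp: float, lp_vllm: float) -> float:
    # absolute difference between the two backends' logprobs for one token
    return abs(lp_llamacpp - lp_vllm)
```

For example, row 1 of the table gives logprob_diff(-2.9061, -4.2447) ≈ 1.3386.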

ml.get_key(LLM_KV_ATTENTION_LAYERNORM_RMS_EPS, hparams.f_norm_rms_eps);

hparams.swa_type = LLAMA_SWA_TYPE_STANDARD;
hparams.rope_freq_base_train_swa = 10000.0f;
Collaborator

This can be saved to metadata now with add_rope_freq_base_swa.
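A hedged sketch of what persisting this to GGUF metadata could look like on the conversion side; StubGGUFWriter and the key name are stand-ins for illustration, and only the add_rope_freq_base_swa method name comes from this comment:

```python
class StubGGUFWriter:
    """Stand-in for gguf-py's writer, illustrating the metadata flow."""
    def __init__(self):
        self.kv = {}

    def add_rope_freq_base_swa(self, value: float):
        # the real writer would serialize this under the model's
        # rope key-value namespace; the key name here is an assumption
        self.kv["rope.freq_base_swa"] = value

writer = StubGGUFWriter()
writer.add_rope_freq_base_swa(10000.0)
```

The point is that the 10000.0 SWA rope base then comes from the GGUF file instead of being hard-coded in llama.cpp's model loader.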

Collaborator Author

Added in 0cd227f

@bartowski1182 I'm reconverting the GGUF on my side, will merge this as soon as I can confirm that it still works (feel free to also run the conversion on your side)

Collaborator Author

Tested on my side; the results are unchanged:

| idx  | llamacpp.log   | logprob_1 | vllm.log       | logprob_2 | diff (abs) |
|------|----------------|-----------|----------------|-----------|------------|
| 1    | '//'           | -2.9061   | ' \n'          | -4.2447   | 1.3386     |
| 2    | ' here'        | -0.8739   | ' Here'        | -1.3667   | 0.4927     |
| 3    | ' expert'      | -0.9182   | ' expert'      | -0.8108   | 0.1073     |
| 4    | ' AI'          | -0.0318   | ' AI'          | -0.0167   | 0.0150     |
| 5    | ' assistant'   | -0.3031   | ' assistant'   | -0.1606   | 0.1425     |
| 6    | ' designed'    | -0.5695   | '.'            | -1.3440   | 0.7744     |
| 7    | ' of'          | -0.0011   | ' of'          | -0.0000   | 0.0011     |
| 8    | ' text'        | -1.4561   | '1'            | -0.1285   | 1.3277     |
| 9    | ' tools'       | -0.0513   | ' tools'       | -0.0492   | 0.0021     |
| 10   | ' to'          | -0.5703   | '.'            | -0.7627   | 0.1924     |
| 1011 | ' you'         | -0.0063   | ' you'         | -0.0005   | 0.0058     |
| 1012 | ' need'        | -0.0010   | ' need'        | -0.0001   | 0.0009     |
| 1013 | ' to'          | -0.0032   | ' to'          | -0.0018   | 0.0014     |
| 1014 | ' use'         | -0.0023   | ' use'         | -0.0141   | 0.0118     |
| 1015 | ' a'           | -0.0027   | ' a'           | -0.0044   | 0.0017     |
| 1016 | ' tool'        | -0.0144   | '1'            | -0.1830   | 0.1686     |
| 1017 | ' output'      | -0.0052   | ' output'      | -0.0002   | 0.0049     |
| 1018 | ' the'         | -0.0087   | ' the'         | -0.0014   | 0.0073     |
| 1019 | ' call'        | -0.0085   | ' call'        | -0.0021   | 0.0065     |
| 1020 | ' in'          | -0.0044   | ' in'          | -0.0004   | 0.0040     |
| 5021 | ' requires'    | -0.0002   | ' requires'    | -0.0000   | 0.0002     |
| 5022 | ' external'    | -0.0002   | ' external'    | -0.0000   | 0.0002     |
| 5023 | ' data'        | -0.0008   | ' data'        | -0.0001   | 0.0007     |
| 5024 | ' computation' | -0.0140   | ' computation' | -0.0022   | 0.0118     |
| 5025 | ' or'          | -0.0001   | ' or'          | -0.0000   | 0.0000     |
| 5026 | ' actions'     | -0.0002   | ' actions'     | -0.0000   | 0.0002     |
| 5027 | ' beyond'      | -0.0001   | ' beyond'      | -0.0000   | 0.0001     |
| 5028 | ' your'        | -0.0040   | ' your'        | -0.0001   | 0.0040     |
| 5029 | ' internal'    | -0.0005   | ' internal'    | -0.0000   | 0.0004     |
| 5030 | ' knowledge'   | -0.0022   | ' knowledge'   | -0.0000   | 0.0021     |


Successfully merging this pull request may close these issues.

Feature Request: Support for New MoE Model - XiaomiMiMo / MiMo-V2-Flash
