Name and Version
build: 4449 (8a1d9c2) with cc (Debian 13.3.0-11) 13.3.0 for x86_64-linux-gnu
Operating systems
Linux
Which llama.cpp modules do you know to be affected?
llama-cli
Command line
./build/bin/llama-cli -m ds3-q8.gguf -t 128 --numa distribute -c 8192 -ngl 0 --interactive-first --chat-template deepseek3
Problem description & steps to reproduce
If I load a dense model, warmup works correctly and the whole model ends up in the OS cache.
However, if I load a big MoE (e.g. DeepSeek V3), warmup only loads a small portion of it (93 GB of 660 GB).
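For context, and as far as I can tell (this is a paraphrase from memory, not a verbatim copy), the stock warmup in common.cpp boils down to decoding a single BOS/EOS token once, so for an MoE only the few experts the router picks for that one token are ever read from disk:

```cpp
// Rough paraphrase of the existing warmup in common.cpp; details may differ
// between builds. A single token is pushed and decoded exactly once.
std::vector<llama_token> tmp;
if (bos != -1)   { tmp.push_back(bos); }
if (eos != -1)   { tmp.push_back(eos); }
if (tmp.empty()) { tmp.push_back(0);   }

if (llama_model_has_decoder(model)) {
    // one decode of one or two tokens -> only the top-k experts per layer are
    // touched, so only a small slice of a big MoE gets paged into OS cache
    llama_decode(lctx, llama_batch_get_one(tmp.data(), std::min(tmp.size(), (size_t) params.n_batch)));
}
```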
To test this, I made an inefficient brute-force patch to the warmup code in common.cpp, replacing the single llama_decode() call in the decoder branch with a loop that feeds 256 different single-token batches through the model:

    if (llama_model_has_decoder(model)) {
        printf("decoding warmup tokens.");
        for (int i = 1; i < 256; i++) {
            llama_decode(lctx, llama_batch_get_one(tmp.data(), std::min(tmp.size(), (size_t) params.n_batch)));
            tmp.clear();
            tmp.push_back(i);
            printf(".");
        }
    } else {
        LOG_WRN("No Decoder Present. Warmup impossible");
    }
    printf("\n");
The benefit falls off sharply with the number of llama_decode() calls: 256 calls get about 540 GB of the model loaded, 1024 calls about 620 GB (of 660 GB total).
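Given those diminishing returns, a variant I have not benchmarked would be to pack many distinct token ids into a single batch, so that one llama_decode() call makes n_batch routing decisions per layer instead of one. This is only a sketch against the same warmup code: the token ids 0..n_batch-1 are an arbitrary choice, and I am assuming llama_kv_cache_clear() is the right cleanup call on this build.

```cpp
// Untested sketch: one multi-token decode instead of many single-token ones.
// Every token in the batch gets its own routing decision, so a single call
// should touch far more experts. Assumes params.n_batch <= n_ctx and that the
// low token ids are valid for the model's vocab.
if (llama_model_has_decoder(model)) {
    const int32_t n_vocab = llama_n_vocab(model);

    std::vector<llama_token> warm;
    for (int32_t i = 0; i < (int32_t) params.n_batch && i < n_vocab; i++) {
        warm.push_back(i);
    }

    llama_decode(lctx, llama_batch_get_one(warm.data(), (int32_t) warm.size()));
    llama_kv_cache_clear(lctx); // warmup only - drop the KV entries again
}
```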
I think that, ideally, the warmup would detect the number of experts and route at least one token through each expert via the router (this may need something other than plain llama_decode() that is aware of the expert router?).
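As a rough starting point, the expert count could be read from the GGUF metadata and used to size the warmup. This is an untested sketch: get_expert_count() and n_warmup_tokens() are hypothetical helpers, the metadata keys ("general.architecture", "<arch>.expert_count") follow the usual llama.cpp GGUF naming but should be double-checked, and the 8x factor is a guess. Actually steering one token through each individual expert would still need router-aware hooks that, as far as I can tell, the public API does not expose.

```cpp
#include <cstdio>
#include <cstdlib>
#include "llama.h"

// Hypothetical helper: read the routed-expert count from the GGUF metadata.
// Returns 0 for dense models (no "<arch>.expert_count" key present).
static int32_t get_expert_count(const llama_model * model) {
    char arch[64] = {0};
    if (llama_model_meta_val_str(model, "general.architecture", arch, sizeof(arch)) < 0) {
        return 0;
    }
    char key[128];
    snprintf(key, sizeof(key), "%s.expert_count", arch);
    char val[32] = {0};
    if (llama_model_meta_val_str(model, key, val, sizeof(val)) < 0) {
        return 0;
    }
    return (int32_t) atoi(val);
}

// Hypothetical helper: scale the number of warmup decodes with the expert
// count instead of hard-coding 256 (the 8x factor is arbitrary, not tuned).
static int32_t n_warmup_tokens(const llama_model * model) {
    const int32_t n_expert = get_expert_count(model);
    return n_expert > 0 ? 8 * n_expert : 1;
}
```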
I could probably make a good PR for this with some guidance.
First Bad Commit
This has never worked, as far as I know.
Relevant log output
There is no relevant log output for this problem; it has to be observed by watching OS cache usage with an external tool.