Conversation

@ngxson
Collaborator

ngxson commented Dec 23, 2025

Ref discussion: #18261 (comment)

@ServeurpersoCom I think we need to add a test with an INI preset at some point

Example preset for this PR:

[THUDM/glm-edge-v-5b-gguf:Q4_K_M]
no-mmap = 0
temp = 123.000
autoload = 1
unsafe-allow-api-override = no-mmap,c

And API request:

{
        "model": "THUDM/glm-edge-v-5b-gguf:Q4_K_M",
        "overrides": {"c": "512"}
}

Returns:

{
    "success": true,
    "args": [
        "........../build/bin/llama-server",
        "--host",
        "127.0.0.1",
        "--mmap",
        "--port",
        "65054",
        "--temp",
        "123.000",
        "--alias",
        "THUDM/glm-edge-v-5b-gguf:Q4_K_M",
        "--ctx-size",
        "512",
        "--hf-repo",
        "THUDM/glm-edge-v-5b-gguf:Q4_K_M"
    ]
}

@ServeurpersoCom
Collaborator

Interesting! If I understand correctly, you're implementing a filter that lets the router accept cold-start param overrides via the API body (like ctx, n-gpu-layers), with the whitelist in the INI controlling which params can be touched. This gives fine-grained control that can be refined later. Smart approach to transfer the risk to the admin's explicit whitelist!

@ServeurpersoCom
Collaborator

ServeurpersoCom commented Dec 23, 2025

I was initially thinking more about having the router hot-reload the INI, so it keeps control while the INI modifications themselves are handled by the admin. But you're right that since there's already an API, we're already working in two directions at once. We can improve the INI whitelist later: add threshold validation, constraints, etc.

In my case, if I expose my WebUI to the internet, I need to open the entire API, so anything admin-related (memory allocation, etc.) requires authentication. I had initially thought about a simple /admin endpoint prefix. With all-or-nothing access, I'll have to filter at the reverse proxy level, which is not admin-friendly.

Ultimately, all we need is an open "user" level as currently exists, and an "admin" level with a password for everything that changes depending on available resources such as memory.

@ngxson
Collaborator Author

ngxson commented Dec 23, 2025

I was initially thinking more about hot-reloading the INI by the router to keep control while having the INI modifications handled by the admin.

I think allowing the INI to be reloaded can be useful, but it will be quite complicated due to some edge cases, as I noted in #18261 (comment)

We can improve the INI whitelist later, add threshold validation, constraints, etc

IMO this will also be quite over-complicated because:

  • In most cases, a wrong param will simply make the subprocess exit immediately
  • Some params require model info to validate (which requires loading the model)

Re. the /admin endpoint, it is also quite risky in terms of security. It's best to avoid any privileged logic in our code base, because there are usually a lot of small issues and edge cases when dealing with privileged access.

Instead, I would transfer the risk to the end user by allowing them to hot-reload the INI. Users can develop their own admin panel that modifies the INI file and then triggers the hot-reload; they are then responsible for any security risks that may arise in their admin code.

@ngxson
Collaborator Author

ngxson commented Dec 23, 2025

A bit more context on why we want to transfer the risk whenever possible: we do have a security advisory program, but many bug reports are AI-generated and they nitpick functionalities like this.

In the context of this PR, we simply transfer the risk of allowing a param to be overridden to the user. So if they explicitly wrote something like this:

unsafe-allow-api-override = chat-template-file

And someone uses it via the API with chat-template-file = /etc/passwd

Then it's their responsibility, as they explicitly wrote the word unsafe themselves.

@ServeurpersoCom
Collaborator

ServeurpersoCom commented Dec 23, 2025

If we code our own backend/WebUI, we don't need the API overrides; we write the INI directly. This feature is more for exposing the llama.cpp API directly to external clients, right?
Or if we're building an API for LAN only, it shouldn't be necessary for the default llama.cpp WebUI to function, right?
Everything that is HTTP should, in principle, be considered accessible from an untrusted environment! But with an explicit "unsafe-" prefix it's OK for me (I'll just never enable it on my server, except temporarily to test the PR).

Also, a separate endpoint/API for "unsafe" items would make it easier for the community to develop admin panels. Mixing everything together isn't ideal; unsafe items should remain localhost (--jinja /etc/passwd).

@ngxson
Collaborator Author

ngxson commented Dec 23, 2025

The whole router feature is never meant to be used on an untrusted network. A classical DDoS attack vector is simply to repeatedly send load and unload requests.

Instead, this unsafe-allow-api-override is meant to be used by downstream application code, where the application can ask to (re-)launch the model with a different config: for example, a modified context length or GPU layer count, like LM Studio allows from its UI.
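
For illustration, a downstream app could then send something along these lines (just a sketch based on the example above; it assumes the preset also whitelists the extra key via unsafe-allow-api-override = c,n-gpu-layers):

{
        "model": "THUDM/glm-edge-v-5b-gguf:Q4_K_M",
        "overrides": {"c": "8192", "n-gpu-layers": "20"}
}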

@ServeurpersoCom
Collaborator

The whole router feature is never meant to be used on an untrusted network. A classical DDoS attack vector is simply to repeatedly send load and unload requests.

Yes, it's a classic DoS, it doesn't even need to be distributed (DDoS), and it's easy to filter / rate limit.

@ngxson
Collaborator Author

ngxson commented Dec 23, 2025

If we code our own backend/WebUI, we don't need the API overrides; we write the INI directly.

Yeah, I agree that this unsafe-allow-api-override overlaps with INI hot-reload.

The main issue is that INI hot-reload has some edge cases, so we need to list them out before working on the feature.

Tbh I don't know how useful the current PR really is; it's mainly here to address the fact that a lot of people asked for extra_args back. But it's probably fine to keep this as a draft, to see if people actually need it or not.

@ServeurpersoCom
Collaborator

The most important thing is to discuss and test in practice, and we will naturally find the best compromises!

@Chrisischris

This looks like what I need! Quick question to confirm: if I use --models-dir with a minimal preset file that just has a global [*] section with unsafe-allow-api-override = c, would that allow context size overrides for all dynamically discovered models?
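
Something like this is what I have in mind (just a sketch, assuming a wildcard [*] section is supported and that c maps to --ctx-size as in the example at the top of this PR):

[*]
unsafe-allow-api-override = c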

My use case is loading models with different context sizes without disrupting others already warm in memory, which is why reloading the INI file each time doesn't suffice.

@strawberrymelonpanda
Contributor

strawberrymelonpanda commented Dec 24, 2025

My use case is loading models with different context sizes without disrupting others already warm in memory, which is why reloading the INI file each time doesn't suffice.

Good point on that. My VRAM is limited so I tend to just keep one model loaded at a time, which tends to reload quickly.

where the application can ask to (re-)launch the model with a different config: for example, a modified context length or GPU layer count, like LM Studio allows from its UI.

That's basically what I intend to use it for in my local UI, yup. Either from this PR if this feature works out, or maybe just by ini-edits and reload if not.

Tbh I don't know how useful the current PR really is; it's mainly here to address the fact that a lot of people asked for extra_args back. But it's probably fine to keep this as a draft, to see if people actually need it or not.

Personally, for me this is a nice-to-have rather than a must-have. I just came across it because it was previously in the docs, and I tried it with ctx size, which is something I've wanted to change on the fly with llama-server for a while.

Either way, I appreciate the discussion.

@strawberrymelonpanda
Contributor

unsafe-allow-api-override = chat-template-file
And someone uses it via the API with chat-template-file = /etc/passwd

Oof, that is bad. Just to put it out there though, is there some possible mitigation for this sort of thing?

Like for example setting a root directory that the template file must be under?
I know it'll be labeled unsafe, but just food for thought.

@ngxson
Collaborator Author

ngxson commented Dec 24, 2025

Maybe there's a misunderstanding about INI hot-reload: hot-reload doesn't mean the model will also be reloaded, just that the configs are reloaded.

Oof, that is bad. Just to put it out there though, is there some possible mitigation for this sort of thing?

In terms of cybersecurity, the best way to not have something bad happen is to not have it in the first place:

  1. Don't use any file-related params in unsafe-allow-api-override
  2. Or, don't have unsafe-allow-api-override in the first place
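
For example (a sketch; only c appears in the example at the top of this PR, the other param names here are assumptions), a whitelist restricted to numeric runtime params is much easier to reason about than one containing file paths:

unsafe-allow-api-override = c,n-gpu-layers,temp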
