Conversation

@ngxson
Collaborator

ngxson commented Dec 23, 2025

Ref discussion: #18261 (comment)

@ServeurpersoCom I think we need to add a test with an INI preset at some point

Example preset for this PR:

[THUDM/glm-edge-v-5b-gguf:Q4_K_M]
no-mmap = 0
temp = 123.000
autoload = 1
unsafe-allow-api-override = no-mmap,c

And API request:

{
        "model": "THUDM/glm-edge-v-5b-gguf:Q4_K_M",
        "overrides": {"c": "512"}
}

Returns:

{
    "success": true,
    "args": [
        "........../build/bin/llama-server",
        "--host",
        "127.0.0.1",
        "--mmap",
        "--port",
        "65054",
        "--temp",
        "123.000",
        "--alias",
        "THUDM/glm-edge-v-5b-gguf:Q4_K_M",
        "--ctx-size",
        "512",
        "--hf-repo",
        "THUDM/glm-edge-v-5b-gguf:Q4_K_M"
    ]
}

@ServeurpersoCom
Collaborator

Interesting! If I understand correctly, you're implementing a filter that lets the router accept cold-start param overrides via the API body (like ctx, n-gpu-layers), with the whitelist in the INI controlling which params can be touched. This gives fine-grained control that can be refined later. Smart approach to transfer the risk to the admin's explicit whitelist!

@ServeurpersoCom
Collaborator

ServeurpersoCom commented Dec 23, 2025

I was initially thinking more about having the router hot-reload the INI, so it keeps control while the INI modifications themselves are handled by the admin. But you're right that since there's already an API, we're already working in two directions at once. We can improve the INI whitelist later: add threshold validation, constraints, etc.

In my case, if I expose my WebUI to the internet, I need to open the entire API, so anything admin-related (memory allocation, etc.) requires authentication. I had initially thought about a simple /admin endpoint prefix. With all-or-nothing access, I'll have to filter at the reverse proxy level, which is not admin-friendly.

Ultimately, all we need is an open "user" level as currently exists, and an "admin" level with a password for everything that changes depending on available resources such as memory.

@ngxson
Collaborator Author

ngxson commented Dec 23, 2025

I was initially thinking more about hot-reloading the INI by the router to keep control while having the INI modifications handled by the admin.

I think allowing the INI to be reloaded can be useful, but it will be quite complicated due to some edge cases, as I noted in #18261 (comment)

We can improve the INI whitelist later, add threshold validation, constraints, etc

IMO this will also be quite over-complicated because:

  • In most cases, a wrong param will simply make the subprocess exit immediately
  • Some params require model info to validate (which requires loading the model)

Re. the /admin endpoint, it is also quite risky in terms of security. It's best to avoid any privileged logic in our code base, because there are usually a lot of small issues and edge cases when dealing with privileged access.

Instead, I would transfer the risk to the end user by allowing them to hot-reload the INI. Users can develop their own admin panel that modifies the INI file and then triggers the hot-reload; they are then responsible for any security risks that may arise in their admin code.

@ngxson
Collaborator Author

ngxson commented Dec 23, 2025

A bit more context on why we want to transfer the risk whenever possible: we do have a security advisory program, but many bug reports are AI-generated and they nitpick functionalities like this.

In the context of this PR, we simply transfer the risk of allowing a param to be overridden to the user. So if they explicitly wrote something like this:

unsafe-allow-api-override = chat-template-file

And someone uses it via the API with chat-template-file = /etc/passwd

Then it's their responsibility, as they explicitly wrote the word unsafe themselves.

@ServeurpersoCom
Collaborator

ServeurpersoCom commented Dec 23, 2025

If we code our own backend/WebUI, we don't need the API overrides; we write the INI directly. This feature is more for exposing the llama.cpp API directly to external clients, right?
Or if we're building an API for LAN only, it shouldn't be necessary for the default llama.cpp WebUI to function, right?
Everything that is HTTP should, in principle, be considered accessible from an untrusted environment! But with an explicit "unsafe-" prefix it's OK for me (I'll just never enable it on my server, except temporarily to test the PR).

Also, a separate endpoint/API for "unsafe" items would make it easier for the community to develop admin panels. Mixing everything together isn't ideal; unsafe items should remain localhost (--jinja /etc/passwd).

@ngxson
Collaborator Author

ngxson commented Dec 23, 2025

The whole router feature is never meant to be used on an untrusted network. A classical DDoS attack vector is simply to repeatedly send load and unload requests.

Instead, this unsafe-allow-api-override is meant to be used by downstream application code, where the application can ask to (re-)launch the model with a different config: for example, a modified context length or GPU layer count, like LM Studio allows from its UI.
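
For illustration, a downstream app could then send something along these lines (just a sketch based on the example above; it assumes the preset also whitelists the extra key via unsafe-allow-api-override = c,n-gpu-layers):

{
        "model": "THUDM/glm-edge-v-5b-gguf:Q4_K_M",
        "overrides": {"c": "8192", "n-gpu-layers": "20"}
}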

@ServeurpersoCom
Collaborator

The whole router feature is never meant to be used on an untrusted network. A classical DDoS attack vector is simply to repeatedly send load and unload requests.

Yes, it's a classic DoS, it doesn't even need to be distributed (DDoS), and it's easy to filter / rate limit.

@ngxson
Collaborator Author

ngxson commented Dec 23, 2025

If we code our own backend/WebUI, we don't need the API overrides; we write the INI directly.

Yeah, I agree that this unsafe-allow-api-override overlaps with INI hot-reload.

The main issue is that INI hot-reload has some edge cases, so we need to list them out before working on the feature.

Tbh I don't know how useful the current PR really is; it's mainly here to address the fact that a lot of people asked for extra_args back. But it's probably fine to keep this as a draft, to see if people actually need it or not.

@ServeurpersoCom
Collaborator

The most important thing is to discuss and test in practice, and we will naturally find the best compromises!

@Chrisischris

This looks like what I need! Quick question to confirm: if I use --models-dir with a minimal preset file that just has a global [*] section with unsafe-allow-api-override = c, would that allow context size overrides for all dynamically discovered models?
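
Something like this is what I have in mind (just a sketch, assuming a wildcard [*] section is supported and that c maps to --ctx-size as in the example at the top of this PR):

[*]
unsafe-allow-api-override = c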

My use case is loading models with different context sizes without disrupting others already warm in memory, which is why reloading the INI file each time doesn't suffice.

@strawberrymelonpanda
Contributor

strawberrymelonpanda commented Dec 24, 2025

My use case is loading models with different context sizes without disrupting others already warm in memory, which is why reloading the INI file each time doesn't suffice.

Good point on that. My VRAM is limited so I tend to just keep one model loaded at a time, which tends to reload quickly.

where the application can ask to (re-)launch the model with a different config: for example, a modified context length or GPU layer count, like LM Studio allows from its UI.

That's basically what I intend to use it for in my local UI, yup. Either from this PR if this feature works out, or maybe just by ini-edits and reload if not.

Tbh I don't know how useful the current PR really is; it's mainly here to address the fact that a lot of people asked for extra_args back. But it's probably fine to keep this as a draft, to see if people actually need it or not.

Personally, for me this is a nice-to-have rather than a must-have. I just came across it because it was previously in the docs, and I tried it with ctx size, which is something I've wanted to change on the fly with llama-server for a while.

Either way, I appreciate the discussion.

@strawberrymelonpanda
Contributor

unsafe-allow-api-override = chat-template-file
And someone uses it via the API with chat-template-file = /etc/passwd

Oof, that is bad. Just to put it out there though, is there some possible mitigation for this sort of thing?

Like for example setting a root directory that the template file must be under?
I know it'll be labeled unsafe, but just food for thought.

@ngxson
Collaborator Author

ngxson commented Dec 24, 2025

Maybe there's a misunderstanding about INI hot-reload: hot-reload doesn't mean the model will also be reloaded, just that the configs are reloaded.

Oof, that is bad. Just to put it out there though, is there some possible mitigation for this sort of thing?

In terms of cybersecurity, the best way to not have something bad happen is to not have it in the first place:

  1. Don't use any file-related params in unsafe-allow-api-override
  2. Or, don't have unsafe-allow-api-override in the first place
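
For example (a sketch; only c appears in the example at the top of this PR, the other param names here are assumptions), a whitelist restricted to numeric runtime params is much easier to reason about than one containing file paths:

unsafe-allow-api-override = c,n-gpu-layers,temp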
