server: implement --shutdown-timeout #18292
|
I'm putting this branch directly in production; we'll see how it performs under real-use conditions and stress testing. If I understand correctly, no child process should get stuck, so the router will continue functioning normally. I'll also test with GLM Air 4.5 on large contexts where I've had hanging issues |
|
As it stands, I can no longer zombify my llama-server router with the layer 7 DoS script. I haven't seen any regressions in basic usage |
|
Thanks for testing! I feel like there are different (equivalent) approaches to implementing this functionality, so I'll probably need to ask @ggerganov for a review when he comes back (no rush btw) |
Currently, do we know other cases in which a child can become unresponsive? |
Another case that I can think of is when model loading takes too long (the This case may not happen in CLI application, as Ctrl+C at this stage will force-terminate it (the SIGINT handler is not yet registered). Edit: while writing this, I realized that the solution proposed in this PR doesn't address this case; Do |
|
I know a pretty annoying case on my Linux box: if for some reason a child crashes (with SIGSEGV) and the system appoints a core dump collection service to get a core dump, this can actually hang the entire box. |
|
@pwilkin Are you talking about the router mode? This PR targets the single-model mode, so it won't work in such case. I'm now thinking it's probably better to implement this feature in router mode only. |
Yes, probably makes sense mostly for router mode. For single-model mode, I just press Ctrl+C two times and it stops.
Not atm. |
|
@ngxson Just mentioning a hang case I had - TBH, just router mode is fine, because the most annoying cases when this happens are cases where it happens to a spawned process and I'm not aware of it and can't kill it in time. |
Fix #18237
This PR introduces `--shutdown-timeout` to forcefully terminate the server N seconds after the first shutdown request (i.e. SIGINT, SIGTERM) is received.
The most commonly known use case is when `update_slots()` is stuck on a large batch of tokens that can take minutes to finish.
NOTE: this feature works in both router and single-model modes
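For readers who want the idea in code, here is a minimal sketch of a "deadline counted from the first shutdown request" watchdog. It is not the code from this PR: `shutdown_watchdog`, `request_shutdown()`, `mark_clean_exit()` and `simulate_shutdown()` are hypothetical names made up for illustration, and the force-terminate action is passed in as a callback so the behaviour can be observed without killing the process. In a real server the callback would presumably be something like `std::_Exit(1)`.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical helper (not the PR's implementation): arms a timer on the FIRST
// shutdown request and runs `on_timeout` if graceful shutdown has not finished
// within `timeout`.
class shutdown_watchdog {
public:
    shutdown_watchdog(std::chrono::milliseconds timeout, std::function<void()> on_timeout)
        : timeout_(timeout), on_timeout_(std::move(on_timeout)) {
        assert(on_timeout_ && "a force-terminate callback is required");
    }

    ~shutdown_watchdog() { mark_clean_exit(); }

    // Call from normal thread context (e.g. a thread that observes a flag set
    // by the signal handler); mutexes and thread creation are not
    // async-signal-safe. Repeated requests do not restart the timer.
    void request_shutdown() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (requested_) {
            return;
        }
        requested_ = true;
        timer_ = std::thread([this] { wait_then_force(); });
    }

    // Call once graceful shutdown has completed; cancels the pending timer.
    void mark_clean_exit() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            done_ = true;
        }
        cv_.notify_all();
        if (timer_.joinable()) {
            timer_.join();
        }
    }

private:
    void wait_then_force() {
        std::unique_lock<std::mutex> lock(mutex_);
        // wait_for returns the predicate value: true means a clean exit
        // happened before the deadline.
        const bool finished = cv_.wait_for(lock, timeout_, [this] { return done_; });
        lock.unlock();
        if (!finished) {
            on_timeout_();
        }
    }

    std::chrono::milliseconds timeout_;
    std::function<void()>     on_timeout_;
    std::mutex                mutex_;
    std::condition_variable   cv_;
    std::thread               timer_;
    bool                      requested_ = false;
    bool                      done_      = false;
};

// Simulates a graceful shutdown that takes `work` while the watchdog allows
// `timeout`. Returns true if the watchdog had to force termination.
bool simulate_shutdown(std::chrono::milliseconds work, std::chrono::milliseconds timeout) {
    std::atomic<bool> forced{false};
    shutdown_watchdog watchdog(timeout, [&forced] { forced = true; });
    watchdog.request_shutdown();
    watchdog.request_shutdown();          // second request: deadline is unchanged
    std::this_thread::sleep_for(work);    // stand-in for a slow update_slots() call
    watchdog.mark_clean_exit();
    return forced.load();
}
```

With this sketch, a simulated shutdown that outlasts the timeout reports a forced termination, while one that finishes in time cancels the timer and nothing is forced. Taking the terminate action as a callback is only a convenience for experimenting; it says nothing about how the PR wires this into the server.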
How I tested this PR:
- `-ngl 0 -t 1` to make it super slow