mudler (Owner) commented Nov 7, 2025

Description

This PR binds the token generation to the request context, and for llama.cpp it implements job cancellation.

It also replaces the loading icon with a stop icon; clicking it aborts the in-flight request.

[Screenshot from 2025-11-08 18-58-43]
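
To illustrate what binding token generation to the request context can look like, here is a minimal Go sketch. The names `generate` and `runDemo` are hypothetical and exist only for this example; they are not LocalAI or llama.cpp APIs, and the change in this PR may be structured differently.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// generate is a hypothetical stand-in for a backend token stream.
// It emits up to maxTokens tokens and stops as soon as ctx is cancelled,
// returning the number of tokens emitted and the context error, if any.
func generate(ctx context.Context, maxTokens int, emit func(token string)) (int, error) {
	for i := 0; i < maxTokens; i++ {
		// Check for cancellation before producing each token.
		select {
		case <-ctx.Done():
			return i, ctx.Err()
		default:
		}
		emit(fmt.Sprintf("tok%d", i))
	}
	return maxTokens, nil
}

// runDemo cancels the context once the third token has been emitted,
// which is roughly what a client pressing the stop icon would trigger.
func runDemo() (int, error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	seen := 0
	return generate(ctx, 10, func(string) {
		seen++
		if seen == 3 {
			cancel()
		}
	})
}

func main() {
	n, err := runDemo()
	fmt.Println(n, errors.Is(err, context.Canceled)) // prints "3 true"
}
```

The loop checks `ctx.Done()` between tokens, so cancellation latency in this sketch is bounded by the time it takes to produce one token.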

Notes for Reviewers

Fixes: #974

Signed commits

  • Yes, I signed my commits.

Signed-off-by: Ettore Di Giacinto <[email protected]>
netlify bot commented Nov 7, 2025

Deploy Preview for localai ready!

🔨 Latest commit: 4839d57
🔍 Latest deploy log: https://app.netlify.com/projects/localai/deploys/6910be954e2384000832362a
😎 Deploy Preview: https://deploy-preview-7187--localai.netlify.app

mudler (Owner, Author) commented Nov 8, 2025

It seems we can't propagate client disconnection for non-SSE requests because of valyala/fasthttp#468. The same limitation affects go-fiber: gofiber/fiber#1718

mudler (Owner, Author) commented Nov 8, 2025

Just as a note, echo doesn't have this issue: labstack/echo#1581
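
For context, echo is built on the standard `net/http` server, which cancels the request context when the client's connection closes. The following Go sketch demonstrates that behaviour with plain `net/http` and `httptest`, not with echo itself. The function name `observeDisconnect` and the two-second timeouts are assumptions made for this example.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// observeDisconnect reports whether a net/http handler sees its request
// context cancelled after the client abandons an in-flight request.
func observeDisconnect() bool {
	started := make(chan struct{})
	cancelled := make(chan struct{})
	stop := make(chan struct{})

	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		close(started)
		select {
		case <-r.Context().Done(): // closed when the client connection goes away
			close(cancelled)
		case <-stop: // safety net so the handler never outlives the demo
		}
	}))
	defer srv.Close()
	defer close(stop) // runs before srv.Close, which waits for running handlers

	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, srv.URL, nil)
	if err != nil {
		return false
	}
	go func() {
		// The call is expected to fail once the context is cancelled.
		if resp, err := http.DefaultClient.Do(req); err == nil {
			_ = resp.Body.Close()
		}
	}()

	select {
	case <-started:
	case <-time.After(2 * time.Second):
		return false
	}
	cancel() // the client gives up, which closes its connection

	select {
	case <-cancelled:
		return true
	case <-time.After(2 * time.Second):
		return false
	}
}

func main() {
	fmt.Println("handler saw cancellation:", observeDisconnect())
}
```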

mudler force-pushed the feat/context_grpc branch from 578b94f to 6cc3285 on November 8, 2025 17:34
Signed-off-by: Ettore Di Giacinto <[email protected]>
mudler force-pushed the feat/context_grpc branch from 6cc3285 to 38125d9 on November 8, 2025 17:43
Signed-off-by: Ettore Di Giacinto <[email protected]>
mudler added the "enhancement" (New feature or request) label on Nov 8, 2025
mudler changed the title from "feat(llama.cpp): respect context" to "feat: respect context and add request cancellation" on Nov 8, 2025
Signed-off-by: Ettore Di Giacinto <[email protected]>
Signed-off-by: Ettore Di Giacinto <[email protected]>
Signed-off-by: Ettore Di Giacinto <[email protected]>
mudler force-pushed the feat/context_grpc branch 2 times, most recently from 53d8f39 to 89e9874, on November 9, 2025 08:14
Signed-off-by: Ettore Di Giacinto <[email protected]>
mudler force-pushed the feat/context_grpc branch from 89e9874 to 64519a6 on November 9, 2025 08:23
    go func() {
        defer func() {
            // Clear read deadline when goroutine exits
            conn.SetReadDeadline(time.Time{})

Check warning (Code scanning / gosec): Errors unhandled

        case <-ticker.C:
            // Set a short deadline - if connection is closed, read will fail immediately
            // If connection is open but no data, it will timeout and we check again
            conn.SetReadDeadline(time.Now().Add(50 * time.Millisecond))

Check warning (Code scanning / gosec): Errors unhandled
Signed-off-by: Ettore Di Giacinto <[email protected]>
mudler force-pushed the feat/context_grpc branch from 64519a6 to 4839d57 on November 9, 2025 16:17
mudler merged commit 679d43c into master on Nov 9, 2025
36 of 37 checks passed
mudler deleted the feat/context_grpc branch on November 9, 2025 17:19
mudler (Owner, Author) commented Nov 9, 2025

Found an ugly workaround, but it works for our case. It would be nice if fasthttp supported this natively, but for now this seems to be the only way we can tackle it.
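
Judging only from the gosec-flagged fragments earlier in this thread, the workaround appears to poll the connection with short read deadlines and treat a non-timeout read failure as a disconnect. The Go sketch below shows that general idea against a plain `net.Conn`; it is not the merged code. The names `watchConn` and `runWatchDemo` are hypothetical, `net.Pipe` stands in for a real client connection, and the sketch assumes the request has already been fully read, since the probe read would otherwise consume request data. Unlike the flagged fragments, it checks the errors returned by `SetReadDeadline`.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"log"
	"net"
	"os"
	"time"
)

// watchConn polls conn with short read deadlines and calls cancel once
// the connection looks closed. It returns when ctx is done.
func watchConn(ctx context.Context, conn net.Conn, cancel context.CancelFunc, interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	defer func() {
		// Clear the read deadline on exit and report a failure instead of dropping it.
		if err := conn.SetReadDeadline(time.Time{}); err != nil {
			log.Printf("clearing read deadline: %v", err)
		}
	}()

	buf := make([]byte, 1)
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			// If the deadline cannot be set, treat the connection as gone.
			if err := conn.SetReadDeadline(time.Now().Add(interval)); err != nil {
				cancel()
				return
			}
			// A deadline error means "open but idle"; any other error
			// (for example io.EOF) means the peer has gone away.
			if _, err := conn.Read(buf); err != nil && !errors.Is(err, os.ErrDeadlineExceeded) {
				cancel()
				return
			}
		}
	}
}

// runWatchDemo reports whether the context was cancelled within one second,
// optionally simulating a client that disconnects.
func runWatchDemo(closeClient bool) bool {
	server, client := net.Pipe()
	defer server.Close()
	defer client.Close()

	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	go watchConn(ctx, server, cancel, 20*time.Millisecond)

	if closeClient {
		_ = client.Close() // simulate the client going away
	}

	select {
	case <-ctx.Done():
		return true
	case <-time.After(time.Second):
		return false
	}
}

func main() {
	fmt.Println("cancelled after disconnect:", runWatchDemo(true))
	fmt.Println("cancelled while connected:", runWatchDemo(false))
}
```

The polling interval trades detection latency against wakeups per connection; the 50 ms value in the flagged fragment and the 20 ms used here are both arbitrary choices for that trade-off.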

Labels

enhancement New feature or request

Development

Successfully merging this pull request may close these issues.

Ability to abort streaming completion

2 participants