@ServeurpersoCom ServeurpersoCom commented Dec 22, 2025


WebUI: Display prompt preprocessing progress

Integrates the existing backend 'return_progress' feature into the WebUI to show real-time token processing during prompt preprocessing.

What it does

Displays processing progress before generation starts:

Processing...
↓ (initial chunk arrives immediately)
Processing (0 / 2,007 tokens - 0%)
↓ (batch 1: 128 tokens, ~2s)
Processing (128 / 2,007 tokens - 6% - ETA: 29s)
↓ (batch 2: 128 tokens, ~2s)
Processing (256 / 2,007 tokens - 13% - ETA: 27s)
↓ (batch 3: 128 tokens, ~2s)
Processing (384 / 2,007 tokens - 19% - ETA: 25s)
↓ (batch 4: 128 tokens, ~2s)
Processing (512 / 2,007 tokens - 26% - ETA: 23s)
↓ (continue...)
Processing (1,280 / 2,007 tokens - 64% - ETA: 11s)
↓
Processing (1,792 / 2,007 tokens - 89% - ETA: 3s)
↓
Processing (1,920 / 2,007 tokens - 96% - ETA: 1s)
↓ (final batch)
Processing (2,007 / 2,007 tokens - 100% - ETA: 0s)
↓ (first generation token arrives)
Generating...
Here is my response...
...

Implementation

  • Frontend: Parses 'prompt_progress' SSE chunks and displays formatted text
  • Backend: Adds 'return_progress: true' to streaming requests
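As a rough illustration of the frontend side, here is a minimal sketch of turning a progress chunk into the display string shown in the trace above. The chunk shape (`processed`, `total`) and the helper itself are assumptions for illustration, not the actual WebUI code or the exact server schema:

```typescript
// Hypothetical shape of a 'prompt_progress' SSE chunk (illustrative only;
// check the actual llama-server schema before relying on these fields).
interface PromptProgress {
  processed: number; // prompt tokens processed so far
  total: number;     // total prompt tokens
}

// Build the "Processing (x / y tokens - p% - ETA: ns)" label from a
// progress chunk and the elapsed time since preprocessing started.
function progressLabel(p: PromptProgress, elapsedMs: number): string {
  const pct = p.total > 0 ? Math.floor((p.processed / p.total) * 100) : 0;
  const rate = elapsedMs > 0 ? p.processed / (elapsedMs / 1000) : 0; // tokens/s
  const etaS = rate > 0 ? Math.round((p.total - p.processed) / rate) : 0;
  const fmt = (n: number) => n.toLocaleString('en-US');
  return `Processing (${fmt(p.processed)} / ${fmt(p.total)} tokens - ${pct}% - ETA: ${etaS}s)`;
}

console.log(progressLabel({ processed: 128, total: 2007 }, 2000));
// → Processing (128 / 2,007 tokens - 6% - ETA: 29s)
```

With 128 tokens done in ~2 s the rate is 64 tokens/s, giving the 29 s ETA seen in the example trace.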

Testing

Progress updates are sent at batch boundaries. Use smaller batch sizes to see more frequent updates:

./llama-server -m model.gguf -b 128  # Updates every 128 tokens

Then send a long prompt (500+ tokens) via WebUI to observe progress.

Notes

  • Only shows during preprocessing (before first content token)
  • Automatically disappears when generation starts
  • No UI changes needed for short prompts (processed in single batch)
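The first two notes boil down to a tiny state check: once any content token has arrived, the progress text is dropped. A hypothetical helper (not the actual WebUI code) sketching that rule:

```typescript
// Illustrative stream events: either a progress update or a content token.
type StreamEvent =
  | { kind: 'progress'; processed: number; total: number }
  | { kind: 'content'; text: string };

// Returns the status line to show, or null once generation has started.
function visibleStatus(events: StreamEvent[]): string | null {
  let status: string | null = null;
  for (const ev of events) {
    if (ev.kind === 'content') return null; // generation started: hide progress
    status = `Processing (${ev.processed} / ${ev.total} tokens)`;
  }
  return status;
}

console.log(visibleStatus([{ kind: 'progress', processed: 128, total: 2007 }]));
// → Processing (128 / 2007 tokens)
```

For a short prompt processed in a single batch, the first content token arrives right away, so no progress line is ever shown.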

Tested with '-b 128' on GPU and a large prompt.

PR-18300.mp4

Closes #17079


ngxson commented Dec 22, 2025

Just a nit: I think showing percentage + ETA instead of elapsed time would be more useful:

Processing (123 / 456 tokens - 27% - ETA: 50s)


ServeurpersoCom commented Dec 23, 2025

It can still be improved; I don't know whether people have prompts that take several minutes, but adding minutes might be a good idea! (We also compute tokens/s and could display it, but that would bloat the display, and the final value is already shown.)
I also need to double-check on CPU, to break down the display and make sure I can't get NaN or similar, even with the first chunk. #18305
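A sketch combining both ideas, minutes in the ETA and a guard against NaN/Infinity when the token rate is zero or not yet known (a hypothetical helper, not the merged code):

```typescript
// Format an ETA given in seconds as "Ns" or "Nm MMs", returning an empty
// string when the value is NaN/Infinity/negative (e.g. rate unknown on the
// very first chunk, or division by a zero elapsed time).
function formatEta(etaSeconds: number): string {
  if (!Number.isFinite(etaSeconds) || etaSeconds < 0) return '';
  const s = Math.round(etaSeconds);
  if (s < 60) return `${s}s`;
  return `${Math.floor(s / 60)}m ${String(s % 60).padStart(2, '0')}s`;
}

console.log(formatEta(29));    // → 29s
console.log(formatEta(125));   // → 2m 05s
console.log(formatEta(0 / 0)); // → "" (NaN guarded)
```

Returning an empty string lets the caller fall back to a plain "Processing (x / y tokens - p%)" line until a rate is available.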


ServeurpersoCom commented Dec 23, 2025

I think we're good. The client-side "Processing..." placeholder message is no longer visible now.

During the first batch:

(screenshot)

Next one:

(screenshot)


@ngxson ngxson left a comment


Very nice feature!

(May need approval from @allozaur too)

Successfully merging this pull request may close these issues.

Feature Request: webui: add parsing progress
