@ServeurpersoCom ServeurpersoCom commented Dec 22, 2025


WebUI: Display prompt preprocessing progress

Integrates the existing backend 'return_progress' feature into the WebUI to show real-time token processing during prompt preprocessing.

What it does

Displays processing progress before generation starts:

Processing...
↓ (initial chunk arrives immediately)
Processing (0 / 2,007 tokens - 0%)
↓ (batch 1: 128 tokens, ~2s)
Processing (128 / 2,007 tokens - 6% - ETA: 29s)
↓ (batch 2: 128 tokens, ~2s)
Processing (256 / 2,007 tokens - 13% - ETA: 27s)
↓ (batch 3: 128 tokens, ~2s)
Processing (384 / 2,007 tokens - 19% - ETA: 25s)
↓ (batch 4: 128 tokens, ~2s)
Processing (512 / 2,007 tokens - 26% - ETA: 23s)
↓ (continue...)
Processing (1,280 / 2,007 tokens - 64% - ETA: 11s)
↓
Processing (1,792 / 2,007 tokens - 89% - ETA: 3s)
↓
Processing (1,920 / 2,007 tokens - 96% - ETA: 1s)
↓ (final batch)
Processing (2,007 / 2,007 tokens - 100% - ETA: 0s)
↓ (first generation token arrives)
Generating...
Here is my response...
...

Implementation

  • Frontend: Parses 'prompt_progress' SSE chunks and displays formatted text
  • Backend: Adds 'return_progress: true' to streaming requests
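As a rough illustration of the frontend side, here is a minimal sketch of turning a progress chunk into the display string shown in the trace above. The chunk shape (`processed`, `total`) and the helper itself are assumptions for illustration, not the actual WebUI code or the exact server schema:

```typescript
// Hypothetical shape of a 'prompt_progress' SSE chunk (illustrative only;
// check the actual llama-server schema before relying on these fields).
interface PromptProgress {
  processed: number; // prompt tokens processed so far
  total: number;     // total prompt tokens
}

// Build the "Processing (x / y tokens - p% - ETA: ns)" label from a
// progress chunk and the elapsed time since preprocessing started.
function progressLabel(p: PromptProgress, elapsedMs: number): string {
  const pct = p.total > 0 ? Math.floor((p.processed / p.total) * 100) : 0;
  const rate = elapsedMs > 0 ? p.processed / (elapsedMs / 1000) : 0; // tokens/s
  const etaS = rate > 0 ? Math.round((p.total - p.processed) / rate) : 0;
  const fmt = (n: number) => n.toLocaleString('en-US');
  return `Processing (${fmt(p.processed)} / ${fmt(p.total)} tokens - ${pct}% - ETA: ${etaS}s)`;
}

console.log(progressLabel({ processed: 128, total: 2007 }, 2000));
// → Processing (128 / 2,007 tokens - 6% - ETA: 29s)
```

With 128 tokens done in ~2 s the rate is 64 tokens/s, giving the 29 s ETA seen in the example trace.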

Testing

Progress updates are sent at batch boundaries. Use smaller batch sizes to see more frequent updates:

./llama-server -m model.gguf -b 128  # Updates every 128 tokens

Then send a long prompt (500+ tokens) via WebUI to observe progress.

Notes

  • Only shows during preprocessing (before first content token)
  • Automatically disappears when generation starts
  • No UI changes needed for short prompts (processed in single batch)
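The first two notes boil down to a tiny state check: once any content token has arrived, the progress text is dropped. A hypothetical helper (not the actual WebUI code) sketching that rule:

```typescript
// Illustrative stream events: either a progress update or a content token.
type StreamEvent =
  | { kind: 'progress'; processed: number; total: number }
  | { kind: 'content'; text: string };

// Returns the status line to show, or null once generation has started.
function visibleStatus(events: StreamEvent[]): string | null {
  let status: string | null = null;
  for (const ev of events) {
    if (ev.kind === 'content') return null; // generation started: hide progress
    status = `Processing (${ev.processed} / ${ev.total} tokens)`;
  }
  return status;
}

console.log(visibleStatus([{ kind: 'progress', processed: 128, total: 2007 }]));
// → Processing (128 / 2007 tokens)
```

For a short prompt processed in a single batch, the first content token arrives right away, so no progress line is ever shown.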

Tested with '-b 128' on GPU and a large prompt.

PR-18300.mp4

Closes #17079


ngxson commented Dec 22, 2025

Just a nit: I think showing percentage + ETA instead of elapsed time would be more useful:

Processing (123 / 456 tokens - 27% - ETA: 50s)


ServeurpersoCom commented Dec 23, 2025

It can still be improved; I don't know whether people have prompts that take several minutes, but adding minutes might be a good idea! (We also compute tokens/s and could display it, but that would bloat the display, and the final value is already shown.)
I also need to double-check on CPU, to break down the display and make sure I can't get NaN or similar, even with the first chunk. #18305
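A sketch combining both ideas, minutes in the ETA and a guard against NaN/Infinity when the token rate is zero or not yet known (a hypothetical helper, not the merged code):

```typescript
// Format an ETA given in seconds as "Ns" or "Nm MMs", returning an empty
// string when the value is NaN/Infinity/negative (e.g. rate unknown on the
// very first chunk, or division by a zero elapsed time).
function formatEta(etaSeconds: number): string {
  if (!Number.isFinite(etaSeconds) || etaSeconds < 0) return '';
  const s = Math.round(etaSeconds);
  if (s < 60) return `${s}s`;
  return `${Math.floor(s / 60)}m ${String(s % 60).padStart(2, '0')}s`;
}

console.log(formatEta(29));    // → 29s
console.log(formatEta(125));   // → 2m 05s
console.log(formatEta(0 / 0)); // → "" (NaN guarded)
```

Returning an empty string lets the caller fall back to a plain "Processing (x / y tokens - p%)" line until a rate is available.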


ServeurpersoCom commented Dec 23, 2025

I think we're good. The client-side "Processing..." placeholder message is no longer visible now.

During the first batch:

(screenshot)

Next one:

(screenshot)


@ngxson ngxson left a comment


Very nice feature!

(May need approval from @allozaur too)

Successfully merging this pull request may close these issues.

Feature Request: webui: add parsing progress
