Issues
- The server `/metrics` endpoint shares the same task event as `/health`: `TASK_TYPE_METRICS`. This means the metrics are reset on both calls.
- The `Process-Start-Time-Unix` HTTP response header is not set.
- The metrics `llamacpp:prompt_tokens_seconds` and `llamacpp:predicted_tokens_seconds` are per slot, while the server actually processes `llamacpp:prompt_tokens_seconds * n_slots` tokens per second.
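To illustrate the per-slot vs. server-wide gap in the last point, here is a minimal sketch. It assumes the current gauge divides the total tokens by the processing time summed over all slots, and that the slots run concurrently. `SlotStats`, `per_slot_rate` and `aggregate_rate` are hypothetical names for illustration only, not llama.cpp code.

```python
from dataclasses import dataclass


@dataclass
class SlotStats:
    # Hypothetical per-slot counters: tokens processed and time spent (seconds).
    n_tokens: int
    t_seconds: float


def per_slot_rate(slots):
    """Per-slot view (assumed current behaviour): total tokens / time summed over slots."""
    tokens = sum(s.n_tokens for s in slots)
    seconds = sum(s.t_seconds for s in slots)
    return tokens / seconds if seconds > 0 else 0.0


def aggregate_rate(slots):
    """Server-wide view: slots run concurrently, so their individual rates add up."""
    return sum(s.n_tokens / s.t_seconds for s in slots if s.t_seconds > 0)


# Four equally loaded slots, each processing 100 tokens in 2 s.
slots = [SlotStats(100, 2.0) for _ in range(4)]
print(per_slot_rate(slots))   # 50.0 tokens/s per slot
print(aggregate_rate(slots))  # 200.0 tokens/s for the whole server
```

With equally loaded slots the aggregate equals the per-slot rate times `n_slots`; with uneven load, summing the individual rates is the more faithful server-wide figure.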
Proposal
- Add a data param to `TASK_TYPE_METRICS` so that the metrics bucket is reset only on `/metrics` calls.
- Add `llamacpp:prompt_tokens_seconds_total` and `llamacpp:predicted_tokens_seconds_total`.
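A minimal sketch of the reset-flag idea, assuming the metrics task receives a small `data` dict and that the server keeps both a bucket (counters since the last scrape) and cumulative totals. `ServerMetrics`, `handle_metrics_task`, the `reset_bucket` key, `health` and `scrape` are all hypothetical names, not the actual llama.cpp server API.

```python
from dataclasses import dataclass


@dataclass
class ServerMetrics:
    # Hypothetical cumulative counter since server start (never reset).
    n_prompt_tokens_total: int = 0
    # Hypothetical bucket counter since the last /metrics scrape.
    n_prompt_tokens_bucket: int = 0

    def on_prompt_processed(self, n_tokens):
        self.n_prompt_tokens_total += n_tokens
        self.n_prompt_tokens_bucket += n_tokens

    def reset_bucket(self):
        self.n_prompt_tokens_bucket = 0


def handle_metrics_task(metrics, data):
    """Hypothetical TASK_TYPE_METRICS handler: snapshot first, reset only if asked."""
    snapshot = {
        "prompt_tokens_total": metrics.n_prompt_tokens_total,
        "prompt_tokens_bucket": metrics.n_prompt_tokens_bucket,
    }
    if data.get("reset_bucket", False):
        metrics.reset_bucket()
    return snapshot


def health(metrics):
    # /health reads the metrics without touching the bucket.
    return handle_metrics_task(metrics, {})


def scrape(metrics):
    # /metrics reads the metrics and then starts a new bucket.
    return handle_metrics_task(metrics, {"reset_bucket": True})
```

Taking the snapshot before resetting lets the `/metrics` response still report the bucket it just closed, while `/health` can be polled at any rate without disturbing the values a Prometheus scraper sees.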