Client-side prompt token count is inaccurate #75

Closed
sjmonson opened this issue Feb 25, 2025 · 1 comment · Fixed by #91

sjmonson (Collaborator)

The client-side tokenization in guidellm fails to account for the extra tokens added by the server's chat prompt template, so the client's prompt token count undercounts what the server actually processes. There are two possible workarounds (sketched in the example after this list):

  1. Enable usage metrics in each request and let the server report how many prompt tokens it actually saw.
  2. Use the /completions endpoint rather than /chat/completions, since the chat template is not applied on the /completions endpoint.
markurtz (Member)

#91 contains fixes for both. There will be a bit of follow-up work past that to give the user further configuration over the token-count calculations and over which endpoint requests are sent to.
