Client-side prompt token count is inaccurate #75

Closed
sjmonson opened this issue Feb 25, 2025 · 1 comment · Fixed by #91

sjmonson (Collaborator)

The client-side tokenization in guidellm fails to account for the extra tokens added by the server's chat prompt template, so the client's prompt token count undercounts what the server actually processes. There are two possible workarounds (sketched in the example after this list):

  1. Enable usage metrics in each request and let the server report how many prompt tokens it actually saw.
  2. Use the /completions endpoint rather than /chat/completions, since the chat template is not applied on the /completions endpoint.
markurtz (Member)

#91 contains fixes for both. There will be a bit of follow-up work past that to give the user further configuration over the token-count calculations and over which endpoint requests are sent to.
