WIP: Add count_tokens for openAI models #3447
Conversation
```python
    return event_loop


def num_tokens_from_messages(
```
This is OpenAI specific so it should live in models/openai.py
```python
def num_tokens_from_messages(
    messages: list[ChatCompletionMessageParam] | list[ResponseInputItemParam],
    model: OpenAIModelName = 'gpt-4o-mini-2024-07-18',
```
We don't need a default value
```python
    else:
        raise NotImplementedError(
            f"""num_tokens_from_messages() is not implemented for model {model}."""
        )  # TODO: How to handle other models?
```
Are you able to reverse engineer the right formula for gpt-5?
As long as we document that this is a best-effort calculation and may not be accurate down to the exact token, we can have one branch of logic for "everything before gpt-5" and one for everything newer. If future models have different rules, we can update the logic then.
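A minimal sketch of how that branching could look; the helper name `_per_message_overheads` is made up, and the gpt-5 values are placeholders rather than reverse-engineered numbers:

```python
# (tokens_per_message, tokens_per_name) for pre-gpt-5 models, per the OpenAI cookbook.
_PRE_GPT5_OVERHEADS = (3, 1)
# Placeholder for gpt-5 and newer until the real overheads are reverse engineered;
# the estimate is documented as best effort, so being off by a token is acceptable.
_GPT5_OVERHEADS = (3, 1)


def _per_message_overheads(model: str) -> tuple[int, int]:
    """Best-effort per-message token overheads for an OpenAI model."""
    return _GPT5_OVERHEADS if model.startswith('gpt-5') else _PRE_GPT5_OVERHEADS
```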
```python
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        print('Warning: model not found. Using o200k_base encoding.')  # TODO: How to handle warnings?
```
No warnings please, let's just make a best effort
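For example, the fallback could simply be silent (the helper name `_encoding_for` is made up here):

```python
import tiktoken


def _encoding_for(model: str) -> tiktoken.Encoding:
    """Pick the tokenizer for a model, silently falling back to o200k_base."""
    try:
        return tiktoken.encoding_for_model(model)
    except KeyError:
        # Unknown or non-OpenAI model name: best effort, no warning printed.
        return tiktoken.get_encoding('o200k_base')
```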
```python
    ) -> usage.RequestUsage:
        """Make a request to the model for counting tokens."""
        openai_messages = await self._map_messages(messages, model_request_parameters)
        token_count = num_tokens_from_messages(openai_messages, self.model_name)
```
OpenAIChatModel and OpenAIResponsesModel can also be used with non-OpenAI models, so we should only use tiktoken if we're sure we're using an OpenAI model. We could check self.system == 'openai', but then it won't work right with OpenAI models via Azure OpenAI, OpenRouter, etc.
So the best option may be to add a new field on OpenAIModelProfile to specify whether a given model is supported by tiktoken, which we can then enable on openai_model_profile and leave false by default. That's a bit trickier to implement though, so feel free to start with the other changes I requested + the self.system check, and we can always add the model profile check later.
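A rough sketch of the interim `self.system` check; the method parameters and the `RequestUsage(input_tokens=...)` construction are assumptions, not the final API:

```python
# Inside OpenAIChatModel in models/openai.py (sketch, not the final implementation):
async def count_tokens(
    self,
    messages: list[ModelMessage],
    model_request_parameters: ModelRequestParameters,
) -> usage.RequestUsage:
    """Best-effort local token count using tiktoken."""
    # Azure OpenAI, OpenRouter, etc. reuse this class for non-OpenAI models,
    # so only trust tiktoken when the provider system is actually OpenAI.
    # A later OpenAIModelProfile flag could replace this check.
    if self.system != 'openai':
        raise NotImplementedError(f'count_tokens is not supported for system {self.system!r}')
    openai_messages = await self._map_messages(messages, model_request_parameters)
    token_count = num_tokens_from_messages(openai_messages, self.model_name)
    return usage.RequestUsage(input_tokens=token_count)
```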
Work for #3430
Took the method 1:1 from the OpenAI cookbook. However, the example only covers certain models. How should other models be handled? A quick test with gpt-5 showed a token count that differed from this method.
```
gpt-5
Warning: gpt-5 may update over time. Returning num tokens assuming gpt-5-2025-08-07.
110 prompt tokens counted by num_tokens_from_messages().
109 prompt tokens counted by the OpenAI API.
```
Test script
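For reference, the cookbook-style helper being ported looks roughly like this (a paraphrased sketch, not the exact diff):

```python
import tiktoken


def num_tokens_from_messages(messages: list[dict[str, str]], model: str) -> int:
    """Estimate prompt tokens the way the OpenAI cookbook does (best effort)."""
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        encoding = tiktoken.get_encoding('o200k_base')
    tokens_per_message = 3  # wrapper tokens around each message
    tokens_per_name = 1  # extra token when a 'name' field is present
    num_tokens = 0
    for message in messages:
        num_tokens += tokens_per_message
        for key, value in message.items():
            num_tokens += len(encoding.encode(value))
            if key == 'name':
                num_tokens += tokens_per_name
    num_tokens += 3  # every reply is primed with <|start|>assistant<|message|>
    return num_tokens
```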