[V1] Fix vocab size calculation for structured output #14826
Conversation
When testing with V1 structured output + Llama-3.1-8B-Instruct, the changes made in #14630 broke for me. I get the error:

```
ERROR 03-14 15:44:42 [core.py:337]   File "/home/rbryant/vllm/vllm/v1/structured_output/__init__.py", line 77, in _delayed_init
ERROR 03-14 15:44:42 [core.py:337]     tokenizer_info = xgr.TokenizerInfo.from_huggingface(
ERROR 03-14 15:44:42 [core.py:337]   File "/home/rbryant/vllm/venv/lib/python3.10/site-packages/xgrammar/tokenizer_info.py", line 184, in from_huggingface
ERROR 03-14 15:44:42 [core.py:337]     raise ValueError(msg)
ERROR 03-14 15:44:42 [core.py:337] ValueError: Input vocab_size less than minimum viable vocab size for tokenizer <class 'vllm.transformers_utils.tokenizer.get_cached_tokenizer.<locals>.CachedTokenizer'>.
ERROR 03-14 15:44:42 [core.py:337]
```

The vocab size was off by one. The max token ID is not equal to the vocab size, since 0 is also a token ID. The vocab size is the max token ID + 1.

Signed-off-by: Russell Bryant <[email protected]>
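To illustrate the off-by-one with a toy vocabulary (a minimal sketch of the reasoning above, not the actual vLLM code):

```python
# Token IDs are 0-indexed, so a vocabulary's size is its max token ID
# plus one, not the max token ID itself.
vocab = {"a": 0, "b": 1, "c": 2}  # toy vocabulary

max_token_id = max(vocab.values())  # 2
vocab_size = max_token_id + 1       # 3: IDs 0, 1, and 2 are all valid

assert vocab_size == len(vocab)
```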
Is this currently covered by any test? If not, we can force-merge this.
It should be, but they're off right now -- #14619
OK, let's just merge then.
qq: Why do we need to use the max token ID to infer the vocab_size instead of just …
See the discussion on #14630 that led to that change.
I see. It would actually be better to add a comment in the code to reduce future confusion.
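For context on why the max token ID is preferred (this sketch is my own illustration and may not match the exact case discussed in #14630): on HuggingFace tokenizers, the `vocab_size` attribute reports only the base vocabulary, while added tokens get IDs at or above it.

```python
# Illustration (not from the PR): tokens added after the fact receive
# IDs at or above `vocab_size`, so deriving the size from the max
# token ID covers the added tokens as well.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # base vocab: IDs 0..50256
tok.add_tokens(["<my_new_token>"])           # new token gets ID 50257

max_id = max(tok.get_vocab().values())
assert tok.vocab_size == 50257   # base vocabulary only
assert max_id + 1 == 50258       # includes the added token
```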
Agreed ... and this still wasn't right. I'm trying to get tests to pass over in #14832