Skip to content

Conversation

ishaangandhi
Copy link
Contributor

This PR fixes #12178.

Previously, when calling /completions with a token outside of the token vocabulary range, you would get this exception:

terminate called after throwing an instance of 'std::out_of_range'
  what():  vector::_M_range_check: __n (which is 18446744073709551615) >= this->size() (which is 151936)

Now, the server continues to run, and the user gets a useful error:

{"error":{"code":400,"message":"Prompt contains invalid tokens","type":"invalid_request_error"}}%                     

@ishaangandhi ishaangandhi requested a review from ngxson as a code owner March 11, 2025 16:15
Copy link
Collaborator

@ngxson ngxson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bonus: would be nice if you can add a test case for it, see server/tests/test_tokenize.py

@github-actions github-actions bot added testing Everything test related Nvidia GPU Issues specific to Nvidia GPUs Vulkan Issues specific to the Vulkan backend python python script changes ggml changes relating to the ggml tensor library for machine learning SYCL https://en.wikipedia.org/wiki/SYCL - GPU programming language Apple Metal https://en.wikipedia.org/wiki/Metal_(API) labels Mar 12, 2025
@ishaangandhi
Copy link
Contributor Author

@ngxson are you OK to merge this? I rebased the changes.

re: adding a test - I think test_tokenize tests the /tokenize and /detokenize endpoints, which invoke a different path than what we fix in this PR.

How about we merge this as-is?

@ngxson
Copy link
Collaborator

ngxson commented Mar 13, 2025

I only refer to that file so you can see how the code looks like. Adding test should be easy and it is a good way to show that your code actually work.

@ngxson ngxson changed the title bugfix: Prevent DOS when using verbose output with input tokens that are not in printable range (#12178) server : fix crash when using verbose output with input tokens that are not in printable range (#12178) Mar 13, 2025
@ngxson ngxson merged commit 2048b59 into ggml-org:master Mar 13, 2025
47 checks passed
@ngxson ngxson removed the request for review from JohannesGaessler March 13, 2025 10:10
jpohhhh pushed a commit to Telosnex/llama.cpp that referenced this pull request Mar 14, 2025
…re not in printable range (ggml-org#12178) (ggml-org#12338)

* Fix DOS index bug

* Remove new APIs

* remove extra line

* Remove from API

* Add extra newline

* Update examples/server/server.cpp

---------

Co-authored-by: Xuan-Son Nguyen <[email protected]>
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Mar 19, 2025
…re not in printable range (ggml-org#12178) (ggml-org#12338)

* Fix DOS index bug

* Remove new APIs

* remove extra line

* Remove from API

* Add extra newline

* Update examples/server/server.cpp

---------

Co-authored-by: Xuan-Son Nguyen <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Apple Metal https://en.wikipedia.org/wiki/Metal_(API) examples ggml changes relating to the ggml tensor library for machine learning Nvidia GPU Issues specific to Nvidia GPUs python python script changes server SYCL https://en.wikipedia.org/wiki/SYCL - GPU programming language testing Everything test related Vulkan Issues specific to the Vulkan backend
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Misc. bug: Denial of Service (crash) when using verbose output with input tokens that are not in printable range.
2 participants