
Conversation

JohannesGaessler
Collaborator

According to the OpenAI documentation, formatting the prompt as an array of tokens is supported. However, the llama.cpp server raises an error if you provide such input. I assume the reason is that the interpretation of tokens depends on the model, so this input would not be "OpenAI compatible" either way. However, I have a use case where I need such inputs. This PR simply removes the error from the llama.cpp server. I don't think this will cause issues, but my understanding of the server code is also relatively poor.
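For illustration, a minimal sketch of such a request against the OAI-compatible completions endpoint of a locally running llama-server (the port, model name, and token IDs below are placeholders, not values from this PR):

```python
# Minimal sketch: send a prompt as an array of token IDs to the
# OAI-compatible /v1/completions endpoint of a local llama-server.
# Port, model name, and token IDs are assumptions/placeholders.
import requests

response = requests.post(
    "http://localhost:8080/v1/completions",
    json={
        "model": "any",          # placeholder; the server serves whatever model it loaded
        "prompt": [1, 2, 3, 4],  # token IDs instead of a text prompt
        "max_tokens": 16,
    },
)
print(response.json())
```

With this PR, the server accepts the token array instead of rejecting it with an error; what the tokens decode to as text depends on the loaded model.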

I'm currently working on benchmarking llama.cpp vs. vllm. Both projects provide an OAI-compatible API, so I want to make scripts/server-bench.py use the OAI-compatible API instead of the llama.cpp-specific API, in order to use the exact same code for benchmarking either project. Under these circumstances I want to be able to send prompts of an exact length (in tokens), while the interpretation of those prompts as text is irrelevant.
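A rough sketch of how a benchmark script could build such prompts, assuming an OAI-compatible server at a local base URL; the model name, vocabulary size, and prompt length are placeholders, since only the token count matters for this use case:

```python
# Sketch: build a prompt of an exact token length from random token IDs
# and send it through the OAI-compatible API (works the same way against
# llama.cpp or vllm). Base URL, model name, and vocab size are assumptions.
import random
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")

def make_prompt(n_tokens: int, vocab_size: int = 32000) -> list[int]:
    """Return exactly n_tokens random token IDs below vocab_size."""
    return [random.randrange(vocab_size) for _ in range(n_tokens)]

completion = client.completions.create(
    model="any",              # placeholder; the server decides the actual model
    prompt=make_prompt(512),  # exactly 512 prompt tokens
    max_tokens=32,
)
print(completion.usage)
```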

JohannesGaessler merged commit f906275 into ggml-org:master on Aug 2, 2025 (47 checks passed).
Nexesenex pushed a commit to Nexesenex/croco.cpp that referenced this pull request on Aug 5, 2025.