Conversation

@DarkLight1337 commented Apr 12, 2024

This PR refactors various parts of the OpenAI-compatible server:

  • The _validate_prompt_and_tokenize method has been decomposed so that prompt and prompt_ids are processed separately (see the sketch after this list).
  • The logging of prompt and prompt_ids has been moved from vllm.AsyncLLMEngine to vllm.entrypoints.logger.RequestLogger such that redundant data is no longer passed into the core engine. This also enables logging for tokenization endpoints.
  • Explicitly handle LoRA and prompt adapters for embedding endpoints. Previously, they were silently ignored.
  • Use a different prefix for request_id based on the endpoint type:
    • Completion API: cmpl-* (as before)
    • Chat Completion API: chat-*
    • Embeddings API: embd-*
    • Tokenization API: tokn-*
  • Fixed various type errors
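
For illustration, here is a minimal sketch of the separate-paths idea from the first bullet point; every helper name below is an assumption made up for this example, not the PR's actual code:

```python
# Sketch: handle text prompts and pre-tokenized prompt_ids on separate
# paths instead of one combined _validate_prompt_and_tokenize.
# All names here are illustrative, not vLLM's real helpers.
from typing import List, Union

def _tokenize_prompt(tokenizer, prompt: str) -> List[int]:
    # Text input: the frontend tokenizes it itself.
    return tokenizer.encode(prompt)

def _validate_prompt_ids(prompt_ids: List[int], max_len: int) -> List[int]:
    # Pre-tokenized input: validate only, never re-tokenize.
    if len(prompt_ids) > max_len:
        raise ValueError(
            f"Prompt has {len(prompt_ids)} tokens; the maximum is {max_len}.")
    return prompt_ids

def parse_prompt(tokenizer, prompt: Union[str, List[int]],
                 max_len: int) -> List[int]:
    # Dispatch on input type rather than mixing both cases in one method.
    if isinstance(prompt, str):
        return _validate_prompt_ids(_tokenize_prompt(tokenizer, prompt),
                                    max_len)
    return _validate_prompt_ids(prompt, max_len)
```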

@DarkLight1337 changed the title [mypy] Improve type annotations in vllm.entrypoints.openai → [Frontend] Refactor prompt parsing Apr 12, 2024
@njhill self-requested a review April 12, 2024 10:26
@DarkLight1337

@njhill I see that you are working on #3512. I suppose it would be better to get that merged before continuing with this refactor.

@DarkLight1337 commented Apr 13, 2024

It seems that #4032 fixed the LoRA bugs; however, entrypoints-test is still failing.

@DarkLight1337

Update: I found that it was due to a bug in my refactored parsing, my bad. I have fixed it just now.

@DarkLight1337

@njhill The tests now pass. To move on with #3978, perhaps we can merge this PR first?

@DarkLight1337

> Update: I found that it was due to a bug in my refactored parsing, my bad. I have fixed it just now.

I'm updating test_batch_completions to better catch this error. Previously, the error only surfaced in test_echo_logprob_completion, which misled me about the real cause.

@DarkLight1337 commented Jul 19, 2024

> Looks like this line also needs to be removed from the tokenization test.

Actually, let me add add_special_tokens to the Completions API as well (defaulting to True to maintain the existing behaviour) to make things consistent. It's a vLLM-specific feature originally introduced by #5278.
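
For context, add_special_tokens controls whether the tokenizer inserts tokens such as BOS around the input. A minimal illustration using a Hugging Face tokenizer (the tokenizer name is an arbitrary example, not related to this PR):

```python
from transformers import AutoTokenizer

# Arbitrary example tokenizer; any model with a BOS token shows the effect.
tokenizer = AutoTokenizer.from_pretrained("hf-internal-testing/llama-tokenizer")

# With special tokens (the proposed default), BOS is prepended.
print(tokenizer.encode("Hello", add_special_tokens=True))

# Without special tokens, only the raw text tokens remain.
print(tokenizer.encode("Hello", add_special_tokens=False))
```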

@DarkLight1337 commented Jul 19, 2024

> My only concern is whether some folks might make use of the logging in the engine with different front ends.

I've moved the logging out to a separate class, vllm.entrypoints.logger.RequestLogger, so it should be easy for them to add code to log the requests if need be.
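
For anyone wiring this into a different frontend, here is a rough sketch of what that could look like; the constructor argument and the log_inputs signature below are assumptions based on this PR's description, not a documented API:

```python
# Hedged sketch: log requests at the frontend boundary with the new
# RequestLogger instead of relying on AsyncLLMEngine to log them.
from typing import List, Optional

from vllm.entrypoints.logger import RequestLogger

# max_log_len caps how much of the prompt is logged (assumed parameter).
request_logger = RequestLogger(max_log_len=2048)

def log_frontend_request(request_id: str,
                         prompt: Optional[str],
                         prompt_token_ids: Optional[List[int]]) -> None:
    # A custom frontend would call this before handing the request to
    # the engine; the keyword arguments here are assumptions.
    request_logger.log_inputs(request_id, prompt, prompt_token_ids,
                              params=None,
                              lora_request=None,
                              prompt_adapter_request=None)
```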

@DarkLight1337

I have finished addressing your comments.

@njhill left a comment

Thanks @DarkLight1337!

@njhill merged commit 739b61a into vllm-project:main Jul 22, 2024
@njhill deleted the openai-typing branch July 22, 2024 17:14