
Conversation


@pskiran1 pskiran1 commented Nov 24, 2025

What does the PR do?

This PR adds support for logprobs functionality in the OpenAI-compatible frontend for vLLM models. The feature allows users to request detailed probability information for generated tokens, which is useful for understanding model confidence and exploring alternative completions.

Key changes:

  • Added logprobs support for both chat completions and standard completions endpoints
  • Implemented conversion from vLLM's logprobs format to OpenAI's format
  • Added comprehensive test coverage for logprobs functionality, including validation and streaming
Changed files:

  • python/openai/openai_frontend/engine/utils/triton.py: Added helper functions to parse and convert logprobs from vLLM responses to OpenAI format for both chat and completion endpoints
  • python/openai/openai_frontend/engine/triton_engine.py: Integrated logprobs support into request handling, validation, and response generation for both streaming and non-streaming modes
  • python/openai/tests/test_openai_client.py: Added async tests for logprobs functionality using the OpenAI client library, including validation tests
  • python/openai/tests/test_chat_completions.py: Added HTTP-level tests for chat completions with logprobs, including edge cases and validation
  • python/openai/tests/test_completions.py: Added HTTP-level tests for completions with logprobs, including edge cases and validation
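To illustrate the conversion step, here is a rough sketch of what a vLLM-to-OpenAI logprobs helper could look like. The function name, argument shapes, and field names are illustrative guesses based on the two formats, not the actual implementation in triton.py:

```python
import math
from typing import Dict, List


def vllm_to_openai_chat_logprobs(
    vllm_logprobs: List[Dict[int, dict]],
    tokens: List[str],
) -> dict:
    """Convert per-position vLLM logprob dicts into the OpenAI
    chat-completions logprobs shape. Illustrative sketch only.

    vLLM reports, for each generated position, a dict of
    token_id -> {"logprob": float, "decoded_token": str}; OpenAI expects a
    "content" list with one entry per sampled token, each carrying its own
    logprob plus a ranked "top_logprobs" list of alternatives.
    """
    content = []
    for pos, candidates in enumerate(vllm_logprobs):
        # Rank candidate tokens from most to least likely.
        entries = sorted(
            candidates.values(), key=lambda c: c["logprob"], reverse=True
        )
        top = [
            {"token": c["decoded_token"], "logprob": c["logprob"]}
            for c in entries
        ]
        content.append(
            {
                "token": tokens[pos],
                "logprob": top[0]["logprob"] if top else -math.inf,
                "top_logprobs": top,
            }
        )
    return {"content": content}
```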

Background:

https://platform.openai.com/docs/api-reference/completions/create
https://platform.openai.com/docs/api-reference/chat/create
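For reference, a chat-completions request body exercising the new parameters could look like the following. The model name is a placeholder; logprobs (boolean) and top_logprobs (integer) follow OpenAI's chat API as linked above:

```python
import json

# Hypothetical request payload for POST /v1/chat/completions on the frontend;
# "model" stands in for whichever vLLM model is being served.
payload = {
    "model": "llama-3.1-8b-instruct",
    "messages": [{"role": "user", "content": "Hello"}],
    "logprobs": True,   # return log probabilities of the sampled tokens
    "top_logprobs": 2,  # also return the 2 most likely alternatives per position
}
print(json.dumps(payload, indent=2))
```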

Checklist

  • PR title reflects the change and is of format <commit_type>: <Title>
  • Changes are described in the pull request.
  • Related issues are referenced.
  • Populated github labels field
  • Added test plan and verified test passes.
  • Verified that the PR passes existing CI.
  • Verified copyright is correct on all changed files.
  • Added succinct git squash message before merging ref.
  • All template sections are filled out.
  • Optional: Additional screenshots for behavior/output changes with before/after.

Commit Type:

Check the conventional commit type box here and add the label to the GitHub PR.

  • build
  • ci
  • docs
  • feat
  • fix
  • perf
  • refactor
  • revert
  • style
  • test

Related PRs:

Where should the reviewer start?

Test plan:

  • CI Pipeline ID: 39072711

Caveats:

Background

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

@pskiran1 pskiran1 changed the title feat: Support logprobs for vLLM models in OpenAI API feat: Support logprobs for vLLM models in OpenAI Frontend Nov 24, 2025
@pskiran1 pskiran1 added PR: feat A new feature openai OpenAI related labels Nov 24, 2025
@pskiran1 pskiran1 requested a review from Copilot November 24, 2025 12:21

Copilot AI left a comment

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.


@pskiran1 pskiran1 marked this pull request as ready for review November 24, 2025 12:48
@pskiran1 pskiran1 requested review from whoisj and yinggeh November 24, 2025 13:57
@pskiran1 pskiran1 requested a review from whoisj November 25, 2025 06:39
Review thread on the line: chat_completion = await client.chat.completions.create(
Contributor
Would it be better to send non-streaming and streaming requests and compare that the output values are the same, similar to test_chat_streaming?

Member Author

@pskiran1 pskiran1 Nov 27, 2025
Added changes to compare the counts of streaming and non-streaming logprobs and tokens. Unable to compare the logprob values themselves due to floating-point discrepancies exceeding 1e-2 for some values, e.g. -0.11291083693504333 vs -0.12702862918376923, -0.002805228577926755 vs -0.0024760086089372635, and -0.1270265281200409 vs -0.1270267367362976.

Contributor

You can use np.allclose. See examples in test_embeddings.py
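A minimal sketch of the suggested tolerance-based comparison, using made-up values of the same magnitude as the discrepancies quoted above:

```python
import numpy as np

# Sample per-token logprobs from the two code paths (illustrative values).
streamed = [-0.11291083693504333, -0.002805228577926755, -0.1270265281200409]
non_streamed = [-0.12702862918376923, -0.0024760086089372635, -0.1270267367362976]

# Exact equality fails because the streaming and non-streaming paths
# accumulate floating-point differences...
assert streamed != non_streamed

# ...so compare element-wise within an absolute tolerance instead.
print(np.allclose(streamed, non_streamed, atol=2e-2))  # True at this tolerance
```

Since the observed drift exceeds 1e-2 for some tokens, the tolerance has to be chosen accordingly; at atol=1e-2 the same comparison fails.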

Member Author

Modified to use np.allclose. Thank you.

@pskiran1 pskiran1 requested a review from yinggeh November 27, 2025 17:37

