feat: Support logprobs for vLLM models in OpenAI Frontend
#8538
base: main
Conversation
Changed the title from "logprobs for vLLM models in OpenAI API" to "logprobs for vLLM models in OpenAI Frontend"
Pull request overview
This PR adds support for logprobs (log probabilities) functionality in the OpenAI-compatible frontend for vLLM models. The feature allows users to request detailed probability information for generated tokens, which is useful for understanding model confidence and exploring alternative completions.
Key changes:
- Added logprobs support for both chat completions and standard completions endpoints
- Implemented conversion from vLLM's logprobs format to OpenAI's format
- Added comprehensive test coverage for logprobs functionality including validation and streaming
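For context, a chat completions request exercising the new fields might look like the sketch below; the base URL and model name are placeholders for a local deployment of the Triton OpenAI frontend, not values taken from this PR.

```python
from openai import OpenAI

# Placeholder base URL and model name for a local Triton OpenAI frontend deployment.
client = OpenAI(base_url="http://localhost:9000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": "What is machine learning?"}],
    logprobs=True,    # request per-token log probabilities
    top_logprobs=2,   # also return the 2 most likely alternatives per position
    max_tokens=16,
)

# Each entry carries the sampled token, its logprob, and the top alternatives.
for entry in response.choices[0].logprobs.content:
    alternatives = [(alt.token, alt.logprob) for alt in entry.top_logprobs]
    print(entry.token, entry.logprob, alternatives)
```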
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
Summary per file:
| File | Description |
|---|---|
| python/openai/openai_frontend/engine/utils/triton.py | Added helper functions to parse and convert logprobs from vLLM responses to OpenAI format for both chat and completion endpoints |
| python/openai/openai_frontend/engine/triton_engine.py | Integrated logprobs support into request handling, validation, and response generation for both streaming and non-streaming modes |
| python/openai/tests/test_openai_client.py | Added async tests for logprobs functionality using the OpenAI client library, including validation tests |
| python/openai/tests/test_chat_completions.py | Added HTTP-level tests for chat completions with logprobs, including edge cases and validation |
| python/openai/tests/test_completions.py | Added HTTP-level tests for completions with logprobs, including edge cases and validation |
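To illustrate what the triton.py helpers are responsible for, the sketch below shows the general shape of a vLLM-to-OpenAI logprobs conversion for the chat format. The function name, structure, and field handling are illustrative assumptions, not the PR's actual code; it assumes vLLM's per-position dicts of `{token_id: Logprob}`, where each `Logprob` exposes `logprob` and `decoded_token` attributes.

```python
# Illustrative sketch only: not the exact helpers added in engine/utils/triton.py.
def vllm_logprobs_to_openai_chat(per_token_logprobs, sampled_token_ids):
    """per_token_logprobs: one {token_id: Logprob} dict per generated token (vLLM output);
    sampled_token_ids: the ids of the tokens that were actually generated."""
    content = []
    for candidates, token_id in zip(per_token_logprobs, sampled_token_ids):
        chosen = candidates[token_id]  # logprob info for the token actually generated
        ranked = sorted(candidates.values(), key=lambda lp: lp.logprob, reverse=True)
        content.append(
            {
                "token": chosen.decoded_token,
                "logprob": chosen.logprob,
                "bytes": list(chosen.decoded_token.encode("utf-8")),
                "top_logprobs": [
                    {
                        "token": lp.decoded_token,
                        "logprob": lp.logprob,
                        "bytes": list(lp.decoded_token.encode("utf-8")),
                    }
                    for lp in ranked
                ],
            }
        )
    return {"content": content}
```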
Review comment on the diff context around `chat_completion = await client.chat.completions.create(`:
Would it be better to send non-streaming and streaming requests and compare the output values to verify they match, similar to test_chat_streaming?
Added changes to compare the counts of streaming and non-streaming logprobs and tokens. Unable to compare logprobs values due to floating point discrepancies exceeding 1e-2 for some values.
For example: -0.11291083693504333 vs -0.12702862918376923, -0.002805228577926755 vs -0.0024760086089372635, and -0.1270265281200409 vs -0.1270267367362976.
You can use np.allclose. See examples in test_embeddings.py
Modified to use np.allclose. Thank you.
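A minimal sketch of the kind of comparison discussed in this thread: counts must match exactly, while logprob values are compared within a tolerance via np.allclose. The helper name and the tolerance value are illustrative assumptions, not the test's exact code.

```python
import numpy as np

# Hypothetical helper: streaming and non-streaming runs can differ by more than
# 1e-2 for some logprob values, so a looser absolute tolerance is used here.
def assert_logprobs_consistent(stream_entries, non_stream_entries, atol=1e-1):
    assert len(stream_entries) == len(non_stream_entries)
    stream_values = [entry.logprob for entry in stream_entries]
    non_stream_values = [entry.logprob for entry in non_stream_entries]
    assert np.allclose(stream_values, non_stream_values, atol=atol)
```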
What does the PR do?
This PR adds support for `logprobs` functionality in the OpenAI-compatible frontend for vLLM models. The feature allows users to request detailed probability information for generated tokens, which is useful for understanding model confidence and exploring alternative completions.

Key changes:
- Added logprobs support for both chat completions and standard completions endpoints
- Implemented conversion from vLLM's logprobs format to OpenAI's format
- Added comprehensive test coverage for logprobs functionality, including validation and streaming
Background:
https://platform.openai.com/docs/api-reference/completions/create
https://platform.openai.com/docs/api-reference/chat/create
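For reference, the two APIs linked above expose logprobs differently: chat completions take a boolean `logprobs` plus an integer `top_logprobs` (as in the earlier request sketch), while the legacy completions endpoint takes an integer `logprobs` and returns parallel lists. A minimal sketch of the legacy shape, again with a placeholder base URL and model name:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:9000/v1", api_key="EMPTY")  # placeholder deployment

completion = client.completions.create(
    model="llama-3.1-8b-instruct",  # placeholder model name
    prompt="The capital of France is",
    max_tokens=4,
    logprobs=2,  # legacy completions API: integer count of top alternatives per position
)

lp = completion.choices[0].logprobs
print(lp.tokens)          # generated tokens
print(lp.token_logprobs)  # logprob of each generated token
print(lp.top_logprobs)    # per-position {token: logprob} dicts of alternatives
print(lp.text_offset)     # character offset of each token in the output text
```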
Checklist
- `<commit_type>: <Title>`
- Commit Type: Check the conventional commit type box here and add the label to the github PR.
Related PRs:
Where should the reviewer start?
Test plan:
Caveats:
Background
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)