
Conversation


@qwes5s5 qwes5s5 commented Nov 17, 2025

Motivation

  • Currently, only the offline generate() interface supports returning prompt logprobs; the online interfaces, specifically v1/chat/completions and v1/completions, do not.
  • The v1/chat/completions and v1/completions interfaces cannot return logprobs over the full vocabulary when logprobs is set to -1; this needs improvement.
  • The current transmission path from the engine layer to the API layer does not support tensor data and involves unnecessary memory copies, making it inefficient for large volumes of data.

Modifications

  • Added support and validation for the prompt_logprobs parameter input across both interfaces.
  • Implemented logic in both interfaces for processing prompt_logprobs results and constructing the corresponding PromptLogprobs response.
  • Modified the logprobs related logic in both interfaces to support the case where logprobs is set to -1.
  • Revised the data transmission logic between the engine and API layers to support Paddle.Tensor transfer, using ForkingPickler serialization to achieve zero-copy transfer.
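As a rough illustration of the serialization change (the helper names below are illustrative, not FastDeploy's actual zmq_server API): ForkingPickler lets types such as tensors register custom reducers, so large buffers can be handed off by shared-memory handle instead of being copied into the pickle byte stream.

```python
import pickle
from io import BytesIO
from multiprocessing.reduction import ForkingPickler

def dumps(obj):
    # ForkingPickler honors reducers registered via ForkingPickler.register,
    # which is how tensor types can be passed by handle rather than copied.
    buf = BytesIO()
    ForkingPickler(buf, pickle.HIGHEST_PROTOCOL).dump(obj)
    return buf.getvalue()

def loads(data):
    # Plain pickle.loads understands ForkingPickler output.
    return pickle.loads(data)

# Round-trip a small payload shaped like a logprobs result.
payload = {"request_id": "req-1", "prompt_logprobs": [[-0.1, -2.3], [-0.5, -1.7]]}
assert loads(dumps(payload)) == payload
```

For ordinary Python objects this behaves like regular pickling; the zero-copy benefit applies only to types that register a custom reducer.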

Usage or Command

curl --location 'http://yq01-sys-rpm36jsah8z.yq01.baidu.com:8180/v1/completions' \
--header 'Content-Type: application/json' \
--data '{
  "prompt": "请生成一篇关于理想的作文,不超过100字",
  "logprobs":-1,
  "prompt_logprobs":5,
  "stream": true,
  "stream_options":{
    "include_usage": true
  }
}'
curl --location 'http://yq01-sys-rpm36jsah8z.yq01.baidu.com:8180/v1/completions' \
--header 'Content-Type: application/json' \
--data '{
  "prompt": "请生成一篇关于理想的作文,不超过100字",
  "logprobs":5,
  "prompt_logprobs":5
}'
curl --location 'http://yq01-sys-rpm36jsah8z.yq01.baidu.com:8180/v1/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
    "messages": [
        {
            "role": "system",
            "content": "I'\''m a helpful AI assistant."
        },
        {
            "role": "user",
            "content": "give me three letters randomly, just tell me the letters without anything else"
        }
    ],
    "logprobs": true,
    "top_logprobs":5,
    "prompt_logprobs":5
}'
curl --location 'http://yq01-sys-rpm36jsah8z.yq01.baidu.com:8180/v1/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
    "messages": [
        {
            "role": "system",
            "content": "I'\''m a helpful AI assistant."
        },
        {
            "role": "user",
            "content": "give me three letters randomly, just tell me the letters without anything else"
        }
    ],
    "logprobs": true,
    "top_logprobs":5,
    "prompt_logprobs":5,
    "stream":true
}'
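The parameter semantics used above can be sketched as follows. This is a hypothetical helper mirroring the described behavior (the actual clamp_prompt_logprobs in fastdeploy/utils.py may differ): -1 requests logprobs for the full vocabulary, and a top-k larger than the vocabulary is capped.

```python
def clamp_prompt_logprobs(prompt_logprobs, vocab_size):
    # None means the caller did not request prompt logprobs.
    if prompt_logprobs is None:
        return None
    # -1 means "return logprobs for the entire vocabulary".
    if prompt_logprobs == -1:
        return vocab_size
    if prompt_logprobs < -1:
        raise ValueError("prompt_logprobs must be >= -1 or None")
    # Otherwise cap the requested top-k at the vocabulary size.
    return min(prompt_logprobs, vocab_size)

assert clamp_prompt_logprobs(-1, 32000) == 32000
assert clamp_prompt_logprobs(5, 32000) == 5
assert clamp_prompt_logprobs(50000, 32000) == 32000
```

The same convention applies to the logprobs field in the requests above: -1 expands to the model's full vocabulary size.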

Accuracy Tests

When streaming, the prompt_logprobs results are included only in the first chunk of the response.
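A minimal client-side sketch of that behavior, assuming an SSE stream whose first data chunk carries a top-level prompt_logprobs field (the exact response shape here is an assumption, not the confirmed FastDeploy schema):

```python
import json

def extract_prompt_logprobs(sse_lines):
    # prompt_logprobs is expected only in the first data chunk,
    # so we can return as soon as the first chunk is parsed.
    for line in sse_lines:
        if not line.startswith("data: ") or line == "data: [DONE]":
            continue
        chunk = json.loads(line[len("data: "):])
        return chunk.get("prompt_logprobs")
    return None

stream = [
    'data: {"id": "cmpl-1", "prompt_logprobs": [{"5": -0.2}], "choices": []}',
    'data: {"id": "cmpl-1", "choices": [{"text": "Hi"}]}',
    "data: [DONE]",
]
assert extract_prompt_logprobs(stream) == [{"5": -0.2}]
```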

Checklist

  • Add at least one tag to the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code and run pre-commit before committing.
  • Add unit tests. If no unit tests are added, explain the reason in this PR.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.


paddle-bot bot commented Nov 17, 2025

Thanks for your contribution!

@paddle-bot paddle-bot bot added the contributor External developers label Nov 17, 2025

qwes5s5 commented Nov 18, 2025

/re-run run_tests_with_coverage

@qwes5s5 qwes5s5 force-pushed the new_add_prompt_logprobs_online branch from 8be64e7 to 2896e0e Compare November 18, 2025 18:13
@qwes5s5 qwes5s5 force-pushed the new_add_prompt_logprobs_online branch 3 times, most recently from 2db301c to d77bf58 Compare November 19, 2025 07:34
@qwes5s5 qwes5s5 requested review from Jiang-Jia-Jun and sunlei1024 and removed request for sunlei1024 November 19, 2025 07:36

qwes5s5 commented Nov 19, 2025

/re-run run_ce_cases

@qwes5s5 qwes5s5 requested a review from gongshaotian November 19, 2025 08:57

qwes5s5 commented Nov 19, 2025

/re-run run_ce_cases

gongshaotian
gongshaotian previously approved these changes Nov 19, 2025

@gongshaotian gongshaotian left a comment


LGTM for worker


qwes5s5 commented Nov 19, 2025

/re-run run_ce_cases

@qwes5s5 qwes5s5 force-pushed the new_add_prompt_logprobs_online branch from 8d05db3 to 88ca0b5 Compare November 19, 2025 15:13

qwes5s5 commented Nov 20, 2025

/re-run run_tests_with_coverage


Copilot AI left a comment


Pull Request Overview

This pull request adds support for prompt_logprobs output in online interfaces (v1/chat/completions and v1/completions), implements logprobs=-1 to output default vocabulary size, and optimizes data transmission between engine and API layers using ForkingPickler with zero-copy transfer.

  • Added prompt_logprobs parameter support with validation across both completion and chat interfaces
  • Implemented logprobs=-1 functionality to return all vocabulary logprobs
  • Replaced msgpack serialization with ForkingPickler for efficient Paddle.Tensor transfer
  • Added extensive test coverage for new functionality

Reviewed Changes

Copilot reviewed 24 out of 24 changed files in this pull request and generated 13 comments.

File Description
tests/woker/test_gpu_prompt_logprobs.py Added environment variable patching for FD_USE_GET_SAVE_OUTPUT_V1
tests/utils/test_clamp_prompt_logprobs.py New tests for clamp_prompt_logprobs utility (tests expect AttributeError)
tests/entrypoints/test_vllm_run_engine.py Updated tests for new validation logic with environment variable handling
tests/entrypoints/test_engine_client.py Comprehensive validation tests for prompt_logprobs and max_logprobs
tests/entrypoints/openai/test_serving_completion.py Extensive tests for prompt_logprobs in stream and full generators
tests/entrypoints/openai/test_serving_chat.py Extensive tests for prompt_logprobs in chat completion generators
tests/engine/test_sampling_params.py Updated validation tests with environment variable context
fastdeploy/utils.py Added clamp_prompt_logprobs function (has immutability bug)
fastdeploy/worker/output.py Updated PromptLogprobs type definition and added no_grad context
fastdeploy/inter_communicator/zmq_server.py Changed from msgpack to ForkingPickler with zero-copy
fastdeploy/entrypoints/openai/utils.py Updated deserialization to use ForkingPickler
fastdeploy/entrypoints/openai/serving_completion.py Added _build_prompt_logprobs method and integration logic
fastdeploy/entrypoints/openai/serving_chat.py Added _build_prompt_logprobs method (duplicated from completion)
fastdeploy/entrypoints/openai/protocol.py Added prompt_logprobs fields and validation to request/response models
fastdeploy/entrypoints/engine_client.py Added comprehensive validation for logprobs parameters
fastdeploy/entrypoints/llm.py Updated validation logic for prompt_logprobs with vocab size handling
fastdeploy/engine/sampling_params.py Conditional validation based on FD_USE_GET_SAVE_OUTPUT_V1
fastdeploy/config.py Added max_logprobs validation against vocabulary size

@qwes5s5 qwes5s5 force-pushed the new_add_prompt_logprobs_online branch 2 times, most recently from d559869 to 78438eb Compare November 24, 2025 03:21
@codecov-commenter

codecov-commenter commented Nov 24, 2025

Codecov Report

❌ Patch coverage is 67.56757% with 72 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@1372d6d). Learn more about missing BASE report.

Files with missing lines Patch % Lines
fastdeploy/entrypoints/engine_client.py 7.69% 44 Missing and 4 partials ⚠️
fastdeploy/entrypoints/openai/protocol.py 65.51% 5 Missing and 5 partials ⚠️
fastdeploy/entrypoints/llm.py 68.75% 2 Missing and 3 partials ⚠️
fastdeploy/config.py 40.00% 1 Missing and 2 partials ⚠️
fastdeploy/inter_communicator/zmq_server.py 0.00% 3 Missing ⚠️
...astdeploy/entrypoints/openai/serving_completion.py 95.83% 2 Missing ⚠️
fastdeploy/entrypoints/openai/serving_chat.py 97.67% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #5089   +/-   ##
==========================================
  Coverage           ?   60.69%           
==========================================
  Files              ?      320           
  Lines              ?    39230           
  Branches           ?     5912           
==========================================
  Hits               ?    23812           
  Misses             ?    13530           
  Partials           ?     1888           
Flag Coverage Δ
GPU 60.69% <67.56%> (?)


☔ View full report in Codecov by Sentry.

@qwes5s5 qwes5s5 force-pushed the new_add_prompt_logprobs_online branch 2 times, most recently from 560c7f4 to 05dd2d3 Compare November 24, 2025 04:42
@qwes5s5 qwes5s5 force-pushed the new_add_prompt_logprobs_online branch 2 times, most recently from 1b1b2da to b948a52 Compare November 24, 2025 07:36
sunlei1024
sunlei1024 previously approved these changes Nov 25, 2025
f"prompt={self.prompt!r}, "
f"prompt_token_ids={self.prompt_token_ids}, "
f"prompt_logprobs={self.prompt_logprobs}, "
f"prompt_logprobs_tensors={self.prompt_logprobs_tensors}, "


prompt_logprobs_tensors does not need to be defined separately; reuse prompt_logprobs by changing it to a union type [list | Tensor].


@Jiang-Jia-Jun Jiang-Jia-Jun left a comment


Change os.getenv("FD_USE_GET_SAVE_OUTPUT_V1", "0") to envs.FD_USE_GET_SAVE_OUTPUT_V1, so that if the default value needs to change later, only fastdeploy/env.py needs to be modified.
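The pattern suggested in this comment might look like the sketch below (the real fastdeploy env module is assumed, not shown): each variable is read through one property, so its default lives in exactly one file.

```python
import os

class Envs:
    """Hypothetical centralized environment accessor, per the review suggestion."""

    @property
    def FD_USE_GET_SAVE_OUTPUT_V1(self) -> int:
        # The default "0" is defined here and nowhere else,
        # so changing it later touches only this module.
        return int(os.getenv("FD_USE_GET_SAVE_OUTPUT_V1", "0"))

envs = Envs()
```

Call sites then read envs.FD_USE_GET_SAVE_OUTPUT_V1 instead of repeating the os.getenv call and its default.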

@qwes5s5 qwes5s5 force-pushed the new_add_prompt_logprobs_online branch from 7f06d60 to c1ff004 Compare November 25, 2025 08:32
@qwes5s5 qwes5s5 force-pushed the new_add_prompt_logprobs_online branch from 9872454 to ca7a64d Compare November 25, 2025 11:46
Jiang-Jia-Jun
Jiang-Jia-Jun previously approved these changes Nov 26, 2025
@qwes5s5 qwes5s5 force-pushed the new_add_prompt_logprobs_online branch from ac1b83f to a8e10a6 Compare November 26, 2025 14:03
@qwes5s5 qwes5s5 force-pushed the new_add_prompt_logprobs_online branch 2 times, most recently from 34e8d2c to 4185c7e Compare November 27, 2025 06:12
@qwes5s5 qwes5s5 force-pushed the new_add_prompt_logprobs_online branch from 4185c7e to e0c01f8 Compare November 27, 2025 07:36