
Conversation


@qwes5s5 qwes5s5 commented Nov 17, 2025

Motivation

  • Currently, only the offline generate() interface supports returning prompt logprobs; the online interfaces, specifically v1/chat/completions and v1/completions, do not.
  • The v1/chat/completions and v1/completions interfaces cannot return logprobs over the full vocabulary when logprobs is set to -1; this needs improvement.
  • The current transmission path from the engine layer to the API layer does not support tensor data and involves unnecessary memory copies, making it inefficient for large volumes of data.

Modifications

  • Added support and validation for the prompt_logprobs parameter input across both interfaces.
  • Implemented logic in both interfaces for processing prompt_logprobs results and constructing the corresponding PromptLogprobs response.
  • Modified the logprobs related logic in both interfaces to support the case where logprobs is set to -1.
  • Revised the data transmission logic between the engine and API layers to support Paddle.Tensor transfer, using ForkingPickler serialization to achieve zero-copy transfer.
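As a rough illustration of the serialization change (the helper names below are illustrative, not FastDeploy's actual zmq_server API): ForkingPickler lets types such as tensors register custom reducers, so large buffers can be handed off by shared-memory handle instead of being copied into the pickle byte stream.

```python
import pickle
from io import BytesIO
from multiprocessing.reduction import ForkingPickler

def dumps(obj):
    # ForkingPickler honors reducers registered via ForkingPickler.register,
    # which is how tensor types can be passed by handle rather than copied.
    buf = BytesIO()
    ForkingPickler(buf, pickle.HIGHEST_PROTOCOL).dump(obj)
    return buf.getvalue()

def loads(data):
    # Plain pickle.loads understands ForkingPickler output.
    return pickle.loads(data)

# Round-trip a small payload shaped like a logprobs result.
payload = {"request_id": "req-1", "prompt_logprobs": [[-0.1, -2.3], [-0.5, -1.7]]}
assert loads(dumps(payload)) == payload
```

For ordinary Python objects this behaves like regular pickling; the zero-copy benefit applies only to types that register a custom reducer.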

Usage or Command

curl --location 'http://yq01-sys-rpm36jsah8z.yq01.baidu.com:8180/v1/completions' \
--header 'Content-Type: application/json' \
--data '{
  "prompt": "请生成一篇关于理想的作文,不超过100字",
  "logprobs":-1,
  "prompt_logprobs":5,
  "stream": true,
  "stream_options":{
    "include_usage": true
  }
}'
curl --location 'http://yq01-sys-rpm36jsah8z.yq01.baidu.com:8180/v1/completions' \
--header 'Content-Type: application/json' \
--data '{
  "prompt": "请生成一篇关于理想的作文,不超过100字",
  "logprobs":5,
  "prompt_logprobs":5
}'
curl --location 'http://yq01-sys-rpm36jsah8z.yq01.baidu.com:8180/v1/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
    "messages": [
        {
            "role": "system",
            "content": "I'\''m a helpful AI assistant."
        },
        {
            "role": "user",
            "content": "give me three letters randomly, just tell me the letters without anything else"
        }
    ],
    "logprobs": true,
    "top_logprobs":5,
    "prompt_logprobs":5
}'
curl --location 'http://yq01-sys-rpm36jsah8z.yq01.baidu.com:8180/v1/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
    "messages": [
        {
            "role": "system",
            "content": "I'\''m a helpful AI assistant."
        },
        {
            "role": "user",
            "content": "give me three letters randomly, just tell me the letters without anything else"
        }
    ],
    "logprobs": true,
    "top_logprobs":5,
    "prompt_logprobs":5,
    "stream":true
}'
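The parameter semantics used above can be sketched as follows. This is a hypothetical helper mirroring the described behavior (the actual clamp_prompt_logprobs in fastdeploy/utils.py may differ): -1 requests logprobs for the full vocabulary, and a top-k larger than the vocabulary is capped.

```python
def clamp_prompt_logprobs(prompt_logprobs, vocab_size):
    # None means the caller did not request prompt logprobs.
    if prompt_logprobs is None:
        return None
    # -1 means "return logprobs for the entire vocabulary".
    if prompt_logprobs == -1:
        return vocab_size
    if prompt_logprobs < -1:
        raise ValueError("prompt_logprobs must be >= -1 or None")
    # Otherwise cap the requested top-k at the vocabulary size.
    return min(prompt_logprobs, vocab_size)

assert clamp_prompt_logprobs(-1, 32000) == 32000
assert clamp_prompt_logprobs(5, 32000) == 5
assert clamp_prompt_logprobs(50000, 32000) == 32000
```

The same convention applies to the logprobs field in the requests above: -1 expands to the model's full vocabulary size.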

Accuracy Tests

When streaming, the prompt_logprobs results are included only in the first chunk of the response.
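A minimal client-side sketch of that behavior, assuming an SSE stream whose first data chunk carries a top-level prompt_logprobs field (the exact response shape here is an assumption, not the confirmed FastDeploy schema):

```python
import json

def extract_prompt_logprobs(sse_lines):
    # prompt_logprobs is expected only in the first data chunk,
    # so we can return as soon as the first chunk is parsed.
    for line in sse_lines:
        if not line.startswith("data: ") or line == "data: [DONE]":
            continue
        chunk = json.loads(line[len("data: "):])
        return chunk.get("prompt_logprobs")
    return None

stream = [
    'data: {"id": "cmpl-1", "prompt_logprobs": [{"5": -0.2}], "choices": []}',
    'data: {"id": "cmpl-1", "choices": [{"text": "Hi"}]}',
    "data: [DONE]",
]
assert extract_prompt_logprobs(stream) == [{"5": -0.2}]
```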

Checklist

  • Add at least one tag to the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code and run pre-commit before committing.
  • Add unit tests. If no unit tests are added, explain the reason in this PR.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.


paddle-bot bot commented Nov 17, 2025

Thanks for your contribution!

@paddle-bot paddle-bot bot added the contributor External developers label Nov 17, 2025

qwes5s5 commented Nov 18, 2025

/re-run run_tests_with_coverage

@qwes5s5 qwes5s5 force-pushed the new_add_prompt_logprobs_online branch from 8be64e7 to 2896e0e Compare November 18, 2025 18:13
@qwes5s5 qwes5s5 force-pushed the new_add_prompt_logprobs_online branch 3 times, most recently from 2db301c to d77bf58 Compare November 19, 2025 07:34
@qwes5s5 qwes5s5 requested review from Jiang-Jia-Jun and sunlei1024 and removed request for sunlei1024 November 19, 2025 07:36

qwes5s5 commented Nov 19, 2025

/re-run run_ce_cases

@qwes5s5 qwes5s5 requested a review from gongshaotian November 19, 2025 08:57

qwes5s5 commented Nov 19, 2025

/re-run run_ce_cases

gongshaotian
gongshaotian previously approved these changes Nov 19, 2025

@gongshaotian gongshaotian left a comment


LGTM for worker


qwes5s5 commented Nov 19, 2025

/re-run run_ce_cases

@qwes5s5 qwes5s5 force-pushed the new_add_prompt_logprobs_online branch from 8d05db3 to 88ca0b5 Compare November 19, 2025 15:13

qwes5s5 commented Nov 20, 2025

/re-run run_tests_with_coverage


Copilot AI left a comment


Pull Request Overview

This pull request adds support for prompt_logprobs output in online interfaces (v1/chat/completions and v1/completions), implements logprobs=-1 to output default vocabulary size, and optimizes data transmission between engine and API layers using ForkingPickler with zero-copy transfer.

  • Added prompt_logprobs parameter support with validation across both completion and chat interfaces
  • Implemented logprobs=-1 functionality to return all vocabulary logprobs
  • Replaced msgpack serialization with ForkingPickler for efficient Paddle.Tensor transfer
  • Added extensive test coverage for new functionality

Reviewed Changes

Copilot reviewed 24 out of 24 changed files in this pull request and generated 13 comments.

File Description
tests/woker/test_gpu_prompt_logprobs.py Added environment variable patching for FD_USE_GET_SAVE_OUTPUT_V1
tests/utils/test_clamp_prompt_logprobs.py New tests for clamp_prompt_logprobs utility (tests expect AttributeError)
tests/entrypoints/test_vllm_run_engine.py Updated tests for new validation logic with environment variable handling
tests/entrypoints/test_engine_client.py Comprehensive validation tests for prompt_logprobs and max_logprobs
tests/entrypoints/openai/test_serving_completion.py Extensive tests for prompt_logprobs in stream and full generators
tests/entrypoints/openai/test_serving_chat.py Extensive tests for prompt_logprobs in chat completion generators
tests/engine/test_sampling_params.py Updated validation tests with environment variable context
fastdeploy/utils.py Added clamp_prompt_logprobs function (has immutability bug)
fastdeploy/worker/output.py Updated PromptLogprobs type definition and added no_grad context
fastdeploy/inter_communicator/zmq_server.py Changed from msgpack to ForkingPickler with zero-copy
fastdeploy/entrypoints/openai/utils.py Updated deserialization to use ForkingPickler
fastdeploy/entrypoints/openai/serving_completion.py Added _build_prompt_logprobs method and integration logic
fastdeploy/entrypoints/openai/serving_chat.py Added _build_prompt_logprobs method (duplicated from completion)
fastdeploy/entrypoints/openai/protocol.py Added prompt_logprobs fields and validation to request/response models
fastdeploy/entrypoints/engine_client.py Added comprehensive validation for logprobs parameters
fastdeploy/entrypoints/llm.py Updated validation logic for prompt_logprobs with vocab size handling
fastdeploy/engine/sampling_params.py Conditional validation based on FD_USE_GET_SAVE_OUTPUT_V1
fastdeploy/config.py Added max_logprobs validation against vocabulary size

@qwes5s5 qwes5s5 force-pushed the new_add_prompt_logprobs_online branch 2 times, most recently from d559869 to 78438eb Compare November 24, 2025 03:21
@codecov-commenter

codecov-commenter commented Nov 24, 2025

Codecov Report

❌ Patch coverage is 67.56757% with 72 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@1372d6d). Learn more about missing BASE report.

Files with missing lines Patch % Lines
fastdeploy/entrypoints/engine_client.py 7.69% 44 Missing and 4 partials ⚠️
fastdeploy/entrypoints/openai/protocol.py 65.51% 5 Missing and 5 partials ⚠️
fastdeploy/entrypoints/llm.py 68.75% 2 Missing and 3 partials ⚠️
fastdeploy/config.py 40.00% 1 Missing and 2 partials ⚠️
fastdeploy/inter_communicator/zmq_server.py 0.00% 3 Missing ⚠️
...astdeploy/entrypoints/openai/serving_completion.py 95.83% 2 Missing ⚠️
fastdeploy/entrypoints/openai/serving_chat.py 97.67% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #5089   +/-   ##
==========================================
  Coverage           ?   60.69%           
==========================================
  Files              ?      320           
  Lines              ?    39230           
  Branches           ?     5912           
==========================================
  Hits               ?    23812           
  Misses             ?    13530           
  Partials           ?     1888           
Flag Coverage Δ
GPU 60.69% <67.56%> (?)


☔ View full report in Codecov by Sentry.

@qwes5s5 qwes5s5 force-pushed the new_add_prompt_logprobs_online branch 2 times, most recently from 560c7f4 to 05dd2d3 Compare November 24, 2025 04:42
@qwes5s5 qwes5s5 force-pushed the new_add_prompt_logprobs_online branch 2 times, most recently from 1b1b2da to b948a52 Compare November 24, 2025 07:36
sunlei1024
sunlei1024 previously approved these changes Nov 25, 2025
f"prompt={self.prompt!r}, "
f"prompt_token_ids={self.prompt_token_ids}, "
f"prompt_logprobs={self.prompt_logprobs}, "
f"prompt_logprobs_tensors={self.prompt_logprobs_tensors}, "


prompt_logprobs_tensors does not need to be defined separately; reuse prompt_logprobs by changing it to a union type [list | Tensor].


@Jiang-Jia-Jun Jiang-Jia-Jun left a comment


Change os.getenv("FD_USE_GET_SAVE_OUTPUT_V1", "0") to envs.FD_USE_GET_SAVE_OUTPUT_V1, so that if the default value needs to change later, only fastdeploy/env.py needs to be modified.
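The pattern suggested in this comment might look like the sketch below (the real fastdeploy env module is assumed, not shown): each variable is read through one property, so its default lives in exactly one file.

```python
import os

class Envs:
    """Hypothetical centralized environment accessor, per the review suggestion."""

    @property
    def FD_USE_GET_SAVE_OUTPUT_V1(self) -> int:
        # The default "0" is defined here and nowhere else,
        # so changing it later touches only this module.
        return int(os.getenv("FD_USE_GET_SAVE_OUTPUT_V1", "0"))

envs = Envs()
```

Call sites then read envs.FD_USE_GET_SAVE_OUTPUT_V1 instead of repeating the os.getenv call and its default.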

@qwes5s5 qwes5s5 force-pushed the new_add_prompt_logprobs_online branch from 7f06d60 to c1ff004 Compare November 25, 2025 08:32
@qwes5s5 qwes5s5 force-pushed the new_add_prompt_logprobs_online branch from 9872454 to ca7a64d Compare November 25, 2025 11:46
Jiang-Jia-Jun
Jiang-Jia-Jun previously approved these changes Nov 26, 2025
@qwes5s5 qwes5s5 force-pushed the new_add_prompt_logprobs_online branch from ac1b83f to a8e10a6 Compare November 26, 2025 14:03
@qwes5s5 qwes5s5 force-pushed the new_add_prompt_logprobs_online branch 2 times, most recently from 34e8d2c to 4185c7e Compare November 27, 2025 06:12
@qwes5s5 qwes5s5 force-pushed the new_add_prompt_logprobs_online branch from 4185c7e to e0c01f8 Compare November 27, 2025 07:36