[LogProbs]Enable prompt logprobs output and modify data transmission method for the online interface. #5089
Conversation
gongshaotian left a comment:

LGTM for worker
Pull Request Overview
This pull request adds support for prompt_logprobs output in online interfaces (v1/chat/completions and v1/completions), implements logprobs=-1 to output default vocabulary size, and optimizes data transmission between engine and API layers using ForkingPickler with zero-copy transfer.
- Added prompt_logprobs parameter support with validation across both completion and chat interfaces
- Implemented logprobs=-1 functionality to return all vocabulary logprobs
- Replaced msgpack serialization with ForkingPickler for efficient paddle.Tensor transfer
- Added extensive test coverage for new functionality
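The zero-copy transfer mentioned above can be illustrated with the stdlib's pickle protocol 5 out-of-band buffer mechanism (the pattern below follows the `pickle` documentation's `ZeroCopyByteArray` example; the PR itself uses `ForkingPickler` with paddle.Tensor payloads, so the `bytearray` stand-in and all names here are illustrative, not the PR's actual code):

```python
import pickle
from pickle import PickleBuffer


class ZeroCopyByteArray(bytearray):
    """bytearray that pickles out-of-band under protocol 5
    (pattern from the stdlib pickle documentation)."""

    def __reduce_ex__(self, protocol):
        if protocol >= 5:
            # Hand the raw buffer to the pickler instead of copying
            # its bytes into the pickle stream.
            return type(self)._reconstruct, (PickleBuffer(self),), None
        return type(self)._reconstruct, (bytearray(self),)

    @classmethod
    def _reconstruct(cls, obj):
        with memoryview(obj) as m:
            obj = m.obj  # handle over the original buffer object
        if type(obj) is cls:
            return obj  # original buffer recovered without a copy
        return cls(obj)


# Simulate sending a tensor-like payload between processes:
payload = ZeroCopyByteArray(b"logprob payload")
buffers = []
frame = pickle.dumps(payload, protocol=5, buffer_callback=buffers.append)
# `frame` holds only metadata; the large buffer travels in `buffers`.
restored = pickle.loads(frame, buffers=buffers)
```

In a real transport (e.g. over ZMQ), `frame` and the collected buffers would be sent as separate message parts and recombined on the receiving side.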
Reviewed Changes
Copilot reviewed 24 out of 24 changed files in this pull request and generated 13 comments.
| File | Description |
|---|---|
| tests/woker/test_gpu_prompt_logprobs.py | Added environment variable patching for FD_USE_GET_SAVE_OUTPUT_V1 |
| tests/utils/test_clamp_prompt_logprobs.py | New tests for clamp_prompt_logprobs utility (tests expect AttributeError) |
| tests/entrypoints/test_vllm_run_engine.py | Updated tests for new validation logic with environment variable handling |
| tests/entrypoints/test_engine_client.py | Comprehensive validation tests for prompt_logprobs and max_logprobs |
| tests/entrypoints/openai/test_serving_completion.py | Extensive tests for prompt_logprobs in stream and full generators |
| tests/entrypoints/openai/test_serving_chat.py | Extensive tests for prompt_logprobs in chat completion generators |
| tests/engine/test_sampling_params.py | Updated validation tests with environment variable context |
| fastdeploy/utils.py | Added clamp_prompt_logprobs function (has immutability bug) |
| fastdeploy/worker/output.py | Updated PromptLogprobs type definition and added no_grad context |
| fastdeploy/inter_communicator/zmq_server.py | Changed from msgpack to ForkingPickler with zero-copy |
| fastdeploy/entrypoints/openai/utils.py | Updated deserialization to use ForkingPickler |
| fastdeploy/entrypoints/openai/serving_completion.py | Added _build_prompt_logprobs method and integration logic |
| fastdeploy/entrypoints/openai/serving_chat.py | Added _build_prompt_logprobs method (duplicated from completion) |
| fastdeploy/entrypoints/openai/protocol.py | Added prompt_logprobs fields and validation to request/response models |
| fastdeploy/entrypoints/engine_client.py | Added comprehensive validation for logprobs parameters |
| fastdeploy/entrypoints/llm.py | Updated validation logic for prompt_logprobs with vocab size handling |
| fastdeploy/engine/sampling_params.py | Conditional validation based on FD_USE_GET_SAVE_OUTPUT_V1 |
| fastdeploy/config.py | Added max_logprobs validation against vocabulary size |
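Based on the overview and the protocol changes listed above, a request exercising the new parameters might look like the following (field names `logprobs` and `prompt_logprobs` come from the PR description; the model name, server URL, and parameter semantics shown are assumptions, not confirmed by the PR):

```python
import json

# Hypothetical request body for the OpenAI-compatible completions
# endpoint; values are placeholders.
payload = {
    "model": "default",
    "prompt": "Hello, world",
    "logprobs": 2,         # top-2 logprobs per generated token; -1 would request the full vocabulary
    "prompt_logprobs": 2,  # also return logprobs for the prompt tokens
    "stream": False,
}
body = json.dumps(payload)
# POST `body` to e.g. http://localhost:8000/v1/completions with any HTTP client.
```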
Codecov Report

❌ Patch coverage is

@@ Coverage Diff @@ (develop → #5089)

| Metric   | develop | #5089  |
|----------|---------|--------|
| Coverage | ?       | 60.69% |
| Files    | ?       | 320    |
| Lines    | ?       | 39230  |
| Branches | ?       | 5912   |
| Hits     | ?       | 23812  |
| Misses   | ?       | 13530  |
| Partials | ?       | 1888   |
fastdeploy/engine/request.py (outdated)

```python
f"prompt={self.prompt!r}, "
f"prompt_token_ids={self.prompt_token_ids}, "
f"prompt_logprobs={self.prompt_logprobs}, "
f"prompt_logprobs_tensors={self.prompt_logprobs_tensors}, "
```

Review comment: `prompt_logprobs_tensors` does not need to be defined separately; reuse `prompt_logprobs` by changing it to a composite type [list | Tensor].
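The reviewer's suggestion could be sketched as a single composite-typed field, roughly as follows (the dataclass shape, field defaults, and the `Tensor = Any` stand-in for paddle.Tensor are all assumptions made so the snippet is self-contained, not the PR's actual `Request` definition):

```python
from dataclasses import dataclass
from typing import Any, Optional, Union

# Stand-in for paddle.Tensor so the sketch runs without Paddle installed.
Tensor = Any


@dataclass
class Request:
    prompt: str
    # One composite-typed field instead of both prompt_logprobs and
    # prompt_logprobs_tensors: it holds either the decoded list form
    # or the raw tensor form.
    prompt_logprobs: Optional[Union[list, Tensor]] = None

    def __repr__(self) -> str:
        return (
            f"Request(prompt={self.prompt!r}, "
            f"prompt_logprobs={self.prompt_logprobs})"
        )


req = Request(prompt="hi", prompt_logprobs=[{"id": 42, "logprob": -0.1}])
```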
Jiang-Jia-Jun left a comment:

`os.getenv("FD_USE_GET_SAVE_OUTPUT_V1", "0")` — change this to use `envs.FD_USE_GET_SAVE_OUTPUT_V1`, so that if the default value needs to change later, only fastdeploy/env.py needs to be modified.
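The pattern the reviewer suggests, centralizing environment-variable defaults in one module, might look like this minimal sketch (the `_Envs` class and property layout are assumptions; only the variable name and default come from the comment above):

```python
import os


class _Envs:
    """Centralized env-var access: the default lives here only,
    so callers never call os.getenv directly."""

    @property
    def FD_USE_GET_SAVE_OUTPUT_V1(self) -> int:
        # Changing the default later means editing this one line.
        return int(os.getenv("FD_USE_GET_SAVE_OUTPUT_V1", "0"))


envs = _Envs()

# Callers import `envs` and read the attribute:
flag = envs.FD_USE_GET_SAVE_OUTPUT_V1
```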
Motivation
Modifications
Usage or Command
Accuracy Tests
With streaming output, prompt_logprobs are included only in the first chunk of the response.
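The first-chunk-only behavior described above can be sketched as follows (function and field names are illustrative, not the PR's actual generator code):

```python
def stream_chunks(token_texts, prompt_logprobs):
    """Yield response chunks; attach prompt_logprobs to the first one only."""
    first = True
    for text in token_texts:
        chunk = {"text": text}
        if first and prompt_logprobs is not None:
            chunk["prompt_logprobs"] = prompt_logprobs
            first = False
        yield chunk


chunks = list(stream_chunks(["Hel", "lo"], [{"id": 1, "logprob": -0.5}]))
```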
Checklist
Tag the PR with one of: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]. Run pre-commit before commit. For a release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.