[gpt-oss][1][bugfix] fix streaming final output #24466
Conversation
@aarnphm @DarkLight1337 @robertgshaw2-redhat @simon-mo this PR is ready for review :)
Also CC @yeqcharlotte |
LG! I saw you define StreamingResponsesResponse in a later PR; do we plan to update BaseModel in this diff as well?
I have a follow-up PR here: #24556. I thought it would be easier for review to split them into two PRs, but we could combine them too :)
This PR addresses three different changes. I recommend splitting it into multiple separate PRs. This PR should focus only on fixing the issue where the output of the last event is empty.
This reverts commit c87ca3325edbd5e80800df6e4151cee6a9c8c923. Signed-off-by: Andrew Xia <[email protected]>
```
# Check if the current token is part of reasoning content
self._update_num_reasoning_tokens()
self.last_tok = tok
if len(self._messages) - self.num_init_messages < len(
```
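(Reading the visible fragment: the condition appears to check whether the parser has produced more messages than have been appended to self._messages so far, so that any newly completed message, including the final one, gets copied over. This is an inference from the truncated diff, not from the full source.)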
Let's also add a unit test covering this behavior. The test can be constructed similarly to https://github.com/vllm-project/vllm/blob/main/tests/entrypoints/test_context.py#L313
+1
ty for the suggestion, just added
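(For illustration, a minimal sketch of what such a test might look like. The class and names below are toy stand-ins that mirror the message-sync guard from the diff above; they are not vLLM's actual StreamingHarmonyContext or the exact test added in this PR.)

```python
# Toy stand-in mirroring the message-sync guard from the diff above.
class FakeStreamingContext:
    def __init__(self, init_messages):
        self._messages = list(init_messages)
        self.num_init_messages = len(init_messages)

    def sync_messages(self, parser_messages):
        # Copy over every parser message that has not been appended yet.
        # Before the fix, the final message could be skipped here, which
        # left the last streamed response with an empty output array.
        while len(self._messages) - self.num_init_messages < len(parser_messages):
            next_idx = len(self._messages) - self.num_init_messages
            self._messages.append(parser_messages[next_idx])


def test_final_message_is_synced():
    ctx = FakeStreamingContext(init_messages=["system prompt"])
    ctx.sync_messages(parser_messages=["partial answer", "final answer"])
    # The final parsed message must make it into the context's messages.
    assert ctx._messages[-1] == "final answer"
```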
ready for re-review @chaunceyjiang
chaunceyjiang left a comment:
Thanks~
Purpose
Per @chaunceyjiang's comments, I've also split this PR into a couple of follow-ups:
Test Plan
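Assuming the unit test discussed above landed in tests/entrypoints/test_context.py, the fix can be exercised with:

```
pytest tests/entrypoints/test_context.py
```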
Test Result
Before
^ Note that in this final response, output is an empty array. This is not what we want.
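(For illustration, the final streamed event in the buggy case would look roughly like the following. This is a hypothetical sketch of a Responses API response.completed event shape, not the actual captured output.)

```
{
  "type": "response.completed",
  "response": {
    "id": "resp_...",
    "status": "completed",
    "output": []
  }
}
```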
After:
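(Again hypothetically, after the fix the same final event should carry the completed output items rather than an empty array.)

```
{
  "type": "response.completed",
  "response": {
    "id": "resp_...",
    "status": "completed",
    "output": [{"type": "message", "role": "assistant", "content": "..."}]
  }
}
```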