Specdec Bench: vLLM reqid, SGL path, conc > 1 metric fix #541
Signed-off-by: Izzy Putterman <[email protected]>
Codecov Report: All modified and coverable lines are covered by tests.
Coverage diff, main vs. #541:
Coverage: 74.37% -> 74.37%
Files: 182 -> 182
Lines: 18219 -> 18219
Hits: 13550 -> 13550
Misses: 4669 -> 4669
Can you add some description of this PR?
Updated the description.
h-guo18
left a comment
LGTM
What does this PR do?
SGLang: Fix so that the draft model path is actually passed to the engine.
vLLM: Fix so that request_id strings do not overlap across turns in multiturn runs.
Acceptance Rate: Fix for a potential race condition when writing back the acceptance rate (AR) on multiturn datasets.
Overview: ?
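To illustrate the vLLM and Acceptance Rate items above, here is a minimal sketch, not the code from this PR. It assumes request IDs are built per conversation turn and that acceptance rates are written back from concurrent workers. The names `make_request_id` and `AcceptanceRateStore` are hypothetical, the use of threads is an assumption, and so is the definition of AR as accepted over drafted tokens.

```python
import threading
import uuid
from collections import defaultdict


def make_request_id(conversation_id: int, turn_id: int) -> str:
    """Hypothetical helper: build a request ID that is unique per turn."""
    # The random suffix keeps IDs distinct even if the same
    # (conversation, turn) pair is submitted more than once.
    return f"conv{conversation_id}-turn{turn_id}-{uuid.uuid4().hex[:8]}"


class AcceptanceRateStore:
    """Hypothetical store that collects per-turn acceptance rates under a lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        # conversation_id -> {turn_id -> acceptance rate}
        self._rates = defaultdict(dict)

    def record(self, conversation_id: int, turn_id: int,
               accepted: int, drafted: int) -> None:
        # Assumed definition: fraction of drafted tokens that were accepted.
        rate = accepted / drafted if drafted else 0.0
        # Keying by turn_id means turns that finish out of order cannot
        # overwrite or misplace each other's results.
        with self._lock:
            self._rates[conversation_id][turn_id] = rate

    def rates_for(self, conversation_id: int) -> list:
        # Return the rates in turn order, regardless of completion order.
        with self._lock:
            turns = self._rates.get(conversation_id, {})
            return [turns[t] for t in sorted(turns)]
```

Storing each result under an explicit `(conversation_id, turn_id)` key, rather than appending to a shared list, is one way to keep the write-back correct when concurrency is greater than 1. The actual fix in the PR may take a different approach.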
Usage
# Add a code snippet demonstrating how to use this
Testing
Before your PR is "Ready for review"
Additional Information