[Bugfix] Handle `best_of>1` case by disabling speculation. #6138

tdoublep · 2024-07-04T10:04:37Z

This PR solves #6137 by disabling speculation for batches that contain any request with best_of>1.

This approach ensures that we can handle requests with best_of>1 without failure, but it has a downside: because speculation is disabled for the whole batch, a single user sending requests with best_of>1 can degrade performance for other users with best_of=1.

An alternative solution would be to raise an error on those individual requests, with a message to the user such as "best_of > 1 is not supported when speculative decoding is enabled".

I can also implement that if preferred. What do you think @cadedaniel @njhill ?
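For reference, the alternative (rejecting the individual request) could look roughly like the sketch below. This is not code from the PR: `SamplingParams` here is a minimal stand-in dataclass, and `validate_for_spec_decode` and its `spec_decode_enabled` argument are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class SamplingParams:
    """Hypothetical stand-in with only the field this sketch needs."""
    best_of: int = 1


def validate_for_spec_decode(params: SamplingParams,
                             spec_decode_enabled: bool) -> None:
    """Raise if the request cannot be served with speculative decoding.

    Hypothetical helper: rejects the single offending request instead of
    disabling speculation for the whole batch.
    """
    if spec_decode_enabled and params.best_of > 1:
        raise ValueError(
            "best_of > 1 is not supported when speculative decoding is "
            f"enabled (got best_of={params.best_of}).")
```

The trade-off is that other users keep the speedup from speculation, at the cost of best_of>1 requests failing outright.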

Signed-off-by: Thomas Parnell <[email protected]>

cadedaniel · 2024-07-09T06:58:08Z

Thanks for the fix -- approach looks good to me.

On whether or not we should support this -- for performance we would want to disable this feature or support it natively in spec decode. I am fine having this in, can we log once if this happens so there's a hint of the performance degradation to users?

cadedaniel · 2024-07-09T06:58:43Z

vllm/engine/output_processor/multi_step.py


        assert seqs, "expected running sequences"
+
+        if len(seqs) > 1 or sequence_group.sampling_params.best_of > 1:


add a comment here explaining what's going on. also can we explicitly fail if beam search is enabled?

done.

re: failing on beam search, I think this is best done before we put the requests in the batch so I've added some code in add_request to that effect.
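A rough sketch of what "fail before the request enters the batch" might look like. The PR's actual `add_request` change is not shown in this thread, so `ToyEngine`, its fields, and the `use_beam_search` flag on the stand-in `SamplingParams` are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SamplingParams:
    """Hypothetical stand-in; field names are assumed for this sketch."""
    best_of: int = 1
    use_beam_search: bool = False


class ToyEngine:
    """Hypothetical engine that validates requests before queueing them."""

    def __init__(self, spec_decode_enabled: bool) -> None:
        self.spec_decode_enabled = spec_decode_enabled
        self.waiting: List[str] = []  # ids of requests accepted so far

    def add_request(self, request_id: str, params: SamplingParams) -> None:
        # Reject up front so an unsupported request never reaches a batch.
        if self.spec_decode_enabled and params.use_beam_search:
            raise ValueError(
                "beam search is not supported with speculative decoding")
        self.waiting.append(request_id)
```

Checking at admission time means a failure affects only the request that asked for beam search, not the other sequences scheduled alongside it.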

cadedaniel · 2024-07-09T06:59:05Z

vllm/spec_decode/spec_decode_worker.py

+        for seq_group_metadata in execute_model_req.seq_group_metadata_list:
+            if seq_group_metadata.sampling_params.best_of > 1:
+                return True


suggest moving this to a method so you can add a docstring with more explanation on how this works (e.g. performance tradeoff)
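One way the suggested refactor could look, as a sketch rather than the code that ended up in the PR. The three dataclasses are minimal stand-ins for the objects in the diff above, and the function name `batch_requires_no_speculation` is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SamplingParams:
    """Stand-in with only the field used here."""
    best_of: int = 1


@dataclass
class SeqGroupMetadata:
    """Stand-in for one request's metadata in the batch."""
    sampling_params: SamplingParams


@dataclass
class ExecuteModelRequest:
    """Stand-in for the batch passed to the worker."""
    seq_group_metadata_list: List[SeqGroupMetadata] = field(
        default_factory=list)


def batch_requires_no_speculation(req: ExecuteModelRequest) -> bool:
    """Return True if speculation should be skipped for this batch.

    Any request with best_of > 1 disables speculation for the whole
    batch. Performance trade-off: requests with best_of == 1 that share
    the batch also lose the speedup for that step.
    """
    return any(sg.sampling_params.best_of > 1
               for sg in req.seq_group_metadata_list)
```

Keeping the check in a named function gives the docstring a place to record the trade-off, which is what the review comment asks for.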

Signed-off-by: Thomas Parnell <[email protected]>

tdoublep · 2024-07-15T11:47:24Z

I am fine having this in, can we log once if this happens so there's a hint of the performance degradation to users?

I added a warning when we disable speculation due to n>1 or best_of>1.
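For illustration, a "log once" warning can be sketched with the standard `logging` module and a module-level flag. The logger name, the message text, and the function name below are assumptions, not the PR's actual code.

```python
import logging

logger = logging.getLogger("spec_decode_sketch")  # hypothetical logger name

_warned_speculation_disabled = False


def warn_speculation_disabled_once() -> bool:
    """Emit the warning on the first call only.

    Returns True if the warning was logged on this call, False if it had
    already been logged earlier.
    """
    global _warned_speculation_disabled
    if _warned_speculation_disabled:
        return False
    _warned_speculation_disabled = True
    logger.warning(
        "Speculation disabled for a batch containing a request with n>1 "
        "or best_of>1; expect reduced performance for that batch.")
    return True
```

Logging only once avoids flooding the log on every decoding step while still leaving a hint about the slowdown.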

github-actions · 2024-10-25T02:02:50Z

This pull request has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this pull request should remain open. Thank you!

github-actions · 2024-11-24T02:08:36Z

This pull request has been automatically closed due to inactivity. Please feel free to reopen if you intend to continue working on it. Thank you!

tdoublep added 5 commits July 4, 2024 05:51

Handle n>1 case by disabling speculation.

bd3a6b9

Signed-off-by: Thomas Parnell <[email protected]>

Formatting

4187741

Signed-off-by: Thomas Parnell <[email protected]>

Fix MultiStepOutputProcessor unit tests.

6ebb755

Signed-off-by: Thomas Parnell <[email protected]>

Added E2E tests.

3d235dd

Signed-off-by: Thomas Parnell <[email protected]>

Format

edd963a

Signed-off-by: Thomas Parnell <[email protected]>

cadedaniel reviewed Jul 9, 2024

tdoublep added 2 commits July 15, 2024 06:48

Merge branch 'main' into spec-decode-n-fallback

c5da250

Address review comments.

a98e4a1

Signed-off-by: Thomas Parnell <[email protected]>

tjohnson31415 mentioned this pull request Sep 17, 2024

[Bug]: Multistep with n>1 Fails #7968

Closed

1 task

afeldman-nm mentioned this pull request Sep 19, 2024

[Bugfix] Handle best_of>1 by disabling multi-step scheduling; fail if beam search is invoked with multi-step scheduling #8637

Closed

github-actions bot added the stale Over 90 days of inactivity label Oct 25, 2024

github-actions bot closed this Nov 24, 2024
