Skip to content

Conversation

tdoublep
Copy link
Member

@tdoublep tdoublep commented Jul 4, 2024

This PR solves #6137 by disabling speculation for batches that contain any request with best_of>1.

This approach ensures that we can handle requests withbest_of>1 without failure, but may have a downside that a single user sending requests with best_of>1 can potentially ruin performance for other users with best_of=1.

An alternative solution could just be to raise on those individual requests and give a message to the user like best_of > 1 is not supported when speculative decoding is enabled.

I can also implement that if preferred. What do you think @cadedaniel @njhill ?

tdoublep added 5 commits July 4, 2024 05:51
Signed-off-by: Thomas Parnell <[email protected]>
Signed-off-by: Thomas Parnell <[email protected]>
Signed-off-by: Thomas Parnell <[email protected]>
@cadedaniel
Copy link
Collaborator

Thanks for the fix -- approach looks good to me.

On whether or not we should support this -- for performance we would want to disable this feature or support it natively in spec decode. I am fine having this in, can we log once if this happens so there's a hint of the performance degredation to users?


assert seqs, "expected running sequences"

if len(seqs) > 1 or sequence_group.sampling_params.best_of > 1:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a comment here explaining what's going on. also can we explicitly fail if beam search is enabled?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

re: failing on beam search, I think this is best done before we put the requests in the batch so I've added some code in add_request to that effect.

Comment on lines 359 to 361
for seq_group_metadata in execute_model_req.seq_group_metadata_list:
if seq_group_metadata.sampling_params.best_of > 1:
return True
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggest moving this to a method so you can add a docstring with more explanation on how this works (e.g. performance tradeoff)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@tdoublep
Copy link
Member Author

I am fine having this in, can we log once if this happens so there's a hint of the performance degredation to users?

I added a warning when we disable speculation due to n>1 or best_of>1.

Copy link

This pull request has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this pull request should remain open. Thank you!

@github-actions github-actions bot added the stale Over 90 days of inactivity label Oct 25, 2024
Copy link

This pull request has been automatically closed due to inactivity. Please feel free to reopen if you intend to continue working on it. Thank you!

@github-actions github-actions bot closed this Nov 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

stale Over 90 days of inactivity

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants