
Conversation

saum7800
Contributor

vllm-project/vllm#12622 -- since this commit, if you don't pass in a generation config, vLLM uses whatever it finds in the model's generation_config.json (if one exists). To get vLLM's own defaults, you now have to explicitly pass generation_config="vllm", which is what happened by default before this commit.

For RL training, we need:
repetition_penalty = 1.0
top_p = 1.0
top_k = 0
temperature = 1.0

Changing any of these alters the logprobs returned by vLLM, which we use to calculate losses and gradient updates, and that leads to unstable training.

We're setting the default generation config to "vllm" so we get the sampling params above instead of the model's generation_config.json.
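A minimal sketch of the idea: the `RL_SAFE_DEFAULTS` dict and the `check_rl_safe` helper below are hypothetical names for illustration, and the commented-out `LLM(..., generation_config="vllm")` call assumes a vLLM version that includes vllm-project/vllm#12622.

```python
# RL-safe sampling settings: any deviation changes the logprobs vLLM
# returns, which destabilizes loss and gradient computation.
RL_SAFE_DEFAULTS = {
    "repetition_penalty": 1.0,
    "top_p": 1.0,
    "top_k": 0,
    "temperature": 1.0,
}

def check_rl_safe(sampling_params: dict) -> bool:
    """Hypothetical guard: True only if every RL-safe default is honored."""
    return all(sampling_params.get(k) == v for k, v in RL_SAFE_DEFAULTS.items())

# Sketch of the actual fix (requires a GPU + vLLM install, so left commented):
# from vllm import LLM
# llm = LLM(model="my-model", generation_config="vllm")  # ignore generation_config.json
```

A model whose generation_config.json ships, say, temperature=0.7 would silently fail this guard, which is exactly the failure mode the PR is defending against.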

@saum7800 saum7800 requested review from bradhilton and corbt May 23, 2025 20:53
Collaborator

@bradhilton bradhilton left a comment


🚀

@saum7800 saum7800 merged commit 48918e0 into main May 23, 2025
1 check passed
@saum7800 saum7800 deleted the potential_fix branch May 23, 2025 21:18
2 participants