
Conversation

saum7800
Contributor

vllm-project/vllm#12622 -- since this commit, if you don't pass in a generation config, vLLM uses whatever it finds in the model's generation_config.json (if one exists). To get vLLM's own defaults, you now have to explicitly pass generation_config="vllm", which is what happened by default before this commit.

For RL training, we need:
repetition_penalty = 1.0
top_p = 1.0
top_k = 0
temperature = 1.0

Changing any of these alters the logprobs returned by vLLM, which we use to calculate losses and gradient updates, and that leads to unstable training.

We're setting the default generation config to "vllm" so we get the sampling params above instead of the model's generation_config.json.
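A minimal sketch of the idea: the `RL_SAFE_DEFAULTS` dict and the `check_rl_safe` helper below are hypothetical names for illustration, and the commented-out `LLM(..., generation_config="vllm")` call assumes a vLLM version that includes vllm-project/vllm#12622.

```python
# RL-safe sampling settings: any deviation changes the logprobs vLLM
# returns, which destabilizes loss and gradient computation.
RL_SAFE_DEFAULTS = {
    "repetition_penalty": 1.0,
    "top_p": 1.0,
    "top_k": 0,
    "temperature": 1.0,
}

def check_rl_safe(sampling_params: dict) -> bool:
    """Hypothetical guard: True only if every RL-safe default is honored."""
    return all(sampling_params.get(k) == v for k, v in RL_SAFE_DEFAULTS.items())

# Sketch of the actual fix (requires a GPU + vLLM install, so left commented):
# from vllm import LLM
# llm = LLM(model="my-model", generation_config="vllm")  # ignore generation_config.json
```

A model whose generation_config.json ships, say, temperature=0.7 would silently fail this guard, which is exactly the failure mode the PR is defending against.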

@saum7800 saum7800 requested review from bradhilton and corbt May 23, 2025 20:53
Collaborator

@bradhilton bradhilton left a comment


🚀

@saum7800 saum7800 merged commit 48918e0 into main May 23, 2025
1 check passed
@saum7800 saum7800 deleted the potential_fix branch May 23, 2025 21:18
2 participants