
Commit e27d090

Embracing and loadams authored
Fix typos in README.md (#635)
Co-authored-by: Logan Adams <[email protected]>
1 parent 4d92c92 commit e27d090

File tree

1 file changed: +1 −1 lines changed
  • applications/DeepSpeed-Chat/training/step3_rlhf_finetuning


applications/DeepSpeed-Chat/training/step3_rlhf_finetuning/README.md

Lines changed: 1 addition & 1 deletion
@@ -41,7 +41,7 @@ We provide most of unique arguments used in DeepSpeed RLHF other than the previo
 | ------------------------------------------------------------------ | -------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
 | --unsupervised_dataset_name and --unsupervised_dataset_config_name | Huggingface datasets standard setting to collect the data, e.g., using Wikitext-103 | When both are provided, during each PPO training, we will also add the pretraining objective. Based on InstructGPT, this will enhance the model's benchmark performance. |
 | --unsup_coef | Used to balance RLHF/PPO loss and the unsupervised loss | |
-| --per_device_train_batch_size and --per_device_mini_batch_size | The first one is the generation batch size and the second one is the PPO training batch size | Usually, the first one needs to be divisbale by the first one. |
+| --per_device_train_batch_size and --per_device_mini_batch_size | The first one is the generation batch size and the second one is the PPO training batch size | Usually, the first one needs to be divisible by the second one. |
 | --generation_batch_numbers | Generated N batches then do PPO training | This setting is common in RL, i.e., we generate an experiment table then do RL training |
 | --ppo_epochs | For the generated experiments, how many PPO epochs we want to perform | |
 | --max_prompt_seq_len and --max_answer_seq_len | The length of the query and the length of the answer | |
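For context on the corrected sentence: the point is that each generation batch must split evenly into PPO mini-batches. Below is a minimal sketch of that check with illustrative values; the variable names simply mirror the flag names and are not taken from the DeepSpeed-Chat code.

```python
# Minimal sketch (assumed, not from the DeepSpeed-Chat source): the generation
# batch size should be divisible by the PPO mini batch size, so that every
# generated batch splits into whole PPO mini-batches.
per_device_train_batch_size = 16  # generation batch size (example value)
per_device_mini_batch_size = 4    # PPO training batch size (example value)

assert per_device_train_batch_size % per_device_mini_batch_size == 0, (
    "--per_device_train_batch_size should be divisible by "
    "--per_device_mini_batch_size"
)

# Each generation step then yields this many PPO mini-batches per device:
num_mini_batches = per_device_train_batch_size // per_device_mini_batch_size
print(num_mini_batches)  # -> 4
```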
