Commit f7628f4
Add DPO support for DeepSpeed-Chat (deepspeedai#828)
* Add label_smoothing while calculating step2 DPO loss in DeepSpeed-Chat.
* Add training scripts for step2 DPO in DeepSpeed-Chat.
* Remove unused packages and format the code of step2 DPO in DeepSpeed-Chat.
* Update training scripts of step2 DPO in DeepSpeed-Chat.
* Follow upstream fixes.
* Update README.md for Step2 DPO finetuning.
* Add opt 350M training log demo for step 2 dpo finetuning in DeepSpeed-Chat.
* Address the formatting issue in step2 dpo finetuning in DeepSpeed-Chat.
---------
Co-authored-by: Logan Adams <[email protected]>
Co-authored-by: Olatunji Ruwase <[email protected]>
Signed-off-by: zhangsmallshark <[email protected]>
1 parent 0be49e3 commit f7628f4
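The first item in the commit message adds `label_smoothing` to the step 2 DPO loss. The diff body is not reproduced here, so the following is only a minimal sketch of how a label-smoothed DPO loss is commonly written (the "conservative DPO" form), not the code from this commit. The function names, argument names, and the default `beta=0.1` are assumptions made for illustration, and plain Python floats stand in for the batched tensors a real training script would use.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x)) for a scalar."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,             # assumed default, not taken from the commit
    label_smoothing: float = 0.0,  # 0.0 recovers the unsmoothed DPO loss
) -> float:
    """Per-example DPO loss with label smoothing (illustrative sketch).

    Inputs are summed log-probabilities of the chosen and rejected
    responses under the policy and the frozen reference model.
    """
    # How much more the policy prefers "chosen" over "rejected"
    # than the reference model does.
    logits = (policy_chosen_logp - policy_rejected_logp) - (
        ref_chosen_logp - ref_rejected_logp
    )
    # label_smoothing is treated as the probability that the preference
    # label is flipped, so the flipped-label term gets that weight.
    return (
        -log_sigmoid(beta * logits) * (1.0 - label_smoothing)
        - log_sigmoid(-beta * logits) * label_smoothing
    )
```

With `label_smoothing=0.0` only the first term remains, which is the standard DPO objective. A small positive value keeps the loss from rewarding an ever-growing margin on preference pairs that may be mislabeled.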
File tree
12 files changed: +7216 -0 lines changed
- applications/DeepSpeed-Chat/training/step2_dpo_finetuning
- training_log_output
- training_scripts
- llama2
- opt
- multi_node
- single_gpu
- single_node
- sweep
Lines changed: 26 additions & 0 deletions