
Commit ad8f599

lekurile authored and LeetJoe committed
Add step 2 sweep script, clean up scripts (deepspeedai#664)
This PR adds a step 2 sweeping script in DS Chat and cleans up the existing step 1 and 3 scripts.
1 parent 328b5b2 commit ad8f599

File tree

10 files changed: +100 -12 lines


applications/DeepSpeed-Chat/training/README.md

Lines changed: 4 additions & 3 deletions
@@ -60,12 +60,13 @@ We are sharing our training logs for all three steps for an OPT-1.3b actor and O
 | 3 | [single_node/run_1.3b.sh](https://github.com/microsoft/DeepSpeedExamples/blob/master/applications/DeepSpeed-Chat/training/step3_rlhf_finetuning/training_scripts/single_node/run_1.3b.sh) | [actor_opt-1.3b_critic_opt-350m_globalBatchSize64.log](https://github.com/microsoft/DeepSpeedExamples/blob/master/applications/DeepSpeed-Chat/training/step3_rlhf_finetuning/training_log_output/actor_opt-1.3b_critic_opt-350m_globalBatchSize64.log) |
 
 ### Characterization Scripts
-Scripts for sweeping training across various parameters (Zero Stage, Offload, Lora, etc) are available for Step 1 and 3. These scripts can be further extended to sweep across additional parameters such as learning rate.
+Scripts for sweeping training across various parameters (Zero Stage, Offload, Lora, etc) are available for Step 1, 2, and 3. These scripts can be further extended to sweep across additional parameters such as learning rate.
 
 | Step | Sweep Script | README |
 |--------------|-----------|-----------|
-| 1 | [run_step1_opt_sweep.sh](./step1_supervised_finetuning/training_scripts/single_node/sweep/run_step1_opt_sweep.sh) | [README](./step1_supervised_finetuning/training_scripts/single_node/sweep/README.md) |
-| 3 | [run_step3_opt_sweep.sh](./step3_rlhf_finetuning/training_scripts/single_node/sweep/run_step3_opt_sweep.sh) | [README](./step3_rlhf_finetuning/training_scripts/single_node/sweep/README.md) |
+| 1 | [run_step1_sweep.sh](./step1_supervised_finetuning/training_scripts/single_node/sweep/run_step1_sweep.sh) | [README](./step1_supervised_finetuning/training_scripts/single_node/sweep/README.md) |
+| 2 | [run_step2_sweep.sh](./step2_reward_model_finetuning/training_scripts/single_node/sweep/run_step2_sweep.sh) | [README](./step2_reward_model_finetuning/training_scripts/single_node/sweep/README.md) |
+| 3 | [run_step3_sweep.sh](./step3_rlhf_finetuning/training_scripts/single_node/sweep/run_step3_sweep.sh) | [README](./step3_rlhf_finetuning/training_scripts/single_node/sweep/README.md) |
 
 ### Others
 RLHF (Reinforcement Learning for Human Feedback) training is still an open problem, and DeepSpeed-Chat is designed to be a starting point for researchers and practitioners to work on it with an efficient and fast training experience. The Hybrid-Engine and other efficient components, like LoRA, can be inherited from DeepSpeed-Chat, allowing you to develop your own RLHF training pipeline for exploration, research, and other purposes.

applications/DeepSpeed-Chat/training/step1_supervised_finetuning/training_scripts/single_node/sweep/README.md

Lines changed: 3 additions & 3 deletions
@@ -5,17 +5,17 @@
 * [Usage](#usage)
 
 # Introduction
-The step 1 characterization script is intented to sweep across various training parameters. Currently, the following are parameters are swept:
+The step 1 characterization script sweeps across various training parameters. Currently, the following parameters are swept:
 <pre>
 Zero Stage: 2, 3
 Offload: True, False
 Lora: True, False
 </pre>
 
-The `run_step1_opt_sweep.sh` script passes configuration arguments to `run_1.3b_lora_swp.sh`, which can be extended to sweep beyond the parameters listed above (learning rate, weight decay, etc).
+The `run_step1_sweep.sh` script passes configuration arguments to `run_single.sh`, which can be extended to sweep beyond the parameters listed above (e.g. learning rate, weight decay, etc).
 
 # Usage
 The sweep script can be run as follows:
 <pre>
-DeepSpeedExamples/applications/DeepSpeed-Chat/training/step1_supervised_finetuning$ bash training_scripts/single_node/sweep/run_step1_opt_sweep.sh
+DeepSpeedExamples/applications/DeepSpeed-Chat/training/step1_supervised_finetuning$ bash training_scripts/single_node/sweep/run_step1_sweep.sh
 </pre>
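For orientation, the grid this README lists (two Zero stages, offload on/off, LoRA on/off) comes to eight runs. A minimal, hypothetical shell sketch (not part of the commit) that enumerates the resulting output-name suffixes:

```shell
#!/bin/bash
# Enumerate the 2 x 2 x 2 = 8 configurations the step 1 sweep covers.
count=0
for z in 2 3; do
  for offload in true false; do
    for lora in true false; do
      echo "z${z}_offload_${offload}_lora_${lora}"
      count=$((count + 1))
    done
  done
done
echo "total runs: ${count}"   # total runs: 8
```

Extending the sweep (e.g. to learning rate) multiplies this count, which is worth keeping in mind before adding loop dimensions.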
File renamed without changes.
Lines changed: 2 additions & 2 deletions
@@ -9,11 +9,11 @@ do
 do
 for lora in true false
 do
-cmd="bash training_scripts/single_node/sweep/run_1.3b_lora_swp.sh \
+cmd="bash training_scripts/single_node/sweep/run_single.sh \
 ${z} \
 ${offload} \
 ${lora} \
-step1_z${z}_offload_${offload}_lora_${lora}"
+z${z}_offload_${offload}_lora_${lora}"
 echo "----------------------------- CALLING SHELL SCRIPT -----------------------------"
 echo $cmd
 $cmd
Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+# DeepSpeed Characterization Script
+
+# Contents
+* [Introduction](#introduction)
+* [Usage](#usage)
+
+# Introduction
+The step 2 characterization script sweeps across various training parameters. Currently, the following parameters are swept:
+<pre>
+Zero Stage: 2, 3
+Offload: True, False
+</pre>
+
+The `run_step2_sweep.sh` script passes configuration arguments to `run_single.sh`, which can be extended to sweep beyond the parameters listed above (e.g. learning rate, weight decay, etc).
+
+# Usage
+The sweep script can be run as follows:
+<pre>
+DeepSpeedExamples/applications/DeepSpeed-Chat/training/step2_reward_model_finetuning$ bash training_scripts/single_node/sweep/run_step2_sweep.sh
+</pre>
Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
+#!/bin/bash
+# Copyright (c) Microsoft Corporation.
+# SPDX-License-Identifier: Apache-2.0
+
+# DeepSpeed Team
+ZERO_STAGE=$1
+OFFLOAD=$2
+OUTPUT=$3
+if [ "$ZERO_STAGE" == "" ]; then
+ZERO_STAGE=0
+fi
+if [ "$OFFLOAD" == true ]; then
+OFFLOAD="--offload"
+else
+OFFLOAD=""
+fi
+if [ "$OUTPUT" == "" ]; then
+OUTPUT=./output
+fi
+mkdir -p $OUTPUT
+
+cmd="deepspeed main.py \
+--data_path Dahoas/rm-static Dahoas/full-hh-rlhf Dahoas/synthetic-instruct-gptj-pairwise yitingxie/rlhf-reward-datasets \
+--data_split 2,4,4 \
+--model_name_or_path facebook/opt-350m \
+--num_padding_at_beginning 1 \
+--per_device_train_batch_size 4 \
+--per_device_eval_batch_size 4 \
+--max_seq_len 512 \
+--learning_rate 5e-5 \
+--weight_decay 0.1 \
+--num_train_epochs 1 \
+--disable_dropout \
+--gradient_accumulation_steps 1 \
+--lr_scheduler_type cosine \
+--num_warmup_steps 0 \
+--seed 1234 \
+--zero_stage $ZERO_STAGE \
+--deepspeed \
+--output_dir $OUTPUT \
+$OFFLOAD"
+
+echo "----------------------------- DS COMMAND -----------------------------"
+echo $cmd
+
+$cmd &> $OUTPUT/${OUTPUT}.log
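A small pattern worth noting in the `run_single.sh` diff above: the second positional argument becomes the `--offload` flag only when it is the literal string `true`, and an empty string otherwise. Isolated as a hypothetical helper (the function name is ours, not the script's):

```shell
#!/bin/bash
# Map a true/false argument to a CLI flag, as run_single.sh does inline.
to_offload_flag() {
  if [ "$1" == true ]; then
    echo "--offload"
  else
    echo ""
  fi
}

to_offload_flag true    # prints: --offload
to_offload_flag false   # prints an empty line
```

Because the empty string simply disappears when `$cmd` is expanded unquoted, the same command template works for both the offload and non-offload runs.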
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+#!/bin/bash
+# Copyright (c) Microsoft Corporation.
+# SPDX-License-Identifier: Apache-2.0
+
+# DeepSpeed Team
+for z in {2..3}
+do
+for offload in true false
+do
+cmd="bash training_scripts/single_node/sweep/run_single.sh \
+${z} \
+${offload} \
+z${z}_offload_${offload}"
+echo "----------------------------- CALLING SHELL SCRIPT -----------------------------"
+echo $cmd
+$cmd
+pkill -9 python
+sleep 60
+echo ""
+done
+done
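For reference, the step 2 loop above expands to four `run_single.sh` invocations; a hypothetical dry-run sketch that prints the generated command lines rather than executing them:

```shell
#!/bin/bash
# Dry-run: print the four commands the step 2 sweep would execute.
for z in {2..3}; do
  for offload in true false; do
    echo "bash training_scripts/single_node/sweep/run_single.sh ${z} ${offload} z${z}_offload_${offload}"
  done
done
```

Note that the real loop also runs `pkill -9 python` and sleeps between configurations, presumably so worker processes from the previous run release GPU memory before the next one starts.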

applications/DeepSpeed-Chat/training/step3_rlhf_finetuning/training_scripts/single_node/sweep/README.md

Lines changed: 3 additions & 3 deletions
@@ -5,18 +5,18 @@
 * [Usage](#usage)
 
 # Introduction
-The step 3 characterization script is intented to sweep across various training parameters. Currently, the following are parameters are swept:
+The step 3 characterization script sweeps across various training parameters. Currently, the following parameters are swept:
 <pre>
 Zero Stage: 2, 3
 Hybrid Engine: True, False
 Offload: True, False
 Lora: True, False
 </pre>
 
-The `run_step3_opt_sweep.sh` script passes configuration arguments to `run_1.3b_lora_swp.sh`, which can be extended to sweep beyond the parameters listed above (learning rate, weight decay, etc).
+The `run_step3_sweep.sh` script passes configuration arguments to `run_single.sh`, which can be extended to sweep beyond the parameters listed above (e.g. learning rate, weight decay, etc).
 
 # Usage
 The sweep script can be run as follows:
 <pre>
-DeepSpeedExamples/applications/DeepSpeed-Chat/training/step3_rlhf_finetuning$ bash training_scripts/single_node/sweep/run_step3_opt_sweep.sh
+DeepSpeedExamples/applications/DeepSpeed-Chat/training/step3_rlhf_finetuning$ bash training_scripts/single_node/sweep/run_step3_sweep.sh
 </pre>
File renamed without changes.
Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@ do
 do
 for lora in true false
 do
-cmd="bash training_scripts/single_node/sweep/run_1.3b_lora_swp.sh \
+cmd="bash training_scripts/single_node/sweep/run_single.sh \
 $ACTOR_MODEL_PATH \
 $CRITIC_MODEL_PATH \
 ${z} \
