Fix masking of response tokens #1718
          
     Merged
      
        
      
    
                
    
  
  
    
    The current handling of `response_masks` inside the `batch_forward_pass` function does not take padding into account, which results in a shape mismatch during masking. Since the response mask is a mask tensor over the response tokens only, it should not be concatenated with `torch.zeros(query_length)`, and the masking operation should be applied without slicing. This PR removes the concatenation and removes the slicing of the response mask: the response mask already has length `end - start`, which is equal to the length of `masks[j, start:end]`.
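The shape problem described above can be reproduced with a small sketch. This is not the trainer's code: plain Python lists stand in for `torch` tensors so the snippet runs anywhere, `apply_response_mask` is a hypothetical helper, and the layout (left padding, one mask entry per predicted token, `start` shifted by the padding length) is an assumption made for illustration.

```python
# Minimal sketch of the masking issue. Lists stand in for torch tensors;
# every name and value below is hypothetical and chosen for illustration.

def apply_response_mask(mask_row, start, end, response_mask):
    """Zero everything outside [start, end), then multiply that window
    element-wise by the response mask, without concatenating or slicing it."""
    if len(response_mask) != end - start:
        raise ValueError("response mask must have exactly end - start entries")
    out = [m if start <= i < end else 0 for i, m in enumerate(mask_row)]
    for k, keep in enumerate(response_mask):
        out[start + k] *= keep
    return out

# Assumed layout: 2 left-padding tokens, 3 query tokens, 4 response tokens.
pad_len, query_len, response_len = 2, 3, 4
attention_mask = [0] * pad_len + [1] * (query_len + response_len)
mask_row = attention_mask[1:]      # one entry per predicted (next) token
start = query_len - 1 + pad_len    # assumed offset for left padding
end = start + response_len
response_mask = [1, 0, 1, 1]       # the second response token is masked out

# Old approach: prepend zeros for the query, drop one entry, then slice.
# The padding offset pushes the slice past the end of the padded-free mask.
old_mask = ([0] * query_len + response_mask)[1:]
old_window = old_mask[start:end]   # shorter than the window it must match

# Fixed approach: use the response mask as-is.
fixed = apply_response_mask(mask_row, start, end, response_mask)

print(len(old_window), end - start)
print(fixed)
```

With these numbers the old slice yields fewer entries than the `start:end` window, which is the kind of mismatch the PR describes, while the unsliced response mask lines up with the window exactly.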
| cc @vwxyzjn what do you think? | 
| @mertsayar8 thanks for the fix! I am happy to merge it as is. That said, we recently did a refactor and added a PPOv2Trainer. Feel free to give that a try. | 
            
vwxyzjn approved these changes Jun 20, 2024
                
            
            
          
          
    
qgallouedec added a commit that referenced this pull request Jul 15, 2024
    
    
      
  
    
      
    
  
commit 9e9dc96 Author: Maxim Kopecki <[email protected]> Date: Wed Jul 10 19:11:13 2024 +0200 Added missing token kwarg in Peft model loading (#1825) commit 7ddef5c Author: Quentin Gallouédec <[email protected]> Date: Wed Jul 10 18:26:11 2024 +0200 Make use of `trust_remote_code` consistent (#1806) Co-authored-by: Quentin Gallouédec <[email protected]> commit a9cddf8 Author: Adnan Khan <[email protected]> Date: Wed Jul 10 11:25:07 2024 -0400 Delete unused benchmark.yml workflow. (#1822) commit 2860ce5 Author: Quentin Gallouédec <[email protected]> Date: Tue Jul 9 09:22:52 2024 +0200 DPO Llava 1.5 and PaliGemma support (#1797) * llava support dpo * add_special_tokens=False only when possible * format * pali gemma * refactor size * remove image resize --------- Co-authored-by: Quentin Gallouédec <[email protected]> commit 30e33bd Author: Quentin Gallouédec <[email protected]> Date: Tue Jul 9 05:37:12 2024 +0200 upgrade gh actions (#1818) Co-authored-by: Quentin Gallouédec <[email protected]> commit d5a0d2d Author: Costa Huang <[email protected]> Date: Mon Jul 8 11:12:41 2024 -0400 Set dev version (#1817) commit 314e8eb Author: Puneet Singh Bhooi <[email protected]> Date: Mon Jul 8 19:11:36 2024 +0530 fix broken url in `docs\source\index.mdx` (#1813) commit e107920 Author: Costa Huang <[email protected]> Date: Mon Jul 8 09:38:09 2024 -0400 0.9.6 release (#1816) commit 78045de Author: Alvaro Bartolome <[email protected]> Date: Mon Jul 8 01:59:26 2024 +0200 Fix `TRL_USE_RICH` environment variable handling (#1808) * Add `strtobool` custom implementation from `distutils` * Fix `TRL_USE_RICH` handling via `strtobool` * Run `make precommit` commit 747612f Author: Alvaro Bartolome <[email protected]> Date: Fri Jul 5 16:28:59 2024 +0200 Fix `torch_dtype` handling in `{DPO,SFT}Trainer` when provided via CLI (#1807) * Fix `torch_dtype` handling through CLI The `torch_dtype` is not properly handled when provided via the TRL CLI since it's provided initially as a string, but is 
then casted to `torch.dtype` before providing it to the `{DPO,SFT}Trainer`, which means that those trainers should handle the scenario where `torch_dtype` is a `torch.dtype` too. * Add `torch_dtype` tests in `test_{dpo,sft}_trainer.py` * Forward contribution credits * Run `make precommit` --------- Co-authored-by: Tash Srivastava <[email protected]> commit 9e3a35b Author: Michael <[email protected]> Date: Fri Jul 5 07:29:48 2024 -0400 Remove extra print in reward_trainer.py (#1799) `print_rich_table` is called twice and the first call doesn't restrict to `num_print_samples`. Remove the first, extra call commit 4402b36 Author: Quentin Gallouédec <[email protected]> Date: Thu Jul 4 14:29:25 2024 +0200 clean examples (#1791) Co-authored-by: Quentin Gallouédec <[email protected]> commit 78f8228 Author: Noah Tye <[email protected]> Date: Wed Jul 3 11:10:50 2024 -0700 Bugfix: Preserve token fields when converting TrainingArguments to SFTConfig (#1794) * Preserve token fields when converting TrainingArguments to SFTConfig TrainingArguments.to_dict() redacts token fields, so we have to individually copy them over when converting to SFTConfig to avoid breaking push_to_hub functionality. Also adds a test. * run precommit * one-line args_as_dict definition per suggestion from kashif * generalize token copying to match TrainingArguments behavior * unwrap |= on dict, to support python 3.8 * use .update instead of |= or for-loop commit b6af2ed Author: Kashif Rasul <[email protected]> Date: Wed Jul 3 08:29:16 2024 +0200 add model_init_kwargs to training_args (#1787) commit cd85b14 Author: Tommaso Buonocore <[email protected]> Date: Sat Jun 29 15:35:48 2024 +0200 Fixed typo in SFT trainer docs (#1788) 'STFConfig' instead of 'SFTConfig' appears multiple times in the doc, causing error when running the code snippets. 
commit a57544f Author: Kashif Rasul <[email protected]> Date: Thu Jun 27 15:47:58 2024 +0200 fix docs and examples (#1780) commit b68ff96 Author: Quentin Gallouédec <[email protected]> Date: Wed Jun 26 16:26:37 2024 +0200 Visual DPO (#1647) * Remove extra whitespaces * idefics * vdpo * sft idefics * pad with test * use prompt instead of tokenizer * rm name main * support vlm in tokenize row * temp fix for regex in lora_target_module * format * vdpo * tmp float16 hard code * concatenated_forward support for vision * style and new command line * all-linear * format * delete old examples * get image * upcast * new test * modified test * new strat for tokenizer * rm token transfer * integrate vision in dpo example * format * add FDivergenceType back * precommit * pillow test dep * optional prompt * `evaluation_strategy` to `eval_strategy` * revert vsft change (oos) * update test * test * comment and support more in process * update process * update doc for vdpo * caution about limited support * Update docs/source/dpo_trainer.mdx Co-authored-by: Kashif Rasul <[email protected]> * revert DPO example changes * cleaner way to check if a model is vision * comment * update vdpo example * rename --------- Co-authored-by: Quentin Gallouédec <[email protected]> Co-authored-by: Kashif Rasul <[email protected]> commit c8c01cc Author: Mubin Manasia <[email protected]> Date: Wed Jun 26 03:23:36 2024 -0600 Fix Documentation Overflow Issues for Long URLs in SFTConfig (#1774) * Update sft_config.py * Update sft_config.py commit 3479606 Author: Costa Huang <[email protected]> Date: Wed Jun 26 03:18:22 2024 -0400 Remove the leading space in the tldr preference dataset (#1773) commit 7965b78 Author: Haozhe Ji <[email protected]> Date: Tue Jun 25 22:47:32 2024 +0800 add Efficient Exact Optimization (EXO) (#1735) * add exo * fix a detail * Update trl/trainer/dpo_trainer.py * Update trl/trainer/dpo_trainer.py * Update trl/trainer/dpo_trainer.py --------- Co-authored-by: Kashif Rasul <[email 
protected]> commit 56bd1bb Author: Quentin Gallouédec <[email protected]> Date: Tue Jun 25 16:14:26 2024 +0200 `evaluation_strategy` to `eval_strategy` (#1771) Co-authored-by: Quentin Gallouédec <[email protected]> commit 94d53e6 Author: Clara Pohland <[email protected]> Date: Mon Jun 24 21:27:00 2024 +0200 MoE Models: option to add load balancing loss (#1765) * KTO: add aux loss * use router_aux_loss_coef in KtoTrainer when aux_loss enabled * align optional aux_loss in DPO, KTO, CPO, ORPO * precommit changes * fix KL forward kwargs * add aux_loss doku entry * apply docs suggestions --------- Co-authored-by: Clara Luise Pohland <[email protected]> commit b5be100 Author: Mihir Prabhudesai <[email protected]> Date: Mon Jun 24 12:05:44 2024 -0400 Added Reward Backpropogation Support (#1585) * added alignprop template * added alignprop support * Update alignprop_trainer.mdx * Update alignprop_trainer.mdx * added better why statement * fixed inference code * changed self to pipeline * removed aesthetic classifier * added aesthetic to auxiliary models * added unseen prompt logging * removed unseen prompt log * fixed minor * remove not needed import in trl/__init__.py Co-authored-by: Younes Belkada <[email protected]> * fixed styling * updated _toctree --------- Co-authored-by: Younes Belkada <[email protected]> commit 6e1652b Author: Haoran Xu <[email protected]> Date: Sun Jun 23 09:54:30 2024 -0700 Add CPO-SimPO method (#1760) * enable cpo-simpo * highlight SimPO and CPO-SimPO * add test for cpo_alpha * formatting * Update docs/source/cpo_trainer.mdx --------- Co-authored-by: Kashif Rasul <[email protected]> commit 65374c6 Author: Costa Huang <[email protected]> Date: Fri Jun 21 11:20:54 2024 -0400 New sentiment and descriptiveness dataset (#1757) * push changes * handle edge cases where the chosen and the rejected are the same commit 9956091 Author: Juyoung Suk <[email protected]> Date: Fri Jun 21 18:01:08 2024 +0900 Add dataset_text_field in examples/scripts/sft.py 
(#1758) commit 34d273f Author: Costa Huang <[email protected]> Date: Thu Jun 20 13:16:43 2024 -0400 Support num_train_epochs (#1743) * add a test case for num_train_epochs * fix ci * quick change * disable push to hub * debug windows ci * try another fix * skip subprocess tests on windows commit 3bf9449 Author: Mert Sayar <[email protected]> Date: Thu Jun 20 18:22:20 2024 +0300 Fix masking of response tokens (#1718) Current handling of `response_masks` inside `batch_forward_pass` function does not take padding into consideration which results with shape unmatch during masking. Since response mask is a mask tensor of response tokens, response tokens should not be concatenated with a `torch.zeros(query_length)` and masking operation should be done without slicing. Remove the concatenation of the response mask, remove the slicing from the response mask since response mask already has the length of `end - start + 1`, which is equal to length of `masks[j, start:end]`. commit ba6abee Author: idanshen <[email protected]> Date: Thu Jun 20 09:14:16 2024 -0400 Support for returning past_key_values from the model (#1742) * add support for returning past_key_values from the model * change order of keys commit a57e759 Author: 1485840691 <[email protected]> Date: Wed Jun 19 18:02:51 2024 +0800 Integrate f-divergence to DPO (Follow up) (#1610) * Step 1: update ppo_trainer and hello_world example * Step 2: Refine comments and add parameter type * Step 2: Add missing parameter comments * Step 1: Organize ptx loss into a function and add ptx_loss to train_stats * Step 1 updates: add comment to ptx_loss function, fix a bug and add warning message * Step 2: 1) Add ppo_ptx trainig example as ppo; 2) separate pretrain data fetch and iterate * Step 2: Remove loss from columns_to_log in ppo_ptx example * Remove data set revision in load imbd dataset * Run pre-commit and fix format issues * Initial draft of f-divergence fn * Update f-divergence to avoid overflow * fix test errors and 
comments * Add Unit tests for dpo loss with alpha and js div f * Adjust format * Fix test error * Reverse this update * Add test cases * Reverse un-needed updates * Update code style * Try to fix code fmt error * remove extra end line --------- Co-authored-by: Kashif Rasul <[email protected]> commit ae23d40 Author: Shihyueh Hsu <[email protected]> Date: Tue Jun 18 22:07:24 2024 +0800 change the `process` function in the example of DPO (#1753) * change the `process` function in the example of DPO * fix commit 83b367b Author: Younes Belkada <[email protected]> Date: Tue Jun 18 11:31:17 2024 +0200 CI / `KTOTrainer`: Remove old tests (#1750) * remove old tests * remove datasets * Update test_dpo_trainer.py * Update test_dpo_trainer.py commit d1ed730 Author: Michael <[email protected]> Date: Mon Jun 17 10:50:21 2024 -0400 prepare deepspeed accomodate fp16 and bf16 (#1728) * prepare deepspeed accomodate fp16 and bf16 * precommit commit 8f8e95e Author: Younes Belkada <[email protected]> Date: Mon Jun 17 16:49:00 2024 +0200 CPO / DPO: Fix red CI (#1749) * fix red CI * precommit commit 4e23d95 Author: Younes Belkada <[email protected]> Date: Mon Jun 17 16:41:36 2024 +0200 fix red CI commit 50c4620 Author: Kawin <[email protected]> Date: Mon Jun 17 07:14:44 2024 -0700 small KTO fixes (#1734) * add warning for imbalanced data * update documentation * update script commands to be same as in dpo * use batch_size KL examples and batch_size target examples to calculate batch_size losses * fix deepspeed issue * speed up forward with no_grad for KL * add some removed metrics * Update trl/trainer/kto_trainer.py * Update trl/trainer/kto_trainer.py * Update trl/trainer/kto_trainer.py add reference to paper Co-authored-by: lewtun <[email protected]> * Update trl/trainer/kto_trainer.py Co-authored-by: Kashif Rasul <[email protected]> * Update trl/trainer/kto_trainer.py Co-authored-by: Kashif Rasul <[email protected]> * Update trl/trainer/kto_trainer.py Co-authored-by: Kashif Rasul 
<[email protected]> * Update trl/trainer/kto_trainer.py Co-authored-by: Kashif Rasul <[email protected]> * Update trl/trainer/kto_trainer.py Co-authored-by: Kashif Rasul <[email protected]> * Update trl/trainer/kto_trainer.py Co-authored-by: Kashif Rasul <[email protected]> * Update trl/trainer/kto_trainer.py Co-authored-by: Kashif Rasul <[email protected]> * Update trl/trainer/kto_trainer.py Co-authored-by: Kashif Rasul <[email protected]> * Update trl/trainer/kto_trainer.py Co-authored-by: Kashif Rasul <[email protected]> * Update trl/trainer/kto_trainer.py Co-authored-by: Kashif Rasul <[email protected]> * Update trl/trainer/kto_trainer.py Co-authored-by: Kashif Rasul <[email protected]> * add more detailed comments * convert assert to ValueError * Update kto_trainer.py * precommit formatting * remove nans in metrics by gathering across machines * fix formatting * fix choice of mismatched examples for KL term * describe weights * fix hanging issue in distributed training * linting * move metrics to cpu * Update trl/trainer/kto_trainer.py Co-authored-by: lewtun <[email protected]> * Update trl/trainer/kto_trainer.py * Update trl/trainer/kto_trainer.py * remove kto_pair * speed up data processing * move bco code inside * raise error for kto_pair argument * fix formatting --------- Co-authored-by: Kashif Rasul <[email protected]> Co-authored-by: lewtun <[email protected]> Co-authored-by: Winnie Xu <[email protected]> commit 6105d03 Author: Younes Belkada <[email protected]> Date: Mon Jun 17 16:01:06 2024 +0200 `TrlParser`: Add ignore extra args option (#1748) * add ignore extra args option * Update trl/commands/cli_utils.py commit e247bbd Author: Younes Belkada <[email protected]> Date: Mon Jun 17 15:16:07 2024 +0200 CI / core: Pin `numpy` to `!=2.0.0` for CI and to users (#1747) * Update setup.py * Update setup.py * Update setup.py * Update test_best_of_n_sampler.py dummy commit * pin numpy * Update tests/test_best_of_n_sampler.py * Update setup.py commit 3d04496 
Author: Michael <[email protected]> Date: Mon Jun 17 08:43:33 2024 -0400 better trl parser with yaml config (#1739) * working trl parser with config correctly overrides yaml config with command line arguments adds return_remaining_strings when return_remaining_strings is False, raises error if yaml contains extra args that are not in the dataclasses simpler and cleaner than previous yaml parsing and merging addresses #1733 * lowercase trlparser commit 2d244f8 Author: Younes Belkada <[email protected]> Date: Mon Jun 17 11:56:13 2024 +0200 Workflow: Notify tests results on slack channel (#1744) * Update tests-main.yml * Update docker-build.yml commit f5168fd Author: Igor Melnyk <[email protected]> Date: Wed Jun 12 05:54:54 2024 -0400 adds AOT (#1701) * adds AOT * Applied format changes * added docs and tests --------- Co-authored-by: Igor Melnyk <[email protected]> commit 79686e1 Author: jetlime <[email protected]> Date: Wed Jun 12 00:35:31 2024 +1000 ktotrainer: Refuse datasets which contain only one class of labels (#1724) * ktotrainer: refuse dataset which contain only one class of labels * ktotrainer: document new dataset constraint commit 34ebc4c Author: Luc Georges <[email protected]> Date: Mon Jun 10 11:17:54 2024 +0200 feat(ci): add trufflehog secrets detection (#1721) * feat(ci): add trufflehog secrets detection * fix(ci): remove unnecessary permissions commit 1d84e2b Author: Michael <[email protected]> Date: Fri Jun 7 11:42:08 2024 +0200 Fix default padding_value in dpo_config.py (#1692) dpo_config default padding value should be None, not 0, otherwise it by default overrides the padding value of any tokenizer to 0 commit 2f71b8b Author: Michael <[email protected]> Date: Fri Jun 7 10:37:27 2024 +0200 fix yaml parser for derived config classes (#1713) fixes #1712 reformatted cli_utils with ruff commit 5bcb8ad Author: Kashif Rasul <[email protected]> Date: Fri Jun 7 08:48:17 2024 +0100 RDPO fix nll loss (#1705) commit b8b972f Author: Haoran Xu <[email 
protected]> Date: Thu Jun 6 14:06:47 2024 -0700 Add a variant of CPO, SimPO (#1703) * add a variant of cpo: simpo * correct cpo-simpo loss * avoid 0 int error in logging * add simpo description * Update trl/trainer/cpo_trainer.py Co-authored-by: Kashif Rasul <[email protected]> * fix formatting * add test for simpo * Update docs/source/cpo_trainer.mdx Co-authored-by: Kashif Rasul <[email protected]> * add a docstring for simpogamma * move simpo description to the above docstring * change simpo description in the doc * formatting --------- Co-authored-by: Kashif Rasul <[email protected]> commit 3eb9ccb Author: Younes Belkada <[email protected]> Date: Thu Jun 6 19:33:20 2024 +0200 set dev version (#1710) * Update setup.py * Update __init__.py commit 974b0d3 Author: Costa Huang <[email protected]> Date: Thu Jun 6 10:13:00 2024 -0400 0.9.4 release (#1708) commit 39a7d1c Author: Younes Belkada <[email protected]> Date: Thu Jun 6 15:50:17 2024 +0200 SFTTrainer: Fix backward Compatibility issue with `TrainingArguments` (#1707) * fix BC * fixup commit 0bdc638 Author: Guilherme Freire <[email protected]> Date: Thu Jun 6 14:42:58 2024 +0100 Fixed doc string and docs for the SFTConfig update (#1706) commit 275d33b Author: Costa Huang <[email protected]> Date: Wed Jun 5 14:34:59 2024 -0400 0.9.3 release (#1699) commit c0819ee Author: Younes Belkada <[email protected]> Date: Wed Jun 5 17:29:03 2024 +0200 Update sft_trainer.py (#1698) commit a03e7cc Author: Costa Huang <[email protected]> Date: Wed Jun 5 11:00:19 2024 -0400 Release 0.9.2 (#1697) * Release: 0.9.0 * Release commit a13cb89 Author: Costa Huang <[email protected]> Date: Wed Jun 5 10:20:54 2024 -0400 Quick fix on GPT4-eval (#1696) * quick fix * precommit commit 84156f1 Author: Quentin Gallouédec <[email protected]> Date: Mon Jun 3 20:09:05 2024 +0200 Fix typo in DPOTrainer's warnings (#1688) commit 4eb0b90 Author: Alex Brooks <[email protected]> Date: Mon Jun 3 10:24:32 2024 -0600 Skip packing validation (#1673) * Add 
test for skipping preproc if packing=True Signed-off-by: Alex-Brooks <[email protected]> * Allow skipping of validation for packing=True Signed-off-by: Alex-Brooks <[email protected]> * Use dummy dataset in no packing preproc test Signed-off-by: Alex-Brooks <[email protected]> --------- Signed-off-by: Alex-Brooks <[email protected]> commit 6c203f9 Author: Alexey Rozhkov <[email protected]> Date: Mon Jun 3 10:16:22 2024 +0100 Fix overriding optimize_device_cache with optimize_cuda_cache in PPOConfig (#1690) * Don't override optimize_device_cache when optimize_cuda_cache is not provided Raise an exception when both optimize_cuda_cache and optimize_device_cache are set * Minor fix commit f18253b Author: Kashif Rasul <[email protected]> Date: Mon Jun 3 09:43:02 2024 +0100 intial RPO loss (#1686) * intial RPO loss * fix sign * clean up commit 151a452 Author: Samuel <[email protected]> Date: Wed May 29 20:29:38 2024 +0200 Fix max completion length (#1588) commit 488b502 Author: Younes Belkada <[email protected]> Date: Wed May 29 20:19:26 2024 +0200 fix (#1678) commit 3c0a10b Author: Wang, Yi <[email protected]> Date: Mon May 27 20:52:20 2024 +0800 fix dataset load error (#1670) Signed-off-by: Wang, Yi <[email protected]> commit b031adf Author: Younes Belkada <[email protected]> Date: Fri May 24 15:20:16 2024 +0200 FIX / PPO: Fix `enable_input_require_grads` issues with PPO models (#1664) * Update modeling_base.py * Update ppo_config.py * Update ppo_trainer.py * style commit e7cb597 Author: Costa Huang <[email protected]> Date: Thu May 23 11:37:16 2024 -0400 Fix ppov2 test case (#1661) * Fix PPOv2 / RLOO refactor's stuff * update terminology to use stop token commit bc8dfbf Author: Kashif Rasul <[email protected]> Date: Thu May 23 15:28:04 2024 +0200 update eval_strategy (#1662) commit e4ed7a3 Author: Sourab Mangrulkar <[email protected]> Date: Thu May 23 18:34:22 2024 +0530 do not upcast adapters when using FSDP+QLoRA (#1654) commit 9a7efbd Author: syrn1k <[email 
protected]> Date: Thu May 23 15:58:49 2024 +0300 🤫 TR-DPO implementation (#1593) * 🤫 TR-DPO implementation baseline * fix comments * docs * fix linters * test added * move configs to DPOConfig * fix typo * add docs * fix import * use state.global_step * fix order of arguments * make sure plugins are not none * Update trl/trainer/utils.py Co-authored-by: Benjamin Bossan <[email protected]> * Update trl/trainer/utils.py Co-authored-by: Benjamin Bossan <[email protected]> * checking that reference model weights have changed * sync_target_model as staticmethod * set reference model --------- Co-authored-by: Nikita Surnachev <[email protected]> Co-authored-by: Kashif Rasul <[email protected]> Co-authored-by: Benjamin Bossan <[email protected]> commit b344bce Author: Anush Kini <[email protected]> Date: Thu May 23 18:27:25 2024 +0530 [DPO] Add 'robust' loss_type (#1653) * Initial commit * pre-commit fix * Minor change to comments * Added some documentation on how to use Robust DPO commit 35e12dc Author: Nicolinho <[email protected]> Date: Thu May 23 14:36:15 2024 +0200 Fix inheritance order in PPOv2Config (#1659) * fix inheritance order in PPOv2Config * fix inheritance order in rloo_config commit 1da6be1 Author: Ali Bakly <[email protected]> Date: Thu May 23 14:10:29 2024 +0200 docs: correct cDPO usage in DPOTrainer (#1655) commit e249cd8 Author: Younes Belkada <[email protected]> Date: Thu May 23 14:10:05 2024 +0200 add support for training collator (#1658) commit a02513c Author: Zach Mueller <[email protected]> Date: Thu May 23 06:48:00 2024 -0400 Apply deprecated `evaluation_strategy` (#1559) * Deprecate * Update tests/test_dpo_trainer.py --------- Co-authored-by: Kashif Rasul <[email protected]> commit 13454d2 Author: Costa Huang <[email protected]> Date: Wed May 22 08:31:10 2024 -0400 PPO / Reinforce Trainers (#1540) * Add ppov2 trainer * make eos trick optional, remove unused args * quick fix * precommit * update debugging script * fix out of bound 
`drop_last=True`; use built-in scheduler * Add PPO examples * push changes * quick change * quick change * various bug fixes * remove unnecessary grad accumulation setting * push new changes * fix DS3 model saving * update ppo.py * refactor * quick change * refactor * update ppo trainer * refactor * quick test * add ds2 /ds3 7 processes config * add vllm trainer * quick change * experiment with reward normalization * push changes * quick push * push changes * push various changes * refactor to use ModelConfig * quick change * refactor * refactor * Simplify DS logic * quick update * remove unnecessary files * precommit * deepspeed fix; handle edge case when eos_token_id = 0 * add PPO tldr example * add TL;DR example * fix undefined var * utilize all samples in rloo * quick setting * remove the unnecessary `value_model` * use exact_div * allow saving the deepspeed model * refactor * remove dead code * Use some shared utilities * add some end-to-end test cases * add PPOv2 docs and RLOO docs / tests * update docs * quikc push * fix ci * fix type annotation for ci * quick update * update trainer docs commit 99f2c94 Author: Sourab Mangrulkar <[email protected]> Date: Wed May 15 19:55:46 2024 +0530 don't cast the trainable lora layers to half precision (#1644) * don't cast the trainable lora layers to half precision * quality commit 6401d08 Author: Wing Lian <[email protected]> Date: Tue May 14 09:41:07 2024 -0400 Pairwise Noise Contrastive Alignment (#1632) * add NCA paired preference loss * chore: lint * set more lenient tolerance for integration tests * Update tests/test_dpo_trainer.py * skip test * fix --------- Co-authored-by: Younes Belkada <[email protected]> Co-authored-by: younesbelkada <[email protected]> commit d632a5b Author: bartoszzuk <[email protected]> Date: Tue May 14 12:25:54 2024 +0200 Fixed wrong logs prefixes in KTOTrainer (#1641) * Fixed wrong logs prefixes in KTOTrainer * Pre-commit formating commit 5aeb752 Author: Tiezhen WANG <[email protected]> 
Date: Fri May 10 23:19:15 2024 +0800 Update sft_llama2.py to work with the latest API (#1637) * Update sft_llama2.py to work with the latest API SFTTrainer now takes a STFConfig argument * Update dpo_llama2.py * precommit commit b8b8978 Author: Ilya Gusev <[email protected]> Date: Fri May 10 15:43:13 2024 +0200 [ORPO] Correct label mask for pad tokens (#1625) * [ORPO] Correct label mask for pad tokens Recent [fix](57aebe9) for calculating NLL loss for a whole sequence introduced a bug. When input_ids are copied to labels, pad tokens are not masked. This PR aims to path this by masking labels based on the attention mask. * -100 -> label_pad_token_id Co-authored-by: Kashif Rasul <[email protected]> --------- Co-authored-by: Kashif Rasul <[email protected]> commit 8799952 Author: Costa Huang <[email protected]> Date: Fri May 10 09:32:20 2024 -0400 visualize rm prediction (#1636) * visualize rm prediction * quick update * quick check * quick fix * update eval steps commit 3b4c249 Author: Xiao Yu <[email protected]> Date: Fri May 3 18:19:35 2024 -0400 fixed adding bos and eos token unconditionally (#1591) * fixed adding bos and eos token unconditionally * fixed typo of tokenizer -> self.tokenizer. Also added update to ORPO * fixed code quality, and added BOS/EOS fix to KTO * code reformatting with pre-commit run --all-files * bug fix: check input id length before checking for EOS/BOS commit 0347f58 Author: lewtun <[email protected]> Date: Fri May 3 15:59:59 2024 +0200 Fix ZeRO-3 generation context manager (#1617)
    
qgallouedec added a commit that referenced this pull request Jul 18, 2024
    
    
      
  
    
      
    
  
* Add WinRateCallback * Enable PairRM * Refactor * Streamline * Add HF judge * Add base judge * Use better prompt * Clean * Add max tokens * Use logging * Add batched inference * Squashed commit of the following: commit 9e9dc96 Author: Maxim Kopecki <[email protected]> Date: Wed Jul 10 19:11:13 2024 +0200 Added missing token kwarg in Peft model loading (#1825) commit 7ddef5c Author: Quentin Gallouédec <[email protected]> Date: Wed Jul 10 18:26:11 2024 +0200 Make use of `trust_remote_code` consistent (#1806) Co-authored-by: Quentin Gallouédec <[email protected]> commit a9cddf8 Author: Adnan Khan <[email protected]> Date: Wed Jul 10 11:25:07 2024 -0400 Delete unused benchmark.yml workflow. (#1822) commit 2860ce5 Author: Quentin Gallouédec <[email protected]> Date: Tue Jul 9 09:22:52 2024 +0200 DPO Llava 1.5 and PaliGemma support (#1797) * llava support dpo * add_special_tokens=False only when possible * format * pali gemma * refactor size * remove image resize --------- Co-authored-by: Quentin Gallouédec <[email protected]> commit 30e33bd Author: Quentin Gallouédec <[email protected]> Date: Tue Jul 9 05:37:12 2024 +0200 upgrade gh actions (#1818) Co-authored-by: Quentin Gallouédec <[email protected]> commit d5a0d2d Author: Costa Huang <[email protected]> Date: Mon Jul 8 11:12:41 2024 -0400 Set dev version (#1817) commit 314e8eb Author: Puneet Singh Bhooi <[email protected]> Date: Mon Jul 8 19:11:36 2024 +0530 fix broken url in `docs\source\index.mdx` (#1813) commit e107920 Author: Costa Huang <[email protected]> Date: Mon Jul 8 09:38:09 2024 -0400 0.9.6 release (#1816) commit 78045de Author: Alvaro Bartolome <[email protected]> Date: Mon Jul 8 01:59:26 2024 +0200 Fix `TRL_USE_RICH` environment variable handling (#1808) * Add `strtobool` custom implementation from `distutils` * Fix `TRL_USE_RICH` handling via `strtobool` * Run `make precommit` commit 747612f Author: Alvaro Bartolome <[email protected]> Date: Fri Jul 5 16:28:59 2024 +0200 Fix `torch_dtype` handling in 
`{DPO,SFT}Trainer` when provided via CLI (#1807) * Fix `torch_dtype` handling through CLI The `torch_dtype` is not properly handled when provided via the TRL CLI since it's provided initially as a string, but is then casted to `torch.dtype` before providing it to the `{DPO,SFT}Trainer`, which means that those trainers should handle the scenario where `torch_dtype` is a `torch.dtype` too. * Add `torch_dtype` tests in `test_{dpo,sft}_trainer.py` * Forward contribution credits * Run `make precommit` --------- Co-authored-by: Tash Srivastava <[email protected]> commit 9e3a35b Author: Michael <[email protected]> Date: Fri Jul 5 07:29:48 2024 -0400 Remove extra print in reward_trainer.py (#1799) `print_rich_table` is called twice and the first call doesn't restrict to `num_print_samples`. Remove the first, extra call commit 4402b36 Author: Quentin Gallouédec <[email protected]> Date: Thu Jul 4 14:29:25 2024 +0200 clean examples (#1791) Co-authored-by: Quentin Gallouédec <[email protected]> commit 78f8228 Author: Noah Tye <[email protected]> Date: Wed Jul 3 11:10:50 2024 -0700 Bugfix: Preserve token fields when converting TrainingArguments to SFTConfig (#1794) * Preserve token fields when converting TrainingArguments to SFTConfig TrainingArguments.to_dict() redacts token fields, so we have to individually copy them over when converting to SFTConfig to avoid breaking push_to_hub functionality. Also adds a test. 
* run precommit * one-line args_as_dict definition per suggestion from kashif * generalize token copying to match TrainingArguments behavior * unwrap |= on dict, to support python 3.8 * use .update instead of |= or for-loop commit b6af2ed Author: Kashif Rasul <[email protected]> Date: Wed Jul 3 08:29:16 2024 +0200 add model_init_kwargs to training_args (#1787) commit cd85b14 Author: Tommaso Buonocore <[email protected]> Date: Sat Jun 29 15:35:48 2024 +0200 Fixed typo in SFT trainer docs (#1788) 'STFConfig' instead of 'SFTConfig' appears multiple times in the doc, causing error when running the code snippets. commit a57544f Author: Kashif Rasul <[email protected]> Date: Thu Jun 27 15:47:58 2024 +0200 fix docs and examples (#1780) commit b68ff96 Author: Quentin Gallouédec <[email protected]> Date: Wed Jun 26 16:26:37 2024 +0200 Visual DPO (#1647) * Remove extra whitespaces * idefics * vdpo * sft idefics * pad with test * use prompt instead of tokenizer * rm name main * support vlm in tokenize row * temp fix for regex in lora_target_module * format * vdpo * tmp float16 hard code * concatenated_forward support for vision * style and new command line * all-linear * format * delete old examples * get image * upcast * new test * modified test * new strat for tokenizer * rm token transfer * integrate vision in dpo example * format * add FDivergenceType back * precommit * pillow test dep * optional prompt * `evaluation_strategy` to `eval_strategy` * revert vsft change (oos) * update test * test * comment and support more in process * update process * update doc for vdpo * caution about limited support * Update docs/source/dpo_trainer.mdx Co-authored-by: Kashif Rasul <[email protected]> * revert DPO example changes * cleaner way to check if a model is vision * comment * update vdpo example * rename --------- Co-authored-by: Quentin Gallouédec <[email protected]> Co-authored-by: Kashif Rasul <[email protected]> commit c8c01cc Author: Mubin Manasia <[email protected]> Date: 
Wed Jun 26 03:23:36 2024 -0600 Fix Documentation Overflow Issues for Long URLs in SFTConfig (#1774) * Update sft_config.py * Update sft_config.py commit 3479606 Author: Costa Huang <[email protected]> Date: Wed Jun 26 03:18:22 2024 -0400 Remove the leading space in the tldr preference dataset (#1773) commit 7965b78 Author: Haozhe Ji <[email protected]> Date: Tue Jun 25 22:47:32 2024 +0800 add Efficient Exact Optimization (EXO) (#1735) * add exo * fix a detail * Update trl/trainer/dpo_trainer.py * Update trl/trainer/dpo_trainer.py * Update trl/trainer/dpo_trainer.py --------- Co-authored-by: Kashif Rasul <[email protected]> commit 56bd1bb Author: Quentin Gallouédec <[email protected]> Date: Tue Jun 25 16:14:26 2024 +0200 `evaluation_strategy` to `eval_strategy` (#1771) Co-authored-by: Quentin Gallouédec <[email protected]> commit 94d53e6 Author: Clara Pohland <[email protected]> Date: Mon Jun 24 21:27:00 2024 +0200 MoE Models: option to add load balancing loss (#1765) * KTO: add aux loss * use router_aux_loss_coef in KtoTrainer when aux_loss enabled * align optional aux_loss in DPO, KTO, CPO, ORPO * precommit changes * fix KL forward kwargs * add aux_loss doku entry * apply docs suggestions --------- Co-authored-by: Clara Luise Pohland <[email protected]> commit b5be100 Author: Mihir Prabhudesai <[email protected]> Date: Mon Jun 24 12:05:44 2024 -0400 Added Reward Backpropogation Support (#1585) * added alignprop template * added alignprop support * Update alignprop_trainer.mdx * Update alignprop_trainer.mdx * added better why statement * fixed inference code * changed self to pipeline * removed aesthetic classifier * added aesthetic to auxiliary models * added unseen prompt logging * removed unseen prompt log * fixed minor * remove not needed import in trl/__init__.py Co-authored-by: Younes Belkada <[email protected]> * fixed styling * updated _toctree --------- Co-authored-by: Younes Belkada <[email protected]> commit 6e1652b Author: Haoran Xu <[email protected]> 
Date: Sun Jun 23 09:54:30 2024 -0700 Add CPO-SimPO method (#1760) * enable cpo-simpo * highlight SimPO and CPO-SimPO * add test for cpo_alpha * formatting * Update docs/source/cpo_trainer.mdx --------- Co-authored-by: Kashif Rasul <[email protected]> commit 65374c6 Author: Costa Huang <[email protected]> Date: Fri Jun 21 11:20:54 2024 -0400 New sentiment and descriptiveness dataset (#1757) * push changes * handle edge cases where the chosen and the rejected are the same commit 9956091 Author: Juyoung Suk <[email protected]> Date: Fri Jun 21 18:01:08 2024 +0900 Add dataset_text_field in examples/scripts/sft.py (#1758) commit 34d273f Author: Costa Huang <[email protected]> Date: Thu Jun 20 13:16:43 2024 -0400 Support num_train_epochs (#1743) * add a test case for num_train_epochs * fix ci * quick change * disable push to hub * debug windows ci * try another fix * skip subprocess tests on windows commit 3bf9449 Author: Mert Sayar <[email protected]> Date: Thu Jun 20 18:22:20 2024 +0300 Fix masking of response tokens (#1718) Current handling of `response_masks` inside `batch_forward_pass` function does not take padding into consideration which results with shape unmatch during masking. Since response mask is a mask tensor of response tokens, response tokens should not be concatenated with a `torch.zeros(query_length)` and masking operation should be done without slicing. Remove the concatenation of the response mask, remove the slicing from the response mask since response mask already has the length of `end - start + 1`, which is equal to length of `masks[j, start:end]`. 
commit ba6abee Author: idanshen <[email protected]> Date: Thu Jun 20 09:14:16 2024 -0400 Support for returning past_key_values from the model (#1742) * add support for returning past_key_values from the model * change order of keys commit a57e759 Author: 1485840691 <[email protected]> Date: Wed Jun 19 18:02:51 2024 +0800 Integrate f-divergence to DPO (Follow up) (#1610) * Step 1: update ppo_trainer and hello_world example * Step 2: Refine comments and add parameter type * Step 2: Add missing parameter comments * Step 1: Organize ptx loss into a function and add ptx_loss to train_stats * Step 1 updates: add comment to ptx_loss function, fix a bug and add warning message * Step 2: 1) Add ppo_ptx trainig example as ppo; 2) separate pretrain data fetch and iterate * Step 2: Remove loss from columns_to_log in ppo_ptx example * Remove data set revision in load imbd dataset * Run pre-commit and fix format issues * Initial draft of f-divergence fn * Update f-divergence to avoid overflow * fix test errors and comments * Add Unit tests for dpo loss with alpha and js div f * Adjust format * Fix test error * Reverse this update * Add test cases * Reverse un-needed updates * Update code style * Try to fix code fmt error * remove extra end line --------- Co-authored-by: Kashif Rasul <[email protected]> commit ae23d40 Author: Shihyueh Hsu <[email protected]> Date: Tue Jun 18 22:07:24 2024 +0800 change the `process` function in the example of DPO (#1753) * change the `process` function in the example of DPO * fix commit 83b367b Author: Younes Belkada <[email protected]> Date: Tue Jun 18 11:31:17 2024 +0200 CI / `KTOTrainer`: Remove old tests (#1750) * remove old tests * remove datasets * Update test_dpo_trainer.py * Update test_dpo_trainer.py commit d1ed730 Author: Michael <[email protected]> Date: Mon Jun 17 10:50:21 2024 -0400 prepare deepspeed accomodate fp16 and bf16 (#1728) * prepare deepspeed accomodate fp16 and bf16 * precommit commit 8f8e95e Author: Younes Belkada <[email 
protected]> Date: Mon Jun 17 16:49:00 2024 +0200 CPO / DPO: Fix red CI (#1749) * fix red CI * precommit commit 4e23d95 Author: Younes Belkada <[email protected]> Date: Mon Jun 17 16:41:36 2024 +0200 fix red CI commit 50c4620 Author: Kawin <[email protected]> Date: Mon Jun 17 07:14:44 2024 -0700 small KTO fixes (#1734) * add warning for imbalanced data * update documentation * update script commands to be same as in dpo * use batch_size KL examples and batch_size target examples to calculate batch_size losses * fix deepspeed issue * speed up forward with no_grad for KL * add some removed metrics * Update trl/trainer/kto_trainer.py * Update trl/trainer/kto_trainer.py * Update trl/trainer/kto_trainer.py add reference to paper Co-authored-by: lewtun <[email protected]> * Update trl/trainer/kto_trainer.py Co-authored-by: Kashif Rasul <[email protected]> * Update trl/trainer/kto_trainer.py Co-authored-by: Kashif Rasul <[email protected]> * Update trl/trainer/kto_trainer.py Co-authored-by: Kashif Rasul <[email protected]> * Update trl/trainer/kto_trainer.py Co-authored-by: Kashif Rasul <[email protected]> * Update trl/trainer/kto_trainer.py Co-authored-by: Kashif Rasul <[email protected]> * Update trl/trainer/kto_trainer.py Co-authored-by: Kashif Rasul <[email protected]> * Update trl/trainer/kto_trainer.py Co-authored-by: Kashif Rasul <[email protected]> * Update trl/trainer/kto_trainer.py Co-authored-by: Kashif Rasul <[email protected]> * Update trl/trainer/kto_trainer.py Co-authored-by: Kashif Rasul <[email protected]> * Update trl/trainer/kto_trainer.py Co-authored-by: Kashif Rasul <[email protected]> * Update trl/trainer/kto_trainer.py Co-authored-by: Kashif Rasul <[email protected]> * add more detailed comments * convert assert to ValueError * Update kto_trainer.py * precommit formatting * remove nans in metrics by gathering across machines * fix formatting * fix choice of mismatched examples for KL term * describe weights * fix hanging issue in distributed training 
* linting * move metrics to cpu * Update trl/trainer/kto_trainer.py Co-authored-by: lewtun <[email protected]> * Update trl/trainer/kto_trainer.py * Update trl/trainer/kto_trainer.py * remove kto_pair * speed up data processing * move bco code inside * raise error for kto_pair argument * fix formatting --------- Co-authored-by: Kashif Rasul <[email protected]> Co-authored-by: lewtun <[email protected]> Co-authored-by: Winnie Xu <[email protected]> commit 6105d03 Author: Younes Belkada <[email protected]> Date: Mon Jun 17 16:01:06 2024 +0200 `TrlParser`: Add ignore extra args option (#1748) * add ignore extra args option * Update trl/commands/cli_utils.py commit e247bbd Author: Younes Belkada <[email protected]> Date: Mon Jun 17 15:16:07 2024 +0200 CI / core: Pin `numpy` to `!=2.0.0` for CI and to users (#1747) * Update setup.py * Update setup.py * Update setup.py * Update test_best_of_n_sampler.py dummy commit * pin numpy * Update tests/test_best_of_n_sampler.py * Update setup.py commit 3d04496 Author: Michael <[email protected]> Date: Mon Jun 17 08:43:33 2024 -0400 better trl parser with yaml config (#1739) * working trl parser with config correctly overrides yaml config with command line arguments adds return_remaining_strings when return_remaining_strings is False, raises error if yaml contains extra args that are not in the dataclasses simpler and cleaner than previous yaml parsing and merging addresses #1733 * lowercase trlparser commit 2d244f8 Author: Younes Belkada <[email protected]> Date: Mon Jun 17 11:56:13 2024 +0200 Workflow: Notify tests results on slack channel (#1744) * Update tests-main.yml * Update docker-build.yml commit f5168fd Author: Igor Melnyk <[email protected]> Date: Wed Jun 12 05:54:54 2024 -0400 adds AOT (#1701) * adds AOT * Applied format changes * added docs and tests --------- Co-authored-by: Igor Melnyk <[email protected]> commit 79686e1 Author: jetlime <[email protected]> Date: Wed Jun 12 00:35:31 2024 +1000 ktotrainer: Refuse 
datasets which contain only one class of labels (#1724) * ktotrainer: refuse dataset which contain only one class of labels * ktotrainer: document new dataset constraint commit 34ebc4c Author: Luc Georges <[email protected]> Date: Mon Jun 10 11:17:54 2024 +0200 feat(ci): add trufflehog secrets detection (#1721) * feat(ci): add trufflehog secrets detection * fix(ci): remove unnecessary permissions commit 1d84e2b Author: Michael <[email protected]> Date: Fri Jun 7 11:42:08 2024 +0200 Fix default padding_value in dpo_config.py (#1692) dpo_config default padding value should be None, not 0, otherwise it by default overrides the padding value of any tokenizer to 0 commit 2f71b8b Author: Michael <[email protected]> Date: Fri Jun 7 10:37:27 2024 +0200 fix yaml parser for derived config classes (#1713) fixes #1712 reformatted cli_utils with ruff commit 5bcb8ad Author: Kashif Rasul <[email protected]> Date: Fri Jun 7 08:48:17 2024 +0100 RDPO fix nll loss (#1705) commit b8b972f Author: Haoran Xu <[email protected]> Date: Thu Jun 6 14:06:47 2024 -0700 Add a variant of CPO, SimPO (#1703) * add a variant of cpo: simpo * correct cpo-simpo loss * avoid 0 int error in logging * add simpo description * Update trl/trainer/cpo_trainer.py Co-authored-by: Kashif Rasul <[email protected]> * fix formatting * add test for simpo * Update docs/source/cpo_trainer.mdx Co-authored-by: Kashif Rasul <[email protected]> * add a docstring for simpogamma * move simpo description to the above docstring * change simpo description in the doc * formatting --------- Co-authored-by: Kashif Rasul <[email protected]> commit 3eb9ccb Author: Younes Belkada <[email protected]> Date: Thu Jun 6 19:33:20 2024 +0200 set dev version (#1710) * Update setup.py * Update __init__.py commit 974b0d3 Author: Costa Huang <[email protected]> Date: Thu Jun 6 10:13:00 2024 -0400 0.9.4 release (#1708) commit 39a7d1c Author: Younes Belkada <[email protected]> Date: Thu Jun 6 15:50:17 2024 +0200 SFTTrainer: Fix backward 
Compatibility issue with `TrainingArguments` (#1707) * fix BC * fixup commit 0bdc638 Author: Guilherme Freire <[email protected]> Date: Thu Jun 6 14:42:58 2024 +0100 Fixed doc string and docs for the SFTConfig update (#1706) commit 275d33b Author: Costa Huang <[email protected]> Date: Wed Jun 5 14:34:59 2024 -0400 0.9.3 release (#1699) commit c0819ee Author: Younes Belkada <[email protected]> Date: Wed Jun 5 17:29:03 2024 +0200 Update sft_trainer.py (#1698) commit a03e7cc Author: Costa Huang <[email protected]> Date: Wed Jun 5 11:00:19 2024 -0400 Release 0.9.2 (#1697) * Release: 0.9.0 * Release commit a13cb89 Author: Costa Huang <[email protected]> Date: Wed Jun 5 10:20:54 2024 -0400 Quick fix on GPT4-eval (#1696) * quick fix * precommit commit 84156f1 Author: Quentin Gallouédec <[email protected]> Date: Mon Jun 3 20:09:05 2024 +0200 Fix typo in DPOTrainer's warnings (#1688) commit 4eb0b90 Author: Alex Brooks <[email protected]> Date: Mon Jun 3 10:24:32 2024 -0600 Skip packing validation (#1673) * Add test for skipping preproc if packing=True Signed-off-by: Alex-Brooks <[email protected]> * Allow skipping of validation for packing=True Signed-off-by: Alex-Brooks <[email protected]> * Use dummy dataset in no packing preproc test Signed-off-by: Alex-Brooks <[email protected]> --------- Signed-off-by: Alex-Brooks <[email protected]> commit 6c203f9 Author: Alexey Rozhkov <[email protected]> Date: Mon Jun 3 10:16:22 2024 +0100 Fix overriding optimize_device_cache with optimize_cuda_cache in PPOConfig (#1690) * Don't override optimize_device_cache when optimize_cuda_cache is not provided Raise an exception when both optimize_cuda_cache and optimize_device_cache are set * Minor fix commit f18253b Author: Kashif Rasul <[email protected]> Date: Mon Jun 3 09:43:02 2024 +0100 intial RPO loss (#1686) * intial RPO loss * fix sign * clean up commit 151a452 Author: Samuel <[email protected]> Date: Wed May 29 20:29:38 2024 +0200 Fix max completion length (#1588) commit 488b502 
Author: Younes Belkada <[email protected]> Date: Wed May 29 20:19:26 2024 +0200 fix (#1678) commit 3c0a10b Author: Wang, Yi <[email protected]> Date: Mon May 27 20:52:20 2024 +0800 fix dataset load error (#1670) Signed-off-by: Wang, Yi <[email protected]> commit b031adf Author: Younes Belkada <[email protected]> Date: Fri May 24 15:20:16 2024 +0200 FIX / PPO: Fix `enable_input_require_grads` issues with PPO models (#1664) * Update modeling_base.py * Update ppo_config.py * Update ppo_trainer.py * style commit e7cb597 Author: Costa Huang <[email protected]> Date: Thu May 23 11:37:16 2024 -0400 Fix ppov2 test case (#1661) * Fix PPOv2 / RLOO refactor's stuff * update terminology to use stop token commit bc8dfbf Author: Kashif Rasul <[email protected]> Date: Thu May 23 15:28:04 2024 +0200 update eval_strategy (#1662) commit e4ed7a3 Author: Sourab Mangrulkar <[email protected]> Date: Thu May 23 18:34:22 2024 +0530 do not upcast adapters when using FSDP+QLoRA (#1654) commit 9a7efbd Author: syrn1k <[email protected]> Date: Thu May 23 15:58:49 2024 +0300 🤫 TR-DPO implementation (#1593) * 🤫 TR-DPO implementation baseline * fix comments * docs * fix linters * test added * move configs to DPOConfig * fix typo * add docs * fix import * use state.global_step * fix order of arguments * make sure plugins are not none * Update trl/trainer/utils.py Co-authored-by: Benjamin Bossan <[email protected]> * Update trl/trainer/utils.py Co-authored-by: Benjamin Bossan <[email protected]> * checking that reference model weights have changed * sync_target_model as staticmethod * set reference model --------- Co-authored-by: Nikita Surnachev <[email protected]> Co-authored-by: Kashif Rasul <[email protected]> Co-authored-by: Benjamin Bossan <[email protected]> commit b344bce Author: Anush Kini <[email protected]> Date: Thu May 23 18:27:25 2024 +0530 [DPO] Add 'robust' loss_type (#1653) * Initial commit * pre-commit fix * Minor change to comments * Added some documentation on how to use Robust 
DPO commit 35e12dc Author: Nicolinho <[email protected]> Date: Thu May 23 14:36:15 2024 +0200 Fix inheritance order in PPOv2Config (#1659) * fix inheritance order in PPOv2Config * fix inheritance order in rloo_config commit 1da6be1 Author: Ali Bakly <[email protected]> Date: Thu May 23 14:10:29 2024 +0200 docs: correct cDPO usage in DPOTrainer (#1655) commit e249cd8 Author: Younes Belkada <[email protected]> Date: Thu May 23 14:10:05 2024 +0200 add support for training collator (#1658) commit a02513c Author: Zach Mueller <[email protected]> Date: Thu May 23 06:48:00 2024 -0400 Apply deprecated `evaluation_strategy` (#1559) * Deprecate * Update tests/test_dpo_trainer.py --------- Co-authored-by: Kashif Rasul <[email protected]> commit 13454d2 Author: Costa Huang <[email protected]> Date: Wed May 22 08:31:10 2024 -0400 PPO / Reinforce Trainers (#1540) * Add ppov2 trainer * make eos trick optional, remove unused args * quick fix * precommit * update debugging script * fix out of bound `drop_last=True`; use built-in scheduler * Add PPO examples * push changes * quick change * quick change * various bug fixes * remove unnecessary grad accumulation setting * push new changes * fix DS3 model saving * update ppo.py * refactor * quick change * refactor * update ppo trainer * refactor * quick test * add ds2 /ds3 7 processes config * add vllm trainer * quick change * experiment with reward normalization * push changes * quick push * push changes * push various changes * refactor to use ModelConfig * quick change * refactor * refactor * Simplify DS logic * quick update * remove unnecessary files * precommit * deepspeed fix; handle edge case when eos_token_id = 0 * add PPO tldr example * add TL;DR example * fix undefined var * utilize all samples in rloo * quick setting * remove the unnecessary `value_model` * use exact_div * allow saving the deepspeed model * refactor * remove dead code * Use some shared utilities * add some end-to-end test cases * add PPOv2 docs and RLOO docs 
/ tests * update docs * quikc push * fix ci * fix type annotation for ci * quick update * update trainer docs commit 99f2c94 Author: Sourab Mangrulkar <[email protected]> Date: Wed May 15 19:55:46 2024 +0530 don't cast the trainable lora layers to half precision (#1644) * don't cast the trainable lora layers to half precision * quality commit 6401d08 Author: Wing Lian <[email protected]> Date: Tue May 14 09:41:07 2024 -0400 Pairwise Noise Contrastive Alignment (#1632) * add NCA paired preference loss * chore: lint * set more lenient tolerance for integration tests * Update tests/test_dpo_trainer.py * skip test * fix --------- Co-authored-by: Younes Belkada <[email protected]> Co-authored-by: younesbelkada <[email protected]> commit d632a5b Author: bartoszzuk <[email protected]> Date: Tue May 14 12:25:54 2024 +0200 Fixed wrong logs prefixes in KTOTrainer (#1641) * Fixed wrong logs prefixes in KTOTrainer * Pre-commit formating commit 5aeb752 Author: Tiezhen WANG <[email protected]> Date: Fri May 10 23:19:15 2024 +0800 Update sft_llama2.py to work with the latest API (#1637) * Update sft_llama2.py to work with the latest API SFTTrainer now takes a STFConfig argument * Update dpo_llama2.py * precommit commit b8b8978 Author: Ilya Gusev <[email protected]> Date: Fri May 10 15:43:13 2024 +0200 [ORPO] Correct label mask for pad tokens (#1625) * [ORPO] Correct label mask for pad tokens Recent [fix](57aebe9) for calculating NLL loss for a whole sequence introduced a bug. When input_ids are copied to labels, pad tokens are not masked. This PR aims to path this by masking labels based on the attention mask. 
* -100 -> label_pad_token_id Co-authored-by: Kashif Rasul <[email protected]> --------- Co-authored-by: Kashif Rasul <[email protected]> commit 8799952 Author: Costa Huang <[email protected]> Date: Fri May 10 09:32:20 2024 -0400 visualize rm prediction (#1636) * visualize rm prediction * quick update * quick check * quick fix * update eval steps commit 3b4c249 Author: Xiao Yu <[email protected]> Date: Fri May 3 18:19:35 2024 -0400 fixed adding bos and eos token unconditionally (#1591) * fixed adding bos and eos token unconditionally * fixed typo of tokenizer -> self.tokenizer. Also added update to ORPO * fixed code quality, and added BOS/EOS fix to KTO * code reformatting with pre-commit run --all-files * bug fix: check input id length before checking for EOS/BOS commit 0347f58 Author: lewtun <[email protected]> Date: Fri May 3 15:59:59 2024 +0200 Fix ZeRO-3 generation context manager (#1617) * judge refactoring and unittest * format * init * doc * format * improve doc * basejudge * improve doc and add BaseAPIJudge * Doc * style * refactor callback * remove openai and pairrm judge from test * doc * rm dpo online example * new prompts and completions * skip hf judge and add hf token --------- Co-authored-by: Quentin Gallouédec <[email protected]> Co-authored-by: Quentin Gallouédec <[email protected]>
    
yxliu-TAMU pushed a commit to mincheolseong/ECEN743-GRPO-Project-Proposal that referenced this pull request Apr 20, 2025
The current handling of `response_masks` inside the `batch_forward_pass` function does not take padding into account, which results in a shape mismatch during masking. Since the response mask is a mask tensor over the response tokens only, it should not be concatenated with `torch.zeros(query_length)`, and the masking operation should be done without slicing it. This change removes the concatenation and the slicing of the response mask, since the response mask already has length `end - start`, which is equal to the length of `masks[j, start:end]`.
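To make the shape argument concrete, here is a minimal sketch of the masking for one left-padded row. It is not the code from `batch_forward_pass`: plain Python lists stand in for the tensors so it runs without PyTorch, `apply_response_mask` is a hypothetical helper, and the `start`/`end` computation (query length minus one, plus the number of left-padding tokens) is an assumption about how the window is derived.

```python
def apply_response_mask(mask_row, start, end, response_mask):
    """Zero everything outside [start, end) and apply the response mask inside it.

    Hypothetical helper: `mask_row` stands in for one row of the `masks` tensor.
    """
    window = end - start
    if len(response_mask) != window:
        raise ValueError(
            f"response mask has {len(response_mask)} entries, expected {window}"
        )
    out = list(mask_row)
    for i in range(len(out)):
        if i < start or i >= end:
            out[i] = 0
    # The response mask is applied as-is: no concatenation, no slicing.
    for offset, keep in enumerate(response_mask):
        out[start + offset] *= keep
    return out


# One left-padded row: 2 pad tokens, 3 query tokens, 4 response tokens.
attention_mask = [0, 0, 1, 1, 1, 1, 1, 1, 1]
query_len, response_len = 3, 4
response_mask = [1, 0, 1, 1]

# Log-probs are shifted by one token, so the mask row is one element shorter.
masks = attention_mask[1:]
pad = attention_mask.index(1)       # number of left-padding tokens
start = query_len - 1 + pad         # assumed window start, offset by padding
end = start + response_len

fixed = apply_response_mask(masks, start, end, response_mask)

# Old approach: prepend zeros for the query, drop one element, then slice.
# With padding, the slice falls off the end and no longer matches the window.
old = ([0] * query_len + response_mask)[1:][start:end]
```

In this sketch the window `masks[start:end]` has four elements and so does `response_mask`, while the concatenate-then-slice variant yields only two, which is the kind of mismatch the description refers to.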
    
yxliu-TAMU pushed a commit to mincheolseong/ECEN743-GRPO-Project-Proposal that referenced this pull request Apr 20, 2025
* Add WinRateCallback * Enable PairRM * Refactor * Streamline * Add HF judge * Add base judge * Use better prompt * Clean * Add max tokens * Use logging * Add batched inference * Squashed commit of the following: commit b873095 Author: Maxim Kopecki <[email protected]> Date: Wed Jul 10 19:11:13 2024 +0200 Added missing token kwarg in Peft model loading (huggingface#1825) commit 47875f7 Author: Quentin Gallouédec <[email protected]> Date: Wed Jul 10 18:26:11 2024 +0200 Make use of `trust_remote_code` consistent (huggingface#1806) Co-authored-by: Quentin Gallouédec <[email protected]> commit 1bb1757 Author: Adnan Khan <[email protected]> Date: Wed Jul 10 11:25:07 2024 -0400 Delete unused benchmark.yml workflow. (huggingface#1822) commit 064fcd7 Author: Quentin Gallouédec <[email protected]> Date: Tue Jul 9 09:22:52 2024 +0200 DPO Llava 1.5 and PaliGemma support (huggingface#1797) * llava support dpo * add_special_tokens=False only when possible * format * pali gemma * refactor size * remove image resize --------- Co-authored-by: Quentin Gallouédec <[email protected]> commit 17fe36b Author: Quentin Gallouédec <[email protected]> Date: Tue Jul 9 05:37:12 2024 +0200 upgrade gh actions (huggingface#1818) Co-authored-by: Quentin Gallouédec <[email protected]> commit 8a6ce91 Author: Costa Huang <[email protected]> Date: Mon Jul 8 11:12:41 2024 -0400 Set dev version (huggingface#1817) commit 6d1375b Author: Puneet Singh Bhooi <[email protected]> Date: Mon Jul 8 19:11:36 2024 +0530 fix broken url in `docs\source\index.mdx` (huggingface#1813) commit bd2a875 Author: Costa Huang <[email protected]> Date: Mon Jul 8 09:38:09 2024 -0400 0.9.6 release (huggingface#1816) commit 110657e Author: Alvaro Bartolome <[email protected]> Date: Mon Jul 8 01:59:26 2024 +0200 Fix `TRL_USE_RICH` environment variable handling (huggingface#1808) * Add `strtobool` custom implementation from `distutils` * Fix `TRL_USE_RICH` handling via `strtobool` * Run `make precommit` commit 18ec3e9 Author: 
Alvaro Bartolome <[email protected]> Date: Fri Jul 5 16:28:59 2024 +0200 Fix `torch_dtype` handling in `{DPO,SFT}Trainer` when provided via CLI (huggingface#1807) * Fix `torch_dtype` handling through CLI The `torch_dtype` is not properly handled when provided via the TRL CLI since it's provided initially as a string, but is then casted to `torch.dtype` before providing it to the `{DPO,SFT}Trainer`, which means that those trainers should handle the scenario where `torch_dtype` is a `torch.dtype` too. * Add `torch_dtype` tests in `test_{dpo,sft}_trainer.py` * Forward contribution credits * Run `make precommit` --------- Co-authored-by: Tash Srivastava <[email protected]> commit 5e0ca32 Author: Michael <[email protected]> Date: Fri Jul 5 07:29:48 2024 -0400 Remove extra print in reward_trainer.py (huggingface#1799) `print_rich_table` is called twice and the first call doesn't restrict to `num_print_samples`. Remove the first, extra call commit 044c347 Author: Quentin Gallouédec <[email protected]> Date: Thu Jul 4 14:29:25 2024 +0200 clean examples (huggingface#1791) Co-authored-by: Quentin Gallouédec <[email protected]> commit a083f95 Author: Noah Tye <[email protected]> Date: Wed Jul 3 11:10:50 2024 -0700 Bugfix: Preserve token fields when converting TrainingArguments to SFTConfig (huggingface#1794) * Preserve token fields when converting TrainingArguments to SFTConfig TrainingArguments.to_dict() redacts token fields, so we have to individually copy them over when converting to SFTConfig to avoid breaking push_to_hub functionality. Also adds a test. 
* run precommit * one-line args_as_dict definition per suggestion from kashif * generalize token copying to match TrainingArguments behavior * unwrap |= on dict, to support python 3.8 * use .update instead of |= or for-loop commit f0dadbb Author: Kashif Rasul <[email protected]> Date: Wed Jul 3 08:29:16 2024 +0200 add model_init_kwargs to training_args (huggingface#1787) commit 549a0cb Author: Tommaso Buonocore <[email protected]> Date: Sat Jun 29 15:35:48 2024 +0200 Fixed typo in SFT trainer docs (huggingface#1788) 'STFConfig' instead of 'SFTConfig' appears multiple times in the doc, causing error when running the code snippets. commit 34178f5 Author: Kashif Rasul <[email protected]> Date: Thu Jun 27 15:47:58 2024 +0200 fix docs and examples (huggingface#1780) commit 5bedb64 Author: Quentin Gallouédec <[email protected]> Date: Wed Jun 26 16:26:37 2024 +0200 Visual DPO (huggingface#1647) * Remove extra whitespaces * idefics * vdpo * sft idefics * pad with test * use prompt instead of tokenizer * rm name main * support vlm in tokenize row * temp fix for regex in lora_target_module * format * vdpo * tmp float16 hard code * concatenated_forward support for vision * style and new command line * all-linear * format * delete old examples * get image * upcast * new test * modified test * new strat for tokenizer * rm token transfer * integrate vision in dpo example * format * add FDivergenceType back * precommit * pillow test dep * optional prompt * `evaluation_strategy` to `eval_strategy` * revert vsft change (oos) * update test * test * comment and support more in process * update process * update doc for vdpo * caution about limited support * Update docs/source/dpo_trainer.mdx Co-authored-by: Kashif Rasul <[email protected]> * revert DPO example changes * cleaner way to check if a model is vision * comment * update vdpo example * rename --------- Co-authored-by: Quentin Gallouédec <[email protected]> Co-authored-by: Kashif Rasul <[email protected]> commit 4de3b68 
Author: Mubin Manasia <[email protected]> Date: Wed Jun 26 03:23:36 2024 -0600 Fix Documentation Overflow Issues for Long URLs in SFTConfig (huggingface#1774) * Update sft_config.py * Update sft_config.py commit dce7a70 Author: Costa Huang <[email protected]> Date: Wed Jun 26 03:18:22 2024 -0400 Remove the leading space in the tldr preference dataset (huggingface#1773) commit 178ce05 Author: Haozhe Ji <[email protected]> Date: Tue Jun 25 22:47:32 2024 +0800 add Efficient Exact Optimization (EXO) (huggingface#1735) * add exo * fix a detail * Update trl/trainer/dpo_trainer.py * Update trl/trainer/dpo_trainer.py * Update trl/trainer/dpo_trainer.py --------- Co-authored-by: Kashif Rasul <[email protected]> commit 98d7384 Author: Quentin Gallouédec <[email protected]> Date: Tue Jun 25 16:14:26 2024 +0200 `evaluation_strategy` to `eval_strategy` (huggingface#1771) Co-authored-by: Quentin Gallouédec <[email protected]> commit 42093ce Author: Clara Pohland <[email protected]> Date: Mon Jun 24 21:27:00 2024 +0200 MoE Models: option to add load balancing loss (huggingface#1765) * KTO: add aux loss * use router_aux_loss_coef in KtoTrainer when aux_loss enabled * align optional aux_loss in DPO, KTO, CPO, ORPO * precommit changes * fix KL forward kwargs * add aux_loss doku entry * apply docs suggestions --------- Co-authored-by: Clara Luise Pohland <[email protected]> commit 8a9776c Author: Mihir Prabhudesai <[email protected]> Date: Mon Jun 24 12:05:44 2024 -0400 Added Reward Backpropogation Support (huggingface#1585) * added alignprop template * added alignprop support * Update alignprop_trainer.mdx * Update alignprop_trainer.mdx * added better why statement * fixed inference code * changed self to pipeline * removed aesthetic classifier * added aesthetic to auxiliary models * added unseen prompt logging * removed unseen prompt log * fixed minor * remove not needed import in trl/__init__.py Co-authored-by: Younes Belkada <[email protected]> * fixed styling * updated _toctree 
--------- Co-authored-by: Younes Belkada <[email protected]> commit e9d972d Author: Haoran Xu <[email protected]> Date: Sun Jun 23 09:54:30 2024 -0700 Add CPO-SimPO method (huggingface#1760) * enable cpo-simpo * highlight SimPO and CPO-SimPO * add test for cpo_alpha * formatting * Update docs/source/cpo_trainer.mdx --------- Co-authored-by: Kashif Rasul <[email protected]> commit cfc9c3b Author: Costa Huang <[email protected]> Date: Fri Jun 21 11:20:54 2024 -0400 New sentiment and descriptiveness dataset (huggingface#1757) * push changes * handle edge cases where the chosen and the rejected are the same commit 9810adf Author: Juyoung Suk <[email protected]> Date: Fri Jun 21 18:01:08 2024 +0900 Add dataset_text_field in examples/scripts/sft.py (huggingface#1758) commit 3906b04 Author: Costa Huang <[email protected]> Date: Thu Jun 20 13:16:43 2024 -0400 Support num_train_epochs (huggingface#1743) * add a test case for num_train_epochs * fix ci * quick change * disable push to hub * debug windows ci * try another fix * skip subprocess tests on windows commit bf10bc2 Author: Mert Sayar <[email protected]> Date: Thu Jun 20 18:22:20 2024 +0300 Fix masking of response tokens (huggingface#1718) Current handling of `response_masks` inside `batch_forward_pass` function does not take padding into consideration which results with shape unmatch during masking. Since response mask is a mask tensor of response tokens, response tokens should not be concatenated with a `torch.zeros(query_length)` and masking operation should be done without slicing. Remove the concatenation of the response mask, remove the slicing from the response mask since response mask already has the length of `end - start + 1`, which is equal to length of `masks[j, start:end]`. 
commit d5ddb70 Author: idanshen <[email protected]> Date: Thu Jun 20 09:14:16 2024 -0400 Support for returning past_key_values from the model (huggingface#1742) * add support for returning past_key_values from the model * change order of keys commit 7a21e01 Author: 1485840691 <[email protected]> Date: Wed Jun 19 18:02:51 2024 +0800 Integrate f-divergence to DPO (Follow up) (huggingface#1610) * Step 1: update ppo_trainer and hello_world example * Step 2: Refine comments and add parameter type * Step 2: Add missing parameter comments * Step 1: Organize ptx loss into a function and add ptx_loss to train_stats * Step 1 updates: add comment to ptx_loss function, fix a bug and add warning message * Step 2: 1) Add ppo_ptx trainig example as ppo; 2) separate pretrain data fetch and iterate * Step 2: Remove loss from columns_to_log in ppo_ptx example * Remove data set revision in load imbd dataset * Run pre-commit and fix format issues * Initial draft of f-divergence fn * Update f-divergence to avoid overflow * fix test errors and comments * Add Unit tests for dpo loss with alpha and js div f * Adjust format * Fix test error * Reverse this update * Add test cases * Reverse un-needed updates * Update code style * Try to fix code fmt error * remove extra end line --------- Co-authored-by: Kashif Rasul <[email protected]> commit 80c3fd7 Author: Shihyueh Hsu <[email protected]> Date: Tue Jun 18 22:07:24 2024 +0800 change the `process` function in the example of DPO (huggingface#1753) * change the `process` function in the example of DPO * fix commit 4d4c766 Author: Younes Belkada <[email protected]> Date: Tue Jun 18 11:31:17 2024 +0200 CI / `KTOTrainer`: Remove old tests (huggingface#1750) * remove old tests * remove datasets * Update test_dpo_trainer.py * Update test_dpo_trainer.py commit cb52f7f Author: Michael <[email protected]> Date: Mon Jun 17 10:50:21 2024 -0400 prepare deepspeed accomodate fp16 and bf16 (huggingface#1728) * prepare deepspeed accomodate fp16 and bf16 * 
precommit commit 8f104f2 Author: Younes Belkada <[email protected]> Date: Mon Jun 17 16:49:00 2024 +0200 CPO / DPO: Fix red CI (huggingface#1749) * fix red CI * precommit commit 9574be9 Author: Younes Belkada <[email protected]> Date: Mon Jun 17 16:41:36 2024 +0200 fix red CI commit 407bb9f Author: Kawin <[email protected]> Date: Mon Jun 17 07:14:44 2024 -0700 small KTO fixes (huggingface#1734) * add warning for imbalanced data * update documentation * update script commands to be same as in dpo * use batch_size KL examples and batch_size target examples to calculate batch_size losses * fix deepspeed issue * speed up forward with no_grad for KL * add some removed metrics * Update trl/trainer/kto_trainer.py * Update trl/trainer/kto_trainer.py * Update trl/trainer/kto_trainer.py add reference to paper Co-authored-by: lewtun <[email protected]> * Update trl/trainer/kto_trainer.py Co-authored-by: Kashif Rasul <[email protected]> * Update trl/trainer/kto_trainer.py Co-authored-by: Kashif Rasul <[email protected]> * Update trl/trainer/kto_trainer.py Co-authored-by: Kashif Rasul <[email protected]> * Update trl/trainer/kto_trainer.py Co-authored-by: Kashif Rasul <[email protected]> * Update trl/trainer/kto_trainer.py Co-authored-by: Kashif Rasul <[email protected]> * Update trl/trainer/kto_trainer.py Co-authored-by: Kashif Rasul <[email protected]> * Update trl/trainer/kto_trainer.py Co-authored-by: Kashif Rasul <[email protected]> * Update trl/trainer/kto_trainer.py Co-authored-by: Kashif Rasul <[email protected]> * Update trl/trainer/kto_trainer.py Co-authored-by: Kashif Rasul <[email protected]> * Update trl/trainer/kto_trainer.py Co-authored-by: Kashif Rasul <[email protected]> * Update trl/trainer/kto_trainer.py Co-authored-by: Kashif Rasul <[email protected]> * add more detailed comments * convert assert to ValueError * Update kto_trainer.py * precommit formatting * remove nans in metrics by gathering across machines * fix formatting * fix choice of mismatched 
examples for KL term * describe weights * fix hanging issue in distributed training * linting * move metrics to cpu * Update trl/trainer/kto_trainer.py Co-authored-by: lewtun <[email protected]> * Update trl/trainer/kto_trainer.py * Update trl/trainer/kto_trainer.py * remove kto_pair * speed up data processing * move bco code inside * raise error for kto_pair argument * fix formatting --------- Co-authored-by: Kashif Rasul <[email protected]> Co-authored-by: lewtun <[email protected]> Co-authored-by: Winnie Xu <[email protected]> commit 5c4d5c9 Author: Younes Belkada <[email protected]> Date: Mon Jun 17 16:01:06 2024 +0200 `TrlParser`: Add ignore extra args option (huggingface#1748) * add ignore extra args option * Update trl/commands/cli_utils.py commit 85f2ff6 Author: Younes Belkada <[email protected]> Date: Mon Jun 17 15:16:07 2024 +0200 CI / core: Pin `numpy` to `!=2.0.0` for CI and to users (huggingface#1747) * Update setup.py * Update setup.py * Update setup.py * Update test_best_of_n_sampler.py dummy commit * pin numpy * Update tests/test_best_of_n_sampler.py * Update setup.py commit a17b400 Author: Michael <[email protected]> Date: Mon Jun 17 08:43:33 2024 -0400 better trl parser with yaml config (huggingface#1739) * working trl parser with config correctly overrides yaml config with command line arguments adds return_remaining_strings when return_remaining_strings is False, raises error if yaml contains extra args that are not in the dataclasses simpler and cleaner than previous yaml parsing and merging addresses huggingface#1733 * lowercase trlparser commit ced4a81 Author: Younes Belkada <[email protected]> Date: Mon Jun 17 11:56:13 2024 +0200 Workflow: Notify tests results on slack channel (huggingface#1744) * Update tests-main.yml * Update docker-build.yml commit c47241c Author: Igor Melnyk <[email protected]> Date: Wed Jun 12 05:54:54 2024 -0400 adds AOT (huggingface#1701) * adds AOT * Applied format changes * added docs and tests --------- 
Co-authored-by: Igor Melnyk <[email protected]> commit 7b118ac Author: jetlime <[email protected]> Date: Wed Jun 12 00:35:31 2024 +1000 ktotrainer: Refuse datasets which contain only one class of labels (huggingface#1724) * ktotrainer: refuse dataset which contain only one class of labels * ktotrainer: document new dataset constraint commit 89b9597 Author: Luc Georges <[email protected]> Date: Mon Jun 10 11:17:54 2024 +0200 feat(ci): add trufflehog secrets detection (huggingface#1721) * feat(ci): add trufflehog secrets detection * fix(ci): remove unnecessary permissions commit 8d2e1fc Author: Michael <[email protected]> Date: Fri Jun 7 11:42:08 2024 +0200 Fix default padding_value in dpo_config.py (huggingface#1692) dpo_config default padding value should be None, not 0, otherwise it by default overrides the padding value of any tokenizer to 0 commit e587593 Author: Michael <[email protected]> Date: Fri Jun 7 10:37:27 2024 +0200 fix yaml parser for derived config classes (huggingface#1713) fixes huggingface#1712 reformatted cli_utils with ruff commit 68fae5d Author: Kashif Rasul <[email protected]> Date: Fri Jun 7 08:48:17 2024 +0100 RDPO fix nll loss (huggingface#1705) commit 7b14b96 Author: Haoran Xu <[email protected]> Date: Thu Jun 6 14:06:47 2024 -0700 Add a variant of CPO, SimPO (huggingface#1703) * add a variant of cpo: simpo * correct cpo-simpo loss * avoid 0 int error in logging * add simpo description * Update trl/trainer/cpo_trainer.py Co-authored-by: Kashif Rasul <[email protected]> * fix formatting * add test for simpo * Update docs/source/cpo_trainer.mdx Co-authored-by: Kashif Rasul <[email protected]> * add a docstring for simpogamma * move simpo description to the above docstring * change simpo description in the doc * formatting --------- Co-authored-by: Kashif Rasul <[email protected]> commit d0aa871 Author: Younes Belkada <[email protected]> Date: Thu Jun 6 19:33:20 2024 +0200 set dev version (huggingface#1710) * Update setup.py * Update 
__init__.py commit cdbd45c Author: Costa Huang <[email protected]> Date: Thu Jun 6 10:13:00 2024 -0400 0.9.4 release (huggingface#1708) commit e5922e5 Author: Younes Belkada <[email protected]> Date: Thu Jun 6 15:50:17 2024 +0200 SFTTrainer: Fix backward Compatibility issue with `TrainingArguments` (huggingface#1707) * fix BC * fixup commit 2a8acb7 Author: Guilherme Freire <[email protected]> Date: Thu Jun 6 14:42:58 2024 +0100 Fixed doc string and docs for the SFTConfig update (huggingface#1706) commit c693a68 Author: Costa Huang <[email protected]> Date: Wed Jun 5 14:34:59 2024 -0400 0.9.3 release (huggingface#1699) commit 8d38e15 Author: Younes Belkada <[email protected]> Date: Wed Jun 5 17:29:03 2024 +0200 Update sft_trainer.py (huggingface#1698) commit 4eddcbd Author: Costa Huang <[email protected]> Date: Wed Jun 5 11:00:19 2024 -0400 Release 0.9.2 (huggingface#1697) * Release: 0.9.0 * Release commit e2feefd Author: Costa Huang <[email protected]> Date: Wed Jun 5 10:20:54 2024 -0400 Quick fix on GPT4-eval (huggingface#1696) * quick fix * precommit commit 42d1e6a Author: Quentin Gallouédec <[email protected]> Date: Mon Jun 3 20:09:05 2024 +0200 Fix typo in DPOTrainer's warnings (huggingface#1688) commit 936ff5b Author: Alex Brooks <[email protected]> Date: Mon Jun 3 10:24:32 2024 -0600 Skip packing validation (huggingface#1673) * Add test for skipping preproc if packing=True Signed-off-by: Alex-Brooks <[email protected]> * Allow skipping of validation for packing=True Signed-off-by: Alex-Brooks <[email protected]> * Use dummy dataset in no packing preproc test Signed-off-by: Alex-Brooks <[email protected]> --------- Signed-off-by: Alex-Brooks <[email protected]> commit c24d3d9 Author: Alexey Rozhkov <[email protected]> Date: Mon Jun 3 10:16:22 2024 +0100 Fix overriding optimize_device_cache with optimize_cuda_cache in PPOConfig (huggingface#1690) * Don't override optimize_device_cache when optimize_cuda_cache is not provided Raise an exception when both 
optimize_cuda_cache and optimize_device_cache are set * Minor fix commit 24a67c3 Author: Kashif Rasul <[email protected]> Date: Mon Jun 3 09:43:02 2024 +0100 intial RPO loss (huggingface#1686) * intial RPO loss * fix sign * clean up commit d41594e Author: Samuel <[email protected]> Date: Wed May 29 20:29:38 2024 +0200 Fix max completion length (huggingface#1588) commit 76f780a Author: Younes Belkada <[email protected]> Date: Wed May 29 20:19:26 2024 +0200 fix (huggingface#1678) commit 2010aa7 Author: Wang, Yi <[email protected]> Date: Mon May 27 20:52:20 2024 +0800 fix dataset load error (huggingface#1670) Signed-off-by: Wang, Yi <[email protected]> commit 4cb097e Author: Younes Belkada <[email protected]> Date: Fri May 24 15:20:16 2024 +0200 FIX / PPO: Fix `enable_input_require_grads` issues with PPO models (huggingface#1664) * Update modeling_base.py * Update ppo_config.py * Update ppo_trainer.py * style commit af2eb09 Author: Costa Huang <[email protected]> Date: Thu May 23 11:37:16 2024 -0400 Fix ppov2 test case (huggingface#1661) * Fix PPOv2 / RLOO refactor's stuff * update terminology to use stop token commit a0a4335 Author: Kashif Rasul <[email protected]> Date: Thu May 23 15:28:04 2024 +0200 update eval_strategy (huggingface#1662) commit 21aea35 Author: Sourab Mangrulkar <[email protected]> Date: Thu May 23 18:34:22 2024 +0530 do not upcast adapters when using FSDP+QLoRA (huggingface#1654) commit 3a8a1b1 Author: syrn1k <[email protected]> Date: Thu May 23 15:58:49 2024 +0300 🤫 TR-DPO implementation (huggingface#1593) * 🤫 TR-DPO implementation baseline * fix comments * docs * fix linters * test added * move configs to DPOConfig * fix typo * add docs * fix import * use state.global_step * fix order of arguments * make sure plugins are not none * Update trl/trainer/utils.py Co-authored-by: Benjamin Bossan <[email protected]> * Update trl/trainer/utils.py Co-authored-by: Benjamin Bossan <[email protected]> * checking that reference model weights have changed * 
sync_target_model as staticmethod * set reference model --------- Co-authored-by: Nikita Surnachev <[email protected]> Co-authored-by: Kashif Rasul <[email protected]> Co-authored-by: Benjamin Bossan <[email protected]> commit 3a2b879 Author: Anush Kini <[email protected]> Date: Thu May 23 18:27:25 2024 +0530 [DPO] Add 'robust' loss_type (huggingface#1653) * Initial commit * pre-commit fix * Minor change to comments * Added some documentation on how to use Robust DPO commit cc386f4 Author: Nicolinho <[email protected]> Date: Thu May 23 14:36:15 2024 +0200 Fix inheritance order in PPOv2Config (huggingface#1659) * fix inheritance order in PPOv2Config * fix inheritance order in rloo_config commit ff876cb Author: Ali Bakly <[email protected]> Date: Thu May 23 14:10:29 2024 +0200 docs: correct cDPO usage in DPOTrainer (huggingface#1655) commit 68ffd4e Author: Younes Belkada <[email protected]> Date: Thu May 23 14:10:05 2024 +0200 add support for training collator (huggingface#1658) commit b345750 Author: Zach Mueller <[email protected]> Date: Thu May 23 06:48:00 2024 -0400 Apply deprecated `evaluation_strategy` (huggingface#1559) * Deprecate * Update tests/test_dpo_trainer.py --------- Co-authored-by: Kashif Rasul <[email protected]> commit a5df7bc Author: Costa Huang <[email protected]> Date: Wed May 22 08:31:10 2024 -0400 PPO / Reinforce Trainers (huggingface#1540) * Add ppov2 trainer * make eos trick optional, remove unused args * quick fix * precommit * update debugging script * fix out of bound `drop_last=True`; use built-in scheduler * Add PPO examples * push changes * quick change * quick change * various bug fixes * remove unnecessary grad accumulation setting * push new changes * fix DS3 model saving * update ppo.py * refactor * quick change * refactor * update ppo trainer * refactor * quick test * add ds2 /ds3 7 processes config * add vllm trainer * quick change * experiment with reward normalization * push changes * quick push * push changes * push various 
changes * refactor to use ModelConfig * quick change * refactor * refactor * Simplify DS logic * quick update * remove unnecessary files * precommit * deepspeed fix; handle edge case when eos_token_id = 0 * add PPO tldr example * add TL;DR example * fix undefined var * utilize all samples in rloo * quick setting * remove the unnecessary `value_model` * use exact_div * allow saving the deepspeed model * refactor * remove dead code * Use some shared utilities * add some end-to-end test cases * add PPOv2 docs and RLOO docs / tests * update docs * quikc push * fix ci * fix type annotation for ci * quick update * update trainer docs commit 9958193 Author: Sourab Mangrulkar <[email protected]> Date: Wed May 15 19:55:46 2024 +0530 don't cast the trainable lora layers to half precision (huggingface#1644) * don't cast the trainable lora layers to half precision * quality commit d7cdd28 Author: Wing Lian <[email protected]> Date: Tue May 14 09:41:07 2024 -0400 Pairwise Noise Contrastive Alignment (huggingface#1632) * add NCA paired preference loss * chore: lint * set more lenient tolerance for integration tests * Update tests/test_dpo_trainer.py * skip test * fix --------- Co-authored-by: Younes Belkada <[email protected]> Co-authored-by: younesbelkada <[email protected]> commit a447195 Author: bartoszzuk <[email protected]> Date: Tue May 14 12:25:54 2024 +0200 Fixed wrong logs prefixes in KTOTrainer (huggingface#1641) * Fixed wrong logs prefixes in KTOTrainer * Pre-commit formating commit d6652a4 Author: Tiezhen WANG <[email protected]> Date: Fri May 10 23:19:15 2024 +0800 Update sft_llama2.py to work with the latest API (huggingface#1637) * Update sft_llama2.py to work with the latest API SFTTrainer now takes a STFConfig argument * Update dpo_llama2.py * precommit commit fb0e17e Author: Ilya Gusev <[email protected]> Date: Fri May 10 15:43:13 2024 +0200 [ORPO] Correct label mask for pad tokens (huggingface#1625) * [ORPO] Correct label mask for pad tokens Recent 
[fix](huggingface@512cee3) for calculating NLL loss for a whole sequence introduced a bug. When input_ids are copied to labels, pad tokens are not masked. This PR aims to path this by masking labels based on the attention mask. * -100 -> label_pad_token_id Co-authored-by: Kashif Rasul <[email protected]> --------- Co-authored-by: Kashif Rasul <[email protected]> commit 33d32c6 Author: Costa Huang <[email protected]> Date: Fri May 10 09:32:20 2024 -0400 visualize rm prediction (huggingface#1636) * visualize rm prediction * quick update * quick check * quick fix * update eval steps commit f41f309 Author: Xiao Yu <[email protected]> Date: Fri May 3 18:19:35 2024 -0400 fixed adding bos and eos token unconditionally (huggingface#1591) * fixed adding bos and eos token unconditionally * fixed typo of tokenizer -> self.tokenizer. Also added update to ORPO * fixed code quality, and added BOS/EOS fix to KTO * code reformatting with pre-commit run --all-files * bug fix: check input id length before checking for EOS/BOS commit b106f82 Author: lewtun <[email protected]> Date: Fri May 3 15:59:59 2024 +0200 Fix ZeRO-3 generation context manager (huggingface#1617) * judge refactoring and unittest * format * init * doc * format * improve doc * basejudge * improve doc and add BaseAPIJudge * Doc * style * refactor callback * remove openai and pairrm judge from test * doc * rm dpo online example * new prompts and completions * skip hf judge and add hf token --------- Co-authored-by: Quentin Gallouédec <[email protected]> Co-authored-by: Quentin Gallouédec <[email protected]>
  
  
    
  
    
The current handling of `response_masks` inside the `batch_forward_pass` function does not take padding into account, which results in a shape mismatch during masking, as stated in #1717. The response mask can be used directly for the masking operation, without preprocessing such as concatenation.

Remove the concatenation of the response mask, and remove the slicing from the response mask, since the response mask already has length `end - start`, which is equal to the length of `masks[j, start:end]`.

Update the docstring for the `masks` parameter under `_step_safety_checker`, which is inconsistent with the docstring under the `step` function.

Fixes #1717
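
To make the length argument concrete, here is a minimal sketch of the per-sample masking step. It is not the code in `batch_forward_pass`: plain Python lists stand in for tensors, the function names `mask_response_tokens` and `legacy_response_slice` are hypothetical, and the left-padding offset added to `start` is an assumption about how padded inputs are indexed. The legacy helper is a simplified reading of the old behavior described above (prepend `query_length` zeros, then slice with `start:end`).

```python
def mask_response_tokens(attention_mask, query_len, response_len, response_mask):
    """Hypothetical stand-in for the masking of one sample j."""
    # Entry i of the mask refers to predicting token i + 1, so the mask is
    # one element shorter than the input sequence.
    masks = list(attention_mask[1:])

    # Assumption: left padding shifts the response span by the pad count.
    pad = 0
    while pad < len(attention_mask) and attention_mask[pad] == 0:
        pad += 1
    start = query_len - 1 + pad
    end = start + response_len

    # Keep only positions inside the response span.
    masks = [m if start <= i < end else 0 for i, m in enumerate(masks)]

    # The response mask has one entry per response token, i.e. end - start
    # entries, so it lines up with masks[start:end] as it is.
    if len(response_mask) != end - start:
        raise ValueError("response_mask must have one entry per response token")
    masks[start:end] = [m * r for m, r in zip(masks[start:end], response_mask)]
    return masks


def legacy_response_slice(query_len, response_mask, start, end):
    """Simplified reading of the old approach: prepend zeros, then slice."""
    padded = [0] * query_len + list(response_mask)
    return padded[start:end]


# Two left-pad tokens, a 3-token query and a 2-token response.
attention_mask = [0, 0, 1, 1, 1, 1, 1]
print(mask_response_tokens(attention_mask, 3, 2, [1, 0]))  # [0, 0, 0, 0, 1, 0]

# With the pad-shifted indices (start=4, end=6) the legacy slice is too short.
print(len(legacy_response_slice(3, [1, 0], 4, 6)))  # 1
```

In this sketch the span `masks[start:end]` has two entries while the legacy slice has only one, which is the kind of mismatch the description attributes to padding; using the response mask directly avoids it.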