diff --git a/docs/source/en/_toctree.yml b/docs/source/en/_toctree.yml
index d95e553bd39a..b8aa71dacbe2 100644
--- a/docs/source/en/_toctree.yml
+++ b/docs/source/en/_toctree.yml
@@ -106,6 +106,8 @@
     title: Custom Diffusion
   - local: training/t2i_adapters
     title: T2I-Adapters
+  - local: training/ddpo
+    title: Reinforcement learning training with DDPO
   title: Training
 - sections:
   - local: using-diffusers/other-modalities
diff --git a/docs/source/en/training/ddpo.md b/docs/source/en/training/ddpo.md
new file mode 100644
index 000000000000..1ec961dfdd04
--- /dev/null
+++ b/docs/source/en/training/ddpo.md
@@ -0,0 +1,17 @@
+
+
+# Reinforcement learning training with DDPO
+
+You can fine-tune Stable Diffusion on a reward function via reinforcement learning with the 🤗 TRL library and 🤗 Diffusers. This is done with the Denoising Diffusion Policy Optimization (DDPO) algorithm introduced by Black et al. in [Training Diffusion Models with Reinforcement Learning](https://arxiv.org/abs/2305.13301), which is implemented in 🤗 TRL with the [`~trl.DDPOTrainer`].
+
+For more information, check out the [`~trl.DDPOTrainer`] API reference and the [Finetune Stable Diffusion Models with DDPO via TRL](https://huggingface.co/blog/trl-ddpo) blog post.
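+
+Below is a minimal sketch of what launching DDPO training with [`~trl.DDPOTrainer`] might look like. The toy brightness reward, the example prompts, and the specific `DDPOConfig` values are illustrative placeholders rather than recommended settings; see the TRL documentation for the full API.
+
+```python
+# DDPO training sketch (assumes TRL is installed with the diffusers extra: `pip install trl[diffusers]`)
+import random
+
+from trl import DDPOConfig, DDPOTrainer, DefaultDDPOStableDiffusionPipeline
+
+# Wrap a Stable Diffusion checkpoint for DDPO; LoRA keeps the number of trainable parameters small
+pipeline = DefaultDDPOStableDiffusionPipeline("runwayml/stable-diffusion-v1-5", use_lora=True)
+
+# Illustrative hyperparameters only; tune them for your hardware and reward
+config = DDPOConfig(
+    num_epochs=10,
+    sample_num_steps=50,
+    sample_batch_size=2,
+    train_batch_size=1,
+    mixed_precision="fp16",
+)
+
+prompts = ["a photo of a cat", "a photo of a dog", "a photo of a rabbit"]
+
+def prompt_fn():
+    # Called once per sample; returns a (prompt, prompt_metadata) pair
+    return random.choice(prompts), {}
+
+def reward_fn(images, prompts, metadata):
+    # Toy reward that favors brighter images; swap in a real scorer (e.g. an aesthetic model)
+    rewards = images.mean(dim=(1, 2, 3))
+    return rewards, {}
+
+trainer = DDPOTrainer(config, reward_fn, prompt_fn, pipeline)
+trainer.train()
+```
+
+The reward function receives a batch of generated images along with the prompts and metadata used to produce them and returns one scalar reward per image; DDPO then optimizes the denoising process to increase that reward.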