In #5238, the [Min-SNR](https://openaccess.thecvf.com/content/ICCV2023/papers/Hang_Efficient_Diffusion_Training_via_Min-SNR_Weighting_Strategy_ICCV_2023_paper.pdf) weight for v-prediction is implemented as $$w_t = \frac{\min(\text{SNR}+1, \gamma)}{\text{SNR}+1}$$ but it should be $$w_t = \frac{\min(\text{SNR}, \gamma)}{\text{SNR}+1}$$. See https://github.com/huggingface/diffusers/blob/main/examples/text_to_image/train_text_to_image.py#L931-L937.
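
To make the difference concrete, here is a minimal scalar sketch of the two formulas. It is not the code from `train_text_to_image.py`: the function names and the default `gamma = 5.0` are assumptions chosen for illustration, and a real training script would apply the same arithmetic to a tensor of per-timestep SNR values.

```python
def old_v_prediction_weight(snr: float, gamma: float = 5.0) -> float:
    """Weight as described for #5238: min(SNR + 1, gamma) / (SNR + 1)."""
    return min(snr + 1.0, gamma) / (snr + 1.0)


def min_snr_weight(snr: float, gamma: float = 5.0,
                   prediction_type: str = "v_prediction") -> float:
    """Hypothetical helper: Min-SNR loss weight for a single SNR value."""
    clipped = min(snr, gamma)  # clip SNR itself, not SNR + 1
    if prediction_type == "epsilon":
        return clipped / snr  # assumes snr > 0
    if prediction_type == "v_prediction":
        return clipped / (snr + 1.0)
    raise ValueError(f"unsupported prediction_type: {prediction_type!r}")


if __name__ == "__main__":
    for snr in (0.0, 1.0, 4.0, 100.0):
        print(snr, old_v_prediction_weight(snr), min_snr_weight(snr))
```

The two weights agree once SNR reaches $\gamma$, where both equal $\gamma / (\text{SNR}+1)$. They differ below that: the old form stays at 1 as long as $\text{SNR}+1 \le \gamma$, while the corrected form falls to 0 as SNR approaches 0.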