feature/PrefGRPO-txt2img-cleaning #9

LouisRouss · 2025-09-10T20:32:30Z

Adds PrefGRPO training to Diffulab.
The implementation is written from scratch, but I took inspiration from the official repository.
A lot of work remains; this is a draft PR to open the discussion.
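For readers unfamiliar with GRPO, the group-relative advantage step can be sketched as below. This is a minimal, standard-library illustration and not the code in this PR: the function name `group_relative_advantages`, the `eps` parameter, and the flat reward layout (images of the same prompt stored contiguously) are assumptions made for the example. Only `n_image_per_prompt` is a name taken from the commit messages.

```python
from statistics import fmean, pstdev


def group_relative_advantages(rewards, n_image_per_prompt, eps=1e-6):
    """Normalize each reward against the other images of the same prompt.

    Hypothetical helper: assumes `rewards` is a flat list in which the
    images generated for one prompt are stored contiguously.
    """
    if len(rewards) % n_image_per_prompt != 0:
        raise ValueError("len(rewards) must be a multiple of n_image_per_prompt")

    advantages = []
    for start in range(0, len(rewards), n_image_per_prompt):
        group = rewards[start:start + n_image_per_prompt]
        mean = fmean(group)
        std = pstdev(group)  # population standard deviation of the group
        # eps keeps the division finite when every image gets the same reward
        advantages.extend((r - mean) / (std + eps) for r in group)
    return advantages


# Two prompts, two images each.
adv = group_relative_advantages([1.0, 3.0, 0.5, 0.5], n_image_per_prompt=2)
```

In the first group the better image gets a positive advantage and the worse one a negative advantage; in the second group both rewards are equal, so both advantages are zero.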

This PR also:

- corrects the GaussianDiffusion class and introduces the Sampler abstractions
- serves as a test and fix for the text-to-image logic. I had not had time to fix everything on master for this code, which was written a long time ago and never tested.
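To give an idea of what a sampler abstraction can look like, here is a minimal sketch on scalar values. The names `Sampler`, `StepResult`, `Euler`, `set_steps` and `step` appear in the commit messages, but every signature below, the `x_prev` field, the `sample` helper, and the time convention (t = 1 is noise, t = 0 is data) are assumptions for illustration, not the Diffulab API.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class StepResult:
    x_prev: float  # sample after one integration step (assumed field name)


class Sampler(ABC):
    def set_steps(self, n_steps):
        # Assumed convention: integrate from t = 1 (noise) down to t = 0 (data).
        self.timesteps = [1 - i / n_steps for i in range(n_steps + 1)]

    @abstractmethod
    def step(self, x_t, velocity, t, t_next):
        """Advance the sample from time t to time t_next."""


class Euler(Sampler):
    def step(self, x_t, velocity, t, t_next):
        # Explicit Euler update of dx/dt = velocity; t_next < t, so dt is negative.
        return StepResult(x_prev=x_t + (t_next - t) * velocity)


def sample(sampler, velocity_fn, x_init, n_steps):
    """Hypothetical driver loop that is independent of the concrete sampler."""
    sampler.set_steps(n_steps)
    x = x_init
    for t, t_next in zip(sampler.timesteps[:-1], sampler.timesteps[1:]):
        x = sampler.step(x, velocity_fn(x, t), t, t_next).x_prev
    return x


# Toy check: with x_t = (1 - t) * data + t * noise the velocity is noise - data.
data, noise = 2.0, 5.0
result = sample(Euler(), lambda x, t: noise - data, x_init=noise, n_steps=10)
```

With a constant velocity the Euler integration is exact, so `result` recovers `data` up to floating-point error. A stochastic sampler such as EulerMaruyama would implement the same `step` interface with an added noise term.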

- Introduced RotaryPositionalEmbeddingNDim to support N-dimensional rotary embeddings.
- Updated DiTAttention and MMDiTAttention classes to utilize the new rotary embedding structure.
- Modified forward methods to accept precomputed cosine and sine values for rotary embeddings.
- Enhanced PerceiverRotaryPositionalEmbedding to work with N-dimensional inputs.
- Adjusted PerceiverAttention and PerceiverResampler to accommodate new rotary embedding logic.
- Replaced RMSNorm with LayerNorm in DiTBlock and MMDiTBlock for consistency.
- Updated MMDiT class to compute positional encodings for both text and image inputs.
- Added PackedSwiGLU for improved MLP performance.
- Fixed minor issues in the base trainer's model input handling.

Add RewardModel and PrefGRPORewardModel classes for reward computation

91de4b3

LouisRouss marked this pull request as draft September 10, 2025 20:33

LouisRouss added 28 commits September 11, 2025 22:50

Refactor RewardModel and PrefGRPORewardModel to enhance image handling and reward computation - Allow batch processing with multiple prompts

db14e8a

Add return_latents option to Diffuser's denoise method for latent representation output

02e1ca8

Add attribute delegation and enhanced dir() support to Diffuser class

2ff6348
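As an illustration of the attribute delegation pattern named in this commit, here is a generic sketch. It is not the Diffuser implementation: the classes `Wrapped` and `Delegator` and the `_inner` attribute are hypothetical, and only the `__getattr__` / `__dir__` mechanism itself is standard Python.

```python
class Wrapped:
    """Hypothetical inner object whose attributes should be exposed."""

    def __init__(self):
        self.n_steps = 50

    def denoise(self):
        return "denoised"


class Delegator:
    """Forwards unknown attribute lookups to a wrapped object."""

    def __init__(self, inner):
        self._inner = inner

    def __getattr__(self, name):
        # __getattr__ is only called when normal lookup fails.
        if name == "_inner":
            # Avoid infinite recursion if _inner has not been set yet.
            raise AttributeError(name)
        return getattr(self._inner, name)

    def __dir__(self):
        # Merge own names with the wrapped object's names so that
        # dir() and tab completion also list delegated attributes.
        return sorted(set(super().__dir__()) | set(dir(self._inner)))


d = Delegator(Wrapped())
```

Here `d.n_steps` and `d.denoise()` resolve through the wrapped object, and both names show up in `dir(d)`.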

Fix dtype argument in model initialization

b54d0c7

Add one_step_denoise_grpo method for GRPO training in Flow class

7e60ff7

Refactor training classes to use a common trainer and reorganize imports for clarity

b0240f0

Add GRPO support to Diffuser and Flow classes with new methods and utility updates

f60fd65

Enhance RewardModel and PrefGRPORewardModel with n_image_per_prompt support and advantage computation

2a3ffc8

Add GRPO support with new BatchData structures and update training classes for GRPO alignment

43f6cd3

fix typing

6b014f7

fix loss calculation grpo flow

fd4252a

Refactor loss computation in Flow class to use a list for step-wise losses and calculate the mean at the end

ba8f074

Refactor trainer imports and implement validation step in GRPOTrainer

10a823d

Finish GRPO training loop and fix epoch level scheduler logic

7519c8c

Add clip in reward model

2f1940e

Implement StepResult and Sampler classes for diffusion process; add Euler and EulerMaruyama methods for flow based models

d442082

adapt to abstraction sampler and clean GRPO logic

693e403

Refactor ContextEmbedder to implement properties for n_output and output_size; add PreComputedEmbedder class for handling precomputed embeddings; update SD3TextEmbedder to streamline initialization and embedding retrieval.

342982a

Refactor PrefGRPORewardModel to standardize clip model ID usage and improve type casting for image and text feature encoding.

f7c4200

Refactor sampler classes to standardize set_steps method for improved timestep handling in Euler and EulerMaruyama samplers.

dc033ab

Add DDIM and DDPM sampler implementations with step and parameter setting methods - Fix Gaussian Diffusion in general

8f17901

Refactor Flow and EulerMaruyama classes for improved parameter handling and logging

2a9847a

improve tensor handling and device compatibility in flow and euler maruyama sampler

c64db67

- Refactor model input handling in Diffuser, Flow, and GRPOTrainer classes to remove ModelInputGRPO and standardize on ModelInput, enhancing type consistency and clarity. - Refactor GRPOTrainer to take into account sampler use and grpo function merged into basic ones

2d476d4

Add a generic abstract sampler class over modelization specific sampler abstract class

d2425fb

Refactor diffusion model classes to standardize sampler initialization and improve parameter handling

0b9e159

update docstring

3c81ce9

Refactor denoise method signatures in Diffuser, Flow, and GaussianDiffusion classes to make data_shape optional, enhancing flexibility in model input handling.

854d82a

LouisRouss added 6 commits October 1, 2025 20:56

add dinoV3 and precompute functions

562f1f3

Refactor SD3TextEmbedder to improve type casting and add attention mask to embedding outputs + fix part of the code

2b8a642

Update step method docstring in Euler and EulerMaruyama classes to reflect reverse flow matching process

5c24b79

add dependencies

20945ec

improve attn unet

9960d67

Refactor ContextEmbedder to use ContextEmbedderOutput for forward method

94c9c44

LouisRouss changed the title ~~[WIP] feature/PrefGRPO~~ [WIP] feature/PrefGRPO-txt2img-cleaning Oct 23, 2025

LouisRouss and others added 16 commits October 26, 2025 11:01

Enhance MMDiTAttention and MMDiTBlock to support attention masks and update context embedding handling

d333dc5

- rename mask to attn mask in context embedder output

da43da7

- add attnmask to precomputedembedder

Update DDT to utilize ContextEmbedderOutput for improved context handling and attention mask support

ee8f7d0

use transformers instead of open clip

5aed411

Add attention mask to U-Net and use torch scaled dot product attn

ba5028a

finish forward method of PrefGRPORewardModel

59b1d1d

fix torch stack xt_std

08f489b

fix unet

d7ab331

Add GRPO loss computation and update method signatures in Diffuser, Flow, GaussianDiffusion, and DDT

a74f3b0

Remove 'local/' from Pyright include paths in pyproject.toml

ce670ba

Fix feature appending condition in DDT class to check for None

42c1a64

Fix default value for attn_mask in PreComputedEmbedder to ensure proper context handling

5f2e856

fix reward model

c38be13

Refactor encoding input range checks in DCAE class for clarity and accuracy

f084799

clean code - update docstring

f2840de

Merge branch 'main' into feature/PrefGRPO

f58555b

LouisRouss assigned AdilZouitine Nov 2, 2025

LouisRouss marked this pull request as ready for review November 2, 2025 14:41

LouisRouss changed the title ~~[WIP] feature/PrefGRPO-txt2img-cleaning~~ feature/PrefGRPO-txt2img-cleaning Nov 2, 2025

LouisRouss unassigned AdilZouitine Nov 3, 2025

LouisRouss requested a review from AdilZouitine November 3, 2025 20:33

LouisRouss and others added 2 commits November 18, 2025 22:38

- remove pre computed embedder #TODO cleaner

cd1a40f

- remove pre compute on dataset for context embedder

3 participants
