[Feature request]: Dreambooth support for text2video diffusion models #2784

### Model/Pipeline/Scheduler description

Now that [the first text2video pipeline is merged](https://github.com/huggingface/diffusers/pull/2738), the next frontier is to enable efficient fine-tuning of these models.

There's already a Dreambooth method for diffusion-based text2img models. It consists of generating a regularization (class) dataset and then fine-tuning the model on both the input dataset and the generated one.

Since these models are essentially the same in nature (as I know from maintaining the Auto1111 extension for this model), Dreambooth can be enabled by switching the UNet from 2D to 3D and changing the class dataset pipeline from images to videos. It would be really awesome to have 🙂
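
To make the idea a bit more concrete, here is a rough, framework-free sketch of the objective I have in mind. It is only an illustration under my own assumptions, not an existing diffusers API: the helper names (`flatten`, `mse`, `dreambooth_loss`) and the `prior_loss_weight` parameter are hypothetical, and real training would use tensors rather than nested lists. The point is that the loss itself stays the same as for text2img; only the batch gains a frame axis, assumed here to be laid out as (batch, channels, frames, height, width).

```python
def flatten(x):
    """Yield every scalar in an arbitrarily nested list (any tensor rank)."""
    if isinstance(x, (int, float)):
        yield x
    else:
        for item in x:
            yield from flatten(item)


def mse(pred, target):
    """Mean squared error over all elements of two same-shaped nested lists."""
    p, t = list(flatten(pred)), list(flatten(target))
    if len(p) != len(t):
        raise ValueError("pred and target must have the same number of elements")
    return sum((a - b) ** 2 for a, b in zip(p, t)) / len(p)


def dreambooth_loss(instance_pred, instance_target,
                    class_pred, class_target,
                    prior_loss_weight=1.0):
    """Instance loss plus weighted prior-preservation (class) loss.

    Works unchanged for image batches (B, C, H, W) and, as assumed here,
    for video batches (B, C, F, H, W), because the error is averaged over
    every element regardless of rank.
    """
    instance_loss = mse(instance_pred, instance_target)
    prior_loss = mse(class_pred, class_target)
    return instance_loss + prior_loss_weight * prior_loss
```

In this sketch the "instance" arguments stand for the user's own clips and the "class" arguments for the generated regularization clips, so the only video-specific work would be in the 3D UNet and in the dataset that produces the clips.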

### Open source status

- [X] The model implementation is available
- [X] The model weights are available (Only relevant if addition is not a scheduler).

### Provide useful links for the implementation

_No response_
