[Feature request]: Dreambooth support for text2video diffusion models #2784

@kabachuha

Description

Model/Pipeline/Scheduler description

Now that the first text2video pipeline is merged, the next frontier is to enable efficient fine-tuning of these models.

There is already a DreamBooth method for diffusion-based text2img models. It builds a regularization (class) dataset and then fine-tunes the model on both the input images and the generated class images, using a prior-preservation loss so the model does not forget the general class.
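
A minimal sketch of the prior-preservation objective, assuming the common setup where instance and class examples are concatenated into one batch and the loss is the instance MSE plus a weighted class MSE. The function name, the `prior_loss_weight` parameter and the batch layout are illustrative assumptions, not a fixed API. Because the MSE is averaged over every element, the same function applies unchanged to 4D image latents and 5D video latents:

```python
import numpy as np


def prior_preservation_loss(model_pred, target, prior_loss_weight=1.0):
    """DreamBooth-style loss on a batch whose first half holds instance
    examples and whose second half holds class (regularization) examples.

    Works for any tensor rank, e.g. (B, C, H, W) image latents or
    (B, C, F, H, W) video latents.
    """
    if model_pred.shape != target.shape:
        raise ValueError("model_pred and target must have the same shape")
    if model_pred.shape[0] % 2 != 0:
        raise ValueError("batch needs equal numbers of instance and class examples")

    # Split the concatenated batch back into its instance and class halves.
    pred_instance, pred_class = np.split(model_pred, 2, axis=0)
    target_instance, target_class = np.split(target, 2, axis=0)

    # Ordinary denoising MSE on the subject's own images/videos.
    instance_loss = np.mean((pred_instance - target_instance) ** 2)
    # Prior-preservation MSE on the generated class data.
    prior_loss = np.mean((pred_class - target_class) ** 2)

    return float(instance_loss + prior_loss_weight * prior_loss)
```

In a training loop, `model_pred` would be the UNet's noise prediction for the concatenated batch and `target` the noise that was added, assuming an epsilon-prediction model.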

Since these models are essentially the same in nature (as I know from maintaining the Auto1111 extension for this model), DreamBooth support could be enabled by switching the UNet from 2D to 3D and changing the class-dataset pipeline from images to video. It would be really awesome to have this 🙂
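
On the data side, a hypothetical sketch of what changing the class-dataset pipeline from images to video could look like: each clip is stored as (channels, frames, height, width), and instance and class clips are stacked into one (batch, channels, frames, height, width) batch, instance examples first. The helper names and the channels-before-frames layout are assumptions for illustration; the layout is chosen to match the 5D sample shape that, to my understanding, diffusers' `UNet3DConditionModel` takes.

```python
import numpy as np


def frames_to_clip(frames):
    """Stack a list of (C, H, W) frames into one (C, F, H, W) clip."""
    if not frames:
        raise ValueError("a clip needs at least one frame")
    # Inserting the new axis at position 1 puts frames after channels.
    return np.stack(frames, axis=1)


def collate_dreambooth_batch(instance_clips, class_clips):
    """Stack instance clips, then class clips, into a (B, C, F, H, W) batch.

    All clips must share the same (C, F, H, W) shape; np.stack raises
    a ValueError otherwise.
    """
    if len(instance_clips) != len(class_clips):
        raise ValueError("need one class clip per instance clip")
    return np.stack(list(instance_clips) + list(class_clips), axis=0)
```

Keeping instance clips in the first half of the batch and class clips in the second is what lets a loss function split the batch back into its two parts.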

Open source status

  • The model implementation is available
  • The model weights are available (Only relevant if addition is not a scheduler).

Provide useful links for the implementation

No response

Metadata

Assignees

No one assigned

    Labels

    stale: Issues that haven't received updates

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests