[Community Pipeline] UnCLIP image / text interpolations #1869

@patrickvonplaten

Description

Model/Pipeline/Scheduler description

Copied from #1858:

UnCLIP / Karlo: https://huggingface.co/spaces/kakaobrain/karlo gives some very nice and precise results for image generation and can strongly outperform Stable Diffusion in some cases - see:
https://www.reddit.com/r/StableDiffusion/comments/zshufz/karlo_the_first_large_scale_open_source_dalle_2/

Another extremely interesting aspect of DALL-E 2 is its ability to interpolate between text and/or image embeddings. See e.g. section 3 of the DALL-E 2 paper: https://cdn.openai.com/papers/dall-e-2.pdf. This PR now allows passing text embeddings and image embeddings directly, which should enable those tasks!

I think we could create a super cool community pipeline. One pipeline could automatically create interpolations between two text prompts, and similarly we could create another that interpolates between two images.

In terms of design, to stay as efficient as possible, the following would make sense:

    1. The user passes two text prompts and a num_interpolations input.
    2. The pipeline then embeds those two text prompts into the text embeddings x_0 and x_N, and the num_interpolations intermediate embeddings x_1, x_2, ..., x_N-1 are created using the slerp function.
    3. Then we have num_interpolations + 2 text embeddings that should be passed in a batch through the model to create a nice interpolation of images.
    4. It'd be important to make use of enable_cpu_offload() to save memory.
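
As a rough illustration of steps 2 and 3, here is a minimal NumPy sketch of slerp and of building the batch of num_interpolations + 2 embeddings. It is not the pipeline's actual code: the helper name interpolation_batch, the dot_threshold fallback, and the use of plain 1-d NumPy vectors in place of real model embeddings are all assumptions made for the example.

```python
import numpy as np


def slerp(t, v0, v1, dot_threshold=0.9995):
    """Spherical linear interpolation between two 1-d vectors (illustrative sketch)."""
    v0 = np.asarray(v0, dtype=np.float64)
    v1 = np.asarray(v1, dtype=np.float64)
    # Cosine of the angle between the two vectors.
    dot = np.dot(v0, v1) / (np.linalg.norm(v0) * np.linalg.norm(v1))
    if abs(dot) > dot_threshold:
        # Nearly (anti-)parallel vectors: sin(theta) is close to 0, so fall
        # back to plain linear interpolation (assumed threshold value).
        return (1.0 - t) * v0 + t * v1
    theta = np.arccos(dot)
    sin_theta = np.sin(theta)
    w0 = np.sin((1.0 - t) * theta) / sin_theta
    w1 = np.sin(t * theta) / sin_theta
    return w0 * v0 + w1 * v1


def interpolation_batch(x_start, x_end, num_interpolations):
    """Hypothetical helper: stack x_0, the intermediate embeddings, and x_N."""
    # num_interpolations intermediate points plus the two endpoints.
    ts = np.linspace(0.0, 1.0, num_interpolations + 2)
    return np.stack([slerp(t, x_start, x_end) for t in ts])


if __name__ == "__main__":
    # Toy stand-ins for two embeddings; real embeddings come from the model.
    batch = interpolation_batch([1.0, 0.0], [0.0, 1.0], num_interpolations=3)
    print(batch.shape)
```

The resulting array has one row per embedding, so it could in principle be handed to the model as a single batch, as described in step 3.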

It's probably easier to start with the UnCLIPImageInterpolationPipeline, since image embeddings are just a single 1-d vector, whereas for text embeddings two latent vectors are used.

Would be more than happy to help if someone is interested in giving this a try - I think it'll make for some super cool demos.

Open source status

  • The model implementation is available
  • The model weights are available (Only relevant if addition is not a scheduler).

Provide useful links for the implementation

No response
