Description
Model/Pipeline/Scheduler description
Copied from #1858:
UnCLIP / Karlo: https://huggingface.co/spaces/kakaobrain/karlo gives some very nice and precise results when doing image generation and can strongly outperform Stable Diffusion in some cases - see:
https://www.reddit.com/r/StableDiffusion/comments/zshufz/karlo_the_first_large_scale_open_source_dalle_2/
Another extremely interesting aspect of Dalle 2 is its ability to interpolate between text and/or image embeddings. See e.g. section 3 of the Dalle 2 paper: https://cdn.openai.com/papers/dall-e-2.pdf . This PR now allows passing text embeddings and image embeddings directly, which should enable those tasks!
I think we could create a super cool community pipeline. The pipeline could automatically create interpolations between two text prompts, and similarly we could create one that interpolates between two images.
In terms of design, to stay as efficient as possible, the following would make sense:
- The user passes two text prompts and a `num_interpolations` input.
- The pipeline then embeds those two text prompts into the text embeddings x_0 and x_N, and `num_interpolations` embeddings x_1, x_2, ... x_N-1 are created using the `slerp` function.
- Then we have `num_interpolations` + 2 text embeddings that should be passed in a batch through the model to create a nice interpolation of images.
- It'd be important to make use of `enable_cpu_offload()` to save memory.
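The interpolation step above could be sketched roughly as follows. This is a dependency-free illustration on plain lists of floats, not the pipeline's actual code: the `slerp` signature, the `dot_threshold` parameter and the `interpolate_embeddings` helper are assumptions made for the example, and a real pipeline would work on torch tensors instead.

```python
import math


def slerp(t, v0, v1, dot_threshold=0.9995):
    """Spherical linear interpolation between two vectors (lists of floats)."""
    norm0 = math.sqrt(sum(a * a for a in v0))
    norm1 = math.sqrt(sum(b * b for b in v1))
    # Cosine of the angle between the two vectors, clamped for acos.
    dot = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    dot = max(-1.0, min(1.0, dot))
    if abs(dot) > dot_threshold:
        # Nearly collinear vectors: fall back to plain linear interpolation.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    theta = math.acos(dot)
    s0 = math.sin((1 - t) * theta) / math.sin(theta)
    s1 = math.sin(t * theta) / math.sin(theta)
    return [s0 * a + s1 * b for a, b in zip(v0, v1)]


def interpolate_embeddings(x_start, x_end, num_interpolations):
    """Return num_interpolations + 2 embeddings: x_0, x_1, ..., x_N.

    Hypothetical helper: the endpoints are included so the whole list
    can be passed through the model as one batch.
    """
    steps = num_interpolations + 1
    return [slerp(i / steps, x_start, x_end) for i in range(steps + 1)]
```

For example, `interpolate_embeddings(x_0, x_N, 3)` returns five embeddings, with the first and last equal to the two inputs.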
It's probably easier to start with the `UnCLIPImageInterpolationPipeline`, since image embeddings are just a single 1-d vector, whereas for text embeddings two latent vectors are used.
Would be more than happy to help if someone is interested in giving this a try - think it'll make for some super cool demos.
Open source status
- The model implementation is available
- The model weights are available (Only relevant if addition is not a scheduler).
Provide useful links for the implementation
No response