unCLIP variant #2297
Conversation
Review threads (since resolved) on:
- src/diffusers/pipelines/stable_diffusion/pipeline_stable_unclip.py
- src/diffusers/pipelines/stable_diffusion/pipeline_stable_unclip_img2img.py
…p_img2img.py Co-authored-by: Patrick von Platen <[email protected]>
Great work! Let's just add docs now and we're good to merge :-)
Happy to merge after the docs are in :-)
```diff
@@ -17,7 +17,7 @@ specific language governing permissions and limitations under the License.
 The Stable Diffusion model was created by the researchers and engineers from [CompVis](https://github.com/CompVis), [Stability AI](https://stability.ai/), [runway](https://github.com/runwayml), and [LAION](https://laion.ai/). The [`StableDiffusionPipeline`] is capable of generating photo-realistic images given any text input using Stable Diffusion.

 The original codebase can be found here:
-- *Stable Diffusion V1*: [CampVis/stable-diffusion](https://github.com/CompVis/stable-diffusion)
+- *Stable Diffusion V1*: [CompVis/stable-diffusion](https://github.com/CompVis/stable-diffusion)
```
Thanks!
Super! Thanks a lot for the addition
* pipeline_variant
* Add docs for when clip_stats_path is specified
* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_unclip.py Co-authored-by: Patrick von Platen <[email protected]>
* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_unclip.py Co-authored-by: Patrick von Platen <[email protected]>
* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_unclip_img2img.py Co-authored-by: Patrick von Platen <[email protected]>
* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_unclip_img2img.py Co-authored-by: Patrick von Platen <[email protected]>
* prepare_latents # Copied from re: @patrickvonplaten
* NoiseAugmentor->ImageNormalizer
* stable_unclip_prior default to None re: @patrickvonplaten
* prepare_prior_extra_step_kwargs
* prior denoising scale model input
* {DDIM,DDPM}Scheduler -> KarrasDiffusionSchedulers re: @patrickvonplaten
* docs
* Update docs/source/en/api/pipelines/stable_unclip.mdx Co-authored-by: Patrick von Platen <[email protected]>

---------
Co-authored-by: Patrick von Platen <[email protected]>
It appears that the documentation currently points to an invalid checkpoint:
so currently, if you attempt to run the model according to the documentation, it will fail. The stable unCLIP model is also conceptually very similar to the Versatile Diffusion model (both have image variations and text + image conditioned image synthesis). Perhaps a note can be put into the current stable_unclip docs so people can use that model until the stable-unclip weights are up?
Yeah, this is a bit WIP still.
Porting weights

Changes to existing models

Added support for `class_embeddings`. The embedding type is very similar to the existing `timestep` embedding type, except that the class_embeddings should not first be converted to sinusoidal embeddings, and should be projected from an arbitrary input dimension. I added the new `class_embed_type` "projection" for this.
New models

Added the NoiseAugmentor class, which handles adding noise to the image embeddings. This had to be a separate class because it needs parameters to store the CLIP mean and std vectors. The class is configured with a noise schedule, just as our scheduler classes are. I considered making this class much lighter weight by removing the noise schedule and just having it hold the CLIP stats. If we did this, we could use the existing scheduler class on the pipeline to noise the vector. I opted against doing this because it would require both of the noise schedules (the one for the diffusion process and the one for augmenting the image embedding) to be configured the same. This would work for the current models we ported, as they do use the same `squaredcos_cap_v2` betas with 1000 timesteps. However, we can't guarantee that will always be the case, and the noise scheduling code to be duplicated is quite small.

Instead, added a DDPMScheduler to the pipeline to hold the noising schedule, and used `StableUnCLIPImageNormalizer` to hold the CLIP statistics.
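As a rough, self-contained sketch of what the noise-augmentation step does (a hypothetical numpy illustration under the assumptions above, not the actual diffusers code, which delegates to `DDPMScheduler.add_noise` and the image normalizer; the function names and the identity mean/std are made up for the example):

```python
import numpy as np

def squaredcos_cap_v2_betas(num_timesteps=1000, max_beta=0.999):
    # Cosine beta schedule (Nichol & Dhariwal), as selected by
    # DDPMScheduler(beta_schedule="squaredcos_cap_v2") in diffusers.
    def alpha_bar(t):
        return np.cos((t + 0.008) / 1.008 * np.pi / 2) ** 2
    betas = []
    for i in range(num_timesteps):
        t1, t2 = i / num_timesteps, (i + 1) / num_timesteps
        betas.append(min(1 - alpha_bar(t2) / alpha_bar(t1), max_beta))
    return np.array(betas)

def noise_image_embeddings(embeds, noise_level, rng, clip_mean, clip_std):
    # Hypothetical sketch of the augmentation: normalize the embedding with
    # the CLIP statistics (the image normalizer's job), add noise at
    # `noise_level` via the q(x_t | x_0) forward formula, then un-normalize.
    betas = squaredcos_cap_v2_betas()
    alphas_cumprod = np.cumprod(1.0 - betas)
    x0 = (embeds - clip_mean) / clip_std                # scale
    noise = rng.standard_normal(x0.shape)
    a = alphas_cumprod[noise_level]
    xt = np.sqrt(a) * x0 + np.sqrt(1.0 - a) * noise    # add noise
    return xt * clip_std + clip_mean                    # unscale
```

Keeping the schedule on a dedicated DDPMScheduler means the embedding-noising schedule can differ from the UNet's diffusion schedule, which is exactly the coupling concern described above.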