[SDXL Turbo] Add some docs #5982
Merged
53 changes: 53 additions & 0 deletions
docs/source/en/api/pipelines/stable_diffusion/sdxl_turbo.md
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,53 @@ | ||
| <!--Copyright 2023 The HuggingFace Team. All rights reserved. | ||
|
|
||
| Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with | ||
| the License. You may obtain a copy of the License at | ||
|
|
||
| http://www.apache.org/licenses/LICENSE-2.0 | ||
|
|
||
| Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on | ||
| an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the | ||
| specific language governing permissions and limitations under the License. | ||
| --> | ||
|
|
||
| # SDXL Turbo | ||
|
|
||
| Stable Diffusion XL (SDXL) Turbo was proposed in [Adversarial Diffusion Distillation](https://stability.ai/research/adversarial-diffusion-distillation) by Axel Sauer, Dominik Lorenz, Andreas Blattmann, and Robin Rombach. | ||
|
|
||
| The abstract from the paper is: | ||
|
|
||
| *We introduce Adversarial Diffusion Distillation (ADD), a novel training approach that efficiently samples large-scale foundational image diffusion models in just 1–4 steps while maintaining high image quality. We use score distillation to leverage large-scale off-the-shelf image diffusion models as a teacher signal in combination with an adversarial loss to ensure high image fidelity even in the low-step regime of one or two sampling steps. Our analyses show that our model clearly outperforms existing few-step methods (GANs, Latent Consistency Models) in a single step and reaches the performance of state-of-the-art diffusion models (SDXL) in only four steps. ADD is the first method to unlock single-step, real-time image synthesis with foundation models.* | ||
|
|
||
| ## Tips | ||
|
|
||
| - SDXL Turbo uses the exact same architecture as [SDXL](./stable_diffusion_xl). | ||
| - SDXL Turbo should run with guidance disabled, which you do by setting `guidance_scale=0.0`. | ||
| - SDXL Turbo should use `timestep_spacing='trailing'` for the scheduler and use between 1 and 4 steps. | ||
| - SDXL Turbo has been trained to generate images of size 512x512. | ||
| - SDXL Turbo is open-access but not open-source, meaning that you might have to buy a model license to use it for commercial applications. Make sure to read the [official model card](https://huggingface.co/stabilityai/sdxl-turbo) to learn more. | ||
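To make the `timestep_spacing='trailing'` tip more concrete, here is a minimal pure-Python sketch of how the two spacings pick timesteps. The helper names `trailing_timesteps` and `leading_timesteps` are hypothetical, and the formulas only approximate what the diffusers schedulers compute, assuming 1000 training timesteps. Treat it as an illustration, not as the library's implementation.

```python
# Illustrative only: hypothetical helpers that approximate how a scheduler
# spaces its timesteps. Assumes 1000 training timesteps.

def trailing_timesteps(num_inference_steps, num_train_timesteps=1000):
    """Space timesteps so that the first one is the noisiest (the last training step)."""
    step_ratio = num_train_timesteps / num_inference_steps
    return [
        round(num_train_timesteps - i * step_ratio) - 1
        for i in range(num_inference_steps)
    ]


def leading_timesteps(num_inference_steps, num_train_timesteps=1000, steps_offset=0):
    """Space timesteps starting from 0, so the noisiest timestep is never visited."""
    step_ratio = num_train_timesteps // num_inference_steps
    return [
        i * step_ratio + steps_offset
        for i in reversed(range(num_inference_steps))
    ]


print(trailing_timesteps(1))  # [999]
print(trailing_timesteps(4))  # [999, 749, 499, 249]
print(leading_timesteps(1))   # [0]
print(leading_timesteps(4))   # [750, 500, 250, 0]
```

With a single step, trailing spacing starts denoising from the noisiest timestep, while leading spacing would start near timestep 0, which is a plausible reason the tip matters most in the 1-step regime. In diffusers, one common way to apply the setting is to rebuild the scheduler from its config, for example `EulerAncestralDiscreteScheduler.from_config(pipeline.scheduler.config, timestep_spacing="trailing")`. The checkpoint's own scheduler config may already set it, so check before overriding.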
|
|
||
| <Tip> | ||
|
|
||
| To learn how to use SDXL Turbo for various tasks, how to optimize performance, and other usage examples, take a look at the [SDXL Turbo](../../../using-diffusers/sdxl_turbo) guide. | ||
|
|
||
| Check out the [Stability AI](https://huggingface.co/stabilityai) Hub organization for the official model checkpoints! | ||
|
|
||
| </Tip> | ||
|
|
||
| ## StableDiffusionXLPipeline | ||
|
|
||
| [[autodoc]] StableDiffusionXLPipeline | ||
| - all | ||
| - __call__ | ||
|
|
||
| ## StableDiffusionXLImg2ImgPipeline | ||
|
|
||
| [[autodoc]] StableDiffusionXLImg2ImgPipeline | ||
| - all | ||
| - __call__ | ||
|
|
||
| ## StableDiffusionXLInpaintPipeline | ||
|
|
||
| [[autodoc]] StableDiffusionXLInpaintPipeline | ||
| - all | ||
| - __call__ |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,116 @@ | ||
| <!--Copyright 2023 The HuggingFace Team. All rights reserved. | ||
|
|
||
| Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with | ||
| the License. You may obtain a copy of the License at | ||
|
|
||
| http://www.apache.org/licenses/LICENSE-2.0 | ||
|
|
||
| Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on | ||
| an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the | ||
| specific language governing permissions and limitations under the License. | ||
| --> | ||
|
|
||
| # Stable Diffusion XL Turbo | ||
|
|
||
| [[open-in-colab]] | ||
|
|
||
| SDXL Turbo is an adversarial time-distilled [Stable Diffusion XL](https://huggingface.co/papers/2307.01952) (SDXL) model capable | ||
| of running inference in as little as 1 step. | ||
|
|
||
| This guide will show you how to use SDXL Turbo for text-to-image and image-to-image. | ||
|
|
||
| Before you begin, make sure you have the following libraries installed: | ||
|
|
||
| ```py | ||
| # uncomment to install the necessary libraries in Colab | ||
| #!pip install -q diffusers transformers accelerate omegaconf | ||
| ``` | ||
|
|
||
| ## Load model checkpoints | ||
|
|
||
| Model weights may be stored in separate subfolders on the Hub or locally, in which case, you should use the [`~StableDiffusionXLPipeline.from_pretrained`] method: | ||
|
|
||
| ```py | ||
| from diffusers import AutoPipelineForText2Image, AutoPipelineForImage2Image | ||
| import torch | ||
|
|
||
| pipeline = AutoPipelineForText2Image.from_pretrained("stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16") | ||
| pipeline = pipeline.to("cuda") | ||
| ``` | ||
|
|
||
| You can also use the [`~StableDiffusionXLPipeline.from_single_file`] method to load a model checkpoint stored in a single file format (`.ckpt` or `.safetensors`) from the Hub or locally: | ||
|
|
||
| ```py | ||
| from diffusers import StableDiffusionXLPipeline | ||
| import torch | ||
|
|
||
| pipeline = StableDiffusionXLPipeline.from_single_file( | ||
| "https://huggingface.co/stabilityai/sdxl-turbo/blob/main/sd_xl_turbo_1.0_fp16.safetensors", torch_dtype=torch.float16) | ||
| pipeline = pipeline.to("cuda") | ||
| ``` | ||
|
|
||
| ## Text-to-image | ||
|
|
||
| For text-to-image, pass a text prompt. By default, SDXL Turbo generates a 512x512 image, and that resolution gives the best results. You can try setting the `height` and `width` parameters to 768x768 or 1024x1024, but you should expect quality degradations when doing so. | ||
|
|
||
| Make sure to set `guidance_scale` to 0.0 to disable guidance, as the model was trained without it. A single inference step is enough to generate high-quality images. | ||
| Increasing the number of steps to 2, 3, or 4 should improve image quality further. | ||
|
|
||
| ```py | ||
| from diffusers import AutoPipelineForText2Image | ||
| import torch | ||
|
|
||
| pipeline_text2image = AutoPipelineForText2Image.from_pretrained("stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16") | ||
| pipeline_text2image = pipeline_text2image.to("cuda") | ||
|
|
||
| prompt = "A cinematic shot of a baby racoon wearing an intricate italian priest robe." | ||
|
|
||
| image = pipeline_text2image(prompt=prompt, guidance_scale=0.0, num_inference_steps=1).images[0] | ||
| image | ||
| ``` | ||
|
|
||
| <div class="flex justify-center"> | ||
| <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/sdxl-turbo-text2img.png" alt="generated image of a racoon in a robe"/> | ||
| </div> | ||
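The text-to-image settings described in this section can be collected in a small helper. This is only a sketch: `turbo_text2image_kwargs` is a hypothetical name, not part of diffusers, and rejecting step counts outside 1 to 4 is an assumption drawn from the recommendations above, not a limit the library enforces.

```python
# Hypothetical helper (not part of diffusers): gathers the call arguments
# recommended for SDXL Turbo text-to-image generation.

def turbo_text2image_kwargs(prompt, num_inference_steps=1, height=512, width=512):
    """Build keyword arguments for an SDXL Turbo text-to-image call."""
    # Assumption: treat the recommended 1-4 step range as a hard limit.
    if not 1 <= num_inference_steps <= 4:
        raise ValueError("SDXL Turbo is meant to run with 1 to 4 inference steps")
    return {
        "prompt": prompt,
        "guidance_scale": 0.0,  # guidance disabled, as the model was trained without it
        "num_inference_steps": num_inference_steps,
        "height": height,  # 512x512 is the resolution the model was trained on
        "width": width,
    }


kwargs = turbo_text2image_kwargs(
    "A cinematic shot of a baby racoon wearing an intricate italian priest robe.",
    num_inference_steps=4,
)
# With a loaded pipeline, the call would then look like:
# image = pipeline_text2image(**kwargs).images[0]
```

Keeping the defaults in one place makes it harder to forget `guidance_scale=0.0` when experimenting with different step counts or resolutions.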
|
|
||
| ## Image-to-image | ||
|
|
||
| For image-to-image generation, make sure that `num_inference_steps * strength` is greater than or equal to 1. | ||
| The image-to-image pipeline will run for `int(num_inference_steps * strength)` steps, e.g. `int(2 * 0.5) = 1` step in | ||
| our example below. | ||
|
|
||
| ```py | ||
| from diffusers import AutoPipelineForImage2Image | ||
| from diffusers.utils import load_image, make_image_grid | ||
|
|
||
| # use from_pipe to avoid consuming additional memory when loading a checkpoint | ||
| pipeline = AutoPipelineForImage2Image.from_pipe(pipeline_text2image).to("cuda") | ||
|
|
||
| init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/cat.png") | ||
| init_image = init_image.resize((512, 512)) | ||
|
|
||
| prompt = "cat wizard, gandalf, lord of the rings, detailed, fantasy, cute, adorable, Pixar, Disney, 8k" | ||
|
|
||
| image = pipeline(prompt, image=init_image, strength=0.5, guidance_scale=0.0, num_inference_steps=2).images[0] | ||
| make_image_grid([init_image, image], rows=1, cols=2) | ||
| ``` | ||
|
|
||
| <div class="flex justify-center"> | ||
| <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/sdxl-turbo-img2img.png" alt="Image-to-image generation sample using SDXL Turbo"/> | ||
| </div> | ||
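The step rule stated at the start of this section is easy to check before calling the pipeline. The sketch below uses a hypothetical helper, `effective_img2img_steps`, that applies the `int(num_inference_steps * strength)` formula from the text. Capping the result at `num_inference_steps` and restricting `strength` to the range 0 to 1 are assumptions about how the pipeline behaves.

```python
# Hypothetical helper: estimates how many denoising steps an image-to-image
# call will actually run, following the formula given in the text.

def effective_img2img_steps(num_inference_steps, strength):
    """Return int(num_inference_steps * strength), capped at num_inference_steps."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength is expected to be between 0 and 1")
    return min(int(num_inference_steps * strength), num_inference_steps)


print(effective_img2img_steps(2, 0.5))  # 1 -> valid, one denoising step
print(effective_img2img_steps(4, 0.5))  # 2
print(effective_img2img_steps(1, 0.5))  # 0 -> too few steps, raise num_inference_steps or strength
```

A result of 0 means the chosen combination leaves no denoising steps to run, so increase either `num_inference_steps` or `strength` until the result is at least 1.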
|
|
||
| ## Speed up SDXL Turbo even more | ||
|
|
||
| - Compile the UNet if you are using PyTorch 2.0 or later. The first inference run will be very slow, but subsequent ones will be much faster. | ||
|
|
||
| ```py | ||
| pipeline.unet = torch.compile(pipeline.unet, mode="reduce-overhead", fullgraph=True) | ||
| ``` | ||
|
|
||
| - When using the default VAE, keep it in `float32` to avoid costly `dtype` conversions before and after each generation. You only need to do this once, before your first generation: | ||
|
|
||
| ```py | ||
| pipeline.upcast_vae() | ||
| ``` | ||
|
|
||
| As an alternative, you can also use a [16-bit VAE](https://huggingface.co/madebyollin/sdxl-vae-fp16-fix) created by community member [`@madebyollin`](https://huggingface.co/madebyollin) that does not need to be upcast to `float32`. | ||
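A minimal sketch of the 16-bit VAE alternative follows. The function name `load_turbo_with_fp16_vae` is hypothetical, and the sketch assumes a CUDA device and that both checkpoints can be downloaded from the Hub. It has not been benchmarked here, so treat any speed benefit as something to verify on your own hardware.

```python
# Sketch only: loads SDXL Turbo with the community fp16-safe VAE instead of
# the default one. Imports are kept inside the function so that defining it
# does not require a GPU or any downloads.

def load_turbo_with_fp16_vae(device="cuda"):
    """Return an SDXL Turbo text-to-image pipeline that uses the fp16-fix VAE."""
    import torch
    from diffusers import AutoencoderKL, AutoPipelineForText2Image

    # The replacement VAE is loaded directly in half precision.
    vae = AutoencoderKL.from_pretrained(
        "madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16
    )
    # Passing `vae=` overrides the VAE that ships with the checkpoint.
    pipeline = AutoPipelineForText2Image.from_pretrained(
        "stabilityai/sdxl-turbo", vae=vae, torch_dtype=torch.float16, variant="fp16"
    )
    return pipeline.to(device)
```

With this variant there should be no need to call `upcast_vae()`, since the whole pipeline stays in `float16`.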
Is `omegaconf` required?
Yeah think it's in order to load the single file format - @DN6 we should probs try to not have it be required