diff --git a/docs/source/en/_toctree.yml b/docs/source/en/_toctree.yml
index 3441375b1dbc..a8ccd17459b0 100644
--- a/docs/source/en/_toctree.yml
+++ b/docs/source/en/_toctree.yml
@@ -72,6 +72,8 @@
       title: Overview
     - local: using-diffusers/sdxl
       title: Stable Diffusion XL
+    - local: using-diffusers/sdxl_turbo
+      title: SDXL Turbo
     - local: using-diffusers/kandinsky
       title: Kandinsky
     - local: using-diffusers/controlnet
@@ -331,6 +333,8 @@
       title: Stable Diffusion 2
     - local: api/pipelines/stable_diffusion/stable_diffusion_xl
       title: Stable Diffusion XL
+    - local: api/pipelines/stable_diffusion/sdxl_turbo
+      title: SDXL Turbo
     - local: api/pipelines/stable_diffusion/latent_upscale
       title: Latent upscaler
     - local: api/pipelines/stable_diffusion/upscale
diff --git a/docs/source/en/api/pipelines/stable_diffusion/sdxl_turbo.md b/docs/source/en/api/pipelines/stable_diffusion/sdxl_turbo.md
new file mode 100644
index 000000000000..cb5b84ee06df
--- /dev/null
+++ b/docs/source/en/api/pipelines/stable_diffusion/sdxl_turbo.md
@@ -0,0 +1,53 @@

# SDXL Turbo

Stable Diffusion XL (SDXL) Turbo was proposed in [Adversarial Diffusion Distillation](https://stability.ai/research/adversarial-diffusion-distillation) by Axel Sauer, Dominik Lorenz, Andreas Blattmann, and Robin Rombach.

The abstract from the paper is:

*We introduce Adversarial Diffusion Distillation (ADD), a novel training approach that efficiently samples large-scale foundational image diffusion models in just 1–4 steps while maintaining high image quality. We use score distillation to leverage large-scale off-the-shelf image diffusion models as a teacher signal in combination with an adversarial loss to ensure high image fidelity even in the low-step regime of one or two sampling steps. Our analyses show that our model clearly outperforms existing few-step methods (GANs, Latent Consistency Models) in a single step and reaches the performance of state-of-the-art diffusion models (SDXL) in only four steps. ADD is the first method to unlock single-step, real-time image synthesis with foundation models.*

## Tips

- SDXL Turbo uses the exact same architecture as [SDXL](./stable_diffusion_xl).
- SDXL Turbo should disable guidance by setting `guidance_scale=0.0`.
- SDXL Turbo should use `timestep_spacing='trailing'` for the scheduler, and between 1 and 4 inference steps.
- SDXL Turbo has been trained to generate images of size 512x512.
- SDXL Turbo is open-access but not open-source, meaning that you may have to buy a model license in order to use it for commercial applications. Make sure to read the [official model card](https://huggingface.co/stabilityai/sdxl-turbo) to learn more.

<Tip>

To learn how to use SDXL Turbo for various tasks, how to optimize performance, and other usage examples, take a look at the [SDXL Turbo](../../../using-diffusers/sdxl_turbo) guide.

Check out the [Stability AI](https://huggingface.co/stabilityai) Hub organization for the official model checkpoints!

</Tip>
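As a quick reference, the tips above can be combined into a minimal sketch (assuming the `stabilityai/sdxl-turbo` checkpoint and an illustrative prompt; the explicit scheduler override is shown for completeness and is redundant if the checkpoint already ships with trailing timestep spacing):

```py
import torch
from diffusers import AutoPipelineForText2Image, EulerAncestralDiscreteScheduler

pipeline = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16"
).to("cuda")

# enforce trailing timestep spacing (assumption: usually already set in the checkpoint's scheduler config)
pipeline.scheduler = EulerAncestralDiscreteScheduler.from_config(
    pipeline.scheduler.config, timestep_spacing="trailing"
)

# disable guidance and use 1-4 steps at the native 512x512 resolution
image = pipeline(
    "a photo of a cat", guidance_scale=0.0, num_inference_steps=1, height=512, width=512
).images[0]
```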
## StableDiffusionXLPipeline

[[autodoc]] StableDiffusionXLPipeline
	- all
	- __call__

## StableDiffusionXLImg2ImgPipeline

[[autodoc]] StableDiffusionXLImg2ImgPipeline
	- all
	- __call__

## StableDiffusionXLInpaintPipeline

[[autodoc]] StableDiffusionXLInpaintPipeline
	- all
	- __call__
diff --git a/docs/source/en/using-diffusers/sdxl_turbo.md b/docs/source/en/using-diffusers/sdxl_turbo.md
new file mode 100644
index 000000000000..99e1c7000e3f
--- /dev/null
+++ b/docs/source/en/using-diffusers/sdxl_turbo.md
@@ -0,0 +1,116 @@

# Stable Diffusion XL Turbo

[[open-in-colab]]

SDXL Turbo is an adversarial time-distilled [Stable Diffusion XL](https://huggingface.co/papers/2307.01952) (SDXL) model capable of running inference in as little as 1 step.

This guide will show you how to use SDXL Turbo for text-to-image and image-to-image tasks.

Before you begin, make sure you have the following libraries installed:

```py
# uncomment to install the necessary libraries in Colab
#!pip install -q diffusers transformers accelerate omegaconf
```

## Load model checkpoints

Model weights may be stored in separate subfolders on the Hub or locally, in which case you should use the [`~StableDiffusionXLPipeline.from_pretrained`] method:

```py
from diffusers import AutoPipelineForText2Image, AutoPipelineForImage2Image
import torch

pipeline = AutoPipelineForText2Image.from_pretrained("stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16")
pipeline = pipeline.to("cuda")
```

You can also use the [`~StableDiffusionXLPipeline.from_single_file`] method to load a model checkpoint stored in a single file format (`.ckpt` or `.safetensors`) from the Hub or locally:

```py
from diffusers import StableDiffusionXLPipeline
import torch

pipeline = StableDiffusionXLPipeline.from_single_file(
    "https://huggingface.co/stabilityai/sdxl-turbo/blob/main/sd_xl_turbo_1.0_fp16.safetensors", torch_dtype=torch.float16
)
pipeline = pipeline.to("cuda")
```

## Text-to-image

For text-to-image, pass a text prompt. By default, SDXL Turbo generates a 512x512 image, and that resolution gives the best results. You can try setting the `height` and `width` parameters to 768x768 or 1024x1024, but you should expect quality degradation when doing so.

Make sure to set `guidance_scale` to 0.0 to disable guidance, as the model was trained without it. A single inference step is enough to generate high-quality images, and increasing the number of steps to 2, 3, or 4 should improve image quality further.

```py
from diffusers import AutoPipelineForText2Image
import torch

pipeline_text2image = AutoPipelineForText2Image.from_pretrained("stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16")
pipeline_text2image = pipeline_text2image.to("cuda")

prompt = "A cinematic shot of a baby racoon wearing an intricate italian priest robe."

image = pipeline_text2image(prompt=prompt, guidance_scale=0.0, num_inference_steps=1).images[0]
image
```
<div class="flex justify-center">
    <img src="..." alt="generated image of a racoon in a robe"/>
</div>
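If you do experiment with larger resolutions, one possible variation of the text-to-image call above looks like this (it reuses `pipeline_text2image` and `prompt` from the previous snippet; the 768x768 size and 4 steps are illustrative choices):

```py
# 768x768 is above the model's native 512x512 training resolution, so expect softer results
image_768 = pipeline_text2image(
    prompt=prompt, guidance_scale=0.0, num_inference_steps=4, height=768, width=768
).images[0]
```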
## Image-to-image

For image-to-image generation, make sure that `num_inference_steps * strength` is larger than or equal to 1. The image-to-image pipeline will run for `int(num_inference_steps * strength)` steps, e.g. `int(2 * 0.5) = 1` step in the example below.

```py
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image, make_image_grid

# use from_pipe to avoid consuming additional memory when loading a checkpoint
pipeline = AutoPipelineForImage2Image.from_pipe(pipeline_text2image).to("cuda")

init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/cat.png")
init_image = init_image.resize((512, 512))

prompt = "cat wizard, gandalf, lord of the rings, detailed, fantasy, cute, adorable, Pixar, Disney, 8k"

image = pipeline(prompt, image=init_image, strength=0.5, guidance_scale=0.0, num_inference_steps=2).images[0]
make_image_grid([init_image, image], rows=1, cols=2)
```
<div class="flex justify-center">
    <img src="..." alt="Image-to-image generation sample using SDXL Turbo"/>
</div>
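To see the `int(num_inference_steps * strength)` rule in action, here is a variation of the call above (reusing `pipeline`, `prompt`, and `init_image` from the previous snippet; `strength=0.25` is an illustrative value):

```py
# strength=0.25 with num_inference_steps=2 would run int(2 * 0.25) = 0 steps and fail,
# so raise num_inference_steps to 4 to run int(4 * 0.25) = 1 step
image = pipeline(prompt, image=init_image, strength=0.25, guidance_scale=0.0, num_inference_steps=4).images[0]
```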
## Speed up SDXL Turbo even more

- Compile the UNet if you are using PyTorch 2.0 or later. The first inference run will be very slow, but subsequent ones will be much faster.

```py
pipeline.unet = torch.compile(pipeline.unet, mode="reduce-overhead", fullgraph=True)
```

- When using the default VAE, keep it in `float32` to avoid costly `dtype` conversions before and after each generation. You only need to do this once before your first generation:

```py
pipeline.upcast_vae()
```

As an alternative, you can also use a [16-bit VAE](https://huggingface.co/madebyollin/sdxl-vae-fp16-fix) created by community member [`@madebyollin`](https://huggingface.co/madebyollin) that does not need to be upcast to `float32`.
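For example, here is a minimal sketch of swapping in that community VAE at load time (assuming the same `stabilityai/sdxl-turbo` checkpoint as before; `AutoencoderKL` is the VAE class used by SDXL pipelines):

```py
import torch
from diffusers import AutoencoderKL, AutoPipelineForText2Image

# load the fp16-safe community VAE so no float32 upcasting is needed
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)
pipeline = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/sdxl-turbo", vae=vae, torch_dtype=torch.float16, variant="fp16"
).to("cuda")
```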