diff --git a/docs/source/en/_toctree.yml b/docs/source/en/_toctree.yml index 09012a5c693d..0bed36ad2287 100644 --- a/docs/source/en/_toctree.yml +++ b/docs/source/en/_toctree.yml @@ -33,15 +33,15 @@ - local: using-diffusers/pipeline_overview title: Overview - local: using-diffusers/unconditional_image_generation - title: Unconditional Image Generation + title: Unconditional image generation - local: using-diffusers/conditional_image_generation - title: Text-to-Image Generation + title: Text-to-image generation - local: using-diffusers/img2img - title: Text-Guided Image-to-Image + title: Text-guided image-to-image - local: using-diffusers/inpaint - title: Text-Guided Image-Inpainting + title: Text-guided image-inpainting - local: using-diffusers/depth2img - title: Text-Guided Depth-to-Image + title: Text-guided depth-to-image - local: using-diffusers/reusing_seeds title: Reusing seeds for deterministic generation - local: using-diffusers/reproducibility diff --git a/docs/source/en/using-diffusers/conditional_image_generation.mdx b/docs/source/en/using-diffusers/conditional_image_generation.mdx index edd1cd926734..0b5c02415d87 100644 --- a/docs/source/en/using-diffusers/conditional_image_generation.mdx +++ b/docs/source/en/using-diffusers/conditional_image_generation.mdx @@ -10,22 +10,27 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o specific language governing permissions and limitations under the License. --> -# Conditional Image Generation +# Conditional image generation + +[[open-in-colab]] + +Conditional image generation allows you to generate images from a text prompt. The text is converted into embeddings which are used to condition the model to generate an image from noise. The [`DiffusionPipeline`] is the easiest way to use a pre-trained diffusion system for inference. -Start by creating an instance of [`DiffusionPipeline`] and specify which pipeline checkpoint you would like to download. -You can use the [`DiffusionPipeline`] for any [Diffusers' checkpoint](https://huggingface.co/models?library=diffusers&sort=downloads). -In this guide though, you'll use [`DiffusionPipeline`] for text-to-image generation with [Latent Diffusion](https://huggingface.co/CompVis/ldm-text2im-large-256): +Start by creating an instance of [`DiffusionPipeline`] and specify which pipeline [checkpoint](https://huggingface.co/models?library=diffusers&sort=downloads) you would like to download. + +In this guide, you'll use [`DiffusionPipeline`] for text-to-image generation with [Latent Diffusion](https://huggingface.co/CompVis/ldm-text2im-large-256): ```python >>> from diffusers import DiffusionPipeline >>> generator = DiffusionPipeline.from_pretrained("CompVis/ldm-text2im-large-256") ``` + The [`DiffusionPipeline`] downloads and caches all modeling, tokenization, and scheduling components. -Because the model consists of roughly 1.4 billion parameters, we strongly recommend running it on GPU. -You can move the generator object to GPU, just like you would in PyTorch. +Because the model consists of roughly 1.4 billion parameters, we strongly recommend running it on a GPU. 
+You can move the generator object to a GPU, just like you would in PyTorch: ```python >>> generator.to("cuda") @@ -37,10 +42,19 @@ Now you can use the `generator` on your text prompt: >>> image = generator("An image of a squirrel in Picasso style").images[0] ``` -The output is by default wrapped into a [PIL Image object](https://pillow.readthedocs.io/en/stable/reference/Image.html?highlight=image#the-image-class). +The output is by default wrapped into a [`PIL.Image`](https://pillow.readthedocs.io/en/stable/reference/Image.html?highlight=image#the-image-class) object. -You can save the image by simply calling: +You can save the image by calling: ```python >>> image.save("image_of_squirrel_painting.png") ``` + +Try out the Spaces below, and feel free to play around with the guidance scale parameter to see how it affects the image quality! + + \ No newline at end of file diff --git a/docs/source/en/using-diffusers/depth2img.mdx b/docs/source/en/using-diffusers/depth2img.mdx index eace64c3109a..a4141644b006 100644 --- a/docs/source/en/using-diffusers/depth2img.mdx +++ b/docs/source/en/using-diffusers/depth2img.mdx @@ -10,9 +10,13 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o specific language governing permissions and limitations under the License. --> -# Text-Guided Image-to-Image Generation +# Text-guided depth-to-image generation -The [`StableDiffusionDepth2ImgPipeline`] lets you pass a text prompt and an initial image to condition the generation of new images as well as a `depth_map` to preserve the images' structure. If no `depth_map` is provided, the pipeline will automatically predict the depth via an integrated depth-estimation model. +[[open-in-colab]] + +The [`StableDiffusionDepth2ImgPipeline`] lets you pass a text prompt and an initial image to condition the generation of new images. In addition, you can also pass a `depth_map` to preserve the image structure. If no `depth_map` is provided, the pipeline automatically predicts the depth via an integrated [depth-estimation model](https://github.com/isl-org/MiDaS). + +Start by creating an instance of the [`StableDiffusionDepth2ImgPipeline`]: ```python import torch @@ -25,11 +29,28 @@ pipe = StableDiffusionDepth2ImgPipeline.from_pretrained( "stabilityai/stable-diffusion-2-depth", torch_dtype=torch.float16, ).to("cuda") +``` +Now pass your prompt to the pipeline. You can also pass a `negative_prompt` to prevent certain words from guiding how an image is generated: +```python url = "http://images.cocodataset.org/val2017/000000039769.jpg" init_image = Image.open(requests.get(url, stream=True).raw) prompt = "two tigers" n_prompt = "bad, deformed, ugly, bad anatomy" image = pipe(prompt=prompt, image=init_image, negative_prompt=n_prompt, strength=0.7).images[0] +image ``` + +| Input | Output | +|---------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------| +| | | + +Play around with the Spaces below and see if you notice a difference between generated images with and without a depth map! 
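If you want to make the same comparison locally, a rough sketch (not part of this guide's original code) is to run the plain [`StableDiffusionImg2ImgPipeline`] on the same prompt and inputs; it applies no depth conditioning, so the structure the depth map preserves should be easier to spot. The checkpoint and parameter values below are illustrative choices only:

```python
# Sketch only: compare against a pipeline without depth conditioning.
# Assumes the depth2img snippet above has already run (it defines `prompt`,
# `n_prompt`, and `init_image`) and that a second model fits on the GPU.
from diffusers import StableDiffusionImg2ImgPipeline

img2img_pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

no_depth_image = img2img_pipe(
    prompt=prompt, image=init_image, negative_prompt=n_prompt, strength=0.7
).images[0]
no_depth_image
```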
+ + diff --git a/docs/source/en/using-diffusers/img2img.mdx b/docs/source/en/using-diffusers/img2img.mdx index 6ebe1f0633f0..71540fbf5dd9 100644 --- a/docs/source/en/using-diffusers/img2img.mdx +++ b/docs/source/en/using-diffusers/img2img.mdx @@ -10,11 +10,11 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o specific language governing permissions and limitations under the License. --> -# Text-Guided Image-to-Image Generation +# Text-guided image-to-image generation [[open-in-colab]] -The [`StableDiffusionImg2ImgPipeline`] lets you pass a text prompt and an initial image to condition the generation of new images. This tutorial shows how to use it for text-guided image-to-image generation with Stable Diffusion model. +The [`StableDiffusionImg2ImgPipeline`] lets you pass a text prompt and an initial image to condition the generation of new images. Before you begin, make sure you have all the necessary libraries installed: @@ -22,27 +22,22 @@ Before you begin, make sure you have all the necessary libraries installed: !pip install diffusers transformers ftfy accelerate ``` -Get started by creating a [`StableDiffusionImg2ImgPipeline`] with a pretrained Stable Diffusion model. +Get started by creating a [`StableDiffusionImg2ImgPipeline`] with a pretrained Stable Diffusion model like [`nitrosocke/Ghibli-Diffusion`](https://huggingface.co/nitrosocke/Ghibli-Diffusion). ```python import torch import requests from PIL import Image from io import BytesIO - from diffusers import StableDiffusionImg2ImgPipeline -``` -Load the pipeline: - -```python device = "cuda" -pipe = StableDiffusionImg2ImgPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to( +pipe = StableDiffusionImg2ImgPipeline.from_pretrained("nitrosocke/Ghibli-Diffusion", torch_dtype=torch.float16).to( device ) ``` -Download an initial image and preprocess it so we can pass it to the pipeline: +Download and preprocess an initial image so you can pass it to the pipeline: ```python url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg" @@ -53,61 +48,52 @@ init_image.thumbnail((768, 768)) init_image ``` -![img](https://huggingface.co/datasets/YiYiXu/test-doc-assets/resolve/main/image_2_image_using_diffusers_cell_8_output_0.jpeg) - -Define the prompt and run the pipeline: - -```python -prompt = "A fantasy landscape, trending on artstation" -``` +
-
+💡 `strength` is a value between 0.0 and 1.0 that controls the amount of noise added to the input image. Values that approach 1.0 allow for lots of variations but will also produce images that are not semantically consistent with the input.
-Let's generate two images with same pipeline and seed, but with different values for `strength`:
+Define the prompt (for this checkpoint finetuned on Ghibli-style art, you need to prefix the prompt with the `ghibli style` tokens) and run the pipeline:

```python
+prompt = "ghibli style, a fantasy landscape with castles"
 generator = torch.Generator(device=device).manual_seed(1024)
 image = pipe(prompt=prompt, image=init_image, strength=0.75, guidance_scale=7.5, generator=generator).images[0]
-```
-
-```python
 image
 ```
-![img](https://huggingface.co/datasets/YiYiXu/test-doc-assets/resolve/main/image_2_image_using_diffusers_cell_13_output_0.jpeg)
+
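As a quick local experiment (a sketch, not part of the original guide), you could also reuse the exact same seed with a lower `strength` value; because less noise is added, the result should stay closer to the initial sketch:

```python
# Sketch only: same prompt and seed as above, but with less added noise.
generator = torch.Generator(device=device).manual_seed(1024)
low_strength_image = pipe(
    prompt=prompt, image=init_image, strength=0.5, guidance_scale=7.5, generator=generator
).images[0]
low_strength_image
```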
- -```python -image = pipe(prompt=prompt, image=init_image, strength=0.5, guidance_scale=7.5, generator=generator).images[0] -image -``` - -![img](https://huggingface.co/datasets/YiYiXu/test-doc-assets/resolve/main/image_2_image_using_diffusers_cell_14_output_1.jpeg) - - -As you can see, when using a lower value for `strength`, the generated image is more closer to the original `image`. - -Now let's use a different scheduler - [LMSDiscreteScheduler](https://huggingface.co/docs/diffusers/api/schedulers#diffusers.LMSDiscreteScheduler): +You can also try experimenting with a different scheduler to see how that affects the output: ```python from diffusers import LMSDiscreteScheduler lms = LMSDiscreteScheduler.from_config(pipe.scheduler.config) pipe.scheduler = lms -``` - -```python generator = torch.Generator(device=device).manual_seed(1024) image = pipe(prompt=prompt, image=init_image, strength=0.75, guidance_scale=7.5, generator=generator).images[0] -``` - -```python image ``` -![img](https://huggingface.co/datasets/YiYiXu/test-doc-assets/resolve/main/image_2_image_using_diffusers_cell_19_output_0.jpeg) +
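If you want to go beyond the [`LMSDiscreteScheduler`], here is a small sketch (assuming your installed version of 🧨 Diffusers exposes the `scheduler.compatibles` property) that lists the schedulers this pipeline can use and swaps in the Euler scheduler:

```python
# Sketch only: inspect which schedulers are compatible with this pipeline, then try one.
from diffusers import EulerDiscreteScheduler

print(pipe.scheduler.compatibles)  # assumed to be available in your diffusers version

pipe.scheduler = EulerDiscreteScheduler.from_config(pipe.scheduler.config)
generator = torch.Generator(device=device).manual_seed(1024)
image = pipe(prompt=prompt, image=init_image, strength=0.75, guidance_scale=7.5, generator=generator).images[0]
image
```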
+
+Check out the Spaces below, and try generating images with different values for `strength`. You'll notice that using lower values for `strength` produces images that are more similar to the original image.
+
+Feel free to also switch the scheduler to the [`LMSDiscreteScheduler`] and see how that affects the output.
+
diff --git a/docs/source/en/using-diffusers/inpaint.mdx b/docs/source/en/using-diffusers/inpaint.mdx
index e2084aa13f8b..41a6d4b7e1b2 100644
--- a/docs/source/en/using-diffusers/inpaint.mdx
+++ b/docs/source/en/using-diffusers/inpaint.mdx
@@ -10,9 +10,13 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o
 specific language governing permissions and limitations under the License.
 -->
-# Text-Guided Image-Inpainting
+# Text-guided image-inpainting
-The [`StableDiffusionInpaintPipeline`] lets you edit specific parts of an image by providing a mask and a text prompt. It uses a version of Stable Diffusion specifically trained for in-painting tasks.
+[[open-in-colab]]
+
+The [`StableDiffusionInpaintPipeline`] allows you to edit specific parts of an image by providing a mask and a text prompt. It uses a version of Stable Diffusion, like [`runwayml/stable-diffusion-inpainting`](https://huggingface.co/runwayml/stable-diffusion-inpainting), specifically trained for inpainting tasks.
+
+Get started by loading an instance of the [`StableDiffusionInpaintPipeline`]:
 ```python
 import PIL
@@ -22,7 +26,16 @@ from io import BytesIO
 from diffusers import StableDiffusionInpaintPipeline
+pipeline = StableDiffusionInpaintPipeline.from_pretrained(
+    "runwayml/stable-diffusion-inpainting",
+    torch_dtype=torch.float16,
+)
+pipeline = pipeline.to("cuda")
+```
+
+Download an image and a mask of a dog which you'll eventually replace:
+```python
 def download_image(url):
     response = requests.get(url)
     return PIL.Image.open(BytesIO(response.content)).convert("RGB")
@@ -33,24 +46,31 @@ mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data
 init_image = download_image(img_url).resize((512, 512))
 mask_image = download_image(mask_url).resize((512, 512))
+```
-pipe = StableDiffusionInpaintPipeline.from_pretrained(
-    "runwayml/stable-diffusion-inpainting",
-    torch_dtype=torch.float16,
-)
-pipe = pipe.to("cuda")
+Now you can create a prompt to replace the mask with something else:
+```python
 prompt = "Face of a yellow cat, high resolution, sitting on a park bench"
-image = pipe(prompt=prompt, image=init_image, mask_image=mask_image).images[0]
+image = pipeline(prompt=prompt, image=init_image, mask_image=mask_image).images[0]
 ```
-`image` | `mask_image` | `prompt` | **Output** |
+`image` | `mask_image` | `prompt` | output |
 :-------------------------:|:-------------------------:|:-------------------------:|-------------------------:|
-drawing | drawing | ***Face of a yellow cat, high resolution, sitting on a park bench*** | drawing |
-
+drawing | drawing | ***Face of a yellow cat, high resolution, sitting on a park bench*** | drawing |
-You can also run this example on colab [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/in_painting_with_stable_diffusion_using_diffusers.ipynb)
-A previous experimental implementation of in-painting used a different, lower-quality process. To ensure backwards compatibility, loading a pretrained pipeline that doesn't contain the new model will still apply the old in-painting method.
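Inpainting results can vary quite a bit from seed to seed, so a practical follow-up (a sketch, not part of the original guide) is to generate several candidates in one call and keep the one you like best:

```python
# Sketch only: generate a few candidate inpainting results and save them for comparison.
generator = torch.Generator(device="cuda").manual_seed(0)
images = pipeline(
    prompt=prompt,
    image=init_image,
    mask_image=mask_image,
    num_images_per_prompt=3,
    generator=generator,
).images

for i, candidate in enumerate(images):
    candidate.save(f"inpainted_cat_{i}.png")
```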
+
+A previous experimental implementation of inpainting used a different, lower-quality process. To ensure backwards compatibility, loading a pretrained pipeline that doesn't contain the new model will still apply the old inpainting method.
+
+Check out the Spaces below to try out image inpainting yourself!
+
+
diff --git a/docs/source/en/using-diffusers/unconditional_image_generation.mdx b/docs/source/en/using-diffusers/unconditional_image_generation.mdx
index b1722517cc26..c0888f94c6c1 100644
--- a/docs/source/en/using-diffusers/unconditional_image_generation.mdx
+++ b/docs/source/en/using-diffusers/unconditional_image_generation.mdx
@@ -10,43 +10,60 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o
 specific language governing permissions and limitations under the License.
 -->
+# Unconditional image generation
+[[open-in-colab]]
-# Unconditional Image Generation
+Unconditional image generation is a relatively straightforward task. The model only generates images, without any additional context like text or an image, resembling the data it was trained on.
 The [`DiffusionPipeline`] is the easiest way to use a pre-trained diffusion system for inference.
 Start by creating an instance of [`DiffusionPipeline`] and specify which pipeline checkpoint you would like to download.
-You can use the [`DiffusionPipeline`] for any [Diffusers' checkpoint](https://huggingface.co/models?library=diffusers&sort=downloads).
-In this guide though, you'll use [`DiffusionPipeline`] for unconditional image generation with [DDPM](https://arxiv.org/abs/2006.11239):
+You can use any of the 🧨 Diffusers [checkpoints](https://huggingface.co/models?library=diffusers&sort=downloads) from the Hub (the checkpoint you'll use generates images of butterflies).
+
+
+💡 Want to train your own unconditional image generation model? Take a look at the training [guide](training/unconditional_training) to learn how to generate your own images.
+
+
+
+In this guide, you'll use [`DiffusionPipeline`] for unconditional image generation with [DDPM](https://arxiv.org/abs/2006.11239):
 ```python
 >>> from diffusers import DiffusionPipeline
->>> generator = DiffusionPipeline.from_pretrained("google/ddpm-celebahq-256")
+>>> generator = DiffusionPipeline.from_pretrained("anton-l/ddpm-butterflies-128")
 ```
+
 The [`DiffusionPipeline`] downloads and caches all modeling, tokenization, and scheduling components.
-Because the model consists of roughly 1.4 billion parameters, we strongly recommend running it on GPU.
-You can move the generator object to GPU, just like you would in PyTorch.
+We strongly recommend running the pipeline on a GPU to speed up inference.
+You can move the generator object to a GPU, just like you would in PyTorch:
 ```python
 >>> generator.to("cuda")
 ```
-Now you can use the `generator` on your text prompt:
+Now you can use the `generator` to generate an image:
 ```python
 >>> image = generator().images[0]
 ```
-The output is by default wrapped into a [PIL Image object](https://pillow.readthedocs.io/en/stable/reference/Image.html?highlight=image#the-image-class).
+The output is by default wrapped into a [`PIL.Image`](https://pillow.readthedocs.io/en/stable/reference/Image.html?highlight=image#the-image-class) object.
-You can save the image by simply calling:
+You can save the image by calling:
 ```python
 >>> image.save("generated_image.png")
 ```
+Try out the Spaces below, and feel free to play around with the inference steps parameter to see how it affects the image quality!
+
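As a rough sketch of that experiment (not part of the original guide), you can pass `num_inference_steps` directly to the pipeline call; fewer denoising steps are faster but usually produce noisier, lower-quality samples:

```python
>>> # Sketch only: trade quality for speed by lowering the number of denoising steps.
>>> fast_image = generator(num_inference_steps=50).images[0]
>>> fast_image.save("generated_image_50_steps.png")
```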