diff --git a/docs/source/en/_toctree.yml b/docs/source/en/_toctree.yml
index 09012a5c693d..0bed36ad2287 100644
--- a/docs/source/en/_toctree.yml
+++ b/docs/source/en/_toctree.yml
@@ -33,15 +33,15 @@
- local: using-diffusers/pipeline_overview
title: Overview
- local: using-diffusers/unconditional_image_generation
- title: Unconditional Image Generation
+ title: Unconditional image generation
- local: using-diffusers/conditional_image_generation
- title: Text-to-Image Generation
+ title: Text-to-image generation
- local: using-diffusers/img2img
- title: Text-Guided Image-to-Image
+ title: Text-guided image-to-image
- local: using-diffusers/inpaint
- title: Text-Guided Image-Inpainting
+ title: Text-guided image inpainting
- local: using-diffusers/depth2img
- title: Text-Guided Depth-to-Image
+ title: Text-guided depth-to-image
- local: using-diffusers/reusing_seeds
title: Reusing seeds for deterministic generation
- local: using-diffusers/reproducibility
diff --git a/docs/source/en/using-diffusers/conditional_image_generation.mdx b/docs/source/en/using-diffusers/conditional_image_generation.mdx
index edd1cd926734..0b5c02415d87 100644
--- a/docs/source/en/using-diffusers/conditional_image_generation.mdx
+++ b/docs/source/en/using-diffusers/conditional_image_generation.mdx
@@ -10,22 +10,27 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o
specific language governing permissions and limitations under the License.
-->
-# Conditional Image Generation
+# Conditional image generation
+
+[[open-in-colab]]
+
+Conditional image generation allows you to generate images from a text prompt. The text is converted into embeddings, which are used to condition the model to generate an image from noise.
The [`DiffusionPipeline`] is the easiest way to use a pre-trained diffusion system for inference.
-Start by creating an instance of [`DiffusionPipeline`] and specify which pipeline checkpoint you would like to download.
-You can use the [`DiffusionPipeline`] for any [Diffusers' checkpoint](https://huggingface.co/models?library=diffusers&sort=downloads).
-In this guide though, you'll use [`DiffusionPipeline`] for text-to-image generation with [Latent Diffusion](https://huggingface.co/CompVis/ldm-text2im-large-256):
+Start by creating an instance of [`DiffusionPipeline`] and specifying which pipeline [checkpoint](https://huggingface.co/models?library=diffusers&sort=downloads) you would like to download.
+
+In this guide, you'll use [`DiffusionPipeline`] for text-to-image generation with [Latent Diffusion](https://huggingface.co/CompVis/ldm-text2im-large-256):
```python
>>> from diffusers import DiffusionPipeline
>>> generator = DiffusionPipeline.from_pretrained("CompVis/ldm-text2im-large-256")
```
+
The [`DiffusionPipeline`] downloads and caches all modeling, tokenization, and scheduling components.
-Because the model consists of roughly 1.4 billion parameters, we strongly recommend running it on GPU.
-You can move the generator object to GPU, just like you would in PyTorch.
+Because the model consists of roughly 1.4 billion parameters, we strongly recommend running it on a GPU.
+You can move the generator object to a GPU, just like you would in PyTorch:
```python
>>> generator.to("cuda")
@@ -37,10 +42,19 @@ Now you can use the `generator` on your text prompt:
>>> image = generator("An image of a squirrel in Picasso style").images[0]
```
-The output is by default wrapped into a [PIL Image object](https://pillow.readthedocs.io/en/stable/reference/Image.html?highlight=image#the-image-class).
+By default, the output is wrapped in a [`PIL.Image`](https://pillow.readthedocs.io/en/stable/reference/Image.html?highlight=image#the-image-class) object.
-You can save the image by simply calling:
+You can save the image by calling:
```python
>>> image.save("image_of_squirrel_painting.png")
```
+
+Try out the Spaces below, and feel free to play around with the guidance scale parameter to see how it affects the image quality!
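+
+If you'd rather experiment in code, the guidance scale corresponds to the `guidance_scale` argument of the pipeline call. The snippet below is a minimal sketch reusing the `generator` from above; `7.5` is only an illustrative value:
+
+```python
+>>> # a higher guidance_scale makes the image follow the prompt more closely, at some cost to diversity
+>>> image = generator("An image of a squirrel in Picasso style", guidance_scale=7.5).images[0]
+>>> image.save("image_of_squirrel_painting_high_guidance.png")
+```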
+
+
\ No newline at end of file
diff --git a/docs/source/en/using-diffusers/depth2img.mdx b/docs/source/en/using-diffusers/depth2img.mdx
index eace64c3109a..a4141644b006 100644
--- a/docs/source/en/using-diffusers/depth2img.mdx
+++ b/docs/source/en/using-diffusers/depth2img.mdx
@@ -10,9 +10,13 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o
specific language governing permissions and limitations under the License.
-->
-# Text-Guided Image-to-Image Generation
+# Text-guided depth-to-image generation
-The [`StableDiffusionDepth2ImgPipeline`] lets you pass a text prompt and an initial image to condition the generation of new images as well as a `depth_map` to preserve the images' structure. If no `depth_map` is provided, the pipeline will automatically predict the depth via an integrated depth-estimation model.
+[[open-in-colab]]
+
+The [`StableDiffusionDepth2ImgPipeline`] lets you pass a text prompt and an initial image to condition the generation of new images. You can also pass a `depth_map` to preserve the image structure. If no `depth_map` is provided, the pipeline automatically predicts the depth via an integrated [depth-estimation model](https://github.com/isl-org/MiDaS).
+
+Start by creating an instance of the [`StableDiffusionDepth2ImgPipeline`]:
```python
import torch
@@ -25,11 +29,28 @@ pipe = StableDiffusionDepth2ImgPipeline.from_pretrained(
"stabilityai/stable-diffusion-2-depth",
torch_dtype=torch.float16,
).to("cuda")
+```
+Now pass your prompt to the pipeline. You can also pass a `negative_prompt` to prevent certain words from guiding how an image is generated:
+```python
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
init_image = Image.open(requests.get(url, stream=True).raw)
prompt = "two tigers"
n_prompt = "bad, deformed, ugly, bad anatomy"
image = pipe(prompt=prompt, image=init_image, negative_prompt=n_prompt, strength=0.7).images[0]
+image
```
+
+| Input | Output |
+|-------|--------|
+|       |        |
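+
+If you already have a depth map (for example, one computed offline with MiDaS), you can pass it to the pipeline directly instead of relying on the built-in estimator. The snippet below is a minimal sketch, assuming `depth_map` is a precomputed `torch` tensor in the layout the pipeline expects:
+
+```python
+# `depth_map` is assumed to be a depth tensor you prepared yourself;
+# when it is provided, the pipeline skips its internal depth estimation
+image = pipe(prompt=prompt, image=init_image, depth_map=depth_map, negative_prompt=n_prompt, strength=0.7).images[0]
+image
+```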
+
+Play around with the Spaces below and see if you notice a difference between generated images with and without a depth map!
+
+
diff --git a/docs/source/en/using-diffusers/img2img.mdx b/docs/source/en/using-diffusers/img2img.mdx
index 6ebe1f0633f0..71540fbf5dd9 100644
--- a/docs/source/en/using-diffusers/img2img.mdx
+++ b/docs/source/en/using-diffusers/img2img.mdx
@@ -10,11 +10,11 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o
specific language governing permissions and limitations under the License.
-->
-# Text-Guided Image-to-Image Generation
+# Text-guided image-to-image generation
[[open-in-colab]]
-The [`StableDiffusionImg2ImgPipeline`] lets you pass a text prompt and an initial image to condition the generation of new images. This tutorial shows how to use it for text-guided image-to-image generation with Stable Diffusion model.
+The [`StableDiffusionImg2ImgPipeline`] lets you pass a text prompt and an initial image to condition the generation of new images.
Before you begin, make sure you have all the necessary libraries installed:
@@ -22,27 +22,22 @@ Before you begin, make sure you have all the necessary libraries installed:
!pip install diffusers transformers ftfy accelerate
```
-Get started by creating a [`StableDiffusionImg2ImgPipeline`] with a pretrained Stable Diffusion model.
+Get started by creating a [`StableDiffusionImg2ImgPipeline`] with a pretrained Stable Diffusion model like [`nitrosocke/Ghibli-Diffusion`](https://huggingface.co/nitrosocke/Ghibli-Diffusion):
```python
import torch
import requests
from PIL import Image
from io import BytesIO
-
from diffusers import StableDiffusionImg2ImgPipeline
-```
-Load the pipeline:
-
-```python
device = "cuda"
-pipe = StableDiffusionImg2ImgPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to(
+pipe = StableDiffusionImg2ImgPipeline.from_pretrained("nitrosocke/Ghibli-Diffusion", torch_dtype=torch.float16).to(
device
)
```
-Download an initial image and preprocess it so we can pass it to the pipeline:
+Download and preprocess an initial image so you can pass it to the pipeline:
```python
url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"
@@ -53,61 +48,52 @@ init_image.thumbnail((768, 768))
init_image
```
-
-
-Define the prompt and run the pipeline:
-
-```python
-prompt = "A fantasy landscape, trending on artstation"
-```
+