
Commit b2598f1

Merge branch 'main' of https://github.com/huggingface/diffusers into diffedit-inpainting-pipeline
2 parents: eda67c2 + b10f527

File tree: 73 files changed, +1501 -268 lines changed


docs/source/en/_toctree.yml

Lines changed: 2 additions & 0 deletions
@@ -191,6 +191,8 @@
   title: MultiDiffusion Panorama
 - local: api/pipelines/stable_diffusion/controlnet
   title: Text-to-Image Generation with ControlNet Conditioning
+- local: api/pipelines/stable_diffusion/model_editing
+  title: Text-to-Image Model Editing
 - local: api/pipelines/stable_diffusion/diffedit
   title: DiffEdit
   title: Stable Diffusion
docs/source/en/api/pipelines/stable_diffusion/model_editing.mdx

Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Editing Implicit Assumptions in Text-to-Image Diffusion Models

## Overview

[Editing Implicit Assumptions in Text-to-Image Diffusion Models](https://arxiv.org/abs/2303.08084) by Hadas Orgad, Bahjat Kawar, and Yonatan Belinkov.

The abstract of the paper is the following:

*Text-to-image diffusion models often make implicit assumptions about the world when generating images. While some assumptions are useful (e.g., the sky is blue), they can also be outdated, incorrect, or reflective of social biases present in the training data. Thus, there is a need to control these assumptions without requiring explicit user input or costly re-training. In this work, we aim to edit a given implicit assumption in a pre-trained diffusion model. Our Text-to-Image Model Editing method, TIME for short, receives a pair of inputs: a "source" under-specified prompt for which the model makes an implicit assumption (e.g., "a pack of roses"), and a "destination" prompt that describes the same setting, but with a specified desired attribute (e.g., "a pack of blue roses"). TIME then updates the model's cross-attention layers, as these layers assign visual meaning to textual tokens. We edit the projection matrices in these layers such that the source prompt is projected close to the destination prompt. Our method is highly efficient, as it modifies a mere 2.2% of the model's parameters in under one second. To evaluate model editing approaches, we introduce TIMED (TIME Dataset), containing 147 source and destination prompt pairs from various domains. Our experiments (using Stable Diffusion) show that TIME is successful in model editing, generalizes well for related prompts unseen during editing, and imposes minimal effect on unrelated generations.*
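The abstract describes the edit as moving the cross-attention projections of the source prompt close to those of the destination prompt. One way to realize such an edit is a ridge-regularized least-squares update of each key/value projection matrix; the sketch below illustrates that idea only and is not the pipeline's internal implementation (the helper name, tensor shapes, and `lam` are assumptions):

```python
import torch


def edit_projection(w_old, c_src, c_dst, lam=0.1):
    """Illustrative closed-form edit of one cross-attention projection.

    w_old: (d_out, d_emb) key or value projection matrix
    c_src: (n_tokens, d_emb) text embeddings of the source prompt
    c_dst: (n_tokens, d_emb) text embeddings of the destination prompt
    lam:   regularization weight that keeps the edited matrix close to w_old
    """
    d_emb = w_old.shape[1]
    # Map source-token embeddings to where the destination tokens were projected,
    # while penalizing deviation from the original weights.
    numerator = lam * w_old + (w_old @ c_dst.T) @ c_src
    denominator = lam * torch.eye(d_emb) + c_src.T @ c_src
    return numerator @ torch.linalg.inv(denominator)
```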
Resources:

* [Project Page](https://time-diffusion.github.io/).
* [Paper](https://arxiv.org/abs/2303.08084).
* [Original Code](https://github.com/bahjat-kawar/time-diffusion).
* [Demo](https://huggingface.co/spaces/bahjat-kawar/time-diffusion).

## Available Pipelines:

| Pipeline | Tasks | Demo |
|---|---|:---:|
| [StableDiffusionModelEditingPipeline](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_model_editing.py) | *Text-to-Image Model Editing* | [🤗 Space](https://huggingface.co/spaces/bahjat-kawar/time-diffusion) |

This pipeline enables editing the diffusion model weights so that its assumptions about a given concept are changed. The resulting change is expected to take effect in all prompt generations related to the edited concept.
## Usage example

```python
import torch
from diffusers import StableDiffusionModelEditingPipeline

model_ckpt = "CompVis/stable-diffusion-v1-4"
pipe = StableDiffusionModelEditingPipeline.from_pretrained(model_ckpt)

pipe = pipe.to("cuda")

source_prompt = "A pack of roses"
destination_prompt = "A pack of blue roses"
pipe.edit_model(source_prompt, destination_prompt)

prompt = "A field of roses"
image = pipe(prompt).images[0]
image.save("field_of_roses.png")
```
## StableDiffusionModelEditingPipeline
[[autodoc]] StableDiffusionModelEditingPipeline
    - __call__
    - all

docs/source/en/api/pipelines/stable_diffusion/overview.mdx

Lines changed: 1 addition & 0 deletions
@@ -35,6 +35,7 @@ For more details about how Stable Diffusion works and how it differs from the ba
 | [StableDiffusionInstructPix2PixPipeline](./pix2pix) | **Experimental** *Text-Based Image Editing * | | [InstructPix2Pix: Learning to Follow Image Editing Instructions](https://huggingface.co/spaces/timbrooks/instruct-pix2pix)
 | [StableDiffusionAttendAndExcitePipeline](./attend_and_excite) | **Experimental** *Text-to-Image Generation * | | [Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models](https://huggingface.co/spaces/AttendAndExcite/Attend-and-Excite)
 | [StableDiffusionPix2PixZeroPipeline](./pix2pix_zero) | **Experimental** *Text-Based Image Editing * | | [Zero-shot Image-to-Image Translation](https://arxiv.org/abs/2302.03027)
+| [StableDiffusionModelEditingPipeline](./model_editing) | **Experimental** *Text-to-Image Model Editing * | | [Editing Implicit Assumptions in Text-to-Image Diffusion Models](https://arxiv.org/abs/2303.08084)
docs/source/en/api/pipelines/stable_unclip.mdx

Lines changed: 21 additions & 19 deletions
@@ -16,6 +16,10 @@ Stable unCLIP checkpoints are finetuned from [stable diffusion 2.1](./stable_dif
 Stable unCLIP also still conditions on text embeddings. Given the two separate conditionings, stable unCLIP can be used
 for text guided image variation. When combined with an unCLIP prior, it can also be used for full text to image generation.

+To know more about the unCLIP process, check out the following paper:
+
+[Hierarchical Text-Conditional Image Generation with CLIP Latents](https://arxiv.org/abs/2204.06125) by Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, Mark Chen.
+
 ## Tips

 Stable unCLIP takes a `noise_level` as input during inference. `noise_level` determines how much noise is added
@@ -24,23 +28,15 @@ we do not add any additional noise to the image embeddings i.e. `noise_level = 0

 ### Available checkpoints:

-TODO
+* Image variation
+  * [stabilityai/stable-diffusion-2-1-unclip](https://hf.co/stabilityai/stable-diffusion-2-1-unclip)
+  * [stabilityai/stable-diffusion-2-1-unclip-small](https://hf.co/stabilityai/stable-diffusion-2-1-unclip-small)
+* Text-to-image
+  * Coming soon!

 ### Text-to-Image Generation

-```python
-import torch
-from diffusers import StableUnCLIPPipeline
-
-pipe = StableUnCLIPPipeline.from_pretrained(
-    "fusing/stable-unclip-2-1-l", torch_dtype=torch.float16
-)  # TODO update model path
-pipe = pipe.to("cuda")
-
-prompt = "a photo of an astronaut riding a horse on mars"
-images = pipe(prompt).images
-images[0].save("astronaut_horse.png")
-```
+Coming soon!

 ### Text guided Image-to-Image Variation
@@ -54,19 +50,25 @@ from io import BytesIO
 from diffusers import StableUnCLIPImg2ImgPipeline

 pipe = StableUnCLIPImg2ImgPipeline.from_pretrained(
-    "fusing/stable-unclip-2-1-l-img2img", torch_dtype=torch.float16
-)  # TODO update model path
+    "stabilityai/stable-diffusion-2-1-unclip", torch_dtype=torch.float16, variant="fp16"
+)
 pipe = pipe.to("cuda")

-url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"
+url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/stable_unclip/tarsila_do_amaral.png"

 response = requests.get(url)
 init_image = Image.open(BytesIO(response.content)).convert("RGB")
-init_image = init_image.resize((768, 512))

+images = pipe(init_image).images
+images[0].save("fantasy_landscape.png")
+```
+
+Optionally, you can also pass a prompt to `pipe` such as:
+
+```python
 prompt = "A fantasy landscape, trending on artstation"

-images = pipe(prompt, init_image).images
+images = pipe(init_image, prompt=prompt).images
 images[0].save("fantasy_landscape.png")
 ```
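The `noise_level` input mentioned in the Tips hunk above can be passed at inference time. The snippet below is a minimal sketch that reuses the checkpoint and image from the diff with an explicit `noise_level`; the value 100 is arbitrary, and the kwarg is assumed to be accepted by the pipeline's `__call__` as the Tips section describes:

```python
import requests
from io import BytesIO

import torch
from PIL import Image
from diffusers import StableUnCLIPImg2ImgPipeline

pipe = StableUnCLIPImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1-unclip", torch_dtype=torch.float16, variant="fp16"
)
pipe = pipe.to("cuda")

url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/stable_unclip/tarsila_do_amaral.png"
init_image = Image.open(BytesIO(requests.get(url).content)).convert("RGB")

# noise_level controls how much noise is added to the image embeddings before
# conditioning; 0 stays closest to the input image, larger values vary more.
images = pipe(init_image, noise_level=100).images
images[0].save("noisier_variation.png")
```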

docs/source/en/index.mdx

Lines changed: 2 additions & 1 deletion
@@ -76,6 +76,7 @@ The library has three main components:
 | [stable_diffusion_self_attention_guidance](./api/pipelines/stable_diffusion/self_attention_guidance) | [Improving Sample Quality of Diffusion Models Using Self-Attention Guidance](https://arxiv.org/abs/2210.00939) | Text-to-Image Generation |
 | [stable_diffusion_image_variation](./stable_diffusion/image_variation) | [Stable Diffusion Image Variations](https://github.com/LambdaLabsML/lambda-diffusers#stable-diffusion-image-variations) | Image-to-Image Generation |
 | [stable_diffusion_latent_upscale](./stable_diffusion/latent_upscale) | [Stable Diffusion Latent Upscaler](https://twitter.com/StabilityAI/status/1590531958815064065) | Text-Guided Super Resolution Image-to-Image |
+| [stable_diffusion_model_editing](./api/pipelines/stable_diffusion/model_editing) | [Editing Implicit Assumptions in Text-to-Image Diffusion Models](https://time-diffusion.github.io/) | Text-to-Image Model Editing |
 | [stable_diffusion_2](./api/pipelines/stable_diffusion_2) | [Stable Diffusion 2](https://stability.ai/blog/stable-diffusion-v2-release) | Text-to-Image Generation |
 | [stable_diffusion_2](./api/pipelines/stable_diffusion_2) | [Stable Diffusion 2](https://stability.ai/blog/stable-diffusion-v2-release) | Text-Guided Image Inpainting |
 | [stable_diffusion_2](./api/pipelines/stable_diffusion_2) | [Depth-Conditional Stable Diffusion](https://github.com/Stability-AI/stablediffusion#depth-conditional-stable-diffusion) | Depth-to-Image Generation |
@@ -89,4 +90,4 @@ The library has three main components:
 | [versatile_diffusion](./api/pipelines/versatile_diffusion) | [Versatile Diffusion: Text, Images and Variations All in One Diffusion Model](https://arxiv.org/abs/2211.08332) | Text-to-Image Generation |
 | [versatile_diffusion](./api/pipelines/versatile_diffusion) | [Versatile Diffusion: Text, Images and Variations All in One Diffusion Model](https://arxiv.org/abs/2211.08332) | Image Variations Generation |
 | [versatile_diffusion](./api/pipelines/versatile_diffusion) | [Versatile Diffusion: Text, Images and Variations All in One Diffusion Model](https://arxiv.org/abs/2211.08332) | Dual Image and Text Guided Generation |
-| [vq_diffusion](./api/pipelines/vq_diffusion) | [Vector Quantized Diffusion Model for Text-to-Image Synthesis](https://arxiv.org/abs/2111.14822) | Text-to-Image Generation |
+| [vq_diffusion](./api/pipelines/vq_diffusion) | [Vector Quantized Diffusion Model for Text-to-Image Synthesis](https://arxiv.org/abs/2111.14822) | Text-to-Image Generation |

examples/README.md

Lines changed: 2 additions & 0 deletions
@@ -42,6 +42,8 @@ Training examples show how to pretrain or fine-tune diffusion models for a varie
 | [**Text-to-Image fine-tuning**](./text_to_image) |||
 | [**Textual Inversion**](./textual_inversion) | ✅ | - | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/sd_textual_inversion_training.ipynb)
 | [**Dreambooth**](./dreambooth) | ✅ | - | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/sd_dreambooth_training.ipynb)
+| [**ControlNet**](./controlnet) | ✅ | ✅ | -
+| [**InstructPix2Pix**](./instruct_pix2pix) | ✅ | ✅ | -
 | [**Reinforcement Learning for Control**](https://github.com/huggingface/diffusers/blob/main/examples/rl/run_diffusers_locomotion.py) | - | - | coming soon.

 ## Community

examples/community/checkpoint_merger.py

Lines changed: 8 additions & 12 deletions
@@ -199,24 +199,20 @@ def merge(self, pretrained_model_name_or_path_list: List[Union[str, os.PathLike]
             if not attr.startswith("_"):
                 checkpoint_path_1 = os.path.join(cached_folders[1], attr)
                 if os.path.exists(checkpoint_path_1):
-                    files = list(
-                        (
-                            *glob.glob(os.path.join(checkpoint_path_1, "*.safetensors")),
-                            *glob.glob(os.path.join(checkpoint_path_1, "*.bin")),
-                        )
-                    )
+                    files = [
+                        *glob.glob(os.path.join(checkpoint_path_1, "*.safetensors")),
+                        *glob.glob(os.path.join(checkpoint_path_1, "*.bin")),
+                    ]
                     checkpoint_path_1 = files[0] if len(files) > 0 else None
                 if len(cached_folders) < 3:
                     checkpoint_path_2 = None
                 else:
                     checkpoint_path_2 = os.path.join(cached_folders[2], attr)
                     if os.path.exists(checkpoint_path_2):
-                        files = list(
-                            (
-                                *glob.glob(os.path.join(checkpoint_path_2, "*.safetensors")),
-                                *glob.glob(os.path.join(checkpoint_path_2, "*.bin")),
-                            )
-                        )
+                        files = [
+                            *glob.glob(os.path.join(checkpoint_path_2, "*.safetensors")),
+                            *glob.glob(os.path.join(checkpoint_path_2, "*.bin")),
+                        ]
                         checkpoint_path_2 = files[0] if len(files) > 0 else None
                 # For an attr if both checkpoint_path_1 and 2 are None, ignore.
                 # If atleast one is present, deal with it according to interp method, of course only if the state_dict keys match.
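The new list literal builds the same flat list of matched weight files as the old `list((*..., *...))` form, just without the throwaway inner tuple. A standalone sketch of the pattern (the helper name and folder path are illustrative, not part of the script):

```python
import glob
import os


def find_weight_files(folder):
    # Unpack both glob results into one flat list; .safetensors matches come
    # first, so taking files[0] below prefers them over .bin files.
    return [
        *glob.glob(os.path.join(folder, "*.safetensors")),
        *glob.glob(os.path.join(folder, "*.bin")),
    ]


files = find_weight_files("./unet")
checkpoint_path = files[0] if files else None
```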

examples/community/imagic_stable_diffusion.py

Lines changed: 1 addition & 1 deletion
@@ -48,7 +48,7 @@

 def preprocess(image):
     w, h = image.size
-    w, h = map(lambda x: x - x % 32, (w, h))  # resize to integer multiple of 32
+    w, h = (x - x % 32 for x in (w, h))  # resize to integer multiple of 32
     image = image.resize((w, h), resample=PIL_INTERPOLATION["lanczos"])
     image = np.array(image).astype(np.float32) / 255.0
     image = image[None].transpose(0, 3, 1, 2)
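Both the removed `map(lambda ...)` call and the new generator expression round each dimension down to the nearest multiple of 32 before resizing; unpacking into `w, h` works because the iterable yields exactly two values. A quick illustrative check with arbitrary sizes:

```python
# Arbitrary example dimensions, not taken from the repository.
w, h = 513, 767
w, h = (x - x % 32 for x in (w, h))  # same result as map(lambda x: x - x % 32, (w, h))
assert (w, h) == (512, 736)
```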

examples/community/lpw_stable_diffusion.py

Lines changed: 2 additions & 2 deletions
@@ -376,7 +376,7 @@ def get_weighted_text_embeddings(

 def preprocess_image(image):
     w, h = image.size
-    w, h = map(lambda x: x - x % 32, (w, h))  # resize to integer multiple of 32
+    w, h = (x - x % 32 for x in (w, h))  # resize to integer multiple of 32
     image = image.resize((w, h), resample=PIL_INTERPOLATION["lanczos"])
     image = np.array(image).astype(np.float32) / 255.0
     image = image[None].transpose(0, 3, 1, 2)
@@ -387,7 +387,7 @@ def preprocess_image(image):
 def preprocess_mask(mask, scale_factor=8):
     mask = mask.convert("L")
     w, h = mask.size
-    w, h = map(lambda x: x - x % 32, (w, h))  # resize to integer multiple of 32
+    w, h = (x - x % 32 for x in (w, h))  # resize to integer multiple of 32
     mask = mask.resize((w // scale_factor, h // scale_factor), resample=PIL_INTERPOLATION["nearest"])
     mask = np.array(mask).astype(np.float32) / 255.0
     mask = np.tile(mask, (4, 1, 1))

examples/community/lpw_stable_diffusion_onnx.py

Lines changed: 2 additions & 2 deletions
@@ -403,7 +403,7 @@ def get_weighted_text_embeddings(

 def preprocess_image(image):
     w, h = image.size
-    w, h = map(lambda x: x - x % 32, (w, h))  # resize to integer multiple of 32
+    w, h = (x - x % 32 for x in (w, h))  # resize to integer multiple of 32
     image = image.resize((w, h), resample=PIL_INTERPOLATION["lanczos"])
     image = np.array(image).astype(np.float32) / 255.0
     image = image[None].transpose(0, 3, 1, 2)
@@ -413,7 +413,7 @@ def preprocess_image(image):
 def preprocess_mask(mask, scale_factor=8):
     mask = mask.convert("L")
     w, h = mask.size
-    w, h = map(lambda x: x - x % 32, (w, h))  # resize to integer multiple of 32
+    w, h = (x - x % 32 for x in (w, h))  # resize to integer multiple of 32
     mask = mask.resize((w // scale_factor, h // scale_factor), resample=PIL_INTERPOLATION["nearest"])
     mask = np.array(mask).astype(np.float32) / 255.0
     mask = np.tile(mask, (4, 1, 1))
