diff --git a/docs/source/en/api/pipelines/latent_diffusion_uncond.mdx b/docs/source/en/api/pipelines/latent_diffusion_uncond.mdx
Lines changed: 6 additions & 19 deletions
diff --git a/docs/source/en/api/pipelines/model_editing.mdx b/docs/source/en/api/pipelines/model_editing.mdx
Lines changed: 7 additions & 39 deletions
diff --git a/docs/source/en/api/pipelines/self_attention_guidance.mdx b/docs/source/en/api/pipelines/self_attention_guidance.mdx
Lines changed: 7 additions & 43 deletions
diff --git a/docs/source/en/api/pipelines/semantic_stable_diffusion.mdx b/docs/source/en/api/pipelines/semantic_stable_diffusion.mdx
Lines changed: 7 additions & 57 deletions
diff --git a/docs/source/en/api/pipelines/shap_e.mdx b/docs/source/en/api/pipelines/shap_e.mdx
Lines changed: 7 additions & 19 deletions
@@ -12,31 +12,18 @@ specific language governing permissions and limitations under the License.
 
 # Unconditional Latent Diffusion
 
-## Overview
+Unconditional Latent Diffusion was proposed in [High-Resolution Image Synthesis with Latent Diffusion Models](https://huggingface.co/papers/2112.10752) by Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer.
 
-Unconditional Latent Diffusion was proposed in [High-Resolution Image Synthesis with Latent Diffusion Models](https://arxiv.org/abs/2112.10752) by Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, Björn Ommer.
-
-The abstract of the paper is the following:
+The abstract from the paper is:
 
 *By decomposing the image formation process into a sequential application of denoising autoencoders, diffusion models (DMs) achieve state-of-the-art synthesis results on image data and beyond. Additionally, their formulation allows for a guiding mechanism to control the image generation process without retraining. However, since these models typically operate directly in pixel space, optimization of powerful DMs often consumes hundreds of GPU days and inference is expensive due to sequential evaluations. To enable DM training on limited computational resources while retaining their quality and flexibility, we apply them in the latent space of powerful pretrained autoencoders. In contrast to previous work, training diffusion models on such a representation allows for the first time to reach a near-optimal point between complexity reduction and detail preservation, greatly boosting visual fidelity. By introducing cross-attention layers into the model architecture, we turn diffusion models into powerful and flexible generators for general conditioning inputs such as text or bounding boxes and high-resolution synthesis becomes possible in a convolutional manner. Our latent diffusion models (LDMs) achieve a new state of the art for image inpainting and highly competitive performance on various tasks, including unconditional image generation, semantic scene synthesis, and super-resolution, while significantly reducing computational requirements compared to pixel-based DMs.*
 
-The original codebase can be found [here](https://github.com/CompVis/latent-diffusion).
-
-## Tips:
-
-- 
-- 
-- 
-
-## Available Pipelines:
-
-| Pipeline | Tasks | Colab
-|---|---|:---:|
-| [pipeline_latent_diffusion_uncond.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/latent_diffusion_uncond/pipeline_latent_diffusion_uncond.py) | *Unconditional Image Generation* | - |
-
-## Examples:
+The original codebase can be found at [CompVis/latent-diffusion](https://github.com/CompVis/latent-diffusion).
 
 ## LDMPipeline
 [[autodoc]] LDMPipeline
 	- all
 	- __call__
+
+## ImagePipelineOutput
+[[autodoc]] pipelines.ImagePipelineOutput
@@ -10,52 +10,20 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o
 specific language governing permissions and limitations under the License.
 -->
 
-# Editing Implicit Assumptions in Text-to-Image Diffusion Models
+# Text-to-Image Model Editing
 
-## Overview
+[Editing Implicit Assumptions in Text-to-Image Diffusion Models](https://huggingface.co/papers/2303.08084) is by Hadas Orgad, Bahjat Kawar, and Yonatan Belinkov. This pipeline enables editing the diffusion model weights so that the model's assumptions about a given concept are changed. The resulting change is expected to take effect in all prompt generations related to the edited concept.
 
-[Editing Implicit Assumptions in Text-to-Image Diffusion Models](https://arxiv.org/abs/2303.08084) by Hadas Orgad, Bahjat Kawar, and Yonatan Belinkov.
-
-The abstract of the paper is the following:
+The abstract from the paper is:
 
 *Text-to-image diffusion models often make implicit assumptions about the world when generating images. While some assumptions are useful (e.g., the sky is blue), they can also be outdated, incorrect, or reflective of social biases present in the training data. Thus, there is a need to control these assumptions without requiring explicit user input or costly re-training. In this work, we aim to edit a given implicit assumption in a pre-trained diffusion model. Our Text-to-Image Model Editing method, TIME for short, receives a pair of inputs: a "source" under-specified prompt for which the model makes an implicit assumption (e.g., "a pack of roses"), and a "destination" prompt that describes the same setting, but with a specified desired attribute (e.g., "a pack of blue roses"). TIME then updates the model's cross-attention layers, as these layers assign visual meaning to textual tokens. We edit the projection matrices in these layers such that the source prompt is projected close to the destination prompt. Our method is highly efficient, as it modifies a mere 2.2% of the model's parameters in under one second. To evaluate model editing approaches, we introduce TIMED (TIME Dataset), containing 147 source and destination prompt pairs from various domains. Our experiments (using Stable Diffusion) show that TIME is successful in model editing, generalizes well for related prompts unseen during editing, and imposes minimal effect on unrelated generations.*
 
-Resources:
-
-* [Project Page](https://time-diffusion.github.io/).
-* [Paper](https://arxiv.org/abs/2303.08084).
-* [Original Code](https://github.com/bahjat-kawar/time-diffusion).
-* [Demo](https://huggingface.co/spaces/bahjat-kawar/time-diffusion).
-
-## Available Pipelines:
-
-| Pipeline | Tasks | Demo
-|---|---|:---:|
-| [StableDiffusionModelEditingPipeline](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_model_editing.py) | *Text-to-Image Model Editing* | [🤗 Space](https://huggingface.co/spaces/bahjat-kawar/time-diffusion)) |
-
-This pipeline enables editing the diffusion model weights, such that its assumptions on a given concept are changed. The resulting change is expected to take effect in all prompt generations pertaining to the edited concept.
-
-## Usage example
-
-```python
-import torch
-from diffusers import StableDiffusionModelEditingPipeline
-
-model_ckpt = "CompVis/stable-diffusion-v1-4"
-pipe = StableDiffusionModelEditingPipeline.from_pretrained(model_ckpt)
-
-pipe = pipe.to("cuda")
-
-source_prompt = "A pack of roses"
-destination_prompt = "A pack of blue roses"
-pipe.edit_model(source_prompt, destination_prompt)
-
-prompt = "A field of roses"
-image = pipe(prompt).images[0]
-image.save("field_of_roses.png")
-```
+You can find additional information about model editing on the [project page](https://time-diffusion.github.io/), in the [paper](https://arxiv.org/abs/2303.08084), and in the [original codebase](https://github.com/bahjat-kawar/time-diffusion), and try it out in a [demo](https://huggingface.co/spaces/bahjat-kawar/time-diffusion).
 
 ## StableDiffusionModelEditingPipeline
 [[autodoc]] StableDiffusionModelEditingPipeline
 	- __call__
 	- all
+
+## StableDiffusionPipelineOutput
+[[autodoc]] pipelines.stable_diffusion.StableDiffusionPipelineOutput
@@ -10,56 +10,20 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o
 specific language governing permissions and limitations under the License.
 -->
 
-# Self-Attention Guidance (SAG)
+# Self-Attention Guidance
 
-## Overview
+[Improving Sample Quality of Diffusion Models Using Self-Attention Guidance](https://huggingface.co/papers/2210.00939) is by Susung Hong et al.
 
-[Improving Sample Quality of Diffusion Models Using Self-Attention Guidance](https://arxiv.org/abs/2210.00939) by Susung Hong et al.
-
-The abstract of the paper is the following:
+The abstract from the paper is:
 
 *Denoising diffusion models (DDMs) have attracted attention for their exceptional generation quality and diversity. This success is largely attributed to the use of class- or text-conditional diffusion guidance methods, such as classifier and classifier-free guidance. In this paper, we present a more comprehensive perspective that goes beyond the traditional guidance methods. From this generalized perspective, we introduce novel condition- and training-free strategies to enhance the quality of generated images. As a simple solution, blur guidance improves the suitability of intermediate samples for their fine-scale information and structures, enabling diffusion models to generate higher quality samples with a moderate guidance scale. Improving upon this, Self-Attention Guidance (SAG) uses the intermediate self-attention maps of diffusion models to enhance their stability and efficacy. Specifically, SAG adversarially blurs only the regions that diffusion models attend to at each iteration and guides them accordingly. Our experimental results show that our SAG improves the performance of various diffusion models, including ADM, IDDPM, Stable Diffusion, and DiT. Moreover, combining SAG with conventional guidance methods leads to further improvement.*
 
-Resources:
-
-* [Project Page](https://ku-cvlab.github.io/Self-Attention-Guidance).
-* [Paper](https://arxiv.org/abs/2210.00939).
-* [Original Code](https://github.com/KU-CVLAB/Self-Attention-Guidance).
-* [Hugging Face Demo](https://huggingface.co/spaces/susunghong/Self-Attention-Guidance).
-* [Colab Demo](https://colab.research.google.com/github/SusungHong/Self-Attention-Guidance/blob/main/SAG_Stable.ipynb).
-
-
-## Available Pipelines:
-
-| Pipeline | Tasks | Demo
-|---|---|:---:|
-| [StableDiffusionSAGPipeline](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_sag.py) | *Text-to-Image Generation* | [🤗 Space](https://huggingface.co/spaces/susunghong/Self-Attention-Guidance) |
-
-## Usage example
-
-```python
-import torch
-from diffusers import StableDiffusionSAGPipeline
-from accelerate.utils import set_seed
-
-pipe = StableDiffusionSAGPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16)
-pipe = pipe.to("cuda")
-
-seed = 8978
-prompt = "."
-guidance_scale = 7.5
-num_images_per_prompt = 1
-
-sag_scale = 1.0
-
-set_seed(seed)
-images = pipe(
-    prompt, num_images_per_prompt=num_images_per_prompt, guidance_scale=guidance_scale, sag_scale=sag_scale
-).images
-images[0].save("example.png")
-```
+You can find additional information about Self-Attention Guidance on the [project page](https://ku-cvlab.github.io/Self-Attention-Guidance), in the [paper](https://arxiv.org/abs/2210.00939), and in the [original codebase](https://github.com/KU-CVLAB/Self-Attention-Guidance), and try it out in a [demo](https://huggingface.co/spaces/susunghong/Self-Attention-Guidance) or [notebook](https://colab.research.google.com/github/SusungHong/Self-Attention-Guidance/blob/main/SAG_Stable.ipynb).
 
 ## StableDiffusionSAGPipeline
 [[autodoc]] StableDiffusionSAGPipeline
 	- __call__
 	- all
+
+## StableDiffusionPipelineOutput
+[[autodoc]] pipelines.stable_diffusion.StableDiffusionPipelineOutput
@@ -12,68 +12,18 @@ specific language governing permissions and limitations under the License.
 
 # Semantic Guidance
 
-Semantic Guidance for Diffusion Models was proposed in [SEGA: Instructing Diffusion using Semantic Dimensions](https://arxiv.org/abs/2301.12247) and provides strong semantic control over the image generation.
-Small changes to the text prompt usually result in entirely different output images. However, with SEGA a variety of changes to the image are enabled that can be controlled easily and intuitively, and stay true to the original image composition.
+Semantic Guidance for Diffusion Models was proposed in [SEGA: Instructing Diffusion using Semantic Dimensions](https://huggingface.co/papers/2301.12247) and provides strong semantic control over image generation.
+Small changes to the text prompt usually result in entirely different output images. However, SEGA enables a variety of changes to the image that can be controlled easily and intuitively while staying true to the original image composition.
 
-The abstract of the paper is the following:
+The abstract from the paper is:
 
 *Text-to-image diffusion models have recently received a lot of interest for their astonishing ability to produce high-fidelity images from text only. However, achieving one-shot generation that aligns with the user's intent is nearly impossible, yet small changes to the input prompt often result in very different images. This leaves the user with little semantic control. To put the user in control, we show how to interact with the diffusion process to flexibly steer it along semantic directions. This semantic guidance (SEGA) allows for subtle and extensive edits, changes in composition and style, as well as optimizing the overall artistic conception. We demonstrate SEGA's effectiveness on a variety of tasks and provide evidence for its versatility and flexibility.*
 
-
-*Overview*:
-
-| Pipeline | Tasks | Colab | Demo
-|---|---|:---:|:---:|
-| [pipeline_semantic_stable_diffusion.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/semantic_stable_diffusion/pipeline_semantic_stable_diffusion.py) | *Text-to-Image Generation* |  [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ml-research/semantic-image-editing/blob/main/examples/SemanticGuidance.ipynb) | [Coming Soon](https://huggingface.co/AIML-TUDA)
-
-## Tips
-
-- The Semantic Guidance pipeline can be used with any [Stable Diffusion](./stable_diffusion/text2img) checkpoint.
-
-### Run Semantic Guidance
-
-The interface of [`SemanticStableDiffusionPipeline`] provides several additional parameters to influence the image generation.
-Exemplary usage may look like this:
-
-```python
-import torch
-from diffusers import SemanticStableDiffusionPipeline
-
-pipe = SemanticStableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
-pipe = pipe.to("cuda")
-
-out = pipe(
-    prompt="a photo of the face of a woman",
-    num_images_per_prompt=1,
-    guidance_scale=7,
-    editing_prompt=[
-        "smiling, smile",  # Concepts to apply
-        "glasses, wearing glasses",
-        "curls, wavy hair, curly hair",
-        "beard, full beard, mustache",
-    ],
-    reverse_editing_direction=[False, False, False, False],  # Direction of guidance i.e. increase all concepts
-    edit_warmup_steps=[10, 10, 10, 10],  # Warmup period for each concept
-    edit_guidance_scale=[4, 5, 5, 5.4],  # Guidance scale for each concept
-    edit_threshold=[
-        0.99,
-        0.975,
-        0.925,
-        0.96,
-    ],  # Threshold for each concept. Threshold equals the percentile of the latent space that will be discarded. I.e. threshold=0.99 uses 1% of the latent dimensions
-    edit_momentum_scale=0.3,  # Momentum scale that will be added to the latent guidance
-    edit_mom_beta=0.6,  # Momentum beta
-    edit_weights=[1, 1, 1, 1, 1],  # Weights of the individual concepts against each other
-)
-```
-
-For more examples check the Colab notebook.
-
-## StableDiffusionSafePipelineOutput
-[[autodoc]] pipelines.semantic_stable_diffusion.SemanticStableDiffusionPipelineOutput
-	- all
-
 ## SemanticStableDiffusionPipeline
 [[autodoc]] SemanticStableDiffusionPipeline
 	- all
 	- __call__
+
+## SemanticStableDiffusionPipelineOutput
+[[autodoc]] pipelines.semantic_stable_diffusion.SemanticStableDiffusionPipelineOutput
+	- all
@@ -9,28 +9,13 @@ specific language governing permissions and limitations under the License.
 
 # Shap-E
 
-## Overview
+The Shap-E model was proposed in [Shap-E: Generating Conditional 3D Implicit Functions](https://huggingface.co/papers/2305.02463) by Alex Nichol and Heewon Jun from [OpenAI](https://github.com/openai). 
 
-
-The Shap-E model was proposed in [Shap-E: Generating Conditional 3D Implicit Functions](https://arxiv.org/abs/2305.02463) by Alex Nichol and Heewon Jun from [OpenAI](https://github.com/openai). 
-
-The abstract of the paper is the following:
+The abstract from the paper is:
 
 *We present Shap-E, a conditional generative model for 3D assets. Unlike recent work on 3D generative models which produce a single output representation, Shap-E directly generates the parameters of implicit functions that can be rendered as both textured meshes and neural radiance fields. We train Shap-E in two stages: first, we train an encoder that deterministically maps 3D assets into the parameters of an implicit function; second, we train a conditional diffusion model on outputs of the encoder. When trained on a large dataset of paired 3D and text data, our resulting models are capable of generating complex and diverse 3D assets in a matter of seconds. When compared to Point-E, an explicit generative model over point clouds, Shap-E converges faster and reaches comparable or better sample quality despite modeling a higher-dimensional, multi-representation output space.*
 
-The original codebase can be found [here](https://github.com/openai/shap-e).
-
-## Available Pipelines:
-
-| Pipeline | Tasks |
-|---|---|
-| [pipeline_shap_e.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/shap_e/pipeline_shap_e.py) | *Text-to-Image Generation* | 
-| [pipeline_shap_e_img2img.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/shap_e/pipeline_shap_e_img2img.py) | *Image-to-Image Generation* |
-
-## Available checkpoints 
-
-* [`openai/shap-e`](https://huggingface.co/openai/shap-e)
-* [`openai/shap-e-img2img`](https://huggingface.co/openai/shap-e-img2img)
+The original codebase can be found at [openai/shap-e](https://github.com/openai/shap-e).
 
 ## Usage Examples
 
@@ -136,4 +121,7 @@ gif_path = export_to_gif(images[0], "burger_3d.gif")
 ## ShapEImg2ImgPipeline
 [[autodoc]] ShapEImg2ImgPipeline
 	- all
-	- __call__
+	- __call__
+
+## ShapEPipelineOutput
+[[autodoc]] pipelines.shap_e.pipeline_shap_e.ShapEPipelineOutput