
Commit f0a6728

final review
1 parent 1160b7d commit f0a6728

5 files changed: +9 -15 lines


docs/source/en/api/pipelines/alt_diffusion.mdx

Lines changed: 4 additions & 0 deletions

@@ -18,6 +18,10 @@ The abstract from the paper is:

*In this work, we present a conceptually simple and effective method to train a strong bilingual multimodal representation model. Starting from the pretrained multimodal representation model CLIP released by OpenAI, we switched its text encoder with a pretrained multilingual text encoder XLM-R, and aligned both languages and image representations by a two-stage training schema consisting of teacher learning and contrastive learning. We validate our method through evaluations of a wide range of tasks. We set new state-of-the-art performances on a bunch of tasks including ImageNet-CN, Flicker30k-CN, and COCO-CN. Further, we obtain very close performances with CLIP on almost all tasks, suggesting that one can simply alter the text encoder in CLIP for extended capabilities such as multilingual understanding.*

+ ## Tips
+
+ `AltDiffusion` is conceptually the same as [Stable Diffusion](./stable_diffusion/overview).
+
<Tip>

Make sure to check out the Schedulers [guide](/using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](/using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.
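
The new Tip and the scheduler note above translate into ordinary Stable Diffusion-style usage. Below is a minimal sketch of what that looks like; the checkpoint id `BAAI/AltDiffusion-m9`, the prompt, and the scheduler choice are illustrative assumptions, not part of this commit.

```python
import torch
from diffusers import AltDiffusionPipeline, DPMSolverMultistepScheduler

# Load AltDiffusion the same way as a Stable Diffusion checkpoint.
# The model id "BAAI/AltDiffusion-m9" is an assumption for illustration.
pipe = AltDiffusionPipeline.from_pretrained("BAAI/AltDiffusion-m9", torch_dtype=torch.float16)

# Swap the scheduler without reloading the other components, as the Tip suggests.
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("cuda")

# The multilingual text encoder accepts non-English prompts.
prompt = "一只戴着墨镜的猫, 高质量, 数字绘画"  # Chinese for "a cat wearing sunglasses, high quality, digital painting"
image = pipe(prompt, num_inference_steps=25).images[0]
image.save("alt_diffusion_cat.png")
```

Assigning a new scheduler via `pipe.scheduler` reuses all of the already loaded components, which is the speed/quality trade-off the schedulers guide discusses.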

docs/source/en/api/pipelines/audioldm.mdx

Lines changed: 1 addition & 1 deletion

@@ -21,7 +21,7 @@ The abstract from the paper is:

*Text-to-audio (TTA) system has recently gained attention for its ability to synthesize general audio based on text descriptions. However, previous studies in TTA have limited generation quality with high computational costs. In this study, we propose AudioLDM, a TTA system that is built on a latent space to learn the continuous audio representations from contrastive language-audio pretraining (CLAP) latents. The pretrained CLAP models enable us to train LDMs with audio embedding while providing text embedding as a condition during sampling. By learning the latent representations of audio signals and their compositions without modeling the cross-modal relationship, AudioLDM is advantageous in both generation quality and computational efficiency. Trained on AudioCaps with a single GPU, AudioLDM achieves state-of-the-art TTA performance measured by both objective and subjective metrics (e.g., frechet distance). Moreover, AudioLDM is the first TTA system that enables various text-guided audio manipulations (e.g., style transfer) in a zero-shot fashion. Our implementation and demos are available at https://audioldm.github.io.*

- The original codebase can be found at [haoheliu/AudioLDM](https://github.com/haoheliu/AudioLDM), and the pipeline was contributed by [sanchit-gandhi](https://huggingface.co/sanchit-gandhi).
+ The original codebase can be found at [haoheliu/AudioLDM](https://github.com/haoheliu/AudioLDM).

## Tips
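
For reference alongside this page, a minimal text-to-audio sketch with `AudioLDMPipeline` is shown below; the checkpoint id `cvssp/audioldm-s-full-v2`, the prompt, and the output path are illustrative assumptions, not part of this commit.

```python
import torch
from scipy.io import wavfile
from diffusers import AudioLDMPipeline

# The checkpoint "cvssp/audioldm-s-full-v2" is an assumption for illustration.
pipe = AudioLDMPipeline.from_pretrained("cvssp/audioldm-s-full-v2", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

prompt = "Techno music with a strong, upbeat tempo and high melodic riffs"
audio = pipe(prompt, num_inference_steps=10, audio_length_in_s=5.0).audios[0]

# AudioLDM produces mono waveforms at a 16 kHz sampling rate.
wavfile.write("techno.wav", rate=16000, data=audio)
```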

docs/source/en/api/pipelines/consistency_models.mdx

Lines changed: 0 additions & 6 deletions

@@ -34,12 +34,6 @@ For an additional speed-up, use `torch.compile` to generate multiple images in <

image.show()
```

- <Tip>
-
- Make sure to check out the Schedulers [guide](/using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](/using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.
-
- </Tip>
-

## ConsistencyModelPipeline

[[autodoc]] ConsistencyModelPipeline
- all
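
The hunk context mentions using `torch.compile` to speed up repeated generation; a minimal sketch of that pattern with `ConsistencyModelPipeline` follows. The checkpoint id `openai/diffusers-cd_imagenet64_l2` and the loop are illustrative assumptions, not part of this commit.

```python
import torch
from diffusers import ConsistencyModelPipeline

# The checkpoint "openai/diffusers-cd_imagenet64_l2" is an assumption for illustration.
pipe = ConsistencyModelPipeline.from_pretrained(
    "openai/diffusers-cd_imagenet64_l2", torch_dtype=torch.float16
).to("cuda")

# Compile the UNet once; subsequent calls reuse the compiled graph.
pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)

# Consistency models support single-step sampling, so each image needs only one UNet pass.
for i in range(4):
    image = pipe(num_inference_steps=1).images[0]
    image.save(f"consistency_{i}.png")
```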

docs/source/en/api/pipelines/ddim.mdx

Lines changed: 1 addition & 7 deletions

@@ -18,13 +18,7 @@ The abstract from the paper is:

*Denoising diffusion probabilistic models (DDPMs) have achieved high quality image generation without adversarial training, yet they require simulating a Markov chain for many steps to produce a sample. To accelerate sampling, we present denoising diffusion implicit models (DDIMs), a more efficient class of iterative implicit probabilistic models with the same training procedure as DDPMs. In DDPMs, the generative process is defined as the reverse of a Markovian diffusion process. We construct a class of non-Markovian diffusion processes that lead to the same training objective, but whose reverse process can be much faster to sample from. We empirically demonstrate that DDIMs can produce high quality samples 10× to 50× faster in terms of wall-clock time compared to DDPMs, allow us to trade off computation for sample quality, and can perform semantically meaningful image interpolation directly in the latent space.*

- The original codebase can be found at [ermongroup/ddim](https://github.com/ermongroup/ddim), and you can contact the author at [tsong.me](https://tsong.me/).
-
- <Tip>
-
- Make sure to check out the Schedulers [guide](/using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](/using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.
-
- </Tip>
+ The original codebase can be found at [ermongroup/ddim](https://github.com/ermongroup/ddim).

## DDIMPipeline

[[autodoc]] DDIMPipeline
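
As a quick reference alongside the `DDIMPipeline` autodoc entry, a minimal unconditional sampling sketch follows; the checkpoint id `google/ddpm-ema-celebahq-256` and the step count are illustrative assumptions, not part of this commit.

```python
from diffusers import DDIMPipeline

# DDPM-trained weights can be sampled with the DDIM procedure.
# The checkpoint "google/ddpm-ema-celebahq-256" is an assumption for illustration.
pipe = DDIMPipeline.from_pretrained("google/ddpm-ema-celebahq-256").to("cuda")

# eta=0.0 gives the deterministic DDIM sampler, using far fewer steps than the
# full 1000-step DDPM Markov chain described in the abstract above.
image = pipe(num_inference_steps=50, eta=0.0).images[0]
image.save("ddim_sample.png")
```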

docs/source/en/api/pipelines/diffedit.mdx

Lines changed: 3 additions & 1 deletion

@@ -18,7 +18,9 @@ The abstract from the paper is:

*Image generation has recently seen tremendous advances, with diffusion models allowing to synthesize convincing images for a large variety of text prompts. In this article, we propose DiffEdit, a method to take advantage of text-conditioned diffusion models for the task of semantic image editing, where the goal is to edit an image based on a text query. Semantic image editing is an extension of image generation, with the additional constraint that the generated image should be as similar as possible to a given input image. Current editing methods based on diffusion models usually require to provide a mask, making the task much easier by treating it as a conditional inpainting task. In contrast, our main contribution is able to automatically generate a mask highlighting regions of the input image that need to be edited, by contrasting predictions of a diffusion model conditioned on different text prompts. Moreover, we rely on latent inference to preserve content in those regions of interest and show excellent synergies with mask-based diffusion. DiffEdit achieves state-of-the-art editing performance on ImageNet. In addition, we evaluate semantic image editing in more challenging settings, using images from the COCO dataset as well as text-based generated images.*

- The original codebase can be found at [Xiang-cd/DiffEdit-stable-diffusion/](https://github.com/Xiang-cd/DiffEdit-stable-diffusion), and you can try it out in this [demo](https://blog.problemsolversguild.com/technical/research/2022/11/02/DiffEdit-Implementation.html).
+ The original codebase can be found at [Xiang-cd/DiffEdit-stable-diffusion](https://github.com/Xiang-cd/DiffEdit-stable-diffusion), and you can try it out in this [demo](https://blog.problemsolversguild.com/technical/research/2022/11/02/DiffEdit-Implementation.html).
+
+ This pipeline was contributed by [clarencechen](https://github.com/clarencechen). ❤️

## Tips
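
The abstract's automatic mask generation and latent inference map onto the pipeline's `generate_mask` and `invert` calls; a minimal end-to-end sketch follows. The checkpoint id, input image path, and prompts are illustrative assumptions, not part of this commit.

```python
import torch
from diffusers import StableDiffusionDiffEditPipeline, DDIMScheduler, DDIMInverseScheduler
from diffusers.utils import load_image

# The checkpoint and input image path are assumptions for illustration.
pipe = StableDiffusionDiffEditPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
)
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
pipe.inverse_scheduler = DDIMInverseScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("cuda")

init_image = load_image("fruit_bowl.png").resize((768, 768))
source_prompt = "a bowl of fruits"
target_prompt = "a bowl of pears"

# 1. Contrast predictions for the two prompts to automatically generate an editing mask.
mask_image = pipe.generate_mask(image=init_image, source_prompt=source_prompt, target_prompt=target_prompt)

# 2. Invert the input image into latents conditioned on the source prompt.
image_latents = pipe.invert(image=init_image, prompt=source_prompt).latents

# 3. Denoise with the target prompt, editing only the masked region.
image = pipe(
    prompt=target_prompt,
    mask_image=mask_image,
    image_latents=image_latents,
    negative_prompt=source_prompt,
).images[0]
image.save("diffedit_result.png")
```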
