Requesting a PR for v0.15. #6

Merged: 149 commits, Apr 27, 2023

Commits
8e35ef0
[doc wip] literalinclude (#2718)
Mar 23, 2023
14e3a28
Rename 'CLIPFeatureExtractor' class to 'CLIPImageProcessor' (#2732)
ainoya Mar 23, 2023
2ef9bdd
Music Spectrogram diffusion pipeline (#1044)
kashif Mar 23, 2023
055c90f
[2737]: Add DPMSolverMultistepScheduler to CLIP guided community pipe…
nipunjindal Mar 23, 2023
0d7aac3
[Docs] small fixes to the text to video doc. (#2787)
sayakpaul Mar 23, 2023
dc5b4e2
Update train_text_to_image_lora.py (#2767)
haofanwang Mar 23, 2023
aa0531f
Skip `mps` in text-to-video tests (#2792)
pcuenca Mar 23, 2023
df91c44
Flax controlnet (#2727)
yiyixuxu Mar 23, 2023
1870fb0
[docs] Add Colab notebooks and Spaces (#2713)
stevhliu Mar 23, 2023
b94880e
Add AudioLDM (#2232)
sanchit-gandhi Mar 23, 2023
4a98d6e
Update train_text_to_image_lora.py (#2795)
haofanwang Mar 24, 2023
37a44bb
Add ModelEditing pipeline (#2721)
bahjat-kawar Mar 24, 2023
f6feb69
Relax DiT test (#2808)
kashif Mar 24, 2023
c4892f1
Update onnxruntime package candidates (#2666)
PeixuanZuo Mar 24, 2023
dbcb15c
[Stable UnCLIP] Finish Stable UnCLIP (#2814)
patrickvonplaten Mar 24, 2023
5883d8d
[Docs] update docs (Stable unCLIP) to reflect the updated ckpts. (#2815)
sayakpaul Mar 24, 2023
9fb0217
StableDiffusionModelEditingPipeline documentation (#2810)
bahjat-kawar Mar 24, 2023
abb22b4
Update `examples` README.md to include the latest examples (#2839)
sayakpaul Mar 27, 2023
1d7b4b6
Ruff: apply same rules as in transformers (#2827)
pcuenca Mar 27, 2023
4c26cb9
[Tests] Fix slow tests (#2846)
patrickvonplaten Mar 27, 2023
7bc2fff
Fix StableUnCLIPImg2ImgPipeline handling of explicitly passed image e…
unishift Mar 27, 2023
b10f527
Helper function to disable custom attention processors (#2791)
pcuenca Mar 27, 2023
fab4f3d
improve stable unclip doc. (#2823)
sayakpaul Mar 28, 2023
58fc824
add: better warning messages when handling multiple conditionings. (#…
sayakpaul Mar 28, 2023
d4f846f
[WIP]Flax training script for controlnet (#2818)
yiyixuxu Mar 28, 2023
81125d8
Make dynamo wrapped modules work with save_pretrained (#2726)
pcuenca Mar 28, 2023
42d9501
[Init] Make sure shape mismatches are caught early (#2847)
patrickvonplaten Mar 28, 2023
c0afca2
updated onnx pndm test (#2811)
kashif Mar 28, 2023
585f621
[Stable Diffusion] Allow users to disable Safety checker if loading m…
Stax124 Mar 28, 2023
8bdf423
fix KarrasVePipeline bug (#2828)
junhsss Mar 28, 2023
0f14335
StableDiffusionLongPromptWeightingPipeline: Do not hardcode pad token…
AkiSakurai Mar 28, 2023
b76d9fd
Remove suggestion to use cuDNN benchmark in docs (#2793)
d1g1t Mar 28, 2023
159a0bf
Remove duplicate sentence in docstrings (#2834)
qqaatw Mar 28, 2023
7d75681
Update the legacy inpainting SD pipeline, to allow calling it with on…
cmdr2 Mar 28, 2023
920a15c
Fix link to LoRA training guide in DreamBooth training guide (#2836)
ushuz Mar 28, 2023
663c654
[WIP][Docs] Use DiffusionPipeline Instead of Child Classes when Loadi…
dg845 Mar 28, 2023
25d927a
Add `last_epoch` argument to `optimization.get_scheduler` (#2850)
felixblanke Mar 28, 2023
4d0f412
[WIP] Check UNet shapes in StableDiffusionInpaintPipeline __init__ (#…
dg845 Mar 28, 2023
53377ef
[2761]: Add documentation for extra_in_channels UNet1DModel (#2817)
nipunjindal Mar 28, 2023
1384546
[Tests] Adds a test to check if `image_embeds` None case is handled p…
sayakpaul Mar 28, 2023
37c8248
Update evaluation.mdx (#2862)
tolgacangoz Mar 28, 2023
3980858
Update overview.mdx (#2864)
tolgacangoz Mar 28, 2023
ef4c2fa
Update alt_diffusion.mdx (#2865)
tolgacangoz Mar 28, 2023
03fe36f
Update paint_by_example.mdx (#2869)
tolgacangoz Mar 28, 2023
628fefb
Update stable_diffusion_safe.mdx (#2870)
tolgacangoz Mar 28, 2023
40a7b86
[Docs] Correct phrasing (#2873)
patrickvonplaten Mar 28, 2023
d82b032
[Examples] Add streaming support to the ControlNet training example i…
sayakpaul Mar 29, 2023
3be4891
feat: allow offset_noise in dreambooth training example (#2826)
yamanahlawat Mar 29, 2023
e47459c
[docs] Performance tutorial (#2773)
stevhliu Mar 29, 2023
b202127
[Docs] add an example use for `StableUnCLIPPipeline` in the pipeline …
sayakpaul Mar 30, 2023
b3d5cc4
add flax requirement (#2894)
yiyixuxu Mar 30, 2023
9062b28
Support fp16 in conversion from original ckpt (#2733)
burgalon Mar 30, 2023
4960976
make style
patrickvonplaten Mar 30, 2023
1d033a9
img2img.multiple.controlnets.pipeline (#2833)
mikegarts Mar 30, 2023
a937e1b
add load textual inversion embeddings to stable diffusion (#2009)
piEsposito Mar 30, 2023
51d970d
[docs] add the Stable diffusion with Jax/Flax Guide into the docs (#2…
yiyixuxu Mar 31, 2023
0df4ad5
Add support `Karras sigmas` for StableDiffusionKDiffusionPipeline (#2…
takuma104 Mar 31, 2023
1055175
Fix textual inversion loading (#2914)
GuiyeC Mar 31, 2023
e1144ac
Fix slow tests text inv (#2915)
patrickvonplaten Mar 31, 2023
f3fbf9b
Fix check_inputs in upscaler pipeline to allow embeds (#2892)
d1g1t Mar 31, 2023
7b6caca
Modify example with intel optimization (#2896)
mengfei25 Mar 31, 2023
b3c437e
[2884]: Fix cross_attention_kwargs in StableDiffusionImg2ImgPipeline …
nipunjindal Mar 31, 2023
d36103a
[Tests] Speed up test (#2919)
patrickvonplaten Mar 31, 2023
419660c
Have fix current pipeline link (#2910)
guspan-tanadi Mar 31, 2023
89b23d9
Update image_variation.mdx (#2911)
tolgacangoz Mar 31, 2023
c433562
Update controlnet.mdx (#2912)
tolgacangoz Mar 31, 2023
a5bdb67
fix importing diffusers without transformers installed
patrickvonplaten Mar 31, 2023
7447f75
Update pipeline_stable_diffusion_controlnet.py (#2917)
patrickvonplaten Mar 31, 2023
cd634a8
Check for all different packages of opencv (#2901)
wfng92 Mar 31, 2023
f23d6eb
fix missing import
patrickvonplaten Mar 31, 2023
723933f
add another import
patrickvonplaten Mar 31, 2023
8c530fc
make style
patrickvonplaten Mar 31, 2023
7139f0e
fix: norm group test for UNet3D. (#2959)
sayakpaul Apr 4, 2023
4274a3a
Update euler_ancestral.mdx (#2932)
tolgacangoz Apr 4, 2023
715c25d
Update unipc.mdx (#2936)
tolgacangoz Apr 4, 2023
3e2d1af
Update score_sde_ve.mdx (#2937)
tolgacangoz Apr 4, 2023
e329edf
Update score_sde_vp.mdx (#2938)
tolgacangoz Apr 4, 2023
4a1eae0
Update ddim.mdx (#2926)
tolgacangoz Apr 4, 2023
4fd7e97
Update ddpm.mdx (#2929)
tolgacangoz Apr 4, 2023
f3e72e9
Removing explicit markdown extension (#2944)
guspan-tanadi Apr 4, 2023
62c01d2
Ensure validation image RGB not RGBA (#2945)
ernestchu Apr 4, 2023
a0263b2
make style
patrickvonplaten Apr 4, 2023
a87e88b
Use `upload_folder` in training scripts (#2934)
Wauplin Apr 4, 2023
0c63c38
allow use custom local dataset for controlnet training scripts (#2928)
yiyixuxu Apr 4, 2023
1a6def3
fix post-processing (#2968)
yiyixuxu Apr 4, 2023
0d0fa2a
[docs] Simplify loading guide (#2694)
stevhliu Apr 4, 2023
ee20d1f
update flax controlnet training script (#2951)
yiyixuxu Apr 5, 2023
a9477bb
[Pipeline download] Improve pipeline download for index and passed co…
patrickvonplaten Apr 5, 2023
37b359b
The variable name has been updated. (#2970)
kadirnar Apr 6, 2023
6e8e1ed
[2905]: Add Karras pattern to discrete euler (#2956)
nipunjindal Apr 6, 2023
8826bae
Update the K-Diffusion SD pipeline, to allow calling it with only pro…
cmdr2 Apr 6, 2023
2494731
[Examples] Add support for Min-SNR weighting strategy for better conv…
sayakpaul Apr 6, 2023
e405264
[scheduler] fix some scheduler dtype error (#2992)
tenderness-git Apr 6, 2023
2de36fa
minor fix in controlnet flax example (#2986)
yiyixuxu Apr 6, 2023
8c5c30f
Explain how to install test dependencies (#2983)
pcuenca Apr 7, 2023
ce144d6
docs: Link Navigation Path API Pipelines (#2976)
guspan-tanadi Apr 7, 2023
1c96f82
Update one_step_unet.py
patrickvonplaten Apr 9, 2023
dcfa6e1
add Min-SNR loss to Controlnet flax train script (#3016)
yiyixuxu Apr 10, 2023
2cbdc58
dynamic threshold sampling bug fixes and docs (#3003)
williamberman Apr 10, 2023
1dc856e
ddpm scheduler variance fixes
williamberman Apr 7, 2023
1875c35
remove extra min arg @sayakpaul
williamberman Apr 7, 2023
0cbefef
clamp comment @sayakpaul
williamberman Apr 7, 2023
b6cc050
fix simple attention processor encoder hidden states ordering
williamberman Apr 7, 2023
18ebd57
add missing AttnProcessor2_0 to AttentionProcessor union
williamberman Apr 8, 2023
26b4319
do not overwrite scheduler instance variables with type casted versions
williamberman Apr 7, 2023
707341a
resnet skip time activation and output scale factor
williamberman Apr 9, 2023
8db5e5b
allow unet varying number of layers per block
williamberman Apr 9, 2023
c413353
add `encoder_hid_dim` to unet
williamberman Apr 9, 2023
983a7fb
Initial draft of Core ML docs (#2987)
pcuenca Apr 10, 2023
b5d0a91
fix wrong parameter name for accelerate
ykk648 Apr 10, 2023
85f1c19
find another one accelerate parameter error
ykk648 Apr 10, 2023
953c9d1
[bug fix] dpm multistep solver duplicate timesteps
williamberman Apr 4, 2023
074d281
tests and additional scheduler fixes
williamberman Apr 10, 2023
ba49272
[Pipeline] Add TextToVideoZeroPipeline (#2954)
19and99 Apr 10, 2023
67c3518
Small typo correction in comments (#3012)
rogerioagjr Apr 10, 2023
fbc9a73
mps: skip unstable test (#3037)
pcuenca Apr 11, 2023
4f48476
Update contribution.mdx (#3054)
Apr 11, 2023
8369196
fix report tool (#3047)
patrickvonplaten Apr 11, 2023
8b451eb
Fix config prints and save, load of pipelines (#2849)
patrickvonplaten Apr 11, 2023
cb9d77a
[docs] Reusing components (#3000)
stevhliu Apr 11, 2023
881a6b5
Fix imports for composable_stable_diffusion pipeline (#3002)
nthh Apr 11, 2023
091a058
make style
patrickvonplaten Apr 11, 2023
80bc0c0
config fixes (#3060)
williamberman Apr 11, 2023
67ec9cf
accelerate min version for ProjectConfiguration import (#3042)
williamberman Apr 11, 2023
8c6b47c
`AttentionProcessor.group_norm` num_channels should be `query_dim` (#…
williamberman Apr 11, 2023
cb63feb
Update documentation (#2996)
George-Ogden Apr 11, 2023
526827c
Fix scheduler type mismatch (#3041)
pcuenca Apr 11, 2023
e3095c5
Fix invocation of some slow Flax tests (#3058)
pcuenca Apr 11, 2023
c6180a3
add only cross attention to simple attention blocks (#3011)
williamberman Apr 11, 2023
52c4d32
Fix typo and format BasicTransformerBlock attributes (#2953)
offchan42 Apr 11, 2023
2d52e81
unet time embedding activation function (#3048)
williamberman Apr 11, 2023
98c5e5d
Attention processor cross attention norm group norm (#3021)
williamberman Apr 11, 2023
ea39cd7
Attn added kv processor torch 2.0 block (#3023)
williamberman Apr 11, 2023
e607a58
[Examples] Fix type-casting issue in the ControlNet training script (…
sayakpaul Apr 12, 2023
a89a14f
[LoRA] Enabling limited LoRA support for text encoder (#2918)
sayakpaul Apr 12, 2023
0c72006
fix slow tsets (#3066)
patrickvonplaten Apr 12, 2023
5a7d35e
Fix InstructPix2Pix training in multi-GPU mode (#2978)
sayakpaul Apr 12, 2023
0df47ef
[Docs] update Self-Attention Guidance docs (#2952)
SusungHong Apr 12, 2023
dc27750
Flax memory efficient attention (#2889)
pcuenca Apr 12, 2023
9d7c08f
[WIP] implement rest of the test cases (LoRA tests) (#2824)
aandyw Apr 12, 2023
639f645
fix pipeline __setattr__ value == None (#3063)
williamberman Apr 12, 2023
7b2407f
add support for pre-calculated prompt embeds to Stable Diffusion ONNX…
ssube Apr 12, 2023
524535b
[2064]: Add Karras to DPMSolverMultistepScheduler (#3001)
nipunjindal Apr 12, 2023
a4b233e
Finish docs textual inversion (#3068)
patrickvonplaten Apr 12, 2023
fa736e3
[Docs] refactor text-to-video zero (#3049)
sayakpaul Apr 12, 2023
caa5884
Update Flax TPU tests (#3069)
pcuenca Apr 12, 2023
a439343
Fix a bug of pano when not doing CFG (#3030)
ernestchu Apr 12, 2023
b9b8916
Text2video zero refinements (#3070)
19and99 Apr 12, 2023
e753454
Release: v0.15.0
patrickvonplaten Apr 12, 2023
2 changes: 1 addition & 1 deletion .github/workflows/pr_tests.yml
@@ -40,7 +40,7 @@ jobs:
framework: pytorch_examples
runner: docker-cpu
image: diffusers/diffusers-pytorch-cpu
-report: torch_cpu
+report: torch_example_cpu

name: ${{ matrix.config.name }}

2 changes: 1 addition & 1 deletion .github/workflows/push_tests_fast.yml
@@ -38,7 +38,7 @@ jobs:
framework: pytorch_examples
runner: docker-cpu
image: diffusers/diffusers-pytorch-cpu
-report: torch_cpu
+report: torch_example_cpu

name: ${{ matrix.config.name }}

11 changes: 9 additions & 2 deletions CONTRIBUTING.md
@@ -394,8 +394,15 @@ passes. You should run the tests impacted by your changes like this:
```bash
$ pytest tests/<TEST_TO_RUN>.py
```

+Before you run the tests, please make sure you install the dependencies required for testing. You can do so
+with this command:
+
-You can also run the full suite with the following command, but it takes
+```bash
+$ pip install -e ".[test]"
+```
+
+You can run the full test suite with the following command, but it takes
a beefy machine to produce a result in a decent amount of time now that
Diffusers has grown a lot. Here is the command for it:

@@ -439,7 +446,7 @@ Push the changes to your account using:
$ git push -u origin a-descriptive-name-for-my-changes
```

-6. Once you are satisfied (**and the checklist below is happy too**), go to the
+6. Once you are satisfied, go to the
webpage of your fork on GitHub. Click on 'Pull request' to send your changes
to the project maintainers for review.

24 changes: 18 additions & 6 deletions docs/source/en/_toctree.yml
@@ -4,7 +4,7 @@
- local: quicktour
title: Quicktour
- local: stable_diffusion
-  title: Stable Diffusion
+  title: Effective and efficient diffusion
- local: installation
title: Installation
title: Get started
@@ -33,15 +33,15 @@
- local: using-diffusers/pipeline_overview
title: Overview
- local: using-diffusers/unconditional_image_generation
-  title: Unconditional Image Generation
+  title: Unconditional image generation
- local: using-diffusers/conditional_image_generation
-  title: Text-to-Image Generation
+  title: Text-to-image generation
- local: using-diffusers/img2img
-  title: Text-Guided Image-to-Image
+  title: Text-guided image-to-image
- local: using-diffusers/inpaint
-  title: Text-Guided Image-Inpainting
+  title: Text-guided image-inpainting
- local: using-diffusers/depth2img
-  title: Text-Guided Depth-to-Image
+  title: Text-guided depth-to-image
- local: using-diffusers/reusing_seeds
title: Improve image quality with deterministic generation
- local: using-diffusers/reproducibility
@@ -52,6 +52,8 @@
title: How to contribute a Pipeline
- local: using-diffusers/using_safetensors
title: Using safetensors
+- local: using-diffusers/stable_diffusion_jax_how_to
+  title: Stable Diffusion in JAX/Flax
- local: using-diffusers/weighted_prompts
title: Weighting Prompts
title: Pipelines for Inference
@@ -95,6 +97,8 @@
title: ONNX
- local: optimization/open_vino
title: OpenVINO
+- local: optimization/coreml
+  title: Core ML
- local: optimization/mps
title: MPS
- local: optimization/habana
@@ -134,6 +138,8 @@
title: AltDiffusion
- local: api/pipelines/audio_diffusion
title: Audio Diffusion
+- local: api/pipelines/audioldm
+  title: AudioLDM
- local: api/pipelines/cycle_diffusion
title: Cycle Diffusion
- local: api/pipelines/dance_diffusion
@@ -158,6 +164,8 @@
title: Score SDE VE
- local: api/pipelines/semantic_stable_diffusion
title: Semantic Guidance
+- local: api/pipelines/spectrogram_diffusion
+  title: "Spectrogram Diffusion"
- sections:
- local: api/pipelines/stable_diffusion/overview
title: Overview
@@ -187,6 +195,8 @@
title: MultiDiffusion Panorama
- local: api/pipelines/stable_diffusion/controlnet
title: Text-to-Image Generation with ControlNet Conditioning
+- local: api/pipelines/stable_diffusion/model_editing
+  title: Text-to-Image Model Editing
title: Stable Diffusion
- local: api/pipelines/stable_diffusion_2
title: Stable Diffusion 2
@@ -196,6 +206,8 @@
title: Stochastic Karras VE
- local: api/pipelines/text_to_video
title: Text-to-Video
+- local: api/pipelines/text_to_video_zero
+  title: Text-to-Video Zero
- local: api/pipelines/unclip
title: UnCLIP
- local: api/pipelines/latent_diffusion_uncond
8 changes: 8 additions & 0 deletions docs/source/en/api/loaders.mdx
@@ -28,3 +28,11 @@ API to load such adapter neural networks via the [`loaders.py` module](https://g
### UNet2DConditionLoadersMixin

[[autodoc]] loaders.UNet2DConditionLoadersMixin

+### TextualInversionLoaderMixin
+
+[[autodoc]] loaders.TextualInversionLoaderMixin
+
+### LoraLoaderMixin
+
+[[autodoc]] loaders.LoraLoaderMixin
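
The mixins documented above power the `pipe.load_textual_inversion` API added in this release (#2009). A minimal sketch, assuming the `sd-concepts-library/cat-toy` embedding and its `<cat-toy>` token purely for illustration:

```python
import torch
from diffusers import StableDiffusionPipeline

# StableDiffusionPipeline inherits TextualInversionLoaderMixin, so learned
# embeddings can be loaded into its tokenizer and text encoder from the Hub.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.load_textual_inversion("sd-concepts-library/cat-toy")

# The learned token is then usable like any other word in a prompt.
image = pipe("a <cat-toy> sitting on a bench").images[0]
image.save("cat_toy.png")
```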
6 changes: 6 additions & 0 deletions docs/source/en/api/models.mdx
@@ -99,3 +99,9 @@ The models are built on the base class ['ModelMixin'] that is a `torch.nn.module

## FlaxAutoencoderKL
[[autodoc]] FlaxAutoencoderKL

+## FlaxControlNetOutput
+[[autodoc]] models.controlnet_flax.FlaxControlNetOutput
+
+## FlaxControlNetModel
+[[autodoc]] FlaxControlNetModel
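
A minimal sketch of loading the newly documented Flax ControlNet model; the canny checkpoint name and the `from_pt=True` conversion are assumptions for illustration:

```python
import jax.numpy as jnp
from diffusers import FlaxControlNetModel

# Flax loading returns the model definition together with its parameter pytree.
controlnet, controlnet_params = FlaxControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", from_pt=True, dtype=jnp.float32
)
```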
6 changes: 3 additions & 3 deletions docs/source/en/api/pipelines/alt_diffusion.mdx
@@ -12,7 +12,7 @@ specific language governing permissions and limitations under the License.

# AltDiffusion

-AltDiffusion was proposed in [AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities](https://arxiv.org/abs/2211.06679) by Zhongzhi Chen, Guang Liu, Bo-Wen Zhang, Fulong Ye, Qinghong Yang, Ledell Wu
+AltDiffusion was proposed in [AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities](https://arxiv.org/abs/2211.06679) by Zhongzhi Chen, Guang Liu, Bo-Wen Zhang, Fulong Ye, Qinghong Yang, Ledell Wu.

The abstract of the paper is the following:

@@ -28,11 +28,11 @@ The abstract of the paper is the following:

## Tips

-- AltDiffusion is conceptually exaclty the same as [Stable Diffusion](./api/pipelines/stable_diffusion/overview).
+- AltDiffusion is conceptually exactly the same as [Stable Diffusion](./stable_diffusion/overview).

- *Run AltDiffusion*

-AltDiffusion can be tested very easily with the [`AltDiffusionPipeline`], [`AltDiffusionImg2ImgPipeline`] and the `"BAAI/AltDiffusion-m9"` checkpoint exactly in the same way it is shown in the [Conditional Image Generation Guide](./using-diffusers/conditional_image_generation) and the [Image-to-Image Generation Guide](./using-diffusers/img2img).
+AltDiffusion can be tested very easily with the [`AltDiffusionPipeline`], [`AltDiffusionImg2ImgPipeline`] and the `"BAAI/AltDiffusion-m9"` checkpoint exactly in the same way it is shown in the [Conditional Image Generation Guide](../../using-diffusers/conditional_image_generation) and the [Image-to-Image Generation Guide](../../using-diffusers/img2img).

- *How to load and use different schedulers.*

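As a minimal sketch of the "Run AltDiffusion" tip above (the prompt and output file name are illustrative):

```python
import torch
from diffusers import AltDiffusionPipeline

pipe = AltDiffusionPipeline.from_pretrained("BAAI/AltDiffusion-m9", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

# AltCLIP is multilingual, so the prompt does not have to be in English.
prompt = "a portrait of an astronaut, digital painting, highly detailed"
image = pipe(prompt).images[0]
image.save("alt_diffusion_sample.png")
```
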
82 changes: 82 additions & 0 deletions docs/source/en/api/pipelines/audioldm.mdx
@@ -0,0 +1,82 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# AudioLDM

## Overview

AudioLDM was proposed in [AudioLDM: Text-to-Audio Generation with Latent Diffusion Models](https://arxiv.org/abs/2301.12503) by Haohe Liu et al.

Inspired by [Stable Diffusion](https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/overview), AudioLDM
is a text-to-audio _latent diffusion model (LDM)_ that learns continuous audio representations from [CLAP](https://huggingface.co/docs/transformers/main/model_doc/clap)
latents. AudioLDM takes a text prompt as input and predicts the corresponding audio. It can generate text-conditional
sound effects, human speech and music.

This pipeline was contributed by [sanchit-gandhi](https://huggingface.co/sanchit-gandhi). The original codebase can be found [here](https://github.com/haoheliu/AudioLDM).

## Text-to-Audio

The [`AudioLDMPipeline`] can be used to load pre-trained weights from [cvssp/audioldm](https://huggingface.co/cvssp/audioldm) and generate text-conditional audio outputs:

```python
from diffusers import AudioLDMPipeline
import torch
import scipy

repo_id = "cvssp/audioldm"
pipe = AudioLDMPipeline.from_pretrained(repo_id, torch_dtype=torch.float16)
pipe = pipe.to("cuda")

prompt = "Techno music with a strong, upbeat tempo and high melodic riffs"
audio = pipe(prompt, num_inference_steps=10, audio_length_in_s=5.0).audios[0]

# save the audio sample as a .wav file
scipy.io.wavfile.write("techno.wav", rate=16000, data=audio)
```

### Tips

Prompts:
* Descriptive prompt inputs work best: you can use adjectives to describe the sound (e.g. "high quality" or "clear") and make the prompt context specific (e.g., "water stream in a forest" instead of "stream").
* It's best to use general terms like 'cat' or 'dog' instead of specific names or abstract objects that the model may not be familiar with.

Inference:
* The _quality_ of the predicted audio sample can be controlled by the `num_inference_steps` argument: higher steps give higher quality audio at the expense of slower inference.
* The _length_ of the predicted audio sample can be controlled by varying the `audio_length_in_s` argument.

### How to load and use different schedulers

The AudioLDM pipeline uses the [`DDIMScheduler`] by default. But `diffusers` provides many other schedulers
that can be used with the AudioLDM pipeline, such as [`PNDMScheduler`], [`LMSDiscreteScheduler`], [`EulerDiscreteScheduler`],
[`EulerAncestralDiscreteScheduler`], etc. We recommend using the [`DPMSolverMultistepScheduler`], as it is currently the fastest
scheduler available.

To use a different scheduler, you can either change it via the [`ConfigMixin.from_config`]
method, or pass the `scheduler` argument to the `from_pretrained` method of the pipeline. For example, to use the
[`DPMSolverMultistepScheduler`], you can do the following:

```python
>>> from diffusers import AudioLDMPipeline, DPMSolverMultistepScheduler
>>> import torch

>>> pipeline = AudioLDMPipeline.from_pretrained("cvssp/audioldm", torch_dtype=torch.float16)
>>> pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config)

>>> # or
>>> dpm_scheduler = DPMSolverMultistepScheduler.from_pretrained("cvssp/audioldm", subfolder="scheduler")
>>> pipeline = AudioLDMPipeline.from_pretrained("cvssp/audioldm", scheduler=dpm_scheduler, torch_dtype=torch.float16)
```

## AudioLDMPipeline
[[autodoc]] AudioLDMPipeline
- all
- __call__
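
To make the inference tips above concrete, here is a small sketch that trades speed for quality and lengthens the clip (the prompt and file name are illustrative):

```python
import scipy
import torch
from diffusers import AudioLDMPipeline

pipe = AudioLDMPipeline.from_pretrained("cvssp/audioldm", torch_dtype=torch.float16).to("cuda")

prompt = "A hammer hitting a wooden surface"
# More denoising steps raise audio quality at the cost of slower inference;
# audio_length_in_s sets the duration of the generated clip.
audio = pipe(prompt, num_inference_steps=25, audio_length_in_s=10.0).audios[0]

# AudioLDM generates 16 kHz audio.
scipy.io.wavfile.write("hammer.wav", rate=16000, data=audio)
```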
9 changes: 5 additions & 4 deletions docs/source/en/api/pipelines/overview.mdx
@@ -19,9 +19,9 @@ components - all of which are needed to have a functioning end-to-end diffusion
As an example, [Stable Diffusion](https://huggingface.co/blog/stable_diffusion) has three independently trained models:
- [Autoencoder](./api/models#vae)
- [Conditional Unet](./api/models#UNet2DConditionModel)
-- [CLIP text encoder](https://huggingface.co/docs/transformers/v4.21.2/en/model_doc/clip#transformers.CLIPTextModel)
+- [CLIP text encoder](https://huggingface.co/docs/transformers/v4.27.1/en/model_doc/clip#transformers.CLIPTextModel)
- a scheduler component, [scheduler](./api/scheduler#pndm),
-- a [CLIPFeatureExtractor](https://huggingface.co/docs/transformers/v4.21.2/en/model_doc/clip#transformers.CLIPFeatureExtractor),
+- a [CLIPImageProcessor](https://huggingface.co/docs/transformers/v4.27.1/en/model_doc/clip#transformers.CLIPImageProcessor),
- as well as a [safety checker](./stable_diffusion#safety_checker).
All of these components are necessary to run stable diffusion in inference even though they were trained
or created independently from each other.
@@ -83,6 +83,7 @@ available a colab notebook to directly try them out.
| [versatile_diffusion](./versatile_diffusion) | [Versatile Diffusion: Text, Images and Variations All in One Diffusion Model](https://arxiv.org/abs/2211.08332) | Image Variations Generation |
| [versatile_diffusion](./versatile_diffusion) | [Versatile Diffusion: Text, Images and Variations All in One Diffusion Model](https://arxiv.org/abs/2211.08332) | Dual Image and Text Guided Generation |
| [vq_diffusion](./vq_diffusion) | [Vector Quantized Diffusion Model for Text-to-Image Synthesis](https://arxiv.org/abs/2111.14822) | Text-to-Image Generation |
+| [text_to_video_zero](./text_to_video_zero) | [Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators](https://arxiv.org/abs/2303.13439) | Text-to-Video Generation |


**Note**: Pipelines are simple examples of how to play around with the diffusion systems as described in the corresponding papers.
Expand All @@ -108,7 +109,7 @@ from the local path.
each pipeline, one should look directly into the respective pipeline.

**Note**: All pipelines have PyTorch's autograd disabled by decorating the `__call__` method with a [`torch.no_grad`](https://pytorch.org/docs/stable/generated/torch.no_grad.html) decorator because pipelines should
-not be used for training. If you want to store the gradients during the forward pass, we recommend writing your own pipeline, see also our [community-examples](https://github.com/huggingface/diffusers/tree/main/examples/community)
+not be used for training. If you want to store the gradients during the forward pass, we recommend writing your own pipeline, see also our [community-examples](https://github.com/huggingface/diffusers/tree/main/examples/community).

## Contribution

@@ -173,7 +174,7 @@ You can also run this example on colab [![Open In Colab](https://colab.research.

### Tweak prompts reusing seeds and latents

-You can generate your own latents to reproduce results, or tweak your prompt on a specific result you liked. [This notebook](https://github.com/pcuenca/diffusers-examples/blob/main/notebooks/stable-diffusion-seeds.ipynb) shows how to do it step by step. You can also run it in Google Colab [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pcuenca/diffusers-examples/blob/main/notebooks/stable-diffusion-seeds.ipynb).
+You can generate your own latents to reproduce results, or tweak your prompt on a specific result you liked. [This notebook](https://github.com/pcuenca/diffusers-examples/blob/main/notebooks/stable-diffusion-seeds.ipynb) shows how to do it step by step. You can also run it in Google Colab [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pcuenca/diffusers-examples/blob/main/notebooks/stable-diffusion-seeds.ipynb)
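
A minimal sketch of the pattern that notebook covers, assuming the model id and 512x512 output size for illustration: fix the latents once, then vary only the prompt.

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# A seeded Generator makes the initial noise reproducible.
generator = torch.Generator(device="cuda").manual_seed(42)
latents = torch.randn(
    (1, pipe.unet.config.in_channels, 64, 64),  # 64x64 latents -> 512x512 images
    generator=generator,
    device="cuda",
    dtype=torch.float16,
)

# Reusing the same latents while tweaking the prompt keeps the composition stable.
image = pipe("Labrador in the style of Vermeer", latents=latents).images[0]
```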


### In-painting using Stable Diffusion
2 changes: 1 addition & 1 deletion docs/source/en/api/pipelines/paint_by_example.mdx
@@ -14,7 +14,7 @@ specific language governing permissions and limitations under the License.

## Overview

-[Paint by Example: Exemplar-based Image Editing with Diffusion Models](https://arxiv.org/abs/2211.13227) by Binxin Yang, Shuyang Gu, Bo Zhang, Ting Zhang, Xuejin Chen, Xiaoyan Sun, Dong Chen, Fang Wen
+[Paint by Example: Exemplar-based Image Editing with Diffusion Models](https://arxiv.org/abs/2211.13227) by Binxin Yang, Shuyang Gu, Bo Zhang, Ting Zhang, Xuejin Chen, Xiaoyan Sun, Dong Chen, Fang Wen.

The abstract of the paper is the following:

6 changes: 3 additions & 3 deletions docs/source/en/api/pipelines/semantic_stable_diffusion.mdx
@@ -24,11 +24,11 @@ The abstract of the paper is the following:

| Pipeline | Tasks | Colab | Demo
|---|---|:---:|:---:|
-| [pipeline_semantic_stable_diffusion.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/semantic_stable_diffusion/pipeline_semantic_stable_diffusion) | *Text-to-Image Generation* | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ml-research/semantic-image-editing/blob/main/examples/SemanticGuidance.ipynb) | [Coming Soon](https://huggingface.co/AIML-TUDA)
+| [pipeline_semantic_stable_diffusion.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/semantic_stable_diffusion/pipeline_semantic_stable_diffusion.py) | *Text-to-Image Generation* | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ml-research/semantic-image-editing/blob/main/examples/SemanticGuidance.ipynb) | [Coming Soon](https://huggingface.co/AIML-TUDA)

## Tips

-- The Semantic Guidance pipeline can be used with any [Stable Diffusion](./api/pipelines/stable_diffusion/text2img) checkpoint.
+- The Semantic Guidance pipeline can be used with any [Stable Diffusion](./stable_diffusion/text2img) checkpoint.

### Run Semantic Guidance

@@ -67,7 +67,7 @@ out = pipe(
)
```

-For more examples check the colab notebook.
+For more examples check the Colab notebook.

## StableDiffusionSafePipelineOutput
[[autodoc]] pipelines.semantic_stable_diffusion.SemanticStableDiffusionPipelineOutput