Add AudioLDM #2232

sanchit-gandhi · 2023-02-03T11:15:30Z

Original codebase: https://github.com/haoheliu/AudioLDM
Checkpoints: https://huggingface.co/spaces/haoheliu/audioldm-text-to-audio-generation/tree/main/ckpt

TODOs

UNet

Convert UNet weights
Add new modelling code
Verify correctness

VAE

Convert VAE weights
Verify correctness

Scheduler

Verify correctness

CLAP Text Embedding Model

Convert CLAP weights
Verify correctness

HiFiGAN Vocoder

Convert HiFiGAN weights
Verify correctness

Pipeline

Verify correctness
Tests

Docs

Add and populate docs mdx file

HuggingFaceDocBuilderDev · 2023-02-17T10:44:42Z

The documentation is not available anymore as the PR was closed or merged.

sanchit-gandhi · 2023-02-20T17:43:52Z

Implementation matches the original ✅ Tests and clean-up still TODO

sanchit-gandhi

This PR is ready for a first look! I've left a few comments on things I was unsure about or wanted to flag:

src/diffusers/models/attention.py

src/diffusers/utils/dummy_torch_and_transformers_objects.py

src/diffusers/pipelines/audioldm/pipeline_audioldm.py

tests/pipelines/audioldm/test_audioldm.py

sanchit-gandhi · 2023-03-17T14:28:21Z

Swapped height -> audio_length_in_s and slimmed down the number of fast/slow tests. Good to go on my end! Feel free to take a final look at the changes @patrickvonplaten @williamberman
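For context on the `height` -> `audio_length_in_s` swap: a requested duration has to be mapped onto a number of mel-spectrogram frames that the VAE can downsample evenly, and the vocoded waveform is then cut back to the requested length (compare the commits `fix: swap height/width -> audio_length_in_s` and `feat: possibly expand / cut audio waveform`). The sketch below is illustrative only, not the pipeline's actual code. It assumes a 16 kHz vocoder with upsampling rates `(5, 4, 2, 2, 2)` (160 samples per frame) and a VAE scale factor of 4; `spectrogram_frames_for_duration` and `cut_waveform` are hypothetical names.

```python
import math


def spectrogram_frames_for_duration(
    audio_length_in_s: float,
    sampling_rate: int = 16_000,              # assumed vocoder output rate (Hz)
    upsample_rates: tuple = (5, 4, 2, 2, 2),  # assumed vocoder upsampling per stage
    vae_scale_factor: int = 4,                # assumed VAE downsampling factor
) -> tuple:
    """Map a duration in seconds to (spectrogram frames, waveform samples).

    The frame count is rounded *up* to a multiple of vae_scale_factor so the
    latent has an integer size; the vocoded waveform is later cut back to
    num_samples so the output matches the requested duration.
    """
    samples_per_frame = math.prod(upsample_rates)  # waveform samples per frame (160 here)
    num_samples = int(audio_length_in_s * sampling_rate)
    num_frames = num_samples // samples_per_frame
    if num_frames % vae_scale_factor != 0:
        # Expand to the next multiple that the VAE can downsample evenly.
        num_frames = math.ceil(num_frames / vae_scale_factor) * vae_scale_factor
    return num_frames, num_samples


def cut_waveform(waveform, num_samples: int):
    """Trim the (possibly longer) vocoder output to the requested length."""
    return waveform[:num_samples]
```

Under these assumptions, `spectrogram_frames_for_duration(5.0)` gives 500 frames and 80000 samples, while a very short request such as 0.05 s gets rounded up from 5 to 8 frames and relies on `cut_waveform` to drop the surplus audio.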

patrickvonplaten · 2023-03-21T12:45:42Z

docs/source/en/api/pipelines/audioldm.mdx

+pipe = pipe.to("cuda")
+
+prompt = "Techno music with a strong, upbeat tempo and high melodic riffs"
+audio = pipe(prompt, num_inference_steps=10, audio_length_in_s=5.0).audios[0]
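The snippet returns `audio` as a 1-D waveform. One way to listen to it is to save it as a WAV file. The standard-library sketch below assumes a mono waveform with values in [-1, 1], sampled at 16 kHz (the AudioLDM output rate). `write_wav` is a hypothetical helper, not part of diffusers; `scipy.io.wavfile.write` is an alternative if SciPy is installed.

```python
import struct
import wave


def write_wav(path_or_file, audio, sampling_rate: int = 16_000) -> None:
    """Write a mono float waveform (values in [-1, 1]) as 16-bit PCM WAV.

    sampling_rate defaults to 16 kHz, assumed here to match the AudioLDM vocoder.
    path_or_file may be a filename or a writable binary file object.
    """
    # Clip to [-1, 1] and scale to signed 16-bit integers.
    samples = [int(round(max(-1.0, min(1.0, float(x))) * 32767)) for x in audio]
    frames = struct.pack(f"<{len(samples)}h", *samples)
    with wave.open(path_or_file, "wb") as wav_file:
        wav_file.setnchannels(1)              # mono
        wav_file.setsampwidth(2)              # 2 bytes = 16-bit samples
        wav_file.setframerate(sampling_rate)
        wav_file.writeframes(frames)
```

Usage: `write_wav("techno.wav", audio)` after the pipeline call above. Clipping before the integer conversion keeps any slightly out-of-range vocoder samples from overflowing the 16-bit range.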


patrickvonplaten · 2023-03-21T14:26:03Z

@sanchit-gandhi I think one last fast test is failing - could you check? :-)

patrickvonplaten · 2023-03-23T18:00:17Z

Test failures are unrelated - merging!

* Add AudioLDM
* up
* add vocoder
* start unet
* unconditional unet
* clap, vocoder and vae
* clean-up: conversion scripts
* fix: conversion script token_type_ids
* clean-up: pipeline docstring
* tests: from SD
* clean-up: cpu offload vocoder instead of safety checker
* feat: adapt tests to audioldm
* feat: add docs
* clean-up: amend pipeline docstrings
* clean-up: make style
* clean-up: make fix-copies
* fix: add doc path to toctree
* clean-up: args for conversion script
* clean-up: paths to checkpoints
* fix: use conditional unet
* clean-up: make style
* fix: type hints for UNet
* clean-up: docstring for UNet
* clean-up: make style
* clean-up: remove duplicate in docstring
* clean-up: make style
* clean-up: make fix-copies
* clean-up: move imports to start in code snippet
* fix: pass cross_attention_dim as a list/tuple to unet
* clean-up: make fix-copies
* fix: update checkpoint path
* fix: unet cross_attention_dim in tests
* film embeddings -> class embeddings
* Apply suggestions from code review

  Co-authored-by: Will Berman <[email protected]>

* fix: unet film embed to use existing args
* fix: unet tests to use existing args
* fix: make style
* fix: transformers import and version in init
* clean-up: make style
* Revert "clean-up: make style"

  This reverts commit 5d6d1f8.

* clean-up: make style
* clean-up: use pipeline tester mixin tests where poss
* clean-up: skip attn slicing test
* fix: add torch dtype to docs
* fix: remove conversion script out of src
* fix: remove .detach from 1d waveform
* fix: reduce default num inf steps
* fix: swap height/width -> audio_length_in_s
* clean-up: make style
* fix: remove nightly tests
* fix: imports in conversion script
* clean-up: slim-down to two slow tests
* clean-up: slim-down fast tests
* fix: batch consistent tests
* clean-up: make style
* clean-up: remove vae slicing fast test
* clean-up: propagate changes to doc
* fix: increase test tol to 1e-2
* clean-up: finish docs
* clean-up: make style
* feat: vocoder / VAE compatibility check
* feat: possibly expand / cut audio waveform
* fix: pipeline call signature test
* fix: slow tests output len
* clean-up: make style
* make style

---------

Co-authored-by: Patrick von Platen <[email protected]>
Co-authored-by: William Berman <[email protected]>
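The squashed history includes `feat: vocoder / VAE compatibility check`. As a rough illustration of what such a check could guard against (not the pipeline's actual code), the sketch below rejects clips shorter than one latent frame and mel-bin counts the VAE cannot downsample evenly. It assumes the VAE treats the spectrogram as a 2-D image (so both time and frequency axes shrink by the scale factor), 64 mel bins, a VAE scale factor of 4, and a 16 kHz vocoder with upsampling rates `(5, 4, 2, 2, 2)`; `check_vocoder_vae_compat` is a hypothetical name.

```python
import math


def check_vocoder_vae_compat(
    audio_length_in_s: float,
    vocoder_mel_bins: int = 64,               # assumed mel bins expected by the vocoder
    vae_scale_factor: int = 4,                # assumed VAE downsampling factor
    sampling_rate: int = 16_000,              # assumed vocoder output rate (Hz)
    upsample_rates: tuple = (5, 4, 2, 2, 2),  # assumed vocoder upsampling per stage
) -> None:
    """Raise ValueError if the requested length or model configs are incompatible."""
    seconds_per_frame = math.prod(upsample_rates) / sampling_rate
    # Shortest clip that still yields one latent frame after VAE downsampling.
    min_length_in_s = seconds_per_frame * vae_scale_factor
    if audio_length_in_s < min_length_in_s:
        raise ValueError(
            f"audio_length_in_s must be >= {min_length_in_s}, got {audio_length_in_s}"
        )
    # The VAE is assumed to downsample the frequency axis too, so the
    # mel-bin count must be divisible by the scale factor.
    if vocoder_mel_bins % vae_scale_factor != 0:
        raise ValueError(
            f"vocoder mel bins ({vocoder_mel_bins}) must be divisible by "
            f"vae_scale_factor ({vae_scale_factor})"
        )
```

With these assumed values the minimum length is 0.01 s per frame times 4, i.e. 0.04 s; failing early with a clear message is friendlier than a shape mismatch deep inside the VAE decoder.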

sanchit-gandhi and others added 8 commits February 3, 2023 12:14

Add AudioLDM

88b77bc

up

9b353b0

add vocoder

1a3ea27

Merge branch 'main' into audioldm

22a315d

start unet

1023f68

unconditional unet

81bff99

Merge remote-tracking branch 'origin/audioldm' into audioldm

990b622

clap, vocoder and vae

6aa4fda

sanchit-gandhi added 5 commits February 20, 2023 13:39

clean-up: conversion scripts

2482b42

fix: conversion script token_type_ids

9d986c4

clean-up: pipeline docstring

004fed8

tests: from SD

9feb6ba

clean-up: cpu offload vocoder instead of safety checker

bf3964c

sanchit-gandhi added 11 commits February 21, 2023 13:58

feat: adapt tests to audioldm

f200e80

feat: add docs

dd04c2e

clean-up: amend pipeline docstrings

1c26ca9

clean-up: make style

d32bd7f

clean-up: make fix-copies

447013e

fix: add doc path to toctree

08d6a1f

clean-up: args for conversion script

9597761

clean-up: paths to checkpoints

10c584d

fix: use conditional unet

0f15408

clean-up: make style

d99c9e8

fix: type hints for UNet

293f2a4

sanchit-gandhi commented Feb 21, 2023

src/diffusers/models/attention.py (outdated, resolved)

src/diffusers/utils/dummy_torch_and_transformers_objects.py (resolved)

src/diffusers/pipelines/audioldm/pipeline_audioldm.py (outdated, resolved)

sanchit-gandhi requested review from patil-suraj and williamberman February 21, 2023 16:04

sanchit-gandhi commented Feb 21, 2023

tests/pipelines/audioldm/test_audioldm.py (outdated, resolved)

sanchit-gandhi and others added 11 commits March 17, 2023 12:05

clean-up: slim-down to two slow tests

a9faabb

clean-up: slim-down fast tests

9f26689

fix: batch consistent tests

7bc812d

clean-up: make style

f0002f1

clean-up: remove vae slicing fast test

a0a156a

clean-up: propagate changes to doc

a01022a

fix: increase test tol to 1e-2

460231e

Merge branch 'main' into audioldm

9cb4426

clean-up: finish docs

c8a7436

Merge remote-tracking branch 'origin/audioldm' into audioldm

01fbbcf

clean-up: make style

ee67277

Merge branch 'main' into audioldm

5620390

patrickvonplaten reviewed Mar 21, 2023

sanchit-gandhi and others added 9 commits March 23, 2023 11:22

feat: vocoder / VAE compatibility check

d8ab1a1

feat: possibly expand / cut audio waveform

56e3fb9

fix: pipeline call signature test

e66dfc7

Merge remote-tracking branch 'origin/audioldm' into audioldm

4d7849e

fix: slow tests output len

7ed071a

clean-up: make style

b90d564

Merge branch 'main' into audioldm

ef0e8b3

Merge branch 'main' into audioldm

b0ade43

make style

ef6c8e0

patrickvonplaten merged commit b94880e into huggingface:main Mar 23, 2023

sanchit-gandhi deleted the audioldm branch April 4, 2023 08:39
