Conversation

@sayakpaul
Member

What does this PR do?

Currently, the fast pipeline tests take about 20 minutes to fully execute. https://github.com/huggingface/diffusers/actions/runs/8430950874/artifacts/1358187212 gives an idea.

Below are the tests that take more than 10 seconds to run:

slowest durations
20.46s call     tests/pipelines/pia/test_pia.py::PIAPipelineFastTests::test_inference_batch_single_identical
20.17s call     tests/pipelines/animatediff/test_animatediff.py::AnimateDiffPipelineFastTests::test_inference_batch_single_identical
19.71s call     tests/pipelines/animatediff/test_animatediff.py::AnimateDiffPipelineFastTests::test_vae_slicing
19.43s call     tests/pipelines/stable_diffusion_2/test_stable_diffusion_attend_and_excite.py::StableDiffusionAttendAndExcitePipelineFastTests::test_karras_schedulers_shape
14.49s call     tests/pipelines/pia/test_pia.py::PIAPipelineFastTests::test_dict_tuple_outputs_equivalent
14.37s call     tests/pipelines/pia/test_pia.py::PIAPipelineFastTests::test_ip_adapter_single
14.28s call     tests/pipelines/pia/test_pia.py::PIAPipelineFastTests::test_ip_adapter_multi
13.95s call     tests/pipelines/animatediff/test_animatediff.py::AnimateDiffPipelineFastTests::test_save_load_local
13.90s call     tests/pipelines/pia/test_pia.py::PIAPipelineFastTests::test_save_load_optional_components
13.55s call     tests/pipelines/animatediff/test_animatediff.py::AnimateDiffPipelineFastTests::test_dict_tuple_outputs_equivalent
13.50s call     tests/pipelines/pia/test_pia.py::PIAPipelineFastTests::test_save_load_local
13.25s call     tests/pipelines/animatediff/test_animatediff.py::AnimateDiffPipelineFastTests::test_save_load_optional_components
10.60s call     tests/pipelines/audioldm2/test_audioldm2.py::AudioLDM2PipelineFastTests::test_audioldm2_num_waveforms_per_prompt
10.55s call     tests/pipelines/stable_diffusion_xl/test_stable_diffusion_xl_adapter.py::StableDiffusionXLAdapterPipelineFastTests::test_StableDiffusionMixin_component

IMO we should try to optimize them a little. This PR is a first attempt at that. I tried a lot but couldn't meaningfully optimize the other tests that fall under this category. LMK if you folks have any suggestions.

@sayakpaul sayakpaul requested review from DN6 and yiyixuxu March 26, 2024 10:19
@yiyixuxu
Collaborator

yiyixuxu commented Mar 26, 2024

thanks for starting this! @sayakpaul

For the ip_adapter tests, I wonder if, instead of getting a baseline output by running the pipeline without the IP-Adapter,

output_without_adapter = pipe(**inputs)[0]

maybe we can skip this and just compare the output against a static slice.

The same can be applied to other tests, e.g. test_dict_tuple_outputs_equivalent and test_inference_batch_single_identical.
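
A minimal sketch of the static-slice idea (the slice values here are placeholders, and pipe / inputs come from the test's dummy-component setup, so this is an illustration rather than actual reference numbers):

import numpy as np

# hard-coded reference slice, generated once on CPU (placeholder values here)
expected_slice = np.array([0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5])

# single pipeline run; no second baseline run without the IP-Adapter
output = pipe(**inputs)[0]

# compare only a small corner slice of the output against the stored values
output_slice = output[0, -3:, -3:, -1].flatten()
assert np.abs(output_slice - expected_slice).max() < 1e-3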

@sayakpaul
Member Author

Love the idea, @yiyixuxu. Let me work on this in this PR itself. Thank you!

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@sayakpaul
Member Author

@yiyixuxu I made some changes to check against static slices for "test_ip_adapter_single". If the plan looks okay to you, I will do the same for "test_ip_adapter_multi". And then in a future PR, I will tackle the following

same can be applied in other tests e.g. test_dict_tuple_outputs_equivalent, test_inference_batch_single_identical

I think this will be easier for the reviewers too.

Have left some comments in line to explain some thoughts.

LMK.

Comment on lines +115 to +122
if max_torch_print:
    # print full tensors instead of torch's truncated summary
    torch.set_printoptions(threshold=10_000)

test_name = os.environ.get("PYTEST_CURRENT_TEST")
if not torch.is_tensor(tensor):
    tensor = torch.from_numpy(tensor)
if limit_to_slices:
    # keep only the corner slice that the tests compare against
    tensor = tensor[0, -3:, -3:, -1]
Member Author

Convenience options. They are not harmful.

Comment on lines +134 to +136
def test_ip_adapter_single(self):
    expected_pipe_slice = None
    if torch_device == "cpu":
Member Author

We make use of torch_device, which can change based on the environment. The slices below were obtained on Intel CPUs, which is our default mode and what the PR tests run on too.

For accelerators, the tests will run much faster anyway (even on MPS). So, I think this is the way to go here. But I welcome any other ideas too.
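
Concretely, the pattern looks like this (a sketch with placeholder slice values; each pipeline test records its own CPU slice, and the usual test imports such as numpy as np and torch_device from diffusers.utils.testing_utils are assumed):

def test_ip_adapter_single(self):
    expected_pipe_slice = None
    if torch_device == "cpu":
        # slice recorded once on an Intel CPU; placeholder values shown here
        expected_pipe_slice = np.array([0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5])
    # on accelerators expected_pipe_slice stays None and the base test keeps
    # using its original baseline comparison
    return super().test_ip_adapter_single(expected_pipe_slice=expected_pipe_slice)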

@DN6
Collaborator

DN6 commented Mar 27, 2024

The block output sizes in the models can also be reduced. A lot of them use 32 and 64; these can be reduced to 4 and 8.
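
For example, something along these lines (a hypothetical dummy UNet config, not copied from an actual test; the norm and attention settings just have to stay compatible with the smaller widths):

from diffusers import UNet2DConditionModel

# hypothetical dummy UNet: shrinking block_out_channels from (32, 64) to (4, 8)
# cuts the parameter count and the per-step compute of the fast tests
unet = UNet2DConditionModel(
    sample_size=8,
    in_channels=4,
    out_channels=4,
    layers_per_block=1,
    block_out_channels=(4, 8),
    norm_num_groups=2,  # must divide the smallest block width
    attention_head_dim=2,
    down_block_types=("CrossAttnDownBlock2D", "DownBlock2D"),
    up_block_types=("UpBlock2D", "CrossAttnUpBlock2D"),
    cross_attention_dim=8,
)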

@sayakpaul
Member Author

The block output sizes in the models can also be reduced. A lot of them use 32 and 64; these can be reduced to 4 and 8.

Agree. Could you take care of them in a separate PR? @DN6

@sayakpaul
Member Author

From the CI:

https://github.com/huggingface/diffusers/actions/runs/8447126617/job/23136985269?pr=7477#step:6:12714

It's passing on the CPU GCP VM I am testing this on. I can increase the tolerance to 9e-4.

@yiyixuxu
Collaborator

how come it takes 23 min now?

@sayakpaul
Member Author

how come it takes 23 min now?

Probably because of worker overload. I can confirm it takes less time locally. Even on the CI, the timing is variable; at one point I saw it take 18 minutes for a commit.

Collaborator

@yiyixuxu yiyixuxu left a comment

Very nice!
I saw tests/pipelines/pia/test_pia.py::PIAPipelineFastTests::test_ip_adapter_single go from 14 seconds to 9.97 seconds.

expected_pipe_slice = np.array(
    [0.7331, 0.5907, 0.5667, 0.6029, 0.5679, 0.5968, 0.4033, 0.4761, 0.5090]
)
return super().test_ip_adapter_single(expected_pipe_slice=expected_pipe_slice)
Collaborator

Should we pass device="cpu" to test_ip_adapter_single when using the slice? Would it help precision?
I saw a test failing just now...

Member Author

See: #7477 (comment).

expected_pipe_slice is always None when torch_device is not "cpu".

@yiyixuxu
Collaborator

cc @DN6 for a final review

we can open this task to the community too!

If the plan looks okay to you, I will do the same for "test_ip_adapter_multi". And then in a future PR, I will tackle the following

same can be applied in other tests e.g. test_dict_tuple_outputs_equivalent, test_inference_batch_single_identical

Collaborator

@DN6 DN6 left a comment

Cool with me. Additionally, if you drop the cross_attention_dim in the dummy components it should lead to a big speedup for all these tests.
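
For example (hypothetical numbers; if cross_attention_dim is reduced, the dummy text encoder's hidden size has to shrink with it so the shapes still match):

from transformers import CLIPTextConfig, CLIPTextModel

# hypothetical dummy text encoder sized for a reduced cross_attention_dim of 8
text_encoder_config = CLIPTextConfig(
    vocab_size=1000,
    hidden_size=8,  # matches the UNet's cross_attention_dim
    intermediate_size=16,
    num_hidden_layers=2,
    num_attention_heads=2,
    max_position_embeddings=77,
    projection_dim=8,
)
text_encoder = CLIPTextModel(text_encoder_config)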

@sayakpaul
Member Author

cc @DN6 for a final review

we can open this task to the community too!

If the plan looks okay to you, I will do the same for "test_ip_adapter_multi". And then in a future PR, I will tackle the following

same can be applied in other tests e.g. test_dict_tuple_outputs_equivalent, test_inference_batch_single_identical

@yiyixuxu I feel like this is the kind of thing we should tackle completely ourselves, because it would greatly improve the CI experience not just for contributors but also for us maintainers. So, I prefer to do this myself.

Cool with me. Additionally, if you drop the cross_attention_dim in the dummy components it should lead to a big speedup for all these tests.

@DN6 could you tackle this and #7477 (comment) in a PR?

@sayakpaul
Member Author

Okay, so the current failing test is different from the one I saw yesterday. The tolerance bump needed isn't that big, I believe. LMK.

@sayakpaul sayakpaul merged commit fac7616 into main Mar 29, 2024
@sayakpaul sayakpaul deleted the speed-up-fastpipelines branch March 29, 2024 08:41
noskill pushed a commit to noskill/diffusers that referenced this pull request Apr 5, 2024
* speed up test_vae_slicing in animatediff

* speed up test_karras_schedulers_shape for attend and excite.

* style.

* get the static slices out.

* specify torch print options.

* modify

* test run with controlnet

* specify kwarg

* fix: things

* not None

* flatten

* controlnet img2img

* complete controlet sd

* finish more

* finish more

* finish more

* finish more

* finish the final batch

* add cpu check for expected_pipe_slice.

* finish the rest

* remove print

* style

* fix ssd1b controlnet test

* checking ssd1b

* disable the test.

* make the test_ip_adapter_single controlnet test more robust

* fix: simple inpaint

* multi

* disable panorama

* enable again

* panorama is shaky so leave it for now

* remove print

* raise tolerance.
sayakpaul added a commit that referenced this pull request Dec 23, 2024