[Tests] Speed up some fast pipeline tests #7477
Conversation
thanks for starting this! @sayakpaul for the ip_adapter tests, I wonder if, instead of getting a baseline output by running the pipeline without the ip-adapter, we could compare against hard-coded expected slices.
The same can be applied in other tests, e.g. test_dict_tuple_outputs_equivalent and test_inference_batch_single_identical.
Love the idea, @yiyixuxu. Let me work on this in this PR itself. Thank you!
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
@yiyixuxu I made some changes to check against static slices for "test_ip_adapter_single". If the plan looks okay to you, I will do the same for "test_ip_adapter_multi". And then in a future PR, I will tackle the following
I think this will be easier for the reviewers too. I have left some inline comments to explain my thinking. LMK.
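A rough sketch of the pattern, with a hypothetical helper name (the PR's actual change lives in the shared IP-Adapter test mixin and differs in detail): subclasses can pass a hard-coded expected_pipe_slice, and only when none is supplied does the test fall back to an extra pipeline run to compute the baseline.

```python
# Hypothetical helper illustrating the static-slice idea; not the PR's actual code.
def resolve_expected_slice(pipe, inputs, expected_pipe_slice=None):
    if expected_pipe_slice is not None:
        # Static slice supplied by the subclass: no extra pipeline run is needed.
        return expected_pipe_slice
    # Fallback: compute the baseline by running the pipeline once before any
    # IP-Adapter weights are loaded, then keep only the usual 3x3 corner slice.
    output = pipe(**inputs)[0]
    return output[0, -3:, -3:, -1].flatten()
```

Skipping that fallback run is where most of the saved wall-clock time comes from.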
    if max_torch_print:
        torch.set_printoptions(threshold=10_000)

    test_name = os.environ.get("PYTEST_CURRENT_TEST")
    if not torch.is_tensor(tensor):
        tensor = torch.from_numpy(tensor)
    if limit_to_slices:
        tensor = tensor[0, -3:, -3:, -1]
Convenience options. They are not harmful.
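For context, a standalone illustration of what the two options above do when regenerating slices; the tensor here is made up, and this is not the helper's actual call site:

```python
import torch

# A stand-in for a decoded image batch in (batch, height, width, channels) layout.
tensor = torch.rand(1, 64, 64, 3)

# limit_to_slices: keep only the 3x3 corner of the last channel, i.e. exactly the
# shape used for expected_pipe_slice in the tests.
tensor = tensor[0, -3:, -3:, -1]

# max_torch_print: raise the print threshold so values are never elided with "..."
# and can be copy-pasted into a test as a static slice.
torch.set_printoptions(threshold=10_000)
print(tensor.flatten())
```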
    def test_ip_adapter_single(self):
        expected_pipe_slice = None
        if torch_device == "cpu":
We make use of torch_device, which can change based on the environment. The slices below were obtained on Intel CPUs, which are our default runners and what the PR tests use too.
For accelerators, the tests will run much faster anyway (even on MPS). So I think this is the way to go here, but I welcome any other ideas too.
The block output sizes in the models can also be reduced. A lot of them use 32 and 64; these can be reduced to 4 and 8.
Agree. Could you take care of them in a separate PR? @DN6
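As a reference for that follow-up, a hypothetical shrunken dummy UNet along the lines of the suggestion (the values and config keys here are illustrative, not taken from an actual test):

```python
from diffusers import UNet2DConditionModel

# Hypothetical dummy UNet with block output sizes reduced from the usual (32, 64)
# to (4, 8); norm_num_groups must shrink accordingly so it divides the channel counts.
unet = UNet2DConditionModel(
    sample_size=8,
    in_channels=4,
    out_channels=4,
    layers_per_block=1,
    block_out_channels=(4, 8),
    norm_num_groups=2,
    down_block_types=("DownBlock2D", "CrossAttnDownBlock2D"),
    up_block_types=("CrossAttnUpBlock2D", "UpBlock2D"),
    cross_attention_dim=8,
)
```

Whether the expected slices stay stable at such small widths would need to be re-verified per pipeline.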
From the CI: it's passing on the CPU GCP VM I am testing this on. I can increase the tolerance to …
How come it takes 23 min now?
Probably because of worker overload. Locally it takes less time, that much I can confirm. Even on the CI the timing is variable; I saw it take 18 minutes for one commit.
yiyixuxu
left a comment
very nice!
I saw tests/pipelines/pia/test_pia.py::PIAPipelineFastTests::test_ip_adapter_single go from 14 seconds to 9.97 seconds.
            expected_pipe_slice = np.array(
                [0.7331, 0.5907, 0.5667, 0.6029, 0.5679, 0.5968, 0.4033, 0.4761, 0.5090]
            )
        return super().test_ip_adapter_single(expected_pipe_slice=expected_pipe_slice)
Should we pass device="cpu" to test_ip_adapter_single when using a slice? Would it help precision?
I saw a test failing right now...
See: #7477 (comment).
expected_slice is always None when the torch_device is not CPU.
cc @DN6 for a final review. We can open this task to the community too!
DN6
left a comment
Cool with me. Additionally, if you drop the cross_attention_dim in the dummy components it should lead to a big speedup for all these tests.
@yiyixuxu I feel like this is a good thing for us to tackle completely, because it would greatly improve the CI experience not just for contributors but also for us maintainers. So I prefer to do this myself.
@DN6 could you tackle this and #7477 (comment) in a PR?
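One caveat worth noting for that follow-up (a hypothetical sketch, not something from this PR): if cross_attention_dim is reduced in the dummy UNet, the dummy text encoder has to shrink with it so its hidden states still match the UNet's cross-attention projections.

```python
from transformers import CLIPTextConfig, CLIPTextModel

# Hypothetical tiny text encoder whose hidden_size matches a reduced
# cross_attention_dim in the dummy UNet; all sizes here are illustrative.
text_encoder = CLIPTextModel(
    CLIPTextConfig(
        hidden_size=8,  # keep equal to the UNet's cross_attention_dim
        intermediate_size=16,
        num_attention_heads=2,
        num_hidden_layers=2,
        vocab_size=1000,
        max_position_embeddings=77,
    )
)
```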
Okay, so the current failing test is different from the one I saw yesterday. The tolerance doesn't need to be raised by much, I believe. LMK.
* speed up test_vae_slicing in animatediff
* speed up test_karras_schedulers_shape for attend and excite.
* style.
* get the static slices out.
* specify torch print options.
* modify
* test run with controlnet
* specify kwarg
* fix: things
* not None
* flatten
* controlnet img2img
* complete controlet sd
* finish more
* finish more
* finish more
* finish more
* finish the final batch
* add cpu check for expected_pipe_slice.
* finish the rest
* remove print
* style
* fix ssd1b controlnet test
* checking ssd1b
* disable the test.
* make the test_ip_adapter_single controlnet test more robust
* fix: simple inpaint
* multi
* disable panorama
* enable again
* panorama is shaky so leave it for now
* remove print
* raise tolerance.
What does this PR do?
Currently, the fast pipeline tests take about 20 minutes to fully execute. https://github.com/huggingface/diffusers/actions/runs/8430950874/artifacts/1358187212 gives an idea.
Below are the tests that take more than 10 seconds to run:
IMO we should try to optimize them a little, and this PR is a first attempt at that. I tried a lot but couldn't meaningfully optimize the other tests that fall under this category. LMK if you folks have any suggestions.
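For anyone who wants to reproduce the per-test timings locally, one generic way to surface the slow tests (an illustrative invocation via pytest's Python API; the paths and thresholds are not the exact ones behind the linked CI artifact):

```python
import pytest

# --durations=0 reports the duration of every test; --durations-min filters the
# report down to tests slower than 10 seconds, matching the cutoff used above.
pytest.main(["tests/pipelines", "--durations=0", "--durations-min=10.0"])
```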