[Tests] Run slow matrix sequentially #3500

Merged: 1 commit merged into main on Jun 7, 2023

Conversation

pcuenca
Member

@pcuenca pcuenca commented May 21, 2023

This is just a suspicion, feel free to close.

Both "Slow PyTorch CUDA tests on Ubuntu" and "Slow ONNXRuntime CUDA tests on Ubuntu" use the same runner (docker-gpu), and only one machine is configured to run those tests. My understanding is that the CI environment runs the jobs of a matrix in parallel by default, which could explain the unexpected OOM (out-of-memory) errors.

If this is actually the case, we could reorganize the tests so that the "Slow Flax TPU tests", which use a different runner, run in parallel with either of these.
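
To make the suspicion above concrete, here is a rough sketch of why overlapping matrix jobs on a single machine could exhaust GPU memory. The job names come from the description; the per-job memory figures and the 16 GB capacity are made-up numbers for illustration, not measurements from the CI.

```python
# Hypothetical per-job peak GPU memory in GB (illustrative values, not measured).
JOB_MEMORY_GB = {
    "Slow PyTorch CUDA tests on Ubuntu": 12,
    "Slow ONNXRuntime CUDA tests on Ubuntu": 9,
}
GPU_CAPACITY_GB = 16  # assumed capacity of the single docker-gpu machine


def worst_case_peak_gb(job_memory_gb, max_parallel):
    """Worst-case peak memory when up to `max_parallel` jobs overlap."""
    largest_first = sorted(job_memory_gb.values(), reverse=True)
    return sum(largest_first[:max_parallel])


def fits(job_memory_gb, max_parallel, capacity_gb=GPU_CAPACITY_GB):
    """True if the worst-case overlap stays within the GPU capacity."""
    return worst_case_peak_gb(job_memory_gb, max_parallel) <= capacity_gb


if __name__ == "__main__":
    print("parallel fits:", fits(JOB_MEMORY_GB, max_parallel=2))
    print("sequential fits:", fits(JOB_MEMORY_GB, max_parallel=1))
```

In GitHub Actions, the number of matrix jobs that run at once can be capped with `strategy.max-parallel`; setting it to 1 runs them one after another.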

@pcuenca
Member Author

pcuenca commented May 21, 2023

This doesn't seem to be the cause. When running the tests inside the Docker container, it looks like a portion of memory is not freed and keeps accumulating. Outside the container I don't see the same problem, but compile doesn't work: I get the error RuntimeError: Triton Error [CUDA]: device kernel image is invalid. This happens with the latest PyTorch (2.0.1).
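
One way to check for this kind of accumulation is to sample memory usage after each test and look at how far it sits above the starting level. The sketch below is generic and not taken from the test suite: `probe` is a hypothetical callable returning the bytes currently in use, and the fake leaky workload stands in for real tests.

```python
import gc


def memory_growth(steps, probe):
    """Run each step, then record how far memory sits above the starting level.

    `steps` is an iterable of zero-argument callables (e.g. individual tests);
    `probe` is a zero-argument callable returning bytes currently in use.
    """
    baseline = probe()
    growth = []
    for step in steps:
        step()
        gc.collect()  # drop unreachable Python objects before sampling
        growth.append(probe() - baseline)
    return growth


def is_accumulating(growth):
    """True if usage never drops between steps and ends above the baseline."""
    never_drops = all(b >= a for a, b in zip(growth, growth[1:]))
    return bool(growth) and never_drops and growth[-1] > 0


if __name__ == "__main__":
    # Fake "leaky" workload: every step keeps 1 MiB alive in a shared list.
    leaked = []
    steps = [lambda: leaked.append(bytearray(1024 * 1024)) for _ in range(3)]
    probe = lambda: sum(len(chunk) for chunk in leaked)
    growth = memory_growth(steps, probe)
    print(growth, "accumulating:", is_accumulating(growth))
```

With PyTorch, one could pass `torch.cuda.memory_allocated` as the probe, keeping in mind that it only reports memory held by tensors in the current process.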

@pcuenca
Member Author

pcuenca commented May 22, 2023

Running tests with -k "not Flax and not Onnx and not compile", I don't see the OOM errors.
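
For context, pytest's -k option matches its expression against test names case-insensitively, so the command above deselects anything with Flax, Onnx, or compile in its name. Below is a simplified sketch of that selection for this particular expression. The test ids are hypothetical, and real pytest also matches parent names and other keywords and supports a full boolean expression grammar.

```python
EXCLUDED = ("Flax", "Onnx", "compile")  # terms from the -k expression above


def is_selected(test_id, excluded=EXCLUDED):
    """Keep a test only if none of the excluded terms appears in its id.

    Mirrors `not A and not B and not C` with case-insensitive substring matching.
    """
    lowered = test_id.lower()
    return not any(term.lower() in lowered for term in excluded)


if __name__ == "__main__":
    # Hypothetical test ids, for illustration only.
    candidates = [
        "StableDiffusionPipelineSlowTests::test_stable_diffusion_default",
        "FlaxStableDiffusionPipelineTests::test_stable_diffusion",
        "OnnxStableDiffusionPipelineTests::test_inference",
        "UNet2DModelTests::test_compile_forward",
    ]
    for test_id in candidates:
        print("run " if is_selected(test_id) else "skip", test_id)
```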

@patrickvonplaten
Contributor

@pcuenca feel free to merge if it helps

@patrickvonplaten patrickvonplaten merged commit fdec231 into main Jun 7, 2023
@patrickvonplaten patrickvonplaten deleted the sequential-test-matrix branch June 7, 2023 10:01
AmericanPresidentJimmyCarter pushed a commit to AmericanPresidentJimmyCarter/diffusers that referenced this pull request Apr 26, 2024
[tests] Run slow matrix sequentially.