-
Notifications
You must be signed in to change notification settings - Fork 227
formalize the ROCm image pre-build process #1414
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
ROCm must build magma and miopen from source. This takes considerable time and also exceeds allowed disk space on the GitHub Action runner. ROCm images are now a two-stage process. For example, first run: GPU_ARCH_VERSION=5.5 ./manywheel/build_docker_rocm_prebuild.sh This builds and pushes the image: rocm/dev-centos-7:5.5-magma-miopen-staging Then the GHA workflow can proceed as usual.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add CI to do that. Also, there are no harm in picking a bigger runner for the job (as Docker images are rebuilt very infrequently)
Also, just curios, why MAGMA built mechanism for ROCm and CUDA are diverged?
@jithunnair-amd and that's a problem, isn't it? Because if somebody needs to fix/update magma, they'll have to ask you guys to update the basic container.
Yes, I know the mechanics, but the question is: why? I.e. magma for CUDA is build by the following workflow, why can't rocm build of magma be added to it? |
I will say that we actually considered using one of our ROCm GHA runners for the builder CI jobs, but picking a bigger CPU-only GHA runner seems like a fine idea too. @jeffdaily, should we try the bigger GHA runner to see if we can avoid this two-stage docker build process?
Actually, we have been working on this recently, trying to take a look at the CUDA magma build flow to see if we can apply it to ROCm magma as well. It's a WIP. |
@jithunnair-amd I'm not saying that ROCm magma builds need to follow CUDA magma builds or vice versa, but it would be nice if workflows can be kept as close to each other as possible. |
YES! That would fix our issues, too, without all this pre-build trickery. |
@malfet @seemethere What are the runner instance names to use for larger diskspace/CPU? This doesn't indicate specific labels: https://docs.github.com/en/actions/using-github-hosted-runners/using-larger-runners#machine-specs-for-larger-runners? |
@jithunnair-amd we have self-hosted runners running on AWS which are defined in https://github.com/pytorch/test-infra/blob/main/.github/scale-config.yml |
Closing in favor of #1418. |
ROCm must build magma and miopen from source. This takes considerable time and also exceeds allowed disk space on the GitHub Action runner.
ROCm images are now a two-stage process. For example, first run:
GPU_ARCH_VERSION=5.5 ./manywheel/build_docker_rocm_prebuild.sh
This builds and pushes the image:
rocm/dev-centos-7:5.5-magma-miopen-staging
Then the GHA workflow can proceed as usual.