
@dkuegler
Member

@dkuegler dkuegler commented Dec 9, 2022

Create a Docker multistage build script that is compatible with the FastSurfer interface rework.

All images are built with the "core" Dockerfile at Docker/Dockerfile. Also exclude some additional folders from the build context (.dockerignore) so they are not copied automatically by the build.

TODOs:

  • Testing
  • check the updates for AMD (the rocm/pytorch image now uses Python 3.8 as well)

@dkuegler dkuegler self-assigned this Dec 9, 2022
@dkuegler
Member Author

dkuegler commented Dec 9, 2022

@AhmedFaisal95 Please have a look and see what still needs to be done.
I have already added some documentation in Docker/README.md.

At a minimum, this drastically reduces code duplication.

It seems that DOCKER_BUILDKIT now also supports the script, which speeds up builds considerably because independent stages are built in parallel.

For example:

DOCKER_BUILDKIT=1 docker build --rm=true --target runtime --build-arg DEVICE=cuda --build-arg FREESURFER=pruned -t fastsurfer:gpu -f ./Docker/Dockerfile .
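For scripted builds, the same invocation can be assembled programmatically. This is a minimal Python sketch, not the interface of the project's build.py: the helper names `docker_build_cmd` and `run_build` are hypothetical, and they only reproduce the flags from the command above, enabling BuildKit through the environment variable.

```python
import os
import shlex
import subprocess


def docker_build_cmd(target, device, tag, freesurfer=None,
                     dockerfile="./Docker/Dockerfile", context="."):
    """Assemble the `docker build` argument list for one image variant (hypothetical helper)."""
    cmd = ["docker", "build", "--rm=true", "--target", target,
           "--build-arg", f"DEVICE={device}"]
    # FREESURFER is optional here; omitting it falls back to whatever
    # default the Dockerfile defines (assumption)
    if freesurfer is not None:
        cmd += ["--build-arg", f"FREESURFER={freesurfer}"]
    cmd += ["-t", tag, "-f", dockerfile, context]
    return cmd


def run_build(cmd):
    """Run the build with BuildKit enabled, so independent stages can build in parallel."""
    env = dict(os.environ, DOCKER_BUILDKIT="1")
    return subprocess.run(cmd, env=env, check=True)


if __name__ == "__main__":
    cmd = docker_build_cmd("runtime", "cuda", "fastsurfer:gpu", freesurfer="pruned")
    # print the equivalent shell command instead of invoking docker
    print("DOCKER_BUILDKIT=1 " + shlex.join(cmd))
```

Keeping the arguments as a list rather than a single shell string avoids quoting issues when they are handed to subprocess.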

@dkuegler dkuegler force-pushed the feature/docker_multistage branch from 52b0abf to f4f108d Compare December 9, 2022 18:56
@dkuegler dkuegler mentioned this pull request Dec 12, 2022
@af-a
Contributor

af-a commented Dec 12, 2022

This seems to run fine except when:

  1. the build arg DEVICE is set to cpu (any target), or
  2. the target is runtime_surf_only and DEVICE is set to none.

Case 1

Command:

DOCKER_BUILDKIT=1 docker build --rm=true --target runtime --build-arg DEVICE=cpu -t IMAGE_ID -f ./Docker/Dockerfile .

Result:

[+] Building 0.6s (11/17)
 => [internal] load build definition from Dockerfile  0.3s
 => => transferring dockerfile: 5.62kB  0.0s
 => [internal] load .dockerignore  0.4s
 => => transferring context: 136B  0.0s
 => [internal] load metadata for docker.io/library/ubuntu:20.04  0.0s
 => CANCELED [internal] load build context  0.0s
 => [build_conda_base 1/3] FROM docker.io/library/ubuntu:20.04  0.0s
 => CACHED [build_conda_base 2/3] RUN apt-get update && apt-get install -y --no-install-recommends wget git ca-certificates upx file && apt clean && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*  0.0s
 => CACHED [build_conda_base 3/3] RUN wget --no-check-certificate -qO ~/miniconda.sh https://repo.continuum.io/miniconda/Miniconda3-py38_4.11.0-Linux-x86_64.sh && chmod +x ~/miniconda.sh && ~/miniconda.sh -b -p /opt/conda && rm ~/miniconda.sh  0.0s
 => CACHED [build_conda_cpu 1/1] RUN conda install --channel pytorch cpuonly=1.0  0.0s
 => CACHED [runtime 1/7] RUN apt-get update && apt-get install -y --no-install-recommends tcsh time bc gawk libgomp1 libquadmath0 && apt clean && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*  0.0s
 => CACHED [runtime 2/7] RUN echo "source /venv/bin/activate" >> ~/.bashrc  0.0s
 => ERROR [runtime 3/7] COPY --from=build_conda /venv /venv  0.0s
------
 > [runtime 3/7] COPY --from=build_conda /venv /venv:
------
failed to compute cache key: "/venv" not found: not found

Case 2

Command:

DOCKER_BUILDKIT=1 docker build --rm=true --target runtime_surf_only --build-arg DEVICE=none --build-arg FREESURFER=pruned -t IMAGE_ID -f ./Docker/Dockerfile .

Result:

[+] Building 1.6s (4/4) FINISHED                                                           
 => [internal] load build definition from Dockerfile                                  0.3s
 => => transferring dockerfile: 140B                                                  0.0s
 => [internal] load .dockerignore                                                     0.5s
 => => transferring context: 136B                                                     0.0s
 => ERROR [internal] load metadata for docker.io/library/base_none:latest             1.0s
 => [internal] load metadata for docker.io/library/ubuntu:20.04                       0.0s
------
 > [internal] load metadata for docker.io/library/base_none:latest:
------

@af-a
Contributor

af-a commented Dec 12, 2022

The issues mentioned above should be fixed in this PR: dkuegler#2

@dkuegler dkuegler force-pushed the feature/docker_multistage branch from e21db27 to a76d4c7 Compare December 12, 2022 12:57
@dkuegler
Member Author

dkuegler commented Dec 12, 2022

@AhmedFaisal95 Can you confirm this works now?

We need a "basic test" of all images :).

@dkuegler
Member Author

dkuegler commented Dec 12, 2022

@m-reuter The Apple M1 device is missing from the Docker multistage build because it was not in dev at all.

For the AMD build, the base image was updated and I am not sure whether the updated image works. Could someone check this as well?

@af-a
Copy link
Contributor

af-a commented Dec 12, 2022

@dkuegler

@AhmedFaisal95 Can you confirm this works now?

We need a "basic test" of all images :).

I already tested the fixes before submitting the PR, specifically all images except the AMD one (no hardware available).

@dkuegler dkuegler force-pushed the feature/docker_multistage branch 2 times, most recently from 9024e01 to d187232 Compare December 12, 2022 19:51
@dkuegler dkuegler force-pushed the feature/docker_multistage branch from d187232 to 5007bd9 Compare December 21, 2022 10:59
@dkuegler
Member Author

dkuegler commented Jan 3, 2023

The build script should also compile the Python byte code (see https://docs.python.org/3/library/py_compile.html).
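py_compile handles one file at a time; for a whole source tree, the standard library's compileall module applies it recursively. A minimal sketch, assuming the build simply byte-compiles the source directory in place; the `precompile` and `demo` helpers and the throwaway module are hypothetical illustrations.

```python
import compileall
import pathlib
import tempfile


def precompile(source_dir):
    """Byte-compile every .py file below source_dir into __pycache__ folders."""
    # compile_dir returns a truthy value only if all files compiled;
    # quiet=1 suppresses the per-file listing and prints errors only
    return bool(compileall.compile_dir(source_dir, quiet=1))


def demo():
    """Compile a throwaway module and return (success, names of the .pyc files created)."""
    with tempfile.TemporaryDirectory() as tmp:
        src = pathlib.Path(tmp) / "example_module.py"
        src.write_text("ANSWER = 42\n")
        ok = precompile(tmp)
        pycs = sorted(p.name for p in pathlib.Path(tmp).rglob("*.pyc"))
    return ok, pycs


if __name__ == "__main__":
    print(demo())
```

In a Dockerfile, the equivalent step would be a `python -m compileall <source dir>` call, with the actual directory left to the build script.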

@dkuegler dkuegler force-pushed the feature/docker_multistage branch from 5007bd9 to e11150d Compare January 10, 2023 17:36
@dkuegler
Member Author

I think what remains before this PR can be merged is proper testing, especially of the AMD build.

@m-reuter
Member

The Apple M1 device is missing from the docker multistage built. The reason is that it was not in dev at all.

For M1 we do not use Docker, only a native install on Mac. Inside Docker it was much slower for as yet unknown reasons, see #109.

@dkuegler dkuegler force-pushed the feature/docker_multistage branch from e11150d to 3660c30 Compare February 6, 2023 13:59
@dkuegler
Member Author

dkuegler commented Feb 6, 2023

@agirodi can you test the AMD docker script?

@dkuegler dkuegler force-pushed the feature/docker_multistage branch from 3660c30 to 0310bbe Compare February 8, 2023 12:03
@dkuegler dkuegler force-pushed the feature/docker_multistage branch 2 times, most recently from 7e23aea to 160d867 Compare February 15, 2023 13:28
@agirodi
Contributor

agirodi commented Feb 23, 2023

@agirodi can you test the AMD docker script?

I just did and it is working correctly. The build went without issues and running produced a correct segmentation.

@dkuegler dkuegler marked this pull request as ready for review February 23, 2023 17:27
@dkuegler
Member Author

Then this PR should be ready to merge.

@dkuegler dkuegler force-pushed the feature/docker_multistage branch from 160d867 to 53d6070 Compare March 2, 2023 11:05
@dkuegler dkuegler force-pushed the feature/docker_multistage branch from 53d6070 to df70ab0 Compare March 10, 2023 14:22
@m-reuter
Member

A few points for documentation:

  • since including the FreeSurfer binaries increases the image size only minimally, we will stop publishing Docker images without FreeSurfer
  • the PyTorch GPU image is still much larger than the CPU version, so we will continue to publish both versions
  • AMD support is experimental and can break without notice, which is why we will provide build files but will not publish the container for now.
  • As dependencies (and their versions) remain relatively stable, we will have a base release image without PyTorch (including all other dependencies like Python packages and FreeSurfer binaries), and then two versions derived from it with PyTorch for GPU and CPU.
  • FastSurfer is then added on top of each of these three as the final layer. The entry point of the non-PyTorch image remains recon-surf; for the other two it will be run_fastsurfer.sh. Users who want to run run_prediction directly need to modify the docker run command to override the entry point, or add another layer.

The advantage of this approach is that future Docker builds can pull the base release images and reuse them (in most cases they probably exist locally anyway). Adding FastSurfer is then extremely fast :-)
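Running run_prediction directly by overriding the entry point could look like the minimal sketch below. It assumes an image tagged `my_fastsurfer:gpu`, a data mount at `/data`, and the script path `/fastsurfer/FastSurferCNN/run_prediction.py` inside the container; all three are placeholders, not confirmed paths, and the `run_prediction_cmd` helper is hypothetical. The arguments for run_prediction itself are left to the caller.

```python
import shlex


def run_prediction_cmd(image, data_dir, script, extra_args=(), gpus=True):
    """Compose a `docker run` call that replaces the image's default entry point."""
    cmd = ["docker", "run", "--rm"]
    if gpus:
        cmd += ["--gpus", "all"]
    # mount host data into the container (mount point is a placeholder)
    cmd += ["-v", f"{data_dir}:/data"]
    # --entrypoint replaces the default entry point (e.g. run_fastsurfer.sh);
    # everything after the image name is passed as arguments to `python`
    cmd += ["--entrypoint", "python", image, script]
    cmd += list(extra_args)
    return cmd


if __name__ == "__main__":
    cmd = run_prediction_cmd("my_fastsurfer:gpu", "/path/to/data",
                             "/fastsurfer/FastSurferCNN/run_prediction.py")
    print(shlex.join(cmd))
```

The alternative mentioned above, adding another layer, would instead set a different ENTRYPOINT in a small Dockerfile based on the published image.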

@dkuegler dkuegler force-pushed the feature/docker_multistage branch from df70ab0 to 458828d Compare March 24, 2023 16:50
@dkuegler dkuegler force-pushed the feature/docker_multistage branch 2 times, most recently from 1cdab68 to 02a0b3f Compare May 30, 2023 10:28
@dkuegler dkuegler force-pushed the feature/docker_multistage branch from 02a0b3f to a868f83 Compare June 15, 2023 13:39
@dkuegler dkuegler force-pushed the feature/docker_multistage branch 2 times, most recently from 9813c67 to ca50067 Compare July 4, 2023 09:13
@dkuegler dkuegler force-pushed the feature/docker_multistage branch 5 times, most recently from 4f689e7 to 5801fed Compare July 25, 2023 18:09
dkuegler and others added 11 commits August 8, 2023 17:14
…urfer interface rework.

All images are built with the "core" Dockerfile at Docker/Dockerfile.
Also update some additional folders to not be copied automatically by the build (.dockerignore).
This enables building runtime_surf_only with DEVICE=none.
Fix: Change build_conda_cpu base (build_conda_base --> build_conda_common)
Add some more files and folders to .dockerignore
Remove the chmod on checkpoints command from the Dockerfile
change the file with BUILD information to BUILD.txt (instead of VERSION.txt)
change the formatting options in segstats.py
Fundamentally, the CPU image is now ~3GB and no longer includes the cudatoolkit.
pytorch is now installed from the pytorch download page with all relevant platform-specific requirements such as the cudatoolkit or rocm drivers/runtime.
All of this is now always part of the conda environment.

- Update/fix build.py build script for python
- add conda-pack script to pack the conda environment
- update the Docker/README.md
- added an automatically created build.log file into each subject directory
- added a BUILD.info file into the docker container, that contains version information about version number, branch, git status at build time, checkpoints contained in container, and pip package list.
- added a FastSurferCNN/version.py script that replaces the version-related commands in run_fastsurfer.sh
- added baked in support for AWS
- remove torchio as standard imports in FastSurferCNN/data_loader/(dataset|loader).py, so we do not need torchio at inference time any more.
- added FastSurferCNN/utils/run_tools, a wrapper to subprocess.Popen that allows easier, non-blocking execution of scripts and binaries.
- added BUILD.info to .gitignore
- removed BUILD.txt from .dockerignore
- run_fastsurfer.sh now also supports a --version command, see --help
- deactivate user-site imports (-s flag to python)
- improve debug info of image (in /install folder, if --build-arg DEBUG=true)
- add --progress=plain to docker build when running inside the python script, because overwriting console output is not possible there
- merge different pip sections in the Dockerfile, as multiple sections were not merged automatically by conda
- filter __pycache__ from version
- raise the default Ubuntu versions to 22.04
- fix python -s for no user directory packages
- add `--tag my_fastsurfer:<device>` in README.md
- remove rocm environment variable
- add documentation for HSA_OVERRIDE_GFX_VERSION
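The run_tools item describes a non-blocking wrapper around subprocess.Popen. The sketch below only illustrates that idea; the class `BackgroundProcess` and its methods are hypothetical and not the interface of FastSurferCNN/utils/run_tools.

```python
import subprocess
import sys


class BackgroundProcess:
    """Hypothetical minimal wrapper: start a command without blocking, collect results later."""

    def __init__(self, args):
        # Popen returns immediately; stdout/stderr are captured through pipes
        self._proc = subprocess.Popen(args, stdout=subprocess.PIPE,
                                      stderr=subprocess.PIPE, text=True)

    def done(self):
        # poll() returns None while the process is still running
        return self._proc.poll() is not None

    def finish(self, timeout=None):
        """Wait for completion and return (returncode, stdout, stderr)."""
        # communicate() drains both pipes while waiting, avoiding the
        # deadlock a plain wait() can cause when the child writes a lot
        out, err = self._proc.communicate(timeout=timeout)
        return self._proc.returncode, out, err


def demo():
    # start three child processes at once; none of these calls blocks
    jobs = [BackgroundProcess([sys.executable, "-c", f"print({i} * {i})"])
            for i in range(3)]
    # the caller could do other work here before collecting the results
    return [job.finish(timeout=30) for job in jobs]


if __name__ == "__main__":
    for result in demo():
        print(result)
```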
@dkuegler dkuegler force-pushed the feature/docker_multistage branch from 5801fed to b296711 Compare August 8, 2023 15:15
@m-reuter m-reuter merged commit c293c00 into Deep-MI:dev Aug 8, 2023
@dkuegler dkuegler deleted the feature/docker_multistage branch August 8, 2023 15:31