Commit 4c758b3

Build nccl after installing cuda (#1670)
Fixes: pytorch/pytorch#116977
No NCCL 2.19.3 release exists for CUDA 11.8 or CUDA 12.1. According to the release notes (https://docs.nvidia.com/deeplearning/nccl/release-notes/rel_2-19-3.html#rel_2-19-3), CUDA 12.0, 12.2, and 12.3 are supported. Hence we build NCCL manually, following this build process: https://github.com/NVIDIA/nccl/tree/v2.19.3-1?tab=readme-ov-file#build
We want the NCCL version to be exactly the same as the one installed here: https://github.com/pytorch/pytorch/blob/main/.github/scripts/generate_binary_build_matrix.py#L45
1 parent 588ab91 commit 4c758b3
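
Because the commit message requires the installed NCCL version to match exactly, a post-install check could be sketched as follows. This is a minimal sketch, not part of the commit: the function names `nccl_header_version` and `check_nccl_version` are hypothetical, and it assumes the installed `nccl.h` defines the `NCCL_MAJOR`, `NCCL_MINOR`, and `NCCL_PATCH` macros as plain integers.

```shell
# Hypothetical helper (not in the commit): print the MAJOR.MINOR.PATCH version
# recorded in an nccl.h header. Assumes lines such as "#define NCCL_MAJOR 2".
nccl_header_version() {
    local header="$1"
    local major minor patch
    major=$(awk '$1 == "#define" && $2 == "NCCL_MAJOR" {print $3}' "$header")
    minor=$(awk '$1 == "#define" && $2 == "NCCL_MINOR" {print $3}' "$header")
    patch=$(awk '$1 == "#define" && $2 == "NCCL_PATCH" {print $3}' "$header")
    echo "${major}.${minor}.${patch}"
}

# Hypothetical helper: return non-zero if the header version differs from the
# expected one. The default header path mirrors the copy target in install_cuda.sh.
check_nccl_version() {
    local expected="$1"
    local header="${2:-/usr/local/cuda/include/nccl.h}"
    local actual
    actual=$(nccl_header_version "$header")
    if [ "$actual" != "$expected" ]; then
        echo "NCCL version mismatch: expected $expected, found $actual" >&2
        return 1
    fi
}
```

Called as `check_nccl_version 2.19.3` after the build steps, this would stop the script if the copied header came from a different NCCL version.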

1 file changed, +12 -12 lines changed

common/install_cuda.sh

Lines changed: 12 additions & 12 deletions
@@ -33,13 +33,13 @@ function install_118 {
 rm -rf tmp_cudnn

 # NCCL license: https://docs.nvidia.com/deeplearning/nccl/#licenses
-mkdir tmp_nccl && cd tmp_nccl
-wget -q https://developer.download.nvidia.com/compute/redist/nccl/v2.15.5/nccl_2.15.5-1+cuda11.8_x86_64.txz
-tar xf nccl_2.15.5-1+cuda11.8_x86_64.txz
-cp -a nccl_2.15.5-1+cuda11.8_x86_64/include/* /usr/local/cuda/include/
-cp -a nccl_2.15.5-1+cuda11.8_x86_64/lib/* /usr/local/cuda/lib64/
+# Follow build: https://github.com/NVIDIA/nccl/tree/v2.19.3-1?tab=readme-ov-file#build
+git clone -b v2.19.3-1 --depth 1 https://github.com/NVIDIA/nccl.git
+cd nccl && make -j src.build
+cp -a build/include/* /usr/local/cuda/include/
+cp -a build/lib/* /usr/local/cuda/lib64/
 cd ..
-rm -rf tmp_nccl
+rm -rf nccl

 install_cusparselt_040

@@ -66,13 +66,13 @@ function install_121 {
 rm -rf tmp_cudnn

 # NCCL license: https://docs.nvidia.com/deeplearning/nccl/#licenses
-mkdir tmp_nccl && cd tmp_nccl
-wget -q https://developer.download.nvidia.com/compute/redist/nccl/v2.18.1/nccl_2.18.1-1+cuda12.1_x86_64.txz
-tar xf nccl_2.18.1-1+cuda12.1_x86_64.txz
-cp -a nccl_2.18.1-1+cuda12.1_x86_64/include/* /usr/local/cuda/include/
-cp -a nccl_2.18.1-1+cuda12.1_x86_64/lib/* /usr/local/cuda/lib64/
+# Follow build: https://github.com/NVIDIA/nccl/tree/v2.19.3-1?tab=readme-ov-file#build
+git clone -b v2.19.3-1 --depth 1 https://github.com/NVIDIA/nccl.git
+cd nccl && make -j src.build
+cp -a build/include/* /usr/local/cuda/include/
+cp -a build/lib/* /usr/local/cuda/lib64/
 cd ..
-rm -rf tmp_nccl
+rm -rf nccl

 install_cusparselt_040

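Both hunks add the same five build steps to `install_118` and `install_121`. One way to avoid the duplication would be a shared helper, sketched below. This is not part of the commit: the function name `build_nccl_from_source` and its optional second argument are hypothetical, and the sketch assumes the same `/usr/local/cuda` layout the diff uses.

```shell
# Hypothetical refactoring sketch (not in the commit): clone the given NCCL tag,
# build it, and copy headers and libraries into the CUDA install directory.
build_nccl_from_source() {
    local tag="$1"                            # e.g. v2.19.3-1, as in the diff
    local cuda_home="${2:-/usr/local/cuda}"   # assumed default install prefix
    local status=0

    git clone -b "$tag" --depth 1 https://github.com/NVIDIA/nccl.git || return 1

    # Run the build in a subshell so the caller's working directory is unchanged.
    (
        cd nccl &&
        make -j src.build &&
        cp -a build/include/* "$cuda_home/include/" &&
        cp -a build/lib/* "$cuda_home/lib64/"
    ) || status=$?

    # Remove the checkout whether or not the build succeeded.
    rm -rf nccl
    return "$status"
}
```

Under this sketch, each install function would replace its five added lines with a single call such as `build_nccl_from_source v2.19.3-1`.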