Conversation

**yuantailing** (Member) commented Feb 20, 2025

Previous behaviour: extensions were built in series, even though multiple files within the same extension were compiled in parallel.

This pull request adds support for parallel building of multiple extensions. Benchmark results show:

| CPU | Build Parameters | Build Time |
| --- | --- | --- |
| AMD EPYC 24-Core (48 threads) | Original (no optimizations) | 45m29.243s |
| AMD EPYC 24-Core (48 threads) | `--parallel 4` | 12m55.243s |
| AMD EPYC 24-Core (48 threads) | `--parallel 16` | 6m47.962s |
| AMD EPYC 24-Core (48 threads) | `NVCC_APPEND_FLAGS="--threads 8"` | 19m23.878s |
| AMD EPYC 24-Core (48 threads) | `--parallel 4`, `NVCC_APPEND_FLAGS="--threads 8"` | 7m33.151s |
| AMD EPYC 24-Core (48 threads) | `--parallel 16`, `NVCC_APPEND_FLAGS="--threads 8"` | 5m58.479s |
| Intel Xeon 112-Core (224 threads) | `NVCC_APPEND_FLAGS="--threads 8"` | 14m9.081s |
| Intel Xeon 112-Core (224 threads) | `--parallel 16`, `NVCC_APPEND_FLAGS="--threads 8"` | 2m24.733s |
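The idea behind `--parallel N` (compiling several extensions at once under a bounded job count) can be sketched with `xargs -P`; the extension names and the `echo` stand-in are illustrative only, not apex's actual build code:

```shell
# Run up to 4 "builds" concurrently; each echo stands in for one
# extension's compile step.
printf '%s\n' fused_adam fmha group_norm transducer |
  xargs -P 4 -I{} sh -c 'echo "building {}"'
```

With `-P 1` the jobs run one at a time, matching the previous in-series behaviour.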

Memory usage is shown below. The "mem used" values were obtained with the `free` command; background memory usage is included.


| Build Parameters | Peak mem used |
| --- | --- |
| Original | 24.11 GiB |
| `--parallel 16` | 58.25 GiB |
| `NVCC_APPEND_FLAGS="--threads 8"` | 91.39 GiB |
| `--parallel 16`, `NVCC_APPEND_FLAGS="--threads 8"` | 150.96 GiB |

Image: nvcr.io/nvidia/pytorch:25.01-py3
(or other images with the same CUDA version and TORCH_CUDA_ARCH_LISTS)

cmdline:

```bash
time NVCC_APPEND_FLAGS="--threads 8" pip wheel -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext --distributed_adam --distributed_lamb --cuda_ext --permutation_search --bnp --xentropy --focal_loss --group_norm --index_mul_2d --deprecated_fused_adam --deprecated_fused_lamb --fast_layer_norm --fmha --fast_multihead_attn --transducer --cudnn_gbn --peer_memory --nccl_p2p --fast_bottleneck --fused_conv_bias_relu --nccl_allocator --gpu_direct_storage --parallel 16" ./
```

**alpha0422** (Contributor) commented

@crcrpar Could you help review this PR? This reduces APEX build time a lot.

README.md (outdated)

```bash
cd apex
# if pip >= 23.1 (ref: https://pip.pypa.io/en/stable/news/#v23-1) which supports multiple `--config-settings` with the same key...
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext --cuda_ext --parallel 4" ./
```
**crcrpar** (Collaborator) commented

Like the `--threads` option, this would increase CPU memory usage, so could you separately add an example command that combines `--threads` and `--parallel`?

**yuantailing** (Member, Author) commented

Updated README.md.

README.md Outdated

To reduce the build time of APEX, parallel building can be enhanced via

```bash
export NVCC_APPEND_FLAGS="--threads 4"
```
**alpha0422** (Contributor) commented

I'd suggest not exporting this env var; it affects nvcc globally.

**yuantailing** (Member, Author) commented

Moved it to a temporary environment scope.
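The per-command ("temporary") scoping discussed above can be demonstrated with a throwaway command; here `sh -c ... echo` stands in for the real pip/nvcc invocation:

```shell
# Per-command environment scope: NVCC_APPEND_FLAGS is visible only to this
# single invocation (the echo stands in for the actual pip install).
NVCC_APPEND_FLAGS="--threads 4" sh -c 'echo "flags=$NVCC_APPEND_FLAGS"'
# Unlike `export NVCC_APPEND_FLAGS=...`, nothing leaks into the session:
echo "after=${NVCC_APPEND_FLAGS:-unset}"
```

This way other nvcc invocations in the same shell are unaffected.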

**crcrpar** (Collaborator) left a comment

Thank you for implementing this nice option.

@crcrpar crcrpar merged commit c9e6f05 into NVIDIA:master Feb 25, 2025
oraluben added a commit to oraluben/SageAttention that referenced this pull request Jul 7, 2025
XiaomingXu1995 added a commit to thu-ml/SageAttention that referenced this pull request Jul 13, 2025
forrestl111 pushed a commit to forrestl111/SageAttention that referenced this pull request Jul 23, 2025
