@shumway shumway commented Dec 9, 2025

The existing sequence sort code is slow and shows up in build profiles.

This PR converts it to constexpr functions for much more efficient operation. The sorting is still O(N^2), but our sequences are small enough it executes quickly. This reduced compilation time of a small convolution by more than 10%. There are other sequence operations we can improve if this change works well.

The design implements insertion sort as constexpr functions on arrays and adds helper code to convert from sequences to arrays and back to structs. The unique filter also works well as a constexpr function. We are somewhat limited in options because of C++17 compatibility, so this PR uses standard build-time optimization techniques for metaprogramming prior to C++20.
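As a rough illustration of the approach described above (not the code in this PR), a C++17 sketch might look like the following. The names `Sequence`, `Array`, `insertion_sort`, `sort_impl`, and `sort_sequence` are hypothetical stand-ins chosen for this example; the library's actual types and helpers may differ.

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for a compile-time integer sequence type.
template <int... Is>
struct Sequence
{
};

// Minimal constexpr-friendly array (assumption: a simple aggregate is enough here).
// The storage has at least one element so that N == 0 is still a valid type.
template <typename T, std::size_t N>
struct Array
{
    T data[N == 0 ? 1 : N];
    constexpr T& operator[](std::size_t i) { return data[i]; }
    constexpr const T& operator[](std::size_t i) const { return data[i]; }
};

// Plain O(N^2) insertion sort, evaluated by the compiler's constexpr interpreter
// instead of through recursive template instantiation.
template <std::size_t N>
constexpr Array<int, N> insertion_sort(Array<int, N> a)
{
    for(std::size_t i = 1; i < N; ++i)
    {
        const int key = a[i];
        std::size_t j = i;
        while(j > 0 && a[j - 1] > key)
        {
            a[j] = a[j - 1]; // shift larger elements one slot to the right
            --j;
        }
        a[j] = key;
    }
    return a;
}

// Sequence -> array -> sorted array -> Sequence.
// The index pack Idx expands the sorted array back into template arguments.
template <int... Is, std::size_t... Idx>
constexpr auto sort_impl(Sequence<Is...>, std::index_sequence<Idx...>)
{
    constexpr auto sorted = insertion_sort(Array<int, sizeof...(Is)>{{Is...}});
    return Sequence<sorted[Idx]...>{};
}

template <int... Is>
constexpr auto sort_sequence(Sequence<Is...> s)
{
    return sort_impl(s, std::make_index_sequence<sizeof...(Is)>{});
}

// Compile-time check that the round trip yields a sorted sequence type.
static_assert(std::is_same<decltype(sort_sequence(Sequence<3, 1, 2>{})),
                           Sequence<1, 2, 3>>::value,
              "sorted sequence type mismatch");
```

The sort runs once as an ordinary constexpr loop, and only the final expansion back into `Sequence<...>` touches template machinery, which is the kind of trade that tends to reduce instantiation work in C++17 code.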

Also add unit tests in a new file unit_sequence.hpp to check this sort functionality and other sequence functionality.

With 192 build threads on the narrow build for MIOpen, this cut the average time per thread from 875s to 819s, reducing total build time by 6% and total wall time by 1.8% (the wall time varies depending on the ninja target scheduler).

Old sequence sort code was showing up on build profiles. Convert it to constexpr functions for much more efficient operation. The sorting is still O(N^2), but our sequences are small enough it executes quickly. This reduced compilation time of a small convolution by more than 10%. There are other sequence operations we can improve if this change works well.
@shumway shumway force-pushed the jshumway/build-time branch from 09297c0 to fef942f Compare December 9, 2025 15:25
@shumway shumway merged commit 15ed65d into develop Dec 10, 2025
25 checks passed
@shumway shumway deleted the jshumway/build-time branch December 10, 2025 20:25
kabrahamAMD pushed a commit that referenced this pull request Dec 11, 2025
Old sequence sort code was showing up on build profiles. Convert it to constexpr functions for much more efficient build-time execution. The sorting is still O(N^2), but our sequences are small enough it executes quickly. This reduced compilation time of a small convolution by more than 10% and overall time spent in the compiler on a narrow build by 6%.