Remove power-of-two constraint on the trip count when tail folding #82626
artagnon added a commit to artagnon/llvm-project that referenced this issue on May 6, 2024:
9a087a3 (LoopVectorize: MaxVF should not be larger than the loop trip count) was the first commit to add the PowerOf2_32() condition on the trip count to what is now getMaximizedVFForTarget(). It made sense at the time, as there was no tail-folding support. Much later, 2025e09 ([LV] Make sure VF doesn't exceed compile time known TC) patched this with an extra condition on FoldTailByMasking, to ensure that the VF doesn't exceed the trip count. However, it didn't go far enough, and we can do better, as there is existing code to clamp the trip count and do tail-folding. Fixes llvm#82626.
artagnon added five more commits to artagnon/llvm-project that referenced this issue, each carrying the same commit message, on May 23, 2024, May 28, 2024 (twice), Jun 4, 2024, and Jul 22, 2024.
Currently, in getMaximizedVFForTarget in LoopVectorize.cpp, we add a constraint when tail-folding that the trip count (the number of times the scalar loop body executes) must be a power of two in order for us to clamp the vectorization factor (the number of elements processed per iteration of the vectorised loop). If we know, for example, that the maximum trip count is 7, it seems wasteful to still choose VF=16 or VF=vscale x 16. We would be better off clamping to VF=8 or VF=vscale x 8 in that case, with the one unused lane masked off since we're tail-folding.
llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
Line 4718 in 27498e9
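To make the proposal concrete, here is a minimal standalone sketch of the clamping rule being asked for. It is not the code in LoopVectorize.cpp: `clampVF`, `floorPowerOfTwo`, and `ceilPowerOfTwo` are hypothetical names invented for illustration, a trip count of 0 is assumed to mean "unknown", and rounding down in the non-tail-folded case is an assumption about the desired behaviour.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an LLVM API): largest power of two <= X, for X >= 1.
static uint32_t floorPowerOfTwo(uint32_t X) {
  uint32_t P = 1;
  while (P <= X / 2)
    P *= 2;
  return P;
}

// Hypothetical helper (not an LLVM API): smallest power of two >= X.
// Assumes X is small enough that the result fits in 32 bits.
static uint32_t ceilPowerOfTwo(uint32_t X) {
  uint32_t P = 1;
  while (P < X)
    P *= 2;
  return P;
}

// Sketch of the proposed rule. MaxTripCount == 0 is assumed to mean
// "not known at compile time"; MaxVF is the widest VF the target allows.
static uint32_t clampVF(uint32_t MaxTripCount, uint32_t MaxVF, bool FoldTail) {
  assert(MaxVF != 0 && (MaxVF & (MaxVF - 1)) == 0 &&
         "MaxVF must be a power of two");
  // Nothing to clamp if the trip count is unknown or exceeds MaxVF.
  if (MaxTripCount == 0 || MaxTripCount > MaxVF)
    return MaxVF;
  // With tail folding, round up: the extra lanes are masked off.
  // Without it, round down: leftover iterations run in a scalar remainder.
  return FoldTail ? ceilPowerOfTwo(MaxTripCount)
                  : floorPowerOfTwo(MaxTripCount);
}
```

With a maximum trip count of 7 and a target maximum of 16, the tail-folded case clamps to 8 (one masked lane) instead of staying at 16 (nine masked lanes). The same reasoning would apply to the known-minimum element count of a scalable VF such as vscale x 16.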