Remove power-of-two constraint on the trip count when tail folding #82626

SamTebbs33 opened this issue Feb 22, 2024 · 0 comments

Currently, in getMaximizedVFForTarget in LoopVectorize.cpp, we add a constraint when tail-folding that the maximum trip count (the number of iterations the scalar loop executes) must be a power of two in order for us to clamp the vectorization factor (VF, the number of elements processed per iteration of the vectorised loop). If we know, for example, that the maximum trip count is 7, it seems wasteful to still choose VF=16 or VF=vscale x 16. We would be better off clamping to VF=8 or VF=vscale x 8 in that case, with the one unused lane masked off since we're tail-folding.

(!FoldTailByMasking || isPowerOf2_32(MaxTripCount))) {
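
To make the proposal concrete, here is a minimal standalone sketch of the clamping being asked for. It is not the LLVM implementation: `clampVF`, `powerOf2Floor` and `powerOf2Ceil` are hypothetical names, and the sketch assumes a fixed-width VF that is itself a power of two, with a max trip count of 0 standing for "unknown".

```cpp
#include <cstdint>

// Hypothetical helper: largest power of two <= x (assumes x > 0).
static uint32_t powerOf2Floor(uint32_t x) {
  uint32_t p = 1;
  while (p <= x / 2)
    p *= 2;
  return p;
}

// Hypothetical helper: smallest power of two >= x (assumes 0 < x <= 2^31).
static uint32_t powerOf2Ceil(uint32_t x) {
  uint32_t p = 1;
  while (p < x)
    p *= 2;
  return p;
}

// Sketch of the proposed clamp. maxVF is assumed to be a power of two;
// maxTripCount == 0 is assumed to mean "not known at compile time".
uint32_t clampVF(uint32_t maxVF, uint32_t maxTripCount, bool foldTailByMasking) {
  // Nothing to clamp if the trip count is unknown or already fills the vector.
  if (maxTripCount == 0 || maxTripCount >= maxVF)
    return maxVF;
  // With tail folding, the extra lanes are masked off, so round up.
  // Without it, the VF must not exceed the trip count, so round down.
  return foldTailByMasking ? powerOf2Ceil(maxTripCount)
                           : powerOf2Floor(maxTripCount);
}
```

Under these assumptions, a maximum trip count of 7 with a starting VF of 16 clamps to 8 when tail-folding, rather than staying at 16 because 7 is not a power of two.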

SamTebbs33 changed the title from "Clamp vector factor when the max trip count is known" to "Remove power-of-two constraint on the trip count when tail folding" on Feb 22, 2024
artagnon added a commit to artagnon/llvm-project that referenced this issue May 6, 2024
9a087a3 (LoopVectorize: MaxVF should not be larger than the loop trip
count) was the first commit to add the isPowerOf2_32() condition on the
trip-count to what is now getMaximizedVFForTarget(). It made sense at
the time, as there was no tail-folding support. Much later, 2025e09
([LV] Make sure VF doesn't exceed compile time known TC) came along to
patch this with an extra condition on FoldTailByMasking, in order to
ensure that the VF doesn't exceed the trip-count. However, it
didn't go far enough, and we can do better, as there is existing code to
clamp the trip-count, and do tail-folding.

Fixes llvm#82626.
artagnon added a commit to artagnon/llvm-project that referenced this issue May 23, 2024
artagnon added a commit to artagnon/llvm-project that referenced this issue May 28, 2024
artagnon added a commit to artagnon/llvm-project that referenced this issue May 28, 2024
artagnon added a commit to artagnon/llvm-project that referenced this issue Jun 4, 2024
artagnon added a commit to artagnon/llvm-project that referenced this issue Jul 22, 2024