[LV] Remove assertions in IV overflow check #115705

lukel97 · 2024-11-11T10:24:57Z

In #111310 an assert was added that for the IV overflow check used with tail folding, the overflow check is never known.

However when applying the loop guards, it looks like it's possible that we might actually know the IV won't overflow: this occurs in 500.perlbench_r from SPEC CPU 2017 and triggers the assertion:

Assertion failed: (!isIndvarOverflowCheckKnownFalse(Cost, VF * UF) && !SE.isKnownPredicate(CmpInst::getInversePredicate(ICmpInst::ICMP_ULT), TC2OverflowSCEV, SE.getSCEV(Step)) && "unexpectedly proved overflow check to be known"), function emitIterationCountCheck, file LoopVectorize.cpp, line 2501.

There is a discrepancy between isIndvarOverflowCheckKnownFalse and the ICMP_ULT check, because the former uses getSmallConstantMaxTripCount which only takes into trip counts that fit into 32 bits. There doesn't seem to be an easy way to make the assertion aware of this, so this PR just removes it for now.

There are two potential follow up things from this PR:

We miss calculating the max trip count in @trip_count_max_1024, it looks like we might need to apply loop guards somewhere in ScalarEvolution::computeExitLimitFromICmp
In @overflow_at_0, if %tc == 0 then we the overflow check will always return false, even though it will overflow

Fixes #115755

In llvm#111310 an assert was added that for the IV overflow check used with tail folding, the overflow check is never known. However when applying the loop guards, it looks like it's possible that we might actually know the trip count won't overflow: this occurs in 500.perlbench_r from SPEC CPU 2017 and triggers the assertion: Assertion failed: (!isIndvarOverflowCheckKnownFalse(Cost, VF * UF) && !SE.isKnownPredicate(CmpInst::getInversePredicate(ICmpInst::ICMP_ULT), TC2OverflowSCEV, SE.getSCEV(Step)) && "unexpectedly proved overflow check to be known"), function emitIterationCountCheck, file LoopVectorize.cpp, line 2501. This removes the assert and instead replaces the icmp if the overflow check is known, the same way as is done for the minimum iterations check.

llvmbot · 2024-11-11T10:25:32Z

@llvm/pr-subscribers-vectorizers

Author: Luke Lau (lukel97)

Changes

In #111310 an assert was added that for the IV overflow check used with tail folding, the overflow check is never known.

However when applying the loop guards, it looks like it's possible that we might actually know the IV won't overflow: this occurs in 500.perlbench_r from SPEC CPU 2017 and triggers the assertion:

Assertion failed: (!isIndvarOverflowCheckKnownFalse(Cost, VF * UF) && !SE.isKnownPredicate(CmpInst::getInversePredicate(ICmpInst::ICMP_ULT), TC2OverflowSCEV, SE.getSCEV(Step)) && "unexpectedly proved overflow check to be known"), function emitIterationCountCheck, file LoopVectorize.cpp, line 2501.

This removes the assert and instead replaces the icmp if the overflow check is known, the same way as is done for the minimum iterations check.

Full diff: https://github.com/llvm/llvm-project/pull/115705.diff

2 Files Affected:

(modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+12-9)
(added) llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-known-no-overflow.ll (+85)

diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 1ebc62f9843905..5a55eb7a49f360 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -2491,17 +2491,20 @@ void InnerLoopVectorizer::emitIterationCountCheck(BasicBlock *Bypass) {
     Value *LHS = Builder.CreateSub(MaxUIntTripCount, Count);
 
     Value *Step = CreateStep();
-#ifndef NDEBUG
     ScalarEvolution &SE = *PSE.getSE();
     const SCEV *TC2OverflowSCEV = SE.applyLoopGuards(SE.getSCEV(LHS), OrigLoop);
-    assert(
-        !isIndvarOverflowCheckKnownFalse(Cost, VF * UF) &&
-        !SE.isKnownPredicate(CmpInst::getInversePredicate(ICmpInst::ICMP_ULT),
-                             TC2OverflowSCEV, SE.getSCEV(Step)) &&
-        "unexpectedly proved overflow check to be known");
-#endif
-    // Don't execute the vector loop if (UMax - n) < (VF * UF).
-    CheckMinIters = Builder.CreateICmp(ICmpInst::ICMP_ULT, LHS, Step);
+    const SCEV *StepSCEV = SE.getSCEV(Step);
+
+    // Check if the tc + step is >= maxuint.
+    if (SE.isKnownPredicate(ICmpInst::ICMP_ULT, TC2OverflowSCEV, StepSCEV)) {
+      CheckMinIters = Builder.getTrue();
+    } else if (!SE.isKnownPredicate(
+                   CmpInst::getInversePredicate(CmpInst::ICMP_ULT),
+                   TC2OverflowSCEV, StepSCEV)) {
+      // Generate the IV overflow check only if we cannot prove the IV won't
+      // overflow, or known to always overflow.
+      CheckMinIters = Builder.CreateICmp(ICmpInst::ICMP_ULT, LHS, Step);
+    } // else tc + step known < maxuint, use CheckMinIters preset to false
   }
 
   // Create new preheader for vector loop.
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-known-no-overflow.ll b/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-known-no-overflow.ll
new file mode 100644
index 00000000000000..3ddc6606f57e90
--- /dev/null
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-known-no-overflow.ll
@@ -0,0 +1,85 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -passes=loop-vectorize \
+; RUN: -force-tail-folding-style=data-with-evl \
+; RUN: -prefer-predicate-over-epilogue=predicate-dont-vectorize \
+; RUN: -mtriple=riscv64 -mattr=+v -S < %s | FileCheck %s
+
+; If we know the IV will never overflow then we can skip the IV overflow check
+
+define void @f(ptr %p, i64 %tc) vscale_range(2, 1024) {
+; CHECK-LABEL: define void @f(
+; CHECK-SAME: ptr [[P:%.*]], i64 [[TC:%.*]]) #[[ATTR0:[0-9]+]] {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[GUARD:%.*]] = icmp ugt i64 [[TC]], 1024
+; CHECK-NEXT:    br i1 [[GUARD]], label %[[EXIT:.*]], label %[[LOOP_PREHEADER:.*]]
+; CHECK:       [[LOOP_PREHEADER]]:
+; CHECK-NEXT:    [[TMP0:%.*]] = sub i64 -1, [[TC]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call i64 @llvm.vscale.i64()
+; CHECK-NEXT:    [[TMP2:%.*]] = mul i64 [[TMP1]], 2
+; CHECK-NEXT:    br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
+; CHECK:       [[VECTOR_PH]]:
+; CHECK-NEXT:    [[TMP3:%.*]] = call i64 @llvm.vscale.i64()
+; CHECK-NEXT:    [[TMP4:%.*]] = mul i64 [[TMP3]], 2
+; CHECK-NEXT:    [[TMP5:%.*]] = sub i64 [[TMP4]], 1
+; CHECK-NEXT:    [[N_RND_UP:%.*]] = add i64 [[TC]], [[TMP5]]
+; CHECK-NEXT:    [[N_MOD_VF:%.*]] = urem i64 [[N_RND_UP]], [[TMP4]]
+; CHECK-NEXT:    [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]
+; CHECK-NEXT:    [[TMP6:%.*]] = call i64 @llvm.vscale.i64()
+; CHECK-NEXT:    [[TMP7:%.*]] = mul i64 [[TMP6]], 2
+; CHECK-NEXT:    br label %[[VECTOR_BODY:.*]]
+; CHECK:       [[VECTOR_BODY]]:
+; CHECK-NEXT:    [[INDEX:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[VECTOR_BODY]] ]
+; CHECK-NEXT:    [[EVL_BASED_IV:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_EVL_NEXT:%.*]], %[[VECTOR_BODY]] ]
+; CHECK-NEXT:    [[AVL:%.*]] = sub i64 [[TC]], [[EVL_BASED_IV]]
+; CHECK-NEXT:    [[TMP8:%.*]] = call i32 @llvm.experimental.get.vector.length.i64(i64 [[AVL]], i32 2, i1 true)
+; CHECK-NEXT:    [[TMP9:%.*]] = add i64 [[EVL_BASED_IV]], 0
+; CHECK-NEXT:    [[TMP10:%.*]] = getelementptr i64, ptr [[P]], i64 [[TMP9]]
+; CHECK-NEXT:    [[TMP11:%.*]] = getelementptr i64, ptr [[TMP10]], i32 0
+; CHECK-NEXT:    [[VP_OP_LOAD:%.*]] = call <vscale x 2 x i64> @llvm.vp.load.nxv2i64.p0(ptr align 8 [[TMP11]], <vscale x 2 x i1> shufflevector (<vscale x 2 x i1> insertelement (<vscale x 2 x i1> poison, i1 true, i64 0), <vscale x 2 x i1> poison, <vscale x 2 x i32> zeroinitializer), i32 [[TMP8]])
+; CHECK-NEXT:    [[VP_OP:%.*]] = call <vscale x 2 x i64> @llvm.vp.add.nxv2i64(<vscale x 2 x i64> [[VP_OP_LOAD]], <vscale x 2 x i64> shufflevector (<vscale x 2 x i64> insertelement (<vscale x 2 x i64> poison, i64 1, i64 0), <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer), <vscale x 2 x i1> shufflevector (<vscale x 2 x i1> insertelement (<vscale x 2 x i1> poison, i1 true, i64 0), <vscale x 2 x i1> poison, <vscale x 2 x i32> zeroinitializer), i32 [[TMP8]])
+; CHECK-NEXT:    call void @llvm.vp.store.nxv2i64.p0(<vscale x 2 x i64> [[VP_OP]], ptr align 8 [[TMP11]], <vscale x 2 x i1> shufflevector (<vscale x 2 x i1> insertelement (<vscale x 2 x i1> poison, i1 true, i64 0), <vscale x 2 x i1> poison, <vscale x 2 x i32> zeroinitializer), i32 [[TMP8]])
+; CHECK-NEXT:    [[TMP12:%.*]] = zext i32 [[TMP8]] to i64
+; CHECK-NEXT:    [[INDEX_EVL_NEXT]] = add i64 [[TMP12]], [[EVL_BASED_IV]]
+; CHECK-NEXT:    [[INDEX_NEXT]] = add i64 [[INDEX]], [[TMP7]]
+; CHECK-NEXT:    [[TMP13:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
+; CHECK-NEXT:    br i1 [[TMP13]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
+; CHECK:       [[MIDDLE_BLOCK]]:
+; CHECK-NEXT:    br i1 true, label %[[EXIT_LOOPEXIT:.*]], label %[[SCALAR_PH]]
+; CHECK:       [[SCALAR_PH]]:
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], %[[MIDDLE_BLOCK]] ], [ 0, %[[LOOP_PREHEADER]] ]
+; CHECK-NEXT:    br label %[[LOOP:.*]]
+; CHECK:       [[LOOP]]:
+; CHECK-NEXT:    [[I:%.*]] = phi i64 [ [[I_NEXT:%.*]], %[[LOOP]] ], [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ]
+; CHECK-NEXT:    [[GEP:%.*]] = getelementptr i64, ptr [[P]], i64 [[I]]
+; CHECK-NEXT:    [[X:%.*]] = load i64, ptr [[GEP]], align 8
+; CHECK-NEXT:    [[Y:%.*]] = add i64 [[X]], 1
+; CHECK-NEXT:    store i64 [[Y]], ptr [[GEP]], align 8
+; CHECK-NEXT:    [[I_NEXT]] = add i64 [[I]], 1
+; CHECK-NEXT:    [[DONE:%.*]] = icmp eq i64 [[I_NEXT]], [[TC]]
+; CHECK-NEXT:    br i1 [[DONE]], label %[[EXIT_LOOPEXIT]], label %[[LOOP]], !llvm.loop [[LOOP3:![0-9]+]]
+; CHECK:       [[EXIT_LOOPEXIT]]:
+; CHECK-NEXT:    br label %[[EXIT]]
+; CHECK:       [[EXIT]]:
+; CHECK-NEXT:    ret void
+;
+entry:
+  %guard = icmp ugt i64 %tc, 1024
+  br i1 %guard, label %exit, label %loop
+loop:
+  %i = phi i64 [%i.next, %loop], [0, %entry]
+  %gep = getelementptr i64, ptr %p, i64 %i
+  %x = load i64, ptr %gep
+  %y = add i64 %x, 1
+  store i64 %y, ptr %gep
+  %i.next = add i64 %i, 1
+  %done = icmp eq i64 %i.next, %tc
+  br i1 %done, label %exit, label %loop
+exit:
+  ret void
+}
+;.
+; CHECK: [[LOOP0]] = distinct !{[[LOOP0]], [[META1:![0-9]+]], [[META2:![0-9]+]]}
+; CHECK: [[META1]] = !{!"llvm.loop.isvectorized", i32 1}
+; CHECK: [[META2]] = !{!"llvm.loop.unroll.runtime.disable"}
+; CHECK: [[LOOP3]] = distinct !{[[LOOP3]], [[META2]], [[META1]]}
+;.

llvmbot · 2024-11-11T10:25:32Z

@llvm/pr-subscribers-llvm-transforms

Author: Luke Lau (lukel97)

Changes

In #111310 an assert was added that for the IV overflow check used with tail folding, the overflow check is never known.

However when applying the loop guards, it looks like it's possible that we might actually know the IV won't overflow: this occurs in 500.perlbench_r from SPEC CPU 2017 and triggers the assertion:

Assertion failed: (!isIndvarOverflowCheckKnownFalse(Cost, VF * UF) && !SE.isKnownPredicate(CmpInst::getInversePredicate(ICmpInst::ICMP_ULT), TC2OverflowSCEV, SE.getSCEV(Step)) && "unexpectedly proved overflow check to be known"), function emitIterationCountCheck, file LoopVectorize.cpp, line 2501.

This removes the assert and instead replaces the icmp if the overflow check is known, the same way as is done for the minimum iterations check.

Full diff: https://github.com/llvm/llvm-project/pull/115705.diff

2 Files Affected:

(modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+12-9)
(added) llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-known-no-overflow.ll (+85)

diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 1ebc62f9843905..5a55eb7a49f360 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -2491,17 +2491,20 @@ void InnerLoopVectorizer::emitIterationCountCheck(BasicBlock *Bypass) {
     Value *LHS = Builder.CreateSub(MaxUIntTripCount, Count);
 
     Value *Step = CreateStep();
-#ifndef NDEBUG
     ScalarEvolution &SE = *PSE.getSE();
     const SCEV *TC2OverflowSCEV = SE.applyLoopGuards(SE.getSCEV(LHS), OrigLoop);
-    assert(
-        !isIndvarOverflowCheckKnownFalse(Cost, VF * UF) &&
-        !SE.isKnownPredicate(CmpInst::getInversePredicate(ICmpInst::ICMP_ULT),
-                             TC2OverflowSCEV, SE.getSCEV(Step)) &&
-        "unexpectedly proved overflow check to be known");
-#endif
-    // Don't execute the vector loop if (UMax - n) < (VF * UF).
-    CheckMinIters = Builder.CreateICmp(ICmpInst::ICMP_ULT, LHS, Step);
+    const SCEV *StepSCEV = SE.getSCEV(Step);
+
+    // Check if the tc + step is >= maxuint.
+    if (SE.isKnownPredicate(ICmpInst::ICMP_ULT, TC2OverflowSCEV, StepSCEV)) {
+      CheckMinIters = Builder.getTrue();
+    } else if (!SE.isKnownPredicate(
+                   CmpInst::getInversePredicate(CmpInst::ICMP_ULT),
+                   TC2OverflowSCEV, StepSCEV)) {
+      // Generate the IV overflow check only if we cannot prove the IV won't
+      // overflow, or known to always overflow.
+      CheckMinIters = Builder.CreateICmp(ICmpInst::ICMP_ULT, LHS, Step);
+    } // else tc + step known < maxuint, use CheckMinIters preset to false
   }
 
   // Create new preheader for vector loop.
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-known-no-overflow.ll b/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-known-no-overflow.ll
new file mode 100644
index 00000000000000..3ddc6606f57e90
--- /dev/null
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-known-no-overflow.ll
@@ -0,0 +1,85 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -passes=loop-vectorize \
+; RUN: -force-tail-folding-style=data-with-evl \
+; RUN: -prefer-predicate-over-epilogue=predicate-dont-vectorize \
+; RUN: -mtriple=riscv64 -mattr=+v -S < %s | FileCheck %s
+
+; If we know the IV will never overflow then we can skip the IV overflow check
+
+define void @f(ptr %p, i64 %tc) vscale_range(2, 1024) {
+; CHECK-LABEL: define void @f(
+; CHECK-SAME: ptr [[P:%.*]], i64 [[TC:%.*]]) #[[ATTR0:[0-9]+]] {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[GUARD:%.*]] = icmp ugt i64 [[TC]], 1024
+; CHECK-NEXT:    br i1 [[GUARD]], label %[[EXIT:.*]], label %[[LOOP_PREHEADER:.*]]
+; CHECK:       [[LOOP_PREHEADER]]:
+; CHECK-NEXT:    [[TMP0:%.*]] = sub i64 -1, [[TC]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call i64 @llvm.vscale.i64()
+; CHECK-NEXT:    [[TMP2:%.*]] = mul i64 [[TMP1]], 2
+; CHECK-NEXT:    br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
+; CHECK:       [[VECTOR_PH]]:
+; CHECK-NEXT:    [[TMP3:%.*]] = call i64 @llvm.vscale.i64()
+; CHECK-NEXT:    [[TMP4:%.*]] = mul i64 [[TMP3]], 2
+; CHECK-NEXT:    [[TMP5:%.*]] = sub i64 [[TMP4]], 1
+; CHECK-NEXT:    [[N_RND_UP:%.*]] = add i64 [[TC]], [[TMP5]]
+; CHECK-NEXT:    [[N_MOD_VF:%.*]] = urem i64 [[N_RND_UP]], [[TMP4]]
+; CHECK-NEXT:    [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]
+; CHECK-NEXT:    [[TMP6:%.*]] = call i64 @llvm.vscale.i64()
+; CHECK-NEXT:    [[TMP7:%.*]] = mul i64 [[TMP6]], 2
+; CHECK-NEXT:    br label %[[VECTOR_BODY:.*]]
+; CHECK:       [[VECTOR_BODY]]:
+; CHECK-NEXT:    [[INDEX:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[VECTOR_BODY]] ]
+; CHECK-NEXT:    [[EVL_BASED_IV:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_EVL_NEXT:%.*]], %[[VECTOR_BODY]] ]
+; CHECK-NEXT:    [[AVL:%.*]] = sub i64 [[TC]], [[EVL_BASED_IV]]
+; CHECK-NEXT:    [[TMP8:%.*]] = call i32 @llvm.experimental.get.vector.length.i64(i64 [[AVL]], i32 2, i1 true)
+; CHECK-NEXT:    [[TMP9:%.*]] = add i64 [[EVL_BASED_IV]], 0
+; CHECK-NEXT:    [[TMP10:%.*]] = getelementptr i64, ptr [[P]], i64 [[TMP9]]
+; CHECK-NEXT:    [[TMP11:%.*]] = getelementptr i64, ptr [[TMP10]], i32 0
+; CHECK-NEXT:    [[VP_OP_LOAD:%.*]] = call <vscale x 2 x i64> @llvm.vp.load.nxv2i64.p0(ptr align 8 [[TMP11]], <vscale x 2 x i1> shufflevector (<vscale x 2 x i1> insertelement (<vscale x 2 x i1> poison, i1 true, i64 0), <vscale x 2 x i1> poison, <vscale x 2 x i32> zeroinitializer), i32 [[TMP8]])
+; CHECK-NEXT:    [[VP_OP:%.*]] = call <vscale x 2 x i64> @llvm.vp.add.nxv2i64(<vscale x 2 x i64> [[VP_OP_LOAD]], <vscale x 2 x i64> shufflevector (<vscale x 2 x i64> insertelement (<vscale x 2 x i64> poison, i64 1, i64 0), <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer), <vscale x 2 x i1> shufflevector (<vscale x 2 x i1> insertelement (<vscale x 2 x i1> poison, i1 true, i64 0), <vscale x 2 x i1> poison, <vscale x 2 x i32> zeroinitializer), i32 [[TMP8]])
+; CHECK-NEXT:    call void @llvm.vp.store.nxv2i64.p0(<vscale x 2 x i64> [[VP_OP]], ptr align 8 [[TMP11]], <vscale x 2 x i1> shufflevector (<vscale x 2 x i1> insertelement (<vscale x 2 x i1> poison, i1 true, i64 0), <vscale x 2 x i1> poison, <vscale x 2 x i32> zeroinitializer), i32 [[TMP8]])
+; CHECK-NEXT:    [[TMP12:%.*]] = zext i32 [[TMP8]] to i64
+; CHECK-NEXT:    [[INDEX_EVL_NEXT]] = add i64 [[TMP12]], [[EVL_BASED_IV]]
+; CHECK-NEXT:    [[INDEX_NEXT]] = add i64 [[INDEX]], [[TMP7]]
+; CHECK-NEXT:    [[TMP13:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
+; CHECK-NEXT:    br i1 [[TMP13]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
+; CHECK:       [[MIDDLE_BLOCK]]:
+; CHECK-NEXT:    br i1 true, label %[[EXIT_LOOPEXIT:.*]], label %[[SCALAR_PH]]
+; CHECK:       [[SCALAR_PH]]:
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], %[[MIDDLE_BLOCK]] ], [ 0, %[[LOOP_PREHEADER]] ]
+; CHECK-NEXT:    br label %[[LOOP:.*]]
+; CHECK:       [[LOOP]]:
+; CHECK-NEXT:    [[I:%.*]] = phi i64 [ [[I_NEXT:%.*]], %[[LOOP]] ], [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ]
+; CHECK-NEXT:    [[GEP:%.*]] = getelementptr i64, ptr [[P]], i64 [[I]]
+; CHECK-NEXT:    [[X:%.*]] = load i64, ptr [[GEP]], align 8
+; CHECK-NEXT:    [[Y:%.*]] = add i64 [[X]], 1
+; CHECK-NEXT:    store i64 [[Y]], ptr [[GEP]], align 8
+; CHECK-NEXT:    [[I_NEXT]] = add i64 [[I]], 1
+; CHECK-NEXT:    [[DONE:%.*]] = icmp eq i64 [[I_NEXT]], [[TC]]
+; CHECK-NEXT:    br i1 [[DONE]], label %[[EXIT_LOOPEXIT]], label %[[LOOP]], !llvm.loop [[LOOP3:![0-9]+]]
+; CHECK:       [[EXIT_LOOPEXIT]]:
+; CHECK-NEXT:    br label %[[EXIT]]
+; CHECK:       [[EXIT]]:
+; CHECK-NEXT:    ret void
+;
+entry:
+  %guard = icmp ugt i64 %tc, 1024
+  br i1 %guard, label %exit, label %loop
+loop:
+  %i = phi i64 [%i.next, %loop], [0, %entry]
+  %gep = getelementptr i64, ptr %p, i64 %i
+  %x = load i64, ptr %gep
+  %y = add i64 %x, 1
+  store i64 %y, ptr %gep
+  %i.next = add i64 %i, 1
+  %done = icmp eq i64 %i.next, %tc
+  br i1 %done, label %exit, label %loop
+exit:
+  ret void
+}
+;.
+; CHECK: [[LOOP0]] = distinct !{[[LOOP0]], [[META1:![0-9]+]], [[META2:![0-9]+]]}
+; CHECK: [[META1]] = !{!"llvm.loop.isvectorized", i32 1}
+; CHECK: [[META2]] = !{!"llvm.loop.unroll.runtime.disable"}
+; CHECK: [[LOOP3]] = distinct !{[[LOOP3]], [[META2]], [[META1]]}
+;.

david-arm · 2024-11-11T12:06:16Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

+    const SCEV *StepSCEV = SE.getSCEV(Step);
+
+    // Check if (UMax - n) < (VF * UF).
+    if (SE.isKnownPredicate(ICmpInst::ICMP_ULT, TC2OverflowSCEV, StepSCEV)) {


The patch to add the NDEBUG code landed only recently - #111310 by @fhahn. It looks like this assert was added for a reason - is it possible that isIndvarOverflowCheckKnownFalse is returning the incorrect answer for the EVL case?

Yeah, this is the review comment that led to the assert being added: #111310 (comment) cc @ayalz

I'm not sure if that assert always holds though, but I could be missing something. isIndvarOverflowCheckKnownFalse doesn't have the information from SCEVs nor the loop guards, which seems to be where the discrepancy between it and the assert lies. We could maybe instead move the SCEV stuff into it?

Ah this is interesting and thanks for explaining and putting up a fix! So the test you've added shows isIndvarOverflowCheckKnownFalse is returning the wrong answer, or at least an answer that's inconsistent with the SE.isKnownPredicate call. Once we've applied the loop guards we know that the check is false. So it looks like there are a few choices:

In the assert above we could avoid applying the loop guards and then isIndvarOverflowCheckKnownFalse will be consistent with the call to SE.isKnownPredicate. That might be a much smaller change than this patch I think? Or,

Teach isIndvarOverflowCheckKnownFalse to apply loop guards, which I think would require changing ScalarEvolution::getSmallConstantMaxTripCount to apply loop guards. It looks like ScalarEvolution::getSmallConstantTripMultiple already does something similar. Or,

Use the current version of your patch.

The only problem with 3 is that the code is now more complex and it doesn't look like there are any tests for the CheckMinIters = Builder.getTrue(); case you've added.

I personally feel like 2 is the most powerful change because you've highlighted a problem in how we estimate the small constant max trip count. If the vectoriser is always going to apply loop guards anyway, then we should do the same when estimating the trip count. However, I realise it would expand the scope of this patch quite a bit and I perfectly understand if you don't have time to do this right now! What do you think?

This is an interesting case. For the loop in the test case, the induction can overflow if %tc == 0 I think, so isIndvarOverflowCheckKnownFalse should return false in general?

I think getSmallConstantMaxTripCount already should make use of loop guards in this case, but we cannot compute a tighter maximum because %tc may be 0, in which case we would wrap.

Thanks for the review and pointers! 2 seems good to me too, I'm happy to give it a stab.

This is an interesting case. For the loop in the test case, the induction can overflow if %tc == 0 I think, so isIndvarOverflowCheckKnownFalse should return false in general?

Oh woops, I think the exit condition in the test case should have been icmp uge.

But if we did emit the overflow check would it even have caught it at runtime? If %tc == 0 then I think (UMax - %tc) < (VF * UF) will always return false.

I took a look at teaching isIndvarOverflowCheckKnownFalse but as @fhahn said I think it's already doing the correct thing. getConstantMaxBackedgeTakenCount was returning 0xFFFFFFFFFFFFFFFF due to the overflow case at %tc == 0. But getConstantTripCount throws it away because it only considers trip counts that fit into 32 bits.

The SCEV assertion uses the full 64 bits of the TC type though, and in combination with what I think is a separate coincidental issue that (UMax - %tc) < (VF * UF) when %tc == 0, it means that the SCEV is known whilst isIndvarOverflowCheckKnownFalse is unknown.

I couldn't think of an easy way of updating the assertion to be in sync with isIndvarOverflowCheckKnownFalse, so I've reworked this PR to just remove the assert and nothing else for now.

I've added some new cases to show things that I think could be split off from this PR:

@overflow_at_0 is the case that originally crashed with the assertion, but it also shows how the overflow check seems to let %tc == 0 slip through, which maybe is something that needs fixed in a separate PR?

@no_overflow_at_0 shows a case where isIndvarOverflowCheckKnownFalse works and knows that the maximum trip count is 1025, so nothing needs to be done here

@trip_count_max_1024 shows a case where we currently can't calculate the max trip count, i.e. the debug output doesn't contain:LV: Found maximum trip count: N, and so isIndvarOverflowCheckKnownFalse also returns false. I think in a separate PR we can teach computeMaxBECountForLT to apply the loop guards, which seems to fix this.

david-arm · 2024-11-11T12:06:51Z

llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-known-no-overflow.ll

+
+; If we know the IV will never overflow then we can skip the IV overflow check
+
+define void @f(ptr %p, i64 %tc) vscale_range(2, 1024) {


I think it would be helpful if the test was added in a different commit so we can see the changes.

This crashes without the assertion removed unfortunately. I could split this into two patches to remove the assert first, and then a second one that shows the change in the overflow check?

It should be possible to prefix the run line with not --crash, the test would also need to require asserts.

Ah sorry, I was being silly and forgetting that the test crashed before! I was thinking it would be nice to see the effect of the code change, but that's not possible. In that case I personally don't need to see the test before because it won't be very interesting!

lukel97 · 2024-11-12T02:38:38Z

As a side note, on RISC-V vscale always has to be a power of 2 so I think we might always be able to elide the overflow check. But I'm not sure if we have any way of expressing that constraint on vscale currently.

…CEV. Add more tests that show how overflow can also occur at %tc = 0

fhahn

LGTM, IIUC this is genuine case where isIndvarOverflowCheckKnownFalse has to return false, but we can get better results after committing to a concrete tail-folding decisions.

Could you check if this fixes #115755 and add Fixes https://github.com/llvm/llvm-project/issues/115755 to the PR description?

fhahn · 2024-11-13T19:18:05Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

-        !SE.isKnownPredicate(CmpInst::getInversePredicate(ICmpInst::ICMP_ULT),
-                             TC2OverflowSCEV, SE.getSCEV(Step)) &&
-        "unexpectedly proved overflow check to be known");
-#endif
    // Don't execute the vector loop if (UMax - n) < (VF * UF).
    CheckMinIters = Builder.CreateICmp(ICmpInst::ICMP_ULT, LHS, Step);


Suggested change

CheckMinIters = Builder.CreateICmp(ICmpInst::ICMP_ULT, LHS, Step);

CheckMinIters = Builder.CreateICmp(ICmpInst::ICMP_ULT, LHS, CreateStep());

lukel97 · 2024-11-14T06:50:10Z

Could you check if this fixes #115755 and add Fixes https://github.com/llvm/llvm-project/issues/115755 to the PR description?

It indeed fixes it, I've updated the PR description

This is a follow on from llvm#115705. Applying the loop guard allows us to calculate the maximum trip count in more places, which in turn allows isIndvarOverflowCheckKnownFalse to skip the overflow check.

lukel97 requested review from ayalz, david-arm, fhahn, preames and rengolin November 11, 2024 10:24

llvmbot added vectorizers llvm:transforms labels Nov 11, 2024

Update comments

4ba9c49

lukel97 force-pushed the allow-known-overflow-check branch from 3e4288e to 4ba9c49 Compare November 11, 2024 10:35

david-arm reviewed Nov 11, 2024

View reviewed changes

fhahn mentioned this pull request Nov 11, 2024

[RISCV][LV] LoopVectorize crashed with assertion failure when enabling EVL vectorization #115755

Closed

Rework to just remove the assert and not try to infer anything from S…

e3c2509

…CEV. Add more tests that show how overflow can also occur at %tc = 0

lukel97 changed the title ~~[LV] Use SCEV to check if IV overflow check is known~~ [LV] Remove assertions in IV overflow check Nov 12, 2024

fhahn approved these changes Nov 13, 2024

View reviewed changes

Use CreateStep

4fed4aa

lukel97 merged commit 050e2d3 into llvm:main Nov 14, 2024
8 checks passed

lukel97 mentioned this pull request Nov 14, 2024

[SCEV] Apply loop guards to End in computeMaxBECountForLT #116187

Closed

lukel97 mentioned this pull request Nov 21, 2024

[SCEV] Memoize collected loop guards. NFCI #116947

Closed


		; If we know the IV will never overflow then we can skip the IV overflow check

		define void @f(ptr %p, i64 %tc) vscale_range(2, 1024) {

	CheckMinIters = Builder.CreateICmp(ICmpInst::ICMP_ULT, LHS, Step);
	CheckMinIters = Builder.CreateICmp(ICmpInst::ICMP_ULT, LHS, CreateStep());

[LV] Remove assertions in IV overflow check #115705

[LV] Remove assertions in IV overflow check #115705

Uh oh!

Conversation

lukel97 commented Nov 11, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Nov 11, 2024

Uh oh!

llvmbot commented Nov 11, 2024

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

lukel97 Nov 12, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

lukel97 commented Nov 12, 2024

Uh oh!

fhahn left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

lukel97 commented Nov 14, 2024

Uh oh!

Uh oh!

Uh oh!

lukel97 commented Nov 11, 2024 •

edited

Loading

lukel97 Nov 12, 2024 •

edited

Loading