[LV] Use SCEV to check if minimum iteration check is known. #111310
Conversation
Use SCEV to check if the minimum iteration check (TC < Step) is known to be false. This is a first step towards addressing #111098. To catch the exact case from the issue, we need to do extra work to make sure the wrap flags on the shl are preserved and used by SCEV.

Note that skeleton creation will be gradually moved to VPlan, and this simplification should eventually be done as a VPlan transform. The current plan is to move skeleton creation to VPlan starting from the parts closest to those already created by VPlan: induction resume value creation (started with #110577), then memory and SCEV checks, and finally minimum iteration checks.

@llvm/pr-subscribers-llvm-transforms
Author: Florian Hahn (fhahn)
Full diff: https://github.com/llvm/llvm-project/pull/111310.diff (8 files affected)
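Condensed, the change boils down to a single SCEV query: apply the loop guards to the trip count and ask whether the inverse of the emitted predicate is known. A minimal sketch as a standalone helper (the helper name and signature are hypothetical; the real change lives inline in InnerLoopVectorizer::emitIterationCountCheck, as the diff below shows):

```cpp
#include "llvm/Analysis/LoopInfo.h"
#include "llvm/Analysis/ScalarEvolution.h"
#include "llvm/IR/Instructions.h"

using namespace llvm;

// Hypothetical helper: returns true if SCEV proves TC u>= Step for loop L,
// i.e. the minimum iteration check (TC u< Step) is statically false.
static bool isMinItersCheckKnownFalse(ScalarEvolution &SE, const Loop *L,
                                      const SCEV *TripCount, const SCEV *Step) {
  // Fold conditions dominating the loop into the trip-count expression,
  // e.g. bounds implied by vscale_range or guards like "N >= 1".
  const SCEV *GuardedTC = SE.applyLoopGuards(TripCount, L);
  // Proving the inverse of the emitted predicate makes the check dead.
  return SE.isKnownPredicate(ICmpInst::ICMP_UGE, GuardedTC, Step);
}
```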
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 35c042b3ab7fc5..c349fa65343c4d 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -2438,12 +2438,21 @@ void InnerLoopVectorizer::emitIterationCountCheck(BasicBlock *Bypass) {
};
TailFoldingStyle Style = Cost->getTailFoldingStyle();
- if (Style == TailFoldingStyle::None)
- CheckMinIters =
- Builder.CreateICmp(P, Count, CreateStep(), "min.iters.check");
- else if (VF.isScalable() &&
- !isIndvarOverflowCheckKnownFalse(Cost, VF, UF) &&
- Style != TailFoldingStyle::DataAndControlFlowWithoutRuntimeCheck) {
+ if (Style == TailFoldingStyle::None) {
+ Value *Step = CreateStep();
+ ScalarEvolution &SE = *PSE.getSE();
+ // Check if we can prove that the trip count is >= the step.
+ const SCEV *TripCountSCEV = SE.getTripCountFromExitCount(
+ PSE.getBackedgeTakenCount(), CountTy, OrigLoop);
+ if (SE.isKnownPredicate(CmpInst::getInversePredicate(P),
+ SE.applyLoopGuards(TripCountSCEV, OrigLoop),
+ SE.getSCEV(Step)))
+ CheckMinIters = Builder.getFalse();
+ else
+ CheckMinIters = Builder.CreateICmp(P, Count, Step, "min.iters.check");
+ } else if (VF.isScalable() &&
+ !isIndvarOverflowCheckKnownFalse(Cost, VF, UF) &&
+ Style != TailFoldingStyle::DataAndControlFlowWithoutRuntimeCheck) {
// vscale is not necessarily a power-of-2, which means we cannot guarantee
// an overflow to zero when updating induction variables and so an
// additional overflow check is required before entering the vector loop.
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/eliminate-tail-predication.ll b/llvm/test/Transforms/LoopVectorize/AArch64/eliminate-tail-predication.ll
index 8c50d86489c9dd..7dcab6d807cf72 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/eliminate-tail-predication.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/eliminate-tail-predication.ll
@@ -11,8 +11,7 @@ define void @f1(ptr %A) #0 {
; CHECK-NEXT: entry:
; CHECK-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
; CHECK-NEXT: [[TMP1:%.*]] = mul i64 [[TMP0]], 4
-; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 1024, [[TMP1]]
-; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
+; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
; CHECK: vector.ph:
; CHECK-NEXT: [[TMP2:%.*]] = call i64 @llvm.vscale.i64()
; CHECK-NEXT: [[TMP3:%.*]] = mul i64 [[TMP2]], 4
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/masked-call.ll b/llvm/test/Transforms/LoopVectorize/AArch64/masked-call.ll
index 93034f4dbe56ec..5496eed16e5443 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/masked-call.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/masked-call.ll
@@ -11,10 +11,7 @@ target triple = "aarch64-unknown-linux-gnu"
define void @test_widen(ptr noalias %a, ptr readnone %b) #4 {
; TFNONE-LABEL: @test_widen(
; TFNONE-NEXT: entry:
-; TFNONE-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
-; TFNONE-NEXT: [[TMP1:%.*]] = mul i64 [[TMP0]], 2
-; TFNONE-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 1025, [[TMP1]]
-; TFNONE-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
+; TFNONE-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
; TFNONE: vector.ph:
; TFNONE-NEXT: [[TMP2:%.*]] = call i64 @llvm.vscale.i64()
; TFNONE-NEXT: [[TMP3:%.*]] = mul i64 [[TMP2]], 2
@@ -146,10 +143,7 @@ for.cond.cleanup:
define void @test_if_then(ptr noalias %a, ptr readnone %b) #4 {
; TFNONE-LABEL: @test_if_then(
; TFNONE-NEXT: entry:
-; TFNONE-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
-; TFNONE-NEXT: [[TMP1:%.*]] = mul i64 [[TMP0]], 2
-; TFNONE-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 1025, [[TMP1]]
-; TFNONE-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
+; TFNONE-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
; TFNONE: vector.ph:
; TFNONE-NEXT: [[TMP2:%.*]] = call i64 @llvm.vscale.i64()
; TFNONE-NEXT: [[TMP3:%.*]] = mul i64 [[TMP2]], 2
@@ -310,10 +304,7 @@ for.cond.cleanup:
define void @test_widen_if_then_else(ptr noalias %a, ptr readnone %b) #4 {
; TFNONE-LABEL: @test_widen_if_then_else(
; TFNONE-NEXT: entry:
-; TFNONE-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
-; TFNONE-NEXT: [[TMP1:%.*]] = mul i64 [[TMP0]], 2
-; TFNONE-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 1025, [[TMP1]]
-; TFNONE-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
+; TFNONE-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
; TFNONE: vector.ph:
; TFNONE-NEXT: [[TMP2:%.*]] = call i64 @llvm.vscale.i64()
; TFNONE-NEXT: [[TMP3:%.*]] = mul i64 [[TMP2]], 2
@@ -490,10 +481,7 @@ for.cond.cleanup:
define void @test_widen_nomask(ptr noalias %a, ptr readnone %b) #4 {
; TFNONE-LABEL: @test_widen_nomask(
; TFNONE-NEXT: entry:
-; TFNONE-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
-; TFNONE-NEXT: [[TMP1:%.*]] = mul i64 [[TMP0]], 2
-; TFNONE-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 1025, [[TMP1]]
-; TFNONE-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
+; TFNONE-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
; TFNONE: vector.ph:
; TFNONE-NEXT: [[TMP2:%.*]] = call i64 @llvm.vscale.i64()
; TFNONE-NEXT: [[TMP3:%.*]] = mul i64 [[TMP2]], 2
@@ -548,11 +536,6 @@ define void @test_widen_nomask(ptr noalias %a, ptr readnone %b) #4 {
;
; TFFALLBACK-LABEL: @test_widen_nomask(
; TFFALLBACK-NEXT: entry:
-; TFFALLBACK-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
-; TFFALLBACK-NEXT: [[TMP1:%.*]] = mul i64 [[TMP0]], 2
-; TFFALLBACK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 1025, [[TMP1]]
-; TFFALLBACK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
-; TFFALLBACK: vector.ph:
; TFFALLBACK-NEXT: [[TMP2:%.*]] = call i64 @llvm.vscale.i64()
; TFFALLBACK-NEXT: [[TMP3:%.*]] = mul i64 [[TMP2]], 2
; TFFALLBACK-NEXT: [[N_MOD_VF:%.*]] = urem i64 1025, [[TMP3]]
@@ -561,7 +544,7 @@ define void @test_widen_nomask(ptr noalias %a, ptr readnone %b) #4 {
; TFFALLBACK-NEXT: [[TMP5:%.*]] = mul i64 [[TMP4]], 2
; TFFALLBACK-NEXT: br label [[VECTOR_BODY:%.*]]
; TFFALLBACK: vector.body:
-; TFFALLBACK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
+; TFFALLBACK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH:%.*]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
; TFFALLBACK-NEXT: [[TMP6:%.*]] = getelementptr i64, ptr [[B:%.*]], i64 [[INDEX]]
; TFFALLBACK-NEXT: [[WIDE_LOAD:%.*]] = load <vscale x 2 x i64>, ptr [[TMP6]], align 8
; TFFALLBACK-NEXT: [[TMP7:%.*]] = call <vscale x 2 x i64> @foo_vector_nomask(<vscale x 2 x i64> [[WIDE_LOAD]])
@@ -569,12 +552,9 @@ define void @test_widen_nomask(ptr noalias %a, ptr readnone %b) #4 {
; TFFALLBACK-NEXT: store <vscale x 2 x i64> [[TMP7]], ptr [[TMP8]], align 8
; TFFALLBACK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP5]]
; TFFALLBACK-NEXT: [[TMP9:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
-; TFFALLBACK-NEXT: br i1 [[TMP9]], label [[SCALAR_PH]], label [[VECTOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]
-; TFFALLBACK: scalar.ph:
-; TFFALLBACK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[N_VEC]], [[VECTOR_BODY]] ]
-; TFFALLBACK-NEXT: br label [[FOR_BODY:%.*]]
+; TFFALLBACK-NEXT: br i1 [[TMP9]], label [[FOR_BODY:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]
; TFFALLBACK: for.body:
-; TFFALLBACK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
+; TFFALLBACK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ], [ [[N_VEC]], [[VECTOR_BODY]] ]
; TFFALLBACK-NEXT: [[GEP:%.*]] = getelementptr i64, ptr [[B]], i64 [[INDVARS_IV]]
; TFFALLBACK-NEXT: [[LOAD:%.*]] = load i64, ptr [[GEP]], align 8
; TFFALLBACK-NEXT: [[CALL:%.*]] = call i64 @foo(i64 [[LOAD]]) #[[ATTR5:[0-9]+]]
@@ -626,10 +606,7 @@ for.cond.cleanup:
define void @test_widen_optmask(ptr noalias %a, ptr readnone %b) #4 {
; TFNONE-LABEL: @test_widen_optmask(
; TFNONE-NEXT: entry:
-; TFNONE-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
-; TFNONE-NEXT: [[TMP1:%.*]] = mul i64 [[TMP0]], 2
-; TFNONE-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 1025, [[TMP1]]
-; TFNONE-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
+; TFNONE-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
; TFNONE: vector.ph:
; TFNONE-NEXT: [[TMP2:%.*]] = call i64 @llvm.vscale.i64()
; TFNONE-NEXT: [[TMP3:%.*]] = mul i64 [[TMP2]], 2
@@ -791,10 +768,7 @@ for.cond.cleanup:
define double @test_widen_fmuladd_and_call(ptr noalias %a, ptr readnone %b, double %m) #4 {
; TFNONE-LABEL: @test_widen_fmuladd_and_call(
; TFNONE-NEXT: entry:
-; TFNONE-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
-; TFNONE-NEXT: [[TMP1:%.*]] = mul i64 [[TMP0]], 2
-; TFNONE-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 1025, [[TMP1]]
-; TFNONE-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
+; TFNONE-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
; TFNONE: vector.ph:
; TFNONE-NEXT: [[TMP2:%.*]] = call i64 @llvm.vscale.i64()
; TFNONE-NEXT: [[TMP3:%.*]] = mul i64 [[TMP2]], 2
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/pr60831-sve-inv-store-crash.ll b/llvm/test/Transforms/LoopVectorize/AArch64/pr60831-sve-inv-store-crash.ll
index 0e95d742092e65..d18cdc1ae617a4 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/pr60831-sve-inv-store-crash.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/pr60831-sve-inv-store-crash.ll
@@ -10,8 +10,7 @@ define void @test_invar_gep(ptr %dst) #0 {
; CHECK-NEXT: entry:
; CHECK-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
; CHECK-NEXT: [[TMP1:%.*]] = mul i64 [[TMP0]], 4
-; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 100, [[TMP1]]
-; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
+; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
; CHECK: vector.ph:
; CHECK-NEXT: [[TMP2:%.*]] = call i64 @llvm.vscale.i64()
; CHECK-NEXT: [[TMP3:%.*]] = mul i64 [[TMP2]], 4
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding.ll b/llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding.ll
index 94b90aa3cfb308..1d150141e6251e 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding.ll
@@ -757,8 +757,7 @@ define void @simple_memset_trip1024(i32 %val, ptr %ptr, i64 %n) #0 {
; CHECK-NEXT: entry:
; CHECK-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
; CHECK-NEXT: [[TMP1:%.*]] = mul i64 [[TMP0]], 4
-; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 1024, [[TMP1]]
-; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
+; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
; CHECK: vector.ph:
; CHECK-NEXT: [[TMP2:%.*]] = call i64 @llvm.vscale.i64()
; CHECK-NEXT: [[TMP3:%.*]] = mul i64 [[TMP2]], 4
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/wider-VF-for-callinst.ll b/llvm/test/Transforms/LoopVectorize/AArch64/wider-VF-for-callinst.ll
index 4a2f9d07ed91c6..4a3bc4679bba49 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/wider-VF-for-callinst.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/wider-VF-for-callinst.ll
@@ -7,10 +7,7 @@ target triple = "aarch64-unknown-linux-gnu"
define void @test_widen(ptr noalias %a, ptr readnone %b) #1 {
; WIDE-LABEL: @test_widen(
; WIDE-NEXT: entry:
-; WIDE-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
-; WIDE-NEXT: [[TMP1:%.*]] = mul i64 [[TMP0]], 4
-; WIDE-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 1025, [[TMP1]]
-; WIDE-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
+; WIDE-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
; WIDE: vector.ph:
; WIDE-NEXT: [[TMP2:%.*]] = call i64 @llvm.vscale.i64()
; WIDE-NEXT: [[TMP3:%.*]] = mul i64 [[TMP2]], 4
diff --git a/llvm/test/Transforms/LoopVectorize/if-reduction.ll b/llvm/test/Transforms/LoopVectorize/if-reduction.ll
index 383b62b368ef0f..5f6824a022d56d 100644
--- a/llvm/test/Transforms/LoopVectorize/if-reduction.ll
+++ b/llvm/test/Transforms/LoopVectorize/if-reduction.ll
@@ -1668,8 +1668,7 @@ define i32 @fcmp_0_sub_select1(ptr noalias %x, i32 %N) nounwind readonly {
; CHECK: [[FOR_HEADER]]:
; CHECK-NEXT: [[ZEXT:%.*]] = zext i32 [[N]] to i64
; CHECK-NEXT: [[TMP0:%.*]] = sub i64 0, [[ZEXT]]
-; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[TMP0]], 4
-; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
+; CHECK-NEXT: br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
; CHECK: [[VECTOR_PH]]:
; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[TMP0]], 4
; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[TMP0]], [[N_MOD_VF]]
diff --git a/llvm/test/Transforms/LoopVectorize/version-stride-with-integer-casts.ll b/llvm/test/Transforms/LoopVectorize/version-stride-with-integer-casts.ll
index b3ec3e8f0f3c63..a85242874410a2 100644
--- a/llvm/test/Transforms/LoopVectorize/version-stride-with-integer-casts.ll
+++ b/llvm/test/Transforms/LoopVectorize/version-stride-with-integer-casts.ll
@@ -423,8 +423,7 @@ define void @zext_of_i1_stride(i1 %g, ptr %dst) mustprogress {
; CHECK-NEXT: [[G_64:%.*]] = zext i1 [[G]] to i64
; CHECK-NEXT: [[TMP0:%.*]] = udiv i64 15, [[G_64]]
; CHECK-NEXT: [[TMP1:%.*]] = add nuw nsw i64 [[TMP0]], 1
-; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[TMP1]], 4
-; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_SCEVCHECK:%.*]]
+; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_SCEVCHECK:%.*]]
; CHECK: vector.scevcheck:
; CHECK-NEXT: [[IDENT_CHECK:%.*]] = icmp ne i1 [[G]], true
; CHECK-NEXT: br i1 [[IDENT_CHECK]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
const SCEV *TripCountSCEV = SE.getTripCountFromExitCount(
    PSE.getBackedgeTakenCount(), CountTy, OrigLoop);
Could this be simplified into (P)SE.getSCEV(Count)?
Updated, thanks! Initially thought this may produce worse results, but at least for the existing tests the results don't get pessimized.
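For reference, the two forms side by side; both lines appear verbatim in the revisions quoted in this review:

```cpp
// Before: derive the trip count from the backedge-taken count.
const SCEV *TripCountSCEV = SE.getTripCountFromExitCount(
    PSE.getBackedgeTakenCount(), CountTy, OrigLoop);

// After: query the SCEV of the IR trip-count value directly.
const SCEV *TripCountSCEV = SE.getSCEV(Count);
```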
@@ -11,8 +11,7 @@ define void @f1(ptr %A) #0 {
 ; CHECK-NEXT: entry:
 ; CHECK-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
 ; CHECK-NEXT: [[TMP1:%.*]] = mul i64 [[TMP0]], 4
-; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 1024, [[TMP1]]
-; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
+; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
Step of 4 * vscale is known to be smaller than count of 1024, based on vscale_range(1,16) attribute?
Yep
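For reference, the arithmetic: with vscale_range(1,16), the step 4 * vscale is at most 4 * 16 = 64, and 64 u< 1024, so the trip count of 1024 is provably u>= the step and the check folds to false.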
if (SE.isKnownPredicate(CmpInst::getInversePredicate(P),
                        SE.applyLoopGuards(TripCountSCEV, OrigLoop),
                        SE.getSCEV(Step)))
  CheckMinIters = Builder.getFalse();
Worth leaving a TODO to simplify skeleton by emitting an unconditional branch to vector preheader, instead of the conditional branch on false below?
Done, thanks!
-; TFNONE-NEXT: [[TMP1:%.*]] = mul i64 [[TMP0]], 2
-; TFNONE-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 1025, [[TMP1]]
-; TFNONE-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
+; TFNONE-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
ditto with vscale_range(2,16)
Yep
@@ -1668,8 +1668,7 @@ define i32 @fcmp_0_sub_select1(ptr noalias %x, i32 %N) nounwind readonly {
 ; CHECK: [[FOR_HEADER]]:
 ; CHECK-NEXT: [[ZEXT:%.*]] = zext i32 [[N]] to i64
 ; CHECK-NEXT: [[TMP0:%.*]] = sub i64 0, [[ZEXT]]
-; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[TMP0]], 4
-; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
+; CHECK-NEXT: br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
Somewhat confusing (min) iter check here, bumping %indvars.iv.next = sub nuw nsw i64 %indvars.iv, 1 repeatedly, starting with %indvars.iv set to zero?
Agreed, might be worth fixing independently. The simplification is fine for the input I think: BTC is (-1 + (-1 * (zext i32 %N to i64))<nsw>)<nsw>, and the trip count with info from the dominating loop guard is (-1 * (zext i32 (1 smax %N) to i64))<nsw>, which should be u>= 4. https://llvm.godbolt.org/z/1EMWbGb81
Sure, worth fixing test independently, before or after. Subtracting 1 from 0 on first iteration, and implicitly casting the above negative BTC and trip count to unsigned, defy the claimed nuw.
Worth leaving behind a FIXME note.
@@ -423,8 +423,7 @@ define void @zext_of_i1_stride(i1 %g, ptr %dst) mustprogress {
 ; CHECK-NEXT: [[G_64:%.*]] = zext i1 [[G]] to i64
 ; CHECK-NEXT: [[TMP0:%.*]] = udiv i64 15, [[G_64]]
Better divide 15 by G_64 after scevcheck'ing below that G is 1 (not 0), than before?
Yep would probably be better for this particular check. There are other SCEV checks that are much more expensive (like wrapping checks), so we would probably need to distinguish between them.
Sure, guards should be ordered according to cost and frequency, but in this case a potential division by zero is introduced, unguarded.
Worth leaving behind a FIXME note.
@@ -423,8 +423,7 @@ define void @zext_of_i1_stride(i1 %g, ptr %dst) mustprogress {
 ; CHECK-NEXT: [[G_64:%.*]] = zext i1 [[G]] to i64
 ; CHECK-NEXT: [[TMP0:%.*]] = udiv i64 15, [[G_64]]
 ; CHECK-NEXT: [[TMP1:%.*]] = add nuw nsw i64 [[TMP0]], 1
-; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[TMP1]], 4
-; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_SCEVCHECK:%.*]]
+; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_SCEVCHECK:%.*]]
Count of 16 (assuming G = 1) is known to be greater than step of 4.
Yes, the step of 4 is used here, based on the versioned G
ScalarEvolution &SE = *PSE.getSE();
// Check if we can prove that the trip count is >= the step.
const SCEV *TripCountSCEV = SE.getSCEV(Count);
if (SE.isKnownPredicate(CmpInst::getInversePredicate(P),
The other direction of this check is also interesting - when the entry guard is known to hold, and thus the bypass around the loop is never taken.
The complementary case when Count is known to be smaller than Step is presumably avoided when setting MaxVF (and UF), perhaps worth asserting?
There is one case where this improves: when we version the stride in LoopAccessAnalysis. Those predicates won't be used when retrieving the max trip count from SCEV, but they trigger here. Test case is in llvm/test/Transforms/LoopVectorize/version-stride-with-integer-casts.ll.
  CheckMinIters = Builder.CreateICmp(P, Count, Step, "min.iters.check");
} else if (VF.isScalable() &&
           !isIndvarOverflowCheckKnownFalse(Cost, VF, UF) &&
           Style != TailFoldingStyle::DataAndControlFlowWithoutRuntimeCheck) {
The optimization you're doing here applies to the check in this if-block as well. Maybe factor out an getOptimizedCompare lambda or something?
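A rough sketch of the suggested factoring (hypothetical: GetOptimizedCompare follows the reviewer's proposed name, and PSE, OrigLoop, and Builder are assumed to be the surrounding InnerLoopVectorizer state; the patch did not ultimately adopt this):

```cpp
// Hypothetical lambda that folds a compare SCEV can prove to be false.
auto GetOptimizedCompare = [&](CmpInst::Predicate Pred, Value *A, Value *B,
                               const Twine &Name) -> Value * {
  ScalarEvolution &SE = *PSE.getSE();
  if (SE.isKnownPredicate(CmpInst::getInversePredicate(Pred),
                          SE.applyLoopGuards(SE.getSCEV(A), OrigLoop),
                          SE.getSCEV(B)))
    return Builder.getFalse(); // Compare is provably false.
  return Builder.CreateICmp(Pred, A, B, Name);
};
```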
The runtime comparison introduced below checks for overflow, in case the overflow check is not known (to be false) at compile time. Perhaps worth asserting that this predicate is indeed unknown to SCEV.
Added an assert, thanks!
if (SE.isKnownPredicate(CmpInst::getInversePredicate(P),
                        SE.applyLoopGuards(TripCountSCEV, OrigLoop),
                        SE.getSCEV(Step)))
  CheckMinIters = Builder.getFalse();
This is redundant - CheckMinIters is initialized to false above, serving tail-folding case.
dropped, thanks!
ScalarEvolution &SE = *PSE.getSE();
// TODO: Emit unconditional branch to vector preheader instead of
// conditional branch with known condition.
const SCEV *TripCountSCEV = SE.getSCEV(Count);
nit: can also apply loop guards to TripCountSCEV here.
Done, thanks!
// TODO: Should not attempt to vectorize when the vector loop is known to
// never execute.
Suggested change:
-// TODO: Should not attempt to vectorize when the vector loop is known to
-// never execute.
+// TODO: Ensure step is at most the trip count when determining max VF and UF, w/o tail folding.

?
Updated, thanks!
} else if (!SE.isKnownPredicate(CmpInst::getInversePredicate(P),
                                SE.applyLoopGuards(TripCountSCEV, OrigLoop),
                                SE.getSCEV(Step))) {
  // Only generate the minimum iteration check only if we cannot prove the
Suggested change:
-// Only generate the minimum iteration check only if we cannot prove the
+// Generate the minimum iteration check only if we cannot prove the
Updated, thanks!
                                SE.applyLoopGuards(TripCountSCEV, OrigLoop),
                                SE.getSCEV(Step))) {
  // Only generate the minimum iteration check only if we cannot prove the
  // check is known to be false.
Suggested change:
-// check is known to be false.
+// check is known to be true, or known to be false.
Updated, thanks!
  // Only generate the minimum iteration check only if we cannot prove the
  // check is known to be false.
  CheckMinIters = Builder.CreateICmp(P, Count, Step, "min.iters.check");
}
Suggested change:
 }
+// else step is known to be smaller than trip count, use CheckMinIters preset to false.
added, thanks!
// Check if we can prove that the trip count is >= the step.
// TODO: Emit unconditional branch to vector preheader instead of
// conditional branch with known condition.
const SCEV *TripCountSCEV = SE.getSCEV(LHS);
nit: can also apply loop guards to TripCountSCEV here.
Done, thanks!
// TODO: Emit unconditional branch to vector preheader instead of
// conditional branch with known condition.
No known condition is used below.
Dropped, thanks!
@@ -2455,8 +2471,17 @@ void InnerLoopVectorizer::emitIterationCountCheck(BasicBlock *Bypass) {
     ConstantInt::get(CountTy, cast<IntegerType>(CountTy)->getMask());
   Value *LHS = Builder.CreateSub(MaxUIntTripCount, Count);

+  Value *Step = CreateStep();
+  ScalarEvolution &SE = *PSE.getSE();
+  // Check if we can prove that the trip count is >= the step.
The condition below checks if the trip count is too close to UMax - such that bumping it by step overflows, rather than checking if trip count can be proven to be >= step. Another TODO?
TODO to clarify in the naming?
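Spelled out: LHS is UMax - TC, so the emitted condition LHS u< Step is equivalent to TC u> UMax - Step, i.e. it is true exactly when TC + Step would wrap in CountTy; the copied "trip count is >= the step" comment does not describe this check.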
Value *Step = CreateStep();
ScalarEvolution &SE = *PSE.getSE();
// Check if we can prove that the trip count is >= the step.
const SCEV *TripCountSCEV = SE.applyLoopGuards(SE.getSCEV(LHS), OrigLoop);
Does this have any side-effects, i.e. change the existing IR in any way? If not, everything from lines 2475-2481 is entirely related to the assert. Perhaps wrap it in #ifndef NDEBUG?
It shouldn't have any side-effects (it may lead to a SCEV being cached for LHS, but that shouldn't impact anything unless there's a bug in SCEV invalidation). Wrapped everything related in #ifndef NDEBUG, thanks!
LGTM! Maybe worth waiting a day or so in case others have any further comments?
+1, with several last comments.
} else if (!SE.isKnownPredicate(CmpInst::getInversePredicate(P),
                                TripCountSCEV, SE.getSCEV(Step))) {
  // Generate the minimum iteration check only if we cannot prove the
  // check is known to be true, or known to be false
Suggested change:
-// check is known to be true, or known to be false
+// check is known to be true, or known to be false.
Fixed, thanks!

// else step is known to be smaller than trip count, use CheckMinIters
// preset to false.
}
Suggested change:
-// else step is known to be smaller than trip count, use CheckMinIters
-// preset to false.
-}
+} // else step known to be < trip count, use CheckMinIters preset to false.

nit: else belongs more accurately to the remaining, "otherwise" case following the if-elseif.
adjusted, thanks!
const SCEV *TripCountSCEV = SE.applyLoopGuards(SE.getSCEV(LHS), OrigLoop);
assert(
    !SE.isKnownPredicate(CmpInst::getInversePredicate(ICmpInst::ICMP_ULT),
                         TripCountSCEV, SE.getSCEV(Step)) &&
Worth also sanity checking "!isIndvarOverflowCheckKnownTrue", i.e., (UMax - n) is not known to be < (VF * UF)?
Added, thanks!
Value *Step = CreateStep();
#ifndef NDEBUG
ScalarEvolution &SE = *PSE.getSE();
const SCEV *TripCountSCEV = SE.applyLoopGuards(SE.getSCEV(LHS), OrigLoop);
Suggested change:
-const SCEV *TripCountSCEV = SE.applyLoopGuards(SE.getSCEV(LHS), OrigLoop);
+const SCEV *TC2OverflowSCEV = SE.applyLoopGuards(SE.getSCEV(LHS), OrigLoop);

or something like that, to denote SCEV of "LHS" (UMax - count), distance of Trip Count to overflow, rather than that of trip count itself.
Renamed, thanks!
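Piecing the quoted fragments together, the overflow-check path presumably ended up looking roughly like this (a reconstruction from the review threads, not verified against the final commit):

```cpp
Value *Step = CreateStep();
#ifndef NDEBUG
ScalarEvolution &SE = *PSE.getSE();
// SCEV of (UMax - TC): the distance of the trip count to overflow.
const SCEV *TC2OverflowSCEV = SE.applyLoopGuards(SE.getSCEV(LHS), OrigLoop);
// Sanity check: the overflow check should not be provable at compile time.
assert(!isIndvarOverflowCheckKnownFalse(Cost, VF * UF) &&
       !SE.isKnownPredicate(CmpInst::getInversePredicate(ICmpInst::ICMP_ULT),
                            TC2OverflowSCEV, SE.getSCEV(Step)) &&
       "unexpectedly proved overflow check to be known");
#endif
CheckMinIters = Builder.CreateICmp(ICmpInst::ICMP_ULT, LHS, Step,
                                   "min.iters.check");
```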
In llvm#111310 an assert was added that for the IV overflow check used with tail folding, the overflow check is never known. However, when applying the loop guards, it looks like it's possible that we might actually know the trip count won't overflow: this occurs in 500.perlbench_r from SPEC CPU 2017 and triggers the assertion:

Assertion failed: (!isIndvarOverflowCheckKnownFalse(Cost, VF * UF) && !SE.isKnownPredicate(CmpInst::getInversePredicate(ICmpInst::ICMP_ULT), TC2OverflowSCEV, SE.getSCEV(Step)) && "unexpectedly proved overflow check to be known"), function emitIterationCountCheck, file LoopVectorize.cpp, line 2501.

This removes the assert and instead replaces the icmp if the overflow check is known, the same way as is done for the minimum iterations check.
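A sketch of the "replace the icmp when the check is known" approach this first version describes, mirroring the minimum-iterations fold above (hypothetical, based only on this description):

```cpp
// Fold the overflow check when SCEV can prove it, instead of asserting.
if (SE.isKnownPredicate(CmpInst::getInversePredicate(ICmpInst::ICMP_ULT),
                        TC2OverflowSCEV, SE.getSCEV(Step)))
  CheckMinIters = Builder.getFalse(); // Overflow check is provably false.
else
  CheckMinIters = Builder.CreateICmp(ICmpInst::ICMP_ULT, LHS, Step,
                                     "min.iters.check");
```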
In #111310 an assert was added that for the IV overflow check used with tail folding, the overflow check is never known. However, when applying the loop guards, it looks like it's possible that we might actually know the IV won't overflow: this occurs in 500.perlbench_r from SPEC CPU 2017 and triggers the assertion:

Assertion failed: (!isIndvarOverflowCheckKnownFalse(Cost, VF * UF) && !SE.isKnownPredicate(CmpInst::getInversePredicate(ICmpInst::ICMP_ULT), TC2OverflowSCEV, SE.getSCEV(Step)) && "unexpectedly proved overflow check to be known"), function emitIterationCountCheck, file LoopVectorize.cpp, line 2501.

There is a discrepancy between isIndvarOverflowCheckKnownFalse and the ICMP_ULT check, because the former uses getSmallConstantMaxTripCount, which only takes into account trip counts that fit into 32 bits. There doesn't seem to be an easy way to make the assertion aware of this, so this PR just removes it for now. There are two potential follow-ups from this PR:

1. We miss calculating the max trip count in @trip_count_max_1024; it looks like we might need to apply loop guards somewhere in ScalarEvolution::computeExitLimitFromICmp.
2. In @overflow_at_0, if %tc == 0 then the overflow check will always return false, even though it will overflow.

Fixes #115755