[VPlan] Add initial CFG simplification, removing BranchOnCond true. #106748

fhahn · 2024-08-30T15:32:59Z

Add an initial CFG simplification transform that removes dead edges from blocks terminated by a BranchOnCond whose condition is the constant true.

At the moment, this removes the edge between the middle block and the scalar preheader when folding the tail.

Note: not all tests have been updated yet, but the current tests give a good idea of the change.
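
For readers less familiar with the idea, the following is a minimal sketch of the transform on a toy CFG. It is not the VPlan implementation: `Block`, `connect` and `simplifyConstTrueBranches` are hypothetical names used only for illustration, and the sketch assumes, as the patch does, that the second successor is the one taken when the condition is false.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for a CFG block. This is NOT VPBasicBlock; every name in this
// sketch is hypothetical and exists only for illustration.
struct Block {
  std::string Name;
  bool BranchOnConstTrue;     // Terminator is a branch on the constant true.
  std::vector<Block *> Succs; // Succs[0]: taken on true, Succs[1]: on false.
  std::vector<Block *> Preds;
};

// Add the edge From -> To to both adjacency lists.
inline void connect(Block &From, Block &To) {
  From.Succs.push_back(&To);
  To.Preds.push_back(&From);
}

// Drop the never-taken false edge of every block that branches on constant
// true, then return the former successors left without any predecessor.
inline std::vector<Block *>
simplifyConstTrueBranches(const std::vector<Block *> &Blocks) {
  std::vector<Block *> PossiblyDead;
  for (Block *B : Blocks) {
    if (!B->BranchOnConstTrue)
      continue;
    assert(B->Succs.size() == 2 && "conditional branch needs two successors");
    Block *DeadSucc = B->Succs[1];
    // Remove exactly one edge B -> DeadSucc from both sides.
    B->Succs.erase(B->Succs.begin() + 1);
    auto It = std::find(DeadSucc->Preds.begin(), DeadSucc->Preds.end(), B);
    if (It != DeadSucc->Preds.end())
      DeadSucc->Preds.erase(It);
    B->BranchOnConstTrue = false; // Only an unconditional edge remains.
    if (std::find(PossiblyDead.begin(), PossiblyDead.end(), DeadSucc) ==
        PossiblyDead.end())
      PossiblyDead.push_back(DeadSucc);
  }
  // Keep only the blocks that really lost their last predecessor.
  std::vector<Block *> Dead;
  for (Block *B : PossiblyDead)
    if (B->Preds.empty())
      Dead.push_back(B);
  return Dead;
}
```

In this toy model, a `middle.block` with successors `exit` and `scalar.ph` and a constant-true branch ends up with `exit` as its only successor, and `scalar.ph` is reported as dead if the middle block was its only predecessor.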

llvmbot · 2024-08-30T15:33:31Z

@llvm/pr-subscribers-backend-powerpc

@llvm/pr-subscribers-backend-systemz

Author: Florian Hahn (fhahn)

Changes

Add an initial CFG simplification transform, which removes the dead edges for blocks terminated with BranchOnCond true.

At the moment, this removes the edge between middle block and scalar preheader when folding the tail.

Note: not all tests have been updated yet, but the current tests give a good idea of the change.

Patch is 42.62 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/106748.diff

34 Files Affected:

(modified) llvm/lib/Transforms/Vectorize/VPlan.cpp (+13-3)
(modified) llvm/lib/Transforms/Vectorize/VPlan.h (+5)
(modified) llvm/lib/Transforms/Vectorize/VPlanPatternMatch.h (+1)
(modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp (+44)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/first-order-recurrence-fold-tail.ll (+3-4)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/pr73894.ll (+3-3)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-optsize.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/pr88802.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/short-trip-count.ll (+4-4)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-interleave.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-iv32.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-masked-loadstore.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-ordered-reduction.ll (+3-3)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/vectorize-vp-intrinsics.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/SystemZ/force-target-instruction-cost.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/SystemZ/pr47665.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/X86/divs-with-tail-folding.ll (+4-4)
(modified) llvm/test/Transforms/LoopVectorize/X86/fp64_to_uint32-cost-model.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/X86/vect.omp.force.small-tc.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/X86/vectorize-force-tail-with-evl.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/X86/vectorize-interleaved-accesses-gap.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/as_cast.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/dead_instructions.ll (+4-4)
(modified) llvm/test/Transforms/LoopVectorize/dont-fold-tail-for-divisible-TC.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/memdep-fold-tail.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/pr45679-fold-tail-by-masking.ll (+4-4)
(modified) llvm/test/Transforms/LoopVectorize/pr46525-expander-insertpoint.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/pr51614-fold-tail-by-masking.ll (+3-3)
(modified) llvm/test/Transforms/LoopVectorize/select-reduction.ll (+6-6)
(modified) llvm/test/Transforms/LoopVectorize/store-reduction-results-in-tail-folded-loop.ll (+3-3)
(modified) llvm/test/Transforms/LoopVectorize/strict-fadd-interleave-only.ll (+6-6)
(modified) llvm/test/Transforms/LoopVectorize/tail-folding-alloca-in-loop.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/tail-folding-counting-down.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/tail-folding-switch.ll (+2-2)

diff --git a/llvm/lib/Transforms/Vectorize/VPlan.cpp b/llvm/lib/Transforms/Vectorize/VPlan.cpp
index ac322638056879..70689da342ae0c 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlan.cpp
@@ -1007,9 +1007,19 @@ void VPlan::execute(VPTransformState *State) {
          "middle block has unexpected successors");
   VPBasicBlock *ScalarPhVPBB = cast<VPBasicBlock>(
       MiddleSuccs.size() == 1 ? MiddleSuccs[0] : MiddleSuccs[1]);
-  assert(!isa<VPIRBasicBlock>(ScalarPhVPBB) &&
-         "scalar preheader cannot be wrapped already");
-  replaceVPBBWithIRVPBB(ScalarPhVPBB, ScalarPh);
+  if (!isa<VPIRBasicBlock>(ScalarPhVPBB)) {
+    assert(!isa<VPIRBasicBlock>(ScalarPhVPBB) &&
+           "scalar preheader cannot be wrapped already");
+    replaceVPBBWithIRVPBB(ScalarPhVPBB, ScalarPh);
+  } else {
+    // There is no edge to the scalar pre-header in VPlan. Phis in ScalarPh have
+    // been created during skeleton construction, so remove the incoming values
+    // from the middle block here.
+    // TODO: Remove this once phis in the scalar preheader are managed in VPlan
+    // directly.
+    for (auto &Phi : ScalarPh->phis())
+      Phi.removeIncomingValue(MiddleBB);
+  }
   replaceVPBBWithIRVPBB(MiddleVPBB, MiddleBB);
 
   // Disconnect the middle block from its single successor (the scalar loop
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index e7ea5cb23b90d3..9d6296be020aa3 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -3496,6 +3496,11 @@ class VPlan {
     return LiveOuts;
   }
 
+  void removeLiveOut(PHINode *PN) {
+    delete LiveOuts[PN];
+    LiveOuts.erase(PN);
+  }
+
   VPValue *getSCEVExpansion(const SCEV *S) const {
     return SCEVToExpansion.lookup(S);
   }
diff --git a/llvm/lib/Transforms/Vectorize/VPlanPatternMatch.h b/llvm/lib/Transforms/Vectorize/VPlanPatternMatch.h
index 5f86f2c969651b..0c81b3b9cd30fd 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanPatternMatch.h
+++ b/llvm/lib/Transforms/Vectorize/VPlanPatternMatch.h
@@ -85,6 +85,7 @@ inline specific_intval<0> m_SpecificInt(uint64_t V) {
   return specific_intval<0>(APInt(64, V));
 }
 
+inline specific_intval<1> m_True() { return specific_intval<1>(APInt(64, 1)); }
 inline specific_intval<1> m_False() { return specific_intval<1>(APInt(64, 0)); }
 
 /// Matching combinators
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index 9796ee64f6ef90..6ac972738eb86a 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -1119,6 +1119,49 @@ void VPlanTransforms::truncateToMinimalBitwidths(
          "some entries in MinBWs haven't been processed");
 }
 
+/// Remove BranchOnCond recipes with constant conditions together with removing
+/// dead edges to their successors. Remove blocks that become dead (no remaining
+/// predecessors).
+static void simplifyCFG(VPlan &Plan) {
+  using namespace llvm::VPlanPatternMatch;
+  SmallVector<VPBasicBlock *> WorkList;
+  for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(
+           vp_depth_first_deep(Plan.getEntry()))) {
+    VPRecipeBase *Term = VPBB->getTerminator();
+    if (!Term || !match(Term, m_BranchOnCond(m_True())))
+      continue;
+    WorkList.push_back(VPBB);
+  }
+
+  SetVector<VPBasicBlock *> PossiblyDeadBlocks;
+  for (VPBasicBlock *VPBB : WorkList) {
+    VPRecipeBase *Term = VPBB->getTerminator();
+    VPBasicBlock *DeadSucc = cast<VPBasicBlock>(VPBB->getSuccessors()[1]);
+    VPBlockUtils::disconnectBlocks(VPBB, DeadSucc);
+    PossiblyDeadBlocks.insert(DeadSucc);
+    Term->eraseFromParent();
+  }
+  for (VPBasicBlock *VPBB : PossiblyDeadBlocks) {
+    if (VPBB->getNumPredecessors() != 0)
+      continue;
+    // The block doesn't have any predecessors, remove it.
+    //
+    // To do so, first remove all recipes in the block. At the moment, recipes
+    // with users outside the block must be live-outs. Those are removed.
+    SmallVector<PHINode *> DeadLiveOuts;
+    for (VPRecipeBase &R : make_early_inc_range(reverse(*VPBB))) {
+      if (auto *V = dyn_cast<VPSingleDefRecipe>(&R)) {
+        for (VPUser *U : to_vector(V->users())) {
+          auto *LO = cast<VPLiveOut>(U);
+          Plan.removeLiveOut(LO->getPhi());
+        }
+      }
+      R.eraseFromParent();
+    }
+    delete VPBB;
+  }
+}
+
 void VPlanTransforms::optimize(VPlan &Plan, ScalarEvolution &SE) {
   removeRedundantCanonicalIVs(Plan);
   removeRedundantInductionCasts(Plan);
@@ -1131,6 +1174,7 @@ void VPlanTransforms::optimize(VPlan &Plan, ScalarEvolution &SE) {
 
   removeRedundantExpandSCEVRecipes(Plan);
   mergeBlocksIntoPredecessors(Plan);
+  simplifyCFG(Plan);
 }
 
 // Add a VPActiveLaneMaskPHIRecipe and related recipes to \p Plan and replace
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/first-order-recurrence-fold-tail.ll b/llvm/test/Transforms/LoopVectorize/AArch64/first-order-recurrence-fold-tail.ll
index e9c9288e734394..1e9d0d1867ce46 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/first-order-recurrence-fold-tail.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/first-order-recurrence-fold-tail.ll
@@ -71,14 +71,13 @@ define i32 @test_phi_iterator_invalidation(ptr %A, ptr noalias %B) {
 ; CHECK-NEXT:    br i1 [[TMP31]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
 ; CHECK:       middle.block:
 ; CHECK-NEXT:    [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <4 x i16> [[TMP24]], i32 3
-; CHECK-NEXT:    br i1 true, label [[EXIT:%.*]], label [[SCALAR_PH]]
+; CHECK-NEXT:    br label [[EXIT:%.*]]
 ; CHECK:       scalar.ph:
-; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 1004, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
-; CHECK-NEXT:    [[SCALAR_RECUR_INIT:%.*]] = phi i16 [ [[VECTOR_RECUR_EXTRACT]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ]
 ; CHECK-NEXT:    br label [[LOOP:%.*]]
 ; CHECK:       loop:
 ; CHECK-NEXT:    [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]
-; CHECK-NEXT:    [[SCALAR_RECUR:%.*]] = phi i16 [ [[SCALAR_RECUR_INIT]], [[SCALAR_PH]] ], [ [[FOR_NEXT:%.*]], [[LOOP]] ]
+; CHECK-NEXT:    [[SCALAR_RECUR:%.*]] = phi i16 [ 0, [[SCALAR_PH]] ], [ [[FOR_NEXT:%.*]], [[LOOP]] ]
 ; CHECK-NEXT:    [[SEXT:%.*]] = sext i16 [[SCALAR_RECUR]] to i32
 ; CHECK-NEXT:    [[IV_NEXT]] = add i64 [[IV]], 1
 ; CHECK-NEXT:    [[GEP_A:%.*]] = getelementptr i32, ptr [[A]], i64 [[IV_NEXT]]
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/pr73894.ll b/llvm/test/Transforms/LoopVectorize/AArch64/pr73894.ll
index a70eafb6078a03..0def9aabd4942c 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/pr73894.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/pr73894.ll
@@ -54,10 +54,10 @@ define i32 @pr70988(ptr %src, i32 %n) {
 ; CHECK-NEXT:    br i1 [[TMP20]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
 ; CHECK:       middle.block:
 ; CHECK-NEXT:    [[RDX_MINMAX:%.*]] = call i32 @llvm.smax.i32(i32 [[TMP17]], i32 [[TMP18]])
-; CHECK-NEXT:    br i1 true, label [[EXIT:%.*]], label [[SCALAR_PH]]
+; CHECK-NEXT:    br label [[EXIT:%.*]]
 ; CHECK:       scalar.ph:
-; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
-; CHECK-NEXT:    [[BC_MERGE_RDX:%.*]] = phi i32 [ [[RDX_MINMAX]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ]
+; CHECK-NEXT:    [[BC_MERGE_RDX:%.*]] = phi i32 [ 0, [[ENTRY]] ]
 ; CHECK-NEXT:    br label [[LOOP:%.*]]
 ; CHECK:       loop:
 ; CHECK-NEXT:    [[INDUC:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDUC_NEXT:%.*]], [[LOOP]] ]
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-optsize.ll b/llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-optsize.ll
index 7514690ee3c9e9..f17ae4ab876cd1 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-optsize.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-optsize.ll
@@ -37,9 +37,9 @@ define void @trip1025_i64(ptr noalias nocapture noundef %dst, ptr noalias nocapt
 ; CHECK-NEXT:    [[TMP15:%.*]] = extractelement <vscale x 2 x i1> [[TMP14]], i32 0
 ; CHECK-NEXT:    br i1 [[TMP15]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
 ; CHECK:       middle.block:
-; CHECK-NEXT:    br i1 true, label [[FOR_END:%.*]], label [[SCALAR_PH]]
+; CHECK-NEXT:    br label [[FOR_END:%.*]]
 ; CHECK:       scalar.ph:
-; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ]
 ; CHECK-NEXT:    br label [[FOR_BODY:%.*]]
 ; CHECK:       for.body:
 ; CHECK-NEXT:    [[I_06:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INC:%.*]], [[FOR_BODY]] ]
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/pr88802.ll b/llvm/test/Transforms/LoopVectorize/RISCV/pr88802.ll
index a91f92348ab261..c4f244e3697308 100644
--- a/llvm/test/Transforms/LoopVectorize/RISCV/pr88802.ll
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/pr88802.ll
@@ -58,9 +58,9 @@ define void @test(ptr %p, i64 %a, i8 %b) {
 ; CHECK-NEXT:    [[INDEX_NEXT]] = add i32 [[INDEX]], 4
 ; CHECK-NEXT:    br i1 true, label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY1]], !llvm.loop [[LOOP0:![0-9]+]]
 ; CHECK:       middle.block:
-; CHECK-NEXT:    br i1 true, label [[EXIT:%.*]], label [[SCALAR_PH]]
+; CHECK-NEXT:    br label [[EXIT:%.*]]
 ; CHECK:       scalar.ph:
-; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i32 [ 4, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ]
 ; CHECK-NEXT:    br label [[FOR_COND:%.*]]
 ; CHECK:       for.cond:
 ; CHECK-NEXT:    [[IV:%.*]] = phi i32 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[ADD:%.*]], [[FOR_BODY:%.*]] ]
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/short-trip-count.ll b/llvm/test/Transforms/LoopVectorize/RISCV/short-trip-count.ll
index bb716d78ca4119..700ba41b243f50 100644
--- a/llvm/test/Transforms/LoopVectorize/RISCV/short-trip-count.ll
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/short-trip-count.ll
@@ -25,9 +25,9 @@ define void @small_trip_count_min_vlen_128(ptr nocapture %a) nounwind vscale_ran
 ; CHECK-NEXT:    [[INDEX_NEXT]] = add i32 [[INDEX]], [[TMP2]]
 ; CHECK-NEXT:    br i1 true, label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
 ; CHECK:       middle.block:
-; CHECK-NEXT:    br i1 true, label [[EXIT:%.*]], label [[SCALAR_PH]]
+; CHECK-NEXT:    br label [[EXIT:%.*]]
 ; CHECK:       scalar.ph:
-; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i32 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ]
 ; CHECK-NEXT:    br label [[LOOP:%.*]]
 ; CHECK:       loop:
 ; CHECK-NEXT:    [[IV:%.*]] = phi i32 [ [[IV_NEXT:%.*]], [[LOOP]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
@@ -84,9 +84,9 @@ define void @small_trip_count_min_vlen_32(ptr nocapture %a) nounwind vscale_rang
 ; CHECK-NEXT:    [[INDEX_NEXT]] = add i32 [[INDEX]], [[TMP4]]
 ; CHECK-NEXT:    br i1 true, label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
 ; CHECK:       middle.block:
-; CHECK-NEXT:    br i1 true, label [[EXIT:%.*]], label [[SCALAR_PH]]
+; CHECK-NEXT:    br label [[EXIT:%.*]]
 ; CHECK:       scalar.ph:
-; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i32 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ]
 ; CHECK-NEXT:    br label [[LOOP:%.*]]
 ; CHECK:       loop:
 ; CHECK-NEXT:    [[IV:%.*]] = phi i32 [ [[IV_NEXT:%.*]], [[LOOP]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-interleave.ll b/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-interleave.ll
index 69aa7bc7409837..e5b765c244edbc 100644
--- a/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-interleave.ll
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-interleave.ll
@@ -75,9 +75,9 @@ define void @interleave(ptr noalias %a, ptr noalias %b, i64 %N) {
 ; IF-EVL-NEXT:    [[TMP20:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
 ; IF-EVL-NEXT:    br i1 [[TMP20]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
 ; IF-EVL:       middle.block:
-; IF-EVL-NEXT:    br i1 true, label [[FOR_COND_CLEANUP:%.*]], label [[SCALAR_PH]]
+; IF-EVL-NEXT:    br label [[FOR_COND_CLEANUP:%.*]]
 ; IF-EVL:       scalar.ph:
-; IF-EVL-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
+; IF-EVL-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ]
 ; IF-EVL-NEXT:    br label [[FOR_BODY:%.*]]
 ; IF-EVL:       for.body:
 ; IF-EVL-NEXT:    [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[FOR_BODY]] ]
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-iv32.ll b/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-iv32.ll
index cb4cf3adfbaa09..8aafa3a51adb02 100644
--- a/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-iv32.ll
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-iv32.ll
@@ -44,9 +44,9 @@ define void @iv32(ptr noalias %a, ptr noalias %b, i32 %N) {
 ; IF-EVL-NEXT:    [[TMP18:%.*]] = icmp eq i32 [[IV_NEXT]], [[N_VEC]]
 ; IF-EVL-NEXT:    br i1 [[TMP18]], label [[MIDDLE_BLOCK:%.*]], label [[FOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
 ; IF-EVL:       middle.block:
-; IF-EVL-NEXT:    br i1 true, label [[FOR_COND_CLEANUP:%.*]], label [[SCALAR_PH]]
+; IF-EVL-NEXT:    br label [[FOR_COND_CLEANUP:%.*]]
 ; IF-EVL:       scalar.ph:
-; IF-EVL-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i32 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY1:%.*]] ]
+; IF-EVL-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i32 [ 0, [[ENTRY1:%.*]] ]
 ; IF-EVL-NEXT:    br label [[FOR_BODY1:%.*]]
 ; IF-EVL:       for.body:
 ; IF-EVL-NEXT:    [[IV1:%.*]] = phi i32 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT1:%.*]], [[FOR_BODY1]] ]
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-masked-loadstore.ll b/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-masked-loadstore.ll
index b8b2558247fa64..45fa95b1afe041 100644
--- a/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-masked-loadstore.ll
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-masked-loadstore.ll
@@ -58,9 +58,9 @@ define void @masked_loadstore(ptr noalias %a, ptr noalias %b, i64 %n) {
 ; IF-EVL-NEXT:    [[TMP25:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
 ; IF-EVL-NEXT:    br i1 [[TMP25]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
 ; IF-EVL:       middle.block:
-; IF-EVL-NEXT:    br i1 true, label [[EXIT:%.*]], label [[SCALAR_PH]]
+; IF-EVL-NEXT:    br label [[EXIT:%.*]]
 ; IF-EVL:       scalar.ph:
-; IF-EVL-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
+; IF-EVL-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ]
 ; IF-EVL-NEXT:    br label [[FOR_BODY:%.*]]
 ; IF-EVL:       for.body:
 ; IF-EVL-NEXT:    [[I_011:%.*]] = phi i64 [ [[INC:%.*]], [[FOR_INC:%.*]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-ordered-reduction.ll b/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-ordered-reduction.ll
index 314d30f86ee57d..a0c66354e7745c 100644
--- a/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-ordered-reduction.ll
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-ordered-reduction.ll
@@ -46,10 +46,10 @@ define float @fadd(ptr noalias nocapture readonly %a, i64 %n) {
 ; IF-EVL-NEXT:    [[TMP16:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
 ; IF-EVL-NEXT:    br i1 [[TMP16]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
 ; IF-EVL:       middle.block:
-; IF-EVL-NEXT:    br i1 true, label [[FOR_END:%.*]], label [[SCALAR_PH]]
+; IF-EVL-NEXT:    br label [[FOR_END:%.*]]
 ; IF-EVL:       scalar.ph:
-; IF-EVL-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
-; IF-EVL-NEXT:    [[BC_MERGE_RDX:%.*]] = phi float [ [[TMP14]], [[MIDDLE_BLOCK]] ], [ 0.000000e+00, [[ENTRY]] ]
+; IF-EVL-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ]
+; IF-EVL-NEXT:    [[BC_MERGE_RDX:%.*]] = phi float [ 0.000000e+00, [[ENTRY]] ]
 ; IF-EVL-NEXT:    br label [[FOR_BODY:%.*]]
 ; IF-EVL:       for.body:
 ; IF-EVL-NEXT:    [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[FOR_BODY]] ]
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-vp-intrinsics.ll b/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-vp-intrinsics.ll
index 362bfd61ebd076..2ca57a91228183 100644
--- a/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-vp-intrinsics.ll
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-vp-intrinsics.ll
@@ -49,9 +49,9 @@ define void @foo(ptr noalias %a, ptr noalias %b, ptr noalias %c, i64 %N) {
 ; IF-EVL-NEXT:    [[TMP22:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
 ; IF-EVL-NEXT:    br i1 [[TMP22]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
 ; IF-EVL:       middle.block:
-; IF-EVL-NEXT:    br i1 true, label [[FOR_COND_CLEANUP:%.*]], label [[SCALAR_PH]]
+; IF-EVL-NEXT:    br label [[FOR_COND_CLEANUP:%.*]]
 ; IF-EVL:       scalar.ph:
-; IF-EVL-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
+; IF-EVL-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ]
 ; IF-EVL-NEXT:    br label [[FOR_BODY:%.*]]
 ; IF-EVL:       for.body:
 ; IF-EVL-NEXT:    [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[FOR_BODY]] ]
diff --git a/llvm/test/Transforms/LoopVectorize/SystemZ/force-target-instruction-cost.ll b/llvm/test/Transforms/LoopVectorize/SystemZ/force-target-instruction-cost.ll
index 3477c8d879106b..b89c5c677a134c 100644
--- a/llvm/test/Transforms/LoopVectorize/SystemZ/force-target-instruction-cost.ll
+++ b/llvm/test/Transforms/LoopVectorize/SystemZ/force-target-instruction-cost.ll
@@ -37,9 +37,9 @@ define void @test_scalar_steps_target_instruction_cost(ptr %dst) {
 ; CHECK-NEXT:    [[TMP7:%.*]] = icmp eq i64 [[INDEX_NEXT]], 10
 ; CHECK-NEXT:    br i1 [[TMP7]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
 ; CHECK:       [[MIDDLE_BLOCK]]:
-; CHECK-NEXT:    br i1 true, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; CHECK-NEXT:    br label %[[EXIT:.*]]
 ; CHECK:       [[SCALAR_PH]]:
-; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 30, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, %[[ENTRY]] ]
 ; CHECK-NEXT:    br label %[[LOOP:.*]]
 ; CHECK:       [[LOOP]]:
 ; CHECK-NEXT:    [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], %[[LOOP]] ]
diff --git a/llvm/test/Transforms/LoopVectorize/SystemZ/pr47665.ll b/llvm/test/Transforms/LoopVectorize/SystemZ/pr47665.ll
index 07a1cca1bc21e7..f55a9f649c43a8 100644
--- a/llvm/test/Transforms/LoopVectorize/SystemZ/pr47665.ll
+++ b/llvm/test/Transforms/LoopVectorize/SystemZ/pr47665.ll
@@ -139,9 +139,9 @@ define void @test(ptr %p, i40 %a) {
 ; CHECK-NEXT:    [[INDEX_NEXT]] = add i32 [[INDEX]], 16
 ; CHECK-NEXT:    br i1 true, label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
 ; CHECK:       middle.block:
-; CHECK-NEXT:    br i1 true, label [[EXIT:%.*]], label [[SCALAR_PH]]
+; CHECK-NEXT:    br label [[EXIT:%.*]]
 ; CHECK:       scalar.ph:
-; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i32 [ 16, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ]
 ; CHECK-NEXT:    br label [[FOR_BODY:%.*]]
 ; CHECK:       for.body:
 ; CHECK-NEXT:    [[IV:%.*]] = phi i32 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[FOR_BODY]] ]
diff --git a/ll...
[truncated]

llvmbot · 2024-08-30T15:33:32Z

@llvm/pr-subscribers-llvm-transforms

diff --git a/llvm/lib/Transforms/Vectorize/VPlan.cpp b/llvm/lib/Transforms/Vectorize/VPlan.cpp
index ac322638056879..70689da342ae0c 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlan.cpp
@@ -1007,9 +1007,19 @@ void VPlan::execute(VPTransformState *State) {
          "middle block has unexpected successors");
   VPBasicBlock *ScalarPhVPBB = cast<VPBasicBlock>(
       MiddleSuccs.size() == 1 ? MiddleSuccs[0] : MiddleSuccs[1]);
-  assert(!isa<VPIRBasicBlock>(ScalarPhVPBB) &&
-         "scalar preheader cannot be wrapped already");
-  replaceVPBBWithIRVPBB(ScalarPhVPBB, ScalarPh);
+  if (!isa<VPIRBasicBlock>(ScalarPhVPBB)) {
+    assert(!isa<VPIRBasicBlock>(ScalarPhVPBB) &&
+           "scalar preheader cannot be wrapped already");
+    replaceVPBBWithIRVPBB(ScalarPhVPBB, ScalarPh);
+  } else {
+    // There is no edge to the scalar pre-header in VPlan. Phis in ScalarPh have
+    // been created during skeleton construction, so remove the incoming values
+    // from the middle block here.
+    // TODO: Remove this once phis in the scalar preheader are managed in VPlan
+    // directly.
+    for (auto &Phi : ScalarPh->phis())
+      Phi.removeIncomingValue(MiddleBB);
+  }
   replaceVPBBWithIRVPBB(MiddleVPBB, MiddleBB);
 
   // Disconnect the middle block from its single successor (the scalar loop
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index e7ea5cb23b90d3..9d6296be020aa3 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -3496,6 +3496,11 @@ class VPlan {
     return LiveOuts;
   }
 
+  void removeLiveOut(PHINode *PN) {
+    delete LiveOuts[PN];
+    LiveOuts.erase(PN);
+  }
+
   VPValue *getSCEVExpansion(const SCEV *S) const {
     return SCEVToExpansion.lookup(S);
   }
diff --git a/llvm/lib/Transforms/Vectorize/VPlanPatternMatch.h b/llvm/lib/Transforms/Vectorize/VPlanPatternMatch.h
index 5f86f2c969651b..0c81b3b9cd30fd 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanPatternMatch.h
+++ b/llvm/lib/Transforms/Vectorize/VPlanPatternMatch.h
@@ -85,6 +85,7 @@ inline specific_intval<0> m_SpecificInt(uint64_t V) {
   return specific_intval<0>(APInt(64, V));
 }
 
+inline specific_intval<1> m_True() { return specific_intval<1>(APInt(64, 1)); }
 inline specific_intval<1> m_False() { return specific_intval<1>(APInt(64, 0)); }
 
 /// Matching combinators
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index 9796ee64f6ef90..6ac972738eb86a 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -1119,6 +1119,49 @@ void VPlanTransforms::truncateToMinimalBitwidths(
          "some entries in MinBWs haven't been processed");
 }
 
+/// Remove BranchOnCond recipes with constant conditions together with removing
+/// dead edges to their successors. Remove blocks that become dead (no remaining
+/// predecessors).
+static void simplifyCFG(VPlan &Plan) {
+  using namespace llvm::VPlanPatternMatch;
+  SmallVector<VPBasicBlock *> WorkList;
+  for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(
+           vp_depth_first_deep(Plan.getEntry()))) {
+    VPRecipeBase *Term = VPBB->getTerminator();
+    if (!Term || !match(Term, m_BranchOnCond(m_True())))
+      continue;
+    WorkList.push_back(VPBB);
+  }
+
+  SetVector<VPBasicBlock *> PossiblyDeadBlocks;
+  for (VPBasicBlock *VPBB : WorkList) {
+    VPRecipeBase *Term = VPBB->getTerminator();
+    VPBasicBlock *DeadSucc = cast<VPBasicBlock>(VPBB->getSuccessors()[1]);
+    VPBlockUtils::disconnectBlocks(VPBB, DeadSucc);
+    PossiblyDeadBlocks.insert(DeadSucc);
+    Term->eraseFromParent();
+  }
+  for (VPBasicBlock *VPBB : PossiblyDeadBlocks) {
+    if (VPBB->getNumPredecessors() != 0)
+      continue;
+    // The block doesn't have any predecessors, remove it.
+    //
+    // To do so, first remove all recipes in the block. At the moment, recipes
+    // with users outside the block must be live-outs. Those are removed.
+    SmallVector<PHINode *> DeadLiveOuts;
+    for (VPRecipeBase &R : make_early_inc_range(reverse(*VPBB))) {
+      if (auto *V = dyn_cast<VPSingleDefRecipe>(&R)) {
+        for (VPUser *U : to_vector(V->users())) {
+          auto *LO = cast<VPLiveOut>(U);
+          Plan.removeLiveOut(LO->getPhi());
+        }
+      }
+      R.eraseFromParent();
+    }
+    delete VPBB;
+  }
+}
+
 void VPlanTransforms::optimize(VPlan &Plan, ScalarEvolution &SE) {
   removeRedundantCanonicalIVs(Plan);
   removeRedundantInductionCasts(Plan);
@@ -1131,6 +1174,7 @@ void VPlanTransforms::optimize(VPlan &Plan, ScalarEvolution &SE) {
 
   removeRedundantExpandSCEVRecipes(Plan);
   mergeBlocksIntoPredecessors(Plan);
+  simplifyCFG(Plan);
 }
 
 // Add a VPActiveLaneMaskPHIRecipe and related recipes to \p Plan and replace
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/first-order-recurrence-fold-tail.ll b/llvm/test/Transforms/LoopVectorize/AArch64/first-order-recurrence-fold-tail.ll
index e9c9288e734394..1e9d0d1867ce46 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/first-order-recurrence-fold-tail.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/first-order-recurrence-fold-tail.ll
@@ -71,14 +71,13 @@ define i32 @test_phi_iterator_invalidation(ptr %A, ptr noalias %B) {
 ; CHECK-NEXT:    br i1 [[TMP31]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
 ; CHECK:       middle.block:
 ; CHECK-NEXT:    [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <4 x i16> [[TMP24]], i32 3
-; CHECK-NEXT:    br i1 true, label [[EXIT:%.*]], label [[SCALAR_PH]]
+; CHECK-NEXT:    br label [[EXIT:%.*]]
 ; CHECK:       scalar.ph:
-; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 1004, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
-; CHECK-NEXT:    [[SCALAR_RECUR_INIT:%.*]] = phi i16 [ [[VECTOR_RECUR_EXTRACT]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ]
 ; CHECK-NEXT:    br label [[LOOP:%.*]]
 ; CHECK:       loop:
 ; CHECK-NEXT:    [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]
-; CHECK-NEXT:    [[SCALAR_RECUR:%.*]] = phi i16 [ [[SCALAR_RECUR_INIT]], [[SCALAR_PH]] ], [ [[FOR_NEXT:%.*]], [[LOOP]] ]
+; CHECK-NEXT:    [[SCALAR_RECUR:%.*]] = phi i16 [ 0, [[SCALAR_PH]] ], [ [[FOR_NEXT:%.*]], [[LOOP]] ]
 ; CHECK-NEXT:    [[SEXT:%.*]] = sext i16 [[SCALAR_RECUR]] to i32
 ; CHECK-NEXT:    [[IV_NEXT]] = add i64 [[IV]], 1
 ; CHECK-NEXT:    [[GEP_A:%.*]] = getelementptr i32, ptr [[A]], i64 [[IV_NEXT]]
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/pr73894.ll b/llvm/test/Transforms/LoopVectorize/AArch64/pr73894.ll
index a70eafb6078a03..0def9aabd4942c 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/pr73894.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/pr73894.ll
@@ -54,10 +54,10 @@ define i32 @pr70988(ptr %src, i32 %n) {
 ; CHECK-NEXT:    br i1 [[TMP20]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
 ; CHECK:       middle.block:
 ; CHECK-NEXT:    [[RDX_MINMAX:%.*]] = call i32 @llvm.smax.i32(i32 [[TMP17]], i32 [[TMP18]])
-; CHECK-NEXT:    br i1 true, label [[EXIT:%.*]], label [[SCALAR_PH]]
+; CHECK-NEXT:    br label [[EXIT:%.*]]
 ; CHECK:       scalar.ph:
-; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
-; CHECK-NEXT:    [[BC_MERGE_RDX:%.*]] = phi i32 [ [[RDX_MINMAX]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ]
+; CHECK-NEXT:    [[BC_MERGE_RDX:%.*]] = phi i32 [ 0, [[ENTRY]] ]
 ; CHECK-NEXT:    br label [[LOOP:%.*]]
 ; CHECK:       loop:
 ; CHECK-NEXT:    [[INDUC:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDUC_NEXT:%.*]], [[LOOP]] ]
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-optsize.ll b/llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-optsize.ll
index 7514690ee3c9e9..f17ae4ab876cd1 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-optsize.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-optsize.ll
@@ -37,9 +37,9 @@ define void @trip1025_i64(ptr noalias nocapture noundef %dst, ptr noalias nocapt
 ; CHECK-NEXT:    [[TMP15:%.*]] = extractelement <vscale x 2 x i1> [[TMP14]], i32 0
 ; CHECK-NEXT:    br i1 [[TMP15]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
 ; CHECK:       middle.block:
-; CHECK-NEXT:    br i1 true, label [[FOR_END:%.*]], label [[SCALAR_PH]]
+; CHECK-NEXT:    br label [[FOR_END:%.*]]
 ; CHECK:       scalar.ph:
-; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ]
 ; CHECK-NEXT:    br label [[FOR_BODY:%.*]]
 ; CHECK:       for.body:
 ; CHECK-NEXT:    [[I_06:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INC:%.*]], [[FOR_BODY]] ]
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/pr88802.ll b/llvm/test/Transforms/LoopVectorize/RISCV/pr88802.ll
index a91f92348ab261..c4f244e3697308 100644
--- a/llvm/test/Transforms/LoopVectorize/RISCV/pr88802.ll
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/pr88802.ll
@@ -58,9 +58,9 @@ define void @test(ptr %p, i64 %a, i8 %b) {
 ; CHECK-NEXT:    [[INDEX_NEXT]] = add i32 [[INDEX]], 4
 ; CHECK-NEXT:    br i1 true, label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY1]], !llvm.loop [[LOOP0:![0-9]+]]
 ; CHECK:       middle.block:
-; CHECK-NEXT:    br i1 true, label [[EXIT:%.*]], label [[SCALAR_PH]]
+; CHECK-NEXT:    br label [[EXIT:%.*]]
 ; CHECK:       scalar.ph:
-; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i32 [ 4, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ]
 ; CHECK-NEXT:    br label [[FOR_COND:%.*]]
 ; CHECK:       for.cond:
 ; CHECK-NEXT:    [[IV:%.*]] = phi i32 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[ADD:%.*]], [[FOR_BODY:%.*]] ]
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/short-trip-count.ll b/llvm/test/Transforms/LoopVectorize/RISCV/short-trip-count.ll
index bb716d78ca4119..700ba41b243f50 100644
--- a/llvm/test/Transforms/LoopVectorize/RISCV/short-trip-count.ll
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/short-trip-count.ll
@@ -25,9 +25,9 @@ define void @small_trip_count_min_vlen_128(ptr nocapture %a) nounwind vscale_ran
 ; CHECK-NEXT:    [[INDEX_NEXT]] = add i32 [[INDEX]], [[TMP2]]
 ; CHECK-NEXT:    br i1 true, label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
 ; CHECK:       middle.block:
-; CHECK-NEXT:    br i1 true, label [[EXIT:%.*]], label [[SCALAR_PH]]
+; CHECK-NEXT:    br label [[EXIT:%.*]]
 ; CHECK:       scalar.ph:
-; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i32 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ]
 ; CHECK-NEXT:    br label [[LOOP:%.*]]
 ; CHECK:       loop:
 ; CHECK-NEXT:    [[IV:%.*]] = phi i32 [ [[IV_NEXT:%.*]], [[LOOP]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
@@ -84,9 +84,9 @@ define void @small_trip_count_min_vlen_32(ptr nocapture %a) nounwind vscale_rang
 ; CHECK-NEXT:    [[INDEX_NEXT]] = add i32 [[INDEX]], [[TMP4]]
 ; CHECK-NEXT:    br i1 true, label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
 ; CHECK:       middle.block:
-; CHECK-NEXT:    br i1 true, label [[EXIT:%.*]], label [[SCALAR_PH]]
+; CHECK-NEXT:    br label [[EXIT:%.*]]
 ; CHECK:       scalar.ph:
-; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i32 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ]
 ; CHECK-NEXT:    br label [[LOOP:%.*]]
 ; CHECK:       loop:
 ; CHECK-NEXT:    [[IV:%.*]] = phi i32 [ [[IV_NEXT:%.*]], [[LOOP]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-interleave.ll b/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-interleave.ll
index 69aa7bc7409837..e5b765c244edbc 100644
--- a/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-interleave.ll
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-interleave.ll
@@ -75,9 +75,9 @@ define void @interleave(ptr noalias %a, ptr noalias %b, i64 %N) {
 ; IF-EVL-NEXT:    [[TMP20:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
 ; IF-EVL-NEXT:    br i1 [[TMP20]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
 ; IF-EVL:       middle.block:
-; IF-EVL-NEXT:    br i1 true, label [[FOR_COND_CLEANUP:%.*]], label [[SCALAR_PH]]
+; IF-EVL-NEXT:    br label [[FOR_COND_CLEANUP:%.*]]
 ; IF-EVL:       scalar.ph:
-; IF-EVL-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
+; IF-EVL-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ]
 ; IF-EVL-NEXT:    br label [[FOR_BODY:%.*]]
 ; IF-EVL:       for.body:
 ; IF-EVL-NEXT:    [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[FOR_BODY]] ]
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-iv32.ll b/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-iv32.ll
index cb4cf3adfbaa09..8aafa3a51adb02 100644
--- a/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-iv32.ll
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-iv32.ll
@@ -44,9 +44,9 @@ define void @iv32(ptr noalias %a, ptr noalias %b, i32 %N) {
 ; IF-EVL-NEXT:    [[TMP18:%.*]] = icmp eq i32 [[IV_NEXT]], [[N_VEC]]
 ; IF-EVL-NEXT:    br i1 [[TMP18]], label [[MIDDLE_BLOCK:%.*]], label [[FOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
 ; IF-EVL:       middle.block:
-; IF-EVL-NEXT:    br i1 true, label [[FOR_COND_CLEANUP:%.*]], label [[SCALAR_PH]]
+; IF-EVL-NEXT:    br label [[FOR_COND_CLEANUP:%.*]]
 ; IF-EVL:       scalar.ph:
-; IF-EVL-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i32 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY1:%.*]] ]
+; IF-EVL-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i32 [ 0, [[ENTRY1:%.*]] ]
 ; IF-EVL-NEXT:    br label [[FOR_BODY1:%.*]]
 ; IF-EVL:       for.body:
 ; IF-EVL-NEXT:    [[IV1:%.*]] = phi i32 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT1:%.*]], [[FOR_BODY1]] ]
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-masked-loadstore.ll b/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-masked-loadstore.ll
index b8b2558247fa64..45fa95b1afe041 100644
--- a/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-masked-loadstore.ll
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-masked-loadstore.ll
@@ -58,9 +58,9 @@ define void @masked_loadstore(ptr noalias %a, ptr noalias %b, i64 %n) {
 ; IF-EVL-NEXT:    [[TMP25:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
 ; IF-EVL-NEXT:    br i1 [[TMP25]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
 ; IF-EVL:       middle.block:
-; IF-EVL-NEXT:    br i1 true, label [[EXIT:%.*]], label [[SCALAR_PH]]
+; IF-EVL-NEXT:    br label [[EXIT:%.*]]
 ; IF-EVL:       scalar.ph:
-; IF-EVL-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
+; IF-EVL-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ]
 ; IF-EVL-NEXT:    br label [[FOR_BODY:%.*]]
 ; IF-EVL:       for.body:
 ; IF-EVL-NEXT:    [[I_011:%.*]] = phi i64 [ [[INC:%.*]], [[FOR_INC:%.*]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-ordered-reduction.ll b/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-ordered-reduction.ll
index 314d30f86ee57d..a0c66354e7745c 100644
--- a/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-ordered-reduction.ll
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-ordered-reduction.ll
@@ -46,10 +46,10 @@ define float @fadd(ptr noalias nocapture readonly %a, i64 %n) {
 ; IF-EVL-NEXT:    [[TMP16:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
 ; IF-EVL-NEXT:    br i1 [[TMP16]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
 ; IF-EVL:       middle.block:
-; IF-EVL-NEXT:    br i1 true, label [[FOR_END:%.*]], label [[SCALAR_PH]]
+; IF-EVL-NEXT:    br label [[FOR_END:%.*]]
 ; IF-EVL:       scalar.ph:
-; IF-EVL-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
-; IF-EVL-NEXT:    [[BC_MERGE_RDX:%.*]] = phi float [ [[TMP14]], [[MIDDLE_BLOCK]] ], [ 0.000000e+00, [[ENTRY]] ]
+; IF-EVL-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ]
+; IF-EVL-NEXT:    [[BC_MERGE_RDX:%.*]] = phi float [ 0.000000e+00, [[ENTRY]] ]
 ; IF-EVL-NEXT:    br label [[FOR_BODY:%.*]]
 ; IF-EVL:       for.body:
 ; IF-EVL-NEXT:    [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[FOR_BODY]] ]
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-vp-intrinsics.ll b/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-vp-intrinsics.ll
index 362bfd61ebd076..2ca57a91228183 100644
--- a/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-vp-intrinsics.ll
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-vp-intrinsics.ll
@@ -49,9 +49,9 @@ define void @foo(ptr noalias %a, ptr noalias %b, ptr noalias %c, i64 %N) {
 ; IF-EVL-NEXT:    [[TMP22:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
 ; IF-EVL-NEXT:    br i1 [[TMP22]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
 ; IF-EVL:       middle.block:
-; IF-EVL-NEXT:    br i1 true, label [[FOR_COND_CLEANUP:%.*]], label [[SCALAR_PH]]
+; IF-EVL-NEXT:    br label [[FOR_COND_CLEANUP:%.*]]
 ; IF-EVL:       scalar.ph:
-; IF-EVL-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
+; IF-EVL-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ]
 ; IF-EVL-NEXT:    br label [[FOR_BODY:%.*]]
 ; IF-EVL:       for.body:
 ; IF-EVL-NEXT:    [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[FOR_BODY]] ]
diff --git a/llvm/test/Transforms/LoopVectorize/SystemZ/force-target-instruction-cost.ll b/llvm/test/Transforms/LoopVectorize/SystemZ/force-target-instruction-cost.ll
index 3477c8d879106b..b89c5c677a134c 100644
--- a/llvm/test/Transforms/LoopVectorize/SystemZ/force-target-instruction-cost.ll
+++ b/llvm/test/Transforms/LoopVectorize/SystemZ/force-target-instruction-cost.ll
@@ -37,9 +37,9 @@ define void @test_scalar_steps_target_instruction_cost(ptr %dst) {
 ; CHECK-NEXT:    [[TMP7:%.*]] = icmp eq i64 [[INDEX_NEXT]], 10
 ; CHECK-NEXT:    br i1 [[TMP7]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
 ; CHECK:       [[MIDDLE_BLOCK]]:
-; CHECK-NEXT:    br i1 true, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; CHECK-NEXT:    br label %[[EXIT:.*]]
 ; CHECK:       [[SCALAR_PH]]:
-; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 30, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, %[[ENTRY]] ]
 ; CHECK-NEXT:    br label %[[LOOP:.*]]
 ; CHECK:       [[LOOP]]:
 ; CHECK-NEXT:    [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], %[[LOOP]] ]
diff --git a/llvm/test/Transforms/LoopVectorize/SystemZ/pr47665.ll b/llvm/test/Transforms/LoopVectorize/SystemZ/pr47665.ll
index 07a1cca1bc21e7..f55a9f649c43a8 100644
--- a/llvm/test/Transforms/LoopVectorize/SystemZ/pr47665.ll
+++ b/llvm/test/Transforms/LoopVectorize/SystemZ/pr47665.ll
@@ -139,9 +139,9 @@ define void @test(ptr %p, i40 %a) {
 ; CHECK-NEXT:    [[INDEX_NEXT]] = add i32 [[INDEX]], 16
 ; CHECK-NEXT:    br i1 true, label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
 ; CHECK:       middle.block:
-; CHECK-NEXT:    br i1 true, label [[EXIT:%.*]], label [[SCALAR_PH]]
+; CHECK-NEXT:    br label [[EXIT:%.*]]
 ; CHECK:       scalar.ph:
-; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i32 [ 16, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ]
 ; CHECK-NEXT:    br label [[FOR_BODY:%.*]]
 ; CHECK:       for.body:
 ; CHECK-NEXT:    [[IV:%.*]] = phi i32 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[FOR_BODY]] ]
diff --git a/ll...
[truncated]

fhahn

ping :)

fhahn · 2024-09-11T08:59:42Z

llvm/lib/Transforms/Vectorize/VPlan.h

@@ -3504,6 +3504,11 @@ class VPlan {
    return LiveOuts;
  }

+  void removeLiveOut(PHINode *PN) {


This shouldn't be needed once VPIRInstruction lands (#100735).

david-arm · 2024-09-13T14:26:51Z

llvm/lib/Transforms/Vectorize/VPlan.cpp

-         "scalar preheader cannot be wrapped already");
-  replaceVPBBWithIRVPBB(ScalarPhVPBB, ScalarPh);
+  if (!isa<VPIRBasicBlock>(ScalarPhVPBB)) {
+    assert(!isa<VPIRBasicBlock>(ScalarPhVPBB) &&


Isn't this testing exactly the same thing in the if statement? I think you can remove this as being redundant.

Yep, removed the assert, thanks!

david-arm

I think the idea of cleaning up the CFG makes a lot of sense, but I am a bit confused by some of the changes and how this works. I'm probably missing something, but I left a few comments anyway!

david-arm · 2024-09-16T12:13:29Z

llvm/lib/Transforms/Vectorize/VPlan.cpp

+    // from the middle block here.
+    // TODO: Remove this once phis in the scalar preheader are managed in VPlan
+    // directly.
+    for (auto &Phi : ScalarPh->phis())


Does this change somehow relate to the change in simplifyCFG? Perhaps I'm missing something, but they seem unrelated.

This seems to rely on the assumption that if ScalarPh is already an IRVPBB, then there is no path from the middle block to the scalar.ph block. I can't see that assumption tested or documented anywhere. Also, given that we've just extracted ScalarPhVPBB from a successor of MiddleVPBB, it seems a little contradictory?

It was needed to remove live-outs. I've now updated the PR to be based on #110577 and #109975, which removes the need for those changes.

david-arm · 2024-09-16T12:20:35Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

+  for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(
+           vp_depth_first_deep(Plan.getEntry()))) {
+    VPRecipeBase *Term = VPBB->getTerminator();
+    if (!Term || !match(Term, m_BranchOnCond(m_True())))


Should we also be looking for m_BranchOnCond(m_False())?

Eventually yes; for now, I don't think we ever form a BranchOnCond on false.
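As a rough illustration of what handling both constants could look like, the sketch below picks the dead successor for a constant branch condition. `CondKind` and `deadSuccessorIndex` are hypothetical names, not part of VPlan. The true case mirrors the patch, which treats `getSuccessors()[1]` as dead; the false case is an assumption that successor 0 is the taken-on-true target.

```cpp
// Hypothetical summary of a branch condition after constant folding.
enum class CondKind { True, False, Unknown };

// Returns the index of the successor that can never be taken, or -1 when the
// condition is not a known constant. Assumes successor 0 is taken when the
// condition is true and successor 1 when it is false.
inline int deadSuccessorIndex(CondKind Cond) {
  switch (Cond) {
  case CondKind::True:
    return 1; // the false edge is dead, as in the patch
  case CondKind::False:
    return 0; // assumption: the true edge is dead
  case CondKind::Unknown:
    return -1;
  }
  return -1;
}
```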

david-arm · 2024-09-16T12:22:23Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

+  for (VPBasicBlock *VPBB : PossiblyDeadBlocks) {
+    if (VPBB->getNumPredecessors() != 0)
+      continue;
+    // The block doesn't have any predecessors, remove it.


I guess it could have successors though, which may now also be dead?

In the original version this case couldn't happen, but it can now that the patch is based on #109975. I rewrote the code to properly handle cases with additional successors and predecessors.
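The concern about successors that become dead in turn can be sketched with a small worklist. This is a simplified model, not the code in the PR: `Block`, `connect`, `disconnect` and `collectDeadBlocks` are hypothetical names, and the sketch only looks at predecessor counts, so it does not detect dead cycles such as a loop with a back edge.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical block type holding only CFG edges.
struct Block {
  std::vector<Block *> Preds;
  std::vector<Block *> Succs;
};

inline void connect(Block *From, Block *To) {
  From->Succs.push_back(To);
  To->Preds.push_back(From);
}

inline void disconnect(Block *From, Block *To) {
  auto &S = From->Succs;
  S.erase(std::remove(S.begin(), S.end(), To), S.end());
  auto &P = To->Preds;
  P.erase(std::remove(P.begin(), P.end(), From), P.end());
}

// Starting from a block that just lost an incoming edge, collect every block
// that ends up without predecessors. The caller must not pass the entry block.
inline std::vector<Block *> collectDeadBlocks(Block *Candidate) {
  std::vector<Block *> Dead;
  std::vector<Block *> WorkList{Candidate};
  while (!WorkList.empty()) {
    Block *B = WorkList.back();
    WorkList.pop_back();
    if (!B->Preds.empty())
      continue; // still reachable through another edge
    if (std::find(Dead.begin(), Dead.end(), B) != Dead.end())
      continue; // already handled
    Dead.push_back(B);
    // Copy the successor list because disconnect() mutates it.
    const std::vector<Block *> Succs = B->Succs;
    for (Block *S : Succs) {
      disconnect(B, S);
      WorkList.push_back(S); // S may now be dead as well
    }
  }
  return Dead;
}
```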

david-arm · 2024-09-16T12:23:52Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

+    //
+    // To do so, first remove all recipes in the block. At the moment, recipes
+    // with users outside the block must be live-outs. Those are removed.
+    SmallVector<PHINode *> DeadLiveOuts;


Unused variable

The whole live-out handling has been removed, as the patch is now based on #110577.

david-arm · 2024-09-16T12:24:43Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

+    SmallVector<PHINode *> DeadLiveOuts;
+    for (VPRecipeBase &R : make_early_inc_range(reverse(*VPBB))) {
+      if (auto *V = dyn_cast<VPSingleDefRecipe>(&R)) {
+        for (VPUser *U : to_vector(V->users())) {


Why not just

for (VPUser *U : V->users()) {

?

Users are stored in a SmallVector; removing users while iterating over users() may cause accesses via an invalid iterator.
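The same hazard can be shown with the standard library alone. In the sketch below, `Value`, `removeUser` and `dropAllUsers` are hypothetical stand-ins, and the local copy plays the role that `to_vector(V->users())` plays in the patch: the loop walks a snapshot, so erasing from the live list cannot invalidate the loop's iterators.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for a value that tracks its users in a vector.
struct Value {
  std::vector<int> Users;

  void removeUser(int U) {
    Users.erase(std::remove(Users.begin(), Users.end(), U), Users.end());
  }
};

// Removes every user while iterating over a copy of the user list. Iterating
// V.Users directly and erasing inside the loop would invalidate the range-for
// iterators.
inline int dropAllUsers(Value &V) {
  int Removed = 0;
  const std::vector<int> Snapshot = V.Users; // snapshot before mutating
  for (int U : Snapshot) {
    V.removeUser(U);
    ++Removed;
  }
  return Removed;
}
```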

david-arm · 2024-09-25T08:39:56Z

Just a thought - is it worth adding '[WIP]' to the title given that it has failing tests that still need updating?

fhahn

This patch now depends on both #110577 and #109975, which help to simplify it: there is no need to manually remove live-outs or to manually update phis in the scalar preheader that were previously created during skeleton construction outside VPlan.

It probably only makes sense to review the patch once #110577 and #109975 have landed.

This patch changes the way blocks are managed by VPlan. Previously all blocks reachable from entry would be cleaned up when a VPlan is destroyed. With this patch, each VPlan keeps track of blocks created for it in a list and this list is then used to delete all blocks in the list when the VPlan is destroyed. To do so, block creation is funneled through helpers directly in VPlan. The main advantage of doing so is that it simplifies CFG transformations, as those do not have to take care of deleting any blocks, just adjusting the CFG. This helps to simplify #108378 and #106748. This also simplifies handling of 'immutable' blocks a VPlan holds references to, which at the moment only include the scalar header block. Note that the original constructors taking VPBlockBase are retained at the moment for unit tests. PR: #120918
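A minimal sketch of the ownership scheme described in that commit message, under the assumption that a plan simply owns every block it creates. `Plan`, `Block`, `createBlock`, `connect` and `disconnect` are hypothetical names here, not the VPlan API. The point is that a CFG transform only has to adjust edges; the owner frees unreachable blocks when it is destroyed.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical block type holding only CFG edges.
struct Block {
  std::vector<Block *> Preds;
  std::vector<Block *> Succs;
};

// Hypothetical plan that owns every block created through it. Blocks are
// freed when the plan is destroyed, whether or not they are still reachable.
class Plan {
  std::vector<std::unique_ptr<Block>> CreatedBlocks;

public:
  Block *createBlock() {
    CreatedBlocks.push_back(std::make_unique<Block>());
    return CreatedBlocks.back().get();
  }

  std::size_t numOwnedBlocks() const { return CreatedBlocks.size(); }
};

inline void connect(Block *From, Block *To) {
  From->Succs.push_back(To);
  To->Preds.push_back(From);
}

// Removes the edge only; the target block stays owned by its plan.
inline void disconnect(Block *From, Block *To) {
  auto &S = From->Succs;
  S.erase(std::remove(S.begin(), S.end(), To), S.end());
  auto &P = To->Preds;
  P.erase(std::remove(P.begin(), P.end(), From), P.end());
}
```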

fhahn

Ping :)

The patch should now be ready for another round of reviews, all predecessor patches should have landed by now.

ayalz · 2025-02-19T07:04:18Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

+    ResumePhi->addOperand(
+        ResumePhi->getOperand(ScalarPH->getNumPredecessors() == 1 ? 0 : 1));


Would be good to note above that "adding an incoming value" means replicating the last one? Does the documentation of ResumePhi deserve updating: "The first operand is the incoming value from the predecessor in VPlan, the second operand is the incoming value for all other predecessors (which are currently not modeled in VPlan)" - second operand is replicated to correspond to all predecessors but the first? The new predecessor of scalar preheader appears last, and so a corresponding incoming value should be the last operand, but unclear if ResumePhi should have two distinct operands, or one (potentially replicated) operand per predecessor.

Updated to use the last one and added a comment. ResumePhis now have incoming values for each predecessor, which is why we need to update them here.
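To make "adding an incoming value" concrete, here is a small model of a phi-like recipe with one operand per predecessor. `PhiLike` and `addIncomingForNewPred` are hypothetical names. The sketch assumes, as discussed above, that the new predecessor is appended last and that its incoming value replicates the last existing operand.

```cpp
#include <cassert>
#include <vector>

// Hypothetical phi-like recipe: Operands[i] is the incoming value for the
// i-th predecessor of the block that contains it.
struct PhiLike {
  std::vector<int> Operands;
};

// Appends an incoming value for a newly added (last) predecessor by
// replicating the last existing operand.
inline void addIncomingForNewPred(PhiLike &Phi) {
  assert(!Phi.Operands.empty() && "phi must already have an incoming value");
  const int Last = Phi.Operands.back(); // copy before the vector may grow
  Phi.Operands.push_back(Last);
}
```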

ayalz · 2025-02-19T15:17:26Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

@@ -2817,6 +2817,7 @@ BasicBlock *InnerLoopVectorizer::createVectorizedLoopSkeleton(
  // faster.
  emitMemRuntimeChecks(LoopScalarPreHeader);

+  replaceVPBBWithIRVPBB(Plan.getScalarPreheader(), LoopScalarPreHeader);


How/Is this move dependent - replacing the scalar preheader VPBB with IRBB here instead of earlier when calling createVectorLoopSkeleton() above?
(Here being createVectorizedLoopSkeleton() and its overridings - better have more distinct names, independently).

At the original points, the scalar PH may be unreachable, which means that at the moment we cannot use getPlan(). Calling it later ensures it will be connected, for now.

Could independently improve this, by either storing parent plan in all VPBBs (not just the entries) or passing Plan to replaceVPBBWithIRVPBB

The original position seems (more) reasonable, being right after LoopScalarPreHeader is built. Worth leaving behind some comment why this replacement is currently done later, "for now"?

VPlan is assumed to always be connected, with all its VPBB's reachable from its entry. Can the original points maintain this connectivity, w/o storing the parental plan in all VPBBs nor passing Plan to replaceVPBBWithIRVPBB()?

BTW, would be good to clarify that VPlan::getPlanEntry() avoids going into an infinite loop, if invoked on flat region-less cyclic CFG, based on visiting operands in order, and relying on the operand associated with the preheader block to appear (first) before that of the latch when visiting header phis.

Thanks, I added a note at the original place.

We could store the plan in all blocks w/o a parent region, there is already a field in all blocks to do so?

> BTW, would be good to clarify that VPlan::getPlanEntry() avoids going into an infinite loop, if invoked on flat region-less cyclic CFG, based on visiting operands in order, and relying on the operand associated with the preheader block to appear (first) before that of the latch when visiting header phis.

getPlanEntry at the moment uses a SmallSetVector for its worklist, which naturally avoids infinite cycles.
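The termination argument can be sketched without LLVM types. The code below is not the actual getPlanEntry implementation; `Block` and `findEntry` are hypothetical, and a `std::vector` plus a `std::unordered_set` stand in for SmallSetVector. Each block enters the worklist at most once, so the walk ends even on a cyclic CFG.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical block type: only predecessor links matter for this sketch.
struct Block {
  std::vector<Block *> Preds;
};

// Walks predecessor links from Start until a block without predecessors is
// found. The Seen set guarantees each block is queued at most once, which
// bounds the loop by the number of blocks.
inline Block *findEntry(Block *Start) {
  std::vector<Block *> WorkList{Start};
  std::unordered_set<Block *> Seen{Start};
  for (std::size_t I = 0; I != WorkList.size(); ++I) {
    Block *Current = WorkList[I];
    if (Current->Preds.empty())
      return Current;
    for (Block *Pred : Current->Preds)
      if (Seen.insert(Pred).second)
        WorkList.push_back(Pred);
  }
  return nullptr; // every reachable block sits on a cycle
}
```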

> We could store the plan in all blocks w/o a parent region, there is already a field in all blocks to do so?

Agreed, it seems better to store the plan for orphan blocks in their existing field rather than null, at-least for unreachable blocks, although best maintain connectivity rather than have a block point to a plan which in turn cannot reach it?

Yep, we will be able to do so once we move to model the full skeleton independently in VPlan and not rely on legacy's skeleton creation.

ayalz · 2025-02-19T22:49:39Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

+  auto ExitBlocks = BestPlan.getExitBlocks();
  bool PlanForEarlyExitLoop =
-      BestPlan.getVectorLoopRegion() &&
-      BestPlan.getVectorLoopRegion()->getSingleSuccessor() !=
-          BestPlan.getMiddleBlock();
+      std::distance(ExitBlocks.begin(), ExitBlocks.end()) > 2 ||
+      (std::distance(ExitBlocks.begin(), ExitBlocks.end()) == 1 &&
+       (*ExitBlocks.begin())->getNumPredecessors() > 1);


(independent) If this is for (bailing out of) asserting only, should be under #ifndef NDEBUG?

ExitBlocks are the successor-less VPIRBB's at the end of VPlan's scope, excluding the scalar header, as depicted in https://llvm.org/docs/Vectorizers.html#early-exit-vectorization. What does the single-exit-block-with-multiple-predecessors case stand for?

The whole block is wrapped in #ifndef NDEBUG.

The single exit block with multiple predecessors is generated when the early exit and the latch exit go to the same exit block.

ayalz · 2025-02-19T22:56:26Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

+  using namespace llvm::VPlanPatternMatch;
+  for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(
+           vp_depth_first_deep(Plan.getEntry()))) {
+    if (VPBB->getNumSuccessors() != 2 || VPBB->begin() == VPBB->end() ||


If VPBB has two successors it must contain at least one recipe - to decide which successor to take. The condition begin != end can be asserted or dropped.

Dropped, thanks

ayalz · 2025-02-19T22:57:19Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

@@ -1437,6 +1437,49 @@ void VPlanTransforms::truncateToMinimalBitwidths(
         "some entries in MinBWs haven't been processed");
 }

+/// Remove BranchOnCond recipes with constant conditions together with removing


Suggested change

/// Remove BranchOnCond recipes with constant conditions together with removing

/// Remove BranchOnCond recipes with true conditions together with removing

Done, thanks

ayalz · 2025-02-19T23:06:03Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

+             !isa<VPHeaderPHIRecipe>(&R) &&
+             "Cannot update VPIRInstructions wrapping phis or header phis yet");
+      auto *VPI = dyn_cast<VPInstruction>(&R);
+      if (VPI && VPI->getOpcode() == VPInstruction::ResumePhi) {


nit: early continue?

Used early break, thanks

ayalz · 2025-02-19T23:07:15Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

+    unsigned DeadIdx = std::distance(Preds.begin(), find(Preds, VPBB));
+
+    // Remove values coming from VPBB from phi-like recipes in DeadSucc.
+    for (VPRecipeBase &R : make_early_inc_range(*DeadSucc)) {


Break if a non-phi-like recipe is encountered.

Done, thanks
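The update discussed in this thread can be modelled as follows. This is a sketch under simplifying assumptions, not the code from the PR: `Recipe`, `Block` and `removeIncomingFrom` are hypothetical, phi-like recipes are assumed to sit at the start of the block, and each one is assumed to hold one incoming value per predecessor in predecessor order.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical recipe: phi-like recipes carry one incoming value per
// predecessor, in predecessor order.
struct Recipe {
  bool IsPhiLike;
  std::vector<int> Incoming;
};

// Hypothetical block: phi-like recipes are assumed to come first.
struct Block {
  std::vector<Block *> Preds;
  std::vector<Recipe> Recipes;
};

// Drops the edge from DeadPred into Succ and removes the matching incoming
// value from the leading phi-like recipes, stopping at the first recipe that
// is not phi-like.
inline void removeIncomingFrom(Block &Succ, const Block *DeadPred) {
  auto It = std::find(Succ.Preds.begin(), Succ.Preds.end(), DeadPred);
  if (It == Succ.Preds.end())
    return; // DeadPred is not a predecessor
  const std::ptrdiff_t DeadIdx = std::distance(Succ.Preds.begin(), It);
  for (Recipe &R : Succ.Recipes) {
    if (!R.IsPhiLike)
      break; // no phi-like recipes past this point
    R.Incoming.erase(R.Incoming.begin() + DeadIdx);
  }
  Succ.Preds.erase(It);
}
```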

ayalz · 2025-02-19T23:10:39Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

@@ -1446,6 +1489,7 @@ void VPlanTransforms::optimize(VPlan &Plan) {
  runPass(legalizeAndOptimizeInductions, Plan);
  runPass(removeRedundantExpandSCEVRecipes, Plan);
  runPass(simplifyRecipes, Plan, *Plan.getCanonicalIV()->getScalarType());
+  simplifyCFG(Plan);


Can/should be runPass()'ed?

Yep, done, thanks!

ayalz · 2025-02-19T23:14:06Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

@@ -1437,6 +1437,49 @@ void VPlanTransforms::truncateToMinimalBitwidths(
         "some entries in MinBWs haven't been processed");
 }

+/// Remove BranchOnCond recipes with constant conditions together with removing
+/// dead edges to their successors.


The successors across the removed edges are assumed to have ResumePhi recipes, which are fixed.

Worth adding a comment that said ResumePhis are expected to already be fixed?

I am not sure, they are expected to be valid coming in and the transform will keep them valid.

ayalz · 2025-02-19T23:16:00Z

llvm/test/Transforms/LoopVectorize/AArch64/clamped-trip-count.ll

@@ -15,7 +15,6 @@ define void @clamped_tc_8(ptr nocapture %dst, i32 %n, i64 %val) vscale_range(1,1
 ; CHECK-NEXT:    [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]
 ; CHECK-NEXT:    [[TMP5:%.*]] = call i64 @llvm.vscale.i64()
 ; CHECK-NEXT:    [[TMP6:%.*]] = mul i64 [[TMP5]], 8
-; CHECK-NEXT:    [[IND_END:%.*]] = getelementptr i8, ptr [[DST]], i64 [[N_VEC]]


(many test changes, yet to review)

The above branch-on-false from entry to scalar preheader or vector preheader can/should also be eliminated, turning the scalar loop into unreachable dead code?

Yep, more to clean up :)

ayalz

(testcase changes reviewed)

ayalz · 2025-02-20T08:27:39Z

llvm/test/Transforms/LoopVectorize/AArch64/clamped-trip-count.ll

@@ -15,7 +15,6 @@ define void @clamped_tc_8(ptr nocapture %dst, i32 %n, i64 %val) vscale_range(1,1
 ; CHECK-NEXT:    [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]
 ; CHECK-NEXT:    [[TMP5:%.*]] = call i64 @llvm.vscale.i64()
 ; CHECK-NEXT:    [[TMP6:%.*]] = mul i64 [[TMP5]], 8
-; CHECK-NEXT:    [[IND_END:%.*]] = getelementptr i8, ptr [[DST]], i64 [[N_VEC]]


The above branch-on-false from entry to scalar preheader or vector preheader can/should also be eliminated, turning the scalar loop into unreachable dead code?

ayalz · 2025-02-20T08:29:39Z

llvm/test/Transforms/LoopVectorize/AArch64/clamped-trip-count.ll

@@ -42,10 +41,10 @@ define void @clamped_tc_8(ptr nocapture %dst, i32 %n, i64 %val) vscale_range(1,1
 ; CHECK-NEXT:    [[ACTIVE_LANE_MASK_NEXT]] = call <vscale x 8 x i1> @llvm.get.active.lane.mask.nxv8i1.i64(i64 [[INDEX_NEXT]], i64 8)
 ; CHECK-NEXT:    br i1 true, label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]


This branch-on-true from vector.body to middle.block or back to itself, can/should also be eliminated - as part of optimizing a vector loop found to have a trip-count of 1?

Yes, optimizeForVFAndUF removes the region in some cases, but not yet with active-lane-masks.

ayalz · 2025-02-20T08:35:41Z

llvm/test/Transforms/LoopVectorize/AArch64/masked-call.ll

-; TFCOMMON-NEXT:    [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
-; TFCOMMON-NEXT:    [[TMP1:%.*]] = mul i64 [[TMP0]], 2
-; TFCOMMON-NEXT:    [[TMP2:%.*]] = sub i64 [[TMP1]], 1
-; TFCOMMON-NEXT:    [[N_RND_UP:%.*]] = add i64 1025, [[TMP2]]
-; TFCOMMON-NEXT:    [[N_MOD_VF:%.*]] = urem i64 [[N_RND_UP]], [[TMP1]]
-; TFCOMMON-NEXT:    [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]


How/Is this (and similar discardings below) related to eliminating branch-on-true?

This is due to removing the only use of N_VEC which was in the resume phi.

As the test runs simplifycfg the removal of the edge and incoming value is hidden from the test changes.
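To make the cascade concrete, here is a minimal toy sketch of how dropping the resume phi's incoming value can leave N_VEC without users, which in turn releases the values it was computed from. It is not the LLVM or VPlan API: `ToyValue`, `ToyGraph`, `addValue` and `dropUse` are hypothetical names, and constants are left out of the operand lists.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Toy value graph (hypothetical, not the VPlan or LLVM IR classes): each value
// records the values it uses as operands and how many users it has.
struct ToyValue {
  std::vector<std::string> Operands;
  unsigned NumUsers = 0;
  bool HasSideEffects = false;
};

using ToyGraph = std::unordered_map<std::string, ToyValue>;

// Register a value and bump the user count of each of its operands.
void addValue(ToyGraph &G, const std::string &Name,
              std::vector<std::string> Operands, bool HasSideEffects = false) {
  for (const std::string &Op : Operands)
    ++G[Op].NumUsers;
  ToyValue &V = G[Name];
  V.Operands = std::move(Operands);
  V.HasSideEffects = HasSideEffects;
}

// Drop one use of Name. If that leaves it without users and it has no side
// effects, erase it and release its operands recursively.
// Returns the number of values erased.
unsigned dropUse(ToyGraph &G, const std::string &Name) {
  auto It = G.find(Name);
  assert(It != G.end() && It->second.NumUsers > 0 && "no use to drop");
  if (--It->second.NumUsers > 0 || It->second.HasSideEffects)
    return 0;
  // Take a copy of the operand names before erasing the entry.
  std::vector<std::string> Operands = std::move(It->second.Operands);
  G.erase(It);
  unsigned Erased = 1;
  for (const std::string &Op : Operands)
    Erased += dropUse(G, Op);
  return Erased;
}
```

Modelling the six removed TFCOMMON lines this way, with the resume phi as the only user of N_VEC, a single `dropUse(G, "N_VEC")` erases the whole chain, which matches the shape of the test diff.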

ayalz · 2025-02-20T09:02:25Z

llvm/test/Transforms/LoopVectorize/RISCV/vplan-vp-intrinsics-reduction.ll

-; IF-EVL-OUTLOOP-NEXT:   EMIT branch-on-cond ir<true>
-; IF-EVL-OUTLOOP-NEXT: Successor(s): ir-bb<for.end>, scalar.ph
-; IF-EVL-OUTLOOP-EMPTY:
-; IF-EVL-OUTLOOP-NEXT: scalar.ph:
-; IF-EVL-OUTLOOP-NEXT:   EMIT vp<[[IV_RESUME:%.+]]> = resume-phi vp<[[VTC]]>, ir<0>
-; IF-EVL-OUTLOOP-NEXT:   EMIT vp<[[RED_RESUME:%.+]]> = resume-phi vp<[[RDX]]>, ir<%start>
-; IF-EVL-OUTLOOP-NEXT: Successor(s): ir-bb<for.body>
-; IF-EVL-OUTLOOP-EMPTY:
-; IF-EVL-OUTLOOP-NEXT: ir-bb<for.body>:
-; IF-EVL-OUTLOOP-NEXT:   IR   %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ] (extra operand: vp<[[IV_RESUME]]> from scalar.ph)
-; IF-EVL-OUTLOOP-NEXT:   IR   %rdx = phi i32 [ %start, %entry ], [ %add, %for.body ]
-; IF-EVL-OUTLOOP:        IR   %exitcond.not = icmp eq i64 %iv.next, %n
-; IF-EVL-OUTLOOP-NEXT: No successors


(This seems to have cleared the scalar loop from VPlan)

ayalz · 2025-02-20T09:05:23Z

llvm/test/Transforms/LoopVectorize/X86/fp64_to_uint32-cost-model.ll

@@ -29,7 +29,7 @@ for.body:                                         ; preds = %for.body.preheader,
  store i32 %conv, ptr %arrayidx2, align 4
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
  %2 = trunc i64 %indvars.iv.next to i32
-  %cmp = icmp ult i32 %2, %0
+  %cmp = icmp ult i32 %2, 1000


Change of test itself? Related?

Unrelated, undone, thanks

ayalz · 2025-02-20T09:09:06Z

llvm/test/Transforms/LoopVectorize/first-order-recurrence-sink-replicate-region.ll

-; CHECK-NEXT:   EMIT vp<[[RESUME_1:%.+]]> = extract-from-end ir<%conv>, ir<1>
-; CHECK-NEXT:   EMIT branch-on-cond ir<true>
-; CHECK-NEXT: Successor(s): ir-bb<exit>, scalar.ph
-; CHECK-EMPTY:
-; CHECK-NEXT: scalar.ph
-; CHECK-NEXT:   EMIT vp<[[RESUME_1_P:%.*]]> = resume-phi vp<[[RESUME_1]]>, ir<0>
-; CHECK-NEXT:   EMIT vp<[[RESUME_IV:%.*]]> = resume-phi vp<[[VEC_TC]]>, ir<0>
-; CHECK-NEXT: Successor(s): ir-bb<loop>
-; CHECK-EMPTY:
-; CHECK-NEXT: ir-bb<loop>:
-; CHECK-NEXT:   IR   %0 = phi i32 [ 0, %entry ], [ %conv, %loop ] (extra operand: vp<[[RESUME_1_P]]> from scalar.ph)
-; CHECK-NEXT:   IR   %iv = phi i32 [ 0, %entry ], [ %iv.next, %loop ] (extra operand: vp<[[RESUME_IV]]> from scalar.ph)
-; CHECK:        IR   %ec = icmp eq i32 %iv.next, 20001
-; CHECK-NEXT: No successors


(ditto, here and below)

ayalz · 2025-02-20T09:15:14Z

llvm/test/Transforms/LoopVectorize/first-order-recurrence.ll

@@ -2804,7 +2802,7 @@ define i32 @sink_into_replication_region(i32 %y) {
 ; UNROLL-NO-IC-NEXT:    [[VAR7]] = udiv i32 219220132, [[VAR3]]
 ; UNROLL-NO-IC-NEXT:    [[VAR8]] = add nsw i32 [[VAR3]], -1
 ; UNROLL-NO-IC-NEXT:    [[VAR9:%.*]] = icmp slt i32 [[VAR3]], 2
-; UNROLL-NO-IC-NEXT:    br i1 [[VAR9]], label [[BB1]], label [[BB2]], !prof [[PROF30:![0-9]+]], !llvm.loop [[LOOP31:![0-9]+]]
+; UNROLL-NO-IC-NEXT:    br i1 [[VAR9]], label [[BB1]], label [[BB2]], !prof [[PROF29:![0-9]+]], !llvm.loop [[LOOP30:![0-9]+]]


This PROF#/LOOP# decrement change propagates all the way below.

ayalz · 2025-02-20T09:23:36Z

llvm/test/Transforms/LoopVectorize/vplan-sink-scalars-and-merge.ll

@@ -262,7 +262,6 @@ define void @uniform_gep(i64 %k, ptr noalias %A, ptr noalias %B) {
 ; CHECK-NEXT: Successor(s): vector.ph
 ; CHECK-EMPTY:
 ; CHECK-NEXT: vector.ph:
-; CHECK-NEXT:   vp<[[END:%.+]]> = DERIVED-IV ir<21> + vp<[[VEC_TC]]> * ir<1>


This dce is related, middle block w/ branch-on-true simply not CHECKed?

github-actions · 2025-03-13T21:49:15Z

✅ With the latest revision this PR passed the C/C++ code formatter.

fhahn

rebased on top of latest main, ping :)

fhahn · 2025-03-13T21:34:10Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

@@ -1437,6 +1437,49 @@ void VPlanTransforms::truncateToMinimalBitwidths(
         "some entries in MinBWs haven't been processed");
 }

+/// Remove BranchOnCond recipes with constant conditions together with removing


Done, thanks

fhahn · 2025-03-13T21:34:55Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

@@ -1437,6 +1437,49 @@ void VPlanTransforms::truncateToMinimalBitwidths(
         "some entries in MinBWs haven't been processed");
 }

+/// Remove BranchOnCond recipes with constant conditions together with removing
+/// dead edges to their successors.


fhahn · 2025-03-13T21:35:02Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

+  using namespace llvm::VPlanPatternMatch;
+  for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(
+           vp_depth_first_deep(Plan.getEntry()))) {
+    if (VPBB->getNumSuccessors() != 2 || VPBB->begin() == VPBB->end() ||


Dropped, thanks

fhahn · 2025-03-13T21:35:59Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

+      continue;
+
+    VPBasicBlock *DeadSucc = cast<VPBasicBlock>(VPBB->getSuccessors()[1]);
+    const auto &Preds = DeadSucc->getPredecessors();


Yes, updated to use RemovedSucc.

fhahn · 2025-03-13T21:36:47Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

+    unsigned DeadIdx = std::distance(Preds.begin(), find(Preds, VPBB));
+
+    // Remove values coming from VPBB from phi-like recipes in DeadSucc.
+    for (VPRecipeBase &R : make_early_inc_range(*DeadSucc)) {


Done, thanks

fhahn · 2025-03-13T21:37:46Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

@@ -1446,6 +1489,7 @@ void VPlanTransforms::optimize(VPlan &Plan) {
  runPass(legalizeAndOptimizeInductions, Plan);
  runPass(removeRedundantExpandSCEVRecipes, Plan);
  runPass(simplifyRecipes, Plan, *Plan.getCanonicalIV()->getScalarType());
+  simplifyCFG(Plan);


Yep, done, thanks!

fhahn · 2025-03-13T21:38:04Z

llvm/test/Transforms/LoopVectorize/AArch64/clamped-trip-count.ll

@@ -15,7 +15,6 @@ define void @clamped_tc_8(ptr nocapture %dst, i32 %n, i64 %val) vscale_range(1,1
 ; CHECK-NEXT:    [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]
 ; CHECK-NEXT:    [[TMP5:%.*]] = call i64 @llvm.vscale.i64()
 ; CHECK-NEXT:    [[TMP6:%.*]] = mul i64 [[TMP5]], 8
-; CHECK-NEXT:    [[IND_END:%.*]] = getelementptr i8, ptr [[DST]], i64 [[N_VEC]]


Yep, more to clean up :)

fhahn · 2025-03-13T21:40:25Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

+    ResumePhi->addOperand(
+        ResumePhi->getOperand(ScalarPH->getNumPredecessors() == 1 ? 0 : 1));


Updated to use the last one and added a comment. ResumePhis now have an incoming value for each predecessor, which is why we need to update them here.
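A minimal sketch of the invariant described here, on toy types rather than the VPlan classes (`ToyPhi`, `ToyBlock` and `addPredecessorReplicatingLast` are hypothetical names): each phi carries one incoming value per predecessor, so connecting a new predecessor has to append an operand, and the update replicates the last existing one.

```cpp
#include <cassert>
#include <vector>

// Toy stand-ins (hypothetical, not the VPlan classes): a phi keeps one
// incoming value per predecessor of its block, in predecessor order.
struct ToyPhi {
  std::vector<int> Incoming;
};

struct ToyBlock {
  std::vector<int> PredIds;
  std::vector<ToyPhi> Phis;
};

// Connect a new predecessor and keep every phi in sync by replicating the
// value that arrives from the previously last predecessor.
void addPredecessorReplicatingLast(ToyBlock &BB, int NewPredId) {
  for (ToyPhi &Phi : BB.Phis) {
    assert(Phi.Incoming.size() == BB.PredIds.size() &&
           "phi must have one incoming value per predecessor");
    assert(!Phi.Incoming.empty() && "no existing value to replicate");
    int Last = Phi.Incoming.back();
    Phi.Incoming.push_back(Last);
  }
  BB.PredIds.push_back(NewPredId);
}
```

Picking the last operand avoids special-casing the number of predecessors, which the earlier `getNumPredecessors() == 1 ? 0 : 1` form had to do.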

fhahn · 2025-03-13T21:45:45Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

+  auto ExitBlocks = BestPlan.getExitBlocks();
  bool PlanForEarlyExitLoop =
-      BestPlan.getVectorLoopRegion() &&
-      BestPlan.getVectorLoopRegion()->getSingleSuccessor() !=
-          BestPlan.getMiddleBlock();
+      std::distance(ExitBlocks.begin(), ExitBlocks.end()) > 2 ||
+      (std::distance(ExitBlocks.begin(), ExitBlocks.end()) == 1 &&
+       (*ExitBlocks.begin())->getNumPredecessors() > 1);


The whole block is wrapped under ifndef NDEBUG.

A single exit block with multiple predecessors is generated when the early exit and the latch exit go to the same exit block.

fhahn · 2025-03-13T21:58:17Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

@@ -2817,6 +2817,7 @@ BasicBlock *InnerLoopVectorizer::createVectorizedLoopSkeleton(
  // faster.
  emitMemRuntimeChecks(LoopScalarPreHeader);

+  replaceVPBBWithIRVPBB(Plan.getScalarPreheader(), LoopScalarPreHeader);


At the original points, the scalar PH may be unreachable, which means at the moment we cannot use getPlan(). Calling it later ensures it will be connected, for now.

This could be improved independently, by either storing the parent plan in all VPBBs (not just the entries) or passing Plan to replaceVPBBWithIRVPBB.

ayalz · 2025-03-20T12:56:04Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

@@ -2817,6 +2817,7 @@ BasicBlock *InnerLoopVectorizer::createVectorizedLoopSkeleton(
  // faster.
  emitMemRuntimeChecks(LoopScalarPreHeader);

+  replaceVPBBWithIRVPBB(Plan.getScalarPreheader(), LoopScalarPreHeader);


The original position seems (more) reasonable, being right after LoopScalarPreHeader is built. Worth leaving behind some comment why this replacement is currently done later, "for now"?

VPlan is assumed to always be connected, with all its VPBB's reachable from its entry. Can the original points maintain this connectivity, w/o storing the parental plan in all VPBBs nor passing Plan to replaceVPBBWithIRVPBB()?

BTW, would be good to clarify that VPlan::getPlanEntry() avoids going into an infinite loop, if invoked on flat region-less cyclic CFG, based on visiting operands in order, and relying on the operand associated with the preheader block to appear (first) before that of the latch when visiting header phis.

ayalz · 2025-03-20T12:58:57Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

-      BestPlan.getVectorLoopRegion()->getSingleSuccessor() !=
-          BestPlan.getMiddleBlock();
+      ExitBlocks.size() > 1 ||
+      (ExitBlocks.size() == 1 && ExitBlocks[0]->getNumPredecessors() > 1);


Suggested change

(ExitBlocks.size() == 1 && ExitBlocks[0]->getNumPredecessors() > 1);

ExitBlocks[0]->getNumPredecessors() > 1;

asserting that ExitBlocks.size() >= 1?

Is this change independent?

This change should be gone in the latest version
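For reference, the predicate under discussion can be sketched on toy data, with the suggested assertion folded in. This is only an illustration under the assumption that each exit block exposes its predecessor count; `ToyExitBlock` and `looksLikeEarlyExitPlan` are hypothetical names, not part of the patch.

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for an exit block: only the predecessor count matters here.
struct ToyExitBlock {
  unsigned NumPredecessors;
};

// A plan is treated as having an early exit if there is more than one exit
// block, or if the single exit block is reached from more than one
// predecessor (early exit and latch exit sharing the same exit block).
bool looksLikeEarlyExitPlan(const std::vector<ToyExitBlock> &ExitBlocks) {
  assert(!ExitBlocks.empty() && "expected at least one exit block");
  return ExitBlocks.size() > 1 || ExitBlocks[0].NumPredecessors > 1;
}
```

With the assertion in place, the explicit `size() == 1` check becomes redundant, because the second operand of `||` is only evaluated when the size is not greater than 1.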

ayalz · 2025-03-20T13:04:38Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

@@ -1437,6 +1437,49 @@ void VPlanTransforms::truncateToMinimalBitwidths(
         "some entries in MinBWs haven't been processed");
 }

+/// Remove BranchOnCond recipes with constant conditions together with removing
+/// dead edges to their successors.


Worth adding a comment that said ResumePhis are expected to already be fixed?

ayalz · 2025-03-20T13:49:45Z

llvm/lib/Transforms/Vectorize/VPlan.h

  VPBasicBlock *getMiddleBlock() {
-    return cast<VPBasicBlock>(getScalarPreheader()->getPredecessors().front());
+    if (!getScalarPreheader()->getPredecessors().empty())


Scalar preheader could also/only be reached from runtime guards that bypass the vector region and its middle block? Would it be better to retrieve the middle block from its predecessor vector loop region, than from its successors scalar preheader and/or exit blocks? Is the term "middle block" well defined, when early exits are involved which currently splits it into two blocks, rather than having a single block with three successors?

Done, thanks.

The middle block selects between the scalar preheader and the exit block from the latch, which is the same with and w/o early exits.

ayalz · 2025-03-23T10:31:53Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

+static void simplifyCFG(VPlan &Plan) {
+  using namespace llvm::VPlanPatternMatch;
+  for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(
+           vp_depth_first_deep(Plan.getEntry()))) {


Can the traversal be shallow, at-least while its candidates reside outside of regions?

Yep, done, thanks

ayalz · 2025-03-23T10:44:27Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

+      VPI->eraseFromParent();
+    }
+    // Disconnect blocks and remove the terminator. RemovedSucc will be deleted
+    // automatically on VPlan destruction.


Suggested change

// automatically on VPlan destruction.

// automatically on VPlan destruction if it becomes unreachable.

If RemovedSucc becomes unreachable, i.e., Preds consists of VPBB only, do the resumePhis need to be cleared, or better bail out early?

I think it would be good to clear them regardless, as the left-over users may pessimize transforms.

ayalz · 2025-03-23T10:51:37Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

+
+    VPBasicBlock *RemovedSucc = cast<VPBasicBlock>(VPBB->getSuccessors()[1]);
+    const auto &Preds = RemovedSucc->getPredecessors();
+    unsigned DeadIdx = std::distance(Preds.begin(), find(Preds, VPBB));


Assert that VPBB feeds a single value to RemovedSucc?

Done thanks
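The index computation plus the requested assertion can be sketched as follows, using plain integer ids instead of VPBlockBase pointers. `uniquePredIndex` is a hypothetical helper for illustration, not a function from the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Return the position of Pred in Preds, asserting that it occurs exactly
// once, so that exactly one incoming value per phi corresponds to it.
std::size_t uniquePredIndex(const std::vector<int> &Preds, int Pred) {
  assert(std::count(Preds.begin(), Preds.end(), Pred) == 1 &&
         "predecessor must feed exactly one edge into the successor");
  auto It = std::find(Preds.begin(), Preds.end(), Pred);
  return static_cast<std::size_t>(std::distance(Preds.begin(), It));
}
```

Without the assertion, a block that is missing from the predecessor list would silently yield an index equal to the list size.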

ayalz · 2025-03-23T10:54:18Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

+    const auto &Preds = RemovedSucc->getPredecessors();
+    unsigned DeadIdx = std::distance(Preds.begin(), find(Preds, VPBB));
+
+    // Remove values coming from VPBB from phi-like recipes in RemovedSucc.


Suggested change

// Remove values coming from VPBB from phi-like recipes in RemovedSucc.

// Values coming from VPBB into ResumePhi recipes of RemoveSucc are removed from these recipes.

clarifying double from.

Updated, thanks!

ayalz · 2025-03-23T10:55:38Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

+             "Cannot update VPIRInstructions wrapping phis or header phis yet");
+      auto *VPI = dyn_cast<VPInstruction>(&R);
+      if (!VPI || VPI->getOpcode() != VPInstruction::ResumePhi)
+        break;


Better break if !isPhi(), as we're traversing all and only phi recipes which appear first in the block, and then assert that the phi is a ResumePhi recipe?

Will do separately, as isPhi needs updating to consider ResumePhi.

ayalz · 2025-03-23T11:00:38Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

@@ -1467,6 +1467,49 @@ void VPlanTransforms::truncateToMinimalBitwidths(
         "some entries in MinBWs haven't been processed");
 }

+/// Remove BranchOnCond recipes with true conditions together with removing
+/// dead edges to their successors.
+static void simplifyCFG(VPlan &Plan) {


Perhaps a more modest name would be more accurate, at this stage?

Updated to simplifyBranchOnCondTrue, thanks
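As a rough illustration of what the transform does at the block level, here is a toy sketch that disconnects the never-taken successor of a branch on constant true. It deliberately ignores recipes and phi operands, and none of the names (`ToyBlock`, `connect`, `removeBranchOnTrue`) come from the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy CFG node (hypothetical, not VPBasicBlock). For a two-successor block,
// Succs[0] is taken when the condition is true, Succs[1] when it is false.
struct ToyBlock {
  std::vector<ToyBlock *> Preds;
  std::vector<ToyBlock *> Succs;
  bool BranchesOnConstTrue = false;
};

void connect(ToyBlock &From, ToyBlock &To) {
  From.Succs.push_back(&To);
  To.Preds.push_back(&From);
}

// If BB ends in a conditional branch whose condition is the constant true,
// remove the dead edge to its second successor and drop the branch.
// Returns true if BB was changed.
bool removeBranchOnTrue(ToyBlock &BB) {
  if (BB.Succs.size() != 2 || !BB.BranchesOnConstTrue)
    return false;
  ToyBlock *Removed = BB.Succs[1];
  auto It = std::find(Removed->Preds.begin(), Removed->Preds.end(), &BB);
  assert(It != Removed->Preds.end() && "CFG edges out of sync");
  Removed->Preds.erase(It);
  BB.Succs.pop_back();
  // BB now falls through unconditionally to its single successor.
  BB.BranchesOnConstTrue = false;
  return true;
}
```

In the tail-folding case from the description, BB would play the role of the middle block, Succs[0] the exit block, and Succs[1] the scalar preheader, which ends up with no predecessors.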

fhahn

Ping :)

david-arm · 2025-04-02T07:59:05Z

Hi @fhahn, I think this might need a rebase due to conflicts in the tests.

david-arm

I've still got a couple of files to review, but I'll leave the comments I have so far!

david-arm · 2025-04-02T08:42:37Z

llvm/test/Transforms/LoopVectorize/AArch64/masked-call-scalarize.ll

-; TFCOMMON-NEXT:    [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]
-; TFCOMMON-NEXT:    [[TMP1:%.*]] = sub i64 [[TMP0]], 2
-; TFCOMMON-NEXT:    [[TMP2:%.*]] = icmp ugt i64 [[TMP0]], 2
+; TFCOMMON-NEXT:    [[N_RND_UP:%.*]] = add i64 [[TMP0:%.*]], 1


At first I found this pretty confusing - it seemed to change the preheader code without any branch simplifications in the middle block. Then I realised we also run instsimplify afterwards, which makes it hard to understand what's going on.

Yep, unfortunately running extra passes can lead to some hard-to-understand changes, which is one of the main reasons this is discouraged :(

We are at least getting close to a point where we can perform most of the required simplifications to clean up directly on the VPlans, before even generating code.

david-arm · 2025-04-02T08:49:33Z

llvm/test/Transforms/LoopVectorize/AArch64/masked-call.ll

-; TFA_INTERLEAVE-NEXT:    [[TMP4:%.*]] = mul i64 [[TMP3]], 4
-; TFA_INTERLEAVE-NEXT:    [[TMP5:%.*]] = call i64 @llvm.vscale.i64()
-; TFA_INTERLEAVE-NEXT:    [[TMP6:%.*]] = mul i64 [[TMP5]], 2
+; TFA_INTERLEAVE-NEXT:    [[TMP2:%.*]] = call i64 @llvm.vscale.i64()


Again, not your fault but it's downright confusing!

I manually updated the checks as it should just be a simple removal, but the auto-generated checks will lead to a number of renaming changes due to value numbers changing.

david-arm · 2025-04-02T08:52:38Z

llvm/test/Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll

@@ -230,54 +230,54 @@ define float @fadd_strict_unroll(ptr noalias nocapture readonly %a, i64 %n) #0 {
 ; CHECK-UNORDERED-NEXT:    br label [[VECTOR_BODY:%.*]]
 ; CHECK-UNORDERED:       vector.body:
 ; CHECK-UNORDERED-NEXT:    [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
-; CHECK-UNORDERED-NEXT:    [[VEC_PHI:%.*]] = phi <vscale x 8 x float> [ insertelement (<vscale x 8 x float> splat (float -0.000000e+00), float 0.000000e+00, i32 0), [[VECTOR_PH]] ], [ [[TMP18:%.*]], [[VECTOR_BODY]] ]


I suspect these changes are just due to the test script being a little annoying and renaming variables when it doesn't have to. I've seen this quite a lot recently.

Given the sheer volume of changes in this file, is it worth regenerating the CHECK lines in a pre-commit?

I think the main problem is the value numbers changing, causing a number of redundant renamings in the check lines. I manually removed the branches, so the diff should be much smaller now.

david-arm · 2025-04-02T09:11:35Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

-  replaceVPBBWithIRVPBB(Plan.getScalarPreheader(), LoopScalarPreHeader);
+  // NOTE: The Plan's scalar preheader isn't replaced with a VPIRBasicBlock
+  // wrapping LoopScalarPreHeader here at the moment, because the Plan's scalar
+  // preheader may be unreachable at this point.


Is it worth helping the reader by pointing to the place where we do replace it, i.e. createVectorizedLoopSkeleton?

Done, thanks!

david-arm · 2025-04-02T09:39:50Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

-      BestPlan.getVectorLoopRegion()->getSingleSuccessor() !=
-          BestPlan.getMiddleBlock();
+      ExitBlocks.size() > 1 ||
+      (ExitBlocks.size() == 1 && ExitBlocks[0]->getNumPredecessors() > 1);


Yeah I agree with @ayalz, this does look like an independent change. Is it worth a separate NFC patch?

I removed the changes, they are not needed in the latest version. I can put them up separately, as this might be slightly simpler than having to retrieve middle block and vector loop region.

ayalz · 2025-04-02T09:03:02Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

@@ -2484,12 +2484,13 @@ void InnerLoopVectorizer::introduceCheckBlockInVPlan(BasicBlock *CheckIRBB) {
  PreVectorPH->swapSuccessors();

  // We just connected a new block to the scalar preheader. Update all
-  // ResumePhis by adding an incoming value for it.
+  // ResumePhis by adding an incoming value for it, replacing the last value.


Should this be

Suggested change

// ResumePhis by adding an incoming value for it, replacing the last value.

// ResumePhis by adding an incoming value for it, replicating the last value.

?

Done thanks

ayalz · 2025-04-02T09:05:04Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

@@ -2658,7 +2659,9 @@ void InnerLoopVectorizer::createVectorLoopSkeleton(StringRef Prefix) {
  LoopScalarPreHeader =
      SplitBlock(LoopVectorPreHeader, LoopVectorPreHeader->getTerminator(), DT,
                 LI, nullptr, Twine(Prefix) + "scalar.ph");
-  replaceVPBBWithIRVPBB(Plan.getScalarPreheader(), LoopScalarPreHeader);
+  // NOTE: The Plan's scalar preheader isn't replaced with a VPIRBasicBlock


Suggested change

// NOTE: The Plan's scalar preheader isn't replaced with a VPIRBasicBlock

// NOTE: The Plan's scalar preheader VPBB isn't replaced with a VPIRBasicBlock

Done, thanks!

ayalz · 2025-04-02T09:13:46Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

@@ -2817,6 +2817,7 @@ BasicBlock *InnerLoopVectorizer::createVectorizedLoopSkeleton(
  // faster.
  emitMemRuntimeChecks(LoopScalarPreHeader);

+  replaceVPBBWithIRVPBB(Plan.getScalarPreheader(), LoopScalarPreHeader);


We could store the plan in all blocks w/o a parent region, there is already a field in all blocks to do so?

Agreed, it seems better to store the plan for orphan blocks in their existing field rather than null, at-least for unreachable blocks, although best maintain connectivity rather than have a block point to a plan which in turn cannot reach it?

ayalz · 2025-04-02T09:14:26Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

-      BestPlan.getVectorLoopRegion()->getSingleSuccessor() !=
-          BestPlan.getMiddleBlock();
+      ExitBlocks.size() > 1 ||
+      (ExitBlocks.size() == 1 && ExitBlocks[0]->getNumPredecessors() > 1);


ayalz · 2025-04-02T09:16:18Z

llvm/lib/Transforms/Vectorize/VPlan.h

+      return cast<VPBasicBlock>(RegionSucc);
+    return cast<VPBasicBlock>(RegionSucc->getSuccessors()[1]);
+  }
+  const VPBasicBlock *getMiddleBlock() const {


Suggested change

const VPBasicBlock *getMiddleBlock() const {

const VPBasicBlock *getMiddleBlock() const {

Done thanks

ayalz · 2025-04-02T10:11:40Z

llvm/lib/Transforms/Vectorize/VPlan.h

+  /// latch. If the scalar tail loop or exit block are known to always execute,
+  /// the middle block may branch directly to the block.


Suggested change

/// latch. If the scalar tail loop or exit block are known to always execute,

/// the middle block may branch directly to the block.

/// latch. If the middle block is known to always proceed to one of these two blocks, it may branch to it unconditionally.

Also explain about the early-exit case, where middle.block is the 2nd successor of middle.split block, as depicted in https://llvm.org/docs/Vectorizers.html#early-exit-vectorization, which corresponds to the scalar preheader being absent from RegionSucc's successors? The middle block in this case conceptually has three successors: scalar preheader, latch.exit, early.exit with the first two postponed to be a successor's successors.

Note that if the middle block branches unconditionally to exit block (or scalar preheader block), the two blocks may subsequently be merged, causing RegionSucc to have no successors (or be the scalar preheader itself).

Updated the comment.

Note that if the middle block branches unconditionally to exit block (or scalar preheader block), the two blocks may subsequently be merged, causing RegionSucc to have no successors (or be the scalar preheader itself).

Yep, for now, we don't merge VPIRBBs into other blocks.
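The lookup described in this thread can be sketched on a toy CFG. The exact condition below is an assumption pieced together from the comments here (the region's successor is the middle block unless the scalar preheader is absent from a multi-successor RegionSucc, in which case the middle block is its second successor); `ToyBlock` and `findMiddleBlock` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy CFG node (hypothetical, not VPBasicBlock).
struct ToyBlock {
  std::vector<ToyBlock *> Succs;
};

// RegionSucc is the block following the vector loop region.
// Assumed rule: if RegionSucc has fewer than two successors, or branches to
// the scalar preheader, it is the middle block itself. Otherwise it is the
// early-exit split block and the middle block is its second successor.
ToyBlock *findMiddleBlock(ToyBlock *RegionSucc, const ToyBlock *ScalarPH) {
  assert(RegionSucc && "expected a block after the loop region");
  const std::vector<ToyBlock *> &S = RegionSucc->Succs;
  bool ReachesScalarPH = std::find(S.begin(), S.end(), ScalarPH) != S.end();
  if (S.size() < 2 || ReachesScalarPH)
    return RegionSucc;
  return S[1];
}
```

The first branch covers both the ordinary two-way middle block and the case where the branch on true has already been removed and only one successor is left.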

ayalz · 2025-04-03T07:57:53Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

+        break;
+      VPBuilder B(VPI);
+      SmallVector<VPValue *> NewOperands;
+      // Create new operand list, with the dead incoming value filtered out.


Would erase()'ing the dying operand from VPI->operands be easier, perhaps with some removeOperand() API, than replacing the VPInstruction with a new one?

We could, the question is how to best limit this to some recipes, as I think it only makes sense for phi-like recipes (or maybe just ResumePhi). Could do as follow-up?

Sure, can be limited to ResumePhi, until needed elsewhere.
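The two options weighed here can be compared on a plain vector of operands. This is only a toy model: `withoutOperand` mirrors the idea of building a filtered operand list for a replacement recipe, and `removeOperandInPlace` stands in for the hypothetical removeOperand()-style API; neither name exists in the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Option 1: build a new operand list with the dead incoming value filtered
// out, to be used for a freshly created replacement.
std::vector<int> withoutOperand(const std::vector<int> &Operands,
                                std::size_t DeadIdx) {
  assert(DeadIdx < Operands.size() && "dead index out of range");
  std::vector<int> NewOperands;
  NewOperands.reserve(Operands.size() - 1);
  for (std::size_t I = 0; I != Operands.size(); ++I)
    if (I != DeadIdx)
      NewOperands.push_back(Operands[I]);
  return NewOperands;
}

// Option 2: erase the dying operand in place, keeping the same object.
void removeOperandInPlace(std::vector<int> &Operands, std::size_t DeadIdx) {
  assert(DeadIdx < Operands.size() && "dead index out of range");
  Operands.erase(Operands.begin() + static_cast<std::ptrdiff_t>(DeadIdx));
}
```

Both leave the same operand order behind. The in-place form avoids creating and erasing an object, at the cost of exposing an operand-removal entry point that, as noted above, only makes sense for phi-like recipes.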

ayalz · 2025-04-03T08:00:48Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

@@ -1682,6 +1682,52 @@ void VPlanTransforms::truncateToMinimalBitwidths(
         "some entries in MinBWs haven't been processed");
 }

+/// Remove BranchOnCond recipes with true conditions together with removing
+/// dead edges to their successors.
+static void simplifyBranchOnCondTrue(VPlan &Plan) {


Suggested change

static void simplifyBranchOnCondTrue(VPlan &Plan) {

static void removeBranchOnCondTrue(VPlan &Plan) {

?

Updated, thanks!

ayalz

LGTM, thanks for accommodating, good to wait if @david-arm has further comments.

ayalz · 2025-04-03T12:48:16Z

llvm/lib/Transforms/Vectorize/VPlan.h

  VPBasicBlock *getMiddleBlock() {
-    return cast<VPBasicBlock>(getScalarPreheader()->getPredecessors().front());
+    VPRegionBlock *LoopRegion = getVectorLoopRegion();
+    if (!LoopRegion)


nit: do the callers of getMiddleBlock() expect it to return null or assert it does not, or should the callee assert so?

ayalz · 2025-04-03T12:50:16Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

+        break;
+      VPBuilder B(VPI);
+      SmallVector<VPValue *> NewOperands;
+      // Create new operand list, with the dead incoming value filtered out.


Sure, can be limited to ResumePhi, until needed elsewhere.

ayalz · 2025-04-03T12:59:24Z

llvm/test/Transforms/LoopVectorize/X86/pr81872.ll

@@ -39,15 +39,15 @@ define void @test(ptr noundef align 8 dereferenceable_or_null(16) %arr) #0 {
 ; CHECK-NEXT:    [[TMP9:%.*]] = icmp eq i64 [[INDEX_NEXT]], 12
 ; CHECK-NEXT:    br i1 [[TMP9]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !prof [[PROF1:![0-9]+]], !llvm.loop [[LOOP2:![0-9]+]]
 ; CHECK:       middle.block:
-; CHECK-NEXT:    br i1 true, label [[BB6:%.*]], label [[SCALAR_PH]], !prof [[PROF5:![0-9]+]]
+; CHECK-NEXT:    br label [[BB6:%.*]]


Note: branch weights, which seem inaccurate, are lost.

david-arm

LGTM!

If ScalarPH has predecessors, we may need to update its reduction resume values. If there is a middle block, it must be the first predecessor. Note that the first predecessor may not be the middle block, if the middle block doesn't branch to the scalar preheader. In that case, fixReductionScalarResumeWhenVectorizingEpilog will be a no-op. In preparation for #106748.

Adds extra test coverage for #106748.

fhahn requested review from rengolin, ayalz and aniragil August 30, 2024 15:32

llvmbot added backend:SystemZ vectorizers llvm:transforms labels Aug 30, 2024

fhahn force-pushed the vplan-simplifycfg branch from c66649c to de6eba7 Compare September 5, 2024 20:24

fhahn commented Sep 11, 2024

david-arm reviewed Sep 13, 2024

david-arm reviewed Sep 16, 2024

fhahn force-pushed the vplan-simplifycfg branch from 76d0ac9 to 491b667 Compare October 6, 2024 13:45

llvmbot added the backend:PowerPC label Oct 6, 2024

fhahn commented Oct 6, 2024

This was referenced Dec 22, 2024

[VPlan] Manage created blocks directly in VPlan. (NFC) #120918

Merged

[VPlan] Remove loop region in optimizeForVFAndUF. #108378

Merged

fhahn force-pushed the vplan-simplifycfg branch from 491b667 to ee438c3 Compare February 15, 2025 21:15

fhahn commented Feb 15, 2025

fhahn force-pushed the vplan-simplifycfg branch from ee438c3 to 688dc38 Compare February 15, 2025 21:18

ayalz reviewed Feb 19, 2025

ayalz reviewed Feb 20, 2025

fhahn added 2 commits March 13, 2025 21:07

[VPlan] Add initial CFG simplification, removing BranchOnCond true.

e3db837

!fixup update after rebase

195677a

fhahn force-pushed the vplan-simplifycfg branch from 688dc38 to b658dd2 Compare March 13, 2025 21:45

!fixup update after recent changes.

19f1666

fhahn force-pushed the vplan-simplifycfg branch from b658dd2 to 19f1666 Compare March 13, 2025 21:55

fhahn commented Mar 13, 2025

ayalz reviewed Mar 23, 2025

fhahn added 4 commits March 23, 2025 16:09

Merge remote-tracking branch 'origin/main' into vplan-simplifycfg

2fd15a4

!fixup address latest comments, thanks!

641c647

Merge remote-tracking branch 'origin/main' into vplan-simplifycfg

f36fad0

!fixup update tests after merge.

364ee91

fhahn commented Apr 1, 2025

david-arm reviewed Apr 2, 2025

fhahn added 2 commits April 2, 2025 11:46

Merge remote-tracking branch 'origin/main' into vplan-simplifycfg

5ad4020

!fixup address latest comments, thanks

b6ced87

ayalz reviewed Apr 3, 2025

fhahn added 2 commits April 3, 2025 11:56

Merge remote-tracking branch 'origin/main' into vplan-simplifycfg

e86e71b

!fixup address latest comments, thanks!

e6ae199

ayalz approved these changes Apr 3, 2025

david-arm approved these changes Apr 3, 2025

fhahn added a commit that referenced this pull request Apr 3, 2025

[LV] Add test where epilogue is vectorized and backedge removed.

0f696c2

Adds extra test coverage for #106748.

llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Apr 3, 2025

Automerge: [LV] Add test where epilogue is vectorized and backedge re…

ad631cf

…moved. Adds extra test coverage for llvm/llvm-project#106748.

Merge remote-tracking branch 'origin/main' into vplan-simplifycfg

c03e5f1

fhahn merged commit 5fbd065 into llvm:main Apr 4, 2025
7 checks passed

fhahn deleted the vplan-simplifycfg branch April 4, 2025 14:44
