Skip to content

[VPlan] Refactor VPlan creation, add transform introducing region (NFC). #128419

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Mar 9, 2025

Conversation

fhahn
Copy link
Contributor

@fhahn fhahn commented Feb 23, 2025

Create an empty VPlan first, then let the HCFG builder create a plain CFG for the top-level loop (w/o a top-level region). The top-level region is introduced by a separate VPlan-transform. This is instead of creating the vector loop region before building the VPlan CFG for the input loop.

This simplifies the HCFG builder (which should probably be renamed) and moves along the roadmap ('buildLoop') outlined in [1].

As follow-up, I plan to also preserve the exit branches in the initial VPlan out of the CFG builder, including connections to the exit blocks.

The conversion from plain CFG with potentially multiple exits to a single entry/exit region will be done as VPlan transform in a follow-up.

This is needed to enable VPlan-based predication. Currently early exit support relies on building the block-in masks on the original CFG, because exiting branches and conditions aren't preserved in the VPlan. So in order to switch to VPlan-based predication, we will have to preserve them in the initial plain CFG, so the exit conditions are available explicitly when we convert to single entry/exit regions.

Another follow-up is updating the outer loop handling to also introduce VPRegionBlocks for nested loops as transform. Currently the existing logic in the builder will take care of creating VPRegionBlocks for nested loops, but not the top-level loop.

[1] https://llvm.org/devmtg/2023-10/slides/techtalks/Hahn-VPlan-StatusUpdateAndRoadmap.pdf

@llvmbot
Copy link
Member

llvmbot commented Feb 23, 2025

@llvm/pr-subscribers-vectorizers

@llvm/pr-subscribers-llvm-transforms

Author: Florian Hahn (fhahn)

Changes

Create an empty VPlan first, then let the HCFG builder create a plain CFG for the top-level loop (w/o a top-level region). The top-level region is introduced by a separate VPlan-transform. This is instead of creating the vector loop region before building the VPlan CFG for the input loop.

This simplifies the HCFG builder (which should probably be renamed) and moves along the roadmap ('buildLoop') outlined in [1].

As follow-up, I plan to also preserve the exit branches in the initial VPlan out of the CFG builder, including connections to the exit blocks.

The conversion from plain CFG with potentially multiple exits to a single entry/exit region will be done as VPlan transform in a follow-up.

This is needed to enable VPlan-based predication. Currently early exit support relies on building the block-in masks on the original CFG, because exiting branches and conditions aren't preserved in the VPlan. So in order to switch to VPlan-based predication, we will have to preserve them in the initial plain CFG, so the exit conditions are available explicitly when we convert to single entry/exit regions.

Another follow-up is updating the outer loop handling to also introduce VPRegionBlocks for nested loops as transform. Currently the existing logic in the builder will take care of creating VPRegionBlocks for nested loops, but not the top-level loop.

[1] https://llvm.org/devmtg/2023-10/slides/techtalks/Hahn-VPlan-StatusUpdateAndRoadmap.pdf


Patch is 20.68 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/128419.diff

7 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+9-7)
  • (modified) llvm/lib/Transforms/Vectorize/VPlan.cpp (+7-84)
  • (modified) llvm/lib/Transforms/Vectorize/VPlan.h (+2-15)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanHCFGBuilder.cpp (+20-33)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp (+76)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.h (+15)
  • (modified) llvm/unittests/Transforms/Vectorize/VPlanTestBase.h (+4-2)
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index ced01df7b0d44..4e3b3373b0e72 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -9307,14 +9307,15 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
             return !CM.requiresScalarEpilogue(VF.isVector());
           },
           Range);
-  VPlanPtr Plan = VPlan::createInitialVPlan(Legal->getWidestInductionType(),
-                                            PSE, RequiresScalarEpilogueCheck,
-                                            CM.foldTailByMasking(), OrigLoop);
-
+  auto Plan = std::make_unique<VPlan>(OrigLoop);
   // Build hierarchical CFG.
   VPlanHCFGBuilder HCFGBuilder(OrigLoop, LI, *Plan);
   HCFGBuilder.buildHierarchicalCFG();
 
+  VPlanTransforms::introduceTopLevelVectorLoopRegion(
+      *Plan, Legal->getWidestInductionType(), PSE, RequiresScalarEpilogueCheck,
+      CM.foldTailByMasking(), OrigLoop);
+
   // Don't use getDecisionAndClampRange here, because we don't know the UF
   // so this function is better to be conservative, rather than to split
   // it up into different VPlans.
@@ -9610,13 +9611,14 @@ VPlanPtr LoopVectorizationPlanner::buildVPlan(VFRange &Range) {
   assert(EnableVPlanNativePath && "VPlan-native path is not enabled.");
 
   // Create new empty VPlan
-  auto Plan = VPlan::createInitialVPlan(Legal->getWidestInductionType(), PSE,
-                                        true, false, OrigLoop);
-
+  auto Plan = std::make_unique<VPlan>(OrigLoop);
   // Build hierarchical CFG
   VPlanHCFGBuilder HCFGBuilder(OrigLoop, LI, *Plan);
   HCFGBuilder.buildHierarchicalCFG();
 
+  VPlanTransforms::introduceTopLevelVectorLoopRegion(
+      *Plan, Legal->getWidestInductionType(), PSE, true, false, OrigLoop);
+
   for (ElementCount VF : Range)
     Plan->addVF(VF);
 
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.cpp b/llvm/lib/Transforms/Vectorize/VPlan.cpp
index cd111365c134c..4c473821105fa 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlan.cpp
@@ -876,85 +876,6 @@ VPlan::~VPlan() {
     delete BackedgeTakenCount;
 }
 
-VPlanPtr VPlan::createInitialVPlan(Type *InductionTy,
-                                   PredicatedScalarEvolution &PSE,
-                                   bool RequiresScalarEpilogueCheck,
-                                   bool TailFolded, Loop *TheLoop) {
-  auto Plan = std::make_unique<VPlan>(TheLoop);
-  VPBlockBase *ScalarHeader = Plan->getScalarHeader();
-
-  // Connect entry only to vector preheader initially. Entry will also be
-  // connected to the scalar preheader later, during skeleton creation when
-  // runtime guards are added as needed. Note that when executing the VPlan for
-  // an epilogue vector loop, the original entry block here will be replaced by
-  // a new VPIRBasicBlock wrapping the entry to the epilogue vector loop after
-  // generating code for the main vector loop.
-  VPBasicBlock *VecPreheader = Plan->createVPBasicBlock("vector.ph");
-  VPBlockUtils::connectBlocks(Plan->getEntry(), VecPreheader);
-
-  // Create SCEV and VPValue for the trip count.
-  // We use the symbolic max backedge-taken-count, which works also when
-  // vectorizing loops with uncountable early exits.
-  const SCEV *BackedgeTakenCountSCEV = PSE.getSymbolicMaxBackedgeTakenCount();
-  assert(!isa<SCEVCouldNotCompute>(BackedgeTakenCountSCEV) &&
-         "Invalid loop count");
-  ScalarEvolution &SE = *PSE.getSE();
-  const SCEV *TripCount = SE.getTripCountFromExitCount(BackedgeTakenCountSCEV,
-                                                       InductionTy, TheLoop);
-  Plan->TripCount =
-      vputils::getOrCreateVPValueForSCEVExpr(*Plan, TripCount, SE);
-
-  // Create VPRegionBlock, with empty header and latch blocks, to be filled
-  // during processing later.
-  VPBasicBlock *HeaderVPBB = Plan->createVPBasicBlock("vector.body");
-  VPBasicBlock *LatchVPBB = Plan->createVPBasicBlock("vector.latch");
-  VPBlockUtils::insertBlockAfter(LatchVPBB, HeaderVPBB);
-  auto *TopRegion = Plan->createVPRegionBlock(
-      HeaderVPBB, LatchVPBB, "vector loop", false /*isReplicator*/);
-
-  VPBlockUtils::insertBlockAfter(TopRegion, VecPreheader);
-  VPBasicBlock *MiddleVPBB = Plan->createVPBasicBlock("middle.block");
-  VPBlockUtils::insertBlockAfter(MiddleVPBB, TopRegion);
-
-  VPBasicBlock *ScalarPH = Plan->createVPBasicBlock("scalar.ph");
-  VPBlockUtils::connectBlocks(ScalarPH, ScalarHeader);
-  if (!RequiresScalarEpilogueCheck) {
-    VPBlockUtils::connectBlocks(MiddleVPBB, ScalarPH);
-    return Plan;
-  }
-
-  // If needed, add a check in the middle block to see if we have completed
-  // all of the iterations in the first vector loop.  Three cases:
-  // 1) If (N - N%VF) == N, then we *don't* need to run the remainder.
-  //    Thus if tail is to be folded, we know we don't need to run the
-  //    remainder and we can set the condition to true.
-  // 2) If we require a scalar epilogue, there is no conditional branch as
-  //    we unconditionally branch to the scalar preheader.  Do nothing.
-  // 3) Otherwise, construct a runtime check.
-  BasicBlock *IRExitBlock = TheLoop->getUniqueLatchExitBlock();
-  auto *VPExitBlock = Plan->createVPIRBasicBlock(IRExitBlock);
-  // The connection order corresponds to the operands of the conditional branch.
-  VPBlockUtils::insertBlockAfter(VPExitBlock, MiddleVPBB);
-  VPBlockUtils::connectBlocks(MiddleVPBB, ScalarPH);
-
-  auto *ScalarLatchTerm = TheLoop->getLoopLatch()->getTerminator();
-  // Here we use the same DebugLoc as the scalar loop latch terminator instead
-  // of the corresponding compare because they may have ended up with
-  // different line numbers and we want to avoid awkward line stepping while
-  // debugging. Eg. if the compare has got a line number inside the loop.
-  VPBuilder Builder(MiddleVPBB);
-  VPValue *Cmp =
-      TailFolded
-          ? Plan->getOrAddLiveIn(ConstantInt::getTrue(
-                IntegerType::getInt1Ty(TripCount->getType()->getContext())))
-          : Builder.createICmp(CmpInst::ICMP_EQ, Plan->getTripCount(),
-                               &Plan->getVectorTripCount(),
-                               ScalarLatchTerm->getDebugLoc(), "cmp.n");
-  Builder.createNaryOp(VPInstruction::BranchOnCond, {Cmp},
-                       ScalarLatchTerm->getDebugLoc());
-  return Plan;
-}
-
 void VPlan::prepareToExecute(Value *TripCountV, Value *VectorTripCountV,
                              VPTransformState &State) {
   Type *TCTy = TripCountV->getType();
@@ -1123,11 +1044,13 @@ void VPlan::printLiveIns(raw_ostream &O) const {
   }
 
   O << "\n";
-  if (TripCount->isLiveIn())
-    O << "Live-in ";
-  TripCount->printAsOperand(O, SlotTracker);
-  O << " = original trip-count";
-  O << "\n";
+  if (TripCount) {
+    if (TripCount->isLiveIn())
+      O << "Live-in ";
+    TripCount->printAsOperand(O, SlotTracker);
+    O << " = original trip-count";
+    O << "\n";
+  }
 }
 
 LLVM_DUMP_METHOD
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index d86914f0fb026..924dbd7b1e9b2 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -3500,21 +3500,6 @@ class VPlan {
     VPBB->setPlan(this);
   }
 
-  /// Create initial VPlan, having an "entry" VPBasicBlock (wrapping
-  /// original scalar pre-header) which contains SCEV expansions that need
-  /// to happen before the CFG is modified (when executing a VPlan for the
-  /// epilogue vector loop, the original entry needs to be replaced by a new
-  /// one); a VPBasicBlock for the vector pre-header, followed by a region for
-  /// the vector loop, followed by the middle VPBasicBlock. If a check is needed
-  /// to guard executing the scalar epilogue loop, it will be added to the
-  /// middle block, together with VPBasicBlocks for the scalar preheader and
-  /// exit blocks. \p InductionTy is the type of the canonical induction and
-  /// used for related values, like the trip count expression.
-  static VPlanPtr createInitialVPlan(Type *InductionTy,
-                                     PredicatedScalarEvolution &PSE,
-                                     bool RequiresScalarEpilogueCheck,
-                                     bool TailFolded, Loop *TheLoop);
-
   /// Prepare the plan for execution, setting up the required live-in values.
   void prepareToExecute(Value *TripCount, Value *VectorTripCount,
                         VPTransformState &State);
@@ -3582,6 +3567,8 @@ class VPlan {
     TripCount = NewTripCount;
   }
 
+  void setTripCount(VPValue *NewTripCount) { TripCount = NewTripCount; }
+
   /// The backedge taken count of the original loop.
   VPValue *getOrCreateBackedgeTakenCount() {
     if (!BackedgeTakenCount)
diff --git a/llvm/lib/Transforms/Vectorize/VPlanHCFGBuilder.cpp b/llvm/lib/Transforms/Vectorize/VPlanHCFGBuilder.cpp
index dcf1057b991ee..6499565400724 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanHCFGBuilder.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanHCFGBuilder.cpp
@@ -180,7 +180,7 @@ VPBasicBlock *PlainCFGBuilder::getOrCreateVPBB(BasicBlock *BB) {
 
   // Get or create a region for the loop containing BB.
   Loop *LoopOfBB = LI->getLoopFor(BB);
-  if (!LoopOfBB || !doesContainLoop(LoopOfBB, TheLoop))
+  if (!LoopOfBB || LoopOfBB == TheLoop || !doesContainLoop(LoopOfBB, TheLoop))
     return VPBB;
 
   auto *RegionOfVPBB = Loop2Region.lookup(LoopOfBB);
@@ -353,29 +353,6 @@ void PlainCFGBuilder::createVPInstructionsForVPBB(VPBasicBlock *VPBB,
 // Main interface to build the plain CFG.
 void PlainCFGBuilder::buildPlainCFG(
     DenseMap<VPBlockBase *, BasicBlock *> &VPB2IRBB) {
-  // 0. Reuse the top-level region, vector-preheader and exit VPBBs from the
-  // skeleton. These were created directly rather than via getOrCreateVPBB(),
-  // revisit them now to update BB2VPBB. Note that header/entry and
-  // latch/exiting VPBB's of top-level region have yet to be created.
-  VPRegionBlock *TheRegion = Plan.getVectorLoopRegion();
-  BasicBlock *ThePreheaderBB = TheLoop->getLoopPreheader();
-  assert((ThePreheaderBB->getTerminator()->getNumSuccessors() == 1) &&
-         "Unexpected loop preheader");
-  auto *VectorPreheaderVPBB =
-      cast<VPBasicBlock>(TheRegion->getSinglePredecessor());
-  // ThePreheaderBB conceptually corresponds to both Plan.getPreheader() (which
-  // wraps the original preheader BB) and Plan.getEntry() (which represents the
-  // new vector preheader); here we're interested in setting BB2VPBB to the
-  // latter.
-  BB2VPBB[ThePreheaderBB] = VectorPreheaderVPBB;
-  Loop2Region[LI->getLoopFor(TheLoop->getHeader())] = TheRegion;
-
-  // The existing vector region's entry and exiting VPBBs correspond to the loop
-  // header and latch.
-  VPBasicBlock *VectorHeaderVPBB = TheRegion->getEntryBasicBlock();
-  VPBasicBlock *VectorLatchVPBB = TheRegion->getExitingBasicBlock();
-  BB2VPBB[TheLoop->getHeader()] = VectorHeaderVPBB;
-  VectorHeaderVPBB->clearSuccessors();
 
   // 1. Scan the body of the loop in a topological order to visit each basic
   // block after having visited its predecessor basic blocks. Create a VPBB for
@@ -386,6 +363,9 @@ void PlainCFGBuilder::buildPlainCFG(
 
   // Loop PH needs to be explicitly visited since it's not taken into account by
   // LoopBlocksDFS.
+  BasicBlock *ThePreheaderBB = TheLoop->getLoopPreheader();
+  assert((ThePreheaderBB->getTerminator()->getNumSuccessors() == 1) &&
+         "Unexpected loop preheader");
   for (auto &I : *ThePreheaderBB) {
     if (I.getType()->isVoidTy())
       continue;
@@ -406,18 +386,16 @@ void PlainCFGBuilder::buildPlainCFG(
     } else {
       // BB is a loop header, set the predecessor for the region, except for the
       // top region, whose predecessor was set when creating VPlan's skeleton.
-      assert(isHeaderVPBB(VPBB) && "isHeaderBB and isHeaderVPBB disagree");
-      if (TheRegion != Region)
+      if (LoopForBB != TheLoop)
         setRegionPredsFromBB(Region, BB);
     }
 
     // Create VPInstructions for BB.
     createVPInstructionsForVPBB(VPBB, BB);
 
-    if (TheLoop->getLoopLatch() == BB) {
-      VPBB->setOneSuccessor(VectorLatchVPBB);
-      VectorLatchVPBB->clearPredecessors();
-      VectorLatchVPBB->setPredecessors({VPBB});
+    if (BB == TheLoop->getLoopLatch()) {
+      VPBasicBlock *HeaderVPBB = getOrCreateVPBB(LoopForBB->getHeader());
+      VPBlockUtils::connectBlocks(VPBB, HeaderVPBB);
       continue;
     }
 
@@ -449,16 +427,22 @@ void PlainCFGBuilder::buildPlainCFG(
     VPBasicBlock *Successor0 = getOrCreateVPBB(IRSucc0);
     VPBasicBlock *Successor1 = getOrCreateVPBB(IRSucc1);
     if (BB == LoopForBB->getLoopLatch()) {
-      // For a latch we need to set the successor of the region rather than that
-      // of VPBB and it should be set to the exit, i.e., non-header successor,
+      // For a latch we need to set the successor of the region rather
+      // than that
+      // of VPBB and it should be set to the exit, i.e., non-header
+      // successor,
       // except for the top region, whose successor was set when creating
       // VPlan's skeleton.
-      assert(TheRegion != Region &&
+      assert(LoopForBB != TheLoop &&
              "Latch of the top region should have been handled earlier");
       Region->setOneSuccessor(isHeaderVPBB(Successor0) ? Successor1
                                                        : Successor0);
       Region->setExiting(VPBB);
       continue;
+
+      VPBasicBlock *HeaderVPBB = getOrCreateVPBB(LoopForBB->getHeader());
+      VPBlockUtils::connectBlocks(VPBB, HeaderVPBB);
+      continue;
     }
 
     // Don't connect any blocks outside the current loop except the latch for
@@ -482,6 +466,9 @@ void PlainCFGBuilder::buildPlainCFG(
   // corresponding VPlan operands.
   fixHeaderPhis();
 
+  VPBlockUtils::connectBlocks(Plan.getEntry(),
+                              getOrCreateVPBB(TheLoop->getHeader()));
+
   for (const auto &[IRBB, VPB] : BB2VPBB)
     VPB2IRBB[VPB] = IRBB;
 }
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index ce81b2e147df8..9d06337ab7190 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -32,6 +32,82 @@
 
 using namespace llvm;
 
+void VPlanTransforms::introduceTopLevelVectorLoopRegion(
+    VPlan &Plan, Type *InductionTy, PredicatedScalarEvolution &PSE,
+    bool RequiresScalarEpilogueCheck, bool TailFolded, Loop *TheLoop) {
+  auto *HeaderVPBB = cast<VPBasicBlock>(Plan.getEntry()->getSingleSuccessor());
+  VPBlockUtils::disconnectBlocks(Plan.getEntry(), HeaderVPBB);
+
+  VPBasicBlock *OriginalLatch =
+      cast<VPBasicBlock>(HeaderVPBB->getSinglePredecessor());
+  VPBlockUtils::disconnectBlocks(OriginalLatch, HeaderVPBB);
+  VPBasicBlock *VecPreheader = Plan.createVPBasicBlock("vector.ph");
+  VPBlockUtils::connectBlocks(Plan.getEntry(), VecPreheader);
+
+  // Create SCEV and VPValue for the trip count.
+  // We use the symbolic max backedge-taken-count, which works also when
+  // vectorizing loops with uncountable early exits.
+  const SCEV *BackedgeTakenCountSCEV = PSE.getSymbolicMaxBackedgeTakenCount();
+  assert(!isa<SCEVCouldNotCompute>(BackedgeTakenCountSCEV) &&
+         "Invalid loop count");
+  ScalarEvolution &SE = *PSE.getSE();
+  const SCEV *TripCount = SE.getTripCountFromExitCount(BackedgeTakenCountSCEV,
+                                                       InductionTy, TheLoop);
+  Plan.setTripCount(
+      vputils::getOrCreateVPValueForSCEVExpr(Plan, TripCount, SE));
+
+  // Create VPRegionBlock, with empty header and latch blocks, to be filled
+  // during processing later.
+  VPBasicBlock *LatchVPBB = Plan.createVPBasicBlock("vector.latch");
+  VPBlockUtils::insertBlockAfter(LatchVPBB, OriginalLatch);
+  auto *TopRegion = Plan.createVPRegionBlock(
+      HeaderVPBB, LatchVPBB, "vector loop", false /*isReplicator*/);
+  for (VPBlockBase *VPBB : vp_depth_first_shallow(HeaderVPBB)) {
+    VPBB->setParent(TopRegion);
+  }
+
+  VPBlockUtils::insertBlockAfter(TopRegion, VecPreheader);
+  VPBasicBlock *MiddleVPBB = Plan.createVPBasicBlock("middle.block");
+  VPBlockUtils::insertBlockAfter(MiddleVPBB, TopRegion);
+
+  VPBasicBlock *ScalarPH = Plan.createVPBasicBlock("scalar.ph");
+  VPBlockUtils::connectBlocks(ScalarPH, Plan.getScalarHeader());
+  if (!RequiresScalarEpilogueCheck) {
+    VPBlockUtils::connectBlocks(MiddleVPBB, ScalarPH);
+    return;
+  }
+
+  // If needed, add a check in the middle block to see if we have completed
+  // all of the iterations in the first vector loop.  Three cases:
+  // 1) If (N - N%VF) == N, then we *don't* need to run the remainder.
+  //    Thus if tail is to be folded, we know we don't need to run the
+  //    remainder and we can set the condition to true.
+  // 2) If we require a scalar epilogue, there is no conditional branch as
+  //    we unconditionally branch to the scalar preheader.  Do nothing.
+  // 3) Otherwise, construct a runtime check.
+  BasicBlock *IRExitBlock = TheLoop->getUniqueLatchExitBlock();
+  auto *VPExitBlock = Plan.createVPIRBasicBlock(IRExitBlock);
+  // The connection order corresponds to the operands of the conditional branch.
+  VPBlockUtils::insertBlockAfter(VPExitBlock, MiddleVPBB);
+  VPBlockUtils::connectBlocks(MiddleVPBB, ScalarPH);
+
+  auto *ScalarLatchTerm = TheLoop->getLoopLatch()->getTerminator();
+  // Here we use the same DebugLoc as the scalar loop latch terminator instead
+  // of the corresponding compare because they may have ended up with
+  // different line numbers and we want to avoid awkward line stepping while
+  // debugging. Eg. if the compare has got a line number inside the loop.
+  VPBuilder Builder(MiddleVPBB);
+  VPValue *Cmp =
+      TailFolded
+          ? Plan.getOrAddLiveIn(ConstantInt::getTrue(
+                IntegerType::getInt1Ty(TripCount->getType()->getContext())))
+          : Builder.createICmp(CmpInst::ICMP_EQ, Plan.getTripCount(),
+                               &Plan.getVectorTripCount(),
+                               ScalarLatchTerm->getDebugLoc(), "cmp.n");
+  Builder.createNaryOp(VPInstruction::BranchOnCond, {Cmp},
+                       ScalarLatchTerm->getDebugLoc());
+}
+
 void VPlanTransforms::VPInstructionsToVPRecipes(
     VPlanPtr &Plan,
     function_ref<const InductionDescriptor *(PHINode *)>
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.h b/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
index 3dd476a8526d6..d1e825e987848 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
@@ -52,6 +52,21 @@ struct VPlanTransforms {
       verifyVPlanIsValid(Plan);
   }
 
+  /// Introduce the top-level VPRegionBlock for the main loop in \p Plan. Coming
+  /// in this function, \p Plan's top-level loop is modeled using a plain CFG.
+  /// This transforms replaces the plain CFG with a VPRegionBlock wrapping the
+  /// top-level loop and creates a VPValue expressions for the original trip
+  /// count. It will also introduce a dedicated VPBasicBlock for the vector
+  /// pre-header as well a VPBasicBlock as exit block of the region
+  /// (middle.block). If a check is needed to guard executing the scalar
+  /// epilogue loop, it will be added to the middle block, together with
+  /// VPBasicBlocks for the scalar preheader and exit blocks. \p InductionTy is
+  /// the type of the canonical induction and used for related values, like the
+  /// trip count expression.
+  static void introduceTopLevelVectorLoopRegion(
+      VPlan &Plan, Type *InductionTy, PredicatedScalarEvolution &PSE,
+      bool RequiresScalarEpilogueCheck, bool TailFolded, Loop *TheLoop);
+
   /// Replaces the VPInstructions in \p Plan with corresponding
   /// widen recipes.
   static void
diff --git a/llvm/unittests/Transforms/Vectorize/VPlanTestBase.h b/llvm/unittests/Transforms/Vectorize/VPlanTestBase.h
index 8d03e91fb26c3..caf5d2357411d 100644
--- a/llvm/unittests/Transforms/Vectorize/VPlanTestBase.h
+++ b/llvm/unittests/Transforms/Vectorize/VPlanTestBase.h
@@ -14,6 +14,7 @@
 
 #include "../lib/Transforms/Vectorize/VPlan.h"
 #include "../lib/Transforms/Vectorize/VPlanHCFGBuilder.h"
+#include "../lib/Transforms/Vectorize/VPlanTransforms.h"
 #include "llvm...
[truncated]

fhahn added a commit to fhahn/llvm-project that referenced this pull request Feb 23, 2025
This patch moves the logic to predicate and linearize a VPlan to a
dedicated VPlan transform.

The main logic to perform predication is ready to review, although
there are few things to note that should be improved, either directly in
the PR or in the future:
 * Edge and block masks are cached in VPRecipeBuilder, so they can be
   accessed during recipe construction. A better alternative may be to
   add mask operands to all VPInstructions that need them and use that
   during recipe construction
 * The mask caching in a map also means that this map needs updating
   each time a new recipe replaces a VPInstruction; this would also be
   handled by adding mask operands.

Currently this is still WIP due to early-exit loop handling not working
due to the exit conditions not being available in the initial VPlans.
This will be fixed with llvm#128419
and follow-ups

All tests except early-exit loops are passing
Create an empty VPlan first, then let the HCFG builder create a plain
CFG for the top-level loop (w/o a top-level region). The top-level
region is introduced by a separate VPlan-transform. This is instead
of creating the vector loop region before building the VPlan CFG
for the input loop.

This simplifies the HCFG builder (which should probably be renamed) and
moves along the roadmap ('buildLoop') outlined in [1].

As follow-up, I plan to also preserve the exit branches in the initial
VPlan out of the CFG builder, including connections to the exit blocks.

The conversion from plain CFG with potentially multiple exits to a
single entry/exit region will be done as VPlan transform in a follow-up.

This is needed to enable VPlan-based predication. Currently early exit
support relies on building the block-in masks on the original CFG,
because exiting branches and conditions aren't preserved in the VPlan.
So in order to switch to VPlan-based predication, we will have to
preserve them in the initial plain CFG, so the exit conditions are
available explicitly when we convert to single entry/exit regions.

Another follow-up is updating the outer loop handling to also introduce
VPRegionBlocks for nested loops as transform. Currently the existing
logic in the builder will take care of creating VPRegionBlocks for
nested loops, but not the top-level loop.

[1] https://llvm.org/devmtg/2023-10/slides/techtalks/Hahn-VPlan-StatusUpdateAndRoadmap.pdf
Copy link
Collaborator

@ayalz ayalz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

Comment on lines 9311 to 9318
// Build hierarchical CFG.
VPlanHCFGBuilder HCFGBuilder(OrigLoop, LI, *Plan);
HCFGBuilder.buildHierarchicalCFG();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like buildHierarchicalCFG should now (as in TODO) also be a VPlanTransform.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep that sounds good. I think it would make sense to consolidate the transforms for initial VPlan construction into a separate VPlanConstruction.cpp

@@ -52,6 +52,21 @@ struct VPlanTransforms {
verifyVPlanIsValid(Plan);
}

/// Introduce the top-level VPRegionBlock for the main loop in \p Plan. Coming
/// in this function, \p Plan's top-level loop is modeled using a plain CFG.
/// This transforms replaces the plain CFG with a VPRegionBlock wrapping the
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// This transforms replaces the plain CFG with a VPRegionBlock wrapping the
/// This transform wraps the plain CFG of the top-level loop within a VPRegionBlock

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks

Plan.setTripCount(
vputils::getOrCreateVPValueForSCEVExpr(Plan, TripCount, SE));

// Create VPRegionBlock, with empty header and latch blocks, to be filled
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Create VPRegionBlock, with empty header and latch blocks, to be filled
// Create VPRegionBlock, with existing header and new empty latch block, to be filled

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks!

VPBlockUtils::insertBlockAfter(LatchVPBB, OriginalLatch);
auto *TopRegion = Plan.createVPRegionBlock(
HeaderVPBB, LatchVPBB, "vector loop", false /*isReplicator*/);
for (VPBlockBase *VPBB : vp_depth_first_shallow(HeaderVPBB)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use RPOT and break when reaching LatchVPBB?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left as is for now, as this is just to set the parent, so RPOT should not be needed


VPBasicBlock *OriginalLatch =
cast<VPBasicBlock>(HeaderVPBB->getSinglePredecessor());
VPBlockUtils::disconnectBlocks(OriginalLatch, HeaderVPBB);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert OriginalLatch is now free of successors to retain shallow dfs scan below rather than RPOTing until latch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doje, thanks

Comment on lines 441 to 442
continue;

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drop this continue;?
How does this work with the following code being unreachable - is it really needed? tested?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code was left over from earlier iterations, removed, thanks

Comment on lines 443 to 444
VPBasicBlock *HeaderVPBB = getOrCreateVPBB(LoopForBB->getHeader());
VPBlockUtils::connectBlocks(VPBB, HeaderVPBB);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be folded with handling TheLoop's latch above, with an early continue if LoopForBB == TheLoop.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code here not needed, removed, thanks

// header and latch.
VPBasicBlock *VectorHeaderVPBB = TheRegion->getEntryBasicBlock();
VPBasicBlock *VectorLatchVPBB = TheRegion->getExitingBasicBlock();
BB2VPBB[TheLoop->getHeader()] = VectorHeaderVPBB;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The header of TheLoop need not be mapped in BB2VPBB (yet)? Its corresponding VPBB cannot be retrieved via TheRegion (yet).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, there's no region yet for TheLoop, only the initial entry block to the VPlan.

@@ -52,6 +52,21 @@ struct VPlanTransforms {
verifyVPlanIsValid(Plan);
}

/// Introduce the top-level VPRegionBlock for the main loop in \p Plan. Coming
/// in this function, \p Plan's top-level loop is modeled using a plain CFG.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// in this function, \p Plan's top-level loop is modeled using a plain CFG.
/// into this function, \p Plan's top-level loop is modeled using a plain CFG.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks

/// Introduce the top-level VPRegionBlock for the main loop in \p Plan. Coming
/// in this function, \p Plan's top-level loop is modeled using a plain CFG.
/// This transforms replaces the plain CFG with a VPRegionBlock wrapping the
/// top-level loop and creates a VPValue expressions for the original trip
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// top-level loop and creates a VPValue expressions for the original trip
/// and creates a VPValue expressions for the original trip

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done thanks!

Copy link
Contributor Author

@fhahn fhahn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved the new transform to VPlanConstruction.cpp, to serve as new location for all things initial VPlan construction related (soon also building the initial VPlan).

fhahn added a commit to fhahn/llvm-project that referenced this pull request Mar 1, 2025
Further simplify VPlan CFG builder by moving introduction of inner
regions to a VPlan transform, building on
llvm#128419

The HCFG builder now only constructs plain CFGs. I will move it to
VPlanConstruction as follow-up.
@@ -22,6 +22,7 @@ add_llvm_component_library(LLVMVectorize
VectorCombine.cpp
VPlan.cpp
VPlanAnalysis.cpp
VPlanConstruction.cpp
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should files designed to hold VPlan tranforms be called VPlanTransforms*.cpp to help classify them in the absence of a subdirectory, as in VPlanTransformUnroll, VPlanTransformPredicate, etc.,?

What transform(s) are destines to reside in this file - all the preliminary Scalar Transforms according to roadmap (VPlanTransformsScalar.cpp?), or only some of them?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should files designed to hold VPlan tranforms be called VPlanTransforms*.cpp to help classify them in the absence of a subdirectory, as in VPlanTransformUnroll, VPlanTransformPredicate, etc.,?

Sounds good! We could also introduce a subfolder?

What transform(s) are destines to reside in this file - all the preliminary Scalar Transforms according to roadmap (VPlanTransformsScalar.cpp?), or only some of them?

In the short term, in addition also the code to build the initial VPlan (i.e. what currently resides in VPHCFGBuilder) and possibly recipe widening?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, let's go with VPlanConstruction.cpp for now, which expresses what it's about rather than how (may or may not involve VPlanTransforms).

// resetTripCount().
void setTripCount(VPValue *NewTripCount) {
assert(!TripCount && NewTripCount && "TripCount should not be set yet.");

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done thanks

Comment on lines 3609 to 3610
// Set the trip count assuming it is currently null; if it is not - use
// resetTripCount().
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Set the trip count assuming it is currently null; if it is not - use
// resetTripCount().
/// Set the trip count assuming it is currently null; if it is not - use
/// resetTripCount().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done thanks


// Set the trip count assuming it is currently null; if it is not - use
// resetTripCount().
void setTripCount(VPValue *NewTripCount) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Slightly more reasonable to place setTripCount() right after getTripCount() and before resetTripCount()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done thanks

Comment on lines 9327 to 9329
VPlanHCFGBuilder HCFGBuilder(OrigLoop, LI, *Plan);
// TODO: Convert to VPlan-transform and consoliate all transforms for VPlan
// creation.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: better keep the comments together?

Suggested change
VPlanHCFGBuilder HCFGBuilder(OrigLoop, LI, *Plan);
// TODO: Convert to VPlan-transform and consoliate all transforms for VPlan
// creation.
// TODO: Convert to VPlan-transform and consoliate all transforms for VPlan
// creation.
VPlanHCFGBuilder HCFGBuilder(OrigLoop, LI, *Plan);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done thanks!

vputils::getOrCreateVPValueForSCEVExpr(Plan, TripCount, SE));

// Create VPRegionBlock, with existing header and new empty latch block, to be
// filled
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// filled
// filled.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done thanks

Comment on lines +55 to +56
for (VPBlockBase *VPBB : vp_depth_first_shallow(HeaderVPBB))
VPBB->setParent(TopRegion);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good to justify setting parenthood by explaining something like

Suggested change
for (VPBlockBase *VPBB : vp_depth_first_shallow(HeaderVPBB))
VPBB->setParent(TopRegion);
// All VPBB's reachable shallowly from HeaderVPBB belong to top level loop, because VPlan is expected to end at top level latch.
for (VPBlockBase *VPBB : vp_depth_first_shallow(HeaderVPBB))
VPBB->setParent(TopRegion);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added, thanks!

VPBlockUtils::disconnectBlocks(OriginalLatch, HeaderVPBB);
VPBasicBlock *VecPreheader = Plan.createVPBasicBlock("vector.ph");
VPBlockUtils::connectBlocks(Plan.getEntry(), VecPreheader);
assert(OriginalLatch->getNumSuccessors() == 0 && "expected no predecessors");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert(OriginalLatch->getNumSuccessors() == 0 && "expected no predecessors");
assert(OriginalLatch->getNumSuccessors() == 0 && "VPlan should end at top level latch");

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done thanks!

Comment on lines -50 to -53
; CHECK-NEXT: middle.block:
; CHECK-NEXT: EMIT vp<[[C:%.+]]> = icmp eq ir<8>, vp<[[VTC]]>
; CHECK-NEXT: EMIT branch-on-cond vp<[[C]]>
; CHECK-NEXT: Successor(s): ir-bb<exit>, scalar.ph
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

middle.block get added later now?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to confirm.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, the test checks the VPlan created by the builder, which now only contains the inner region and the CFG for the outer loop.

Other blocks are added later, together with introducing top-level region.

Comment on lines -58 to -62
; CHECK-NEXT: ir-bb<outer.header>:
; CHECK-NEXT: IR %outer.iv = phi i64 [ 0, %entry ], [ %outer.iv.next, %outer.latch ]
; CHECK-NEXT: IR %gep.1 = getelementptr inbounds [8 x i64], ptr @arr2, i64 0, i64 %outer.iv
; CHECK-NEXT: IR store i64 %outer.iv, ptr %gep.1, align 4
; CHECK-NEXT: IR %add = add nsw i64 %outer.iv, %n
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ir-bb<outer.header> gets added later now?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep

Copy link
Collaborator

@ayalz ayalz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@fhahn fhahn merged commit fd26708 into llvm:main Mar 9, 2025
6 of 10 checks passed
@fhahn fhahn deleted the vplan-introduce-region-as-transform branch March 9, 2025 15:05
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Mar 9, 2025
… region (NFC). (#128419)

Create an empty VPlan first, then let the HCFG builder create a plain
CFG for the top-level loop (w/o a top-level region). The top-level
region is introduced by a separate VPlan-transform. This is instead of
creating the vector loop region before building the VPlan CFG for the
input loop.

This simplifies the HCFG builder (which should probably be renamed) and
moves along the roadmap ('buildLoop') outlined in [1].

As follow-up, I plan to also preserve the exit branches in the initial
VPlan out of the CFG builder, including connections to the exit blocks.

The conversion from plain CFG with potentially multiple exits to a
single entry/exit region will be done as VPlan transform in a follow-up.

This is needed to enable VPlan-based predication. Currently early exit
support relies on building the block-in masks on the original CFG,
because exiting branches and conditions aren't preserved in the VPlan.
So in order to switch to VPlan-based predication, we will have to
preserve them in the initial plain CFG, so the exit conditions are
available explicitly when we convert to single entry/exit regions.

Another follow-up is updating the outer loop handling to also introduce
VPRegionBlocks for nested loops as transform. Currently the existing
logic in the builder will take care of creating VPRegionBlocks for
nested loops, but not the top-level loop.

[1]
https://llvm.org/devmtg/2023-10/slides/techtalks/Hahn-VPlan-StatusUpdateAndRoadmap.pdf

PR: llvm/llvm-project#128419
fhahn added a commit to fhahn/llvm-project that referenced this pull request Mar 9, 2025
Further simplify VPlan CFG builder by moving introduction of inner
regions to a VPlan transform, building on
llvm#128419

The HCFG builder now only constructs plain CFGs. I will move it to
VPlanConstruction as follow-up.
@llvm-ci
Copy link
Collaborator

llvm-ci commented Mar 9, 2025

LLVM Buildbot has detected a new failure on builder lld-x86_64-win running on as-worker-93 while building llvm at step 7 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/146/builds/2451

Here is the relevant piece of the build log for the reference
Step 7 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM-Unit :: Support/./SupportTests.exe/38/87' FAILED ********************
Script(shard):
--
GTEST_OUTPUT=json:C:\a\lld-x86_64-win\build\unittests\Support\.\SupportTests.exe-LLVM-Unit-13384-38-87.json GTEST_SHUFFLE=0 GTEST_TOTAL_SHARDS=87 GTEST_SHARD_INDEX=38 C:\a\lld-x86_64-win\build\unittests\Support\.\SupportTests.exe
--

Script:
--
C:\a\lld-x86_64-win\build\unittests\Support\.\SupportTests.exe --gtest_filter=ProgramEnvTest.CreateProcessLongPath
--
C:\a\lld-x86_64-win\llvm-project\llvm\unittests\Support\ProgramTest.cpp(160): error: Expected equality of these values:
  0
  RC
    Which is: -2

C:\a\lld-x86_64-win\llvm-project\llvm\unittests\Support\ProgramTest.cpp(163): error: fs::remove(Twine(LongPath)): did not return errc::success.
error number: 13
error message: permission denied



C:\a\lld-x86_64-win\llvm-project\llvm\unittests\Support\ProgramTest.cpp:160
Expected equality of these values:
  0
  RC
    Which is: -2

C:\a\lld-x86_64-win\llvm-project\llvm\unittests\Support\ProgramTest.cpp:163
fs::remove(Twine(LongPath)): did not return errc::success.
error number: 13
error message: permission denied




********************


fhahn added a commit to fhahn/llvm-project that referenced this pull request Mar 20, 2025
Further simplify VPlan CFG builder by moving introduction of inner
regions to a VPlan transform, building on
llvm#128419

The HCFG builder now only constructs plain CFGs. I will move it to
VPlanConstruction as follow-up.
fhahn added a commit to fhahn/llvm-project that referenced this pull request Mar 20, 2025
Further simplify VPlan CFG builder by moving introduction of inner
regions to a VPlan transform, building on
llvm#128419

The HCFG builder now only constructs plain CFGs. I will move it to
VPlanConstruction as follow-up.
fhahn added a commit to fhahn/llvm-project that referenced this pull request Mar 20, 2025
Further simplify VPlan CFG builder by moving introduction of inner
regions to a VPlan transform, building on
llvm#128419

The HCFG builder now only constructs plain CFGs. I will move it to
VPlanConstruction as follow-up.
fhahn added a commit to fhahn/llvm-project that referenced this pull request Mar 20, 2025
Further simplify VPlan CFG builder by moving introduction of inner
regions to a VPlan transform, building on
llvm#128419

The HCFG builder now only constructs plain CFGs. I will move it to
VPlanConstruction as follow-up.
fhahn added a commit to fhahn/llvm-project that referenced this pull request Mar 23, 2025
Further simplify VPlan CFG builder by moving introduction of inner
regions to a VPlan transform, building on
llvm#128419

The HCFG builder now only constructs plain CFGs. I will move it to
VPlanConstruction as follow-up.
fhahn added a commit to fhahn/llvm-project that referenced this pull request Mar 23, 2025
Further simplify VPlan CFG builder by moving introduction of inner
regions to a VPlan transform, building on
llvm#128419

The HCFG builder now only constructs plain CFGs. I will move it to
VPlanConstruction as follow-up.
fhahn added a commit to fhahn/llvm-project that referenced this pull request Mar 30, 2025
Further simplify VPlan CFG builder by moving introduction of inner
regions to a VPlan transform, building on
llvm#128419

The HCFG builder now only constructs plain CFGs. I will move it to
VPlanConstruction as follow-up.
fhahn added a commit to fhahn/llvm-project that referenced this pull request Mar 30, 2025
This patch moves the logic to predicate and linearize a VPlan to a
dedicated VPlan transform.

The main logic to perform predication is ready to review, although
there are few things to note that should be improved, either directly in
the PR or in the future:
 * Edge and block masks are cached in VPRecipeBuilder, so they can be
   accessed during recipe construction. A better alternative may be to
   add mask operands to all VPInstructions that need them and use that
   during recipe construction
 * The mask caching in a map also means that this map needs updating
   each time a new recipe replaces a VPInstruction; this would also be
   handled by adding mask operands.

Currently this is still WIP due to early-exit loop handling not working
due to the exit conditions not being available in the initial VPlans.
This will be fixed with llvm#128419
and follow-ups

All tests except early-exit loops are passing
fhahn added a commit to fhahn/llvm-project that referenced this pull request Mar 30, 2025
Further simplify VPlan CFG builder by moving introduction of inner
regions to a VPlan transform, building on
llvm#128419

The HCFG builder now only constructs plain CFGs. I will move it to
VPlanConstruction as follow-up.
fhahn added a commit to fhahn/llvm-project that referenced this pull request Apr 5, 2025
This patch moves the logic to predicate and linearize a VPlan to a
dedicated VPlan transform.

The main logic to perform predication is ready to review, although
there are few things to note that should be improved, either directly in
the PR or in the future:
 * Edge and block masks are cached in VPRecipeBuilder, so they can be
   accessed during recipe construction. A better alternative may be to
   add mask operands to all VPInstructions that need them and use that
   during recipe construction
 * The mask caching in a map also means that this map needs updating
   each time a new recipe replaces a VPInstruction; this would also be
   handled by adding mask operands.

Currently this is still WIP due to early-exit loop handling not working
due to the exit conditions not being available in the initial VPlans.
This will be fixed with llvm#128419
and follow-ups

All tests except early-exit loops are passing
fhahn added a commit that referenced this pull request Apr 16, 2025
Further simplify VPlan CFG builder by moving introduction of inner
regions to a VPlan transform, building on
#128419.

The HCFG builder now only constructs plain CFGs. I will move it to
VPlanConstruction as follow-up.

Depends on #128419.

PR: #129402
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Apr 16, 2025
…C) (#129402)

Further simplify VPlan CFG builder by moving introduction of inner
regions to a VPlan transform, building on
llvm/llvm-project#128419.

The HCFG builder now only constructs plain CFGs. I will move it to
VPlanConstruction as follow-up.

Depends on llvm/llvm-project#128419.

PR: llvm/llvm-project#129402
var-const pushed a commit to ldionne/llvm-project that referenced this pull request Apr 17, 2025
…9402)

Further simplify VPlan CFG builder by moving introduction of inner
regions to a VPlan transform, building on
llvm#128419.

The HCFG builder now only constructs plain CFGs. I will move it to
VPlanConstruction as follow-up.

Depends on llvm#128419.

PR: llvm#129402
fhahn added a commit to fhahn/llvm-project that referenced this pull request Apr 28, 2025
This patch moves the logic to predicate and linearize a VPlan to a
dedicated VPlan transform.

The main logic to perform predication is ready to review, although
there are few things to note that should be improved, either directly in
the PR or in the future:
 * Edge and block masks are cached in VPRecipeBuilder, so they can be
   accessed during recipe construction. A better alternative may be to
   add mask operands to all VPInstructions that need them and use that
   during recipe construction
 * The mask caching in a map also means that this map needs updating
   each time a new recipe replaces a VPInstruction; this would also be
   handled by adding mask operands.

Currently this is still WIP due to early-exit loop handling not working
due to the exit conditions not being available in the initial VPlans.
This will be fixed with llvm#128419
and follow-ups

All tests except early-exit loops are passing
fhahn added a commit to fhahn/llvm-project that referenced this pull request Apr 28, 2025
This patch moves the logic to predicate and linearize a VPlan to a
dedicated VPlan transform.

The main logic to perform predication is ready to review, although
there are few things to note that should be improved, either directly in
the PR or in the future:
 * Edge and block masks are cached in VPRecipeBuilder, so they can be
   accessed during recipe construction. A better alternative may be to
   add mask operands to all VPInstructions that need them and use that
   during recipe construction
 * The mask caching in a map also means that this map needs updating
   each time a new recipe replaces a VPInstruction; this would also be
   handled by adding mask operands.

Currently this is still WIP due to early-exit loop handling not working
due to the exit conditions not being available in the initial VPlans.
This will be fixed with llvm#128419
and follow-ups

All tests except early-exit loops are passing
fhahn added a commit to fhahn/llvm-project that referenced this pull request Apr 28, 2025
This patch moves the logic to predicate and linearize a VPlan to a
dedicated VPlan transform.

The main logic to perform predication is ready to review, although
there are few things to note that should be improved, either directly in
the PR or in the future:
 * Edge and block masks are cached in VPRecipeBuilder, so they can be
   accessed during recipe construction. A better alternative may be to
   add mask operands to all VPInstructions that need them and use that
   during recipe construction
 * The mask caching in a map also means that this map needs updating
   each time a new recipe replaces a VPInstruction; this would also be
   handled by adding mask operands.

Currently this is still WIP due to early-exit loop handling not working
due to the exit conditions not being available in the initial VPlans.
This will be fixed with llvm#128419
and follow-ups

All tests except early-exit loops are passing
fhahn added a commit to fhahn/llvm-project that referenced this pull request May 3, 2025
This patch moves the logic to predicate and linearize a VPlan to a
dedicated VPlan transform.

The main logic to perform predication is ready to review, although
there are few things to note that should be improved, either directly in
the PR or in the future:
 * Edge and block masks are cached in VPRecipeBuilder, so they can be
   accessed during recipe construction. A better alternative may be to
   add mask operands to all VPInstructions that need them and use that
   during recipe construction
 * The mask caching in a map also means that this map needs updating
   each time a new recipe replaces a VPInstruction; this would also be
   handled by adding mask operands.

Currently this is still WIP due to early-exit loop handling not working
due to the exit conditions not being available in the initial VPlans.
This will be fixed with llvm#128419
and follow-ups

All tests except early-exit loops are passing
fhahn added a commit to fhahn/llvm-project that referenced this pull request May 10, 2025
This patch moves the logic to predicate and linearize a VPlan to a
dedicated VPlan transform.

The main logic to perform predication is ready to review, although
there are few things to note that should be improved, either directly in
the PR or in the future:
 * Edge and block masks are cached in VPRecipeBuilder, so they can be
   accessed during recipe construction. A better alternative may be to
   add mask operands to all VPInstructions that need them and use that
   during recipe construction
 * The mask caching in a map also means that this map needs updating
   each time a new recipe replaces a VPInstruction; this would also be
   handled by adding mask operands.

Currently this is still WIP due to early-exit loop handling not working
due to the exit conditions not being available in the initial VPlans.
This will be fixed with llvm#128419
and follow-ups

All tests except early-exit loops are passing
fhahn added a commit to fhahn/llvm-project that referenced this pull request May 10, 2025
This patch moves the logic to predicate and linearize a VPlan to a
dedicated VPlan transform.

The main logic to perform predication is ready to review, although
there are few things to note that should be improved, either directly in
the PR or in the future:
 * Edge and block masks are cached in VPRecipeBuilder, so they can be
   accessed during recipe construction. A better alternative may be to
   add mask operands to all VPInstructions that need them and use that
   during recipe construction
 * The mask caching in a map also means that this map needs updating
   each time a new recipe replaces a VPInstruction; this would also be
   handled by adding mask operands.

Currently this is still WIP due to early-exit loop handling not working
due to the exit conditions not being available in the initial VPlans.
This will be fixed with llvm#128419
and follow-ups

All tests except early-exit loops are passing
fhahn added a commit to fhahn/llvm-project that referenced this pull request May 11, 2025
This patch moves the logic to predicate and linearize a VPlan to a
dedicated VPlan transform.

The main logic to perform predication is ready to review, although
there are few things to note that should be improved, either directly in
the PR or in the future:
 * Edge and block masks are cached in VPRecipeBuilder, so they can be
   accessed during recipe construction. A better alternative may be to
   add mask operands to all VPInstructions that need them and use that
   during recipe construction
 * The mask caching in a map also means that this map needs updating
   each time a new recipe replaces a VPInstruction; this would also be
   handled by adding mask operands.

Currently this is still WIP due to early-exit loop handling not working
due to the exit conditions not being available in the initial VPlans.
This will be fixed with llvm#128419
and follow-ups

All tests except early-exit loops are passing
fhahn added a commit to fhahn/llvm-project that referenced this pull request May 12, 2025
This patch moves the logic to predicate and linearize a VPlan to a
dedicated VPlan transform.

The main logic to perform predication is ready to review, although
there are few things to note that should be improved, either directly in
the PR or in the future:
 * Edge and block masks are cached in VPRecipeBuilder, so they can be
   accessed during recipe construction. A better alternative may be to
   add mask operands to all VPInstructions that need them and use that
   during recipe construction
 * The mask caching in a map also means that this map needs updating
   each time a new recipe replaces a VPInstruction; this would also be
   handled by adding mask operands.

Currently this is still WIP due to early-exit loop handling not working
due to the exit conditions not being available in the initial VPlans.
This will be fixed with llvm#128419
and follow-ups

All tests except early-exit loops are passing
fhahn added a commit to fhahn/llvm-project that referenced this pull request May 14, 2025
This patch moves the logic to predicate and linearize a VPlan to a
dedicated VPlan transform.

The main logic to perform predication is ready to review, although
there are few things to note that should be improved, either directly in
the PR or in the future:
 * Edge and block masks are cached in VPRecipeBuilder, so they can be
   accessed during recipe construction. A better alternative may be to
   add mask operands to all VPInstructions that need them and use that
   during recipe construction
 * The mask caching in a map also means that this map needs updating
   each time a new recipe replaces a VPInstruction; this would also be
   handled by adding mask operands.

Currently this is still WIP due to early-exit loop handling not working
due to the exit conditions not being available in the initial VPlans.
This will be fixed with llvm#128419
and follow-ups

All tests except early-exit loops are passing
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants