-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[VPlan] Introduce all loop regions as VPlan transform. (NFC) #129402
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-vectorizers @llvm/pr-subscribers-llvm-transforms Author: Florian Hahn (fhahn) ChangesFurther simplify VPlan CFG builder by moving introduction of inner The HCFG builder now only constructs plain CFGs. I will move it to Depends on #128419, which is currently included in the PR as well. Patch is 33.93 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/129402.diff 10 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/CMakeLists.txt b/llvm/lib/Transforms/Vectorize/CMakeLists.txt
index 38670ba304e53..f4e1d7c952675 100644
--- a/llvm/lib/Transforms/Vectorize/CMakeLists.txt
+++ b/llvm/lib/Transforms/Vectorize/CMakeLists.txt
@@ -22,6 +22,7 @@ add_llvm_component_library(LLVMVectorize
VectorCombine.cpp
VPlan.cpp
VPlanAnalysis.cpp
+ VPlanConstruction.cpp
VPlanHCFGBuilder.cpp
VPlanRecipes.cpp
VPlanSLP.cpp
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index c447fa4843591..d37a28d0b9a5f 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -9322,13 +9322,16 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
return !CM.requiresScalarEpilogue(VF.isVector());
},
Range);
- VPlanPtr Plan = VPlan::createInitialVPlan(Legal->getWidestInductionType(),
- PSE, RequiresScalarEpilogueCheck,
- CM.foldTailByMasking(), OrigLoop);
-
+ auto Plan = std::make_unique<VPlan>(OrigLoop);
// Build hierarchical CFG.
VPlanHCFGBuilder HCFGBuilder(OrigLoop, LI, *Plan);
- HCFGBuilder.buildHierarchicalCFG();
+ // TODO: Convert to VPlan-transform and consoliate all transforms for VPlan
+ // creation.
+ HCFGBuilder.buildPlainCFG();
+
+ VPlanTransforms::introduceTopLevelVectorLoopRegion(
+ *Plan, Legal->getWidestInductionType(), PSE, RequiresScalarEpilogueCheck,
+ CM.foldTailByMasking(), OrigLoop);
// Don't use getDecisionAndClampRange here, because we don't know the UF
// so this function is better to be conservative, rather than to split
@@ -9625,12 +9628,13 @@ VPlanPtr LoopVectorizationPlanner::buildVPlan(VFRange &Range) {
assert(EnableVPlanNativePath && "VPlan-native path is not enabled.");
// Create new empty VPlan
- auto Plan = VPlan::createInitialVPlan(Legal->getWidestInductionType(), PSE,
- true, false, OrigLoop);
-
+ auto Plan = std::make_unique<VPlan>(OrigLoop);
// Build hierarchical CFG
VPlanHCFGBuilder HCFGBuilder(OrigLoop, LI, *Plan);
- HCFGBuilder.buildHierarchicalCFG();
+ HCFGBuilder.buildPlainCFG();
+
+ VPlanTransforms::introduceTopLevelVectorLoopRegion(
+ *Plan, Legal->getWidestInductionType(), PSE, true, false, OrigLoop);
for (ElementCount VF : Range)
Plan->addVF(VF);
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.cpp b/llvm/lib/Transforms/Vectorize/VPlan.cpp
index 563784e4af924..944a11b96325d 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlan.cpp
@@ -880,85 +880,6 @@ VPlan::~VPlan() {
delete BackedgeTakenCount;
}
-VPlanPtr VPlan::createInitialVPlan(Type *InductionTy,
- PredicatedScalarEvolution &PSE,
- bool RequiresScalarEpilogueCheck,
- bool TailFolded, Loop *TheLoop) {
- auto Plan = std::make_unique<VPlan>(TheLoop);
- VPBlockBase *ScalarHeader = Plan->getScalarHeader();
-
- // Connect entry only to vector preheader initially. Entry will also be
- // connected to the scalar preheader later, during skeleton creation when
- // runtime guards are added as needed. Note that when executing the VPlan for
- // an epilogue vector loop, the original entry block here will be replaced by
- // a new VPIRBasicBlock wrapping the entry to the epilogue vector loop after
- // generating code for the main vector loop.
- VPBasicBlock *VecPreheader = Plan->createVPBasicBlock("vector.ph");
- VPBlockUtils::connectBlocks(Plan->getEntry(), VecPreheader);
-
- // Create SCEV and VPValue for the trip count.
- // We use the symbolic max backedge-taken-count, which works also when
- // vectorizing loops with uncountable early exits.
- const SCEV *BackedgeTakenCountSCEV = PSE.getSymbolicMaxBackedgeTakenCount();
- assert(!isa<SCEVCouldNotCompute>(BackedgeTakenCountSCEV) &&
- "Invalid loop count");
- ScalarEvolution &SE = *PSE.getSE();
- const SCEV *TripCount = SE.getTripCountFromExitCount(BackedgeTakenCountSCEV,
- InductionTy, TheLoop);
- Plan->TripCount =
- vputils::getOrCreateVPValueForSCEVExpr(*Plan, TripCount, SE);
-
- // Create VPRegionBlock, with empty header and latch blocks, to be filled
- // during processing later.
- VPBasicBlock *HeaderVPBB = Plan->createVPBasicBlock("vector.body");
- VPBasicBlock *LatchVPBB = Plan->createVPBasicBlock("vector.latch");
- VPBlockUtils::insertBlockAfter(LatchVPBB, HeaderVPBB);
- auto *TopRegion = Plan->createVPRegionBlock(
- HeaderVPBB, LatchVPBB, "vector loop", false /*isReplicator*/);
-
- VPBlockUtils::insertBlockAfter(TopRegion, VecPreheader);
- VPBasicBlock *MiddleVPBB = Plan->createVPBasicBlock("middle.block");
- VPBlockUtils::insertBlockAfter(MiddleVPBB, TopRegion);
-
- VPBasicBlock *ScalarPH = Plan->createVPBasicBlock("scalar.ph");
- VPBlockUtils::connectBlocks(ScalarPH, ScalarHeader);
- if (!RequiresScalarEpilogueCheck) {
- VPBlockUtils::connectBlocks(MiddleVPBB, ScalarPH);
- return Plan;
- }
-
- // If needed, add a check in the middle block to see if we have completed
- // all of the iterations in the first vector loop. Three cases:
- // 1) If (N - N%VF) == N, then we *don't* need to run the remainder.
- // Thus if tail is to be folded, we know we don't need to run the
- // remainder and we can set the condition to true.
- // 2) If we require a scalar epilogue, there is no conditional branch as
- // we unconditionally branch to the scalar preheader. Do nothing.
- // 3) Otherwise, construct a runtime check.
- BasicBlock *IRExitBlock = TheLoop->getUniqueLatchExitBlock();
- VPIRBasicBlock *VPExitBlock = Plan->getExitBlock(IRExitBlock);
- // The connection order corresponds to the operands of the conditional branch.
- VPBlockUtils::insertBlockAfter(VPExitBlock, MiddleVPBB);
- VPBlockUtils::connectBlocks(MiddleVPBB, ScalarPH);
-
- auto *ScalarLatchTerm = TheLoop->getLoopLatch()->getTerminator();
- // Here we use the same DebugLoc as the scalar loop latch terminator instead
- // of the corresponding compare because they may have ended up with
- // different line numbers and we want to avoid awkward line stepping while
- // debugging. Eg. if the compare has got a line number inside the loop.
- VPBuilder Builder(MiddleVPBB);
- VPValue *Cmp =
- TailFolded
- ? Plan->getOrAddLiveIn(ConstantInt::getTrue(
- IntegerType::getInt1Ty(TripCount->getType()->getContext())))
- : Builder.createICmp(CmpInst::ICMP_EQ, Plan->getTripCount(),
- &Plan->getVectorTripCount(),
- ScalarLatchTerm->getDebugLoc(), "cmp.n");
- Builder.createNaryOp(VPInstruction::BranchOnCond, {Cmp},
- ScalarLatchTerm->getDebugLoc());
- return Plan;
-}
-
void VPlan::prepareToExecute(Value *TripCountV, Value *VectorTripCountV,
VPTransformState &State) {
Type *TCTy = TripCountV->getType();
@@ -1135,11 +1056,13 @@ void VPlan::printLiveIns(raw_ostream &O) const {
}
O << "\n";
- if (TripCount->isLiveIn())
- O << "Live-in ";
- TripCount->printAsOperand(O, SlotTracker);
- O << " = original trip-count";
- O << "\n";
+ if (TripCount) {
+ if (TripCount->isLiveIn())
+ O << "Live-in ";
+ TripCount->printAsOperand(O, SlotTracker);
+ O << " = original trip-count";
+ O << "\n";
+ }
}
LLVM_DUMP_METHOD
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index 70e684826ed2d..8a6597c9bbcba 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -3537,21 +3537,6 @@ class VPlan {
VPBB->setPlan(this);
}
- /// Create initial VPlan, having an "entry" VPBasicBlock (wrapping
- /// original scalar pre-header) which contains SCEV expansions that need
- /// to happen before the CFG is modified (when executing a VPlan for the
- /// epilogue vector loop, the original entry needs to be replaced by a new
- /// one); a VPBasicBlock for the vector pre-header, followed by a region for
- /// the vector loop, followed by the middle VPBasicBlock. If a check is needed
- /// to guard executing the scalar epilogue loop, it will be added to the
- /// middle block, together with VPBasicBlocks for the scalar preheader and
- /// exit blocks. \p InductionTy is the type of the canonical induction and
- /// used for related values, like the trip count expression.
- static VPlanPtr createInitialVPlan(Type *InductionTy,
- PredicatedScalarEvolution &PSE,
- bool RequiresScalarEpilogueCheck,
- bool TailFolded, Loop *TheLoop);
-
/// Prepare the plan for execution, setting up the required live-in values.
void prepareToExecute(Value *TripCount, Value *VectorTripCount,
VPTransformState &State);
@@ -3617,7 +3602,15 @@ class VPlan {
/// the original trip count have been replaced.
void resetTripCount(VPValue *NewTripCount) {
assert(TripCount && NewTripCount && TripCount->getNumUsers() == 0 &&
- "TripCount always must be set");
+ "TripCount must be set when resetting");
+ TripCount = NewTripCount;
+ }
+
+ // Set the trip count assuming it is currently null; if it is not - use
+ // resetTripCount().
+ void setTripCount(VPValue *NewTripCount) {
+ assert(!TripCount && NewTripCount && "TripCount should not be set yet.");
+
TripCount = NewTripCount;
}
diff --git a/llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp b/llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp
new file mode 100644
index 0000000000000..6c9855f73d7ad
--- /dev/null
+++ b/llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp
@@ -0,0 +1,131 @@
+//===-- VPlanConstruction.cpp - Transforms for initial VPlan construction -===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+///
+/// \file
+/// This file implements transforms for initial VPlan construction
+///
+//===----------------------------------------------------------------------===//
+
+#include "LoopVectorizationPlanner.h"
+#include "VPlan.h"
+#include "VPlanCFG.h"
+#include "VPlanDominatorTree.h"
+#include "VPlanTransforms.h"
+#include "llvm/Analysis/LoopInfo.h"
+#include "llvm/Analysis/ScalarEvolution.h"
+
+using namespace llvm;
+
+/// Introduce VPRegionBlocks for each loop modeled using a plain CFG in \p Plan.
+static void introduceInnerLoopRegions(VPlan &Plan) {
+ VPDominatorTree VPDT;
+ VPDT.recalculate(Plan);
+
+ for (VPBlockBase *HeaderVPBB :
+ vp_depth_first_shallow(Plan.getVectorLoopRegion()->getEntry())) {
+ if (HeaderVPBB->getNumPredecessors() != 2)
+ continue;
+ VPBlockBase *PreheaderVPBB = HeaderVPBB->getPredecessors()[0];
+ VPBlockBase *LatchVPBB = HeaderVPBB->getPredecessors()[1];
+ if (!VPDT.dominates(HeaderVPBB, LatchVPBB))
+ continue;
+ assert(VPDT.dominates(PreheaderVPBB, HeaderVPBB) &&
+ "preheader must dominate header");
+ VPBlockUtils::disconnectBlocks(PreheaderVPBB, HeaderVPBB);
+ VPBlockUtils::disconnectBlocks(LatchVPBB, HeaderVPBB);
+ VPBlockBase *Succ = LatchVPBB->getSingleSuccessor();
+ VPBlockUtils::disconnectBlocks(LatchVPBB, Succ);
+
+ auto *R = Plan.createVPRegionBlock(HeaderVPBB, LatchVPBB, "",
+ false /*isReplicator*/);
+ for (VPBlockBase *VPBB : vp_depth_first_shallow(HeaderVPBB))
+ VPBB->setParent(R);
+
+ VPBlockUtils::insertBlockAfter(R, PreheaderVPBB);
+ VPBlockUtils::connectBlocks(R, Succ);
+ }
+}
+
+void VPlanTransforms::introduceTopLevelVectorLoopRegion(
+ VPlan &Plan, Type *InductionTy, PredicatedScalarEvolution &PSE,
+ bool RequiresScalarEpilogueCheck, bool TailFolded, Loop *TheLoop) {
+ // TODO: Generalize to introduce all loop regions.
+ auto *HeaderVPBB = cast<VPBasicBlock>(Plan.getEntry()->getSingleSuccessor());
+ VPBlockUtils::disconnectBlocks(Plan.getEntry(), HeaderVPBB);
+
+ VPBasicBlock *OriginalLatch =
+ cast<VPBasicBlock>(HeaderVPBB->getSinglePredecessor());
+ VPBlockUtils::disconnectBlocks(OriginalLatch, HeaderVPBB);
+ VPBasicBlock *VecPreheader = Plan.createVPBasicBlock("vector.ph");
+ VPBlockUtils::connectBlocks(Plan.getEntry(), VecPreheader);
+ assert(OriginalLatch->getNumSuccessors() == 0 && "expected no predecessors");
+
+ // Create SCEV and VPValue for the trip count.
+ // We use the symbolic max backedge-taken-count, which works also when
+ // vectorizing loops with uncountable early exits.
+ const SCEV *BackedgeTakenCountSCEV = PSE.getSymbolicMaxBackedgeTakenCount();
+ assert(!isa<SCEVCouldNotCompute>(BackedgeTakenCountSCEV) &&
+ "Invalid loop count");
+ ScalarEvolution &SE = *PSE.getSE();
+ const SCEV *TripCount = SE.getTripCountFromExitCount(BackedgeTakenCountSCEV,
+ InductionTy, TheLoop);
+ Plan.setTripCount(
+ vputils::getOrCreateVPValueForSCEVExpr(Plan, TripCount, SE));
+
+ // Create VPRegionBlock, with existing header and new empty latch block, to be
+ // filled
+ VPBasicBlock *LatchVPBB = Plan.createVPBasicBlock("vector.latch");
+ VPBlockUtils::insertBlockAfter(LatchVPBB, OriginalLatch);
+ auto *TopRegion = Plan.createVPRegionBlock(
+ HeaderVPBB, LatchVPBB, "vector loop", false /*isReplicator*/);
+ for (VPBlockBase *VPBB : vp_depth_first_shallow(HeaderVPBB))
+ VPBB->setParent(TopRegion);
+
+ VPBlockUtils::insertBlockAfter(TopRegion, VecPreheader);
+ VPBasicBlock *MiddleVPBB = Plan.createVPBasicBlock("middle.block");
+ VPBlockUtils::insertBlockAfter(MiddleVPBB, TopRegion);
+
+ VPBasicBlock *ScalarPH = Plan.createVPBasicBlock("scalar.ph");
+ VPBlockUtils::connectBlocks(ScalarPH, Plan.getScalarHeader());
+ if (!RequiresScalarEpilogueCheck) {
+ VPBlockUtils::connectBlocks(MiddleVPBB, ScalarPH);
+ return;
+ }
+
+ // If needed, add a check in the middle block to see if we have completed
+ // all of the iterations in the first vector loop. Three cases:
+ // 1) If (N - N%VF) == N, then we *don't* need to run the remainder.
+ // Thus if tail is to be folded, we know we don't need to run the
+ // remainder and we can set the condition to true.
+ // 2) If we require a scalar epilogue, there is no conditional branch as
+ // we unconditionally branch to the scalar preheader. Do nothing.
+ // 3) Otherwise, construct a runtime check.
+ BasicBlock *IRExitBlock = TheLoop->getUniqueLatchExitBlock();
+ auto *VPExitBlock = Plan.getExitBlock(IRExitBlock);
+ // The connection order corresponds to the operands of the conditional branch.
+ VPBlockUtils::insertBlockAfter(VPExitBlock, MiddleVPBB);
+ VPBlockUtils::connectBlocks(MiddleVPBB, ScalarPH);
+
+ auto *ScalarLatchTerm = TheLoop->getLoopLatch()->getTerminator();
+ // Here we use the same DebugLoc as the scalar loop latch terminator instead
+ // of the corresponding compare because they may have ended up with
+ // different line numbers and we want to avoid awkward line stepping while
+ // debugging. Eg. if the compare has got a line number inside the loop.
+ VPBuilder Builder(MiddleVPBB);
+ VPValue *Cmp =
+ TailFolded
+ ? Plan.getOrAddLiveIn(ConstantInt::getTrue(
+ IntegerType::getInt1Ty(TripCount->getType()->getContext())))
+ : Builder.createICmp(CmpInst::ICMP_EQ, Plan.getTripCount(),
+ &Plan.getVectorTripCount(),
+ ScalarLatchTerm->getDebugLoc(), "cmp.n");
+ Builder.createNaryOp(VPInstruction::BranchOnCond, {Cmp},
+ ScalarLatchTerm->getDebugLoc());
+
+ introduceInnerLoopRegions(Plan);
+}
diff --git a/llvm/lib/Transforms/Vectorize/VPlanHCFGBuilder.cpp b/llvm/lib/Transforms/Vectorize/VPlanHCFGBuilder.cpp
index 19af0225c128f..5e0bb04e6ef33 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanHCFGBuilder.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanHCFGBuilder.cpp
@@ -12,9 +12,7 @@
/// components and steps:
//
/// 1. PlainCFGBuilder class: builds a plain VPBasicBlock-based CFG that
-/// faithfully represents the CFG in the incoming IR. A VPRegionBlock (Top
-/// Region) is created to enclose and serve as parent of all the VPBasicBlocks
-/// in the plain CFG.
+/// faithfully represents the CFG in the incoming IR.
/// NOTE: At this point, there is a direct correspondence between all the
/// VPBasicBlocks created for the initial plain CFG and the incoming
/// BasicBlocks. However, this might change in the future.
@@ -23,6 +21,7 @@
#include "VPlanHCFGBuilder.h"
#include "LoopVectorizationPlanner.h"
+#include "VPlanCFG.h"
#include "llvm/Analysis/LoopIterator.h"
#define DEBUG_TYPE "loop-vectorize"
@@ -56,12 +55,8 @@ class PlainCFGBuilder {
// Hold phi node's that need to be fixed once the plain CFG has been built.
SmallVector<PHINode *, 8> PhisToFix;
- /// Maps loops in the original IR to their corresponding region.
- DenseMap<Loop *, VPRegionBlock *> Loop2Region;
-
// Utility functions.
void setVPBBPredsFromBB(VPBasicBlock *VPBB, BasicBlock *BB);
- void setRegionPredsFromBB(VPRegionBlock *VPBB, BasicBlock *BB);
void fixHeaderPhis();
VPBasicBlock *getOrCreateVPBB(BasicBlock *BB);
#ifndef NDEBUG
@@ -82,25 +77,6 @@ class PlainCFGBuilder {
// Set predecessors of \p VPBB in the same order as they are in \p BB. \p VPBB
// must have no predecessors.
void PlainCFGBuilder::setVPBBPredsFromBB(VPBasicBlock *VPBB, BasicBlock *BB) {
- auto GetLatchOfExit = [this](BasicBlock *BB) -> BasicBlock * {
- auto *SinglePred = BB->getSinglePredecessor();
- Loop *LoopForBB = LI->getLoopFor(BB);
- if (!SinglePred || LI->getLoopFor(SinglePred) == LoopForBB)
- return nullptr;
- // The input IR must be in loop-simplify form, ensuring a single predecessor
- // for exit blocks.
- assert(SinglePred == LI->getLoopFor(SinglePred)->getLoopLatch() &&
- "SinglePred must be the only loop latch");
- return SinglePred;
- };
- if (auto *LatchBB = GetLatchOfExit(BB)) {
- auto *PredRegion = getOrCreateVPBB(LatchBB)->getParent();
- assert(VPBB == cast<VPBasicBlock>(PredRegion->getSingleSuccessor()) &&
- "successor must already be set for PredRegion; it must have VPBB "
- "as single successor");
- VPBB->setPredecessors({PredRegion});
- return;
- }
// Collect VPBB predecessors.
SmallVector<VPBlockBase *, 2> VPBBPreds;
for (BasicBlock *Pred : predecessors(BB))
@@ -112,13 +88,6 @@ static bool isHeaderBB(BasicBlock *BB, Loop *L) {
return L && BB == L->getHeader();
}
-void PlainCFGBuilder::setRegionPredsFromBB(VPRegionBlock *Region,
- BasicBlock *BB) {
- // BB is a loop header block. Connect the region to the loop preheader.
- Loop *LoopOfBB = LI->getLoopFor(BB);
- Region->setPredecessors({getOrCreateVPBB(LoopOfBB->getLoopPredecessor())});
-}
-
// Add operands to VPInstructions representing phi nodes from the input IR.
void PlainCFGBuilder::fixHeaderPhis() {
for (auto *Phi : PhisToFix) {
@@ -149,19 +118,6 @@ static bool isHeaderVPBB(VPBasicBlock *VPBB) {
return VPBB->getParent() && VPBB->getParent()->getEntry() == VPBB;
}
-/// Return true of \p L loop is contained within \p OuterLoop.
-static bool doesContainLoop(const Loop *L, const Loop *OuterLoop) {
- if (L->getLoopDepth() < OuterLoop->getLoopDepth())
- return false;
- const Loop *P = L;
- while (P) {
- if (P == OuterLoop)
- return true;
- P = P->getParentLoop();
- }
- return false;
-}
-
// Create a new empty VPBasicBlock for an incoming BasicBlock in the region
// corresponding to the containing loop or retrieve an existing one if it was
// already created. If no region exists yet for the loop containing \p BB, a new
@@ -177,31 +133,6 @@ VPBasicBlock *PlainCFGBuilder::getOrCreateVPBB(BasicBlock *BB) {
LLVM_DEBUG(dbgs() << "Creating VPBasicBlock for " << Name << "\n");
VPBasicBloc...
[truncated]
|
if (HeaderVPBB->getNumPredecessors() != 2) | ||
continue; | ||
VPBlockBase *PreheaderVPBB = HeaderVPBB->getPredecessors()[0]; | ||
VPBlockBase *LatchVPBB = HeaderVPBB->getPredecessors()[1]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would maybe be the chance to fix a long-standing bug in the VPlan-native path. LLVM will currently always canonicalize loops allowed in the VPlan-native path so that the latches second successor is the header in passes running before the vectorizer, but this is not guaranteed, and when relaxing restrictions on inner loops, then it does happen in the default LLVM pipeline that the loop exit condition is wrong for inner regions. The solution could look something like this:
if (LatchVPBB->getSuccessors()[1] != HeaderVPBB) {
auto *Term = LatchVPBB->getTerminator();
auto *Not = new VPInstruction(VPInstruction::Not, {Term->getOperand(0)});
Not->insertBefore(Term);
Term->setOperand(0, Not);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed that would be good, but probably best done separately. I added a test case to show the issue: 437d587
I'd be happy to share a follow-up fix, unless you'd be planning to do so?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went ahead and put up a fix in #132292, so this doesn't get missed
f863c84
to
f6de6d8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rebased now that #128419 landed. Should be ready for review now.
if (HeaderVPBB->getNumPredecessors() != 2) | ||
continue; | ||
VPBlockBase *PreheaderVPBB = HeaderVPBB->getPredecessors()[0]; | ||
VPBlockBase *LatchVPBB = HeaderVPBB->getPredecessors()[1]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed that would be good, but probably best done separately. I added a test case to show the issue: 437d587
I'd be happy to share a follow-up fix, unless you'd be planning to do so?
8442ffa
to
cbd6762
Compare
ping :) |
cbd6762
to
13b9b0a
Compare
13b9b0a
to
4d6cabd
Compare
Further simplify VPlan CFG builder by moving introduction of inner regions to a VPlan transform, building on llvm#128419 The HCFG builder now only constructs plain CFGs. I will move it to VPlanConstruction as follow-up.
4d6cabd
to
d5ba9a3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ping :)
@@ -54,15 +53,12 @@ class VPlanHCFGBuilder { | |||
/// created for a input IR basic block. | |||
DenseMap<VPBlockBase *, BasicBlock *> VPB2IRBB; | |||
|
|||
/// Build plain CFG for TheLoop and connects it to Plan's entry. | |||
void buildPlainCFG(); | |||
|
|||
public: | |||
VPlanHCFGBuilder(Loop *Lp, LoopInfo *LI, VPlan &P) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
VPlanHCFGBuilder(Loop *Lp, LoopInfo *LI, VPlan &P) | |
VPlanCFGBuilder(Loop *Lp, LoopInfo *LI, VPlan &P) |
but better be consistent with VPlanHCFGBuilder.h file name, i.e., do so when moving to VPlanConstruction?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, I left it as-is to avoid unneccessary changes. Will move the code to VPlanConstruction.cpp as follow-up, removing VPlanHCFGBuilder.{h,cpp}.
void VPlanHCFGBuilder::buildPlainCFG() { | ||
PlainCFGBuilder PCFGBuilder(TheLoop, LI, Plan); | ||
PCFGBuilder.buildPlainCFG(VPB2IRBB); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems even more redundant now to have both VPlanHCFGBuilder and PCFGBuilder, both with their buildPlainCFG().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, I left it as-is to avoid unneccessary changes. Will move the code to VPlanConstruction.cpp as follow-up, removing VPlanHCFGBuilder.{h,cpp}.
"preheader must dominate header"); | ||
VPBlockUtils::disconnectBlocks(PreheaderVPBB, HeaderVPBB); | ||
VPBlockUtils::disconnectBlocks(LatchVPBB, HeaderVPBB); | ||
VPBlockBase *Succ = LatchVPBB->getSingleSuccessor(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
VPBlockBase *Succ = LatchVPBB->getSingleSuccessor(); | |
assert(LatchVPBB->getNumSuccessors() <= 1 && "Latch has more than one successor"); | |
VPBlockBase *Succ = LatchVPBB->getSingleSuccessor(); |
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, thanks!
auto *R = Plan.createVPRegionBlock(HeaderVPBB, LatchVPBB, "", | ||
false /*isReplicator*/); | ||
// All VPBB's reachable shallowly from HeaderVPBB belong to top level loop, | ||
// because VPlan is expected to end at top level latch. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// because VPlan is expected to end at top level latch. | |
// because VPlan is expected to end at top level latch disconnected above. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done thanks!
auto *OrigExiting = TopRegion->getExiting(); | ||
VPBasicBlock *LatchVPBB = Plan.createVPBasicBlock("vector.latch"); | ||
VPBlockUtils::insertBlockAfter(LatchVPBB, OrigExiting); | ||
TopRegion->setExiting(LatchVPBB); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this additional latch block really needed, or can it be removed (independently)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is unsed only for convenience, when adjusting reductions it is used to place selects there if needed. I'll look into remove it separartely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps worth a comment, along with explaining that this section assigns distinct names to the (just created) Top Region and its latch block, which is introduced as a convenience.
Can also reset the name of Top Region's header to "vector.body" here, instead of setting it during plain CFG construction?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, thanks!
if (HeaderVPBB->getNumPredecessors() != 2) | ||
return nullptr; | ||
VPBlockBase *PreheaderVPBB = HeaderVPBB->getPredecessors()[0]; | ||
VPBlockBase *LatchVPBB = HeaderVPBB->getPredecessors()[1]; | ||
if (!VPDT.dominates(HeaderVPBB, LatchVPBB)) | ||
return nullptr; | ||
assert(VPDT.dominates(PreheaderVPBB, HeaderVPBB) && | ||
"preheader must dominate header"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check if isHeaderVPBB() separately, based on dominance for flat cfg rather than enclosing region for HCFG, and have introduceRegion() return non-null region if so?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved the code to check for header blocks to VPBasicBlock::isHeader
and check it before calling introduceRegion,
thanks
// All VPBB's reachable shallowly from HeaderVPBB belong to top level loop, | ||
// because VPlan is expected to end at top level latch. | ||
for (VPBlockBase *VPBB : vp_depth_first_shallow(HeaderVPBB)) | ||
VPBB->setParent(R); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about setting the parent of the newly introduced region itself?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated, thanks!
} | ||
return false; | ||
} | ||
|
||
// Create a new empty VPBasicBlock for an incoming BasicBlock in the region | ||
// corresponding to the containing loop or retrieve an existing one if it was |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The region/loop part deserves updating/removal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated, thanks.
@@ -376,23 +311,21 @@ void PlainCFGBuilder::buildPlainCFG( | |||
for (BasicBlock *BB : RPO) { | |||
// Create or retrieve the VPBasicBlock for this BB. | |||
VPBasicBlock *VPBB = getOrCreateVPBB(BB); | |||
VPRegionBlock *Region = VPBB->getParent(); | |||
Loop *LoopForBB = LI->getLoopFor(BB); | |||
// Set VPBB predecessors in the same order as they are in the incoming BB. | |||
if (!isHeaderBB(BB, LoopForBB)) { | |||
setVPBBPredsFromBB(VPBB, BB); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can setVPBBPredsFromBB() also handle header blocks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now, the code relies on the header having a specific predecessor order (preheader, then latch). We might be able to remove this restriction, because the dominance based approach could be generalized to handle it. It would require extra logic to determine which block is the preheader/latch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just pushed an update to not rely on the predecessor order (see cc81801 for this change in isolation)
setRegionPredsFromBB(Region, BB); | ||
} else { | ||
VPBB->setPredecessors({getOrCreateVPBB(LoopForBB->getLoopPredecessor()), | ||
getOrCreateVPBB(LoopForBB->getLoopLatch())}); | ||
} | ||
|
||
// Create VPInstructions for BB. | ||
createVPInstructionsForVPBB(VPBB, BB); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can VPBB's successors be set according to BB's successors, in all cases, similar to predecessors. Or according to case order for switches, and true/false for conditional branches, but independent of loop latch or not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above, we could , with extra logic when introducing regions to figure out what successor is the exit block and which one the header.
Should be doable, but might be better separately?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assumptions about successors and successor order should be consistent with terminating recipes that target them, including the removal of early-exit branches/successors. Perhaps similar to [header]phisToFix/fixHeaderPhis() that should arguably post-process predecessors to align with phi recipes as they are fixed, after first mirroring the IR predecessors as-is, a post-processing of VPIRInstructions that first wrap IR terminals should take care of both converting them into BranchOnCond/Switch VP[nonIR]Instructions along with possible swapping of successors, plus removal of early-exit branches? Can be done separately if preferred.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the patch to make the phi operand order consistent with predecessors.
At the moment,connections to exit blocks are dropped, but they should be retained as follow up (initial patch here 317d975 )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, this may clarify some confusion raised elsewhere.
ping :) |
@@ -648,6 +648,22 @@ bool VPBasicBlock::isExiting() const { | |||
return getParent() && getParent()->getExitingBasicBlock() == this; | |||
} | |||
|
|||
std::optional<std::pair<VPBasicBlock *, VPBasicBlock *>> | |||
VPBasicBlock::isHeader(const VPDominatorTree &VPDT) const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better retain as a static isHeader(VPBB, VPDT) rather than a method of VPBB which depends on an external analysis (or a VPDT::isHeader(VPBB) method)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved back to VPlanConstruction.cpp, thanks.
std::optional<std::pair<VPBasicBlock *, VPBasicBlock *>> | ||
VPBasicBlock::isHeader(const VPDominatorTree &VPDT) const { | ||
ArrayRef<VPBlockBase *> Preds = getPredecessors(); | ||
if (Preds.size() != 2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This behavior corresponds to CFG-based VPlan, in contrast to the HCFG-based one (in VPlanHCFGBuilder.c below) where isHeader() is implemented by comparing the VPBB with it's parent's entry block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, moved to VPLanConstruction.cpp for now, thanks
/// Checks if the block is a loop header block in the plain CFG; that is, it | ||
/// has exactly 2 predecessors (preheader and latch), where the block | ||
/// dominates the latch and the preheader dominates the block. If it is a | ||
/// header block, returns a pair with the corresponding preheader and latch | ||
/// blocks. Otherwise return std::nullopt. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds more like getPreheaderAndLatch()? Return value of "isHeader()" is intuitively boolean.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the name, thanks
|
||
VPlanTransforms::introduceTopLevelVectorLoopRegion( | ||
*Plan, Legal->getWidestInductionType(), PSE, true, false, OrigLoop); | ||
VPlanTransforms::introduceRegions(*Plan, Legal->getWidestInductionType(), PSE, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
VPlanTransforms::introduceRegions(*Plan, Legal->getWidestInductionType(), PSE, | |
VPlanTransforms::createLoopRegions(*Plan, Legal->getWidestInductionType(), PSE, |
to complement and be consistent with createAndOptimizeReplicateRegions()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed, thanks
for (unsigned Idx : {0, 1}) { | ||
auto *PreheaderVPBB = cast<VPBasicBlock>(Preds[Idx]); | ||
auto *LatchVPBB = cast<VPBasicBlock>(Preds[1 - Idx]); | ||
if (VPDT.dominates(PreheaderVPBB, this) && VPDT.dominates(this, LatchVPBB)) | ||
return {std::make_pair(PreheaderVPBB, LatchVPBB)}; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Simpler to unroll - check first and second predecessors, and then swap them and check again? Or canonicalize predecessor order earlier to be consistent with header phis as they are fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
swapped the order, thanks
static VPRegionBlock *introduceRegion(VPlan &Plan, VPBasicBlock *PreheaderVPBB, | ||
VPBasicBlock *HeaderVPBB, | ||
VPBasicBlock *LatchVPBB) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suffice to receive PreheaderVPBB, as HeaderVPBB must be its single successor and LatchVPBB must be the latter's other predecessor. Or HeaderVPBB, if its predecessors order is already canonicalized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to pass the potential header VPB in to the function here and do the check here as well, so there's no need to pass 3 different blocks.
Loop *LoopOfBB = LI->getLoopFor(BB); | ||
Region->setPredecessors({getOrCreateVPBB(LoopOfBB->getLoopPredecessor())}); | ||
} | ||
|
||
// Add operands to VPInstructions representing phi nodes from the input IR. | ||
void PlainCFGBuilder::fixHeaderPhis() { | ||
for (auto *Phi : PhisToFix) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Header phi's expect preheader as first predecessor and latch as second, corresponding to operands added below, so make sure here the predecessors are ordered accordingly, i.e., swap if needed? Later when loop regions are introduced, can rely on this order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated the patch to use the original order here and swap if needed when region is introduced.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But here it's easy to set (both) things right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previosuly it wasnt consistent, but with the current patch it is; both use the order from the CFG in the current version and are a 1-1 translation of the CFG.
In the current patch, the canonicalization of header phi operand and header predecessor order is done completely separate from CFG construction, when the region is introduced.
In general, I think we should move away from directly relying on the operands to retrieve incoming values and move towards an interface getIncomingValueFor(VPB)
, which takes care of checking what the index for VPB
is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Responding to some comments inline regarding phi/predecessor ordering to start with, to clarify. Would be good to decide first where we want the canonicalization of phi/predecessors to happen.
Doing it separately from initial VPlan construction has the benefit of reducing the number of special cases needed during initial construction and shifts the complexity of detecting the preheader/latch to a VPlan transform, currently integrated with introducing regions, but could also be a separate canonicalization.
Ah, ok, I see. Sketching how separate canonicalization may look like, for clarify and to help reason about what if anything should be fused:
Let VPlanCFG0 denote the initial VPlan, constructed from IR. VPlanCFG0 may indeed be a consistent 1-1 translation of the CFG and IR, arguably with all its branches - including early-exits and latch-to-exit (and perhaps unconditional branches too), possibly also representing the BasicBlock operands of phi's and branches as VPValues wrapping these Values, as operands of the corresponding VPIRInstructions.
Let VPlanCFG1 denote the result of canonicalizing VPlanCFG0's phi's and branches according to predecessors/successors order (instead of their BasicBlock operands). It may be better to change the types of recipes used, even when the original predecessor/successor order is retained, to clarify that they belong to distinct VPlan "modes" or "stages". Early-exit branch recipes and successors can be eliminated now.
Let VPlanHCFG0 denote the (first) HCFG VPlan produced by a VPlan transform devoted to canonical-CFG --> HCFG conversion that introduces child (or actually all loop) regions into VPlanCFG1. Would this transform need to update/change any recipes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let VPlanCFG0 denote the initial VPlan, constructed from IR. VPlanCFG0 may indeed be a consistent 1-1 translation of the CFG and IR, arguably with all its branches - including early-exits and latch-to-exit (and perhaps unconditional branches too), possibly also representing the BasicBlock operands of phi's and branches as VPValues wrapping these Values, as operands of the corresponding VPIRInstructions.
Yep, I have some follow-up patches to retain the exit branches and keep the exit blocks connected.
In a sense, the specific operand order (1st op = preheader, 2nd op = latch) for header phis is purely a consequence of introducing a region, which itself only has a single predecessor I think. So introducing the canonical order at this point seems like a good fit; the header phis are the only ones which need their order adjusted.
Using a separate recipe for the initial recipes sounds good, something like VPInstruction which always has an underlying instruction would be good, but already models the operands/defs in VPlan via operands. Unfortunately VPIRInstruction is already taken. We would need a version that defines result VPValues and uses VPvalue as operands.
setRegionPredsFromBB(Region, BB); | ||
} else { | ||
VPBB->setPredecessors({getOrCreateVPBB(LoopForBB->getLoopPredecessor()), | ||
getOrCreateVPBB(LoopForBB->getLoopLatch())}); | ||
} | ||
|
||
// Create VPInstructions for BB. | ||
createVPInstructionsForVPBB(VPBB, BB); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assumptions about successors and successor order should be consistent with terminating recipes that target them, including the removal of early-exit branches/successors. Perhaps similar to [header]phisToFix/fixHeaderPhis() that should arguably post-process predecessors to align with phi recipes as they are fixed, after first mirroring the IR predecessors as-is, a post-processing of VPIRInstructions that first wrap IR terminals should take care of both converting them into BranchOnCond/Switch VP[nonIR]Instructions along with possible swapping of successors, plus removal of early-exit branches? Can be done separately if preferred.
@@ -351,6 +282,8 @@ void PlainCFGBuilder::createVPInstructionsForVPBB(VPBasicBlock *VPBB, | |||
// Main interface to build the plain CFG. | |||
void PlainCFGBuilder::buildPlainCFG( | |||
DenseMap<VPBlockBase *, BasicBlock *> &VPB2IRBB) { | |||
VPIRBasicBlock *Entry = cast<VPIRBasicBlock>(Plan.getEntry()); | |||
BB2VPBB[Entry->getIRBasicBlock()] = Entry; | |||
|
|||
// 1. Scan the body of the loop in a topological order to visit each basic | |||
// block after having visited its predecessor basic blocks. Create a VPBB for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo below: incomming
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will adjust separately.
// Don't connect any blocks outside the current loop except the latch, which | ||
// is handled below. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For TheLoop this now avoids connecting the latch to its exit block, left for skeleton/middle-block construction?
Intention is to remove early exit branches from all loops?
Latch was previously "handled above", now "handled below" - where, in following setTwoSuccessors()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ths should say for the latches of inner loops. Updated, thanks
@@ -9312,14 +9312,14 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) { | |||
Range); | |||
auto Plan = std::make_unique<VPlan>(OrigLoop); | |||
// Build hierarchical CFG. | |||
// Convert to VPlan-transform and consoliate all transforms for VPlan | |||
// TODO: Convert to VPlan-transform and consoliate all transforms for VPlan |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// TODO: Convert to VPlan-transform and consoliate all transforms for VPlan | |
// TODO: Convert to VPlan-transform and consolidate all transforms for VPlan |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done thanks!
// Swap the operands of header phis if needed. After creating the region, the | ||
// incoming value from the preheader must be the first operand and the one | ||
// from the latch must be the second operand. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The consistency is between recipes and CFG, so better set it correctly before introducing regions/HCFG? (Specifically, between operands of phi recipes and predecessors, and between conditions/cases of branch/switch recipes and successors).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently it is consistent in the plain CFG, and needs to be update here to match the new order of predecessors introduced here.
/// header block, returns a pair with the corresponding preheader and latch | ||
/// blocks. Otherwise return std::nullopt. | ||
static std::optional<std::pair<VPBasicBlock *, VPBasicBlock *>> | ||
getPreheaderAndLatch(VPBlockBase *HeaderVPB, const VPDominatorTree &VPDT) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about having the plain CFG-based VPlan which mirrors the IR have consistent recipes and CFG where the predecessors of header blocks are ordered preheader, latch? That would simplify a bool isHeader(VPB, VPDT)
which would have only one option to check, and also createLoopRegion(VPB)
which could directly retrieve the preheader and latch from the header? And possibly support other transforms that may operate on CFG-based VPlan after its construction and before its conversion to HCFG, if desired.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This means effectively going back to the orignial verison, which enforced the specific order? The current version moves the canonicalization of order outside the CFG construction, and leaves the plain CFG a 1-1 mapping of the incomig CFG, with both phi recipes and VPBB predecessor order matching the order in the original CFG.
VPDominatorTree VPDT; | ||
VPDT.recalculate(Plan); | ||
for (VPBlockBase *HeaderVPB : vp_depth_first_shallow(Plan.getEntry())) | ||
createLoopRegion(Plan, HeaderVPB, VPDT); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
createLoopRegion(Plan, HeaderVPB, VPDT); | |
if (isHeader(HeaderVPB, VPDT)) | |
createLoopRegion(Plan, HeaderVPB); |
? Otherwise createLoopRegion() should more accurately be called tryToCreateLoopRegion()...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the name to tryToCreateLoopRegion
, to avoid adding new arguments, thanks
Loop *LoopOfBB = LI->getLoopFor(BB); | ||
Region->setPredecessors({getOrCreateVPBB(LoopOfBB->getLoopPredecessor())}); | ||
} | ||
|
||
// Add operands to VPInstructions representing phi nodes from the input IR. | ||
void PlainCFGBuilder::fixHeaderPhis() { | ||
for (auto *Phi : PhisToFix) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But here it's easy to set (both) things right?
// For header phis, make sure the incoming value from the loop | ||
// predecessor is the first operand of the recipe. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Retain this behavior, to maintain consistency between VPWidenPHIRecipe's getIncomingBlock(I) which corresponds to the I'th predecessor and getIncomingValue(I) which corresponds to the I'th operand?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dropped the comment, I think at this point the header phis are not really special (the fixed predecessor order comes when we introduce the corresponding regions)
static bool isHeaderVPBB(VPBasicBlock *VPBB) { | ||
return VPBB->getParent() && VPBB->getParent()->getEntry() == VPBB; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This implementation is obsolete being HCFG based, i.e., should always return false; should be removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed, thanks
// Don't connect any blocks outside the current loop except the latches for | ||
// inner loops. | ||
if (LoopForBB && | ||
(LoopForBB == TheLoop || BB != LoopForBB->getLoopLatch())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is somewhat confusing: can LoopForBB be null, given that BB traverses all (and only) blocks of TheLoop?
BranchOnCond recipes are introduced for conditional branches (that reside inside the top loop) excluding that of the top loop latch, and if one destination resides outside top loop (suffice to check the latter only?). I.e., early-exits from the top loop are effectively removed, keeping a single successor - the one that stays in the loop. I.e.2, any branch that resides in an inner-loop and exits it (early or via latch) to a destination inside top loop, is represented by a branch recipe. The code here however avoids connecting a block that early exits from an inner loop to its exit block destination, if it leaves the inner loop - but may reside in the top loop. Unclear if such early exits from inner loops are supported (by native).
The latch of an inner loop ends with a branch that is covered by a BranchOnCond recipe, and is connected to both its header and exit, so that dominance info would identify the loop (although a loop header VPBB's could be marked as such), and the CFG remains connected.
The latch of the top loop originally has its header and exit block as successors. The latter was wrapped in a VPIRBB when VPlan was constructed, but will be connected to the latch/region later via middle block, so connecting them here is avoided. Should we avoid recreating it needlessly as Successor0 or Successor1? Alternatively, should the exit block be recorded in BB2VPBB at the beginning of buildPlainCFG(), as done with entry block? And connect the top loop's latch to the exit block, mirroring IR CFG, and adjust this connection later when middle block is introduced? Similar to how a region of an inner loop is connected to its preheader and exit instead of the header and latch, when converting to HCFG.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is somewhat confusing: can LoopForBB be null, given that BB traverses all (and only) blocks of TheLoop?
It cannot be null, I removed the check.
Initial VPlan-construction should also include connecting the exit blocks of the top-level loop eventually. I have a follow-up patch to do so, as it is probably best tackled separately.
if (BB == TheLoop->getLoopLatch()) { | ||
VPBasicBlock *HeaderVPBB = getOrCreateVPBB(LoopForBB->getHeader()); | ||
VPBlockUtils::connectBlocks(VPBB, HeaderVPBB); | ||
continue; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Top loop latch was connected here to header only, avoiding connection to exit.)
setRegionPredsFromBB(Region, BB); | ||
} else { | ||
VPBB->setPredecessors({getOrCreateVPBB(LoopForBB->getLoopPredecessor()), | ||
getOrCreateVPBB(LoopForBB->getLoopLatch())}); | ||
} | ||
|
||
// Create VPInstructions for BB. | ||
createVPInstructionsForVPBB(VPBB, BB); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, this may clarify some confusion raised elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Responding to some comments inline regarding phi/predecessor ordering to start with, to clarify. Would be good to decide first where we want the canonicalization of phi/predecessors to happen.
Doing it separately from initial VPlan construction has the benefit of reducing the number of special cases needed during initial construction and shifts the complexity of detecting the preheader/latch to a VPlan transform, currently integrated with introducing regions, but could also be a separate canonicalization.
// Swap the operands of header phis if needed. After creating the region, the | ||
// incoming value from the preheader must be the first operand and the one | ||
// from the latch must be the second operand. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently it is consistent in the plain CFG, and needs to be update here to match the new order of predecessors introduced here.
Loop *LoopOfBB = LI->getLoopFor(BB); | ||
Region->setPredecessors({getOrCreateVPBB(LoopOfBB->getLoopPredecessor())}); | ||
} | ||
|
||
// Add operands to VPInstructions representing phi nodes from the input IR. | ||
void PlainCFGBuilder::fixHeaderPhis() { | ||
for (auto *Phi : PhisToFix) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previosuly it wasnt consistent, but with the current patch it is; both use the order from the CFG in the current version and are a 1-1 translation of the CFG.
In the current patch, the canonicalization of header phi operand and header predecessor order is done completely separate from CFG construction, when the region is introduced.
In general, I think we should move away from directly relying on the operands to retrieve incoming values and move towards an interface getIncomingValueFor(VPB)
, which takes care of checking what the index for VPB
is.
/// header block, returns a pair with the corresponding preheader and latch | ||
/// blocks. Otherwise return std::nullopt. | ||
static std::optional<std::pair<VPBasicBlock *, VPBasicBlock *>> | ||
getPreheaderAndLatch(VPBlockBase *HeaderVPB, const VPDominatorTree &VPDT) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This means effectively going back to the orignial verison, which enforced the specific order? The current version moves the canonicalization of order outside the CFG construction, and leaves the plain CFG a 1-1 mapping of the incomig CFG, with both phi recipes and VPBB predecessor order matching the order in the original CFG.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left the code dealing with adding operands to phis as-is (keep IR order for both predecessors and operands during intiial construction, canonicalize when region is introduced), pending adjustment if needed.
All other comments should be addressed. Ping :)
@@ -9312,14 +9312,14 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) { | |||
Range); | |||
auto Plan = std::make_unique<VPlan>(OrigLoop); | |||
// Build hierarchical CFG. | |||
// Convert to VPlan-transform and consoliate all transforms for VPlan | |||
// TODO: Convert to VPlan-transform and consoliate all transforms for VPlan |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done thanks!
VPDominatorTree VPDT; | ||
VPDT.recalculate(Plan); | ||
for (VPBlockBase *HeaderVPB : vp_depth_first_shallow(Plan.getEntry())) | ||
createLoopRegion(Plan, HeaderVPB, VPDT); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the name to tryToCreateLoopRegion
, to avoid adding new arguments, thanks
auto *OrigExiting = TopRegion->getExiting(); | ||
VPBasicBlock *LatchVPBB = Plan.createVPBasicBlock("vector.latch"); | ||
VPBlockUtils::insertBlockAfter(LatchVPBB, OrigExiting); | ||
TopRegion->setExiting(LatchVPBB); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, thanks!
// For header phis, make sure the incoming value from the loop | ||
// predecessor is the first operand of the recipe. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dropped the comment, I think at this point the header phis are not really special (the fixed predecessor order comes when we introduce the corresponding regions)
static bool isHeaderVPBB(VPBasicBlock *VPBB) { | ||
return VPBB->getParent() && VPBB->getParent()->getEntry() == VPBB; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed, thanks
// Don't connect any blocks outside the current loop except the latches for | ||
// inner loops. | ||
if (LoopForBB && | ||
(LoopForBB == TheLoop || BB != LoopForBB->getLoopLatch())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is somewhat confusing: can LoopForBB be null, given that BB traverses all (and only) blocks of TheLoop?
It cannot be null, I removed the check.
Initial VPlan-construction should also include connecting the exit blocks of the top-level loop eventually. I have a follow-up patch to do so, as it is probably best tackled separately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left the code dealing with adding operands to phis as-is (keep IR order for both predecessors and operands during intiial construction, canonicalize when region is introduced), pending adjustment if needed.
OK. Would it be better to apply this canonicalization when looking for region headers - swapping the predecessors of the header together with the operands of its phi recipes? Admittedly these predecessors are soon to be disconnected, but the number of headers (that require clear yet somewhat redundant swapping) is small.
/// dominates the latch and the preheader dominates the block. If it is a | ||
/// header block, returns a pair with the corresponding preheader and latch | ||
/// blocks. Otherwise return std::nullopt. | ||
static std::optional<std::pair<VPBasicBlock *, VPBasicBlock *>> | ||
getPreheaderAndLatch(VPBlockBase *HeaderVPB, const VPDominatorTree &VPDT) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// dominates the latch and the preheader dominates the block. If it is a | |
/// header block, returns a pair with the corresponding preheader and latch | |
/// blocks. Otherwise return std::nullopt. | |
static std::optional<std::pair<VPBasicBlock *, VPBasicBlock *>> | |
getPreheaderAndLatch(VPBlockBase *HeaderVPB, const VPDominatorTree &VPDT) { | |
/// dominates the latch and the preheader dominates the block. If it is a | |
/// header block return true, making sure the preheader appears first and | |
/// the latch second. Otherwise return false. | |
static bool canonicalHeader(VPBlockBase *HeaderVPB, const VPDominatorTree &VPDT) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done thanks!
getPreheaderAndLatch(VPBlockBase *HeaderVPB, const VPDominatorTree &VPDT) { | ||
ArrayRef<VPBlockBase *> Preds = HeaderVPB->getPredecessors(); | ||
if (Preds.size() != 2) | ||
return std::nullopt; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return std::nullopt; | |
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done thanks
auto *LatchVPBB = cast<VPBasicBlock>(Preds[1]); | ||
if (VPDT.dominates(PreheaderVPBB, HeaderVPB) && | ||
VPDT.dominates(HeaderVPB, LatchVPBB)) | ||
return {std::make_pair(PreheaderVPBB, LatchVPBB)}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return {std::make_pair(PreheaderVPBB, LatchVPBB)}; | |
return true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done thanks
auto Res = getPreheaderAndLatch(HeaderVPB, VPDT); | ||
if (!Res) | ||
return; | ||
|
||
const auto &[PreheaderVPBB, LatchVPBB] = *Res; | ||
|
||
// Swap the operands of header phis if needed. After creating the region, the | ||
// incoming value from the preheader must be the first operand and the one | ||
// from the latch must be the second operand. | ||
if (HeaderVPB->getPredecessors()[0] != PreheaderVPBB) { | ||
for (VPRecipeBase &R : cast<VPBasicBlock>(HeaderVPB)->phis()) { | ||
VPValue *Inc0 = R.getOperand(0); | ||
R.setOperand(0, R.getOperand(1)); | ||
R.setOperand(1, Inc0); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
auto Res = getPreheaderAndLatch(HeaderVPB, VPDT); | |
if (!Res) | |
return; | |
const auto &[PreheaderVPBB, LatchVPBB] = *Res; | |
// Swap the operands of header phis if needed. After creating the region, the | |
// incoming value from the preheader must be the first operand and the one | |
// from the latch must be the second operand. | |
if (HeaderVPB->getPredecessors()[0] != PreheaderVPBB) { | |
for (VPRecipeBase &R : cast<VPBasicBlock>(HeaderVPB)->phis()) { | |
VPValue *Inc0 = R.getOperand(0); | |
R.setOperand(0, R.getOperand(1)); | |
R.setOperand(1, Inc0); | |
} | |
} | |
auto *PreheaderVPBB = HeaderVPB->getPredecessors()[0]; | |
auto *LatchVPBB = HeaderVPB->getPredecessors()[1]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done thanks
VPDT.dominates(HeaderVPB, LatchVPBB)) | ||
return {std::make_pair(PreheaderVPBB, LatchVPBB)}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
VPDT.dominates(HeaderVPB, LatchVPBB)) | |
return {std::make_pair(PreheaderVPBB, LatchVPBB)}; | |
VPDT.dominates(HeaderVPB, LatchVPBB)) { | |
// Canonicalize predecessors of header so that preheader is first and latch second. | |
HeaderVPB->swapPredecessors(); | |
for (VPRecipeBase &R : cast<VPBasicBlock>(HeaderVPB)->phis()) | |
R.swapOperands(); | |
return true; | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done thanks!
VPDT.dominates(HeaderVPB, LatchVPBB)) | ||
return {std::make_pair(PreheaderVPBB, LatchVPBB)}; | ||
|
||
return std::nullopt; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return std::nullopt; | |
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, thanks!
VPDominatorTree VPDT; | ||
VPDT.recalculate(Plan); | ||
for (VPBlockBase *HeaderVPB : vp_depth_first_shallow(Plan.getEntry())) | ||
tryToCreateLoopRegion(Plan, HeaderVPB, VPDT); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tryToCreateLoopRegion(Plan, HeaderVPB, VPDT); | |
if (canonicalHeader(HeaderVPB, VPDT) | |
createLoopRegion(Plan, HeaderVPB); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done thanks!
auto *PreheaderVPBB = cast<VPBasicBlock>(Preds[0]); | ||
auto *LatchVPBB = cast<VPBasicBlock>(Preds[1]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
auto *PreheaderVPBB = cast<VPBasicBlock>(Preds[0]); | |
auto *LatchVPBB = cast<VPBasicBlock>(Preds[1]); | |
auto *PreheaderVPBB = Preds[0]; | |
auto *LatchVPBB = Preds[1]; |
or are the casts needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed, thanks
Sounds good, updated, thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
"Introduce all loop regions as VPlan transform" may be a more accurate title than "Introduce child regions as VPlan transform." |
Thanks, updated! Also added NFC, as it only changes minor debug output |
…C) (#129402) Further simplify VPlan CFG builder by moving introduction of inner regions to a VPlan transform, building on llvm/llvm-project#128419. The HCFG builder now only constructs plain CFGs. I will move it to VPlanConstruction as follow-up. Depends on llvm/llvm-project#128419. PR: llvm/llvm-project#129402
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/59/builds/16117 Here is the relevant piece of the build log for the reference
|
…9402) Further simplify VPlan CFG builder by moving introduction of inner regions to a VPlan transform, building on llvm#128419. The HCFG builder now only constructs plain CFGs. I will move it to VPlanConstruction as follow-up. Depends on llvm#128419. PR: llvm#129402
…(NFC) Follow-up as discussed in llvm/llvm-project#129402. After bc03d6c, the VPlanHCFGBuilder doesn't actually build a HCFG any longer. Move what remains directly into VPlanConstruction.cpp.
Follow-up as discussed in llvm#129402. After bc03d6c, the VPlanHCFGBuilder doesn't actually build a HCFG any longer. Move what remains directly into VPlanConstruction.cpp.
…#132292) As pointed out by @iamlouk in llvm#129402, the current code doesn't handle latches with different successor orders correctly. Introduce a `NOT`, if needed. Depends on llvm#129402 PR: llvm#132292
Follow-up as discussed in llvm#129402. After bc03d6c, the VPlanHCFGBuilder doesn't actually build a HCFG any longer. Move what remains directly into VPlanConstruction.cpp.
…#132292) As pointed out by @iamlouk in llvm#129402, the current code doesn't handle latches with different successor orders correctly. Introduce a `NOT`, if needed. Depends on llvm#129402 PR: llvm#132292
Follow-up as discussed in llvm#129402. After bc03d6c, the VPlanHCFGBuilder doesn't actually build a HCFG any longer. Move what remains directly into VPlanConstruction.cpp.
…#132292) As pointed out by @iamlouk in llvm#129402, the current code doesn't handle latches with different successor orders correctly. Introduce a `NOT`, if needed. Depends on llvm#129402 PR: llvm#132292
…ions. (#132292) As pointed out by @iamlouk in llvm/llvm-project#129402, the current code doesn't handle latches with different successor orders correctly. Introduce a `NOT`, if needed. Depends on llvm/llvm-project#129402 PR: llvm/llvm-project#132292
…#132292) As pointed out by @iamlouk in llvm#129402, the current code doesn't handle latches with different successor orders correctly. Introduce a `NOT`, if needed. Depends on llvm#129402 PR: llvm#132292
Further simplify VPlan CFG builder by moving introduction of inner
regions to a VPlan transform, building on
#128419.
The HCFG builder now only constructs plain CFGs. I will move it to
VPlanConstruction as follow-up.
Depends on #128419, which is currently included in the PR as well.