Commit 4a3ec74

[VPlan] Refactor VPlan creation, add transform introducing region (NFC).
Create an empty VPlan first, then let the HCFG builder create a plain CFG for the top-level loop (without a top-level region). The top-level region is introduced by a separate VPlan transform. This replaces the previous approach of creating the vector loop region before building the VPlan CFG for the input loop.

This simplifies the HCFG builder (which should probably be renamed) and moves along the roadmap ('buildLoop') outlined in [1].

As a follow-up, I plan to also preserve the exit branches in the initial VPlan out of the CFG builder, including connections to the exit blocks. The conversion from a plain CFG with potentially multiple exits to a single entry/exit region will be done as a VPlan transform in a follow-up.

This is needed to enable VPlan-based predication. Currently early exit support relies on building the block-in masks on the original CFG, because exiting branches and conditions aren't preserved in the VPlan. So in order to switch to VPlan-based predication, we will have to preserve them in the initial plain CFG, so the exit conditions are available explicitly when we convert to single entry/exit regions.

Another follow-up is updating the outer loop handling to also introduce VPRegionBlocks for nested loops as a transform. Currently the existing logic in the builder takes care of creating VPRegionBlocks for nested loops, but not for the top-level loop.

[1] https://llvm.org/devmtg/2023-10/slides/techtalks/Hahn-VPlan-StatusUpdateAndRoadmap.pdf
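The two-phase construction described in the commit message can be sketched with a toy CFG. This is only an illustration under simplifying assumptions: `Block`, `ToyPlan`, `connect`, `disconnect`, `buildPlainCFG` and `introduceTopLevelRegion` are hypothetical stand-ins, not the LLVM `VPBlockBase`/`VPlan` API, and the sketch reuses the original latch as the region's exiting block instead of inserting a separate `vector.latch` block as the real transform does.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a VPlan block; NOT the LLVM VPBlockBase API.
struct Block {
  std::string Name;
  std::vector<Block *> Preds, Succs;
  Block *Parent = nullptr;        // enclosing region, if any
  Block *RegionEntry = nullptr;   // set only on region blocks
  Block *RegionExiting = nullptr; // set only on region blocks
};

// Toy stand-in for a plan that owns its blocks (hypothetical type).
struct ToyPlan {
  std::vector<std::unique_ptr<Block>> Storage;
  Block *Entry = nullptr;

  Block *create(std::string Name) {
    Storage.push_back(std::make_unique<Block>());
    Storage.back()->Name = std::move(Name);
    return Storage.back().get();
  }
};

void connect(Block *From, Block *To) {
  From->Succs.push_back(To);
  To->Preds.push_back(From);
}

void disconnect(Block *From, Block *To) {
  auto &S = From->Succs;
  S.erase(std::remove(S.begin(), S.end(), To), S.end());
  auto &P = To->Preds;
  P.erase(std::remove(P.begin(), P.end(), From), P.end());
}

// Phase 1: a plain CFG with entry -> header and a latch -> header back edge.
ToyPlan buildPlainCFG() {
  ToyPlan Plan;
  Plan.Entry = Plan.create("entry");
  Block *Header = Plan.create("header");
  Block *Latch = Plan.create("latch");
  connect(Header, Latch);
  connect(Latch, Header); // back edge
  connect(Plan.Entry, Header);
  return Plan;
}

// Phase 2: cut the entry edge and the back edge, then wrap the loop blocks in
// a region placed between a new vector preheader and a new middle block.
Block *introduceTopLevelRegion(ToyPlan &Plan) {
  assert(Plan.Entry->Succs.size() == 1 && "expected entry -> header only");
  Block *Header = Plan.Entry->Succs.front();
  disconnect(Plan.Entry, Header);

  // With the entry edge gone, the header's only predecessor is the latch.
  assert(Header->Preds.size() == 1 && "expected a single back edge");
  Block *Latch = Header->Preds.front();
  disconnect(Latch, Header);

  Block *VecPH = Plan.create("vector.ph");
  connect(Plan.Entry, VecPH);

  Block *Region = Plan.create("vector loop");
  Region->RegionEntry = Header;
  Region->RegionExiting = Latch;

  // Re-parent every block reachable from the header into the region.
  std::vector<Block *> Worklist{Header};
  while (!Worklist.empty()) {
    Block *B = Worklist.back();
    Worklist.pop_back();
    if (B->Parent == Region)
      continue;
    B->Parent = Region;
    for (Block *S : B->Succs)
      Worklist.push_back(S);
  }

  connect(VecPH, Region);
  connect(Region, Plan.create("middle.block"));
  return Region;
}
```

Calling `buildPlainCFG()` and then `introduceTopLevelRegion()` on the result leaves `entry -> vector.ph -> vector loop -> middle.block` at the top level, with the header and latch owned by the region and no explicit back edge left in the CFG.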
1 parent 900220d commit 4a3ec74

7 files changed, +133 -141 lines changed

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

+9 -7
@@ -9312,14 +9312,15 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
             return !CM.requiresScalarEpilogue(VF.isVector());
           },
           Range);
-  VPlanPtr Plan = VPlan::createInitialVPlan(Legal->getWidestInductionType(),
-                                            PSE, RequiresScalarEpilogueCheck,
-                                            CM.foldTailByMasking(), OrigLoop);
-
+  auto Plan = std::make_unique<VPlan>(OrigLoop);
   // Build hierarchical CFG.
   VPlanHCFGBuilder HCFGBuilder(OrigLoop, LI, *Plan);
   HCFGBuilder.buildHierarchicalCFG();
 
+  VPlanTransforms::introduceTopLevelVectorLoopRegion(
+      *Plan, Legal->getWidestInductionType(), PSE, RequiresScalarEpilogueCheck,
+      CM.foldTailByMasking(), OrigLoop);
+
   // Don't use getDecisionAndClampRange here, because we don't know the UF
   // so this function is better to be conservative, rather than to split
   // it up into different VPlans.
@@ -9615,13 +9616,14 @@ VPlanPtr LoopVectorizationPlanner::buildVPlan(VFRange &Range) {
   assert(EnableVPlanNativePath && "VPlan-native path is not enabled.");
 
   // Create new empty VPlan
-  auto Plan = VPlan::createInitialVPlan(Legal->getWidestInductionType(), PSE,
-                                        true, false, OrigLoop);
-
+  auto Plan = std::make_unique<VPlan>(OrigLoop);
   // Build hierarchical CFG
   VPlanHCFGBuilder HCFGBuilder(OrigLoop, LI, *Plan);
   HCFGBuilder.buildHierarchicalCFG();
 
+  VPlanTransforms::introduceTopLevelVectorLoopRegion(
+      *Plan, Legal->getWidestInductionType(), PSE, true, false, OrigLoop);
+
   for (ElementCount VF : Range)
     Plan->addVF(VF);
 

llvm/lib/Transforms/Vectorize/VPlan.cpp

+7 -84
@@ -880,85 +880,6 @@ VPlan::~VPlan() {
     delete BackedgeTakenCount;
 }
 
-VPlanPtr VPlan::createInitialVPlan(Type *InductionTy,
-                                   PredicatedScalarEvolution &PSE,
-                                   bool RequiresScalarEpilogueCheck,
-                                   bool TailFolded, Loop *TheLoop) {
-  auto Plan = std::make_unique<VPlan>(TheLoop);
-  VPBlockBase *ScalarHeader = Plan->getScalarHeader();
-
-  // Connect entry only to vector preheader initially. Entry will also be
-  // connected to the scalar preheader later, during skeleton creation when
-  // runtime guards are added as needed. Note that when executing the VPlan for
-  // an epilogue vector loop, the original entry block here will be replaced by
-  // a new VPIRBasicBlock wrapping the entry to the epilogue vector loop after
-  // generating code for the main vector loop.
-  VPBasicBlock *VecPreheader = Plan->createVPBasicBlock("vector.ph");
-  VPBlockUtils::connectBlocks(Plan->getEntry(), VecPreheader);
-
-  // Create SCEV and VPValue for the trip count.
-  // We use the symbolic max backedge-taken-count, which works also when
-  // vectorizing loops with uncountable early exits.
-  const SCEV *BackedgeTakenCountSCEV = PSE.getSymbolicMaxBackedgeTakenCount();
-  assert(!isa<SCEVCouldNotCompute>(BackedgeTakenCountSCEV) &&
-         "Invalid loop count");
-  ScalarEvolution &SE = *PSE.getSE();
-  const SCEV *TripCount = SE.getTripCountFromExitCount(BackedgeTakenCountSCEV,
-                                                       InductionTy, TheLoop);
-  Plan->TripCount =
-      vputils::getOrCreateVPValueForSCEVExpr(*Plan, TripCount, SE);
-
-  // Create VPRegionBlock, with empty header and latch blocks, to be filled
-  // during processing later.
-  VPBasicBlock *HeaderVPBB = Plan->createVPBasicBlock("vector.body");
-  VPBasicBlock *LatchVPBB = Plan->createVPBasicBlock("vector.latch");
-  VPBlockUtils::insertBlockAfter(LatchVPBB, HeaderVPBB);
-  auto *TopRegion = Plan->createVPRegionBlock(
-      HeaderVPBB, LatchVPBB, "vector loop", false /*isReplicator*/);
-
-  VPBlockUtils::insertBlockAfter(TopRegion, VecPreheader);
-  VPBasicBlock *MiddleVPBB = Plan->createVPBasicBlock("middle.block");
-  VPBlockUtils::insertBlockAfter(MiddleVPBB, TopRegion);
-
-  VPBasicBlock *ScalarPH = Plan->createVPBasicBlock("scalar.ph");
-  VPBlockUtils::connectBlocks(ScalarPH, ScalarHeader);
-  if (!RequiresScalarEpilogueCheck) {
-    VPBlockUtils::connectBlocks(MiddleVPBB, ScalarPH);
-    return Plan;
-  }
-
-  // If needed, add a check in the middle block to see if we have completed
-  // all of the iterations in the first vector loop. Three cases:
-  // 1) If (N - N%VF) == N, then we *don't* need to run the remainder.
-  //    Thus if tail is to be folded, we know we don't need to run the
-  //    remainder and we can set the condition to true.
-  // 2) If we require a scalar epilogue, there is no conditional branch as
-  //    we unconditionally branch to the scalar preheader. Do nothing.
-  // 3) Otherwise, construct a runtime check.
-  BasicBlock *IRExitBlock = TheLoop->getUniqueLatchExitBlock();
-  VPIRBasicBlock *VPExitBlock = Plan->getExitBlock(IRExitBlock);
-  // The connection order corresponds to the operands of the conditional branch.
-  VPBlockUtils::insertBlockAfter(VPExitBlock, MiddleVPBB);
-  VPBlockUtils::connectBlocks(MiddleVPBB, ScalarPH);
-
-  auto *ScalarLatchTerm = TheLoop->getLoopLatch()->getTerminator();
-  // Here we use the same DebugLoc as the scalar loop latch terminator instead
-  // of the corresponding compare because they may have ended up with
-  // different line numbers and we want to avoid awkward line stepping while
-  // debugging. Eg. if the compare has got a line number inside the loop.
-  VPBuilder Builder(MiddleVPBB);
-  VPValue *Cmp =
-      TailFolded
-          ? Plan->getOrAddLiveIn(ConstantInt::getTrue(
-                IntegerType::getInt1Ty(TripCount->getType()->getContext())))
-          : Builder.createICmp(CmpInst::ICMP_EQ, Plan->getTripCount(),
-                               &Plan->getVectorTripCount(),
-                               ScalarLatchTerm->getDebugLoc(), "cmp.n");
-  Builder.createNaryOp(VPInstruction::BranchOnCond, {Cmp},
-                       ScalarLatchTerm->getDebugLoc());
-  return Plan;
-}
-
 void VPlan::prepareToExecute(Value *TripCountV, Value *VectorTripCountV,
                              VPTransformState &State) {
   Type *TCTy = TripCountV->getType();
@@ -1135,11 +1056,13 @@ void VPlan::printLiveIns(raw_ostream &O) const {
   }
 
   O << "\n";
-  if (TripCount->isLiveIn())
-    O << "Live-in ";
-  TripCount->printAsOperand(O, SlotTracker);
-  O << " = original trip-count";
-  O << "\n";
+  if (TripCount) {
+    if (TripCount->isLiveIn())
+      O << "Live-in ";
+    TripCount->printAsOperand(O, SlotTracker);
+    O << " = original trip-count";
+    O << "\n";
+  }
 }
 
 LLVM_DUMP_METHOD

llvm/lib/Transforms/Vectorize/VPlan.h

+2 -15
@@ -3505,21 +3505,6 @@ class VPlan {
     VPBB->setPlan(this);
   }
 
-  /// Create initial VPlan, having an "entry" VPBasicBlock (wrapping
-  /// original scalar pre-header) which contains SCEV expansions that need
-  /// to happen before the CFG is modified (when executing a VPlan for the
-  /// epilogue vector loop, the original entry needs to be replaced by a new
-  /// one); a VPBasicBlock for the vector pre-header, followed by a region for
-  /// the vector loop, followed by the middle VPBasicBlock. If a check is needed
-  /// to guard executing the scalar epilogue loop, it will be added to the
-  /// middle block, together with VPBasicBlocks for the scalar preheader and
-  /// exit blocks. \p InductionTy is the type of the canonical induction and
-  /// used for related values, like the trip count expression.
-  static VPlanPtr createInitialVPlan(Type *InductionTy,
-                                     PredicatedScalarEvolution &PSE,
-                                     bool RequiresScalarEpilogueCheck,
-                                     bool TailFolded, Loop *TheLoop);
-
   /// Prepare the plan for execution, setting up the required live-in values.
   void prepareToExecute(Value *TripCount, Value *VectorTripCount,
                         VPTransformState &State);
@@ -3589,6 +3574,8 @@ class VPlan {
     TripCount = NewTripCount;
   }
 
+  void setTripCount(VPValue *NewTripCount) { TripCount = NewTripCount; }
+
   /// The backedge taken count of the original loop.
   VPValue *getOrCreateBackedgeTakenCount() {
     if (!BackedgeTakenCount)

llvm/lib/Transforms/Vectorize/VPlanHCFGBuilder.cpp

+20 -33
@@ -180,7 +180,7 @@ VPBasicBlock *PlainCFGBuilder::getOrCreateVPBB(BasicBlock *BB) {
 
   // Get or create a region for the loop containing BB.
   Loop *LoopOfBB = LI->getLoopFor(BB);
-  if (!LoopOfBB || !doesContainLoop(LoopOfBB, TheLoop))
+  if (!LoopOfBB || LoopOfBB == TheLoop || !doesContainLoop(LoopOfBB, TheLoop))
     return VPBB;
 
   auto *RegionOfVPBB = Loop2Region.lookup(LoopOfBB);
@@ -353,29 +353,6 @@ void PlainCFGBuilder::createVPInstructionsForVPBB(VPBasicBlock *VPBB,
 // Main interface to build the plain CFG.
 void PlainCFGBuilder::buildPlainCFG(
     DenseMap<VPBlockBase *, BasicBlock *> &VPB2IRBB) {
-  // 0. Reuse the top-level region, vector-preheader and exit VPBBs from the
-  // skeleton. These were created directly rather than via getOrCreateVPBB(),
-  // revisit them now to update BB2VPBB. Note that header/entry and
-  // latch/exiting VPBB's of top-level region have yet to be created.
-  VPRegionBlock *TheRegion = Plan.getVectorLoopRegion();
-  BasicBlock *ThePreheaderBB = TheLoop->getLoopPreheader();
-  assert((ThePreheaderBB->getTerminator()->getNumSuccessors() == 1) &&
-         "Unexpected loop preheader");
-  auto *VectorPreheaderVPBB =
-      cast<VPBasicBlock>(TheRegion->getSinglePredecessor());
-  // ThePreheaderBB conceptually corresponds to both Plan.getPreheader() (which
-  // wraps the original preheader BB) and Plan.getEntry() (which represents the
-  // new vector preheader); here we're interested in setting BB2VPBB to the
-  // latter.
-  BB2VPBB[ThePreheaderBB] = VectorPreheaderVPBB;
-  Loop2Region[LI->getLoopFor(TheLoop->getHeader())] = TheRegion;
-
-  // The existing vector region's entry and exiting VPBBs correspond to the loop
-  // header and latch.
-  VPBasicBlock *VectorHeaderVPBB = TheRegion->getEntryBasicBlock();
-  VPBasicBlock *VectorLatchVPBB = TheRegion->getExitingBasicBlock();
-  BB2VPBB[TheLoop->getHeader()] = VectorHeaderVPBB;
-  VectorHeaderVPBB->clearSuccessors();
 
   // 1. Scan the body of the loop in a topological order to visit each basic
   // block after having visited its predecessor basic blocks. Create a VPBB for
@@ -386,6 +363,9 @@ void PlainCFGBuilder::buildPlainCFG(
 
   // Loop PH needs to be explicitly visited since it's not taken into account by
   // LoopBlocksDFS.
+  BasicBlock *ThePreheaderBB = TheLoop->getLoopPreheader();
+  assert((ThePreheaderBB->getTerminator()->getNumSuccessors() == 1) &&
+         "Unexpected loop preheader");
   for (auto &I : *ThePreheaderBB) {
     if (I.getType()->isVoidTy())
       continue;
@@ -406,18 +386,16 @@ void PlainCFGBuilder::buildPlainCFG(
     } else {
       // BB is a loop header, set the predecessor for the region, except for the
       // top region, whose predecessor was set when creating VPlan's skeleton.
-      assert(isHeaderVPBB(VPBB) && "isHeaderBB and isHeaderVPBB disagree");
-      if (TheRegion != Region)
+      if (LoopForBB != TheLoop)
         setRegionPredsFromBB(Region, BB);
     }
 
     // Create VPInstructions for BB.
     createVPInstructionsForVPBB(VPBB, BB);
 
-    if (TheLoop->getLoopLatch() == BB) {
-      VPBB->setOneSuccessor(VectorLatchVPBB);
-      VectorLatchVPBB->clearPredecessors();
-      VectorLatchVPBB->setPredecessors({VPBB});
+    if (BB == TheLoop->getLoopLatch()) {
+      VPBasicBlock *HeaderVPBB = getOrCreateVPBB(LoopForBB->getHeader());
+      VPBlockUtils::connectBlocks(VPBB, HeaderVPBB);
       continue;
     }
 
@@ -449,16 +427,22 @@ void PlainCFGBuilder::buildPlainCFG(
     VPBasicBlock *Successor0 = getOrCreateVPBB(IRSucc0);
     VPBasicBlock *Successor1 = getOrCreateVPBB(IRSucc1);
     if (BB == LoopForBB->getLoopLatch()) {
-      // For a latch we need to set the successor of the region rather than that
-      // of VPBB and it should be set to the exit, i.e., non-header successor,
+      // For a latch we need to set the successor of the region rather
+      // than that
+      // of VPBB and it should be set to the exit, i.e., non-header
+      // successor,
       // except for the top region, whose successor was set when creating
       // VPlan's skeleton.
-      assert(TheRegion != Region &&
+      assert(LoopForBB != TheLoop &&
              "Latch of the top region should have been handled earlier");
       Region->setOneSuccessor(isHeaderVPBB(Successor0) ? Successor1
                                                        : Successor0);
       Region->setExiting(VPBB);
       continue;
+
+      VPBasicBlock *HeaderVPBB = getOrCreateVPBB(LoopForBB->getHeader());
+      VPBlockUtils::connectBlocks(VPBB, HeaderVPBB);
+      continue;
     }
 
     // Don't connect any blocks outside the current loop except the latch for
@@ -482,6 +466,9 @@ void PlainCFGBuilder::buildPlainCFG(
   // corresponding VPlan operands.
   fixHeaderPhis();
 
+  VPBlockUtils::connectBlocks(Plan.getEntry(),
+                              getOrCreateVPBB(TheLoop->getHeader()));
+
   for (const auto &[IRBB, VPB] : BB2VPBB)
     VPB2IRBB[VPB] = IRBB;
 }

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

+76
@@ -32,6 +32,82 @@
 
 using namespace llvm;
 
+void VPlanTransforms::introduceTopLevelVectorLoopRegion(
+    VPlan &Plan, Type *InductionTy, PredicatedScalarEvolution &PSE,
+    bool RequiresScalarEpilogueCheck, bool TailFolded, Loop *TheLoop) {
+  auto *HeaderVPBB = cast<VPBasicBlock>(Plan.getEntry()->getSingleSuccessor());
+  VPBlockUtils::disconnectBlocks(Plan.getEntry(), HeaderVPBB);
+
+  VPBasicBlock *OriginalLatch =
+      cast<VPBasicBlock>(HeaderVPBB->getSinglePredecessor());
+  VPBlockUtils::disconnectBlocks(OriginalLatch, HeaderVPBB);
+  VPBasicBlock *VecPreheader = Plan.createVPBasicBlock("vector.ph");
+  VPBlockUtils::connectBlocks(Plan.getEntry(), VecPreheader);
+
+  // Create SCEV and VPValue for the trip count.
+  // We use the symbolic max backedge-taken-count, which works also when
+  // vectorizing loops with uncountable early exits.
+  const SCEV *BackedgeTakenCountSCEV = PSE.getSymbolicMaxBackedgeTakenCount();
+  assert(!isa<SCEVCouldNotCompute>(BackedgeTakenCountSCEV) &&
+         "Invalid loop count");
+  ScalarEvolution &SE = *PSE.getSE();
+  const SCEV *TripCount = SE.getTripCountFromExitCount(BackedgeTakenCountSCEV,
+                                                       InductionTy, TheLoop);
+  Plan.setTripCount(
+      vputils::getOrCreateVPValueForSCEVExpr(Plan, TripCount, SE));
+
+  // Create VPRegionBlock, with empty header and latch blocks, to be filled
+  // during processing later.
+  VPBasicBlock *LatchVPBB = Plan.createVPBasicBlock("vector.latch");
+  VPBlockUtils::insertBlockAfter(LatchVPBB, OriginalLatch);
+  auto *TopRegion = Plan.createVPRegionBlock(
+      HeaderVPBB, LatchVPBB, "vector loop", false /*isReplicator*/);
+  for (VPBlockBase *VPBB : vp_depth_first_shallow(HeaderVPBB)) {
+    VPBB->setParent(TopRegion);
+  }
+
+  VPBlockUtils::insertBlockAfter(TopRegion, VecPreheader);
+  VPBasicBlock *MiddleVPBB = Plan.createVPBasicBlock("middle.block");
+  VPBlockUtils::insertBlockAfter(MiddleVPBB, TopRegion);
+
+  VPBasicBlock *ScalarPH = Plan.createVPBasicBlock("scalar.ph");
+  VPBlockUtils::connectBlocks(ScalarPH, Plan.getScalarHeader());
+  if (!RequiresScalarEpilogueCheck) {
+    VPBlockUtils::connectBlocks(MiddleVPBB, ScalarPH);
+    return;
+  }
+
+  // If needed, add a check in the middle block to see if we have completed
+  // all of the iterations in the first vector loop. Three cases:
+  // 1) If (N - N%VF) == N, then we *don't* need to run the remainder.
+  //    Thus if tail is to be folded, we know we don't need to run the
+  //    remainder and we can set the condition to true.
+  // 2) If we require a scalar epilogue, there is no conditional branch as
+  //    we unconditionally branch to the scalar preheader. Do nothing.
+  // 3) Otherwise, construct a runtime check.
+  BasicBlock *IRExitBlock = TheLoop->getUniqueLatchExitBlock();
+  auto *VPExitBlock = Plan.getExitBlock(IRExitBlock);
+  // The connection order corresponds to the operands of the conditional branch.
+  VPBlockUtils::insertBlockAfter(VPExitBlock, MiddleVPBB);
+  VPBlockUtils::connectBlocks(MiddleVPBB, ScalarPH);
+
+  auto *ScalarLatchTerm = TheLoop->getLoopLatch()->getTerminator();
+  // Here we use the same DebugLoc as the scalar loop latch terminator instead
+  // of the corresponding compare because they may have ended up with
+  // different line numbers and we want to avoid awkward line stepping while
+  // debugging. Eg. if the compare has got a line number inside the loop.
+  VPBuilder Builder(MiddleVPBB);
+  VPValue *Cmp =
+      TailFolded
+          ? Plan.getOrAddLiveIn(ConstantInt::getTrue(
+                IntegerType::getInt1Ty(TripCount->getType()->getContext())))
+          : Builder.createICmp(CmpInst::ICMP_EQ, Plan.getTripCount(),
+                               &Plan.getVectorTripCount(),
+                               ScalarLatchTerm->getDebugLoc(), "cmp.n");
+  Builder.createNaryOp(VPInstruction::BranchOnCond, {Cmp},
+                       ScalarLatchTerm->getDebugLoc());
+}
+
 void VPlanTransforms::VPInstructionsToVPRecipes(
     VPlanPtr &Plan,
     function_ref<const InductionDescriptor *(PHINode *)>
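The three cases listed in the middle-block comment of `introduceTopLevelVectorLoopRegion` can be restated as a small decision function. This is a sketch for illustration only: `MiddleBranch`, `classifyMiddleBranch`, `vectorTripCount` and `skipsScalarRemainder` are hypothetical names that do not exist in LLVM, and `Step` stands in for the number of scalar iterations covered per vector iteration (the source comment writes it as `VF`).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical summary of how the middle block ends up branching.
enum class MiddleBranch {
  AlwaysScalarPH,   // no check: unconditional edge to scalar.ph
  AlwaysExit,       // tail folded: the branch condition is the constant true
  CompareTripCounts // runtime check: trip count == vector trip count
};

// Mirrors the control flow of the transform: the early return when no check
// is required, then the TailFolded selection of the compare operand.
MiddleBranch classifyMiddleBranch(bool RequiresScalarEpilogueCheck,
                                  bool TailFolded) {
  if (!RequiresScalarEpilogueCheck)
    return MiddleBranch::AlwaysScalarPH;
  if (TailFolded)
    return MiddleBranch::AlwaysExit;
  return MiddleBranch::CompareTripCounts;
}

// N - N % Step: the number of scalar iterations the vector loop covers.
uint64_t vectorTripCount(uint64_t N, uint64_t Step) {
  assert(Step != 0 && "step must be non-zero");
  return N - N % Step;
}

// Case 3's runtime check: the remainder loop is skipped only when the vector
// loop already covered every iteration.
bool skipsScalarRemainder(uint64_t N, uint64_t Step) {
  return vectorTripCount(N, Step) == N;
}
```

For example, with `N = 10` and `Step = 4` the vector loop covers 8 iterations, so the comparison fails and the scalar remainder runs; with `N = 16` it covers all 16 and the remainder is skipped.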

llvm/lib/Transforms/Vectorize/VPlanTransforms.h

+15
@@ -52,6 +52,21 @@ struct VPlanTransforms {
       verifyVPlanIsValid(Plan);
   }
 
+  /// Introduce the top-level VPRegionBlock for the main loop in \p Plan. Coming
+  /// into this function, \p Plan's top-level loop is modeled using a plain CFG.
+  /// This transform replaces the plain CFG with a VPRegionBlock wrapping the
+  /// top-level loop and creates a VPValue expression for the original trip
+  /// count. It will also introduce a dedicated VPBasicBlock for the vector
+  /// pre-header as well as a VPBasicBlock as exit block of the region
+  /// (middle.block). If a check is needed to guard executing the scalar
+  /// epilogue loop, it will be added to the middle block, together with
+  /// VPBasicBlocks for the scalar preheader and exit blocks. \p InductionTy is
+  /// the type of the canonical induction and used for related values, like the
+  /// trip count expression.
+  static void introduceTopLevelVectorLoopRegion(
+      VPlan &Plan, Type *InductionTy, PredicatedScalarEvolution &PSE,
+      bool RequiresScalarEpilogueCheck, bool TailFolded, Loop *TheLoop);
+
   /// Replaces the VPInstructions in \p Plan with corresponding
   /// widen recipes.
   static void
