Commit e9d2614

[VPlan] Refactor VPlan creation, add transform introducing region (NFC).
Create an empty VPlan first, then let the HCFG builder create a plain CFG for the top-level loop (w/o a top-level region). The top-level region is introduced by a separate VPlan-transform. This is instead of creating the vector loop region before building the VPlan CFG for the input loop.

This simplifies the HCFG builder (which should probably be renamed) and moves along the roadmap ('buildLoop') outlined in [1].

As follow-up, I plan to also preserve the exit branches in the initial VPlan out of the CFG builder, including connections to the exit blocks.

The conversion from plain CFG with potentially multiple exits to a single entry/exit region will be done as VPlan transform in a follow-up.

This is needed to enable VPlan-based predication. Currently early exit support relies on building the block-in masks on the original CFG, because exiting branches and conditions aren't preserved in the VPlan. So in order to switch to VPlan-based predication, we will have to preserve them in the initial plain CFG, so the exit conditions are available explicitly when we convert to single entry/exit regions.

Another follow-up is updating the outer loop handling to also introduce VPRegionBlocks for nested loops as transform. Currently the existing logic in the builder will take care of creating VPRegionBlocks for nested loops, but not the top-level loop.

[1] https://llvm.org/devmtg/2023-10/slides/techtalks/Hahn-VPlan-StatusUpdateAndRoadmap.pdf
1 parent 72791fe commit e9d2614
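
The ordering described in the commit message (empty plan, then plain CFG, then a transform that wraps the loop in a region) can be sketched with a toy model. This is only an illustration under stated assumptions: `ToyBlock`, `ToyPlan`, `connect`, `disconnect`, `buildPlainCFG` and `introduceTopLevelRegion` are hypothetical names invented here, not LLVM APIs, and the model ignores recipes, trip counts, the scalar preheader and exit blocks.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Toy stand-ins for illustration only; these are NOT the LLVM VPlan classes.
struct ToyBlock {
  std::string Name;
  std::vector<ToyBlock *> Preds, Succs;
  ToyBlock *Parent = nullptr;  // enclosing region, if any
  ToyBlock *Entry = nullptr;   // set only for regions
  ToyBlock *Exiting = nullptr; // set only for regions
};

struct ToyPlan {
  std::vector<std::unique_ptr<ToyBlock>> Blocks;
  ToyBlock *Entry = nullptr;

  // Step 1: an "empty" plan owns just its entry block.
  ToyPlan() { Entry = create("entry"); }

  ToyBlock *create(const std::string &Name) {
    Blocks.push_back(std::make_unique<ToyBlock>());
    Blocks.back()->Name = Name;
    return Blocks.back().get();
  }
};

void connect(ToyBlock *From, ToyBlock *To) {
  From->Succs.push_back(To);
  To->Preds.push_back(From);
}

void disconnect(ToyBlock *From, ToyBlock *To) {
  auto &S = From->Succs;
  S.erase(std::remove(S.begin(), S.end(), To), S.end());
  auto &P = To->Preds;
  P.erase(std::remove(P.begin(), P.end(), From), P.end());
}

// Step 2: build a plain CFG for the loop, with an explicit backedge and no
// region: entry -> header -> latch -> header.
void buildPlainCFG(ToyPlan &Plan) {
  ToyBlock *Header = Plan.create("loop.header");
  ToyBlock *Latch = Plan.create("loop.latch");
  connect(Header, Latch);
  connect(Latch, Header); // backedge
  connect(Plan.Entry, Header);
}

// Step 3: a separate transform cuts the entry edge and the backedge, then
// wraps the loop blocks in a region between vector.ph and middle.block.
ToyBlock *introduceTopLevelRegion(ToyPlan &Plan) {
  ToyBlock *Header = Plan.Entry->Succs.front();
  disconnect(Plan.Entry, Header);
  ToyBlock *OldLatch = Header->Preds.front(); // only the backedge is left
  disconnect(OldLatch, Header);

  ToyBlock *VecPH = Plan.create("vector.ph");
  connect(Plan.Entry, VecPH);

  ToyBlock *NewLatch = Plan.create("vector.latch");
  connect(OldLatch, NewLatch);

  ToyBlock *Region = Plan.create("vector loop");
  Region->Entry = Header;
  Region->Exiting = NewLatch;

  // Every block reachable from the header now belongs to the region.
  std::vector<ToyBlock *> Work{Header};
  while (!Work.empty()) {
    ToyBlock *B = Work.back();
    Work.pop_back();
    if (B->Parent == Region)
      continue;
    B->Parent = Region;
    for (ToyBlock *S : B->Succs)
      Work.push_back(S);
  }

  connect(VecPH, Region);
  connect(Region, Plan.create("middle.block"));
  return Region;
}
```

Calling `buildPlainCFG(Plan)` followed by `introduceTopLevelRegion(Plan)` on a fresh `ToyPlan` leaves `entry -> vector.ph -> vector loop -> middle.block` at the top level. The point of the split is that the builder no longer needs to know about a pre-existing skeleton region.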

7 files changed, +133 -141 lines changed

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

+9-7
@@ -9307,14 +9307,15 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
93079307
return !CM.requiresScalarEpilogue(VF.isVector());
93089308
},
93099309
Range);
9310-
VPlanPtr Plan = VPlan::createInitialVPlan(Legal->getWidestInductionType(),
9311-
PSE, RequiresScalarEpilogueCheck,
9312-
CM.foldTailByMasking(), OrigLoop);
9313-
9310+
auto Plan = std::make_unique<VPlan>(OrigLoop);
93149311
// Build hierarchical CFG.
93159312
VPlanHCFGBuilder HCFGBuilder(OrigLoop, LI, *Plan);
93169313
HCFGBuilder.buildHierarchicalCFG();
93179314

9315+
VPlanTransforms::introduceTopLevelVectorLoopRegion(
9316+
*Plan, Legal->getWidestInductionType(), PSE, RequiresScalarEpilogueCheck,
9317+
CM.foldTailByMasking(), OrigLoop);
9318+
93189319
// Don't use getDecisionAndClampRange here, because we don't know the UF
93199320
// so this function is better to be conservative, rather than to split
93209321
// it up into different VPlans.
@@ -9610,13 +9611,14 @@ VPlanPtr LoopVectorizationPlanner::buildVPlan(VFRange &Range) {
96109611
assert(EnableVPlanNativePath && "VPlan-native path is not enabled.");
96119612

96129613
// Create new empty VPlan
9613-
auto Plan = VPlan::createInitialVPlan(Legal->getWidestInductionType(), PSE,
9614-
true, false, OrigLoop);
9615-
9614+
auto Plan = std::make_unique<VPlan>(OrigLoop);
96169615
// Build hierarchical CFG
96179616
VPlanHCFGBuilder HCFGBuilder(OrigLoop, LI, *Plan);
96189617
HCFGBuilder.buildHierarchicalCFG();
96199618

9619+
VPlanTransforms::introduceTopLevelVectorLoopRegion(
9620+
*Plan, Legal->getWidestInductionType(), PSE, true, false, OrigLoop);
9621+
96209622
for (ElementCount VF : Range)
96219623
Plan->addVF(VF);
96229624

llvm/lib/Transforms/Vectorize/VPlan.cpp

+7-84
@@ -876,85 +876,6 @@ VPlan::~VPlan() {
876876
delete BackedgeTakenCount;
877877
}
878878

879-
VPlanPtr VPlan::createInitialVPlan(Type *InductionTy,
880-
PredicatedScalarEvolution &PSE,
881-
bool RequiresScalarEpilogueCheck,
882-
bool TailFolded, Loop *TheLoop) {
883-
auto Plan = std::make_unique<VPlan>(TheLoop);
884-
VPBlockBase *ScalarHeader = Plan->getScalarHeader();
885-
886-
// Connect entry only to vector preheader initially. Entry will also be
887-
// connected to the scalar preheader later, during skeleton creation when
888-
// runtime guards are added as needed. Note that when executing the VPlan for
889-
// an epilogue vector loop, the original entry block here will be replaced by
890-
// a new VPIRBasicBlock wrapping the entry to the epilogue vector loop after
891-
// generating code for the main vector loop.
892-
VPBasicBlock *VecPreheader = Plan->createVPBasicBlock("vector.ph");
893-
VPBlockUtils::connectBlocks(Plan->getEntry(), VecPreheader);
894-
895-
// Create SCEV and VPValue for the trip count.
896-
// We use the symbolic max backedge-taken-count, which works also when
897-
// vectorizing loops with uncountable early exits.
898-
const SCEV *BackedgeTakenCountSCEV = PSE.getSymbolicMaxBackedgeTakenCount();
899-
assert(!isa<SCEVCouldNotCompute>(BackedgeTakenCountSCEV) &&
900-
"Invalid loop count");
901-
ScalarEvolution &SE = *PSE.getSE();
902-
const SCEV *TripCount = SE.getTripCountFromExitCount(BackedgeTakenCountSCEV,
903-
InductionTy, TheLoop);
904-
Plan->TripCount =
905-
vputils::getOrCreateVPValueForSCEVExpr(*Plan, TripCount, SE);
906-
907-
// Create VPRegionBlock, with empty header and latch blocks, to be filled
908-
// during processing later.
909-
VPBasicBlock *HeaderVPBB = Plan->createVPBasicBlock("vector.body");
910-
VPBasicBlock *LatchVPBB = Plan->createVPBasicBlock("vector.latch");
911-
VPBlockUtils::insertBlockAfter(LatchVPBB, HeaderVPBB);
912-
auto *TopRegion = Plan->createVPRegionBlock(
913-
HeaderVPBB, LatchVPBB, "vector loop", false /*isReplicator*/);
914-
915-
VPBlockUtils::insertBlockAfter(TopRegion, VecPreheader);
916-
VPBasicBlock *MiddleVPBB = Plan->createVPBasicBlock("middle.block");
917-
VPBlockUtils::insertBlockAfter(MiddleVPBB, TopRegion);
918-
919-
VPBasicBlock *ScalarPH = Plan->createVPBasicBlock("scalar.ph");
920-
VPBlockUtils::connectBlocks(ScalarPH, ScalarHeader);
921-
if (!RequiresScalarEpilogueCheck) {
922-
VPBlockUtils::connectBlocks(MiddleVPBB, ScalarPH);
923-
return Plan;
924-
}
925-
926-
// If needed, add a check in the middle block to see if we have completed
927-
// all of the iterations in the first vector loop. Three cases:
928-
// 1) If (N - N%VF) == N, then we *don't* need to run the remainder.
929-
// Thus if tail is to be folded, we know we don't need to run the
930-
// remainder and we can set the condition to true.
931-
// 2) If we require a scalar epilogue, there is no conditional branch as
932-
// we unconditionally branch to the scalar preheader. Do nothing.
933-
// 3) Otherwise, construct a runtime check.
934-
BasicBlock *IRExitBlock = TheLoop->getUniqueLatchExitBlock();
935-
auto *VPExitBlock = Plan->createVPIRBasicBlock(IRExitBlock);
936-
// The connection order corresponds to the operands of the conditional branch.
937-
VPBlockUtils::insertBlockAfter(VPExitBlock, MiddleVPBB);
938-
VPBlockUtils::connectBlocks(MiddleVPBB, ScalarPH);
939-
940-
auto *ScalarLatchTerm = TheLoop->getLoopLatch()->getTerminator();
941-
// Here we use the same DebugLoc as the scalar loop latch terminator instead
942-
// of the corresponding compare because they may have ended up with
943-
// different line numbers and we want to avoid awkward line stepping while
944-
// debugging. Eg. if the compare has got a line number inside the loop.
945-
VPBuilder Builder(MiddleVPBB);
946-
VPValue *Cmp =
947-
TailFolded
948-
? Plan->getOrAddLiveIn(ConstantInt::getTrue(
949-
IntegerType::getInt1Ty(TripCount->getType()->getContext())))
950-
: Builder.createICmp(CmpInst::ICMP_EQ, Plan->getTripCount(),
951-
&Plan->getVectorTripCount(),
952-
ScalarLatchTerm->getDebugLoc(), "cmp.n");
953-
Builder.createNaryOp(VPInstruction::BranchOnCond, {Cmp},
954-
ScalarLatchTerm->getDebugLoc());
955-
return Plan;
956-
}
957-
958879
void VPlan::prepareToExecute(Value *TripCountV, Value *VectorTripCountV,
959880
VPTransformState &State) {
960881
Type *TCTy = TripCountV->getType();
@@ -1123,11 +1044,13 @@ void VPlan::printLiveIns(raw_ostream &O) const {
11231044
}
11241045

11251046
O << "\n";
1126-
if (TripCount->isLiveIn())
1127-
O << "Live-in ";
1128-
TripCount->printAsOperand(O, SlotTracker);
1129-
O << " = original trip-count";
1130-
O << "\n";
1047+
if (TripCount) {
1048+
if (TripCount->isLiveIn())
1049+
O << "Live-in ";
1050+
TripCount->printAsOperand(O, SlotTracker);
1051+
O << " = original trip-count";
1052+
O << "\n";
1053+
}
11311054
}
11321055

11331056
LLVM_DUMP_METHOD

llvm/lib/Transforms/Vectorize/VPlan.h

+2-15
@@ -3500,21 +3500,6 @@ class VPlan {
35003500
VPBB->setPlan(this);
35013501
}
35023502

3503-
/// Create initial VPlan, having an "entry" VPBasicBlock (wrapping
3504-
/// original scalar pre-header) which contains SCEV expansions that need
3505-
/// to happen before the CFG is modified (when executing a VPlan for the
3506-
/// epilogue vector loop, the original entry needs to be replaced by a new
3507-
/// one); a VPBasicBlock for the vector pre-header, followed by a region for
3508-
/// the vector loop, followed by the middle VPBasicBlock. If a check is needed
3509-
/// to guard executing the scalar epilogue loop, it will be added to the
3510-
/// middle block, together with VPBasicBlocks for the scalar preheader and
3511-
/// exit blocks. \p InductionTy is the type of the canonical induction and
3512-
/// used for related values, like the trip count expression.
3513-
static VPlanPtr createInitialVPlan(Type *InductionTy,
3514-
PredicatedScalarEvolution &PSE,
3515-
bool RequiresScalarEpilogueCheck,
3516-
bool TailFolded, Loop *TheLoop);
3517-
35183503
/// Prepare the plan for execution, setting up the required live-in values.
35193504
void prepareToExecute(Value *TripCount, Value *VectorTripCount,
35203505
VPTransformState &State);
@@ -3582,6 +3567,8 @@ class VPlan {
35823567
TripCount = NewTripCount;
35833568
}
35843569

3570+
void setTripCount(VPValue *NewTripCount) { TripCount = NewTripCount; }
3571+
35853572
/// The backedge taken count of the original loop.
35863573
VPValue *getOrCreateBackedgeTakenCount() {
35873574
if (!BackedgeTakenCount)

llvm/lib/Transforms/Vectorize/VPlanHCFGBuilder.cpp

+20-33
@@ -180,7 +180,7 @@ VPBasicBlock *PlainCFGBuilder::getOrCreateVPBB(BasicBlock *BB) {
180180

181181
// Get or create a region for the loop containing BB.
182182
Loop *LoopOfBB = LI->getLoopFor(BB);
183-
if (!LoopOfBB || !doesContainLoop(LoopOfBB, TheLoop))
183+
if (!LoopOfBB || LoopOfBB == TheLoop || !doesContainLoop(LoopOfBB, TheLoop))
184184
return VPBB;
185185

186186
auto *RegionOfVPBB = Loop2Region.lookup(LoopOfBB);
@@ -353,29 +353,6 @@ void PlainCFGBuilder::createVPInstructionsForVPBB(VPBasicBlock *VPBB,
353353
// Main interface to build the plain CFG.
354354
void PlainCFGBuilder::buildPlainCFG(
355355
DenseMap<VPBlockBase *, BasicBlock *> &VPB2IRBB) {
356-
// 0. Reuse the top-level region, vector-preheader and exit VPBBs from the
357-
// skeleton. These were created directly rather than via getOrCreateVPBB(),
358-
// revisit them now to update BB2VPBB. Note that header/entry and
359-
// latch/exiting VPBB's of top-level region have yet to be created.
360-
VPRegionBlock *TheRegion = Plan.getVectorLoopRegion();
361-
BasicBlock *ThePreheaderBB = TheLoop->getLoopPreheader();
362-
assert((ThePreheaderBB->getTerminator()->getNumSuccessors() == 1) &&
363-
"Unexpected loop preheader");
364-
auto *VectorPreheaderVPBB =
365-
cast<VPBasicBlock>(TheRegion->getSinglePredecessor());
366-
// ThePreheaderBB conceptually corresponds to both Plan.getPreheader() (which
367-
// wraps the original preheader BB) and Plan.getEntry() (which represents the
368-
// new vector preheader); here we're interested in setting BB2VPBB to the
369-
// latter.
370-
BB2VPBB[ThePreheaderBB] = VectorPreheaderVPBB;
371-
Loop2Region[LI->getLoopFor(TheLoop->getHeader())] = TheRegion;
372-
373-
// The existing vector region's entry and exiting VPBBs correspond to the loop
374-
// header and latch.
375-
VPBasicBlock *VectorHeaderVPBB = TheRegion->getEntryBasicBlock();
376-
VPBasicBlock *VectorLatchVPBB = TheRegion->getExitingBasicBlock();
377-
BB2VPBB[TheLoop->getHeader()] = VectorHeaderVPBB;
378-
VectorHeaderVPBB->clearSuccessors();
379356

380357
// 1. Scan the body of the loop in a topological order to visit each basic
381358
// block after having visited its predecessor basic blocks. Create a VPBB for
@@ -386,6 +363,9 @@ void PlainCFGBuilder::buildPlainCFG(
386363

387364
// Loop PH needs to be explicitly visited since it's not taken into account by
388365
// LoopBlocksDFS.
366+
BasicBlock *ThePreheaderBB = TheLoop->getLoopPreheader();
367+
assert((ThePreheaderBB->getTerminator()->getNumSuccessors() == 1) &&
368+
"Unexpected loop preheader");
389369
for (auto &I : *ThePreheaderBB) {
390370
if (I.getType()->isVoidTy())
391371
continue;
@@ -406,18 +386,16 @@ void PlainCFGBuilder::buildPlainCFG(
406386
} else {
407387
// BB is a loop header, set the predecessor for the region, except for the
408388
// top region, whose predecessor was set when creating VPlan's skeleton.
409-
assert(isHeaderVPBB(VPBB) && "isHeaderBB and isHeaderVPBB disagree");
410-
if (TheRegion != Region)
389+
if (LoopForBB != TheLoop)
411390
setRegionPredsFromBB(Region, BB);
412391
}
413392

414393
// Create VPInstructions for BB.
415394
createVPInstructionsForVPBB(VPBB, BB);
416395

417-
if (TheLoop->getLoopLatch() == BB) {
418-
VPBB->setOneSuccessor(VectorLatchVPBB);
419-
VectorLatchVPBB->clearPredecessors();
420-
VectorLatchVPBB->setPredecessors({VPBB});
396+
if (BB == TheLoop->getLoopLatch()) {
397+
VPBasicBlock *HeaderVPBB = getOrCreateVPBB(LoopForBB->getHeader());
398+
VPBlockUtils::connectBlocks(VPBB, HeaderVPBB);
421399
continue;
422400
}
423401

@@ -449,16 +427,22 @@ void PlainCFGBuilder::buildPlainCFG(
449427
VPBasicBlock *Successor0 = getOrCreateVPBB(IRSucc0);
450428
VPBasicBlock *Successor1 = getOrCreateVPBB(IRSucc1);
451429
if (BB == LoopForBB->getLoopLatch()) {
452-
// For a latch we need to set the successor of the region rather than that
453-
// of VPBB and it should be set to the exit, i.e., non-header successor,
430+
// For a latch we need to set the successor of the region rather
431+
// than that
432+
// of VPBB and it should be set to the exit, i.e., non-header
433+
// successor,
454434
// except for the top region, whose successor was set when creating
455435
// VPlan's skeleton.
456-
assert(TheRegion != Region &&
436+
assert(LoopForBB != TheLoop &&
457437
"Latch of the top region should have been handled earlier");
458438
Region->setOneSuccessor(isHeaderVPBB(Successor0) ? Successor1
459439
: Successor0);
460440
Region->setExiting(VPBB);
461441
continue;
442+
443+
VPBasicBlock *HeaderVPBB = getOrCreateVPBB(LoopForBB->getHeader());
444+
VPBlockUtils::connectBlocks(VPBB, HeaderVPBB);
445+
continue;
462446
}
463447

464448
// Don't connect any blocks outside the current loop except the latch for
@@ -482,6 +466,9 @@ void PlainCFGBuilder::buildPlainCFG(
482466
// corresponding VPlan operands.
483467
fixHeaderPhis();
484468

469+
VPBlockUtils::connectBlocks(Plan.getEntry(),
470+
getOrCreateVPBB(TheLoop->getHeader()));
471+
485472
for (const auto &[IRBB, VPB] : BB2VPBB)
486473
VPB2IRBB[VPB] = IRBB;
487474
}

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

+76
@@ -32,6 +32,82 @@
3232

3333
using namespace llvm;
3434

35+
void VPlanTransforms::introduceTopLevelVectorLoopRegion(
36+
VPlan &Plan, Type *InductionTy, PredicatedScalarEvolution &PSE,
37+
bool RequiresScalarEpilogueCheck, bool TailFolded, Loop *TheLoop) {
38+
auto *HeaderVPBB = cast<VPBasicBlock>(Plan.getEntry()->getSingleSuccessor());
39+
VPBlockUtils::disconnectBlocks(Plan.getEntry(), HeaderVPBB);
40+
41+
VPBasicBlock *OriginalLatch =
42+
cast<VPBasicBlock>(HeaderVPBB->getSinglePredecessor());
43+
VPBlockUtils::disconnectBlocks(OriginalLatch, HeaderVPBB);
44+
VPBasicBlock *VecPreheader = Plan.createVPBasicBlock("vector.ph");
45+
VPBlockUtils::connectBlocks(Plan.getEntry(), VecPreheader);
46+
47+
// Create SCEV and VPValue for the trip count.
48+
// We use the symbolic max backedge-taken-count, which works also when
49+
// vectorizing loops with uncountable early exits.
50+
const SCEV *BackedgeTakenCountSCEV = PSE.getSymbolicMaxBackedgeTakenCount();
51+
assert(!isa<SCEVCouldNotCompute>(BackedgeTakenCountSCEV) &&
52+
"Invalid loop count");
53+
ScalarEvolution &SE = *PSE.getSE();
54+
const SCEV *TripCount = SE.getTripCountFromExitCount(BackedgeTakenCountSCEV,
55+
InductionTy, TheLoop);
56+
Plan.setTripCount(
57+
vputils::getOrCreateVPValueForSCEVExpr(Plan, TripCount, SE));
58+
59+
// Create VPRegionBlock, with empty header and latch blocks, to be filled
60+
// during processing later.
61+
VPBasicBlock *LatchVPBB = Plan.createVPBasicBlock("vector.latch");
62+
VPBlockUtils::insertBlockAfter(LatchVPBB, OriginalLatch);
63+
auto *TopRegion = Plan.createVPRegionBlock(
64+
HeaderVPBB, LatchVPBB, "vector loop", false /*isReplicator*/);
65+
for (VPBlockBase *VPBB : vp_depth_first_shallow(HeaderVPBB)) {
66+
VPBB->setParent(TopRegion);
67+
}
68+
69+
VPBlockUtils::insertBlockAfter(TopRegion, VecPreheader);
70+
VPBasicBlock *MiddleVPBB = Plan.createVPBasicBlock("middle.block");
71+
VPBlockUtils::insertBlockAfter(MiddleVPBB, TopRegion);
72+
73+
VPBasicBlock *ScalarPH = Plan.createVPBasicBlock("scalar.ph");
74+
VPBlockUtils::connectBlocks(ScalarPH, Plan.getScalarHeader());
75+
if (!RequiresScalarEpilogueCheck) {
76+
VPBlockUtils::connectBlocks(MiddleVPBB, ScalarPH);
77+
return;
78+
}
79+
80+
// If needed, add a check in the middle block to see if we have completed
81+
// all of the iterations in the first vector loop. Three cases:
82+
// 1) If (N - N%VF) == N, then we *don't* need to run the remainder.
83+
// Thus if tail is to be folded, we know we don't need to run the
84+
// remainder and we can set the condition to true.
85+
// 2) If we require a scalar epilogue, there is no conditional branch as
86+
// we unconditionally branch to the scalar preheader. Do nothing.
87+
// 3) Otherwise, construct a runtime check.
88+
BasicBlock *IRExitBlock = TheLoop->getUniqueLatchExitBlock();
89+
auto *VPExitBlock = Plan.createVPIRBasicBlock(IRExitBlock);
90+
// The connection order corresponds to the operands of the conditional branch.
91+
VPBlockUtils::insertBlockAfter(VPExitBlock, MiddleVPBB);
92+
VPBlockUtils::connectBlocks(MiddleVPBB, ScalarPH);
93+
94+
auto *ScalarLatchTerm = TheLoop->getLoopLatch()->getTerminator();
95+
// Here we use the same DebugLoc as the scalar loop latch terminator instead
96+
// of the corresponding compare because they may have ended up with
97+
// different line numbers and we want to avoid awkward line stepping while
98+
// debugging. Eg. if the compare has got a line number inside the loop.
99+
VPBuilder Builder(MiddleVPBB);
100+
VPValue *Cmp =
101+
TailFolded
102+
? Plan.getOrAddLiveIn(ConstantInt::getTrue(
103+
IntegerType::getInt1Ty(TripCount->getType()->getContext())))
104+
: Builder.createICmp(CmpInst::ICMP_EQ, Plan.getTripCount(),
105+
&Plan.getVectorTripCount(),
106+
ScalarLatchTerm->getDebugLoc(), "cmp.n");
107+
Builder.createNaryOp(VPInstruction::BranchOnCond, {Cmp},
108+
ScalarLatchTerm->getDebugLoc());
109+
}
110+
35111
void VPlanTransforms::VPInstructionsToVPRecipes(
36112
VPlanPtr &Plan,
37113
function_ref<const InductionDescriptor *(PHINode *)>
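
The three middle-block cases listed in the comments of `introduceTopLevelVectorLoopRegion` can be sketched as a small decision function. This is a hedged illustration, not code from the commit: `MiddleBranch`, `classifyMiddleBlock` and `skipsScalarRemainder` are hypothetical names, and `Step` is an assumed stand-in for the number of iterations one vector-loop iteration covers (the source comment writes it as VF). `Step` is assumed to be non-zero.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical classification of what the middle block ends with.
enum class MiddleBranch {
  UnconditionalToScalarPH, // only successor is scalar.ph, no branch condition
  AlwaysToExit,            // condition is the constant true (tail folded)
  RuntimeCheck             // condition is cmp.n, computed at run time
};

MiddleBranch classifyMiddleBlock(bool RequiresScalarEpilogueCheck,
                                 bool TailFolded) {
  // No check requested: middle.block is connected only to scalar.ph.
  if (!RequiresScalarEpilogueCheck)
    return MiddleBranch::UnconditionalToScalarPH;
  // Tail folding covers every iteration in the vector loop, so the
  // remainder never runs and the condition can be set to true.
  if (TailFolded)
    return MiddleBranch::AlwaysToExit;
  return MiddleBranch::RuntimeCheck;
}

// Models cmp.n: true when the vector loop covered all N iterations,
// i.e. (N - N % Step) == N. Step must be non-zero.
bool skipsScalarRemainder(std::uint64_t N, std::uint64_t Step) {
  std::uint64_t VectorTripCount = N - N % Step;
  return N == VectorTripCount;
}
```

For example, with `N = 16` and `Step = 4` the check is true and the scalar remainder is skipped, while `N = 17` leaves one iteration for the scalar loop.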

llvm/lib/Transforms/Vectorize/VPlanTransforms.h

+15
Original file line numberDiff line numberDiff line change
@@ -52,6 +52,21 @@ struct VPlanTransforms {
5252
verifyVPlanIsValid(Plan);
5353
}
5454

55+
/// Introduce the top-level VPRegionBlock for the main loop in \p Plan. Coming
56+
/// in this function, \p Plan's top-level loop is modeled using a plain CFG.
57+
/// This transforms replaces the plain CFG with a VPRegionBlock wrapping the
58+
/// top-level loop and creates a VPValue expressions for the original trip
59+
/// count. It will also introduce a dedicated VPBasicBlock for the vector
60+
/// pre-header as well a VPBasicBlock as exit block of the region
61+
/// (middle.block). If a check is needed to guard executing the scalar
62+
/// epilogue loop, it will be added to the middle block, together with
63+
/// VPBasicBlocks for the scalar preheader and exit blocks. \p InductionTy is
64+
/// the type of the canonical induction and used for related values, like the
65+
/// trip count expression.
66+
static void introduceTopLevelVectorLoopRegion(
67+
VPlan &Plan, Type *InductionTy, PredicatedScalarEvolution &PSE,
68+
bool RequiresScalarEpilogueCheck, bool TailFolded, Loop *TheLoop);
69+
5570
/// Replaces the VPInstructions in \p Plan with corresponding
5671
/// widen recipes.
5772
static void
