Commit b6ecd14

david-arm authored and github-actions[bot] committed
Automerge: [LoopVectorize] Enable vectorisation of early exit loops with live-outs (#120567)
This work feeds part of PR llvm/llvm-project#88385, and adds support for vectorising loops with uncountable early exits and outside users of loop-defined variables. When calculating the final value from an uncountable early exit we need to calculate the vector lane that triggered the exit, and hence determine the value at the point we exited.

All code for calculating the last value when exiting the loop early now lives in a new vector.early.exit block, which sits between the middle.split block and the original exit block. Doing this required two fixes:

1. The vplan verifier incorrectly assumed that the block containing a definition always dominates the block of the user. That's not true if you can arrive at the use block from multiple incoming blocks. This is possible for early exit loops where both the early exit and the latch jump to the same block.
2. We were adding the new vector.early.exit to the wrong parent loop. It needs to have the same parent as the actual early exit block from the original loop.

I've added a new ExtractFirstActive VPInstruction that extracts the first active lane of a vector, i.e. the lane of the vector predicate that triggered the exit.

NOTE: The IR generated for dealing with live-outs from early exit loops is unoptimised, as opposed to normal loops. This inevitably leads to poor quality code, but this can be fixed up later.
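For orientation, here is a minimal sketch of the kind of source loop the message describes. It is a hypothetical example, not taken from the commit or its tests: the function name `valueAtFirstMismatch` and the choice of `sum` as the live-out are assumptions made purely for illustration, and whether the vectoriser actually accepts a given loop depends on legality checks not shown here.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical example loop (not from the commit).
// The exit at `a[i] != b[i]` is "uncountable": the iteration at which it
// fires cannot be computed before the loop runs. `sum` is defined inside
// the loop and used after the early exit, so it is a live-out of that exit.
std::int64_t valueAtFirstMismatch(const std::int32_t *a,
                                  const std::int32_t *b, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    std::int64_t sum = static_cast<std::int64_t>(a[i]) + b[i];
    if (a[i] != b[i])
      return sum; // early exit: needs the value from the exiting iteration
  }
  return -1; // latch exit: the loop ran to completion
}
```

In a vectorised version, `sum` becomes a vector of per-lane values, so the value returned at the early exit has to be taken from the first lane whose exit condition was true.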
2 parents 62df834 + 3bc2dad commit b6ecd14

16 files changed: +1003 -146 lines changed

llvm/docs/Vectorizers.rst

Lines changed: 5 additions & 3 deletions
Original file line number | Diff line number | Diff line change
@@ -405,9 +405,11 @@ Early Exit Vectorization
405405
When vectorizing a loop with a single early exit, the loop blocks following the
406406
early exit are predicated and the vector loop will always exit via the latch.
407407
If the early exit has been taken, the vector loop's successor block
408-
(``middle.split`` below) branches to the early exit block. Otherwise
409-
``middle.block`` selects between the exit block from the latch or the scalar
410-
remainder loop.
408+
(``middle.split`` below) branches to the early exit block via an intermediate
409+
block (``vector.early.exit`` below). This intermediate block is responsible for
410+
calculating any exit values of loop-defined variables that are used in the
411+
early exit block. Otherwise, ``middle.block`` selects between the exit block
412+
from the latch or the scalar remainder loop.
411413

412414
.. image:: vplan-early-exit.png
413415
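The control flow described in the documentation change above can be modelled in scalar C++. This is only a sketch under stated assumptions: the vectorisation factor `VF = 4`, the function name `searchModel`, and the example loop (return `a[i] * 2` at the first `i` where `a[i] == key`) are all hypothetical, and real vectorised code operates on vector registers rather than arrays. The comments map each region to the block names used in the documentation.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t VF = 4; // assumed vectorisation factor

// Scalar model of the vectorised CFG for a hypothetical search loop.
std::int64_t searchModel(const std::int32_t *a, std::size_t n,
                         std::int32_t key) {
  const std::size_t vecTripCount = n - n % VF;
  std::size_t i = 0;
  std::array<std::int64_t, VF> value{};
  std::array<bool, VF> exitMask{};
  bool earlyExitTaken = false;

  // Vector loop: every lane is computed, then the loop leaves via the latch
  // if any lane took the early exit or the vector trip count is reached.
  while (i < vecTripCount && !earlyExitTaken) {
    for (std::size_t lane = 0; lane < VF; ++lane) {
      value[lane] = static_cast<std::int64_t>(a[i + lane]) * 2;
      exitMask[lane] = a[i + lane] == key;
      earlyExitTaken = earlyExitTaken || exitMask[lane];
    }
    i += VF;
  }

  // middle.split: was the early exit taken?
  if (earlyExitTaken) {
    // vector.early.exit: compute the live-out from the first active lane.
    std::size_t lane = 0;
    while (!exitMask[lane])
      ++lane;
    return value[lane]; // early.exit
  }

  // middle.block: choose between the latch exit and the scalar remainder.
  if (i == n)
    return -1; // latch.exit

  // scalar.ph, then the scalar remainder loop.
  for (; i < n; ++i)
    if (a[i] == key)
      return static_cast<std::int64_t>(a[i]) * 2;
  return -1;
}
```

The model makes the purpose of the intermediate block visible: the lane lookup happens only on the early exit path, so the latch exit and scalar remainder paths pay nothing for it.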

llvm/docs/vplan-early-exit.dot

Lines changed: 11 additions & 7 deletions
@@ -19,23 +19,27 @@ compound=true
1919
"middle.split"
2020
]
2121
N4 -> N5 [ label=""]
22-
N4 -> N6 [ label=""]
22+
N4 -> N7 [ label=""]
2323
N5 [label =
24-
"early.exit"
24+
"vector.early.exit"
2525
]
26+
N5 -> N6 [ label=""]
2627
N6 [label =
27-
"middle.block"
28+
"early.exit"
2829
]
29-
N6 -> N9 [ label=""]
30-
N6 -> N7 [ label=""]
3130
N7 [label =
32-
"scalar.ph"
31+
"middle.block"
3332
]
33+
N7 -> N10 [ label=""]
3434
N7 -> N8 [ label=""]
3535
N8 [label =
36-
"loop.header"
36+
"scalar.ph"
3737
]
38+
N8 -> N9 [ label=""]
3839
N9 [label =
40+
"loop.header"
41+
]
42+
N10 [label =
3943
"latch.exit"
4044
]
4145
}

llvm/docs/vplan-early-exit.png

-54.4 KB

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

Lines changed: 2 additions & 8 deletions
@@ -9407,14 +9407,8 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
94079407

94089408
if (auto *UncountableExitingBlock =
94099409
Legal->getUncountableEarlyExitingBlock()) {
9410-
if (!VPlanTransforms::handleUncountableEarlyExit(
9411-
*Plan, *PSE.getSE(), OrigLoop, UncountableExitingBlock,
9412-
RecipeBuilder)) {
9413-
reportVectorizationFailure(
9414-
"Some exit values in loop with uncountable exit not supported yet",
9415-
"UncountableEarlyExitLoopsUnsupportedExitValue", ORE, OrigLoop);
9416-
return nullptr;
9417-
}
9410+
VPlanTransforms::handleUncountableEarlyExit(
9411+
*Plan, *PSE.getSE(), OrigLoop, UncountableExitingBlock, RecipeBuilder);
94189412
}
94199413
DenseMap<VPValue *, VPValue *> IVEndValues;
94209414
addScalarResumePhis(RecipeBuilder, *Plan, IVEndValues);

llvm/lib/Transforms/Vectorize/VPlan.cpp

Lines changed: 13 additions & 2 deletions
@@ -501,8 +501,15 @@ void VPBasicBlock::execute(VPTransformState *State) {
501501
UnreachableInst *Terminator = State->Builder.CreateUnreachable();
502502
// Register NewBB in its loop. In innermost loops its the same for all
503503
// BB's.
504-
if (State->CurrentParentLoop)
505-
State->CurrentParentLoop->addBasicBlockToLoop(NewBB, *State->LI);
504+
Loop *ParentLoop = State->CurrentParentLoop;
505+
// If this block has a sole successor that is an exit block then it needs
506+
// adding to the same parent loop as the exit block.
507+
VPBlockBase *SuccVPBB = getSingleSuccessor();
508+
if (SuccVPBB && State->Plan->isExitBlock(SuccVPBB))
509+
ParentLoop = State->LI->getLoopFor(
510+
cast<VPIRBasicBlock>(SuccVPBB)->getIRBasicBlock());
511+
if (ParentLoop)
512+
ParentLoop->addBasicBlockToLoop(NewBB, *State->LI);
506513
State->Builder.SetInsertPoint(Terminator);
507514

508515
State->CFG.PrevBB = NewBB;
@@ -950,6 +957,10 @@ void VPlan::prepareToExecute(Value *TripCountV, Value *VectorTripCountV,
950957
}
951958
}
952959

960+
bool VPlan::isExitBlock(VPBlockBase *VPBB) {
961+
return isa<VPIRBasicBlock>(VPBB) && VPBB->getNumSuccessors() == 0;
962+
}
963+
953964
/// Generate the code inside the preheader and body of the vectorized loop.
954965
/// Assumes a single pre-header basic-block was created for this. Introduce
955966
/// additional basic-blocks as needed, and fill them all.

llvm/lib/Transforms/Vectorize/VPlan.h

Lines changed: 6 additions & 0 deletions
@@ -1223,6 +1223,9 @@ class VPInstruction : public VPRecipeWithIRFlags,
12231223
// Returns a scalar boolean value, which is true if any lane of its (only
12241224
// boolean) vector operand is true.
12251225
AnyOf,
1226+
// Extracts the first active lane of a vector, where the first operand is
1227+
// the predicate, and the second operand is the vector to extract.
1228+
ExtractFirstActive,
12261229
};
12271230

12281231
private:
@@ -3967,6 +3970,9 @@ class VPlan {
39673970
/// of VPBlockShallowTraversalWrapper.
39683971
auto getExitBlocks();
39693972

3973+
/// Returns true if \p VPBB is an exit block.
3974+
bool isExitBlock(VPBlockBase *VPBB);
3975+
39703976
/// The trip count of the original loop.
39713977
VPValue *getTripCount() const {
39723978
assert(TripCount && "trip count needs to be set before accessing it");

llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp

Lines changed: 1 addition & 0 deletions
@@ -78,6 +78,7 @@ Type *VPTypeAnalysis::inferScalarTypeForRecipe(const VPInstruction *R) {
7878
case VPInstruction::CanonicalIVIncrementForPart:
7979
case VPInstruction::AnyOf:
8080
return SetResultTyFromOp();
81+
case VPInstruction::ExtractFirstActive:
8182
case VPInstruction::ExtractFromEnd: {
8283
Type *BaseTy = inferScalarType(R->getOperand(0));
8384
if (auto *VecTy = dyn_cast<VectorType>(BaseTy))

llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp

Lines changed: 12 additions & 1 deletion
@@ -697,14 +697,21 @@ Value *VPInstruction::generate(VPTransformState &State) {
697697
Value *A = State.get(getOperand(0));
698698
return Builder.CreateOrReduce(A);
699699
}
700-
700+
case VPInstruction::ExtractFirstActive: {
701+
Value *Vec = State.get(getOperand(0));
702+
Value *Mask = State.get(getOperand(1));
703+
Value *Ctz = Builder.CreateCountTrailingZeroElems(
704+
Builder.getInt64Ty(), Mask, true, "first.active.lane");
705+
return Builder.CreateExtractElement(Vec, Ctz, "early.exit.value");
706+
}
701707
default:
702708
llvm_unreachable("Unsupported opcode for instruction");
703709
}
704710
}
705711

706712
bool VPInstruction::isVectorToScalar() const {
707713
return getOpcode() == VPInstruction::ExtractFromEnd ||
714+
getOpcode() == VPInstruction::ExtractFirstActive ||
708715
getOpcode() == VPInstruction::ComputeReductionResult ||
709716
getOpcode() == VPInstruction::AnyOf;
710717
}
@@ -769,6 +776,7 @@ bool VPInstruction::opcodeMayReadOrWriteFromMemory() const {
769776
case VPInstruction::CalculateTripCountMinusVF:
770777
case VPInstruction::CanonicalIVIncrementForPart:
771778
case VPInstruction::ExtractFromEnd:
779+
case VPInstruction::ExtractFirstActive:
772780
case VPInstruction::FirstOrderRecurrenceSplice:
773781
case VPInstruction::LogicalAnd:
774782
case VPInstruction::Not:
@@ -888,6 +896,9 @@ void VPInstruction::print(raw_ostream &O, const Twine &Indent,
888896
case VPInstruction::AnyOf:
889897
O << "any-of";
890898
break;
899+
case VPInstruction::ExtractFirstActive:
900+
O << "extract-first-active";
901+
break;
891902
default:
892903
O << Instruction::getOpcodeName(getOpcode());
893904
}
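As a reading aid for the `ExtractFirstActive` case in `VPInstruction::generate` above, here is a scalar model of the two steps it emits: count the trailing zero elements of the mask, then extract that element from the vector. The helper names `countTrailingZeroElems` and `extractFirstActive` are hypothetical stand-ins written for this sketch, not LLVM APIs. Note that `generate` reads operand 0 as the vector and operand 1 as the mask, which matches the call in `handleUncountableEarlyExit`; the enum comment in VPlan.h describes the opposite order. The `true` argument passed to `CreateCountTrailingZeroElems` appears to be the zero-is-poison flag, which the sketch models with an assertion.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model: index of the first true lane, or N if no lane is set.
template <std::size_t N>
std::size_t countTrailingZeroElems(const std::array<bool, N> &mask) {
  std::size_t lane = 0;
  while (lane < N && !mask[lane])
    ++lane;
  return lane;
}

// Hypothetical model of ExtractFirstActive: operand 0 is the vector,
// operand 1 is the mask, following the order used in generate().
template <typename T, std::size_t N>
T extractFirstActive(const std::array<T, N> &vec,
                     const std::array<bool, N> &mask) {
  const std::size_t lane = countTrailingZeroElems(mask);
  // An all-false mask has no meaningful result; modelled as a precondition.
  assert(lane < N && "mask must have at least one active lane");
  return vec[lane];
}
```

For a vector `{10, 20, 30, 40}` and mask `{false, false, true, true}`, the first active lane is 2 and the extracted value is 30.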

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

Lines changed: 11 additions & 6 deletions
@@ -2064,7 +2064,7 @@ void VPlanTransforms::convertToConcreteRecipes(VPlan &Plan) {
20642064
}
20652065
}
20662066

2067-
bool VPlanTransforms::handleUncountableEarlyExit(
2067+
void VPlanTransforms::handleUncountableEarlyExit(
20682068
VPlan &Plan, ScalarEvolution &SE, Loop *OrigLoop,
20692069
BasicBlock *UncountableExitingBlock, VPRecipeBuilder &RecipeBuilder) {
20702070
VPRegionBlock *LoopRegion = Plan.getVectorLoopRegion();
@@ -2101,12 +2101,17 @@ bool VPlanTransforms::handleUncountableEarlyExit(
21012101
Builder.createNaryOp(VPInstruction::AnyOf, {EarlyExitTakenCond});
21022102

21032103
VPBasicBlock *NewMiddle = Plan.createVPBasicBlock("middle.split");
2104+
VPBasicBlock *VectorEarlyExitVPBB =
2105+
Plan.createVPBasicBlock("vector.early.exit");
21042106
VPBlockUtils::insertOnEdge(LoopRegion, MiddleVPBB, NewMiddle);
2105-
VPBlockUtils::connectBlocks(NewMiddle, VPEarlyExitBlock);
2107+
VPBlockUtils::connectBlocks(NewMiddle, VectorEarlyExitVPBB);
21062108
NewMiddle->swapSuccessors();
21072109

2110+
VPBlockUtils::connectBlocks(VectorEarlyExitVPBB, VPEarlyExitBlock);
2111+
21082112
// Update the exit phis in the early exit block.
21092113
VPBuilder MiddleBuilder(NewMiddle);
2114+
VPBuilder EarlyExitB(VectorEarlyExitVPBB);
21102115
for (VPRecipeBase &R : *VPEarlyExitBlock) {
21112116
auto *ExitIRI = cast<VPIRInstruction>(&R);
21122117
auto *ExitPhi = dyn_cast<PHINode>(&ExitIRI->getInstruction());
@@ -2115,9 +2120,6 @@ bool VPlanTransforms::handleUncountableEarlyExit(
21152120

21162121
VPValue *IncomingFromEarlyExit = RecipeBuilder.getVPValueOrAddLiveIn(
21172122
ExitPhi->getIncomingValueForBlock(UncountableExitingBlock));
2118-
// The incoming value from the early exit must be a live-in for now.
2119-
if (!IncomingFromEarlyExit->isLiveIn())
2120-
return false;
21212123

21222124
if (OrigLoop->getUniqueExitBlock()) {
21232125
// If there's a unique exit block, VPEarlyExitBlock has 2 predecessors
@@ -2129,6 +2131,10 @@ bool VPlanTransforms::handleUncountableEarlyExit(
21292131
ExitIRI->extractLastLaneOfOperand(MiddleBuilder);
21302132
}
21312133
// Add the incoming value from the early exit.
2134+
if (!IncomingFromEarlyExit->isLiveIn())
2135+
IncomingFromEarlyExit =
2136+
EarlyExitB.createNaryOp(VPInstruction::ExtractFirstActive,
2137+
{IncomingFromEarlyExit, EarlyExitTakenCond});
21322138
ExitIRI->addOperand(IncomingFromEarlyExit);
21332139
}
21342140
MiddleBuilder.createNaryOp(VPInstruction::BranchOnCond, {IsEarlyExitTaken});
@@ -2146,5 +2152,4 @@ bool VPlanTransforms::handleUncountableEarlyExit(
21462152
Instruction::Or, {IsEarlyExitTaken, IsLatchExitTaken});
21472153
Builder.createNaryOp(VPInstruction::BranchOnCond, AnyExitTaken);
21482154
LatchExitingBranch->eraseFromParent();
2149-
return true;
21502155
}

llvm/lib/Transforms/Vectorize/VPlanTransforms.h

Lines changed: 1 addition & 1 deletion
@@ -155,7 +155,7 @@ struct VPlanTransforms {
155155
/// exit conditions
156156
/// * splitting the original middle block to branch to the early exit block
157157
/// if taken.
158-
static bool handleUncountableEarlyExit(VPlan &Plan, ScalarEvolution &SE,
158+
static void handleUncountableEarlyExit(VPlan &Plan, ScalarEvolution &SE,
159159
Loop *OrigLoop,
160160
BasicBlock *UncountableExitingBlock,
161161
VPRecipeBuilder &RecipeBuilder);

llvm/lib/Transforms/Vectorize/VPlanVerifier.cpp

Lines changed: 3 additions & 1 deletion
@@ -209,7 +209,9 @@ bool VPlanVerifier::verifyVPBasicBlock(const VPBasicBlock *VPBB) {
209209
auto *UI = cast<VPRecipeBase>(U);
210210
// TODO: check dominance of incoming values for phis properly.
211211
if (!UI ||
212-
isa<VPHeaderPHIRecipe, VPWidenPHIRecipe, VPPredInstPHIRecipe>(UI))
212+
isa<VPHeaderPHIRecipe, VPWidenPHIRecipe, VPPredInstPHIRecipe>(UI) ||
213+
(isa<VPIRInstruction>(UI) &&
214+
isa<PHINode>(cast<VPIRInstruction>(UI)->getInstruction())))
213215
continue;
214216

215217
// If the user is in the same block, check it comes after R in the
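The verifier change above skips the dominance check when the user is a phi in a `VPIRInstruction`. The reason can be illustrated with a small, self-contained dominator computation over a simplified version of the CFG, assuming the early exit and the latch exit share one exit block. Everything here is a hypothetical sketch: the block indices, `computeDominators`, `dominates`, and `earlyExitDominators` are illustrative names, and this is not how the VPlan verifier computes dominance.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Iterative dominator computation; bit d of dom[b] is set iff d dominates b.
// Assumes at most 32 blocks, all reachable from the entry block.
std::vector<std::uint32_t>
computeDominators(const std::vector<std::vector<std::size_t>> &preds,
                  std::size_t entry) {
  const std::size_t n = preds.size();
  const std::uint32_t all = (n >= 32) ? ~0u : ((1u << n) - 1);
  std::vector<std::uint32_t> dom(n, all);
  dom[entry] = 1u << entry;
  bool changed = true;
  while (changed) {
    changed = false;
    for (std::size_t b = 0; b < n; ++b) {
      if (b == entry)
        continue;
      std::uint32_t meet = all;
      for (std::size_t p : preds[b])
        meet &= dom[p]; // intersect the dominators of all predecessors
      const std::uint32_t next = meet | (1u << b);
      if (next != dom[b]) {
        dom[b] = next;
        changed = true;
      }
    }
  }
  return dom;
}

bool dominates(const std::vector<std::uint32_t> &dom, std::size_t d,
               std::size_t b) {
  return ((dom[b] >> d) & 1u) != 0;
}

// Hypothetical block numbering for the simplified CFG.
constexpr std::size_t MiddleSplit = 0, VectorEarlyExit = 1, MiddleBlock = 2,
                      Exit = 3, ScalarPh = 4;

std::vector<std::uint32_t> earlyExitDominators() {
  std::vector<std::vector<std::size_t>> preds(5);
  preds[VectorEarlyExit] = {MiddleSplit};
  preds[MiddleBlock] = {MiddleSplit};
  preds[Exit] = {VectorEarlyExit, MiddleBlock}; // shared exit block
  preds[ScalarPh] = {MiddleBlock};
  return computeDominators(preds, MiddleSplit);
}
```

In this model `vector.early.exit` does not dominate the shared exit block, because the exit block is also reachable through `middle.block`. A value defined in `vector.early.exit` can still legally feed a phi in the exit block, since a phi operand only has to be available on its own incoming edge.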
