[LoopVectorize] Enable vectorisation of early exit loops with live-outs #120567

Merged: 5 commits, merged on Jan 30, 2025

8 changes: 5 additions & 3 deletions llvm/docs/Vectorizers.rst
@@ -405,9 +405,11 @@ Early Exit Vectorization
 When vectorizing a loop with a single early exit, the loop blocks following the
 early exit are predicated and the vector loop will always exit via the latch.
 If the early exit has been taken, the vector loop's successor block
-(``middle.split`` below) branches to the early exit block. Otherwise
-``middle.block`` selects between the exit block from the latch or the scalar
-remainder loop.
+(``middle.split`` below) branches to the early exit block via an intermediate
+block (``vector.early.exit`` below). This intermediate block is responsible for
+calculating any exit values of loop-defined variables that are used in the
+early exit block. Otherwise, ``middle.block`` selects between the exit block
+from the latch and the scalar remainder loop.

 .. image:: vplan-early-exit.png

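The control flow described in the documentation can be emulated in scalar C++ to show what each block computes. This is a minimal sketch, not LLVM output: it assumes a vectorization factor of 4 and a trip count that is a multiple of it (so the scalar remainder loop never runs), uses `std::array` lanes in place of IR vectors, and the function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar reference loop: the early exit returns a loop-defined value
// (A[I] * 2), which is the kind of live-out this change enables.
int64_t scalarSearch(const int64_t *A, std::size_t N, int64_t Limit) {
  for (std::size_t I = 0; I < N; ++I)
    if (A[I] > Limit)
      return A[I] * 2; // early exit
  return -1;           // latch exit
}

// Assumed vectorization factor for this sketch.
constexpr std::size_t VF = 4;

// Lane-by-lane emulation of the vector CFG. Assumes N % VF == 0 and that
// reading A[I + L] for lanes past the exiting one is safe (LLVM requires
// provably dereferenceable loads before it vectorizes such loops).
int64_t vectorSearch(const int64_t *A, std::size_t N, int64_t Limit) {
  assert(N % VF == 0 && "sketch does not model the scalar remainder loop");
  std::array<bool, VF> Mask{};    // per-lane early-exit condition
  std::array<int64_t, VF> Live{}; // widened live-out value
  bool AnyExit = false;
  // vector.body: every lane computes its value; the loop only leaves through
  // the latch, either on AnyOf(Mask) or when the trip count is reached.
  for (std::size_t I = 0; I < N && !AnyExit; I += VF) {
    for (std::size_t L = 0; L < VF; ++L) {
      Mask[L] = A[I + L] > Limit;
      Live[L] = A[I + L] * 2;
      AnyExit = AnyExit || Mask[L]; // AnyOf over the mask
    }
  }
  // middle.split: branch on AnyOf.
  if (AnyExit) {
    // vector.early.exit: ExtractFirstActive(Live, Mask).
    std::size_t L = 0;
    while (L < VF && !Mask[L])
      ++L;
    assert(L < VF && "AnyOf was true, so some lane must be active");
    return Live[L]; // flows into the phi in early.exit
  }
  // middle.block: with no remainder, go straight to the latch exit.
  return -1;
}
```

The point to notice is that the vector body computes the live-out for every lane, and only `vector.early.exit` decides which lane's value reaches the exit block.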
18 changes: 11 additions & 7 deletions llvm/docs/vplan-early-exit.dot
@@ -19,23 +19,27 @@ compound=true
 "middle.split"
 ]
 N4 -> N5 [ label=""]
-N4 -> N6 [ label=""]
+N4 -> N7 [ label=""]

Contributor

I can't comment on the generated file, but it looks like its size in pixels changed. Just checking: was that intentional?

Contributor Author

I didn't deliberately change the size in pixels; I just ran dot -Tpng ../llvm/docs/vplan-early-exit.dot. Is there a recommended command for regenerating this? I couldn't find any documentation about it.

Collaborator

I found that dot -Tpng -Gsize=15,14\! -Gdpi=100 vplan-early-exit.dot -o vplan-ee.png produced a larger image, though not at the exact size I requested, so getting a good image may take some experimenting. You could put the size and dpi directives in the .dot file instead, so that they are preserved the next time someone modifies it.

It seems Graphviz isn't terribly cooperative about this, and you need other tools to rescale or pad the edges.

Contributor Author

Thanks for the suggestion! I've regenerated it with that command and hopefully the image looks bigger now.

Contributor

Oh, I didn't realize the defaults differ across platforms/versions. The larger version looks good, thanks!

 N5 [label =
-"early.exit"
+"vector.early.exit"
 ]
+N5 -> N6 [ label=""]
 N6 [label =
-"middle.block"
+"early.exit"
 ]
-N6 -> N9 [ label=""]
-N6 -> N7 [ label=""]
 N7 [label =
-"scalar.ph"
+"middle.block"
 ]
+N7 -> N10 [ label=""]
 N7 -> N8 [ label=""]
 N8 [label =
-"loop.header"
+"scalar.ph"
 ]
+N8 -> N9 [ label=""]
 N9 [label =
+"loop.header"
+]
+N10 [label =
 "latch.exit"
 ]
 }
Binary file modified llvm/docs/vplan-early-exit.png
10 changes: 2 additions & 8 deletions llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -9396,14 +9396,8 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {

   if (auto *UncountableExitingBlock =
           Legal->getUncountableEarlyExitingBlock()) {
-    if (!VPlanTransforms::handleUncountableEarlyExit(
-            *Plan, *PSE.getSE(), OrigLoop, UncountableExitingBlock,
-            RecipeBuilder)) {
-      reportVectorizationFailure(
-          "Some exit values in loop with uncountable exit not supported yet",
-          "UncountableEarlyExitLoopsUnsupportedExitValue", ORE, OrigLoop);
-      return nullptr;
-    }
+    VPlanTransforms::handleUncountableEarlyExit(
+        *Plan, *PSE.getSE(), OrigLoop, UncountableExitingBlock, RecipeBuilder);
   }
   DenseMap<VPValue *, VPValue *> IVEndValues;
   addScalarResumePhis(RecipeBuilder, *Plan, IVEndValues);
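Before this change, a loop whose early-exit value was not a VPlan live-in was rejected with the `UncountableEarlyExitLoopsUnsupportedExitValue` remark. The two functions below are illustrative source shapes only; their names and the fixed-size global array are assumptions for the example. Whether LLVM vectorizes either one also depends on its other legality checks, such as the loads being provably dereferenceable, which is why a fixed-size array is used.

```cpp
#include <cstddef>

constexpr std::size_t N = 1024;
int Data[N]; // fixed size, so every load in the loops is in bounds

// The value leaving through the early exit is the constant 1, which is a
// live-in from VPlan's point of view, so it passed the live-in check even
// before this change.
int containsKey(int Key) {
  for (std::size_t I = 0; I < N; ++I)
    if (Data[I] == Key)
      return 1;
  return 0;
}

// The value leaving through the early exit is the induction variable, which
// is defined inside the loop. This used to trigger the "Some exit values in
// loop with uncountable exit not supported yet" remark; with the change, the
// vector.early.exit block extracts it from the first active lane.
long findKey(int Key) {
  for (std::size_t I = 0; I < N; ++I)
    if (Data[I] == Key)
      return static_cast<long>(I);
  return -1;
}
```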
15 changes: 13 additions & 2 deletions llvm/lib/Transforms/Vectorize/VPlan.cpp
@@ -500,8 +500,15 @@ void VPBasicBlock::execute(VPTransformState *State) {
     UnreachableInst *Terminator = State->Builder.CreateUnreachable();
     // Register NewBB in its loop. In innermost loops its the same for all
     // BB's.
-    if (State->CurrentParentLoop)
-      State->CurrentParentLoop->addBasicBlockToLoop(NewBB, *State->LI);
+    Loop *ParentLoop = State->CurrentParentLoop;

Collaborator

Is it possible to improve the tracking of CurrentParentLoop instead of working around it here?

Contributor Author

It's possible, although you'll still end up with similar-looking code in VPlan::execute, where we execute the blocks:

  for (VPBlockBase *Block : RPOT)
    Block->execute(State);

It's worth bearing in mind that in VPRegionBlock::execute we also set CurrentParentLoop:

  if (!isReplicator()) {
    // Create and register the new vector loop.
    Loop *PrevLoop = State->CurrentParentLoop;
    State->CurrentParentLoop = State->LI->AllocateLoop();

and VPRegionBlock is included in that list. So, depending on how the blocks are traversed, you may need to cache the previous value each time, e.g. something like:

  for (VPBlockBase *Block : RPOT) {
    Loop *OldParentLoop = State->CurrentParentLoop;
    if (Block->getSingleSuccessor() && isExitBlock(Block->getSingleSuccessor()))
      State->CurrentParentLoop = ... parent for exit ...
    Block->execute(State);
    State->CurrentParentLoop = OldParentLoop;
  }

I'm honestly not sure that's any better. I think the only neat way to do this is to handle the VP blocks whose single successor is an exit block in a separate loop over the blocks, e.g.:

  for (VPBlockBase *Block : RPOT) {
    if (!Block->getSingleSuccessor() || !isExitBlock(Block->getSingleSuccessor()))
      Block->execute(State);
  }
  for (VPBlockBase *Block : RPOT) {
    if (Block->getSingleSuccessor() && isExitBlock(Block->getSingleSuccessor())) {
      State->CurrentParentLoop = ... exit block parent loop ...
      Block->execute(State);
    }
  }

+    // If this block has a sole successor that is an exit block then it needs
+    // adding to the same parent loop as the exit block.
+    VPBlockBase *SuccVPBB = getSingleSuccessor();
+    if (SuccVPBB && State->Plan->isExitBlock(SuccVPBB))
+      ParentLoop = State->LI->getLoopFor(
+          cast<VPIRBasicBlock>(SuccVPBB)->getIRBasicBlock());
+    if (ParentLoop)
+      ParentLoop->addBasicBlockToLoop(NewBB, *State->LI);
     State->Builder.SetInsertPoint(Terminator);

     State->CFG.PrevBB = NewBB;
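The new registration rule in `VPBasicBlock::execute` can be modeled in a few lines: a new block joins the loop of its sole successor when that successor is an exit block (as `VPlan::isExitBlock` defines it), and otherwise joins `State->CurrentParentLoop`. The sketch below uses hypothetical stand-in types (`Block`, `LoopId`, `parentLoopFor`) rather than LLVM's classes. It shows why `vector.early.exit` is registered in the same loop as the IR early-exit block, which can differ from the current parent loop (for example, if the early exit also leaves an enclosing loop).

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-ins, not LLVM types: a loop is identified by an int,
// and std::nullopt means "not inside any loop".
using LoopId = std::optional<int>;

struct Block {
  std::string Name;
  std::vector<const Block *> Succs;
  bool IsIRBlock = false; // models isa<VPIRBasicBlock>
  LoopId IRLoop;          // models LI->getLoopFor(IRBB) for IR blocks
};

// Mirrors VPlan::isExitBlock: an IR-backed block with no successors.
bool isExitBlock(const Block &B) { return B.IsIRBlock && B.Succs.empty(); }

// Mirrors the new rule: take the exit block's loop if the sole successor is
// an exit block, otherwise keep the current parent loop.
LoopId parentLoopFor(const Block &B, LoopId CurrentParentLoop) {
  LoopId Parent = CurrentParentLoop;
  const Block *Succ = B.Succs.size() == 1 ? B.Succs.front() : nullptr;
  if (Succ && isExitBlock(*Succ))
    Parent = Succ->IRLoop;
  return Parent;
}
```

Blocks with two successors, such as `middle.split`, keep the current parent loop under this rule.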
@@ -949,6 +956,10 @@ void VPlan::prepareToExecute(Value *TripCountV, Value *VectorTripCountV,
   }
 }

+bool VPlan::isExitBlock(VPBlockBase *VPBB) {
+  return isa<VPIRBasicBlock>(VPBB) && VPBB->getNumSuccessors() == 0;
+}
+
 /// Generate the code inside the preheader and body of the vectorized loop.
 /// Assumes a single pre-header basic-block was created for this. Introduce
 /// additional basic-blocks as needed, and fill them all.
6 changes: 6 additions & 0 deletions llvm/lib/Transforms/Vectorize/VPlan.h
@@ -1223,6 +1223,9 @@ class VPInstruction : public VPRecipeWithIRFlags,
     // Returns a scalar boolean value, which is true if any lane of its (only
     // boolean) vector operand is true.
     AnyOf,
+    // Extracts the first active lane of a vector, where the first operand is
+    // the vector to extract, and the second operand is the predicate.
+    ExtractFirstActive,
   };

 private:
@@ -3967,6 +3970,9 @@ class VPlan {
   /// of VPBlockShallowTraversalWrapper.
   auto getExitBlocks();

+  /// Returns true if \p VPBB is an exit block.
+  bool isExitBlock(VPBlockBase *VPBB);
+
   /// The trip count of the original loop.
   VPValue *getTripCount() const {
     assert(TripCount && "trip count needs to be set before accessing it");
1 change: 1 addition & 0 deletions llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
@@ -78,6 +78,7 @@ Type *VPTypeAnalysis::inferScalarTypeForRecipe(const VPInstruction *R) {
   case VPInstruction::CanonicalIVIncrementForPart:
   case VPInstruction::AnyOf:
     return SetResultTyFromOp();
+  case VPInstruction::ExtractFirstActive:
   case VPInstruction::ExtractFromEnd: {
     Type *BaseTy = inferScalarType(R->getOperand(0));
     if (auto *VecTy = dyn_cast<VectorType>(BaseTy))
13 changes: 12 additions & 1 deletion llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -697,14 +697,21 @@ Value *VPInstruction::generate(VPTransformState &State) {
     Value *A = State.get(getOperand(0));
     return Builder.CreateOrReduce(A);
   }
-
+  case VPInstruction::ExtractFirstActive: {
+    Value *Vec = State.get(getOperand(0));
+    Value *Mask = State.get(getOperand(1));
+    Value *Ctz = Builder.CreateCountTrailingZeroElems(
+        Builder.getInt64Ty(), Mask, true, "first.active.lane");
+    return Builder.CreateExtractElement(Vec, Ctz, "early.exit.value");
+  }
   default:
     llvm_unreachable("Unsupported opcode for instruction");
   }
 }

 bool VPInstruction::isVectorToScalar() const {
   return getOpcode() == VPInstruction::ExtractFromEnd ||
+         getOpcode() == VPInstruction::ExtractFirstActive ||
          getOpcode() == VPInstruction::ComputeReductionResult ||
          getOpcode() == VPInstruction::AnyOf;
 }
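`ExtractFirstActive` is lowered to a count of trailing zero mask elements (`IRBuilder::CreateCountTrailingZeroElems`, with `ZeroIsPoison` set to `true`) followed by an `extractelement`. The scalar model below sketches those semantics with `std::array` standing in for IR vectors; the helper names are hypothetical. Passing `true` for `ZeroIsPoison` should be safe because `vector.early.exit` is only entered when `AnyOf` over the same mask was true, so at least one lane is active; the model turns that precondition into an `assert`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Number of false lanes before the first true one, counting from lane 0
// (LLVM calls these trailing zero elements, since lane 0 is the least
// significant). For an all-false mask this returns N, whereas the IR
// intrinsic with ZeroIsPoison = true would produce poison.
template <std::size_t N>
uint64_t countTrailingZeroElems(const std::array<bool, N> &Mask) {
  uint64_t Idx = 0;
  while (Idx < N && !Mask[Idx])
    ++Idx;
  return Idx;
}

// Model of ExtractFirstActive(Vec, Mask): operand 0 is the vector and
// operand 1 the predicate, matching VPInstruction::generate.
template <typename T, std::size_t N>
T extractFirstActive(const std::array<T, N> &Vec,
                     const std::array<bool, N> &Mask) {
  uint64_t Lane = countTrailingZeroElems(Mask);
  assert(Lane < N && "mask must have an active lane (guarded by AnyOf)");
  return Vec[Lane];
}
```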
@@ -769,6 +776,7 @@ bool VPInstruction::opcodeMayReadOrWriteFromMemory() const {
   case VPInstruction::CalculateTripCountMinusVF:
   case VPInstruction::CanonicalIVIncrementForPart:
   case VPInstruction::ExtractFromEnd:
+  case VPInstruction::ExtractFirstActive:
   case VPInstruction::FirstOrderRecurrenceSplice:
   case VPInstruction::LogicalAnd:
   case VPInstruction::Not:
@@ -888,6 +896,9 @@ void VPInstruction::print(raw_ostream &O, const Twine &Indent,
   case VPInstruction::AnyOf:
     O << "any-of";
     break;
+  case VPInstruction::ExtractFirstActive:
+    O << "extract-first-active";
+    break;
   default:
     O << Instruction::getOpcodeName(getOpcode());
   }
17 changes: 11 additions & 6 deletions llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -2062,7 +2062,7 @@ void VPlanTransforms::convertToConcreteRecipes(VPlan &Plan) {
   }
 }

-bool VPlanTransforms::handleUncountableEarlyExit(
+void VPlanTransforms::handleUncountableEarlyExit(
     VPlan &Plan, ScalarEvolution &SE, Loop *OrigLoop,
     BasicBlock *UncountableExitingBlock, VPRecipeBuilder &RecipeBuilder) {
   VPRegionBlock *LoopRegion = Plan.getVectorLoopRegion();
@@ -2099,12 +2099,17 @@ bool VPlanTransforms::handleUncountableEarlyExit(
       Builder.createNaryOp(VPInstruction::AnyOf, {EarlyExitTakenCond});

   VPBasicBlock *NewMiddle = Plan.createVPBasicBlock("middle.split");
+  VPBasicBlock *VectorEarlyExitVPBB =
+      Plan.createVPBasicBlock("vector.early.exit");
   VPBlockUtils::insertOnEdge(LoopRegion, MiddleVPBB, NewMiddle);
-  VPBlockUtils::connectBlocks(NewMiddle, VPEarlyExitBlock);
+  VPBlockUtils::connectBlocks(NewMiddle, VectorEarlyExitVPBB);
   NewMiddle->swapSuccessors();
+
+  VPBlockUtils::connectBlocks(VectorEarlyExitVPBB, VPEarlyExitBlock);

   // Update the exit phis in the early exit block.
   VPBuilder MiddleBuilder(NewMiddle);
+  VPBuilder EarlyExitB(VectorEarlyExitVPBB);
   for (VPRecipeBase &R : *VPEarlyExitBlock) {
     auto *ExitIRI = cast<VPIRInstruction>(&R);
     auto *ExitPhi = dyn_cast<PHINode>(&ExitIRI->getInstruction());
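The successor order matters because, in VPlan, `BranchOnCond` branches to the first successor when its condition is true. The toy model below uses hypothetical `Node`, `connect`, `insertOnEdge`, and `branchOnCond` helpers loosely modeled on `VPBlockUtils` (not its actual implementation) to replay the edge edits: `insertOnEdge` leaves `middle.split` with the single successor `middle.block`, connecting `vector.early.exit` appends it as the second successor, and `swapSuccessors()` moves it to the front so that a true `IsEarlyExitTaken` reaches it.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy CFG node standing in for VPBlockBase.
struct Node {
  std::string Name;
  std::vector<Node *> Succs;
  std::vector<Node *> Preds;
};

// Append To as the last successor of From (like connectBlocks).
void connect(Node &From, Node &To) {
  From.Succs.push_back(&To);
  To.Preds.push_back(&From);
}

// Turn the edge From->To into From->New->To, keeping the edge's slot in
// From's successor list and To's predecessor list.
void insertOnEdge(Node &From, Node &To, Node &New) {
  std::replace(From.Succs.begin(), From.Succs.end(), &To, &New);
  std::replace(To.Preds.begin(), To.Preds.end(), &From, &New);
  New.Preds.push_back(&From);
  New.Succs.push_back(&To);
}

// Swap the two successors of a two-way branch block.
void swapSuccessors(Node &N) { std::swap(N.Succs.at(0), N.Succs.at(1)); }

// Models BranchOnCond: the first successor is taken when Cond is true.
const Node *branchOnCond(const Node &N, bool Cond) {
  return Cond ? N.Succs.at(0) : N.Succs.at(1);
}
```

Without the swap, a true early-exit condition would fall through to `middle.block` instead.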
@@ -2113,9 +2118,6 @@ bool VPlanTransforms::handleUncountableEarlyExit(

     VPValue *IncomingFromEarlyExit = RecipeBuilder.getVPValueOrAddLiveIn(
         ExitPhi->getIncomingValueForBlock(UncountableExitingBlock));
-    // The incoming value from the early exit must be a live-in for now.
-    if (!IncomingFromEarlyExit->isLiveIn())
-      return false;

     if (OrigLoop->getUniqueExitBlock()) {
       // If there's a unique exit block, VPEarlyExitBlock has 2 predecessors
@@ -2127,6 +2129,10 @@ bool VPlanTransforms::handleUncountableEarlyExit(
       ExitIRI->extractLastLaneOfOperand(MiddleBuilder);
     }
     // Add the incoming value from the early exit.
+    if (!IncomingFromEarlyExit->isLiveIn())
+      IncomingFromEarlyExit =
+          EarlyExitB.createNaryOp(VPInstruction::ExtractFirstActive,
+                                  {IncomingFromEarlyExit, EarlyExitTakenCond});
     ExitIRI->addOperand(IncomingFromEarlyExit);
   }
   MiddleBuilder.createNaryOp(VPInstruction::BranchOnCond, {IsEarlyExitTaken});
@@ -2144,5 +2150,4 @@ bool VPlanTransforms::handleUncountableEarlyExit(
       Instruction::Or, {IsEarlyExitTaken, IsLatchExitTaken});
   Builder.createNaryOp(VPInstruction::BranchOnCond, AnyExitTaken);
   LatchExitingBranch->eraseFromParent();
-  return true;
 }
2 changes: 1 addition & 1 deletion llvm/lib/Transforms/Vectorize/VPlanTransforms.h
@@ -130,7 +130,7 @@ struct VPlanTransforms {
   /// exit conditions
   /// * splitting the original middle block to branch to the early exit block
   /// if taken.
-  static bool handleUncountableEarlyExit(VPlan &Plan, ScalarEvolution &SE,
+  static void handleUncountableEarlyExit(VPlan &Plan, ScalarEvolution &SE,
                                          Loop *OrigLoop,
                                          BasicBlock *UncountableExitingBlock,
                                          VPRecipeBuilder &RecipeBuilder);
4 changes: 3 additions & 1 deletion llvm/lib/Transforms/Vectorize/VPlanVerifier.cpp
@@ -209,7 +209,9 @@ bool VPlanVerifier::verifyVPBasicBlock(const VPBasicBlock *VPBB) {
         auto *UI = cast<VPRecipeBase>(U);
         // TODO: check dominance of incoming values for phis properly.
         if (!UI ||
-            isa<VPHeaderPHIRecipe, VPWidenPHIRecipe, VPPredInstPHIRecipe>(UI))
+            isa<VPHeaderPHIRecipe, VPWidenPHIRecipe, VPPredInstPHIRecipe>(UI) ||
+            (isa<VPIRInstruction>(UI) &&
+             isa<PHINode>(cast<VPIRInstruction>(UI)->getInstruction())))
           continue;

         // If the user is in the same block, check it comes after R in the
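The relaxed verifier check reflects how phi operands work: a phi's incoming value only has to dominate the end of the predecessor it flows in from, not the phi's own block. With this change, an `ExtractFirstActive` result defined in `vector.early.exit` can feed a phi in `early.exit`; when the loop has a unique exit block, `early.exit` is also reached from `middle.block`, so `vector.early.exit` does not dominate it and the generic dominance check would presumably reject the plan. The sketch below uses a hypothetical hand-written immediate-dominator table, not VPlan's dominator tree, to illustrate the distinction a proper check (the `TODO` in the code) would make.

```cpp
#include <map>
#include <string>

// Hypothetical immediate-dominator table: block name -> name of its
// immediate dominator, with the entry block mapping to itself.
using IDomMap = std::map<std::string, std::string>;

// True if block A dominates block B: walk up B's dominator chain.
bool dominates(const IDomMap &IDom, const std::string &A, std::string B) {
  while (true) {
    if (B == A)
      return true;
    const std::string &Up = IDom.at(B);
    if (Up == B) // reached the entry block without meeting A
      return false;
    B = Up;
  }
}

// A value defined in DefBB may be used by a non-phi in UserBB only if DefBB
// dominates UserBB. For a phi, it is enough for DefBB to dominate the
// incoming block the value arrives from.
bool useIsDominated(const IDomMap &IDom, const std::string &DefBB,
                    const std::string &UserBB, bool UserIsPhi,
                    const std::string &IncomingBB) {
  return dominates(IDom, DefBB, UserIsPhi ? IncomingBB : UserBB);
}
```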