Skip to content

[VPlan] Generalize collectUsersInExitBlocks for multiple exit bbs. #115066

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Nov 21, 2024
78 changes: 40 additions & 38 deletions llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -8864,56 +8864,56 @@ static void addScalarResumePhis(VPRecipeBuilder &Builder, VPlan &Plan) {
}
}

// Collect VPIRInstructions for phis in the original exit block that are modeled
// Collect VPIRInstructions for phis in the exit blocks that are modeled
// in VPlan and add the exiting VPValue as operand. Some exiting values are not
// modeled explicitly yet and won't be included. Those are un-truncated
// VPWidenIntOrFpInductionRecipe, VPWidenPointerInductionRecipe and induction
// increments.
static SetVector<VPIRInstruction *> collectUsersInExitBlock(
static SetVector<VPIRInstruction *> collectUsersInExitBlocks(
Loop *OrigLoop, VPRecipeBuilder &Builder, VPlan &Plan,
const MapVector<PHINode *, InductionDescriptor> &Inductions) {
auto *MiddleVPBB = Plan.getMiddleBlock();
// No edge from the middle block to the unique exit block has been inserted
// and there is nothing to fix from vector loop; phis should have incoming
// from scalar loop only.
if (MiddleVPBB->getNumSuccessors() != 2)
return {};
SetVector<VPIRInstruction *> ExitUsersToFix;
VPBasicBlock *ExitVPBB = cast<VPIRBasicBlock>(MiddleVPBB->getSuccessors()[0]);
BasicBlock *ExitingBB = OrigLoop->getExitingBlock();
for (VPRecipeBase &R : *ExitVPBB) {
auto *ExitIRI = dyn_cast<VPIRInstruction>(&R);
if (!ExitIRI)
continue;
auto *ExitPhi = dyn_cast<PHINode>(&ExitIRI->getInstruction());
if (!ExitPhi)
break;
Value *IncomingValue = ExitPhi->getIncomingValueForBlock(ExitingBB);
VPValue *V = Builder.getVPValueOrAddLiveIn(IncomingValue);
// Exit values for inductions are computed and updated outside of VPlan and
// independent of induction recipes.
// TODO: Compute induction exit values in VPlan.
if ((isa<VPWidenIntOrFpInductionRecipe>(V) &&
!cast<VPWidenIntOrFpInductionRecipe>(V)->getTruncInst()) ||
isa<VPWidenPointerInductionRecipe>(V) ||
(isa<Instruction>(IncomingValue) &&
OrigLoop->contains(cast<Instruction>(IncomingValue)) &&
any_of(IncomingValue->users(), [&Inductions](User *U) {
auto *P = dyn_cast<PHINode>(U);
return P && Inductions.contains(P);
})))
continue;
ExitUsersToFix.insert(ExitIRI);
ExitIRI->addOperand(V);
for (VPIRBasicBlock *ExitVPBB : Plan.getExitBlocks()) {
BasicBlock *ExitBB = ExitVPBB->getIRBasicBlock();
BasicBlock *ExitingBB = find_singleton<BasicBlock>(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how this works when all exiting blocks jump to the same exit block? LoopVectorizationLegality has a list of exiting and exit blocks that could perhaps be useful here? Otherwise you could compile a list of all exiting blocks for that exit block and walk over them.

This algorithm is definitely more powerful than what I originally planned in #88385, which isn't a bad thing. However, I guess it is more difficult to implement for the general case. If it's easier you could just limit this to the single early exit case that LoopVectorizationLegality current supports? That class already has the single early exiting block and early exit block.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To handle multiple exiting blocks, this can be replaced by iterating over all exiting blocks when processing each exit phi I think, as in https://github.com/llvm/llvm-project/pull/112138/files#diff-da321d454a7246f8ae276bf1db2782bf26b5210b8133cb59e4d7fd45d0905decR8857

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see so this patch is only generalising for multiple exit blocks on the assumption that there is only a single predecessor per exit block. It looks like in PR #112138 you've generalised this further by removing that restriction. I guess this is ok and reduces the diff for PR #112138. In that case is it worth adding a temporary assert until PR #112138 lands?

  assert(ExitingBB && "Expected exit block to have a single predecessor");

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current version of the patch has an assert in addUsersInExitBlocks to make sure we only updated the exit phis, if the incoming value is from the middle block. Do you think that's sufficient or should I add another one here as suggested?

I'll update #112138 once this lands, and the assert in addUsersInExitBlocks would be turned into a bail-out as it means we encountered a exit value we cannot handle yet, and only will be able to with #88385,?

Copy link
Contributor

@david-arm david-arm Nov 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes I see, that makes sense. In that case you're right what you have now is fine and happy for you to merge this patch!

to_vector(predecessors(ExitBB)),
[OrigLoop](BasicBlock *Pred, bool AllowRepeats) {
return OrigLoop->contains(Pred) ? Pred : nullptr;
});
for (VPRecipeBase &R : *ExitVPBB) {
auto *ExitIRI = dyn_cast<VPIRInstruction>(&R);
if (!ExitIRI)
continue;
auto *ExitPhi = dyn_cast<PHINode>(&ExitIRI->getInstruction());
if (!ExitPhi)
break;
Value *IncomingValue = ExitPhi->getIncomingValueForBlock(ExitingBB);
VPValue *V = Builder.getVPValueOrAddLiveIn(IncomingValue);
// Exit values for inductions are computed and updated outside of VPlan
// and independent of induction recipes.
// TODO: Compute induction exit values in VPlan.
if ((isa<VPWidenIntOrFpInductionRecipe>(V) &&
!cast<VPWidenIntOrFpInductionRecipe>(V)->getTruncInst()) ||
isa<VPWidenPointerInductionRecipe>(V) ||
(isa<Instruction>(IncomingValue) &&
OrigLoop->contains(cast<Instruction>(IncomingValue)) &&
any_of(IncomingValue->users(), [&Inductions](User *U) {
auto *P = dyn_cast<PHINode>(U);
return P && Inductions.contains(P);
})))
continue;
ExitUsersToFix.insert(ExitIRI);
ExitIRI->addOperand(V);
}
}
return ExitUsersToFix;
}

// Add exit values to \p Plan. Extracts are added for each entry in \p
// ExitUsersToFix if needed and their operands are updated.
static void
addUsersInExitBlock(VPlan &Plan,
const SetVector<VPIRInstruction *> &ExitUsersToFix) {
addUsersInExitBlocks(VPlan &Plan,
const SetVector<VPIRInstruction *> &ExitUsersToFix) {
if (ExitUsersToFix.empty())
return;

Expand All @@ -8929,6 +8929,8 @@ addUsersInExitBlock(VPlan &Plan,
if (V->isLiveIn())
continue;

assert(ExitIRI->getParent()->getSinglePredecessor() == MiddleVPBB &&
"Exit value not handled yet for this edge.");
LLVMContext &Ctx = ExitIRI->getInstruction().getContext();
VPValue *Ext = B.createNaryOp(VPInstruction::ExtractFromEnd,
{V, Plan.getOrAddLiveIn(ConstantInt::get(
Expand Down Expand Up @@ -9206,10 +9208,10 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
RecipeBuilder.fixHeaderPhis();

addScalarResumePhis(RecipeBuilder, *Plan);
SetVector<VPIRInstruction *> ExitUsersToFix = collectUsersInExitBlock(
SetVector<VPIRInstruction *> ExitUsersToFix = collectUsersInExitBlocks(
OrigLoop, RecipeBuilder, *Plan, Legal->getInductionVars());
addExitUsersForFirstOrderRecurrences(*Plan, ExitUsersToFix);
addUsersInExitBlock(*Plan, ExitUsersToFix);
addUsersInExitBlocks(*Plan, ExitUsersToFix);
// ---------------------------------------------------------------------------
// Transform initial VPlan: Apply previously taken decisions, in order, to
// bring the VPlan to its final state.
Expand Down
6 changes: 6 additions & 0 deletions llvm/lib/Transforms/Vectorize/VPlan.h
Original file line number Diff line number Diff line change
Expand Up @@ -3839,6 +3839,12 @@ class VPlan {
return cast<VPBasicBlock>(getVectorLoopRegion()->getSingleSuccessor());
}

/// Return an iterator range over the VPIRBasicBlock wrapping the exit blocks
/// of the VPlan, that is leaf nodes except the scalar header. Defined in
/// VPlanHCFG, as the definition of the type needs access to the definitions
/// of VPBlockShallowTraversalWrapper.
auto getExitBlocks();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If possible, it would be nice if this returned a concrete type or had a comment describing it. It's just a bit awkward for the reader having to look in a different header file to see what it returns.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a bit unfortunate but I couldn't find a way to do that, as it needs accesses to the definition of VPBlockShallowTraversalWrapper to spell out the full type (other than moving everything in VPlanCFG back to this file).

There may be another way to spell out the type I am not aware of (it is a few nests of various iterator range adopters). For now I extended the comment to provide more information


/// The trip count of the original loop.
VPValue *getTripCount() const {
assert(TripCount && "trip count needs to be set before accessing it");
Expand Down
9 changes: 9 additions & 0 deletions llvm/lib/Transforms/Vectorize/VPlanCFG.h
Original file line number Diff line number Diff line change
Expand Up @@ -306,6 +306,15 @@ template <> struct GraphTraits<VPlan *> {
}
};

inline auto VPlan::getExitBlocks() {
VPBlockBase *ScalarHeader = getScalarHeader();
return make_filter_range(
VPBlockUtils::blocksOnly<VPIRBasicBlock>(
vp_depth_first_shallow(getVectorLoopRegion()->getSingleSuccessor())),
[ScalarHeader](VPIRBasicBlock *VPIRBB) {
return VPIRBB != ScalarHeader && VPIRBB->getNumSuccessors() == 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as in the other PR - I think this can return different blocks depending upon whether you call this before or during VPlan::execute. Is this something we should worry about?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should always return the exit blocks currently in the VPlan, so if some transformation replaces/removes some of the exit blocks, the result would be different. But at the moment, the exit blocks should be fixed, we know them at time of construction and can wrap them in VPIRBasicBlocks and those shouldn't change (and if they change the transform would be responsible to make sure the exit phis are updated accordingly).

Is there anything particular in VPlan::execute you are thinking of? (We replace VPBasicBlocks with VPIRBasicBlocks in some cases if they are only created during skeleton creation, but that shouldn't be the case for the exit blocks)

Copy link
Contributor

@david-arm david-arm Nov 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently isn't the middle block the single successor of the loop region? Perhaps I've misunderstood what VPBlockUtils::blocksOnly<VPIRBasicBlock> does but it looks like it may or may not include the middle block depending upon the timing. In VPlan::execute we replace the VPBasicBlock for the middle block with a VPIRBasicBlock.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I think the reason why this works is because of the extra VPIRBB->getNumSuccessors() == 0 condition, since the middle block will always have at least one successor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The middle-block may or may not be included by VPBlockUtils::blocksOnly<VPIRBasicBlock>, but the middle block will never be considered as an exit block, as it has successors.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, thanks for explaining! I realised that around the same time that you replied. It just took me a while to figure it out. :)

});
}
} // namespace llvm

#endif // LLVM_TRANSFORMS_VECTORIZE_VPLANCFG_H
Loading