Skip to content

[VPlan] Update scalar induction resume values in VPlan. #110577

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 27 commits into from
Dec 6, 2024
Merged
Show file tree
Hide file tree
Changes from 7 commits
Commits
Show all changes
27 commits
Select commit Hold shift + click to select a range
33b3aac
[VPlan] Update induction resume values in VPlan.
fhahn Sep 30, 2024
8040cdb
Merge remote-tracking branch 'origin/main' into vplan-induction-resum…
fhahn Nov 9, 2024
c98b6d3
!fixup simplify code, update remaining tests.
fhahn Nov 9, 2024
a14749c
Merge remote-tracking branch 'origin/main' into vplan-induction-resum…
fhahn Nov 9, 2024
d3728f4
Merge remote-tracking branch 'origin/main' into vplan-induction-resum…
fhahn Nov 10, 2024
ad1f578
!fixup only update InductionBypassValues if there's a bypass.
fhahn Nov 10, 2024
a11bca4
!fixup update comment.
fhahn Nov 10, 2024
56e82ef
Merge remote-tracking branch 'origin/main' into vplan-induction-resum…
fhahn Nov 16, 2024
e8d78a9
!fixup address latest comments, thanks!
fhahn Nov 16, 2024
c7a5b03
Merge remote-tracking branch 'origin/main' into vplan-induction-resum…
fhahn Nov 20, 2024
5d2eb8b
!fixup address latest comments, thanks!
fhahn Nov 20, 2024
9393eda
!fixup fix formatting
fhahn Nov 20, 2024
c54e8f2
Merge remote-tracking branch 'origin/main' into vplan-induction-resum…
fhahn Nov 24, 2024
d8717f9
!fixup fix build failure
fhahn Nov 24, 2024
3e5dbff
Merge remote-tracking branch 'origin/main' into vplan-induction-resum…
fhahn Nov 28, 2024
a02b278
!fixup update remaining tests
fhahn Nov 28, 2024
674d15b
Merge remote-tracking branch 'origin/main' into vplan-induction-resum…
fhahn Dec 1, 2024
6998270
!fixup address latest comments, thanks
fhahn Dec 1, 2024
55cd843
!fixup remove redundant phi
fhahn Dec 1, 2024
e292a3d
!fixup formatting, undo unnecessary test changes
fhahn Dec 1, 2024
93f3304
Merge remote-tracking branch 'origin/main' into vplan-induction-resum…
fhahn Dec 4, 2024
ce214f5
!fixup address latest comments, thanks!
fhahn Dec 4, 2024
b37e297
Merge remote-tracking branch 'origin/main' into vplan-induction-resum…
fhahn Dec 5, 2024
a1d2a13
!fixup address comments and reduce some test diffs
fhahn Dec 5, 2024
61e6d95
Merge remote-tracking branch 'origin/main' into vplan-induction-resum…
fhahn Dec 6, 2024
e96323f
!fixup adjust comments, pass IV phi filter.
fhahn Dec 6, 2024
c964dad
Merge remote-tracking branch 'origin/main' into vplan-induction-resum…
fhahn Dec 6, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
113 changes: 69 additions & 44 deletions llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -512,16 +512,17 @@ class InnerLoopVectorizer {
/// Fix the non-induction PHIs in \p Plan.
void fixNonInductionPHIs(VPTransformState &State);

/// Create a new phi node for the induction variable \p OrigPhi to resume
/// iteration count in the scalar epilogue, from where the vectorized loop
/// left off. \p Step is the SCEV-expanded induction step to use. In cases
/// where the loop skeleton is more complicated (i.e., epilogue vectorization)
/// and the resume values can come from an additional bypass block, the \p
/// AdditionalBypass pair provides information about the bypass block and the
/// end value on the edge from bypass to this loop.
PHINode *createInductionResumeValue(
/// Create a ResumePHI VPInstruction for the induction variable \p OrigPhi to
/// resume iteration count in the scalar epilogue, from where the vectorized
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// resume iteration count in the scalar epilogue, from where the vectorized
/// resume iteration count in the scalar epilogue from where the vectorized

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed, thanks!

/// loop left off and add it the scalar preheader of the VPlan. \p Step is the
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// loop left off and add it the scalar preheader of the VPlan. \p Step is the
/// loop left off, and add it to the scalar preheader of VPlan. \p Step is the

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks!

/// SCEV-expanded induction step to use. In cases where the loop skeleton is
/// more complicated (i.e., epilogue vectorization) and the resume values can
/// come from an additional bypass block, the \p AdditionalBypass pair
/// provides information about the bypass block and the end value on the edge
/// from bypass to this loop.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// provides information about the bypass block and the end value on the edge
/// from bypass to this loop.
/// provides this additional bypass block along with the resume value coming
/// from it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks!

void createInductionResumeValue(
PHINode *OrigPhi, const InductionDescriptor &ID, Value *Step,
ArrayRef<BasicBlock *> BypassBlocks,
ArrayRef<BasicBlock *> BypassBlocks, VPBuilder &ScalarPHBuilder,
std::pair<BasicBlock *, Value *> AdditionalBypass = {nullptr, nullptr});
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Independent): an optional pair with default None may look better than a pair with default nulls.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack


/// Returns the original loop trip count.
Expand All @@ -532,6 +533,11 @@ class InnerLoopVectorizer {
/// count of the original loop for both main loop and epilogue vectorization.
void setTripCount(Value *TC) { TripCount = TC; }

std::pair<BasicBlock *, Value *>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
std::pair<BasicBlock *, Value *>
/// Retrieve the bypass value and predecessor associated with an original induction header phi.
std::pair<BasicBlock *, Value *>

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added, thanks!

getInductionBypassValue(PHINode *OrigPhi) const {
return InductionBypassValues.find(OrigPhi)->second;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return InductionBypassValues.find(OrigPhi)->second;
return InductionBypassValues.at(OrigPhi);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks!

}

protected:
friend class LoopVectorizationPlanner;

Expand Down Expand Up @@ -667,6 +673,9 @@ class InnerLoopVectorizer {
/// for cleaning the checks, if vectorization turns out unprofitable.
GeneratedRTChecks &RTChecks;

/// Mapping of induction phis to their bypass values and bypass blocks.
DenseMap<PHINode *, std::pair<BasicBlock *, Value *>> InductionBypassValues;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
DenseMap<PHINode *, std::pair<BasicBlock *, Value *>> InductionBypassValues;
DenseMap<PHINode *, std::pair<BasicBlock *, Value *>> InductionAdditionalBypasses;

These are only the additional bypasses, and include their predecessors, i.e., not only values.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks


Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is needed only for "additional" bypasses. Is there a way to avoid storing this mapping?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment I think this is the only place it is stored, so unfortunately there's no other way to retrieve it unless storing it here (or somewhere else in ILV)

VPlan &Plan;
};

Expand Down Expand Up @@ -2591,9 +2600,9 @@ void InnerLoopVectorizer::createVectorLoopSkeleton(StringRef Prefix) {
nullptr, Twine(Prefix) + "scalar.ph");
}

PHINode *InnerLoopVectorizer::createInductionResumeValue(
void InnerLoopVectorizer::createInductionResumeValue(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth renaming createInductionResumeVPValue()?
This used to create only IR Values, in particular a PHINode in scalar preheader. Now it creates a ResumePhi recipe in scalar preheader instead, while still generating IR in vector preheader/bypass.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks!

PHINode *OrigPhi, const InductionDescriptor &II, Value *Step,
ArrayRef<BasicBlock *> BypassBlocks,
ArrayRef<BasicBlock *> BypassBlocks, VPBuilder &ScalarPHBuilder,
std::pair<BasicBlock *, Value *> AdditionalBypass) {
Value *VectorTripCount = getOrCreateVectorTripCount(LoopVectorPreHeader);
assert(VectorTripCount && "Expected valid arguments");
Expand Down Expand Up @@ -2626,27 +2635,22 @@ PHINode *InnerLoopVectorizer::createInductionResumeValue(
}
}

// Create phi nodes to merge from the backedge-taken check block.
PHINode *BCResumeVal =
PHINode::Create(OrigPhi->getType(), 3, "bc.resume.val",
LoopScalarPreHeader->getFirstNonPHIIt());
// Copy original phi DL over to the new one.
BCResumeVal->setDebugLoc(OrigPhi->getDebugLoc());

// The new PHI merges the original incoming value, in case of a bypass,
// or the value at the end of the vectorized loop.
BCResumeVal->addIncoming(EndValue, LoopMiddleBlock);

// Fix the scalar body counter (PHI node).
// The old induction's phi node in the scalar body needs the truncated
// value.
for (BasicBlock *BB : BypassBlocks)
BCResumeVal->addIncoming(II.getStartValue(), BB);
auto *ResumePhiRecipe = ScalarPHBuilder.createNaryOp(
VPInstruction::ResumePhi,
{Plan.getOrAddLiveIn(EndValue), Plan.getOrAddLiveIn(II.getStartValue())},
OrigPhi->getDebugLoc(), "bc.resume.val");
auto *ScalarLoopHeader = Plan.getScalarHeader();
for (VPRecipeBase &R : *ScalarLoopHeader) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of searching linearly for each induction phi in scalar loop header, have createInductionResumeValues() scan the IRI VPIRInstructions of scalar loop header which wrap inductions, passing IRI to createInductionResumeValue(), from which OrigPhi can be retrieved? More below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks!

auto *IRI = cast<VPIRInstruction>(&R);
if (&IRI->getInstruction() == OrigPhi) {
IRI->addOperand(ResumePhiRecipe);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth asserting IRI has no operands before adding one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks!

break;
}
}

if (AdditionalBypass.first)
BCResumeVal->setIncomingValueForBlock(AdditionalBypass.first,
EndValueFromAdditionalBypass);
return BCResumeVal;
InductionBypassValues[OrigPhi] = {AdditionalBypass.first,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth asserting OrigPhi is not already there?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks!

EndValueFromAdditionalBypass};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comment why the information is stored in InductionBypassValues to be handled later, rather than added as an operand?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added, thanks

}

/// Return the expanded step for \p ID using \p ExpandedSCEVs to look up SCEV
Expand Down Expand Up @@ -2676,13 +2680,14 @@ void InnerLoopVectorizer::createInductionResumeValues(
// iteration in the vectorized loop.
// If we come from a bypass edge then we need to start from the original
// start value.
VPBasicBlock *ScalarPHVPBB = Plan.getScalarPreheader();
VPBuilder ScalarPHBuilder(ScalarPHVPBB, ScalarPHVPBB->begin());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to iterate over the IRI's of scalar loop header, looking for those wrapping inductions? Something like:

  for (VPRecipeBase &R : *Plan.getScalarHeader()) {
    auto *IRI = cast<VPIRInstruction>(&R);
    auto *Inst = dyn_cast<PHINode>(IRI->getInstruction());
    if (!Inst)
      break;
    if (!Legal->getInductionVars().contains(Inst))
      continue;
    const InductionDescriptor &II = Legal->getInductionVars()[Inst];
    createInductionResumeValue(IRI, II, getExpandedStep(II, ExpandedSCEVs),
                               LoopBypassBlocks, ScalarPHBuilder,
                               AdditionalBypass);
  }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the long-term roadmap, it may be better to update eveny ResumePhi of the scalar preheader when each bypass block is introduced, to keep this introduction atomic, keeping control-flow predecessors up-to-date with data-flow phi's.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good!

for (const auto &InductionEntry : Legal->getInductionVars()) {
PHINode *OrigPhi = InductionEntry.first;
const InductionDescriptor &II = InductionEntry.second;
PHINode *BCResumeVal = createInductionResumeValue(
OrigPhi, II, getExpandedStep(II, ExpandedSCEVs), LoopBypassBlocks,
AdditionalBypass);
OrigPhi->setIncomingValueForBlock(LoopScalarPreHeader, BCResumeVal);
createInductionResumeValue(OrigPhi, II, getExpandedStep(II, ExpandedSCEVs),
LoopBypassBlocks, ScalarPHBuilder,
AdditionalBypass);
}
}

Expand Down Expand Up @@ -7808,6 +7813,27 @@ EpilogueVectorizerMainLoop::createEpilogueVectorizedLoopSkeleton(
// the second pass for the scalar loop. The induction resume values for the
// inductions in the epilogue loop are created before executing the plan for
// the epilogue loop.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Above comment should be updated?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks!

VPBasicBlock *ScalarPHVPBB = Plan.getScalarPreheader();
VPBuilder ScalarPHBuilder(ScalarPHVPBB, ScalarPHVPBB->begin());
for (VPRecipeBase &R :
Plan.getVectorLoopRegion()->getEntryBasicBlock()->phis()) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we iterate over the IRI's of the scalar loop header instead, as suggested above?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately I don't think we can here, as we don't have a mapping from IR values to VPValues (other than live-outs here)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Iterating over the IRI's of scalar header instead of the integer/fp induction header phi recipes of vector header, relieves the need for such a mapping - to find PhiR which wraps IndPhi (by searching thru the recipes of scalar header for each IndPhi), as in plural createInductionResumeValues()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code here originally tried to just create resume value for wide phis, and the scalar VPIRInstructions don't have a link to the induction phi recipes. But I replaced this now as per the suggestion above.

// Create induction resume values for both widened pointer and
// integer/fp inductions and update the start value of the induction
// recipes to use the resume value.
PHINode *IndPhi = nullptr;
const InductionDescriptor *ID;
if (auto *Ind = dyn_cast<VPWidenPointerInductionRecipe>(&R)) {
IndPhi = cast<PHINode>(Ind->getUnderlyingValue());
ID = &Ind->getInductionDescriptor();
} else if (auto *WidenInd = dyn_cast<VPWidenIntOrFpInductionRecipe>(&R)) {
IndPhi = WidenInd->getPHINode();
ID = &WidenInd->getInductionDescriptor();
} else
continue;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
} else
continue;
} else {
continue;
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, thanks!


createInductionResumeValue(IndPhi, *ID, getExpandedStep(*ID, ExpandedSCEVs),
LoopBypassBlocks, ScalarPHBuilder);
}

return {LoopVectorPreHeader, nullptr};
}
Expand Down Expand Up @@ -10296,23 +10322,16 @@ bool LoopVectorizePass::processLoop(Loop *L) {
RdxDesc.getRecurrenceStartValue());
}
} else {
// Create induction resume values for both widened pointer and
// integer/fp inductions and update the start value of the induction
// recipes to use the resume value.
// Retrive the induction resume values for wide inductions from
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Retrive the induction resume values for wide inductions from
// Retrieve the induction resume values for wide inductions from

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, thanks!

// their original phi nodes in the scalar loop
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// their original phi nodes in the scalar loop
// their original phi nodes in the scalar loop.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed thanks!

PHINode *IndPhi = nullptr;
const InductionDescriptor *ID;
if (auto *Ind = dyn_cast<VPWidenPointerInductionRecipe>(&R)) {
IndPhi = cast<PHINode>(Ind->getUnderlyingValue());
ID = &Ind->getInductionDescriptor();
} else {
auto *WidenInd = cast<VPWidenIntOrFpInductionRecipe>(&R);
IndPhi = WidenInd->getPHINode();
ID = &WidenInd->getInductionDescriptor();
}

ResumeV = MainILV.createInductionResumeValue(
IndPhi, *ID, getExpandedStep(*ID, ExpandedSCEVs),
{EPI.MainLoopIterationCountCheck});
ResumeV = IndPhi->getIncomingValueForBlock(L->getLoopPreheader());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an existing VPValue to resume from, is it not already used as the start value of header phi induction recipes of epilog loop?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be missing something, but the code here sets the start value for the header recipes in the epilogue. We could possibly get it from the ResumePhis in the main vector loop VPlan, but the VPValues would be defined in a different plan?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, right. Perhaps worth a note explaining the (missing) connection to ResumePhi recipes in main loop VPlan, which were executed and hooked up to the Phi nodes of the scalar loop.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like // Hook up to the PHINode generated by a ResumePhi recipe of main loop VPlan, which feeds the scalar loop?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks

}
assert(ResumeV && "Must have a resume value");
VPValue *StartVal = BestEpiPlan.getOrAddLiveIn(ResumeV);
Expand All @@ -10324,7 +10343,13 @@ bool LoopVectorizePass::processLoop(Loop *L) {
LVP.executePlan(EPI.EpilogueVF, EPI.EpilogueUF, BestEpiPlan, EpilogILV,
DT, true, &ExpandedSCEVs);
++LoopsEpilogueVectorized;
BasicBlock *PH = L->getLoopPreheader();

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
BasicBlock *PH = L->getLoopPreheader();
// SOME COMMENT
BasicBlock *PH = L->getLoopPreheader();

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added comment at new location, thanks!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(independent)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed, thanks

for (const auto &[IVPhi, _] : LVL.getInductionVars()) {
auto *Inc = cast<PHINode>(IVPhi->getIncomingValueForBlock(PH));
const auto &[BB, V] = EpilogILV.getInductionBypassValue(IVPhi);
Inc->setIncomingValueForBlock(BB, V);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be taken care of by LVP.executePlan() above rather than here in LoopVectorizePass::processLoop()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, moved, thanks

if (!MainILV.areSafetyChecksAdded())
DisableRuntimeUnroll = true;
} else {
Expand Down
3 changes: 2 additions & 1 deletion llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -629,7 +629,8 @@ Value *VPInstruction::generate(VPTransformState &State) {
State.CFG
.VPBB2IRBB[cast<VPBasicBlock>(getParent()->getSinglePredecessor())];
NewPhi->addIncoming(IncomingFromVPlanPred, VPlanPred);
for (auto *OtherPred : predecessors(Builder.GetInsertBlock())) {
for (auto *OtherPred :
reverse(to_vector(predecessors(Builder.GetInsertBlock())))) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better reverse the predecessors when they are set, rather than here during VPlan::execute()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just to keep the number of test changes lower by trying to better match the order in the phis and should be dropped after landing this change as follow up. Added a TODO

assert(OtherPred != VPlanPred &&
"VPlan predecessors should not be connected yet");
NewPhi->addIncoming(IncomingFromOtherPreds, OtherPred);
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -205,10 +205,10 @@ define void @test_widen_induction_variable_start(ptr %A, i64 %N, i64 %start) {
; CHECK: vector.ph:
; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[TMP0]], 4
; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[TMP0]], [[N_MOD_VF]]
; CHECK-NEXT: [[IND_END:%.*]] = add i64 [[START]], [[N_VEC]]
; CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <2 x i64> poison, i64 [[START]], i64 0
; CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <2 x i64> [[DOTSPLATINSERT]], <2 x i64> poison, <2 x i32> zeroinitializer
; CHECK-NEXT: [[INDUCTION:%.*]] = add <2 x i64> [[DOTSPLAT]], <i64 0, i64 1>
; CHECK-NEXT: [[IND_END:%.*]] = add i64 [[START]], [[N_VEC]]
; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
; CHECK: vector.body:
; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
Expand Down Expand Up @@ -433,7 +433,7 @@ define void @test_widen_extended_induction(ptr %dst) {
; CHECK: vec.epilog.middle.block:
; CHECK-NEXT: br i1 true, label [[EXIT]], label [[VEC_EPILOG_SCALAR_PH]]
; CHECK: vec.epilog.scalar.ph:
; CHECK-NEXT: [[BC_RESUME_VAL1:%.*]] = phi i8 [ 16, [[VEC_EPILOG_MIDDLE_BLOCK]] ], [ 16, [[VEC_EPILOG_ITER_CHECK]] ], [ 0, [[VECTOR_SCEVCHECK]] ], [ 0, [[ITER_CHECK:%.*]] ]
; CHECK-NEXT: [[BC_RESUME_VAL1:%.*]] = phi i8 [ 16, [[VEC_EPILOG_MIDDLE_BLOCK]] ], [ 16, [[VEC_EPILOG_ITER_CHECK]] ], [ 0, [[ITER_CHECK:%.*]] ], [ 0, [[VECTOR_SCEVCHECK]] ]
; CHECK-NEXT: br label [[LOOP:%.*]]
; CHECK: loop:
; CHECK-NEXT: [[IV:%.*]] = phi i8 [ [[BC_RESUME_VAL1]], [[VEC_EPILOG_SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]
Expand Down
30 changes: 15 additions & 15 deletions llvm/test/Transforms/LoopVectorize/AArch64/induction-costs-sve.ll
Original file line number Diff line number Diff line change
Expand Up @@ -114,7 +114,7 @@ define void @iv_casts(ptr %dst, ptr %src, i32 %x, i64 %N) #0 {
; DEFAULT-NEXT: [[CMP_N7:%.*]] = icmp eq i64 [[TMP0]], [[N_VEC6]]
; DEFAULT-NEXT: br i1 [[CMP_N7]], label [[EXIT]], label [[VEC_EPILOG_SCALAR_PH]]
; DEFAULT: vec.epilog.scalar.ph:
; DEFAULT-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC6]], [[VEC_EPILOG_MIDDLE_BLOCK]] ], [ [[N_VEC]], [[VEC_EPILOG_ITER_CHECK]] ], [ 0, [[VECTOR_MEMCHECK]] ], [ 0, [[ITER_CHECK:%.*]] ]
; DEFAULT-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC6]], [[VEC_EPILOG_MIDDLE_BLOCK]] ], [ [[N_VEC]], [[VEC_EPILOG_ITER_CHECK]] ], [ 0, [[ITER_CHECK:%.*]] ], [ 0, [[VECTOR_MEMCHECK]] ]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noting: a change due to reordering of predecessors.

; DEFAULT-NEXT: br label [[LOOP:%.*]]
; DEFAULT: loop:
; DEFAULT-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[VEC_EPILOG_SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]
Expand Down Expand Up @@ -522,31 +522,31 @@ define void @trunc_ivs_and_store(i32 %x, ptr %dst, i64 %N) #0 {
; PRED: pred.store.continue:
; PRED-NEXT: [[TMP23:%.*]] = extractelement <4 x i1> [[ACTIVE_LANE_MASK]], i32 1
; PRED-NEXT: br i1 [[TMP23]], label [[PRED_STORE_IF3:%.*]], label [[PRED_STORE_CONTINUE4:%.*]]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
; PRED-NEXT: br i1 [[TMP23]], label [[PRED_STORE_IF3:%.*]], label [[PRED_STORE_CONTINUE4:%.*]]
; PRED-NEXT: br i1 [[TMP23]], label [[PRED_STORE_IF2:%.*]], label [[PRED_STORE_CONTINUE4:%.*]]

for the sake of consistency, here and elsewhere, otherwise the result appears confusing.

Better also change the labels to use the defined [[PRED_STORE_IF2]] rather than the explicit pred.store.if2:, systematically (independently), thereby checking that the destinations of branches reach the corresponding labels.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a consequence of using the auto-generated scripts, they for some reason never use patterns to match block names :(

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood, but this patch updates a lot of tests and obfuscates many - causing their destinations to no longer match their labels (albeit not checking their connectivity in any case). How/can this confusing state be fixed, even as a separate follow-up or preparation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed by re-generating the checks for the impacted functions from scratch, block pattern names should now match.

; PRED: pred.store.if3:
; PRED: pred.store.if2:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to remove or reduce the amount of such irrelevant naming changes?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying to avoid additional redundant changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately I don't think so, as the numbering depends on the exact order in which names are created that need de-duplication (there's a single counter for all names)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Just curious - are there now fewer(?) de-duplications taking place, or only a reordering of de-duplications?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying to confirm this patch is effectively NFCI.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried to strop as many unrelated changes from the diffs, but also added some new ones in the ones with predicated blocks to update the block variables.

; PRED-NEXT: [[TMP24:%.*]] = extractelement <4 x i64> [[TMP18]], i32 1
; PRED-NEXT: [[TMP25:%.*]] = getelementptr i32, ptr [[DST]], i64 [[TMP24]]
; PRED-NEXT: [[TMP26:%.*]] = add i32 [[OFFSET_IDX]], 1
; PRED-NEXT: store i32 [[TMP26]], ptr [[TMP25]], align 4
; PRED-NEXT: br label [[PRED_STORE_CONTINUE4]]
; PRED: pred.store.continue4:
; PRED: pred.store.continue3:
; PRED-NEXT: [[TMP27:%.*]] = extractelement <4 x i1> [[ACTIVE_LANE_MASK]], i32 2
; PRED-NEXT: br i1 [[TMP27]], label [[PRED_STORE_IF5:%.*]], label [[PRED_STORE_CONTINUE6:%.*]]
; PRED: pred.store.if5:
; PRED: pred.store.if4:
; PRED-NEXT: [[TMP28:%.*]] = extractelement <4 x i64> [[TMP18]], i32 2
; PRED-NEXT: [[TMP29:%.*]] = getelementptr i32, ptr [[DST]], i64 [[TMP28]]
; PRED-NEXT: [[TMP30:%.*]] = add i32 [[OFFSET_IDX]], 2
; PRED-NEXT: store i32 [[TMP30]], ptr [[TMP29]], align 4
; PRED-NEXT: br label [[PRED_STORE_CONTINUE6]]
; PRED: pred.store.continue6:
; PRED: pred.store.continue5:
; PRED-NEXT: [[TMP31:%.*]] = extractelement <4 x i1> [[ACTIVE_LANE_MASK]], i32 3
; PRED-NEXT: br i1 [[TMP31]], label [[PRED_STORE_IF7:%.*]], label [[PRED_STORE_CONTINUE8]]
; PRED: pred.store.if7:
; PRED: pred.store.if6:
; PRED-NEXT: [[TMP32:%.*]] = extractelement <4 x i64> [[TMP18]], i32 3
; PRED-NEXT: [[TMP33:%.*]] = getelementptr i32, ptr [[DST]], i64 [[TMP32]]
; PRED-NEXT: [[TMP34:%.*]] = add i32 [[OFFSET_IDX]], 3
; PRED-NEXT: store i32 [[TMP34]], ptr [[TMP33]], align 4
; PRED-NEXT: br label [[PRED_STORE_CONTINUE8]]
; PRED: pred.store.continue8:
; PRED: pred.store.continue7:
; PRED-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 4
; PRED-NEXT: [[ACTIVE_LANE_MASK_NEXT]] = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i64(i64 [[INDEX]], i64 [[TMP16]])
; PRED-NEXT: [[TMP35:%.*]] = xor <4 x i1> [[ACTIVE_LANE_MASK_NEXT]], splat (i1 true)
Expand Down Expand Up @@ -719,31 +719,31 @@ define void @ivs_trunc_and_ext(i32 %x, ptr %dst, i64 %N) #0 {
; PRED: pred.store.continue:
; PRED-NEXT: [[TMP22:%.*]] = extractelement <4 x i1> [[ACTIVE_LANE_MASK]], i32 1
; PRED-NEXT: br i1 [[TMP22]], label [[PRED_STORE_IF2:%.*]], label [[PRED_STORE_CONTINUE3:%.*]]
; PRED: pred.store.if2:
; PRED: pred.store.if1:
; PRED-NEXT: [[TMP23:%.*]] = extractelement <4 x i64> [[TMP17]], i32 1
; PRED-NEXT: [[TMP24:%.*]] = getelementptr i32, ptr [[DST]], i64 [[TMP23]]
; PRED-NEXT: [[TMP25:%.*]] = add i32 [[OFFSET_IDX]], 1
; PRED-NEXT: store i32 [[TMP25]], ptr [[TMP24]], align 4
; PRED-NEXT: br label [[PRED_STORE_CONTINUE3]]
; PRED: pred.store.continue3:
; PRED: pred.store.continue2:
; PRED-NEXT: [[TMP26:%.*]] = extractelement <4 x i1> [[ACTIVE_LANE_MASK]], i32 2
; PRED-NEXT: br i1 [[TMP26]], label [[PRED_STORE_IF4:%.*]], label [[PRED_STORE_CONTINUE5:%.*]]
; PRED: pred.store.if4:
; PRED: pred.store.if3:
; PRED-NEXT: [[TMP27:%.*]] = extractelement <4 x i64> [[TMP17]], i32 2
; PRED-NEXT: [[TMP28:%.*]] = getelementptr i32, ptr [[DST]], i64 [[TMP27]]
; PRED-NEXT: [[TMP29:%.*]] = add i32 [[OFFSET_IDX]], 2
; PRED-NEXT: store i32 [[TMP29]], ptr [[TMP28]], align 4
; PRED-NEXT: br label [[PRED_STORE_CONTINUE5]]
; PRED: pred.store.continue5:
; PRED: pred.store.continue4:
; PRED-NEXT: [[TMP30:%.*]] = extractelement <4 x i1> [[ACTIVE_LANE_MASK]], i32 3
; PRED-NEXT: br i1 [[TMP30]], label [[PRED_STORE_IF6:%.*]], label [[PRED_STORE_CONTINUE7]]
; PRED: pred.store.if6:
; PRED: pred.store.if5:
; PRED-NEXT: [[TMP31:%.*]] = extractelement <4 x i64> [[TMP17]], i32 3
; PRED-NEXT: [[TMP32:%.*]] = getelementptr i32, ptr [[DST]], i64 [[TMP31]]
; PRED-NEXT: [[TMP33:%.*]] = add i32 [[OFFSET_IDX]], 3
; PRED-NEXT: store i32 [[TMP33]], ptr [[TMP32]], align 4
; PRED-NEXT: br label [[PRED_STORE_CONTINUE7]]
; PRED: pred.store.continue7:
; PRED: pred.store.continue6:
; PRED-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 4
; PRED-NEXT: [[ACTIVE_LANE_MASK_NEXT]] = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i64(i64 [[INDEX]], i64 [[TMP15]])
; PRED-NEXT: [[TMP34:%.*]] = xor <4 x i1> [[ACTIVE_LANE_MASK_NEXT]], splat (i1 true)
Expand Down Expand Up @@ -884,12 +884,12 @@ define void @exit_cond_zext_iv(ptr %dst, i64 %N) {
; PRED: pred.store.continue:
; PRED-NEXT: [[TMP11:%.*]] = extractelement <2 x i1> [[TMP7]], i32 1
; PRED-NEXT: br i1 [[TMP11]], label [[PRED_STORE_IF5:%.*]], label [[PRED_STORE_CONTINUE6]]
; PRED: pred.store.if5:
; PRED: pred.store.if4:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(introduced mismatch)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should also be fixed now

; PRED-NEXT: [[TMP12:%.*]] = add i64 [[INDEX]], 1
; PRED-NEXT: [[TMP13:%.*]] = getelementptr { [100 x i32], i32, i32 }, ptr [[DST]], i64 [[TMP12]], i32 2
; PRED-NEXT: store i32 0, ptr [[TMP13]], align 8
; PRED-NEXT: br label [[PRED_STORE_CONTINUE6]]
; PRED: pred.store.continue6:
; PRED: pred.store.continue5:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(introduced mismatch)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should also be fixed now

; PRED-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 2
; PRED-NEXT: [[TMP14:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
; PRED-NEXT: br i1 [[TMP14]], label [[MIDDLE_BLOCK:%.*]], label [[LOOP]], !llvm.loop [[LOOP10:![0-9]+]]
Expand Down
Loading
Loading