[VPlan] Update scalar induction resume values in VPlan. #110577
Original file line number | Diff line number | Diff line change | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
|
@@ -512,16 +512,17 @@ class InnerLoopVectorizer { | |||||||||||
/// Fix the non-induction PHIs in \p Plan. | ||||||||||||
void fixNonInductionPHIs(VPTransformState &State); | ||||||||||||
|
||||||||||||
/// Create a new phi node for the induction variable \p OrigPhi to resume | ||||||||||||
/// iteration count in the scalar epilogue, from where the vectorized loop | ||||||||||||
/// left off. \p Step is the SCEV-expanded induction step to use. In cases | ||||||||||||
/// where the loop skeleton is more complicated (i.e., epilogue vectorization) | ||||||||||||
/// and the resume values can come from an additional bypass block, the \p | ||||||||||||
/// AdditionalBypass pair provides information about the bypass block and the | ||||||||||||
/// end value on the edge from bypass to this loop. | ||||||||||||
PHINode *createInductionResumeValue( | ||||||||||||
/// Create a ResumePHI VPInstruction for the induction variable \p OrigPhi to | ||||||||||||
/// resume iteration count in the scalar epilogue, from where the vectorized | ||||||||||||
/// loop left off and add it to the scalar preheader of the VPlan. \p Step is the | ||||||||||||
Suggested change
Done, thanks!
||||||||||||
/// SCEV-expanded induction step to use. In cases where the loop skeleton is | ||||||||||||
/// more complicated (i.e., epilogue vectorization) and the resume values can | ||||||||||||
/// come from an additional bypass block, the \p AdditionalBypass pair | ||||||||||||
/// provides information about the bypass block and the end value on the edge | ||||||||||||
/// from bypass to this loop. | ||||||||||||
Suggested change
Updated, thanks!
||||||||||||
void createInductionResumeValue( | ||||||||||||
PHINode *OrigPhi, const InductionDescriptor &ID, Value *Step, | ||||||||||||
ArrayRef<BasicBlock *> BypassBlocks, | ||||||||||||
ArrayRef<BasicBlock *> BypassBlocks, VPBuilder &ScalarPHBuilder, | ||||||||||||
std::pair<BasicBlock *, Value *> AdditionalBypass = {nullptr, nullptr}); | ||||||||||||
(Independent): an optional pair with default None may look better than a pair with default nulls.
Ack
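A minimal standalone sketch of the trade-off raised in the note above, using `std::optional` rather than any LLVM type. `Block`, `Val`, `Bypass`, `hasBypassSentinel` and `hasBypassOptional` are hypothetical names introduced only for illustration; they are not part of the patch.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-ins for llvm::BasicBlock and llvm::Value.
struct Block { std::string Name; };
struct Val { int Id; };

using Bypass = std::pair<Block *, Val *>;

// Sentinel style: every caller has to know that {nullptr, nullptr}
// means "no additional bypass".
inline bool hasBypassSentinel(Bypass AdditionalBypass = {nullptr, nullptr}) {
  return AdditionalBypass.first != nullptr;
}

// Optional style: the absence of a bypass is expressed in the type itself.
inline bool hasBypassOptional(
    std::optional<Bypass> AdditionalBypass = std::nullopt) {
  return AdditionalBypass.has_value();
}
```

With the optional form, a half-filled pair such as a block without a value can no longer be mistaken for "absent", at the cost of one extra dereference at each use.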
||||||||||||
|
||||||||||||
/// Returns the original loop trip count. | ||||||||||||
|
@@ -532,6 +533,11 @@ class InnerLoopVectorizer { | |||||||||||
/// count of the original loop for both main loop and epilogue vectorization. | ||||||||||||
void setTripCount(Value *TC) { TripCount = TC; } | ||||||||||||
|
||||||||||||
std::pair<BasicBlock *, Value *> | ||||||||||||
Suggested change
Added, thanks!
||||||||||||
getInductionBypassValue(PHINode *OrigPhi) const { | ||||||||||||
return InductionBypassValues.find(OrigPhi)->second; | ||||||||||||
Suggested change
Updated, thanks!
||||||||||||
} | ||||||||||||
|
||||||||||||
protected: | ||||||||||||
friend class LoopVectorizationPlanner; | ||||||||||||
|
||||||||||||
|
@@ -667,6 +673,9 @@ class InnerLoopVectorizer { | |||||||||||
/// for cleaning the checks, if vectorization turns out unprofitable. | ||||||||||||
GeneratedRTChecks &RTChecks; | ||||||||||||
|
||||||||||||
/// Mapping of induction phis to their bypass values and bypass blocks. | ||||||||||||
DenseMap<PHINode *, std::pair<BasicBlock *, Value *>> InductionBypassValues; | ||||||||||||
Suggested change
These are only the additional bypasses, and include their predecessors, i.e., not only values.
Updated, thanks
||||||||||||
|
||||||||||||
This is needed only for "additional" bypasses. Is there a way to avoid storing this mapping?
At the moment I think this is the only place it is stored, so unfortunately there's no other way to retrieve it unless storing it here (or somewhere else in ILV)
||||||||||||
VPlan &Plan; | ||||||||||||
}; | ||||||||||||
|
||||||||||||
|
@@ -2591,9 +2600,9 @@ void InnerLoopVectorizer::createVectorLoopSkeleton(StringRef Prefix) { | |||||||||||
nullptr, Twine(Prefix) + "scalar.ph"); | ||||||||||||
} | ||||||||||||
|
||||||||||||
PHINode *InnerLoopVectorizer::createInductionResumeValue( | ||||||||||||
void InnerLoopVectorizer::createInductionResumeValue( | ||||||||||||
Worth renaming
Done, thanks!
||||||||||||
PHINode *OrigPhi, const InductionDescriptor &II, Value *Step, | ||||||||||||
ArrayRef<BasicBlock *> BypassBlocks, | ||||||||||||
ArrayRef<BasicBlock *> BypassBlocks, VPBuilder &ScalarPHBuilder, | ||||||||||||
std::pair<BasicBlock *, Value *> AdditionalBypass) { | ||||||||||||
Value *VectorTripCount = getOrCreateVectorTripCount(LoopVectorPreHeader); | ||||||||||||
assert(VectorTripCount && "Expected valid arguments"); | ||||||||||||
|
@@ -2626,27 +2635,22 @@ PHINode *InnerLoopVectorizer::createInductionResumeValue( | |||||||||||
} | ||||||||||||
} | ||||||||||||
|
||||||||||||
// Create phi nodes to merge from the backedge-taken check block. | ||||||||||||
PHINode *BCResumeVal = | ||||||||||||
PHINode::Create(OrigPhi->getType(), 3, "bc.resume.val", | ||||||||||||
LoopScalarPreHeader->getFirstNonPHIIt()); | ||||||||||||
// Copy original phi DL over to the new one. | ||||||||||||
BCResumeVal->setDebugLoc(OrigPhi->getDebugLoc()); | ||||||||||||
|
||||||||||||
// The new PHI merges the original incoming value, in case of a bypass, | ||||||||||||
// or the value at the end of the vectorized loop. | ||||||||||||
BCResumeVal->addIncoming(EndValue, LoopMiddleBlock); | ||||||||||||
|
||||||||||||
// Fix the scalar body counter (PHI node). | ||||||||||||
// The old induction's phi node in the scalar body needs the truncated | ||||||||||||
// value. | ||||||||||||
for (BasicBlock *BB : BypassBlocks) | ||||||||||||
BCResumeVal->addIncoming(II.getStartValue(), BB); | ||||||||||||
auto *ResumePhiRecipe = ScalarPHBuilder.createNaryOp( | ||||||||||||
VPInstruction::ResumePhi, | ||||||||||||
{Plan.getOrAddLiveIn(EndValue), Plan.getOrAddLiveIn(II.getStartValue())}, | ||||||||||||
OrigPhi->getDebugLoc(), "bc.resume.val"); | ||||||||||||
auto *ScalarLoopHeader = Plan.getScalarHeader(); | ||||||||||||
for (VPRecipeBase &R : *ScalarLoopHeader) { | ||||||||||||
Instead of searching linearly for each induction phi in scalar loop header, have createInductionResumeValues() scan the IRI VPIRInstructions of scalar loop header which wrap inductions, passing IRI to createInductionResumeValue(), from which OrigPhi can be retrieved? More below.
Done, thanks!
||||||||||||
auto *IRI = cast<VPIRInstruction>(&R); | ||||||||||||
if (&IRI->getInstruction() == OrigPhi) { | ||||||||||||
IRI->addOperand(ResumePhiRecipe); | ||||||||||||
Worth asserting IRI has no operands before adding one?
Done, thanks!
||||||||||||
break; | ||||||||||||
} | ||||||||||||
} | ||||||||||||
|
||||||||||||
if (AdditionalBypass.first) | ||||||||||||
BCResumeVal->setIncomingValueForBlock(AdditionalBypass.first, | ||||||||||||
EndValueFromAdditionalBypass); | ||||||||||||
return BCResumeVal; | ||||||||||||
InductionBypassValues[OrigPhi] = {AdditionalBypass.first, | ||||||||||||
Worth asserting OrigPhi is not already there?
Done, thanks!
||||||||||||
EndValueFromAdditionalBypass}; | ||||||||||||
Some comment why the information is stored in InductionBypassValues to be handled later, rather than added as an operand?
Added, thanks
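The two notes on `InductionBypassValues` (assert that the phi is not already recorded, and keep the information to be applied later) can be sketched outside LLVM with a plain `std::map`. This is only an illustration under assumed names: `Phi`, `Block`, `Val`, `recordBypass` and `lookupBypass` are hypothetical, and the patch itself uses a `DenseMap`.

```cpp
#include <cassert>
#include <map>
#include <utility>

// Hypothetical stand-ins for PHINode, BasicBlock and Value.
struct Phi { int Id; };
struct Block { int Id; };
struct Val { int Id; };

using BypassInfo = std::pair<Block *, Val *>;
using BypassMap = std::map<const Phi *, BypassInfo>;

// Records the bypass block and end value for P. Returns false, leaving the
// existing entry untouched, if P was already recorded; a caller can assert
// on the result instead of silently overwriting as operator[] would.
inline bool recordBypass(BypassMap &M, const Phi *P, Block *BB, Val *V) {
  return M.try_emplace(P, BB, V).second;
}

// Looks up the recorded info and checks that the entry exists before
// dereferencing the iterator.
inline const BypassInfo &lookupBypass(const BypassMap &M, const Phi *P) {
  auto It = M.find(P);
  assert(It != M.end() && "no bypass info recorded for this phi");
  return It->second;
}
```

The point of the sketch is the split between a recording step and a later lookup step, which mirrors how the diff stores the pair in one place and reads it back after the epilogue plan has executed.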
||||||||||||
} | ||||||||||||
|
||||||||||||
/// Return the expanded step for \p ID using \p ExpandedSCEVs to look up SCEV | ||||||||||||
|
@@ -2676,13 +2680,14 @@ void InnerLoopVectorizer::createInductionResumeValues( | |||||||||||
// iteration in the vectorized loop. | ||||||||||||
// If we come from a bypass edge then we need to start from the original | ||||||||||||
// start value. | ||||||||||||
VPBasicBlock *ScalarPHVPBB = Plan.getScalarPreheader(); | ||||||||||||
VPBuilder ScalarPHBuilder(ScalarPHVPBB, ScalarPHVPBB->begin()); | ||||||||||||
Better to iterate over the IRI's of scalar loop header, looking for those wrapping inductions? Something like:
Updated, thanks!
In the long-term roadmap, it may be better to update every ResumePhi of the scalar preheader when each bypass block is introduced, to keep this introduction atomic, keeping control-flow predecessors up-to-date with data-flow phi's.
Sounds good!
||||||||||||
for (const auto &InductionEntry : Legal->getInductionVars()) { | ||||||||||||
PHINode *OrigPhi = InductionEntry.first; | ||||||||||||
const InductionDescriptor &II = InductionEntry.second; | ||||||||||||
PHINode *BCResumeVal = createInductionResumeValue( | ||||||||||||
OrigPhi, II, getExpandedStep(II, ExpandedSCEVs), LoopBypassBlocks, | ||||||||||||
AdditionalBypass); | ||||||||||||
OrigPhi->setIncomingValueForBlock(LoopScalarPreHeader, BCResumeVal); | ||||||||||||
createInductionResumeValue(OrigPhi, II, getExpandedStep(II, ExpandedSCEVs), | ||||||||||||
LoopBypassBlocks, ScalarPHBuilder, | ||||||||||||
AdditionalBypass); | ||||||||||||
} | ||||||||||||
} | ||||||||||||
|
||||||||||||
|
@@ -7808,6 +7813,27 @@ EpilogueVectorizerMainLoop::createEpilogueVectorizedLoopSkeleton( | |||||||||||
// the second pass for the scalar loop. The induction resume values for the | ||||||||||||
// inductions in the epilogue loop are created before executing the plan for | ||||||||||||
// the epilogue loop. | ||||||||||||
Above comment should be updated?
Updated, thanks!
||||||||||||
VPBasicBlock *ScalarPHVPBB = Plan.getScalarPreheader(); | ||||||||||||
VPBuilder ScalarPHBuilder(ScalarPHVPBB, ScalarPHVPBB->begin()); | ||||||||||||
for (VPRecipeBase &R : | ||||||||||||
Plan.getVectorLoopRegion()->getEntryBasicBlock()->phis()) { | ||||||||||||
Could we iterate over the IRI's of the scalar loop header instead, as suggested above?
Unfortunately I don't think we can here, as we don't have a mapping from IR values to VPValues (other than live-outs here)
Iterating over the IRI's of scalar header instead of the integer/fp induction header phi recipes of vector header, relieves the need for such a mapping - to find PhiR which wraps IndPhi (by searching through the recipes of scalar header for each IndPhi), as in plural createInductionResumeValues()?
The code here originally tried to just create resume values for wide phis, and the scalar VPIRInstructions don't have a link to the induction phi recipes. But I replaced this now as per the suggestion above.
||||||||||||
// Create induction resume values for both widened pointer and | ||||||||||||
// integer/fp inductions and update the start value of the induction | ||||||||||||
// recipes to use the resume value. | ||||||||||||
PHINode *IndPhi = nullptr; | ||||||||||||
const InductionDescriptor *ID; | ||||||||||||
if (auto *Ind = dyn_cast<VPWidenPointerInductionRecipe>(&R)) { | ||||||||||||
IndPhi = cast<PHINode>(Ind->getUnderlyingValue()); | ||||||||||||
ID = &Ind->getInductionDescriptor(); | ||||||||||||
} else if (auto *WidenInd = dyn_cast<VPWidenIntOrFpInductionRecipe>(&R)) { | ||||||||||||
IndPhi = WidenInd->getPHINode(); | ||||||||||||
ID = &WidenInd->getInductionDescriptor(); | ||||||||||||
} else | ||||||||||||
continue; | ||||||||||||
Suggested change
Fixed, thanks!
||||||||||||
|
||||||||||||
createInductionResumeValue(IndPhi, *ID, getExpandedStep(*ID, ExpandedSCEVs), | ||||||||||||
LoopBypassBlocks, ScalarPHBuilder); | ||||||||||||
} | ||||||||||||
|
||||||||||||
return {LoopVectorPreHeader, nullptr}; | ||||||||||||
} | ||||||||||||
|
@@ -10296,23 +10322,16 @@ bool LoopVectorizePass::processLoop(Loop *L) { | |||||||||||
RdxDesc.getRecurrenceStartValue()); | ||||||||||||
} | ||||||||||||
} else { | ||||||||||||
// Create induction resume values for both widened pointer and | ||||||||||||
// integer/fp inductions and update the start value of the induction | ||||||||||||
// recipes to use the resume value. | ||||||||||||
// Retrieve the induction resume values for wide inductions from | ||||||||||||
Suggested change
Fixed, thanks!
||||||||||||
// their original phi nodes in the scalar loop | ||||||||||||
Suggested change
Fixed, thanks!
||||||||||||
PHINode *IndPhi = nullptr; | ||||||||||||
const InductionDescriptor *ID; | ||||||||||||
if (auto *Ind = dyn_cast<VPWidenPointerInductionRecipe>(&R)) { | ||||||||||||
IndPhi = cast<PHINode>(Ind->getUnderlyingValue()); | ||||||||||||
ID = &Ind->getInductionDescriptor(); | ||||||||||||
} else { | ||||||||||||
auto *WidenInd = cast<VPWidenIntOrFpInductionRecipe>(&R); | ||||||||||||
IndPhi = WidenInd->getPHINode(); | ||||||||||||
ID = &WidenInd->getInductionDescriptor(); | ||||||||||||
} | ||||||||||||
|
||||||||||||
ResumeV = MainILV.createInductionResumeValue( | ||||||||||||
IndPhi, *ID, getExpandedStep(*ID, ExpandedSCEVs), | ||||||||||||
{EPI.MainLoopIterationCountCheck}); | ||||||||||||
ResumeV = IndPhi->getIncomingValueForBlock(L->getLoopPreheader()); | ||||||||||||
Is there an existing VPValue to resume from, is it not already used as the start value of header phi induction recipes of epilog loop?
I might be missing something, but the code here sets the start value for the header recipes in the epilogue. We could possibly get it from the ResumePhis in the main vector loop VPlan, but the VPValues would be defined in a different plan?
Ah, right. Perhaps worth a note explaining the (missing) connection to ResumePhi recipes in main loop VPlan, which were executed and hooked up to the Phi nodes of the scalar loop.
Something like
Updated, thanks
||||||||||||
} | ||||||||||||
assert(ResumeV && "Must have a resume value"); | ||||||||||||
VPValue *StartVal = BestEpiPlan.getOrAddLiveIn(ResumeV); | ||||||||||||
|
@@ -10324,7 +10343,13 @@ bool LoopVectorizePass::processLoop(Loop *L) { | |||||||||||
LVP.executePlan(EPI.EpilogueVF, EPI.EpilogueUF, BestEpiPlan, EpilogILV, | ||||||||||||
DT, true, &ExpandedSCEVs); | ||||||||||||
++LoopsEpilogueVectorized; | ||||||||||||
BasicBlock *PH = L->getLoopPreheader(); | ||||||||||||
|
||||||||||||
Suggested change
Added comment at new location, thanks!
(independent)
Removed, thanks
||||||||||||
for (const auto &[IVPhi, _] : LVL.getInductionVars()) { | ||||||||||||
auto *Inc = cast<PHINode>(IVPhi->getIncomingValueForBlock(PH)); | ||||||||||||
const auto &[BB, V] = EpilogILV.getInductionBypassValue(IVPhi); | ||||||||||||
Inc->setIncomingValueForBlock(BB, V); | ||||||||||||
} | ||||||||||||
Could this be taken care of by LVP.executePlan() above rather than here in LoopVectorizePass::processLoop()?
Yes, moved, thanks
||||||||||||
if (!MainILV.areSafetyChecksAdded()) | ||||||||||||
DisableRuntimeUnroll = true; | ||||||||||||
} else { | ||||||||||||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -629,7 +629,8 @@ Value *VPInstruction::generate(VPTransformState &State) { | |
State.CFG | ||
.VPBB2IRBB[cast<VPBasicBlock>(getParent()->getSinglePredecessor())]; | ||
NewPhi->addIncoming(IncomingFromVPlanPred, VPlanPred); | ||
for (auto *OtherPred : predecessors(Builder.GetInsertBlock())) { | ||
for (auto *OtherPred : | ||
reverse(to_vector(predecessors(Builder.GetInsertBlock())))) { | ||
Better reverse the predecessors when they are set, rather than here during VPlan::execute()?
This is just to keep the number of test changes lower by better matching the order in the phis, and should be dropped as a follow-up after this change lands. Added a TODO.
||
assert(OtherPred != VPlanPred && | ||
"VPlan predecessors should not be connected yet"); | ||
NewPhi->addIncoming(IncomingFromOtherPreds, OtherPred); | ||
|
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -114,7 +114,7 @@ define void @iv_casts(ptr %dst, ptr %src, i32 %x, i64 %N) #0 { | |||||
; DEFAULT-NEXT: [[CMP_N7:%.*]] = icmp eq i64 [[TMP0]], [[N_VEC6]] | ||||||
; DEFAULT-NEXT: br i1 [[CMP_N7]], label [[EXIT]], label [[VEC_EPILOG_SCALAR_PH]] | ||||||
; DEFAULT: vec.epilog.scalar.ph: | ||||||
; DEFAULT-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC6]], [[VEC_EPILOG_MIDDLE_BLOCK]] ], [ [[N_VEC]], [[VEC_EPILOG_ITER_CHECK]] ], [ 0, [[VECTOR_MEMCHECK]] ], [ 0, [[ITER_CHECK:%.*]] ] | ||||||
; DEFAULT-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC6]], [[VEC_EPILOG_MIDDLE_BLOCK]] ], [ [[N_VEC]], [[VEC_EPILOG_ITER_CHECK]] ], [ 0, [[ITER_CHECK:%.*]] ], [ 0, [[VECTOR_MEMCHECK]] ] | ||||||
Noting: a change due to reordering of predecessors.
||||||
; DEFAULT-NEXT: br label [[LOOP:%.*]] | ||||||
; DEFAULT: loop: | ||||||
; DEFAULT-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[VEC_EPILOG_SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ] | ||||||
|
@@ -522,31 +522,31 @@ define void @trunc_ivs_and_store(i32 %x, ptr %dst, i64 %N) #0 { | |||||
; PRED: pred.store.continue: | ||||||
; PRED-NEXT: [[TMP23:%.*]] = extractelement <4 x i1> [[ACTIVE_LANE_MASK]], i32 1 | ||||||
; PRED-NEXT: br i1 [[TMP23]], label [[PRED_STORE_IF3:%.*]], label [[PRED_STORE_CONTINUE4:%.*]] | ||||||
Suggested change
for the sake of consistency, here and elsewhere, otherwise the result appears confusing. Better also change the labels to use the defined
This is a consequence of using the auto-generated scripts, they for some reason never use patterns to match block names :(
Understood, but this patch updates a lot of tests and obfuscates many - causing their destinations to no longer match their labels (albeit not checking their connectivity in any case). How/can this confusing state be fixed, even as a separate follow-up or preparation?
Fixed by re-generating the checks for the impacted functions from scratch, block pattern names should now match.
||||||
; PRED: pred.store.if3: | ||||||
; PRED: pred.store.if2: | ||||||
Is there a way to remove or reduce the amount of such irrelevant naming changes?
Trying to avoid additional redundant changes.
Unfortunately I don't think so, as the numbering depends on the exact order in which names are created that need de-duplication (there's a single counter for all names)
OK. Just curious - are there now fewer(?) de-duplications taking place, or only a reordering of de-duplications?
Trying to confirm this patch is effectively NFCI.
Tried to strip as many unrelated changes as possible from the diffs, but also added some new ones in the tests with predicated blocks to update the block variables.
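To make the reply about a single counter concrete, here is a toy model of name de-duplication. It is a sketch based only on the description in this thread, not LLVM's actual symbol-table code; `NameTable`, `makeUnique` and `nameAll` are hypothetical names. It shows that when one earlier duplicate name disappears, the suffixes of unrelated later names shift.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy name table: one counter is shared by all base names (an assumption
// taken from the review reply, not from LLVM's sources).
class NameTable {
  std::set<std::string> Used;
  unsigned LastUnique = 0;

public:
  std::string makeUnique(const std::string &Base) {
    // The first request for a name gets it unchanged.
    if (Used.insert(Base).second)
      return Base;
    // Later requests append the next value of the shared counter until
    // the candidate is unused.
    while (true) {
      std::string Candidate = Base + std::to_string(++LastUnique);
      if (Used.insert(Candidate).second)
        return Candidate;
    }
  }
};

// Names all requests in order with a fresh table.
inline std::vector<std::string>
nameAll(const std::vector<std::string> &Requests) {
  NameTable T;
  std::vector<std::string> Out;
  for (const std::string &R : Requests)
    Out.push_back(T.makeUnique(R));
  return Out;
}
```

In this model, requesting `bc.resume.val` twice before `pred.store.if` twice yields `pred.store.if2` for the last name, while requesting only the two `pred.store.if` names yields `pred.store.if1`. That is the kind of renumbering visible in the `pred.store.if*` and `pred.store.continue*` labels of this test diff.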
||||||
; PRED-NEXT: [[TMP24:%.*]] = extractelement <4 x i64> [[TMP18]], i32 1 | ||||||
; PRED-NEXT: [[TMP25:%.*]] = getelementptr i32, ptr [[DST]], i64 [[TMP24]] | ||||||
; PRED-NEXT: [[TMP26:%.*]] = add i32 [[OFFSET_IDX]], 1 | ||||||
; PRED-NEXT: store i32 [[TMP26]], ptr [[TMP25]], align 4 | ||||||
; PRED-NEXT: br label [[PRED_STORE_CONTINUE4]] | ||||||
; PRED: pred.store.continue4: | ||||||
; PRED: pred.store.continue3: | ||||||
; PRED-NEXT: [[TMP27:%.*]] = extractelement <4 x i1> [[ACTIVE_LANE_MASK]], i32 2 | ||||||
; PRED-NEXT: br i1 [[TMP27]], label [[PRED_STORE_IF5:%.*]], label [[PRED_STORE_CONTINUE6:%.*]] | ||||||
; PRED: pred.store.if5: | ||||||
; PRED: pred.store.if4: | ||||||
; PRED-NEXT: [[TMP28:%.*]] = extractelement <4 x i64> [[TMP18]], i32 2 | ||||||
; PRED-NEXT: [[TMP29:%.*]] = getelementptr i32, ptr [[DST]], i64 [[TMP28]] | ||||||
; PRED-NEXT: [[TMP30:%.*]] = add i32 [[OFFSET_IDX]], 2 | ||||||
; PRED-NEXT: store i32 [[TMP30]], ptr [[TMP29]], align 4 | ||||||
; PRED-NEXT: br label [[PRED_STORE_CONTINUE6]] | ||||||
; PRED: pred.store.continue6: | ||||||
; PRED: pred.store.continue5: | ||||||
; PRED-NEXT: [[TMP31:%.*]] = extractelement <4 x i1> [[ACTIVE_LANE_MASK]], i32 3 | ||||||
; PRED-NEXT: br i1 [[TMP31]], label [[PRED_STORE_IF7:%.*]], label [[PRED_STORE_CONTINUE8]] | ||||||
; PRED: pred.store.if7: | ||||||
; PRED: pred.store.if6: | ||||||
; PRED-NEXT: [[TMP32:%.*]] = extractelement <4 x i64> [[TMP18]], i32 3 | ||||||
; PRED-NEXT: [[TMP33:%.*]] = getelementptr i32, ptr [[DST]], i64 [[TMP32]] | ||||||
; PRED-NEXT: [[TMP34:%.*]] = add i32 [[OFFSET_IDX]], 3 | ||||||
; PRED-NEXT: store i32 [[TMP34]], ptr [[TMP33]], align 4 | ||||||
; PRED-NEXT: br label [[PRED_STORE_CONTINUE8]] | ||||||
; PRED: pred.store.continue8: | ||||||
; PRED: pred.store.continue7: | ||||||
; PRED-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 4 | ||||||
; PRED-NEXT: [[ACTIVE_LANE_MASK_NEXT]] = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i64(i64 [[INDEX]], i64 [[TMP16]]) | ||||||
; PRED-NEXT: [[TMP35:%.*]] = xor <4 x i1> [[ACTIVE_LANE_MASK_NEXT]], splat (i1 true) | ||||||
|
@@ -719,31 +719,31 @@ define void @ivs_trunc_and_ext(i32 %x, ptr %dst, i64 %N) #0 { | |||||
; PRED: pred.store.continue: | ||||||
; PRED-NEXT: [[TMP22:%.*]] = extractelement <4 x i1> [[ACTIVE_LANE_MASK]], i32 1 | ||||||
; PRED-NEXT: br i1 [[TMP22]], label [[PRED_STORE_IF2:%.*]], label [[PRED_STORE_CONTINUE3:%.*]] | ||||||
; PRED: pred.store.if2: | ||||||
; PRED: pred.store.if1: | ||||||
; PRED-NEXT: [[TMP23:%.*]] = extractelement <4 x i64> [[TMP17]], i32 1 | ||||||
; PRED-NEXT: [[TMP24:%.*]] = getelementptr i32, ptr [[DST]], i64 [[TMP23]] | ||||||
; PRED-NEXT: [[TMP25:%.*]] = add i32 [[OFFSET_IDX]], 1 | ||||||
; PRED-NEXT: store i32 [[TMP25]], ptr [[TMP24]], align 4 | ||||||
; PRED-NEXT: br label [[PRED_STORE_CONTINUE3]] | ||||||
; PRED: pred.store.continue3: | ||||||
; PRED: pred.store.continue2: | ||||||
; PRED-NEXT: [[TMP26:%.*]] = extractelement <4 x i1> [[ACTIVE_LANE_MASK]], i32 2 | ||||||
; PRED-NEXT: br i1 [[TMP26]], label [[PRED_STORE_IF4:%.*]], label [[PRED_STORE_CONTINUE5:%.*]] | ||||||
; PRED: pred.store.if4: | ||||||
; PRED: pred.store.if3: | ||||||
; PRED-NEXT: [[TMP27:%.*]] = extractelement <4 x i64> [[TMP17]], i32 2 | ||||||
; PRED-NEXT: [[TMP28:%.*]] = getelementptr i32, ptr [[DST]], i64 [[TMP27]] | ||||||
; PRED-NEXT: [[TMP29:%.*]] = add i32 [[OFFSET_IDX]], 2 | ||||||
; PRED-NEXT: store i32 [[TMP29]], ptr [[TMP28]], align 4 | ||||||
; PRED-NEXT: br label [[PRED_STORE_CONTINUE5]] | ||||||
; PRED: pred.store.continue5: | ||||||
; PRED: pred.store.continue4: | ||||||
; PRED-NEXT: [[TMP30:%.*]] = extractelement <4 x i1> [[ACTIVE_LANE_MASK]], i32 3 | ||||||
; PRED-NEXT: br i1 [[TMP30]], label [[PRED_STORE_IF6:%.*]], label [[PRED_STORE_CONTINUE7]] | ||||||
; PRED: pred.store.if6: | ||||||
; PRED: pred.store.if5: | ||||||
; PRED-NEXT: [[TMP31:%.*]] = extractelement <4 x i64> [[TMP17]], i32 3 | ||||||
; PRED-NEXT: [[TMP32:%.*]] = getelementptr i32, ptr [[DST]], i64 [[TMP31]] | ||||||
; PRED-NEXT: [[TMP33:%.*]] = add i32 [[OFFSET_IDX]], 3 | ||||||
; PRED-NEXT: store i32 [[TMP33]], ptr [[TMP32]], align 4 | ||||||
; PRED-NEXT: br label [[PRED_STORE_CONTINUE7]] | ||||||
; PRED: pred.store.continue7: | ||||||
; PRED: pred.store.continue6: | ||||||
; PRED-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 4 | ||||||
; PRED-NEXT: [[ACTIVE_LANE_MASK_NEXT]] = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i64(i64 [[INDEX]], i64 [[TMP15]]) | ||||||
; PRED-NEXT: [[TMP34:%.*]] = xor <4 x i1> [[ACTIVE_LANE_MASK_NEXT]], splat (i1 true) | ||||||
|
@@ -884,12 +884,12 @@ define void @exit_cond_zext_iv(ptr %dst, i64 %N) { | |||||
; PRED: pred.store.continue: | ||||||
; PRED-NEXT: [[TMP11:%.*]] = extractelement <2 x i1> [[TMP7]], i32 1 | ||||||
; PRED-NEXT: br i1 [[TMP11]], label [[PRED_STORE_IF5:%.*]], label [[PRED_STORE_CONTINUE6]] | ||||||
; PRED: pred.store.if5: | ||||||
; PRED: pred.store.if4: | ||||||
(introduced mismatch)
Should also be fixed now
||||||
; PRED-NEXT: [[TMP12:%.*]] = add i64 [[INDEX]], 1 | ||||||
; PRED-NEXT: [[TMP13:%.*]] = getelementptr { [100 x i32], i32, i32 }, ptr [[DST]], i64 [[TMP12]], i32 2 | ||||||
; PRED-NEXT: store i32 0, ptr [[TMP13]], align 8 | ||||||
; PRED-NEXT: br label [[PRED_STORE_CONTINUE6]] | ||||||
; PRED: pred.store.continue6: | ||||||
; PRED: pred.store.continue5: | ||||||
(introduced mismatch)
Should also be fixed now
||||||
; PRED-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 2 | ||||||
; PRED-NEXT: [[TMP14:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]] | ||||||
; PRED-NEXT: br i1 [[TMP14]], label [[MIDDLE_BLOCK:%.*]], label [[LOOP]], !llvm.loop [[LOOP10:![0-9]+]] | ||||||
|
Removed, thanks!