-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[VPlan] Add VPIRInstruction, use for exit block live-outs. #100735
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Introduce explicit ExtractFromEnd recipes to extract the final values for live-outs instead of implicitly extracting in VPLiveOut::fixPhi. This is a follow-up to the recent changes of modeling extracts for recurrences and consolidates live-out extract creation for fixed-order recurrences at a single place: addLiveOutsForFirstOrderRecurrences. It is also in preparation of replacing VPLiveOut with VPIRInstructions wrapping the original scalar phis.
Add a new VPIRInstruction recipe to wrap existing IR instructions not to be modified during execution, execept for PHIs. For PHIs, a single VPValue operand is allowed, and it is used to add a new incoming value for the single predecessor VPBB. Expect PHIs, VPIRInstructions cannot have any operands. Depends on llvm#100658.
@llvm/pr-subscribers-backend-risc-v @llvm/pr-subscribers-llvm-transforms Author: Florian Hahn (fhahn) ChangesAdd a new VPIRInstruction recipe to wrap existing IR instructions not to Depends on #100658. Patch is 41.20 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/100735.diff 16 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index ea19af10a1474..fabddbbce139a 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -8434,25 +8434,133 @@ static void addUsersInExitBlock(
if (!ExitBB || !ExitBB->getSinglePredecessor() || !ExitingBB)
return;
- // Introduce VPUsers modeling the exit values.
- for (PHINode &ExitPhi : ExitBB->phis()) {
- Value *IncomingValue =
- ExitPhi.getIncomingValueForBlock(ExitingBB);
- VPValue *V = Builder.getVPValueOrAddLiveIn(IncomingValue, Plan);
- // Exit values for inductions are computed and updated outside of VPlan and
- // independent of induction recipes.
- // TODO: Compute induction exit values in VPlan, use VPLiveOuts to update
- // live-outs.
- if ((isa<VPWidenIntOrFpInductionRecipe>(V) &&
- !cast<VPWidenIntOrFpInductionRecipe>(V)->getTruncInst()) ||
- isa<VPWidenPointerInductionRecipe>(V) ||
- (isa<Instruction>(IncomingValue) &&
- any_of(IncomingValue->users(), [&Inductions](User *U) {
- auto *P = dyn_cast<PHINode>(U);
- return P && Inductions.contains(P);
- })))
+ auto *MiddleBlock = Plan.getVectorLoopRegion()->getSingleSuccessor();
+ for (VPIRBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPIRBasicBlock>(
+ vp_depth_first_shallow(MiddleBlock))) {
+ if (VPBB->getIRBasicBlock() != ExitBB)
continue;
- Plan.addLiveOut(&ExitPhi, V);
+
+ for (auto &R : *VPBB) {
+ auto *IR = dyn_cast<VPIRInstruction>(&R);
+ if (!IR)
+ continue;
+ auto *ExitPhi = dyn_cast<PHINode>(&IR->getInstruction());
+ if (!ExitPhi)
+ break;
+ Value *IncomingValue = ExitPhi->getIncomingValueForBlock(ExitingBB);
+ VPValue *V = Builder.getVPValueOrAddLiveIn(IncomingValue, Plan);
+ // Exit values for inductions are computed and updated outside of VPlan
+ // and independent of induction recipes.
+ // TODO: Compute induction exit values in VPlan, use VPLiveOuts to update
+ // live-outs.
+ if ((isa<VPWidenIntOrFpInductionRecipe>(V) &&
+ !cast<VPWidenIntOrFpInductionRecipe>(V)->getTruncInst()) ||
+ isa<VPWidenPointerInductionRecipe>(V) ||
+ (isa<Instruction>(IncomingValue) &&
+ any_of(IncomingValue->users(), [&Inductions](User *U) {
+ auto *P = dyn_cast<PHINode>(U);
+ return P && Inductions.contains(P);
+ })))
+ continue;
+
+ auto MiddleVPBB =
+ cast<VPBasicBlock>(Plan.getVectorLoopRegion()->getSingleSuccessor());
+ VPBuilder B(MiddleVPBB);
+ if (auto *Terminator = MiddleVPBB->getTerminator()) {
+ auto *Condition = dyn_cast<VPInstruction>(Terminator->getOperand(0));
+ assert((!Condition || Condition->getParent() == MiddleVPBB) &&
+ "Condition expected in MiddleVPBB");
+ B.setInsertPoint(Condition ? Condition : Terminator);
+ }
+
+ VPValue *Ext;
+ if (auto *FOR = dyn_cast_or_null<VPFirstOrderRecurrencePHIRecipe>(
+ V->getDefiningRecipe())) {
+ // This is the second phase of vectorizing first-order recurrences. An
+ // overview of the transformation is described below. Suppose we have
+ // the following loop with some use after the loop of the last a[i-1],
+ //
+ // for (int i = 0; i < n; ++i) {
+ // t = a[i - 1];
+ // b[i] = a[i] - t;
+ // }
+ // use t;
+ //
+ // There is a first-order recurrence on "a". For this loop, the
+ // shorthand scalar IR looks like:
+ //
+ // scalar.ph:
+ // s_init = a[-1]
+ // br scalar.body
+ //
+ // scalar.body:
+ // i = phi [0, scalar.ph], [i+1, scalar.body]
+ // s1 = phi [s_init, scalar.ph], [s2, scalar.body]
+ // s2 = a[i]
+ // b[i] = s2 - s1
+ // br cond, scalar.body, exit.block
+ //
+ // exit.block:
+ // use = lcssa.phi [s1, scalar.body]
+ //
+ // In this example, s1 is a recurrence because it's value depends on the
+ // previous iteration. In the first phase of vectorization, we created a
+ // vector phi v1 for s1. We now complete the vectorization and produce
+ // the shorthand vector IR shown below (for VF = 4, UF = 1).
+ //
+ // vector.ph:
+ // v_init = vector(..., ..., ..., a[-1])
+ // br vector.body
+ //
+ // vector.body
+ // i = phi [0, vector.ph], [i+4, vector.body]
+ // v1 = phi [v_init, vector.ph], [v2, vector.body]
+ // v2 = a[i, i+1, i+2, i+3];
+ // v3 = vector(v1(3), v2(0, 1, 2))
+ // b[i, i+1, i+2, i+3] = v2 - v3
+ // br cond, vector.body, middle.block
+ //
+ // middle.block:
+ // s_penultimate = v2(2) = v3(3)
+ // s_resume = v2(3)
+ // br cond, scalar.ph, exit.block
+ //
+ // scalar.ph:
+ // s_init' = phi [s_resume, middle.block], [s_init, otherwise]
+ // br scalar.body
+ //
+ // scalar.body:
+ // i = phi [0, scalar.ph], [i+1, scalar.body]
+ // s1 = phi [s_init', scalar.ph], [s2, scalar.body]
+ // s2 = a[i]
+ // b[i] = s2 - s1
+ // br cond, scalar.body, exit.block
+ //
+ // exit.block:
+ // lo = lcssa.phi [s1, scalar.body], [s.penultimate, middle.block]
+ //
+ // After execution completes the vector loop, we extract the next value
+ // of the recurrence (x) to use as the initial value in the scalar loop.
+ // This is modeled by ExtractFromEnd.
+ //
+ // Extract the penultimate value of the recurrence and update VPLiveOut
+ // users of the recurrence splice. Note that the extract of the final
+ // value used to resume in the scalar loop is created earlier during
+ // VPlan construction.
+ Ext = B.createNaryOp(
+ VPInstruction::ExtractFromEnd,
+ {FOR->getBackedgeValue(),
+ Plan.getOrAddLiveIn(ConstantInt::get(
+ IntegerType::get(ExitBB->getContext(), 32), 2))},
+ {}, "vector.recur.extract.for.phi");
+ } else {
+ Ext = B.createNaryOp(
+ VPInstruction::ExtractFromEnd,
+ {V, Plan.getOrAddLiveIn(ConstantInt::get(
+ IntegerType::get(ExitBB->getContext(), 32), 1))});
+ }
+ IR->addOperand(Ext);
+ }
}
}
@@ -8660,14 +8768,6 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
// After here, VPBB should not be used.
VPBB = nullptr;
- if (CM.requiresScalarEpilogue(Range)) {
- // No edge from the middle block to the unique exit block has been inserted
- // and there is nothing to fix from vector loop; phis should have incoming
- // from scalar loop only.
- } else
- addUsersInExitBlock(OrigLoop, RecipeBuilder, *Plan,
- Legal->getInductionVars());
-
assert(isa<VPRegionBlock>(Plan->getVectorLoopRegion()) &&
!Plan->getVectorLoopRegion()->getEntryBasicBlock()->empty() &&
"entry block must be set to a VPRegionBlock having a non-empty entry "
@@ -8676,6 +8776,14 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
addLiveOutsForFirstOrderRecurrences(*Plan);
+ if (CM.requiresScalarEpilogue(Range)) {
+ // No edge from the middle block to the unique exit block has been inserted
+ // and there is nothing to fix from vector loop; phis should have incoming
+ // from scalar loop only.
+ } else
+ addUsersInExitBlock(OrigLoop, RecipeBuilder, *Plan,
+ Legal->getInductionVars());
+
// ---------------------------------------------------------------------------
// Transform initial VPlan: Apply previously taken decisions, in order, to
// bring the VPlan to its final state.
@@ -8884,10 +8992,12 @@ void LoopVectorizationPlanner::adjustRecipesForReductions(
for (unsigned I = 0; I != Worklist.size(); ++I) {
VPSingleDefRecipe *Cur = Worklist[I];
for (VPUser *U : Cur->users()) {
- auto *UserRecipe = dyn_cast<VPSingleDefRecipe>(U);
- if (!UserRecipe) {
- assert(isa<VPLiveOut>(U) &&
- "U must either be a VPSingleDef or VPLiveOut");
+ auto *UserRecipe = cast<VPSingleDefRecipe>(U);
+ if (!UserRecipe->getParent()->getParent()) {
+ assert(cast<VPInstruction>(U) &&
+ cast<VPInstruction>(U)->getOpcode() ==
+ VPInstruction::ExtractFromEnd &&
+ "U must be an ExtractFromEnd VPInstruction");
continue;
}
Worklist.insert(UserRecipe);
@@ -9105,8 +9215,10 @@ void LoopVectorizationPlanner::adjustRecipesForReductions(
VPInstruction::ComputeReductionResult, {PhiR, NewExitingVPV}, ExitDL);
FinalReductionResult->insertBefore(*MiddleVPBB, IP);
OrigExitingVPV->replaceUsesWithIf(
- FinalReductionResult,
- [](VPUser &User, unsigned) { return isa<VPLiveOut>(&User); });
+ FinalReductionResult, [](VPUser &User, unsigned) {
+ auto *R = dyn_cast<VPInstruction>(&User);
+ return R && R->getOpcode() == VPInstruction::ExtractFromEnd;
+ });
}
VPlanTransforms::clearReductionWrapFlags(*Plan);
@@ -10055,7 +10167,9 @@ bool LoopVectorizePass::processLoop(Loop *L) {
// directly in VPlan.
EpilogILV.setTripCount(MainILV.getTripCount());
for (auto &R : make_early_inc_range(*BestEpiPlan.getPreheader())) {
- auto *ExpandR = cast<VPExpandSCEVRecipe>(&R);
+ auto *ExpandR = dyn_cast<VPExpandSCEVRecipe>(&R);
+ if (!ExpandR)
+ continue;
auto *ExpandedVal = BestEpiPlan.getOrAddLiveIn(
ExpandedSCEVs.find(ExpandR->getSCEV())->second);
ExpandR->replaceAllUsesWith(ExpandedVal);
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.cpp b/llvm/lib/Transforms/Vectorize/VPlan.cpp
index 58de6256900f0..b16527386521b 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlan.cpp
@@ -855,10 +855,18 @@ VPlan::~VPlan() {
delete BackedgeTakenCount;
}
+static VPIRBasicBlock *createVPIRBasicBlockFor(BasicBlock *BB) {
+ auto *VPIRBB = new VPIRBasicBlock(BB);
+ for (Instruction &I :
+ make_range(BB->begin(), BB->getTerminator()->getIterator()))
+ VPIRBB->appendRecipe(new VPIRInstruction(I));
+ return VPIRBB;
+}
+
VPlanPtr VPlan::createInitialVPlan(const SCEV *TripCount, ScalarEvolution &SE,
bool RequiresScalarEpilogueCheck,
bool TailFolded, Loop *TheLoop) {
- VPIRBasicBlock *Entry = new VPIRBasicBlock(TheLoop->getLoopPreheader());
+ VPIRBasicBlock *Entry = createVPIRBasicBlockFor(TheLoop->getLoopPreheader());
VPBasicBlock *VecPreheader = new VPBasicBlock("vector.ph");
auto Plan = std::make_unique<VPlan>(Entry, VecPreheader);
Plan->TripCount =
@@ -890,7 +898,7 @@ VPlanPtr VPlan::createInitialVPlan(const SCEV *TripCount, ScalarEvolution &SE,
// we unconditionally branch to the scalar preheader. Do nothing.
// 3) Otherwise, construct a runtime check.
BasicBlock *IRExitBlock = TheLoop->getUniqueExitBlock();
- auto *VPExitBlock = new VPIRBasicBlock(IRExitBlock);
+ auto *VPExitBlock = createVPIRBasicBlockFor(IRExitBlock);
// The connection order corresponds to the operands of the conditional branch.
VPBlockUtils::insertBlockAfter(VPExitBlock, MiddleVPBB);
VPBlockUtils::connectBlocks(MiddleVPBB, ScalarPH);
@@ -957,7 +965,7 @@ void VPlan::prepareToExecute(Value *TripCountV, Value *VectorTripCountV,
/// predecessor, which is rewired to the new VPIRBasicBlock. All successors of
/// VPBB, if any, are rewired to the new VPIRBasicBlock.
static void replaceVPBBWithIRVPBB(VPBasicBlock *VPBB, BasicBlock *IRBB) {
- VPIRBasicBlock *IRMiddleVPBB = new VPIRBasicBlock(IRBB);
+ VPIRBasicBlock *IRMiddleVPBB = createVPIRBasicBlockFor(IRBB);
for (auto &R : make_early_inc_range(*VPBB))
R.moveBefore(*IRMiddleVPBB, IRMiddleVPBB->end());
VPBlockBase *PredVPBB = VPBB->getSinglePredecessor();
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index c9da5e5d38a6b..2100734eead1b 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -932,6 +932,7 @@ class VPSingleDefRecipe : public VPRecipeBase, public VPValue {
case VPRecipeBase::VPScalarCastSC:
return true;
case VPRecipeBase::VPInterleaveSC:
+ case VPRecipeBase::VPIRInstructionSC:
case VPRecipeBase::VPBranchOnMaskSC:
case VPRecipeBase::VPWidenLoadEVLSC:
case VPRecipeBase::VPWidenLoadSC:
@@ -1399,6 +1400,45 @@ class VPInstruction : public VPRecipeWithIRFlags {
bool isSingleScalar() const;
};
+/// A recipe to wrap on original IR instruction not to be modified during
+/// execution, execept for PHIs. For PHIs, a single VPValue operand is allowed,
+/// and it is used to add a new incoming value for the single predecessor VPBB.
+/// Expect PHIs, VPIRInstructions cannot have any operands.
+class VPIRInstruction : public VPRecipeBase {
+ Instruction &I;
+
+public:
+ VPIRInstruction(Instruction &I)
+ : VPRecipeBase(VPDef::VPIRInstructionSC, ArrayRef<VPValue *>()), I(I) {}
+
+ ~VPIRInstruction() override = default;
+
+ VP_CLASSOF_IMPL(VPDef::VPIRInstructionSC)
+
+ VPIRInstruction *clone() override {
+ auto *R = new VPIRInstruction(I);
+ for (auto *Op : operands())
+ R->addOperand(Op);
+ return R;
+ }
+
+ void execute(VPTransformState &State) override;
+
+ Instruction &getInstruction() { return I; }
+
+#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
+ /// Print the recipe.
+ void print(raw_ostream &O, const Twine &Indent,
+ VPSlotTracker &SlotTracker) const override;
+#endif
+
+ bool usesScalars(const VPValue *Op) const override {
+ assert(is_contained(operands(), Op) &&
+ "Op must be an operand of the recipe");
+ return true;
+ }
+};
+
/// VPWidenRecipe is a recipe for producing a widened instruction using the
/// opcode and operands of the recipe. This recipe covers most of the
/// traditional vectorization cases where each recipe transforms into a
diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index 2d6d67a55c17d..780eeff5a2b7b 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -194,9 +194,6 @@ bool VPRecipeBase::mayHaveSideEffects() const {
void VPLiveOut::fixPhi(VPlan &Plan, VPTransformState &State) {
VPValue *ExitValue = getOperand(0);
- auto Lane = vputils::isUniformAfterVectorization(ExitValue)
- ? VPLane::getFirstLane()
- : VPLane::getLastLaneForVF(State.VF);
VPBasicBlock *MiddleVPBB =
cast<VPBasicBlock>(Plan.getVectorLoopRegion()->getSingleSuccessor());
VPRecipeBase *ExitingRecipe = ExitValue->getDefiningRecipe();
@@ -207,10 +204,7 @@ void VPLiveOut::fixPhi(VPlan &Plan, VPTransformState &State) {
? MiddleVPBB
: ExitingVPBB;
BasicBlock *PredBB = State.CFG.VPBB2IRBB[PredVPBB];
- // Set insertion point in PredBB in case an extract needs to be generated.
- // TODO: Model extracts explicitly.
- State.Builder.SetInsertPoint(PredBB, PredBB->getFirstNonPHIIt());
- Value *V = State.get(ExitValue, VPIteration(State.UF - 1, Lane));
+ Value *V = State.get(ExitValue, VPIteration(0, 0));
if (Phi->getBasicBlockIndex(PredBB) != -1)
Phi->setIncomingValueForBlock(PredBB, V);
else
@@ -862,6 +856,35 @@ void VPInstruction::print(raw_ostream &O, const Twine &Indent,
}
#endif
+void VPIRInstruction::execute(VPTransformState &State) {
+ assert(isa<VPIRBasicBlock>(getParent()) &&
+ "VPIRInstructions can only be placed in VPIRBasicBlocks");
+
+ if (getNumOperands() == 1 && isa<PHINode>(&I)) {
+ VPValue *ExitValue = getOperand(0);
+ auto Lane = vputils::isUniformAfterVectorization(ExitValue)
+ ? VPLane::getFirstLane()
+ : VPLane::getLastLaneForVF(State.VF);
+ auto *PredVPBB = cast<VPBasicBlock>(getParent()->getSinglePredecessor());
+ BasicBlock *PredBB = State.CFG.VPBB2IRBB[PredVPBB];
+ // Set insertion point in PredBB in case an extract needs to be generated.
+ // TODO: Model extracts explicitly.
+ State.Builder.SetInsertPoint(PredBB, PredBB->getFirstNonPHIIt());
+ Value *V = State.get(ExitValue, VPIteration(State.UF - 1, Lane));
+ auto *Phi = cast<PHINode>(&I);
+ Phi->addIncoming(V, PredBB);
+ }
+
+ State.Builder.SetInsertPoint(I.getParent(), std::next(I.getIterator()));
+}
+
+#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
+void VPIRInstruction::print(raw_ostream &O, const Twine &Indent,
+ VPSlotTracker &SlotTracker) const {
+ O << Indent << "IR " << I;
+}
+#endif
+
void VPWidenCallRecipe::execute(VPTransformState &State) {
assert(State.VF.isVector() && "not widening");
Function *CalledScalarFn = getCalledScalarFunction();
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index 045f6c356669f..a2496f067024c 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -826,20 +826,6 @@ bool VPlanTransforms::adjustFixedOrderRecurrences(VPlan &Plan,
if (auto *FOR = dyn_cast<VPFirstOrderRecurrencePHIRecipe>(&R))
RecurrencePhis.push_back(FOR);
- VPBasicBlock *MiddleVPBB =
- cast<VPBasicBlock>(Plan.getVectorLoopRegion()->getSingleSuccessor());
- VPBuilder MiddleBuilder;
- // Set insert point so new recipes are inserted before terminator and
- // condition, if there is either the former or both.
- if (auto *Term =
- dyn_cast_or_null<VPInstruction>(MiddleVPBB->getTerminator())) {
- if (auto *Cmp = dyn_cast<VPInstruction>(Term->getOperand(0)))
- MiddleBuilder.setInsertPoint(Cmp);
- else
- MiddleBuilder.setInsertPoint(Term);
- } else
- MiddleBuilder.setInsertPoint(MiddleVPBB);
-
for (VPFirstOrderRecurrencePHIRecipe *FOR : RecurrencePhis) {
SmallPtrSet<VPFirstOrderRecurrencePHIRecipe *, 4> SeenPhis;
VPRecipeBase *Previous = FOR->getBackedgeValue()->getDefiningRecipe();
@@ -872,86 +858,6 @@ bool VPlanTransforms::adjustFixedOrderRecurrences(VPlan &Plan,
// Set the first operand of RecurSplice to FOR again, after replacing
// all users.
RecurSplice->setOperand(0, FOR);
-
- // This is the second phase of vectorizing first-order recurrences. An
- // overview of the transformation is described below. Suppose we have the
- // following loop with some use after the loop of the last a[i-1],
- //
- // for (int i = 0; i < n; ++i) {
- // t = a[i - 1];
- // b[i] = a[i] - t;
- // }
- // use t;
- //
- // There is a first-order recurrence on "a". For this loop, the shorthand
- // scalar IR looks like:
- //
- // scalar.ph:
- // s_init = a[-1]
- // br scalar.body
- //
- // scalar.body:
- // i = phi [0, scalar.ph], [i+1, scalar.body]
- // s1 = phi [s_init, scalar.ph], [s2, scalar.body]
- // s2 = a[i]
- // b[i] = s2 - s1
- // br cond, scalar.body, exit.block
- //
- // exit.block:
- // use = lcssa.phi [s1, scalar.body]
- //
- // In this example, s1 is a recurrence because it's value depends on the
- // previous iteration. In the first phase of vectorization, we created a
- // vector phi v1 for s1. We now complete the vectorization and produce the
- // shorthand vector IR shown below (for VF = 4, UF = 1).
- //
- // vector.ph:
- // v_init = vector(..., ..., ..., a[-1])
- // br vector.body
- //
- // vector.body
- // i = phi [0, vector.ph], [i+4, vector.body]
- // v1 = phi [v_init, vector.ph], [v2, vector.body]
- // v2 = a[i, i+1, i+2, i+3];
- // v3 = vector(v1(3), v2(0, 1, 2))
- // b[i, i+1, i+2, i+3] = v2 - v3
- ...
[truncated]
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good step forward! Adding various comments.
; CHECK-NEXT: No successors | ||
; CHECK-EMPTY: | ||
; CHECK-NEXT: scalar.ph | ||
; CHECK-NEXT: EMIT vp<[[RESUME_P:%.*]]> = resume-phi vp<[[RESUME_1]]>, ir<22> | ||
; CHECK-NEXT: No successors | ||
; CHECK-EMPTY: | ||
; CHECK-NEXT: Live-out i16 %for.1.lcssa = vp<[[FOR_RESULT]]> | ||
; CHECK-NEXT: Live-out i16 %for.1 = vp<[[RESUME_P]]> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once live outs introduced in scalar header by addLiveOutsForFirstOrderRecurrences() are also converted to VPIRInstructions, VPLiveOut can be retired?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can wait to find out ;-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that is the plan as follow-up! All uses of VPLiveOut will be replaced by VPIRInstruction.
@@ -572,6 +572,8 @@ define void @print_expand_scev(i64 %y, ptr %ptr) { | |||
; CHECK-NEXT: vp<[[TC:%.+]]> = original trip-count | |||
; CHECK-EMPTY: | |||
; CHECK-NEXT: ir-bb<entry>: | |||
; CHECK-NEXT: IR %div = udiv i64 %y, 492802768830814060 | |||
; CHECK-NEXT: IR %inc = add i64 %div, 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suffice to wrap only the last IR Instruction of a VPIRBasicBlock if all other (expand SCEV) recipes are to appear afterwards/last.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When creating the VPIRBasicBlock it would be difficult to check beforehand if that would be the case I think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could alternatively eliminate redundant pairs of adjacent non-phi VPIRI's in a later VPlan2VPlan pass.
; CHECK-NEXT: [[TMP3:%.*]] = add i32 [[TMP1]], [[TMP2]] | ||
; CHECK-NEXT: [[T1_0_LCSSA2:%.*]] = ptrtoint ptr [[T1_0_LCSSA]] to i64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This ptrtoint seems to now be replicated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think due to insert points now being set slightly differently, preventing SCEV expander to re-use it in this particular case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anything worth being (more) careful about when setting insertion points, to facilitate such foldings?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure as this should be cleaned up be later folding passes.
; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ] | ||
; CHECK-NEXT: [[VEC_IND:%.*]] = phi <4 x i32> [ <i32 0, i32 1, i32 2, i32 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ] | ||
; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4 | ||
; CHECK-NEXT: [[VEC_IND_NEXT]] = add <4 x i32> [[VEC_IND]], <i32 4, i32 4, i32 4, i32 4> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The induction now gets vectorized?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this relates to the dependent PR, should be gone now from the PR.
@@ -339,6 +339,7 @@ class VPDef { | |||
VPBranchOnMaskSC, | |||
VPDerivedIVSC, | |||
VPExpandSCEVSC, | |||
VPIRInstructionSC, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Recipes that represent phi's need to appear below, for isPhi() to work. Should VPIRInstruction be split into VPIRPhi
and VPIRNonPhi
? This specialization may also simplify them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Below is just for header phis, so I think it wouldn't apply of VPIRPhi at least for now. I've left it for now as is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's also for PredInstPhi, a non-loop-header if-then hammock phi of a replicate region. It should be for all phi's planned to be generated - to help keep them together at the start of their basic block.
Related thought - perhaps also worth checking, when fusing together two VPBB's, that the second is free of phi recipes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, we have a header phi ids are grouped together separately.
@@ -862,6 +856,35 @@ void VPInstruction::print(raw_ostream &O, const Twine &Indent, | |||
} | |||
#endif | |||
|
|||
void VPIRInstruction::execute(VPTransformState &State) { | |||
assert(isa<VPIRBasicBlock>(getParent()) && | |||
"VPIRInstructions can only be placed in VPIRBasicBlocks"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Independent: assert also applies to SCEV expand recipe? Should be checked by validation instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will add to verifier seprarately, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved assert to VPlanVerifier. Will add for SCEVExpandRecipe separately.
/// execution, execept for PHIs. For PHIs, a single VPValue operand is allowed, | ||
/// and it is used to add a new incoming value for the single predecessor VPBB. | ||
/// Expect PHIs, VPIRInstructions cannot have any operands. | ||
class VPIRInstruction : public VPRecipeBase { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be a VPSingleDefRecipe, as IR instructions define at most one value, but this currently serves only as a placeholder to advance the insert position, and as a live-out use/operand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's at the moment intentionally not a VPSingleDefRecipe, as VPIRIntructions at the moment don't need their def-use chains modeled in VPlan. Making them VPSingleDefRecipe without also modeling VPIRInstruction's operands as VPValues would probably be q bit surprising as the def-use chains aren't connected.
Might be good as follow-up, however we would still need to decide how to deal with exit phis that have their original operand coming from the scalar loop, for which is not yet modeled in VPlan.
static VPIRBasicBlock *createVPIRBasicBlockFor(BasicBlock *BB) { | ||
auto *VPIRBB = new VPIRBasicBlock(BB); | ||
for (Instruction &I : | ||
make_range(BB->begin(), BB->getTerminator()->getIterator())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would a simple : *BB
do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is intentional to skip the terminators and it allows to insert new recipes at the end of the VPIRBB.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is ok, adding some further thoughts:
Appending VPBB should be fine when iterating over BB?
Unconditional branches are redundant in VPlan, but there are recipes for binary branches. Should terminators be wrapped as well, possibly excluding unconditional branches? Need to make sure appended recipes are inserted before them.
For now, suffice to wrap (lcssa) phi's and last non-terminal instruction, but easier to fully populate with all instructions excluding terminal?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now, suffice to wrap (lcssa) phi's and last non-terminal instruction, but easier to fully populate with all instructions excluding terminal?
Yes, only partially populating would probably be a bit more logic to skip undesired parts. Not adding the branches also have the benefit that we don't remove the original terminator VPIRI from the middle block when adding the VPlan-modeled branch recipes.
Phi->addIncoming(V, PredBB); | ||
} | ||
|
||
State.Builder.SetInsertPoint(I.getParent(), std::next(I.getIterator())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this simplify VPIRBasicBlock::execute(), where this setInsertPoint() replaces the one there.
Independent: can VPBasicBlock::execute() be simplified, given VPIRBasicBlocks: "A. the first VPBB reuses the loop pre-header BB" - should this reuse be modelled by fusing into a VPIRBasicBlock? Further cases should be simplified once replicate regions are unfolded as a VPlan2VPlan transformation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this simplify VPIRBasicBlock::execute(), where this setInsertPoint() replaces the one there.
Unfortunately not yet; We still need to set the default insert point for empty VPIRBasicBlocks and also some IR instructions may be created in VPIRBasicBlock (the pre-preheader) only during skeleton creation, which will be missing from the VPIRBB.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated and address comments now that #100658 has landed
static VPIRBasicBlock *createVPIRBasicBlockFor(BasicBlock *BB) { | ||
auto *VPIRBB = new VPIRBasicBlock(BB); | ||
for (Instruction &I : | ||
make_range(BB->begin(), BB->getTerminator()->getIterator())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is intentional to skip the terminators and it allows to insert new recipes at the end of the VPIRBB.
/// execution, execept for PHIs. For PHIs, a single VPValue operand is allowed, | ||
/// and it is used to add a new incoming value for the single predecessor VPBB. | ||
/// Expect PHIs, VPIRInstructions cannot have any operands. | ||
class VPIRInstruction : public VPRecipeBase { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's at the moment intentionally not a VPSingleDefRecipe, as VPIRIntructions at the moment don't need their def-use chains modeled in VPlan. Making them VPSingleDefRecipe without also modeling VPIRInstruction's operands as VPValues would probably be q bit surprising as the def-use chains aren't connected.
Might be good as follow-up, however we would still need to decide how to deal with exit phis that have their original operand coming from the scalar loop, for which is not yet modeled in VPlan.
@@ -862,6 +856,35 @@ void VPInstruction::print(raw_ostream &O, const Twine &Indent, | |||
} | |||
#endif | |||
|
|||
void VPIRInstruction::execute(VPTransformState &State) { | |||
assert(isa<VPIRBasicBlock>(getParent()) && | |||
"VPIRInstructions can only be placed in VPIRBasicBlocks"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will add to verifier seprarately, thanks!
Phi->addIncoming(V, PredBB); | ||
} | ||
|
||
State.Builder.SetInsertPoint(I.getParent(), std::next(I.getIterator())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this simplify VPIRBasicBlock::execute(), where this setInsertPoint() replaces the one there.
Unfortunately not yet; We still need to set the default insert point for empty VPIRBasicBlocks and also some IR instructions may be created in VPIRBasicBlock (the pre-preheader) only during skeleton creation, which will be missing from the VPIRBB.
@@ -339,6 +339,7 @@ class VPDef { | |||
VPBranchOnMaskSC, | |||
VPDerivedIVSC, | |||
VPExpandSCEVSC, | |||
VPIRInstructionSC, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Below is just for header phis, so I think it wouldn't apply of VPIRPhi at least for now. I've left it for now as is.
; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ] | ||
; CHECK-NEXT: [[VEC_IND:%.*]] = phi <4 x i32> [ <i32 0, i32 1, i32 2, i32 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ] | ||
; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4 | ||
; CHECK-NEXT: [[VEC_IND_NEXT]] = add <4 x i32> [[VEC_IND]], <i32 4, i32 4, i32 4, i32 4> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this relates to the dependent PR, should be gone now from the PR.
@@ -572,6 +572,8 @@ define void @print_expand_scev(i64 %y, ptr %ptr) { | |||
; CHECK-NEXT: vp<[[TC:%.+]]> = original trip-count | |||
; CHECK-EMPTY: | |||
; CHECK-NEXT: ir-bb<entry>: | |||
; CHECK-NEXT: IR %div = udiv i64 %y, 492802768830814060 | |||
; CHECK-NEXT: IR %inc = add i64 %div, 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When creating the VPIRBasicBlock it would be difficult to check beforehand if that would be the case I think
; CHECK-NEXT: [[TMP3:%.*]] = add i32 [[TMP1]], [[TMP2]] | ||
; CHECK-NEXT: [[T1_0_LCSSA2:%.*]] = ptrtoint ptr [[T1_0_LCSSA]] to i64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think due to insert points now being set slightly differently, preventing SCEV expander to re-use it in this particular case
ping :) |
ping :) |
@@ -8637,7 +8637,7 @@ static void addCanonicalIVRecipes(VPlan &Plan, Type *IdxTy, bool HasNUW, | |||
// are modeled in VPlan. Some exiting values are not modeled explicitly yet and | |||
// won't be included. Those are un-truncated VPWidenIntOrFpInductionRecipe, | |||
// VPWidenPointerInductionRecipe and induction increments. | |||
static MapVector<PHINode *, VPValue *> collectUsersInExitBlock( | |||
static MapVector<VPIRInstruction *, VPValue *> collectUsersInExitBlock( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worth updating comment above that talks about Exit/phi's.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can assign a name for this type of map, to clarify and simplify passing it as a parameter below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adjusted to use SetVector<VPIRInstruction*>
, adding the exiting values as operand early, which should simplify the passed arguments enough hopefully
Value *IncomingValue = | ||
ExitPhi.getIncomingValueForBlock(ExitingBB); | ||
for (VPRecipeBase &R : *ExitVPBB) { | ||
auto *IR = dyn_cast<VPIRInstruction>(&R); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
auto *IR = dyn_cast<VPIRInstruction>(&R); | |
auto *ExitIRI = dyn_cast<VPIRInstruction>(&R); |
or something else more accurate than IR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, thanks!
@@ -8695,17 +8698,18 @@ addUsersInExitBlock(VPlan &Plan, | |||
} | |||
|
|||
// Introduce VPUsers modeling the exit values. | |||
for (const auto &[ExitPhi, V] : ExitingValuesToFix) { | |||
for (const auto &[IR, V] : ExitingValuesToFix) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for (const auto &[IR, V] : ExitingValuesToFix) { | |
for (const auto &[ExitIRI, V] : ExitingValuesToFix) { |
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated, thanks!
continue; | ||
auto *ExitPhi = dyn_cast<PHINode>(&IR->getInstruction()); | ||
if (!ExitPhi) | ||
break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Breaking because all remaining recipes are expected to be non phi's? Should the lcssa VPIRI's be considered phi recipes (as do loop header phi recipes) to help ensure they appear before non-phi recipes (VPIRI's) in the VP(IR)BB?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Breaking because all remaining recipes are expected to be non phi's
Yes.
Should the lcssa VPIRI's be considered phi recipes (as do loop header phi recipes) to help ensure they appear before non-phi recipes (VPIRI's) in the VP(IR)BB?
We could but this would pull in more complexity in this patch without much benefits initially. Perhaps better done once a more concrete need arises/as follow up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A separate VPIRPhi subclass of VIPIRInstruction dedicated to phi's would arguably model the distinction between the two more accurately, and help simplify their documentation, operand behavior, execute(), keeping phi's together, iterating over them, etc., but this can be considered as a follow-up.
@@ -8695,17 +8698,18 @@ addUsersInExitBlock(VPlan &Plan, | |||
} | |||
|
|||
// Introduce VPUsers modeling the exit values. | |||
for (const auto &[ExitPhi, V] : ExitingValuesToFix) { | |||
for (const auto &[IR, V] : ExitingValuesToFix) { | |||
// Pass live-in values used by exit phis directly through to the live-out. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Unrelated) Do such values that bypass the vector loop exist, can VPlan ignore them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We still need to add incoming values for new edges added, I think, which at the moment is modeled naturally in VPlan
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Phi's in exit block that are fed by live-in values which bypass the original loop, seem redundant - should they be considered lcssa phi's? Can be checked later, independently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are redundant, but I think there were some cases where it made it through the pipeline (there's a number of cases where various unsimplified constructs make it to LV), which caused adding https://github.com/llvm/llvm-project/blob/main/llvm/test/Transforms/LoopVectorize/extract-from-end-vector-constant.ll#L35.
We could alternatively remove them after vectorization, but this cannot be done in VPlan, as the users of the exit phi are not modeled. Doing a separate cleanup outside of VPlan would also be an option, but less desirable than the current solution?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, optimizing away redundant phi's in exit block that are fed by live-in values which bypass the original loop, and which do not qualify as lcssa phi's, best be done before LV, which relies on (proper) lcssa form.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, but we still need to handle incoming IR that may not be simplified. Or would you prefer to run some pre-cleanup passes before vectorizing? At the moment, I think there is no easy way to do this cleanup before creating VPlans and not modifying the original IR, if we are not vectorizing.
@@ -60,12 +60,11 @@ define i32 @reduction(ptr %a, i64 %n, i32 %start) { | |||
; IF-EVL-INLOOP-NEXT: Successor(s): ir-bb<for.end>, scalar.ph | |||
; IF-EVL-INLOOP-EMPTY: | |||
; IF-EVL-INLOOP-NEXT: ir-bb<for.end>: | |||
; IF-EVL-INLOOP-NEXT: IR %add.lcssa = phi i32 [ %add, %for.body ] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this capture the RDX_EX operand as the live out did below?
Good to see the live-out being placed in the context of exit bb.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, thanks!
; CHECK: %_tmp133.lcssa1 = phi i16 [ %_tmp133, %bb2 ], [ %vector.recur.extract.for.phi1, %middle.block ] | ||
; CHECK: %_tmp133.lcssa = phi i16 [ %_tmp133, %bb2 ], [ %vector.recur.extract.for.phi, %middle.block ] | ||
; CHECK: %_tmp133.lcssa1 = phi i16 [ %_tmp133, %bb2 ], [ %vector.recur.extract.for.phi, %middle.block ] | ||
; CHECK: %_tmp133.lcssa = phi i16 [ %_tmp133, %bb2 ], [ %vector.recur.extract.for.phi1, %middle.block ] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this correcting wrong code (or vice versa)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It just adjusts the order, both LCSSA phis in the source have the same incoming value
; CHECK-NEXT: [[TMP3:%.*]] = add i32 [[TMP1]], [[TMP2]] | ||
; CHECK-NEXT: [[T1_0_LCSSA2:%.*]] = ptrtoint ptr [[T1_0_LCSSA]] to i64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anything worth being (more) careful about when setting insertion points, to facilitate such foldings?
@@ -572,6 +572,8 @@ define void @print_expand_scev(i64 %y, ptr %ptr) { | |||
; CHECK-NEXT: vp<[[TC:%.+]]> = original trip-count | |||
; CHECK-EMPTY: | |||
; CHECK-NEXT: ir-bb<entry>: | |||
; CHECK-NEXT: IR %div = udiv i64 %y, 492802768830814060 | |||
; CHECK-NEXT: IR %inc = add i64 %div, 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could alternatively eliminate redundant pairs of adjacent non-phi VPIRI's in a later VPlan2VPlan pass.
; CHECK-NEXT: No successors | ||
; CHECK-EMPTY: | ||
; CHECK-NEXT: scalar.ph | ||
; CHECK-NEXT: EMIT vp<[[RESUME_P:%.*]]> = resume-phi vp<[[RESUME_1]]>, ir<22> | ||
; CHECK-NEXT: No successors | ||
; CHECK-EMPTY: | ||
; CHECK-NEXT: Live-out i16 %for.1.lcssa = vp<[[FOR_RESULT]]> | ||
; CHECK-NEXT: Live-out i16 %for.1 = vp<[[RESUME_P]]> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can wait to find out ;-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks, adding some final comments.
continue; | ||
auto *ExitPhi = dyn_cast<PHINode>(&IR->getInstruction()); | ||
if (!ExitPhi) | ||
break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A separate VPIRPhi subclass of VIPIRInstruction dedicated to phi's would arguably model the distinction between the two more accurately, and help simplify their documentation, operand behavior, execute(), keeping phi's together, iterating over them, etc., but this can be considered as a follow-up.
@@ -8637,7 +8637,7 @@ static void addCanonicalIVRecipes(VPlan &Plan, Type *IdxTy, bool HasNUW, | |||
// are modeled in VPlan. Some exiting values are not modeled explicitly yet and | |||
// won't be included. Those are un-truncated VPWidenIntOrFpInductionRecipe, | |||
// VPWidenPointerInductionRecipe and induction increments. | |||
static MapVector<PHINode *, VPValue *> collectUsersInExitBlock( | |||
static MapVector<VPIRInstruction *, VPValue *> collectUsersInExitBlock( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can assign a name for this type of map, to clarify and simplify passing it as a parameter below.
case VPRecipeBase::VPInterleaveSC: | ||
case VPRecipeBase::VPIRInstructionSC: | ||
case VPRecipeBase::VPBranchOnMaskSC: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
case VPRecipeBase::VPInterleaveSC: | |
case VPRecipeBase::VPIRInstructionSC: | |
case VPRecipeBase::VPBranchOnMaskSC: | |
case VPRecipeBase::VPBranchOnMaskSC: | |
case VPRecipeBase::VPInterleaveSC: | |
case VPRecipeBase::VPIRInstructionSC: |
trying to keep these in order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, thanks!
/// A recipe to wrap on original IR instruction not to be modified during | ||
/// execution, execept for PHIs. For PHIs, a single VPValue operand is allowed, | ||
/// and it is used to add a new incoming value for the single predecessor VPBB. | ||
/// Expect PHIs, VPIRInstructions cannot have any operands. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be, OTOH may simplify rather than complicate, worth evaluating, possibly as a follow-up.
VPSlotTracker &SlotTracker) const { | ||
O << Indent << "IR " << I; | ||
|
||
if (getNumOperands() != 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This, e.g., must be true for VPIRPhi's and false for other VPIRI's.
if (auto *IRI = dyn_cast<VPIRInstruction>(&R)) { | ||
if (!isa<VPIRBasicBlock>(IRI->getParent())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (auto *IRI = dyn_cast<VPIRInstruction>(&R)) { | |
if (!isa<VPIRBasicBlock>(IRI->getParent())) { | |
if (isa<VPIRInstruction>(&R) ^ isa<VPIRBasicBlock>(VPBB)) { |
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, thanks!
@@ -972,7 +980,7 @@ void VPlan::prepareToExecute(Value *TripCountV, Value *VectorTripCountV, | |||
/// predecessor, which is rewired to the new VPIRBasicBlock. All successors of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worth clarifying and asserting (independently).
@@ -8695,17 +8698,18 @@ addUsersInExitBlock(VPlan &Plan, | |||
} | |||
|
|||
// Introduce VPUsers modeling the exit values. | |||
for (const auto &[ExitPhi, V] : ExitingValuesToFix) { | |||
for (const auto &[IR, V] : ExitingValuesToFix) { | |||
// Pass live-in values used by exit phis directly through to the live-out. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, optimizing away redundant phi's in exit block that are fed by live-in values which bypass the original loop, and which do not qualify as lcssa phi's, best be done before LV, which relies on (proper) lcssa form.
✅ With the latest revision this PR passed the C/C++ code formatter. |
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/145/builds/1845 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/186/builds/2329 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/72/builds/3307 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/66/builds/3916 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/51/builds/3814 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/168/builds/3322 Here is the relevant piece of the build log for the reference
|
Follow-up to suggestion during #100735. More specifically https://github.com/llvm/llvm-project/pull/100735/files/9a40ed0919bad7fb79123317267d3eb36ff2c582#diff-6d0b73adfa9f8465923d2225ab6674ddcdeab71666f7a73dfaec7fa1246b3a1f
Add a new VPIRInstruction recipe to wrap existing IR instructions not to
be modified during execution, execept for PHIs. For PHIs, a single VPValue
operand is allowed, and it is used to add a new incoming value for the
single predecessor VPBB. Expect PHIs, VPIRInstructions cannot have any
operands.
Depends on #100658.