Skip to content

Commit 0aad007

Browse files
committed
[lsr][term-fold] Restrict expansion budget, particularly for profiled loops
This is a follow up to an item I noted in my submission comment for e947f95. I don't have a real world example where this is triggering unprofitably, but avoiding the transform when we estimate the loop to be short running from profiling seems quite reasonable. A couple of observations: * Because we already checked for affine addrecs, the worst expansion we're going to see is the unknown trip count * unknown step case (both are loop invariant). I picked a fallback cost (2 * SCEV's default expansion budget of 4) which was high enough to allow this. These cases will end up being somewhat rare anyways as finding one where we can prove legality takes some work. * Short running loops with constant trip counts produce either constant expressions (if the step is constant) or multiplications by small constant (if the step is unknown). Both of these are cheap by any reasonable budget and thus not wroth checking for explicitly. * Estimated trip counts come from profile data, loops without profile data will not have one. Overall, this doesn't end up making much of a difference in practical behavior of the transform. Probably still worth landing, but I expect this to be pretty low impact. Possible future work (which I have no plans of doing at this time): * Ppick the lowest cost alternate IV if we have multiple ones under the budget. * Use SCEV to prove a constant upper bound on the trip count, and restrict the transform for unknown-but-small trip counts. * I think we can now remove the affine restriction if we wanted. Would need to think carefully about legality on this one. I don't have examples where any of these would be profitable in practice, but we'll see what comes up in the future.
1 parent 04cbfcc commit 0aad007

File tree

2 files changed

+25
-12
lines changed

2 files changed

+25
-12
lines changed

llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp

+20-2
Original file line numberDiff line numberDiff line change
@@ -6730,7 +6730,7 @@ static llvm::PHINode *GetInductionVariable(const Loop &L, ScalarEvolution &SE,
67306730

67316731
static std::optional<std::tuple<PHINode *, PHINode *, const SCEV *, bool>>
67326732
canFoldTermCondOfLoop(Loop *L, ScalarEvolution &SE, DominatorTree &DT,
6733-
const LoopInfo &LI) {
6733+
const LoopInfo &LI, const TargetTransformInfo &TTI) {
67346734
if (!L->isInnermost()) {
67356735
LLVM_DEBUG(dbgs() << "Cannot fold on non-innermost loop\n");
67366736
return std::nullopt;
@@ -6781,13 +6781,23 @@ canFoldTermCondOfLoop(Loop *L, ScalarEvolution &SE, DominatorTree &DT,
67816781
if (!isAlmostDeadIV(ToFold, LoopLatch, TermCond))
67826782
return std::nullopt;
67836783

6784+
// Inserting instructions in the preheader has a runtime cost, scale
6785+
// the allowed cost with the loops trip count as best we can.
6786+
const unsigned ExpansionBudget = [&]() {
6787+
if (std::optional<unsigned> SmallTC = getLoopEstimatedTripCount(L))
6788+
return std::min(2*SCEVCheapExpansionBudget, *SmallTC);
6789+
// Unknown trip count, assume long running by default.
6790+
return 2*SCEVCheapExpansionBudget;
6791+
}();
6792+
67846793
const SCEV *BECount = SE.getBackedgeTakenCount(L);
67856794
const DataLayout &DL = L->getHeader()->getModule()->getDataLayout();
67866795
SCEVExpander Expander(SE, DL, "lsr_fold_term_cond");
67876796

67886797
PHINode *ToHelpFold = nullptr;
67896798
const SCEV *TermValueS = nullptr;
67906799
bool MustDropPoison = false;
6800+
auto InsertPt = L->getLoopPreheader()->getTerminator();
67916801
for (PHINode &PN : L->getHeader()->phis()) {
67926802
if (ToFold == &PN)
67936803
continue;
@@ -6828,6 +6838,14 @@ canFoldTermCondOfLoop(Loop *L, ScalarEvolution &SE, DominatorTree &DT,
68286838
continue;
68296839
}
68306840

6841+
if (Expander.isHighCostExpansion(TermValueSLocal, L, ExpansionBudget,
6842+
&TTI, InsertPt)) {
6843+
LLVM_DEBUG(
6844+
dbgs() << "Is too expensive to expand terminating value for phi node"
6845+
<< PN << "\n");
6846+
continue;
6847+
}
6848+
68316849
// The candidate IV may have been otherwise dead and poison from the
68326850
// very first iteration. If we can't disprove that, we can't use the IV.
68336851
if (!mustExecuteUBIfPoisonOnPathTo(&PN, LoopLatch->getTerminator(), &DT)) {
@@ -6953,7 +6971,7 @@ static bool ReduceLoopStrength(Loop *L, IVUsers &IU, ScalarEvolution &SE,
69536971
}();
69546972

69556973
if (EnableFormTerm) {
6956-
if (auto Opt = canFoldTermCondOfLoop(L, SE, DT, LI)) {
6974+
if (auto Opt = canFoldTermCondOfLoop(L, SE, DT, LI, TTI)) {
69576975
auto [ToFold, ToHelpFold, TermValueS, MustDrop] = *Opt;
69586976

69596977
Changed = true;

llvm/test/Transforms/LoopStrengthReduce/lsr-term-fold.ll

+5-10
Original file line numberDiff line numberDiff line change
@@ -478,20 +478,15 @@ define void @expensive_expand_short_tc(ptr %a, i32 %offset, i32 %n) {
478478
; CHECK-LABEL: @expensive_expand_short_tc(
479479
; CHECK-NEXT: entry:
480480
; CHECK-NEXT: [[UGLYGEP:%.*]] = getelementptr i8, ptr [[A:%.*]], i64 84
481-
; CHECK-NEXT: [[TMP0:%.*]] = add i32 [[N:%.*]], -1
482-
; CHECK-NEXT: [[TMP1:%.*]] = zext i32 [[TMP0]] to i64
483-
; CHECK-NEXT: [[TMP2:%.*]] = add nuw nsw i64 [[TMP1]], 1
484-
; CHECK-NEXT: [[TMP3:%.*]] = sext i32 [[OFFSET:%.*]] to i64
485-
; CHECK-NEXT: [[TMP4:%.*]] = mul i64 [[TMP2]], [[TMP3]]
486-
; CHECK-NEXT: [[TMP5:%.*]] = add nsw i64 [[TMP4]], 84
487-
; CHECK-NEXT: [[SCEVGEP:%.*]] = getelementptr i8, ptr [[A]], i64 [[TMP5]]
488481
; CHECK-NEXT: br label [[FOR_BODY:%.*]]
489482
; CHECK: for.body:
490483
; CHECK-NEXT: [[LSR_IV1:%.*]] = phi ptr [ [[UGLYGEP2:%.*]], [[FOR_BODY]] ], [ [[UGLYGEP]], [[ENTRY:%.*]] ]
484+
; CHECK-NEXT: [[LSR_IV:%.*]] = phi i32 [ [[LSR_IV_NEXT:%.*]], [[FOR_BODY]] ], [ 0, [[ENTRY]] ]
491485
; CHECK-NEXT: store i32 1, ptr [[LSR_IV1]], align 4
492-
; CHECK-NEXT: [[UGLYGEP2]] = getelementptr i8, ptr [[LSR_IV1]], i32 [[OFFSET]]
493-
; CHECK-NEXT: [[LSR_FOLD_TERM_COND_REPLACED_TERM_COND:%.*]] = icmp eq ptr [[UGLYGEP2]], [[SCEVGEP]]
494-
; CHECK-NEXT: br i1 [[LSR_FOLD_TERM_COND_REPLACED_TERM_COND]], label [[FOR_END:%.*]], label [[FOR_BODY]], !prof [[PROF0:![0-9]+]]
486+
; CHECK-NEXT: [[LSR_IV_NEXT]] = add nsw i32 [[LSR_IV]], 1
487+
; CHECK-NEXT: [[UGLYGEP2]] = getelementptr i8, ptr [[LSR_IV1]], i32 [[OFFSET:%.*]]
488+
; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i32 [[LSR_IV_NEXT]], [[N:%.*]]
489+
; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_END:%.*]], label [[FOR_BODY]], !prof [[PROF0:![0-9]+]]
495490
; CHECK: for.end:
496491
; CHECK-NEXT: ret void
497492
;

0 commit comments

Comments
 (0)