[lsr][term-fold] Restrict transform to low cost expansions #74747

preames · 2023-12-07T18:55:10Z

This is a follow up to an item I noted in my submission comment for e947f95. I don't have a real world example where this is triggering unprofitably, but avoiding the transform when we estimate the loop to be short running from profiling seems quite reasonable. It's also now come up as a possibility in a regression twice in two days, so I'd like to get this in to close out the possibility if nothing else.

The original review dropped the threshold for short trip count loops. I will return to that in a separate review if this lands.

llvmbot · 2023-12-07T18:55:41Z

@llvm/pr-subscribers-llvm-transforms

Author: Philip Reames (preames)

Changes

… loops

This is a follow up to an item I noted in my submission comment for e947f95. I don't have a real world example where this is triggering unprofitably, but avoiding the transform when we estimate the loop to be short running from profiling seems quite reasonable.

A couple of observations:

Because we already checked for affine addrecs, the worst expansion we're going to see is the unknown trip count * unknown step case (both are loop invariant). I picked a fallback cost (2 * SCEV's default expansion budget of 4) which was high enough to allow this. These cases will end up being somewhat rare anyways as finding one where we can prove legality takes some work.
Short running loops with constant trip counts produce either constant expressions (if the step is constant) or multiplications by small constant (if the step is unknown). Both of these are cheap by any reasonable budget and thus not wroth checking for explicitly.
Estimated trip counts come from profile data, loops without profile data will not have one.

Overall, this doesn't end up making much of a difference in practical behavior of the transform. Probably still worth landing, but I expect this to be pretty low impact.

Possible future work (which I have no plans of doing at this time):

Ppick the lowest cost alternate IV if we have multiple ones under the budget.
Use SCEV to prove a constant upper bound on the trip count, and restrict the transform for unknown-but-small trip counts.
I think we can now remove the affine restriction if we wanted. Would need to think carefully about legality on this one.

I don't have examples where any of these would be profitable in practice, but we'll see what comes up in the future.

Full diff: https://github.com/llvm/llvm-project/pull/74747.diff

2 Files Affected:

(modified) llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp (+20-2)
(modified) llvm/test/Transforms/LoopStrengthReduce/lsr-term-fold.ll (+5-10)

diff --git a/llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp b/llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
index 39607464dd009..2bcf6a2d659c0 100644
--- a/llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
+++ b/llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
@@ -6730,7 +6730,7 @@ static llvm::PHINode *GetInductionVariable(const Loop &L, ScalarEvolution &SE,
 
 static std::optional<std::tuple<PHINode *, PHINode *, const SCEV *, bool>>
 canFoldTermCondOfLoop(Loop *L, ScalarEvolution &SE, DominatorTree &DT,
-                      const LoopInfo &LI) {
+                      const LoopInfo &LI, const TargetTransformInfo &TTI) {
   if (!L->isInnermost()) {
     LLVM_DEBUG(dbgs() << "Cannot fold on non-innermost loop\n");
     return std::nullopt;
@@ -6781,6 +6781,15 @@ canFoldTermCondOfLoop(Loop *L, ScalarEvolution &SE, DominatorTree &DT,
   if (!isAlmostDeadIV(ToFold, LoopLatch, TermCond))
     return std::nullopt;
 
+  // Inserting instructions in the preheader has a runtime cost, scale
+  // the allowed cost with the loops trip count as best we can.
+  const unsigned ExpansionBudget = [&]() {
+    if (std::optional<unsigned> SmallTC = getLoopEstimatedTripCount(L))
+      return std::min(2*SCEVCheapExpansionBudget, *SmallTC);
+    // Unknown trip count, assume long running by default.
+    return 2*SCEVCheapExpansionBudget;
+  }();
+
   const SCEV *BECount = SE.getBackedgeTakenCount(L);
   const DataLayout &DL = L->getHeader()->getModule()->getDataLayout();
   SCEVExpander Expander(SE, DL, "lsr_fold_term_cond");
@@ -6788,6 +6797,7 @@ canFoldTermCondOfLoop(Loop *L, ScalarEvolution &SE, DominatorTree &DT,
   PHINode *ToHelpFold = nullptr;
   const SCEV *TermValueS = nullptr;
   bool MustDropPoison = false;
+  auto InsertPt = L->getLoopPreheader()->getTerminator();
   for (PHINode &PN : L->getHeader()->phis()) {
     if (ToFold == &PN)
       continue;
@@ -6828,6 +6838,14 @@ canFoldTermCondOfLoop(Loop *L, ScalarEvolution &SE, DominatorTree &DT,
       continue;
     }
 
+    if (Expander.isHighCostExpansion(TermValueSLocal, L, ExpansionBudget,
+                                     &TTI, InsertPt)) {
+      LLVM_DEBUG(
+          dbgs() << "Is too expensive to expand terminating value for phi node"
+                 << PN << "\n");
+      continue;
+    }
+
     // The candidate IV may have been otherwise dead and poison from the
     // very first iteration.  If we can't disprove that, we can't use the IV.
     if (!mustExecuteUBIfPoisonOnPathTo(&PN, LoopLatch->getTerminator(), &DT)) {
@@ -6953,7 +6971,7 @@ static bool ReduceLoopStrength(Loop *L, IVUsers &IU, ScalarEvolution &SE,
   }();
 
   if (EnableFormTerm) {
-    if (auto Opt = canFoldTermCondOfLoop(L, SE, DT, LI)) {
+    if (auto Opt = canFoldTermCondOfLoop(L, SE, DT, LI, TTI)) {
       auto [ToFold, ToHelpFold, TermValueS, MustDrop] = *Opt;
 
       Changed = true;
diff --git a/llvm/test/Transforms/LoopStrengthReduce/lsr-term-fold.ll b/llvm/test/Transforms/LoopStrengthReduce/lsr-term-fold.ll
index 0203abe69ac29..5086f533c38d8 100644
--- a/llvm/test/Transforms/LoopStrengthReduce/lsr-term-fold.ll
+++ b/llvm/test/Transforms/LoopStrengthReduce/lsr-term-fold.ll
@@ -478,20 +478,15 @@ define void @expensive_expand_short_tc(ptr %a, i32 %offset, i32 %n) {
 ; CHECK-LABEL: @expensive_expand_short_tc(
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[UGLYGEP:%.*]] = getelementptr i8, ptr [[A:%.*]], i64 84
-; CHECK-NEXT:    [[TMP0:%.*]] = add i32 [[N:%.*]], -1
-; CHECK-NEXT:    [[TMP1:%.*]] = zext i32 [[TMP0]] to i64
-; CHECK-NEXT:    [[TMP2:%.*]] = add nuw nsw i64 [[TMP1]], 1
-; CHECK-NEXT:    [[TMP3:%.*]] = sext i32 [[OFFSET:%.*]] to i64
-; CHECK-NEXT:    [[TMP4:%.*]] = mul i64 [[TMP2]], [[TMP3]]
-; CHECK-NEXT:    [[TMP5:%.*]] = add nsw i64 [[TMP4]], 84
-; CHECK-NEXT:    [[SCEVGEP:%.*]] = getelementptr i8, ptr [[A]], i64 [[TMP5]]
 ; CHECK-NEXT:    br label [[FOR_BODY:%.*]]
 ; CHECK:       for.body:
 ; CHECK-NEXT:    [[LSR_IV1:%.*]] = phi ptr [ [[UGLYGEP2:%.*]], [[FOR_BODY]] ], [ [[UGLYGEP]], [[ENTRY:%.*]] ]
+; CHECK-NEXT:    [[LSR_IV:%.*]] = phi i32 [ [[LSR_IV_NEXT:%.*]], [[FOR_BODY]] ], [ 0, [[ENTRY]] ]
 ; CHECK-NEXT:    store i32 1, ptr [[LSR_IV1]], align 4
-; CHECK-NEXT:    [[UGLYGEP2]] = getelementptr i8, ptr [[LSR_IV1]], i32 [[OFFSET]]
-; CHECK-NEXT:    [[LSR_FOLD_TERM_COND_REPLACED_TERM_COND:%.*]] = icmp eq ptr [[UGLYGEP2]], [[SCEVGEP]]
-; CHECK-NEXT:    br i1 [[LSR_FOLD_TERM_COND_REPLACED_TERM_COND]], label [[FOR_END:%.*]], label [[FOR_BODY]], !prof [[PROF0:![0-9]+]]
+; CHECK-NEXT:    [[LSR_IV_NEXT]] = add nsw i32 [[LSR_IV]], 1
+; CHECK-NEXT:    [[UGLYGEP2]] = getelementptr i8, ptr [[LSR_IV1]], i32 [[OFFSET:%.*]]
+; CHECK-NEXT:    [[EXITCOND_NOT:%.*]] = icmp eq i32 [[LSR_IV_NEXT]], [[N:%.*]]
+; CHECK-NEXT:    br i1 [[EXITCOND_NOT]], label [[FOR_END:%.*]], label [[FOR_BODY]], !prof [[PROF0:![0-9]+]]
 ; CHECK:       for.end:
 ; CHECK-NEXT:    ret void
 ;

github-actions · 2023-12-07T18:57:31Z

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:

git-clang-format --diff 5282202db04f8fb57b581ce8871851becc0eb737 f63c971315dccc08771f134a1cc17f2e72425875 -- llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp

View the diff from clang-format here.

diff --git a/llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp b/llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
index 831911ab21..e0509befaf 100644
--- a/llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
+++ b/llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
@@ -6862,9 +6862,8 @@ canFoldTermCondOfLoop(Loop *L, ScalarEvolution &SE, DominatorTree &DT,
       continue;
     }
 
-    if (Expander.isHighCostExpansion(TermValueSLocal, L,
-                                     2*SCEVCheapExpansionBudget,
-                                     &TTI, InsertPt)) {
+    if (Expander.isHighCostExpansion(
+            TermValueSLocal, L, 2 * SCEVCheapExpansionBudget, &TTI, InsertPt)) {
       LLVM_DEBUG(
           dbgs() << "Is too expensive to expand terminating value for phi node"
                  << PN << "\n");

nikic · 2023-12-08T10:33:10Z

Just to double check, do you see regressions if you just use the standard isHighCostExpansion() check? And that's why you are trying to use this more conservative profile count based heuristic instead?

preames · 2023-12-08T15:41:11Z

Just to double check, do you see regressions if you just use the standard isHighCostExpansion() check? And that's why you are trying to use this more conservative profile count based heuristic instead?

Yes, if I use the standard SCEV threshold. I can of course use a larger single threshold if you'd rather, but then this patch is largely pointless as we can't generate anything worse than about 8 instructions from the non-constant * non-constant case no matter what we do.

nikic · 2023-12-14T15:22:23Z

Just to double check, do you see regressions if you just use the standard isHighCostExpansion() check? And that's why you are trying to use this more conservative profile count based heuristic instead?

Yes, if I use the standard SCEV threshold. I can of course use a larger single threshold if you'd rather, but then this patch is largely pointless as we can't generate anything worse than about 8 instructions from the non-constant * non-constant case no matter what we do.

I don't think this is true. BECounts can get quite complex. To give a very simple variation on one of your tests:

define void @expensive_expand_unknown_tc(ptr %a, i32 %offset, i32 %n, i32 %step) mustprogress {
; CHECK-LABEL: @expensive_expand_unknown_tc(
; CHECK-NEXT:  entry:
; CHECK-NEXT:    [[UGLYGEP:%.*]] = getelementptr i8, ptr [[A:%.*]], i64 84
; CHECK-NEXT:    [[SMAX:%.*]] = call i32 @llvm.smax.i32(i32 [[STEP:%.*]], i32 [[N:%.*]])
; CHECK-NEXT:    [[TMP0:%.*]] = sub i32 [[SMAX]], [[STEP]]
; CHECK-NEXT:    [[UMIN:%.*]] = call i32 @llvm.umin.i32(i32 [[TMP0]], i32 1)
; CHECK-NEXT:    [[TMP1:%.*]] = sub i32 [[TMP0]], [[UMIN]]
; CHECK-NEXT:    [[UMAX:%.*]] = call i32 @llvm.umax.i32(i32 [[STEP]], i32 1)
; CHECK-NEXT:    [[TMP2:%.*]] = udiv i32 [[TMP1]], [[UMAX]]
; CHECK-NEXT:    [[TMP3:%.*]] = add i32 [[UMIN]], [[TMP2]]
; CHECK-NEXT:    [[TMP4:%.*]] = zext i32 [[TMP3]] to i64
; CHECK-NEXT:    [[TMP5:%.*]] = add nuw nsw i64 [[TMP4]], 1
; CHECK-NEXT:    [[TMP6:%.*]] = sext i32 [[OFFSET:%.*]] to i64
; CHECK-NEXT:    [[TMP7:%.*]] = mul i64 [[TMP5]], [[TMP6]]
; CHECK-NEXT:    [[TMP8:%.*]] = add nsw i64 [[TMP7]], 84
; CHECK-NEXT:    [[SCEVGEP:%.*]] = getelementptr i8, ptr [[A]], i64 [[TMP8]]
; CHECK-NEXT:    br label [[FOR_BODY:%.*]]
; CHECK:       for.body:
; CHECK-NEXT:    [[LSR_IV1:%.*]] = phi ptr [ [[UGLYGEP2:%.*]], [[FOR_BODY]] ], [ [[UGLYGEP]], [[ENTRY:%.*]] ]
; CHECK-NEXT:    store i32 1, ptr [[LSR_IV1]], align 4
; CHECK-NEXT:    [[UGLYGEP2]] = getelementptr i8, ptr [[LSR_IV1]], i32 [[OFFSET]]
; CHECK-NEXT:    [[LSR_FOLD_TERM_COND_REPLACED_TERM_COND:%.*]] = icmp eq ptr [[UGLYGEP2]], [[SCEVGEP]]
; CHECK-NEXT:    br i1 [[LSR_FOLD_TERM_COND_REPLACED_TERM_COND]], label [[FOR_END:%.*]], label [[FOR_BODY]]
; CHECK:       for.end:
; CHECK-NEXT:    ret void
;
entry:
  %uglygep = getelementptr i8, ptr %a, i64 84
  br label %for.body

for.body:                                         ; preds = %for.body, %entry
  %lsr.iv1 = phi ptr [ %uglygep2, %for.body ], [ %uglygep, %entry ]
  %lsr.iv = phi i32 [ %lsr.iv.next, %for.body ], [ 0, %entry ]
  store i32 1, ptr %lsr.iv1, align 4
  %lsr.iv.next = add nsw i32 %lsr.iv, %step
  %uglygep2 = getelementptr i8, ptr %lsr.iv1, i32 %offset
  %exitcond.not = icmp sge i32 %lsr.iv.next, %n
  br i1 %exitcond.not, label %for.end, label %for.body

for.end:                                          ; preds = %for.body
  ret void
}

This adds 14 instructions including 3 min/max operations which likely have higher cost.

As mentioned in #74747, this case is triggering a particularly high cost trip count expansion.

… loops This is a follow up to an item I noted in my submission comment for e947f95. I don't have a real world example where this is triggering unprofitably, but avoiding the transform when we estimate the loop to be short running from profiling seems quite reasonable. A couple of observations: * Because we already checked for affine addrecs, the worst expansion we're going to see is the unknown trip count * unknown step case (both are loop invariant). I picked a fallback cost (2 * SCEV's default expansion budget of 4) which was high enough to allow this. These cases will end up being somewhat rare anyways as finding one where we can prove legality takes some work. * Short running loops with constant trip counts produce either constant expressions (if the step is constant) or multiplications by small constant (if the step is unknown). Both of these are cheap by any reasonable budget and thus not wroth checking for explicitly. * Estimated trip counts come from profile data, loops without profile data will not have one. Overall, this doesn't end up making much of a difference in practical behavior of the transform. Probably still worth landing, but I expect this to be pretty low impact. Possible future work (which I have no plans of doing at this time): * Ppick the lowest cost alternate IV if we have multiple ones under the budget. * Use SCEV to prove a constant upper bound on the trip count, and restrict the transform for unknown-but-small trip counts. * I think we can now remove the affine restriction if we wanted. Would need to think carefully about legality on this one. I don't have examples where any of these would be profitable in practice, but we'll see what comes up in the future.

preames · 2024-01-31T21:02:04Z

@nikic I've rebased, simplified the patch, and reworded the commit message based on your observation and test case. Sorry for the inactivity here, it feel off my mental todo list.

nikic

LGTM

As mentioned in llvm#74747, this case is triggering a particularly high cost trip count expansion.

This is a follow up to an item I noted in my submission comment for e947f95. I don't have a real world example where this is triggering unprofitably, but avoiding the transform when we estimate the loop to be short running from profiling seems quite reasonable. It's also now come up as a possibility in a regression twice in two days, so I'd like to get this in to close out the possibility if nothing else. The original review dropped the threshold for short trip count loops. I will return to that in a separate review if this lands.

Follow up to #74747 This change extends the previously added fixed expansion threshold by scaling down the cost allowed for an expansion for a loop with either a small known trip count or a profile which indicates the trip count is likely small. The goal here is to improve code generation for a loop nest where the outer loop has a high trip count, and the inner loop runs only a handful of iterations. --------- Co-authored-by: Nikita Popov <[email protected]>

) Follow up to llvm#74747 This change extends the previously added fixed expansion threshold by scaling down the cost allowed for an expansion for a loop with either a small known trip count or a profile which indicates the trip count is likely small. The goal here is to improve code generation for a loop nest where the outer loop has a high trip count, and the inner loop runs only a handful of iterations. --------- Co-authored-by: Nikita Popov <[email protected]>

preames requested review from eopXD and nikic December 7, 2023 18:55

llvmbot added the llvm:transforms label Dec 7, 2023

preames changed the title ~~[lsr][term-fold] Restrict expansion budget, particularly for profiled…~~ [lsr][term-fold] Restrict expansion budget, particularly for profiled loops Dec 7, 2023

preames changed the title ~~[lsr][term-fold] Restrict expansion budget, particularly for profiled loops~~ [lsr][term-fold] Restrict expansion budget for profiled loops Dec 7, 2023

preames added a commit that referenced this pull request Jan 31, 2024

[LSR] Add a test case mentioned in review

5282202

As mentioned in #74747, this case is triggering a particularly high cost trip count expansion.

preames added 2 commits January 31, 2024 12:46

[LSR] Use simplest constant threshold for the moment

f63c971

preames force-pushed the pr-lsr-term-fold-budget-restruction branch from 0aad007 to f63c971 Compare January 31, 2024 20:58

preames changed the title ~~[lsr][term-fold] Restrict expansion budget for profiled loops~~ [lsr][term-fold] Restrict transform to low cost expansions Jan 31, 2024

nikic approved these changes Jan 31, 2024

View reviewed changes

preames merged commit f264da4 into llvm:main Jan 31, 2024

preames deleted the pr-lsr-term-fold-budget-restruction branch January 31, 2024 22:48

preames mentioned this pull request Feb 1, 2024

[LSR][term-fold] Adjust expansion budget based on trip count #80304

Merged

carlosgalvezp pushed a commit to carlosgalvezp/llvm-project that referenced this pull request Feb 1, 2024

[LSR] Add a test case mentioned in review

a2639e3

As mentioned in llvm#74747, this case is triggering a particularly high cost trip count expansion.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[lsr][term-fold] Restrict transform to low cost expansions #74747

[lsr][term-fold] Restrict transform to low cost expansions #74747

Uh oh!

preames commented Dec 7, 2023 •

edited

Loading

Uh oh!

llvmbot commented Dec 7, 2023

Uh oh!

github-actions bot commented Dec 7, 2023 •

edited

Loading

Uh oh!

nikic commented Dec 8, 2023

Uh oh!

preames commented Dec 8, 2023

Uh oh!

nikic commented Dec 14, 2023

Uh oh!

preames commented Jan 31, 2024

Uh oh!

nikic left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

[lsr][term-fold] Restrict transform to low cost expansions #74747

[lsr][term-fold] Restrict transform to low cost expansions #74747

Uh oh!

Conversation

preames commented Dec 7, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Dec 7, 2023

Uh oh!

github-actions bot commented Dec 7, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

nikic commented Dec 8, 2023

Uh oh!

preames commented Dec 8, 2023

Uh oh!

nikic commented Dec 14, 2023

Uh oh!

preames commented Jan 31, 2024

Uh oh!

nikic left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

preames commented Dec 7, 2023 •

edited

Loading

github-actions bot commented Dec 7, 2023 •

edited

Loading