[SCEV] Memoize collected loop guards. NFCI #116947

lukel97 · 2024-11-20T10:00:55Z

#116187 aims to apply loop guards in more places when calculating the maximum backedge count, but in doing so introduces a compile-time regression.

This tries to compensate for it by memoizing the loop guards so that they're shared across all loop exits in computeBackedgeTakenCount, which some reductions in compile-time for stage2-O3: http://llvm-compile-time-tracker.com/compare.php?from=aff98e4be05a1060e489ce62a88ee0ff365e571a&to=a10aad8c6acfcb879f56ab9267f61d0b2c899939&stat=instructions%3Au

This tries to compensate for it by caching the collected loop guards, which gives a -0.07% geomean reduction for stage2-O3: https://llvm-compile-time-tracker.com/compare.php?from=aff98e4be05a1060e489ce62a88ee0ff365e571a&to=198a76db2c0b8fbda5374ffd195731a9d47469e3&stat=instructions:u LoopAccessAnalysis already had a LoopGuards cache for the innermost loop, so this hoists it up into ScalarEvolution.

llvmbot · 2024-11-20T10:01:34Z

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-llvm-analysis

Author: Luke Lau (lukel97)

Changes

This tries to compensate for it by caching the collected loop guards, which gives a -0.07% geomean reduction for stage2-O3: https://llvm-compile-time-tracker.com/compare.php?from=aff98e4be05a1060e489ce62a88ee0ff365e571a&to=198a76db2c0b8fbda5374ffd195731a9d47469e3&stat=instructions:u

LoopAccessAnalysis already had a LoopGuards cache for the innermost loop, so this hoists it up into ScalarEvolution.

Full diff: https://github.com/llvm/llvm-project/pull/116947.diff

4 Files Affected:

(modified) llvm/include/llvm/Analysis/LoopAccessAnalysis.h (-3)
(modified) llvm/include/llvm/Analysis/ScalarEvolution.h (+4-1)
(modified) llvm/lib/Analysis/LoopAccessAnalysis.cpp (+5-11)
(modified) llvm/lib/Analysis/ScalarEvolution.cpp (+13-11)

diff --git a/llvm/include/llvm/Analysis/LoopAccessAnalysis.h b/llvm/include/llvm/Analysis/LoopAccessAnalysis.h
index a35bc7402d1a89..872b68f924e654 100644
--- a/llvm/include/llvm/Analysis/LoopAccessAnalysis.h
+++ b/llvm/include/llvm/Analysis/LoopAccessAnalysis.h
@@ -334,9 +334,6 @@ class MemoryDepChecker {
            std::pair<const SCEV *, const SCEV *>>
       PointerBounds;
 
-  /// Cache for the loop guards of InnermostLoop.
-  std::optional<ScalarEvolution::LoopGuards> LoopGuards;
-
   /// Check whether there is a plausible dependence between the two
   /// accesses.
   ///
diff --git a/llvm/include/llvm/Analysis/ScalarEvolution.h b/llvm/include/llvm/Analysis/ScalarEvolution.h
index 885c5985f9d23a..b7b9384c6e642c 100644
--- a/llvm/include/llvm/Analysis/ScalarEvolution.h
+++ b/llvm/include/llvm/Analysis/ScalarEvolution.h
@@ -1346,7 +1346,6 @@ class ScalarEvolution {
 
   /// Try to apply information from loop guards for \p L to \p Expr.
   const SCEV *applyLoopGuards(const SCEV *Expr, const Loop *L);
-  const SCEV *applyLoopGuards(const SCEV *Expr, const LoopGuards &Guards);
 
   /// Return true if the loop has no abnormal exits. That is, if the loop
   /// is not infinite, it must exit through an explicit edge in the CFG.
@@ -1651,6 +1650,10 @@ class ScalarEvolution {
   /// function as they are computed.
   DenseMap<const Loop *, BackedgeTakenInfo> PredicatedBackedgeTakenCounts;
 
+  /// Cache the collected loop guards of the loops of this function as they are
+  /// computed.
+  DenseMap<const Loop *, LoopGuards> LoopGuardsCache;
+
   /// Loops whose backedge taken counts directly use this non-constant SCEV.
   DenseMap<const SCEV *, SmallPtrSet<PointerIntPair<const Loop *, 1, bool>, 4>>
       BECountUsers;
diff --git a/llvm/lib/Analysis/LoopAccessAnalysis.cpp b/llvm/lib/Analysis/LoopAccessAnalysis.cpp
index 907bb7875dc807..9dfc3def3140af 100644
--- a/llvm/lib/Analysis/LoopAccessAnalysis.cpp
+++ b/llvm/lib/Analysis/LoopAccessAnalysis.cpp
@@ -1945,16 +1945,13 @@ MemoryDepChecker::getDependenceDistanceStrideAndSize(
         !isa<SCEVCouldNotCompute>(SrcEnd_) &&
         !isa<SCEVCouldNotCompute>(SinkStart_) &&
         !isa<SCEVCouldNotCompute>(SinkEnd_)) {
-      if (!LoopGuards)
-        LoopGuards.emplace(
-            ScalarEvolution::LoopGuards::collect(InnermostLoop, SE));
-      auto SrcEnd = SE.applyLoopGuards(SrcEnd_, *LoopGuards);
-      auto SinkStart = SE.applyLoopGuards(SinkStart_, *LoopGuards);
+      auto SrcEnd = SE.applyLoopGuards(SrcEnd_, InnermostLoop);
+      auto SinkStart = SE.applyLoopGuards(SinkStart_, InnermostLoop);
       if (SE.isKnownPredicate(CmpInst::ICMP_ULE, SrcEnd, SinkStart))
         return MemoryDepChecker::Dependence::NoDep;
 
-      auto SinkEnd = SE.applyLoopGuards(SinkEnd_, *LoopGuards);
-      auto SrcStart = SE.applyLoopGuards(SrcStart_, *LoopGuards);
+      auto SinkEnd = SE.applyLoopGuards(SinkEnd_, InnermostLoop);
+      auto SrcStart = SE.applyLoopGuards(SrcStart_, InnermostLoop);
       if (SE.isKnownPredicate(CmpInst::ICMP_ULE, SinkEnd, SrcStart))
         return MemoryDepChecker::Dependence::NoDep;
     }
@@ -2057,10 +2054,7 @@ MemoryDepChecker::isDependent(const MemAccessInfo &A, unsigned AIdx,
       return Dependence::NoDep;
     }
   } else {
-    if (!LoopGuards)
-      LoopGuards.emplace(
-          ScalarEvolution::LoopGuards::collect(InnermostLoop, SE));
-    Dist = SE.applyLoopGuards(Dist, *LoopGuards);
+    Dist = SE.applyLoopGuards(Dist, InnermostLoop);
   }
 
   // Negative distances are not plausible dependencies.
diff --git a/llvm/lib/Analysis/ScalarEvolution.cpp b/llvm/lib/Analysis/ScalarEvolution.cpp
index 46b108606f6a62..70a45ef507e279 100644
--- a/llvm/lib/Analysis/ScalarEvolution.cpp
+++ b/llvm/lib/Analysis/ScalarEvolution.cpp
@@ -8417,6 +8417,7 @@ void ScalarEvolution::forgetAllLoops() {
   // result.
   BackedgeTakenCounts.clear();
   PredicatedBackedgeTakenCounts.clear();
+  LoopGuardsCache.clear();
   BECountUsers.clear();
   LoopPropertiesCache.clear();
   ConstantEvolutionLoopExitValue.clear();
@@ -10551,9 +10552,8 @@ ScalarEvolution::ExitLimit ScalarEvolution::howFarToZero(const SCEV *V,
   if (!isLoopInvariant(Step, L))
     return getCouldNotCompute();
 
-  LoopGuards Guards = LoopGuards::collect(L, *this);
   // Specialize step for this loop so we get context sensitive facts below.
-  const SCEV *StepWLG = applyLoopGuards(Step, Guards);
+  const SCEV *StepWLG = applyLoopGuards(Step, L);
 
   // For positive steps (counting up until unsigned overflow):
   //   N = -Start/Step (as unsigned)
@@ -10570,7 +10570,7 @@ ScalarEvolution::ExitLimit ScalarEvolution::howFarToZero(const SCEV *V,
   //   N = Distance (as unsigned)
   if (StepC &&
       (StepC->getValue()->isOne() || StepC->getValue()->isMinusOne())) {
-    APInt MaxBECount = getUnsignedRangeMax(applyLoopGuards(Distance, Guards));
+    APInt MaxBECount = getUnsignedRangeMax(applyLoopGuards(Distance, L));
     MaxBECount = APIntOps::umin(MaxBECount, getUnsignedRangeMax(Distance));
 
     // When a loop like "for (int i = 0; i != n; ++i) { /* body */ }" is rotated,
@@ -10611,7 +10611,7 @@ ScalarEvolution::ExitLimit ScalarEvolution::howFarToZero(const SCEV *V,
         getUDivExpr(Distance, CountDown ? getNegativeSCEV(Step) : Step);
     const SCEV *ConstantMax = getCouldNotCompute();
     if (Exact != getCouldNotCompute()) {
-      APInt MaxInt = getUnsignedRangeMax(applyLoopGuards(Exact, Guards));
+      APInt MaxInt = getUnsignedRangeMax(applyLoopGuards(Exact, L));
       ConstantMax =
           getConstant(APIntOps::umin(MaxInt, getUnsignedRangeMax(Exact)));
     }
@@ -10629,7 +10629,7 @@ ScalarEvolution::ExitLimit ScalarEvolution::howFarToZero(const SCEV *V,
 
   const SCEV *M = E;
   if (E != getCouldNotCompute()) {
-    APInt MaxWithGuards = getUnsignedRangeMax(applyLoopGuards(E, Guards));
+    APInt MaxWithGuards = getUnsignedRangeMax(applyLoopGuards(E, L));
     M = getConstant(APIntOps::umin(MaxWithGuards, getUnsignedRangeMax(E)));
   }
   auto *S = isa<SCEVCouldNotCompute>(E) ? M : E;
@@ -13674,6 +13674,7 @@ ScalarEvolution::~ScalarEvolution() {
   HasRecMap.clear();
   BackedgeTakenCounts.clear();
   PredicatedBackedgeTakenCounts.clear();
+  LoopGuardsCache.clear();
 
   assert(PendingLoopPredicates.empty() && "isImpliedCond garbage");
   assert(PendingPhiRanges.empty() && "getRangeRef garbage");
@@ -15889,10 +15890,11 @@ const SCEV *ScalarEvolution::LoopGuards::rewrite(const SCEV *Expr) const {
 }
 
 const SCEV *ScalarEvolution::applyLoopGuards(const SCEV *Expr, const Loop *L) {
-  return applyLoopGuards(Expr, LoopGuards::collect(L, *this));
-}
-
-const SCEV *ScalarEvolution::applyLoopGuards(const SCEV *Expr,
-                                             const LoopGuards &Guards) {
-  return Guards.rewrite(Expr);
+  auto Itr = LoopGuardsCache.find(L);
+  if (Itr == LoopGuardsCache.end()) {
+    LoopGuards Guard = LoopGuards::collect(L, *this);
+    LoopGuardsCache.insert({L, Guard});
+    return Guard.rewrite(Expr);
+  }
+  return Itr->second.rewrite(Expr);
 }

nikic

This is going to run up against our best friend: SCEV invalidation. I think invalidation for loop guards would be pretty tricky. It's easy to handle for loop invalidation, but we also need to invalidate if any of the SCEVs in the rewrite map are invalidated, and possibly also for certain control flow changes.

lukel97 · 2024-11-20T10:18:33Z

Ah right, I had a feeling this was a bit naïve. Would there be any issue if we instead tried to collect the loop guards once in computeBackedgeTakenCount?

nikic · 2024-11-20T10:20:19Z

Ah right, I had a feeling this was a bit naïve. Would there be any issue if we instead tried to collect the loop guards once in computeBackedgeTakenCount?

That should be fine.

nikic

How does compile-time for the new variant look like?

llvm/include/llvm/Analysis/ScalarEvolution.h

lukel97 · 2024-11-21T09:53:47Z

How does compile-time for the new variant look like?

Somehow even worse than the baseline: https://llvm-compile-time-tracker.com/compare.php?from=aff98e4be05a1060e489ce62a88ee0ff365e571a&to=7a0bf19b515a555f0b80c55e8d700dfcc8e88ff4&stat=instructions%3Au

I haven't really looked into why yet. Maybe there's an overhead with passing about std::function?

lukel97 · 2024-11-21T10:30:47Z

Now that the copies are gone the memoized variant seems to be slightly faster than the baseline for stage2-O3, -0.04%: http://llvm-compile-time-tracker.com/compare.php?from=aff98e4be05a1060e489ce62a88ee0ff365e571a&to=9c8dac0196c3199842606a208e19235bc6505bc2&stat=instructions%3Au

It's a bit of a shame we have to plumb the lambda all the way through here, but I guess we're already plumbing through ControlsOnlyExit and AllowPredicates. Open to any alternative suggestions though.

nikic · 2024-11-21T14:24:30Z

Now that the copies are gone the memoized variant seems to be slightly faster than the baseline for stage2-O3, -0.04%: http://llvm-compile-time-tracker.com/compare.php?from=aff98e4be05a1060e489ce62a88ee0ff365e571a&to=9c8dac0196c3199842606a208e19235bc6505bc2&stat=instructions%3Au

This is most likely just noise :(

It's a bit of a shame we have to plumb the lambda all the way through here, but I guess we're already plumbing through ControlsOnlyExit and AllowPredicates. Open to any alternative suggestions though.

We could consolidate all of those into a context struct.

lukel97 · 2024-11-21T16:42:36Z

This is most likely just noise :(

Indeed, I'm not sure if there's many gains to be had from just multi-exit loops. I guess #116187 is just ultimately going to add more compile time, but I don't have a particularly strong motivation for it yet. This was mostly just a follow up that fell out from #115705, so I'm happy to close these PRs for now and we can revisit them later if needed.

lukel97 requested a review from nikic as a code owner November 20, 2024 10:00

lukel97 requested review from david-arm and fhahn November 20, 2024 10:00

llvmbot added the llvm:analysis label Nov 20, 2024

nikic reviewed Nov 20, 2024

View reviewed changes

lukel97 marked this pull request as draft November 20, 2024 10:25

Rework to memoize loop guards across multiple exits

7a0bf19

nikic reviewed Nov 21, 2024

View reviewed changes

llvm/include/llvm/Analysis/ScalarEvolution.h Outdated Show resolved Hide resolved

Use function_ref, return const reference

9c8dac0

lukel97 changed the title ~~[SCEV] Cache collected loop guards. NFCI~~ [SCEV] Memoize collected loop guards. NFCI Nov 21, 2024

Use reference again and use in howManyLessThans

a10aad8

lukel97 marked this pull request as ready for review November 21, 2024 13:06

llvmbot added the llvm:transforms label Nov 21, 2024

lukel97 closed this Nov 21, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[SCEV] Memoize collected loop guards. NFCI #116947

[SCEV] Memoize collected loop guards. NFCI #116947

Uh oh!

lukel97 commented Nov 20, 2024 •

edited

Loading

Uh oh!

llvmbot commented Nov 20, 2024 •

edited

Loading

Uh oh!

nikic left a comment

Uh oh!

lukel97 commented Nov 20, 2024

Uh oh!

nikic commented Nov 20, 2024

Uh oh!

nikic left a comment

Uh oh!

Uh oh!

lukel97 commented Nov 21, 2024 •

edited

Loading

Uh oh!

lukel97 commented Nov 21, 2024

Uh oh!

nikic commented Nov 21, 2024

Uh oh!

lukel97 commented Nov 21, 2024

Uh oh!

Uh oh!

[SCEV] Memoize collected loop guards. NFCI #116947

[SCEV] Memoize collected loop guards. NFCI #116947

Uh oh!

Conversation

lukel97 commented Nov 20, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Nov 20, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

nikic left a comment

Choose a reason for hiding this comment

Uh oh!

lukel97 commented Nov 20, 2024

Uh oh!

nikic commented Nov 20, 2024

Uh oh!

nikic left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

lukel97 commented Nov 21, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

lukel97 commented Nov 21, 2024

Uh oh!

nikic commented Nov 21, 2024

Uh oh!

lukel97 commented Nov 21, 2024

Uh oh!

Uh oh!

lukel97 commented Nov 20, 2024 •

edited

Loading

llvmbot commented Nov 20, 2024 •

edited

Loading

lukel97 commented Nov 21, 2024 •

edited

Loading