[LoopInterchange] Constrain LI within supported loop nest depth #118656

madhur13490 · 2024-12-04T15:33:53Z

This patch is an extension to #115128.

After profiling LLVM test-suite, I see a lot of loop nest of depth more than MaxLoopNestDepth which is 10. Early exit for them would save compile-time as it would avoid computing DependenceInfo and CacheCost.

llvmbot · 2024-12-04T15:34:33Z

@llvm/pr-subscribers-llvm-transforms

Author: Madhur Amilkanthwar (madhur13490)

Changes

This patch is an extension to #115128.

After profiling LLVM test-suite, I see a lot of loop nest of depth more than MaxLoopNestDepth which is 10. Early exit for them would save compile-time as it would avoid computing DependenceInfo and CacheCost.

Full diff: https://github.com/llvm/llvm-project/pull/118656.diff

3 Files Affected:

(modified) llvm/lib/Transforms/Scalar/LoopInterchange.cpp (+12-12)
(modified) llvm/test/Transforms/LoopInterchange/bail-out-one-loop.ll (+1-1)
(added) llvm/test/Transforms/LoopInterchange/deep-loop-nest.ll (+95)

diff --git a/llvm/lib/Transforms/Scalar/LoopInterchange.cpp b/llvm/lib/Transforms/Scalar/LoopInterchange.cpp
index a0c0080c0bda1c..b5ebde300ebe6f 100644
--- a/llvm/lib/Transforms/Scalar/LoopInterchange.cpp
+++ b/llvm/lib/Transforms/Scalar/LoopInterchange.cpp
@@ -69,6 +69,9 @@ using CharMatrix = std::vector<std::vector<char>>;
 // Maximum number of dependencies that can be handled in the dependency matrix.
 static const unsigned MaxMemInstrCount = 100;
 
+// Minimum loop depth supported.
+static const unsigned MinLoopNestDepth = 2;
+
 // Maximum loop depth supported.
 static const unsigned MaxLoopNestDepth = 10;
 
@@ -239,10 +242,12 @@ static void populateWorklist(Loop &L, LoopVector &LoopList) {
   LoopList.push_back(CurrentLoop);
 }
 
-static bool hasMinimumLoopDepth(SmallVectorImpl<Loop *> &LoopList) {
+static bool hasSupportedLoopDepth(SmallVectorImpl<Loop *> &LoopList) {
   unsigned LoopNestDepth = LoopList.size();
-  if (LoopNestDepth < 2) {
-    LLVM_DEBUG(dbgs() << "Loop doesn't contain minimum nesting level.\n");
+  if (LoopNestDepth < MinLoopNestDepth || LoopNestDepth > MaxLoopNestDepth) {
+    LLVM_DEBUG(dbgs() << "Unsupported depth of loop nest " << LoopNestDepth
+                      << " should be [" << MinLoopNestDepth << ", "
+                      << MaxLoopNestDepth << "]\n");
     return false;
   }
   return true;
@@ -430,15 +435,10 @@ struct LoopInterchange {
   bool processLoopList(SmallVectorImpl<Loop *> &LoopList) {
     bool Changed = false;
 
-    // Ensure minimum loop nest depth.
-    assert(hasMinimumLoopDepth(LoopList) && "Loop nest does not meet minimum depth.");
+    // Ensure proper loop nest depth.
+    assert(hasSupportedLoopDepth(LoopList) && "Unsupported depth of loop nest.");
 
     unsigned LoopNestDepth = LoopList.size();
-    if (LoopNestDepth > MaxLoopNestDepth) {
-      LLVM_DEBUG(dbgs() << "Cannot handle loops of depth greater than "
-                        << MaxLoopNestDepth << "\n");
-      return false;
-    }
     if (!isComputableLoopNest(LoopList)) {
       LLVM_DEBUG(dbgs() << "Not valid loop candidate for interchange\n");
       return false;
@@ -1725,8 +1725,8 @@ PreservedAnalyses LoopInterchangePass::run(LoopNest &LN,
                                            LPMUpdater &U) {
   Function &F = *LN.getParent();
   SmallVector<Loop *, 8> LoopList(LN.getLoops());
-  // Ensure minimum depth of the loop nest to do the interchange.
-  if (!hasMinimumLoopDepth(LoopList))
+  // Ensure proper depth of the loop nest to do the interchange.
+  if (!hasSupportedLoopDepth(LoopList))
     return PreservedAnalyses::all();
 
   DependenceInfo DI(&F, &AR.AA, &AR.SE, &AR.LI);
diff --git a/llvm/test/Transforms/LoopInterchange/bail-out-one-loop.ll b/llvm/test/Transforms/LoopInterchange/bail-out-one-loop.ll
index 788e1b0157d80f..477b37937747fc 100644
--- a/llvm/test/Transforms/LoopInterchange/bail-out-one-loop.ll
+++ b/llvm/test/Transforms/LoopInterchange/bail-out-one-loop.ll
@@ -15,7 +15,7 @@ target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i8:8:32-i16:16:32-i6
 ; CHECK-NOT: Delinearizing
 ; CHECK-NOT: Strides:
 ; CHECK-NOT: Terms:
-; CHECK: Loop doesn't contain minimum nesting level.
+; CHECK: Unsupported depth of loop nest 1 should be [2, 10]
 
 define void @foo() {
 entry:
diff --git a/llvm/test/Transforms/LoopInterchange/deep-loop-nest.ll b/llvm/test/Transforms/LoopInterchange/deep-loop-nest.ll
new file mode 100644
index 00000000000000..348c1ab74b7588
--- /dev/null
+++ b/llvm/test/Transforms/LoopInterchange/deep-loop-nest.ll
@@ -0,0 +1,95 @@
+; REQUIRES: asserts
+; RUN: opt < %s -passes=loop-interchange -debug -disable-output 2>&1| FileCheck %s
+
+target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
+
+; For deep loop nest, delinearization should not be run.
+
+; CHECK-NOT: Delinearizing
+; CHECK-NOT: Strides:
+; CHECK-NOT: Terms:
+; CHECK: Unsupported depth of loop nest 11 should be [2, 10]
+define void @big_loop_nest() {
+entry:
+  br label %for1.header
+
+for1.header:
+  %j = phi i64 [ 0, %entry ], [ %j.next, %for1.inc ]
+  br label %for2.header
+for2.header:
+  %k = phi i64 [ 0, %for1.header ], [ %k.next, %for2.inc ]
+  br label %for3.header
+for3.header:
+  %l = phi i64 [ 0, %for2.header ], [ %l.next, %for3.inc ]
+  br label %for4.header
+for4.header:
+  %m = phi i64 [ 0, %for3.header ], [ %m.next, %for4.inc ]
+  br label %for5.header
+for5.header:
+  %n = phi i64 [ 0, %for4.header ], [ %n.next, %for5.inc ]
+  br label %for6.header
+for6.header:
+  %o = phi i64 [ 0, %for5.header ], [ %o.next, %for6.inc ]
+  br label %for7.header
+for7.header:
+  %p = phi i64 [ 0, %for6.header ], [ %p.next, %for7.inc ]
+  br label %for8.header
+for8.header:
+  %q = phi i64 [ 0, %for7.header ], [ %q.next, %for8.inc ]
+  br label %for9.header
+for9.header:
+  %r = phi i64 [ 0, %for8.header ], [ %r.next, %for9.inc ]
+  br label %for10.header
+for10.header:
+  %s = phi i64 [ 0, %for9.header ], [ %s.next, %for10.inc ]
+  br label %for11
+for11:
+  %t = phi i64 [ %t.next, %for11 ], [ 0, %for10.header ]
+  %t.next = add nuw nsw i64 %t, 1
+  %exitcond = icmp eq i64 %t.next, 99
+  br i1 %exitcond, label %for1.inc, label %for11
+
+for1.inc:
+  %j.next = add nuw nsw i64 %j, 1
+  %exitcond26 = icmp eq i64 %j.next, 99
+  br i1 %exitcond26, label %for2.inc, label %for1.header
+for2.inc:
+  %k.next = add nuw nsw i64 %k, 1
+  %exitcond27 = icmp eq i64 %j.next, 99
+  br i1 %exitcond27, label %for3.inc, label %for2.header
+for3.inc:
+  %l.next = add nuw nsw i64 %l, 1
+  %exitcond28 = icmp eq i64 %l.next, 99
+  br i1 %exitcond28, label %for4.inc, label %for3.header
+for4.inc:
+  %m.next = add nuw nsw i64 %m, 1
+  %exitcond29 = icmp eq i64 %m.next, 99
+  br i1 %exitcond29, label %for5.inc, label %for4.header
+for5.inc:
+  %n.next = add nuw nsw i64 %n, 1
+  %exitcond30 = icmp eq i64 %n.next, 99
+  br i1 %exitcond30, label %for6.inc, label %for5.header
+for6.inc:
+  %o.next = add nuw nsw i64 %o, 1
+  %exitcond31 = icmp eq i64 %o.next, 99
+  br i1 %exitcond31, label %for7.inc, label %for6.header
+for7.inc:
+  %p.next = add nuw nsw i64 %p, 1
+  %exitcond32 = icmp eq i64 %p.next, 99
+  br i1 %exitcond32, label %for8.inc, label %for7.header
+for8.inc:
+  %q.next = add nuw nsw i64 %q, 1
+  %exitcond33 = icmp eq i64 %q.next, 99
+  br i1 %exitcond33, label %for9.inc, label %for8.header
+for9.inc:
+  %r.next = add nuw nsw i64 %r, 1
+  %exitcond34 = icmp eq i64 %q.next, 99
+  br i1 %exitcond34, label %for10.inc, label %for9.header
+for10.inc:
+  %s.next = add nuw nsw i64 %s, 1
+  %exitcond35 = icmp eq i64 %s.next, 99
+  br i1 %exitcond35, label %for.end, label %for10.header
+
+for.end:
+  ret void
+}

github-actions · 2024-12-04T15:38:55Z

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:

git-clang-format --diff e1c1e74a6fd71dd889155100d4c0f5e3284f7a22 b6d834518823fdfc6cf0bee464db63c09cc91568 --extensions cpp -- llvm/lib/Transforms/Scalar/LoopInterchange.cpp

View the diff from clang-format here.

diff --git a/llvm/lib/Transforms/Scalar/LoopInterchange.cpp b/llvm/lib/Transforms/Scalar/LoopInterchange.cpp
index d366e749c7..0f21fd2c57 100644
--- a/llvm/lib/Transforms/Scalar/LoopInterchange.cpp
+++ b/llvm/lib/Transforms/Scalar/LoopInterchange.cpp
@@ -1758,7 +1758,7 @@ PreservedAnalyses LoopInterchangePass::run(LoopNest &LN,
   DependenceInfo DI(&F, &AR.AA, &AR.SE, &AR.LI);
   std::unique_ptr<CacheCost> CC =
       CacheCost::getCacheCost(LN.getOutermostLoop(), AR, DI);
-  
+
   if (!LoopInterchange(&AR.SE, &AR.LI, &DI, &AR.DT, CC, &ORE).run(LN))
     return PreservedAnalyses::all();
   U.markLoopNestChanged(true);

sjoerdmeijer · 2024-12-06T14:56:30Z

llvm/test/Transforms/LoopInterchange/deep-loop-nest.ll

@@ -0,0 +1,95 @@
+; REQUIRES: asserts


I would prefer not to rely on asserts builds, that would increase our test coverage.

Can you instead rely on e.g. the optimisation remarks for which we don't need asserts?

Sure but in this case, I will have to duplicate the file for without asserts builds. I don't know if we can conditionally execute RUN line based on build type.

I don't see why you need to duplicate the file? I think you only need to check optimization remarks instead of debug output as what you are doing now, e.g., llvm/test/Transforms/LoopInterchange/loop-interchange-optimization-remarks.ll.

Well, I want to ensure DependencyAnalysis is not run and I can do that only when -debug is provided. Specifically, the below lines are of my interest:

; CHECK-NOT: Delinearizing ; CHECK-NOT: Strides: ; CHECK-NOT: Terms: ; CHECK: Unsupported depth of loop nest 11, the supported range is [2, 10].

sjoerdmeijer · 2024-12-06T15:43:40Z

llvm/lib/Transforms/Scalar/LoopInterchange.cpp

-  if (LoopNestDepth < 2) {
-    LLVM_DEBUG(dbgs() << "Loop doesn't contain minimum nesting level.\n");
+  if (LoopNestDepth < MinLoopNestDepth || LoopNestDepth > MaxLoopNestDepth) {
+    LLVM_DEBUG(dbgs() << "Unsupported depth of loop nest " << LoopNestDepth


Nit, maybe small clarification of the message:

"Unsupported loop nest depth of " << LoopNestDepth << ", the supported range is ..."

CongzheUalberta · 2024-12-12T17:01:51Z

This patch is an extension to #115128.

After profiling LLVM test-suite, I see a lot of loop nest of depth more than MaxLoopNestDepth which is 10. Early exit for them would save compile-time as it would avoid computing DependenceInfo and CacheCost.

Would be good to know how much compile time reduction for LLVM test-suite you could achieve with this patch?

CongzheUalberta · 2024-12-12T17:04:30Z

llvm/lib/Transforms/Scalar/LoopInterchange.cpp

  unsigned LoopNestDepth = LoopList.size();
-  if (LoopNestDepth < 2) {
-    LLVM_DEBUG(dbgs() << "Loop doesn't contain minimum nesting level.\n");
+  if (LoopNestDepth < MinLoopNestDepth || LoopNestDepth > MaxLoopNestDepth) {


I would prefer to make MaxLoopNestDepth and MinLoopNestDepth to be tunable parameters rather than hardcoded numbers, so we could control with better flexibility.

CongzheUalberta · 2024-12-12T17:11:42Z

llvm/test/Transforms/LoopInterchange/deep-loop-nest.ll

@@ -0,0 +1,95 @@
+; REQUIRES: asserts


I don't see why you need to duplicate the file? I think you only need to check optimization remarks instead of debug output as what you are doing now, e.g., llvm/test/Transforms/LoopInterchange/loop-interchange-optimization-remarks.ll.

madhur13490 · 2025-01-21T08:12:58Z

llvm/lib/Passes/PassBuilderPipelines.cpp

@@ -201,7 +201,7 @@ static cl::opt<bool> RunNewGVN("enable-newgvn", cl::init(false), cl::Hidden,
                               cl::desc("Run the NewGVN pass"));

 static cl::opt<bool> EnableLoopInterchange(
-    "enable-loopinterchange", cl::init(false), cl::Hidden,
+    "enable-loopinterchange", cl::init(true), cl::Hidden,


This is for experimentation purposes only.

madhur13490 · 2025-01-21T08:15:25Z

This patch is an extension to #115128.
After profiling LLVM test-suite, I see a lot of loop nest of depth more than MaxLoopNestDepth which is 10. Early exit for them would save compile-time as it would avoid computing DependenceInfo and CacheCost.

Would be good to know how much compile time reduction for LLVM test-suite you could achieve with this patch?

Sure, let me rerun again with compile-time-tracker.

Meinersbur

Sure, let me rerun again with compile-time-tracker.

If you want to check the influence of the max-depth bound with enabled loop-interchange, you would need it based on the commit that has if enabled.

I don't think any of the sources from test-suite used by by the compile-time-tracker has loops that deep. I don't think it matters anyway, being based on bubble-sort already is a convinving argument that an upper bound it needed.

Not relying on -debug would be nice, but still common practice that I don't want to suddenly enforce.

LGTM. Be sure to remove cl::init(true) before pushing¹.

Could have used a different branch for experimentation, bound-max-depth-2 is identical, enable-li-2 is based on a different commit and doesn't show up as direct comparison in llvm-compile-tracker. ↩

Meinersbur · 2025-01-21T12:36:30Z

llvm/test/Transforms/LoopInterchange/deep-loop-nest.ll

+; CHECK-NOT: Delinearizing
+; CHECK-NOT: Strides:
+; CHECK-NOT: Terms:


-NOT checks are fragile. For instance, it will not match any of those after the "Unsupported depth of ..." line. If there is not such line, then this test will already fail because of that.

Consider adding additional dbg() << ... output somewhere with the reason that it has been skipped.

sjoerdmeijer · 2025-01-21T13:50:14Z

Looks good to me too, but I would really prefer not to introduce tests that rely on Debug when not needed.

The way I look at: with deep-loop-nest.ll we test the loop depth. With the default value of 10, it shouldn't trigger, but we may want another RUN line with -loop-interchange-max-loop-nest-depth=11 where it does trigger, so we have a negative and positive test, and we can fully rely on optimisation remarks for that, no need for Debug.

Testing when DependencyAnalysis is or isn't running, is slightly orthogonal? Can this be tested separated with a different test that then depends on Debug?

But anyway, this is my preference, am also not blocking this.

This patch is an extension to llvm#115128. After profiling LLVM test-suite, I see a lot of loop nest of depth more than `MaxLoopNestDepth` which is 10. Early exit for them would save compile-time as it would avoid computing DependenceInfo and CacheCost.

madhur13490 · 2025-01-21T17:14:27Z

Thanks @Meinersbur and @sjoerdmeijer for the comments.
I have removed the dependence analysis check and thus -debug requirement.

As @Meinersbur suggested, I reran on compile-time-tracker and PFA screenshot - the benefits are clearly visible for the branch perf/max-depth-exp. The first commit on this branch enables the loop interchange and the second commit applied this patch.

I don't think any of the sources from test-suite used by by the compile-time-tracker has loops that deep. I don't think it matters anyway, based on bubble-sort already is a convinving argument that an upper bound it needed.

I don't know if compile-time-tracker covers everything that is present in LLVM Test-suite but when I ran locally here is a quick stat.
I dumped depth of loop nest (using LoopList.size()) for each file in the LLVM test suite at the beginning of run(). When the transformation is enabled, run() fucntion is invoked 139323 times on the entire LVM test suite and there are 238 loops whose depth is more than 10. 238/139323 seems much less, but there are indeed loops with huge depth in the test suite. Max Loop nest depth is 122!

Thanks for the approval. I will land the patch soon giving some more time in case anyone else wants to comment.

Meinersbur · 2025-01-21T18:37:34Z

I don't know if compile-time-tracker covers everything that is present in LLVM Test-suite

It does not, it only runs kimwitu++, sqlite3, consumer-typeset, Bullet, tramp3d-v4, mafft, ClamAV, lencod, SPASS, 7zip, geomean. You can see the detailed results here: http://llvm-compile-time-tracker.com/compare.php?from=a591801b74bc59e815f9d8c35af8c8ee2582dce3&to=e96b7568c7f5bb3b5a1af0d5c13c56dee745874b&stat=instructions%3Au&details=on

It seems indeed there is a measurable difference for some translation units.

…#118656) This patch is an extension to llvm#115128. After profiling LLVM test-suite, I see a lot of loop nest of depth more than `MaxLoopNestDepth` which is 10. Early exit for them would save compile-time as it would avoid computing DependenceInfo and CacheCost. Please see 'bound-max-depth' branch on compile-time-tracker.

madhur13490 requested review from sebpop, CongzheUalberta and sjoerdmeijer December 4, 2024 15:33

llvmbot added the llvm:transforms label Dec 4, 2024

madhur13490 requested a review from Meinersbur December 4, 2024 16:53

sjoerdmeijer reviewed Dec 6, 2024

View reviewed changes

CongzheUalberta reviewed Dec 12, 2024

View reviewed changes

madhur13490 force-pushed the perf/bound-max-depth branch 2 times, most recently from bff108c to 30807f0 Compare January 21, 2025 08:11

madhur13490 commented Jan 21, 2025

View reviewed changes

Meinersbur approved these changes Jan 21, 2025

View reviewed changes

madhur13490 added 2 commits January 21, 2025 08:54

Address comments

b6d8345

madhur13490 force-pushed the perf/bound-max-depth branch from 30807f0 to b6d8345 Compare January 21, 2025 17:05

madhur13490 merged commit d15f3e8 into llvm:main Jan 23, 2025
7 of 8 checks passed

This was referenced Jan 28, 2025

[LoopInterchange] Hoist isComputableLoopNest() in the control flow #124247

Merged

[LoopInterchange] Enable it by default #124911

Open

[LoopInterchange] Constrain LI within supported loop nest depth #118656

[LoopInterchange] Constrain LI within supported loop nest depth #118656

Uh oh!

Conversation

madhur13490 commented Dec 4, 2024

Uh oh!

llvmbot commented Dec 4, 2024

Uh oh!

github-actions bot commented Dec 4, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

CongzheUalberta commented Dec 12, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

madhur13490 commented Jan 21, 2025

Uh oh!

Meinersbur left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Footnotes

Uh oh!

Choose a reason for hiding this comment

Uh oh!

sjoerdmeijer commented Jan 21, 2025

Uh oh!

madhur13490 commented Jan 21, 2025

Uh oh!

Meinersbur commented Jan 21, 2025

Uh oh!

Uh oh!

Uh oh!

github-actions bot commented Dec 4, 2024 •

edited

Loading

CongzheUalberta commented Dec 12, 2024 •

edited

Loading

Meinersbur left a comment •

edited

Loading