-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[LoopInterchange] Constrain LI within supported loop nest depth #118656
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-llvm-transforms Author: Madhur Amilkanthwar (madhur13490) ChangesThis patch is an extension to #115128. After profiling LLVM test-suite, I see a lot of loop nest of depth more than Full diff: https://github.com/llvm/llvm-project/pull/118656.diff 3 Files Affected:
diff --git a/llvm/lib/Transforms/Scalar/LoopInterchange.cpp b/llvm/lib/Transforms/Scalar/LoopInterchange.cpp
index a0c0080c0bda1c..b5ebde300ebe6f 100644
--- a/llvm/lib/Transforms/Scalar/LoopInterchange.cpp
+++ b/llvm/lib/Transforms/Scalar/LoopInterchange.cpp
@@ -69,6 +69,9 @@ using CharMatrix = std::vector<std::vector<char>>;
// Maximum number of dependencies that can be handled in the dependency matrix.
static const unsigned MaxMemInstrCount = 100;
+// Minimum loop depth supported.
+static const unsigned MinLoopNestDepth = 2;
+
// Maximum loop depth supported.
static const unsigned MaxLoopNestDepth = 10;
@@ -239,10 +242,12 @@ static void populateWorklist(Loop &L, LoopVector &LoopList) {
LoopList.push_back(CurrentLoop);
}
-static bool hasMinimumLoopDepth(SmallVectorImpl<Loop *> &LoopList) {
+static bool hasSupportedLoopDepth(SmallVectorImpl<Loop *> &LoopList) {
unsigned LoopNestDepth = LoopList.size();
- if (LoopNestDepth < 2) {
- LLVM_DEBUG(dbgs() << "Loop doesn't contain minimum nesting level.\n");
+ if (LoopNestDepth < MinLoopNestDepth || LoopNestDepth > MaxLoopNestDepth) {
+ LLVM_DEBUG(dbgs() << "Unsupported depth of loop nest " << LoopNestDepth
+ << " should be [" << MinLoopNestDepth << ", "
+ << MaxLoopNestDepth << "]\n");
return false;
}
return true;
@@ -430,15 +435,10 @@ struct LoopInterchange {
bool processLoopList(SmallVectorImpl<Loop *> &LoopList) {
bool Changed = false;
- // Ensure minimum loop nest depth.
- assert(hasMinimumLoopDepth(LoopList) && "Loop nest does not meet minimum depth.");
+ // Ensure proper loop nest depth.
+ assert(hasSupportedLoopDepth(LoopList) && "Unsupported depth of loop nest.");
unsigned LoopNestDepth = LoopList.size();
- if (LoopNestDepth > MaxLoopNestDepth) {
- LLVM_DEBUG(dbgs() << "Cannot handle loops of depth greater than "
- << MaxLoopNestDepth << "\n");
- return false;
- }
if (!isComputableLoopNest(LoopList)) {
LLVM_DEBUG(dbgs() << "Not valid loop candidate for interchange\n");
return false;
@@ -1725,8 +1725,8 @@ PreservedAnalyses LoopInterchangePass::run(LoopNest &LN,
LPMUpdater &U) {
Function &F = *LN.getParent();
SmallVector<Loop *, 8> LoopList(LN.getLoops());
- // Ensure minimum depth of the loop nest to do the interchange.
- if (!hasMinimumLoopDepth(LoopList))
+ // Ensure proper depth of the loop nest to do the interchange.
+ if (!hasSupportedLoopDepth(LoopList))
return PreservedAnalyses::all();
DependenceInfo DI(&F, &AR.AA, &AR.SE, &AR.LI);
diff --git a/llvm/test/Transforms/LoopInterchange/bail-out-one-loop.ll b/llvm/test/Transforms/LoopInterchange/bail-out-one-loop.ll
index 788e1b0157d80f..477b37937747fc 100644
--- a/llvm/test/Transforms/LoopInterchange/bail-out-one-loop.ll
+++ b/llvm/test/Transforms/LoopInterchange/bail-out-one-loop.ll
@@ -15,7 +15,7 @@ target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i8:8:32-i16:16:32-i6
; CHECK-NOT: Delinearizing
; CHECK-NOT: Strides:
; CHECK-NOT: Terms:
-; CHECK: Loop doesn't contain minimum nesting level.
+; CHECK: Unsupported depth of loop nest 1 should be [2, 10]
define void @foo() {
entry:
diff --git a/llvm/test/Transforms/LoopInterchange/deep-loop-nest.ll b/llvm/test/Transforms/LoopInterchange/deep-loop-nest.ll
new file mode 100644
index 00000000000000..348c1ab74b7588
--- /dev/null
+++ b/llvm/test/Transforms/LoopInterchange/deep-loop-nest.ll
@@ -0,0 +1,95 @@
+; REQUIRES: asserts
+; RUN: opt < %s -passes=loop-interchange -debug -disable-output 2>&1| FileCheck %s
+
+target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
+
+; For deep loop nest, delinearization should not be run.
+
+; CHECK-NOT: Delinearizing
+; CHECK-NOT: Strides:
+; CHECK-NOT: Terms:
+; CHECK: Unsupported depth of loop nest 11 should be [2, 10]
+define void @big_loop_nest() {
+entry:
+ br label %for1.header
+
+for1.header:
+ %j = phi i64 [ 0, %entry ], [ %j.next, %for1.inc ]
+ br label %for2.header
+for2.header:
+ %k = phi i64 [ 0, %for1.header ], [ %k.next, %for2.inc ]
+ br label %for3.header
+for3.header:
+ %l = phi i64 [ 0, %for2.header ], [ %l.next, %for3.inc ]
+ br label %for4.header
+for4.header:
+ %m = phi i64 [ 0, %for3.header ], [ %m.next, %for4.inc ]
+ br label %for5.header
+for5.header:
+ %n = phi i64 [ 0, %for4.header ], [ %n.next, %for5.inc ]
+ br label %for6.header
+for6.header:
+ %o = phi i64 [ 0, %for5.header ], [ %o.next, %for6.inc ]
+ br label %for7.header
+for7.header:
+ %p = phi i64 [ 0, %for6.header ], [ %p.next, %for7.inc ]
+ br label %for8.header
+for8.header:
+ %q = phi i64 [ 0, %for7.header ], [ %q.next, %for8.inc ]
+ br label %for9.header
+for9.header:
+ %r = phi i64 [ 0, %for8.header ], [ %r.next, %for9.inc ]
+ br label %for10.header
+for10.header:
+ %s = phi i64 [ 0, %for9.header ], [ %s.next, %for10.inc ]
+ br label %for11
+for11:
+ %t = phi i64 [ %t.next, %for11 ], [ 0, %for10.header ]
+ %t.next = add nuw nsw i64 %t, 1
+ %exitcond = icmp eq i64 %t.next, 99
+ br i1 %exitcond, label %for1.inc, label %for11
+
+for1.inc:
+ %j.next = add nuw nsw i64 %j, 1
+ %exitcond26 = icmp eq i64 %j.next, 99
+ br i1 %exitcond26, label %for2.inc, label %for1.header
+for2.inc:
+ %k.next = add nuw nsw i64 %k, 1
+ %exitcond27 = icmp eq i64 %j.next, 99
+ br i1 %exitcond27, label %for3.inc, label %for2.header
+for3.inc:
+ %l.next = add nuw nsw i64 %l, 1
+ %exitcond28 = icmp eq i64 %l.next, 99
+ br i1 %exitcond28, label %for4.inc, label %for3.header
+for4.inc:
+ %m.next = add nuw nsw i64 %m, 1
+ %exitcond29 = icmp eq i64 %m.next, 99
+ br i1 %exitcond29, label %for5.inc, label %for4.header
+for5.inc:
+ %n.next = add nuw nsw i64 %n, 1
+ %exitcond30 = icmp eq i64 %n.next, 99
+ br i1 %exitcond30, label %for6.inc, label %for5.header
+for6.inc:
+ %o.next = add nuw nsw i64 %o, 1
+ %exitcond31 = icmp eq i64 %o.next, 99
+ br i1 %exitcond31, label %for7.inc, label %for6.header
+for7.inc:
+ %p.next = add nuw nsw i64 %p, 1
+ %exitcond32 = icmp eq i64 %p.next, 99
+ br i1 %exitcond32, label %for8.inc, label %for7.header
+for8.inc:
+ %q.next = add nuw nsw i64 %q, 1
+ %exitcond33 = icmp eq i64 %q.next, 99
+ br i1 %exitcond33, label %for9.inc, label %for8.header
+for9.inc:
+ %r.next = add nuw nsw i64 %r, 1
+ %exitcond34 = icmp eq i64 %q.next, 99
+ br i1 %exitcond34, label %for10.inc, label %for9.header
+for10.inc:
+ %s.next = add nuw nsw i64 %s, 1
+ %exitcond35 = icmp eq i64 %s.next, 99
+ br i1 %exitcond35, label %for.end, label %for10.header
+
+for.end:
+ ret void
+}
|
You can test this locally with the following command:git-clang-format --diff e1c1e74a6fd71dd889155100d4c0f5e3284f7a22 b6d834518823fdfc6cf0bee464db63c09cc91568 --extensions cpp -- llvm/lib/Transforms/Scalar/LoopInterchange.cpp View the diff from clang-format here.diff --git a/llvm/lib/Transforms/Scalar/LoopInterchange.cpp b/llvm/lib/Transforms/Scalar/LoopInterchange.cpp
index d366e749c7..0f21fd2c57 100644
--- a/llvm/lib/Transforms/Scalar/LoopInterchange.cpp
+++ b/llvm/lib/Transforms/Scalar/LoopInterchange.cpp
@@ -1758,7 +1758,7 @@ PreservedAnalyses LoopInterchangePass::run(LoopNest &LN,
DependenceInfo DI(&F, &AR.AA, &AR.SE, &AR.LI);
std::unique_ptr<CacheCost> CC =
CacheCost::getCacheCost(LN.getOutermostLoop(), AR, DI);
-
+
if (!LoopInterchange(&AR.SE, &AR.LI, &DI, &AR.DT, CC, &ORE).run(LN))
return PreservedAnalyses::all();
U.markLoopNestChanged(true);
|
@@ -0,0 +1,95 @@ | |||
; REQUIRES: asserts |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer not to rely on asserts builds, that would increase our test coverage.
Can you instead rely on e.g. the optimisation remarks for which we don't need asserts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure but in this case, I will have to duplicate the file for without asserts builds. I don't know if we can conditionally execute RUN line based on build type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see why you need to duplicate the file? I think you only need to check optimization remarks instead of debug output as what you are doing now, e.g., llvm/test/Transforms/LoopInterchange/loop-interchange-optimization-remarks.ll
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I want to ensure DependencyAnalysis is not run and I can do that only when -debug is provided. Specifically, the below lines are of my interest:
; CHECK-NOT: Delinearizing
; CHECK-NOT: Strides:
; CHECK-NOT: Terms:
; CHECK: Unsupported depth of loop nest 11, the supported range is [2, 10].
if (LoopNestDepth < 2) { | ||
LLVM_DEBUG(dbgs() << "Loop doesn't contain minimum nesting level.\n"); | ||
if (LoopNestDepth < MinLoopNestDepth || LoopNestDepth > MaxLoopNestDepth) { | ||
LLVM_DEBUG(dbgs() << "Unsupported depth of loop nest " << LoopNestDepth |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit, maybe small clarification of the message:
"Unsupported loop nest depth of " << LoopNestDepth << ", the supported range is ..."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Would be good to know how much compile time reduction for LLVM test-suite you could achieve with this patch? |
unsigned LoopNestDepth = LoopList.size(); | ||
if (LoopNestDepth < 2) { | ||
LLVM_DEBUG(dbgs() << "Loop doesn't contain minimum nesting level.\n"); | ||
if (LoopNestDepth < MinLoopNestDepth || LoopNestDepth > MaxLoopNestDepth) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to make MaxLoopNestDepth
and MinLoopNestDepth
to be tunable parameters rather than hardcoded numbers, so we could control with better flexibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -0,0 +1,95 @@ | |||
; REQUIRES: asserts |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see why you need to duplicate the file? I think you only need to check optimization remarks instead of debug output as what you are doing now, e.g., llvm/test/Transforms/LoopInterchange/loop-interchange-optimization-remarks.ll
.
bff108c
to
30807f0
Compare
@@ -201,7 +201,7 @@ static cl::opt<bool> RunNewGVN("enable-newgvn", cl::init(false), cl::Hidden, | |||
cl::desc("Run the NewGVN pass")); | |||
|
|||
static cl::opt<bool> EnableLoopInterchange( | |||
"enable-loopinterchange", cl::init(false), cl::Hidden, | |||
"enable-loopinterchange", cl::init(true), cl::Hidden, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is for experimentation purposes only.
Sure, let me rerun again with compile-time-tracker. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, let me rerun again with compile-time-tracker.
If you want to check the influence of the max-depth bound with enabled loop-interchange, you would need it based on the commit that has if enabled.
I don't think any of the sources from test-suite used by by the compile-time-tracker has loops that deep. I don't think it matters anyway, being based on bubble-sort already is a convinving argument that an upper bound it needed.
Not relying on -debug
would be nice, but still common practice that I don't want to suddenly enforce.
LGTM. Be sure to remove cl::init(true)
before pushing1.
Footnotes
-
Could have used a different branch for experimentation,
bound-max-depth-2
is identical,enable-li-2
is based on a different commit and doesn't show up as direct comparison in llvm-compile-tracker. ↩
; CHECK-NOT: Delinearizing | ||
; CHECK-NOT: Strides: | ||
; CHECK-NOT: Terms: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-NOT
checks are fragile. For instance, it will not match any of those after the "Unsupported depth of ..." line. If there is not such line, then this test will already fail because of that.
Consider adding additional dbg() << ...
output somewhere with the reason that it has been skipped.
Looks good to me too, but I would really prefer not to introduce tests that rely on Debug when not needed. The way I look at: with Testing when DependencyAnalysis is or isn't running, is slightly orthogonal? Can this be tested separated with a different test that then depends on Debug? But anyway, this is my preference, am also not blocking this. |
This patch is an extension to llvm#115128. After profiling LLVM test-suite, I see a lot of loop nest of depth more than `MaxLoopNestDepth` which is 10. Early exit for them would save compile-time as it would avoid computing DependenceInfo and CacheCost.
30807f0
to
b6d8345
Compare
Thanks @Meinersbur and @sjoerdmeijer for the comments. As @Meinersbur suggested, I reran on compile-time-tracker and PFA screenshot - the benefits are clearly visible for the branch ![]()
I don't know if compile-time-tracker covers everything that is present in LLVM Test-suite but when I ran locally here is a quick stat. Thanks for the approval. I will land the patch soon giving some more time in case anyone else wants to comment. |
It does not, it only runs kimwitu++, sqlite3, consumer-typeset, Bullet, tramp3d-v4, mafft, ClamAV, lencod, SPASS, 7zip, geomean. You can see the detailed results here: http://llvm-compile-time-tracker.com/compare.php?from=a591801b74bc59e815f9d8c35af8c8ee2582dce3&to=e96b7568c7f5bb3b5a1af0d5c13c56dee745874b&stat=instructions%3Au&details=on It seems indeed there is a measurable difference for some translation units. |
This patch is an extension to #115128.
After profiling LLVM test-suite, I see a lot of loop nest of depth more than
MaxLoopNestDepth
which is 10. Early exit for them would save compile-time as it would avoid computing DependenceInfo and CacheCost.