Skip to content

[SimpleLoopUnswitch] Fix exponential unswitch #66883

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Sep 21, 2023

Conversation

nikic
Copy link
Contributor

@nikic nikic commented Sep 20, 2023

When unswitching via invariant condition injection, we currently mark the condition in the old loop, so that it does not get unswitched again. However, if there are multiple branches for which conditions can be injected, then we can do that for both the old and new loops. This means that the number of unswitches increases exponentially.

Change the handling to be more similar to partial unswitching, where we instead mark the whole loop, rather than a single condition. This means that we will only generate a linear number of loops.

TBH I think even that is still highly undesirable, and we should probably be unswitching all candidates at the same time, so that we end up with only two loops. But at least this mitigates the worst case.

The test case is a reduced variant that generates 1700 lines of IR without this patch and 290 with it.

Fixes #66868.

@llvmbot
Copy link
Member

llvmbot commented Sep 20, 2023

@llvm/pr-subscribers-llvm-transforms

Changes

When unswitching via invariant condition injection, we currently mark the condition in the old loop, so that it does not get unswitched again. However, if there are multiple branches for which conditions can be injected, then we can do that for both the old and new loops. This means that the number of unswitches increases exponentially.

Change the handling to be more similar to partial unswitching, where we instead mark the whole loop, rather than a single condition. This means that we will only generate a linear number of loops.

TBH I think even that is still highly undesirable, and we should probably be unswitching all candidates at the same time, so that we end up with only two loops. But at least this mitigates the worst case.

The test case is a reduced variant that generates 1700 lines of IR without this patch and 290 with it.

Fixes #66868.


Patch is 35.43 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/66883.diff

3 Files Affected:

  • (modified) llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp (+32-25)
  • (added) llvm/test/Transforms/SimpleLoopUnswitch/inject-invariant-conditions-exponential.ll (+260)
  • (modified) llvm/test/Transforms/SimpleLoopUnswitch/inject-invariant-conditions.ll (+35-20)
diff --git a/llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp b/llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
index cad026ffbbd6b74..8ac974bdfecf84a 100644
--- a/llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
+++ b/llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
@@ -2127,9 +2127,10 @@ static void unswitchNontrivialInvariants(
     Loop &L, Instruction &TI, ArrayRef<Value *> Invariants,
     IVConditionInfo &PartialIVInfo, DominatorTree &DT, LoopInfo &LI,
     AssumptionCache &AC,
-    function_ref<void(bool, bool, ArrayRef<Loop *>)> UnswitchCB,
+    function_ref<void(bool, bool, bool, ArrayRef<Loop *>)> UnswitchCB,
     ScalarEvolution *SE, MemorySSAUpdater *MSSAU,
-    function_ref<void(Loop &, StringRef)> DestroyLoopCB, bool InsertFreeze) {
+    function_ref<void(Loop &, StringRef)> DestroyLoopCB, bool InsertFreeze,
+    bool InjectedCondition) {
   auto *ParentBB = TI.getParent();
   BranchInst *BI = dyn_cast<BranchInst>(&TI);
   SwitchInst *SI = BI ? nullptr : cast<SwitchInst>(&TI);
@@ -2582,7 +2583,7 @@ static void unswitchNontrivialInvariants(
   for (Loop *UpdatedL : llvm::concat<Loop *>(NonChildClonedLoops, HoistedLoops))
     if (UpdatedL->getParentLoop() == ParentL)
       SibLoops.push_back(UpdatedL);
-  UnswitchCB(IsStillLoop, PartiallyInvariant, SibLoops);
+  UnswitchCB(IsStillLoop, PartiallyInvariant, InjectedCondition, SibLoops);
 
   if (MSSAU && VerifyMemorySSA)
     MSSAU->getMemorySSA()->verifyMemorySSA();
@@ -2980,13 +2981,6 @@ static bool shouldTryInjectInvariantCondition(
 /// the metadata.
 bool shouldTryInjectBasingOnMetadata(const BranchInst *BI,
                                      const BasicBlock *TakenSucc) {
-  // Skip branches that have already been unswithed this way. After successful
-  // unswitching of injected condition, we will still have a copy of this loop
-  // which looks exactly the same as original one. To prevent the 2nd attempt
-  // of unswitching it in the same pass, mark this branch as "nothing to do
-  // here".
-  if (BI->hasMetadata("llvm.invariant.condition.injection.disabled"))
-    return false;
   SmallVector<uint32_t> Weights;
   if (!extractBranchWeights(*BI, Weights))
     return false;
@@ -3069,13 +3063,9 @@ injectPendingInvariantConditions(NonTrivialUnswitchCandidate Candidate, Loop &L,
       Builder.CreateCondBr(InjectedCond, InLoopSucc, CheckBlock);
 
   Builder.SetInsertPoint(CheckBlock);
-  auto *NewTerm = Builder.CreateCondBr(TI->getCondition(), TI->getSuccessor(0),
-                                       TI->getSuccessor(1));
-
+  Builder.CreateCondBr(TI->getCondition(), TI->getSuccessor(0),
+                       TI->getSuccessor(1));
   TI->eraseFromParent();
-  // Prevent infinite unswitching.
-  NewTerm->setMetadata("llvm.invariant.condition.injection.disabled",
-                       MDNode::get(BB->getContext(), {}));
 
   // Fixup phis.
   for (auto &I : *InLoopSucc) {
@@ -3443,7 +3433,7 @@ static bool shouldInsertFreeze(Loop &L, Instruction &TI, DominatorTree &DT,
 static bool unswitchBestCondition(
     Loop &L, DominatorTree &DT, LoopInfo &LI, AssumptionCache &AC,
     AAResults &AA, TargetTransformInfo &TTI,
-    function_ref<void(bool, bool, ArrayRef<Loop *>)> UnswitchCB,
+    function_ref<void(bool, bool, bool, ArrayRef<Loop *>)> UnswitchCB,
     ScalarEvolution *SE, MemorySSAUpdater *MSSAU,
     function_ref<void(Loop &, StringRef)> DestroyLoopCB) {
   // Collect all invariant conditions within this loop (as opposed to an inner
@@ -3453,9 +3443,10 @@ static bool unswitchBestCondition(
   Instruction *PartialIVCondBranch = nullptr;
   collectUnswitchCandidates(UnswitchCandidates, PartialIVInfo,
                             PartialIVCondBranch, L, LI, AA, MSSAU);
-  collectUnswitchCandidatesWithInjections(UnswitchCandidates, PartialIVInfo,
-                                          PartialIVCondBranch, L, DT, LI, AA,
-                                          MSSAU);
+  if (!findOptionMDForLoop(&L, "llvm.loop.unswitch.injection.disable"))
+    collectUnswitchCandidatesWithInjections(UnswitchCandidates, PartialIVInfo,
+                                            PartialIVCondBranch, L, DT, LI, AA,
+                                            MSSAU);
   // If we didn't find any candidates, we're done.
   if (UnswitchCandidates.empty())
     return false;
@@ -3476,8 +3467,11 @@ static bool unswitchBestCondition(
     return false;
   }
 
-  if (Best.hasPendingInjection())
+  bool InjectedCondition = false;
+  if (Best.hasPendingInjection()) {
     Best = injectPendingInvariantConditions(Best, L, DT, LI, AC, MSSAU);
+    InjectedCondition = true;
+  }
   assert(!Best.hasPendingInjection() &&
          "All injections should have been done by now!");
 
@@ -3505,7 +3499,7 @@ static bool unswitchBestCondition(
                     << ") terminator: " << *Best.TI << "\n");
   unswitchNontrivialInvariants(L, *Best.TI, Best.Invariants, PartialIVInfo, DT,
                                LI, AC, UnswitchCB, SE, MSSAU, DestroyLoopCB,
-                               InsertFreeze);
+                               InsertFreeze, InjectedCondition);
   return true;
 }
 
@@ -3534,7 +3528,7 @@ static bool
 unswitchLoop(Loop &L, DominatorTree &DT, LoopInfo &LI, AssumptionCache &AC,
              AAResults &AA, TargetTransformInfo &TTI, bool Trivial,
              bool NonTrivial,
-             function_ref<void(bool, bool, ArrayRef<Loop *>)> UnswitchCB,
+             function_ref<void(bool, bool, bool, ArrayRef<Loop *>)> UnswitchCB,
              ScalarEvolution *SE, MemorySSAUpdater *MSSAU,
              ProfileSummaryInfo *PSI, BlockFrequencyInfo *BFI,
              function_ref<void(Loop &, StringRef)> DestroyLoopCB) {
@@ -3549,7 +3543,8 @@ unswitchLoop(Loop &L, DominatorTree &DT, LoopInfo &LI, AssumptionCache &AC,
   if (Trivial && unswitchAllTrivialConditions(L, DT, LI, SE, MSSAU)) {
     // If we unswitched successfully we will want to clean up the loop before
     // processing it further so just mark it as unswitched and return.
-    UnswitchCB(/*CurrentLoopValid*/ true, false, {});
+    UnswitchCB(/*CurrentLoopValid*/ true, /*PartiallyInvariant*/ false,
+               /*InjectedCondition*/ false, {});
     return true;
   }
 
@@ -3645,6 +3640,7 @@ PreservedAnalyses SimpleLoopUnswitchPass::run(Loop &L, LoopAnalysisManager &AM,
 
   auto UnswitchCB = [&L, &U, &LoopName](bool CurrentLoopValid,
                                         bool PartiallyInvariant,
+                                        bool InjectedCondition,
                                         ArrayRef<Loop *> NewLoops) {
     // If we did a non-trivial unswitch, we have added new (cloned) loops.
     if (!NewLoops.empty())
@@ -3664,6 +3660,16 @@ PreservedAnalyses SimpleLoopUnswitchPass::run(Loop &L, LoopAnalysisManager &AM,
             Context, L.getLoopID(), {"llvm.loop.unswitch.partial"},
             {DisableUnswitchMD});
         L.setLoopID(NewLoopID);
+      } else if (InjectedCondition) {
+        // Do the same for injection of invariant conditions.
+        auto &Context = L.getHeader()->getContext();
+        MDNode *DisableUnswitchMD = MDNode::get(
+            Context,
+            MDString::get(Context, "llvm.loop.unswitch.injection.disable"));
+        MDNode *NewLoopID = makePostTransformationMetadata(
+            Context, L.getLoopID(), {"llvm.loop.unswitch.injection"},
+            {DisableUnswitchMD});
+        L.setLoopID(NewLoopID);
       } else
         U.revisitCurrentLoop();
     } else
@@ -3756,6 +3762,7 @@ bool SimpleLoopUnswitchLegacyPass::runOnLoop(Loop *L, LPPassManager &LPM) {
   auto *SE = SEWP ? &SEWP->getSE() : nullptr;
 
   auto UnswitchCB = [&L, &LPM](bool CurrentLoopValid, bool PartiallyInvariant,
+                               bool InjectedCondition,
                                ArrayRef<Loop *> NewLoops) {
     // If we did a non-trivial unswitch, we have added new (cloned) loops.
     for (auto *NewL : NewLoops)
@@ -3768,7 +3775,7 @@ bool SimpleLoopUnswitchLegacyPass::runOnLoop(Loop *L, LPPassManager &LPM) {
       // If the current loop has been unswitched using a partially invariant
       // condition, we should not re-add the current loop to avoid unswitching
       // on the same condition again.
-      if (!PartiallyInvariant)
+      if (!PartiallyInvariant && !InjectedCondition)
         LPM.addLoop(*L);
     } else
       LPM.markLoopAsDeleted(*L);
diff --git a/llvm/test/Transforms/SimpleLoopUnswitch/inject-invariant-conditions-exponential.ll b/llvm/test/Transforms/SimpleLoopUnswitch/inject-invariant-conditions-exponential.ll
new file mode 100644
index 000000000000000..bd88b4030d675e6
--- /dev/null
+++ b/llvm/test/Transforms/SimpleLoopUnswitch/inject-invariant-conditions-exponential.ll
@@ -0,0 +1,260 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 3
+; RUN: opt -S -passes='simple-loop-unswitch<nontrivial>' < %s | FileCheck %s
+
+; Make sure invariant condition injection does not result in exponential
+; size increase.
+
+; FIXME: It probably shouldn't result in linear size increase either.
+
+define void @ham(i64 %arg) {
+; CHECK-LABEL: define void @ham(
+; CHECK-SAME: i64 [[ARG:%.*]]) {
+; CHECK-NEXT:  bb:
+; CHECK-NEXT:    [[INJECTED_COND:%.*]] = icmp ule i64 [[ARG]], [[ARG]]
+; CHECK-NEXT:    [[INJECTED_COND_FR:%.*]] = freeze i1 [[INJECTED_COND]]
+; CHECK-NEXT:    br i1 [[INJECTED_COND_FR]], label [[BB_SPLIT_US:%.*]], label [[BB_SPLIT:%.*]]
+; CHECK:       bb.split.us:
+; CHECK-NEXT:    [[INJECTED_COND1:%.*]] = icmp ule i64 [[ARG]], [[ARG]]
+; CHECK-NEXT:    [[INJECTED_COND1_FR:%.*]] = freeze i1 [[INJECTED_COND1]]
+; CHECK-NEXT:    br i1 [[INJECTED_COND1_FR]], label [[BB_SPLIT_US_SPLIT_US:%.*]], label [[BB_SPLIT_US_SPLIT:%.*]]
+; CHECK:       bb.split.us.split.us:
+; CHECK-NEXT:    [[INJECTED_COND2:%.*]] = icmp ule i64 [[ARG]], [[ARG]]
+; CHECK-NEXT:    [[INJECTED_COND2_FR:%.*]] = freeze i1 [[INJECTED_COND2]]
+; CHECK-NEXT:    br i1 [[INJECTED_COND2_FR]], label [[BB_SPLIT_US_SPLIT_US_SPLIT_US:%.*]], label [[BB_SPLIT_US_SPLIT_US_SPLIT:%.*]]
+; CHECK:       bb.split.us.split.us.split.us:
+; CHECK-NEXT:    [[INJECTED_COND3:%.*]] = icmp ule i64 [[ARG]], [[ARG]]
+; CHECK-NEXT:    [[INJECTED_COND3_FR:%.*]] = freeze i1 [[INJECTED_COND3]]
+; CHECK-NEXT:    br i1 [[INJECTED_COND3_FR]], label [[BB_SPLIT_US_SPLIT_US_SPLIT_US_SPLIT_US:%.*]], label [[BB_SPLIT_US_SPLIT_US_SPLIT_US_SPLIT:%.*]]
+; CHECK:       bb.split.us.split.us.split.us.split.us:
+; CHECK-NEXT:    [[INJECTED_COND4:%.*]] = icmp ule i64 [[ARG]], [[ARG]]
+; CHECK-NEXT:    [[INJECTED_COND4_FR:%.*]] = freeze i1 [[INJECTED_COND4]]
+; CHECK-NEXT:    br i1 [[INJECTED_COND4_FR]], label [[BB_SPLIT_US_SPLIT_US_SPLIT_US_SPLIT_US_SPLIT_US:%.*]], label [[BB_SPLIT_US_SPLIT_US_SPLIT_US_SPLIT_US_SPLIT:%.*]]
+; CHECK:       bb.split.us.split.us.split.us.split.us.split.us:
+; CHECK-NEXT:    br label [[BB1_US_US_US_US_US:%.*]]
+; CHECK:       bb1.us.us.us.us.us:
+; CHECK-NEXT:    [[PHI_US_US_US_US_US:%.*]] = phi i64 [ 0, [[BB_SPLIT_US_SPLIT_US_SPLIT_US_SPLIT_US_SPLIT_US]] ], [ [[ADD_US_US_US_US_US:%.*]], [[BB20_US_US_US_US_US:%.*]] ]
+; CHECK-NEXT:    [[ADD_US_US_US_US_US]] = add nuw i64 [[PHI_US_US_US_US_US]], 1
+; CHECK-NEXT:    [[ICMP_US_US_US_US_US:%.*]] = icmp ult i64 [[PHI_US_US_US_US_US]], [[ARG]]
+; CHECK-NEXT:    br i1 [[ICMP_US_US_US_US_US]], label [[BB2_US_US_US_US_US:%.*]], label [[BB21_SPLIT_US_SPLIT_US_SPLIT_US_SPLIT_US_SPLIT_US:%.*]], !prof [[PROF0:![0-9]+]]
+; CHECK:       bb2.us.us.us.us.us:
+; CHECK-NEXT:    [[ICMP3_US_US_US_US_US:%.*]] = icmp ult i64 [[PHI_US_US_US_US_US]], [[ARG]]
+; CHECK-NEXT:    br label [[BB4_US_US_US_US_US:%.*]]
+; CHECK:       bb4.us.us.us.us.us:
+; CHECK-NEXT:    [[ICMP5_US_US_US_US_US:%.*]] = icmp ult i64 [[PHI_US_US_US_US_US]], [[ARG]]
+; CHECK-NEXT:    br label [[BB6_US_US_US_US_US:%.*]]
+; CHECK:       bb6.us.us.us.us.us:
+; CHECK-NEXT:    [[ICMP7_US_US_US_US_US:%.*]] = icmp ult i64 [[PHI_US_US_US_US_US]], [[ARG]]
+; CHECK-NEXT:    br label [[BB8_US_US_US_US_US:%.*]]
+; CHECK:       bb8.us.us.us.us.us:
+; CHECK-NEXT:    [[ICMP9_US_US_US_US_US:%.*]] = icmp ult i64 [[PHI_US_US_US_US_US]], [[ARG]]
+; CHECK-NEXT:    br label [[BB10_US_US_US_US_US:%.*]]
+; CHECK:       bb10.us.us.us.us.us:
+; CHECK-NEXT:    [[ICMP11_US_US_US_US_US:%.*]] = icmp ult i64 [[PHI_US_US_US_US_US]], [[ARG]]
+; CHECK-NEXT:    br label [[BB20_US_US_US_US_US]]
+; CHECK:       bb20.us.us.us.us.us:
+; CHECK-NEXT:    br label [[BB1_US_US_US_US_US]]
+; CHECK:       bb21.split.us.split.us.split.us.split.us.split.us:
+; CHECK-NEXT:    br label [[BB21_SPLIT_US_SPLIT_US_SPLIT_US_SPLIT_US:%.*]]
+; CHECK:       bb.split.us.split.us.split.us.split.us.split:
+; CHECK-NEXT:    br label [[BB1_US_US_US_US:%.*]]
+; CHECK:       bb1.us.us.us.us:
+; CHECK-NEXT:    [[PHI_US_US_US_US:%.*]] = phi i64 [ 0, [[BB_SPLIT_US_SPLIT_US_SPLIT_US_SPLIT_US_SPLIT]] ], [ [[ADD_US_US_US_US:%.*]], [[BB20_US_US_US_US:%.*]] ]
+; CHECK-NEXT:    [[ADD_US_US_US_US]] = add nuw i64 [[PHI_US_US_US_US]], 1
+; CHECK-NEXT:    [[ICMP_US_US_US_US:%.*]] = icmp ult i64 [[PHI_US_US_US_US]], [[ARG]]
+; CHECK-NEXT:    br i1 [[ICMP_US_US_US_US]], label [[BB2_US_US_US_US:%.*]], label [[BB21_SPLIT_US_SPLIT_US_SPLIT_US_SPLIT_US_SPLIT:%.*]], !prof [[PROF0]]
+; CHECK:       bb2.us.us.us.us:
+; CHECK-NEXT:    [[ICMP3_US_US_US_US:%.*]] = icmp ult i64 [[PHI_US_US_US_US]], [[ARG]]
+; CHECK-NEXT:    br label [[BB4_US_US_US_US:%.*]]
+; CHECK:       bb4.us.us.us.us:
+; CHECK-NEXT:    [[ICMP5_US_US_US_US:%.*]] = icmp ult i64 [[PHI_US_US_US_US]], [[ARG]]
+; CHECK-NEXT:    br label [[BB6_US_US_US_US:%.*]]
+; CHECK:       bb6.us.us.us.us:
+; CHECK-NEXT:    [[ICMP7_US_US_US_US:%.*]] = icmp ult i64 [[PHI_US_US_US_US]], [[ARG]]
+; CHECK-NEXT:    br label [[BB8_US_US_US_US:%.*]]
+; CHECK:       bb8.us.us.us.us:
+; CHECK-NEXT:    [[ICMP9_US_US_US_US:%.*]] = icmp ult i64 [[PHI_US_US_US_US]], [[ARG]]
+; CHECK-NEXT:    br label [[BB10_US_US_US_US:%.*]]
+; CHECK:       bb10.us.us.us.us:
+; CHECK-NEXT:    [[ICMP11_US_US_US_US:%.*]] = icmp ult i64 [[PHI_US_US_US_US]], [[ARG]]
+; CHECK-NEXT:    br label [[BB10_US_US_US_US_CHECK:%.*]]
+; CHECK:       bb10.us.us.us.us.check:
+; CHECK-NEXT:    br i1 [[ICMP11_US_US_US_US]], label [[BB20_US_US_US_US]], label [[BB21_SPLIT_US_SPLIT_US_SPLIT_US_SPLIT_US_SPLIT]]
+; CHECK:       bb20.us.us.us.us:
+; CHECK-NEXT:    br label [[BB1_US_US_US_US]], !llvm.loop [[LOOP1:![0-9]+]]
+; CHECK:       bb21.split.us.split.us.split.us.split.us.split:
+; CHECK-NEXT:    br label [[BB21_SPLIT_US_SPLIT_US_SPLIT_US_SPLIT_US]]
+; CHECK:       bb21.split.us.split.us.split.us.split.us:
+; CHECK-NEXT:    br label [[BB21_SPLIT_US_SPLIT_US_SPLIT_US:%.*]]
+; CHECK:       bb.split.us.split.us.split.us.split:
+; CHECK-NEXT:    br label [[BB1_US_US_US:%.*]]
+; CHECK:       bb1.us.us.us:
+; CHECK-NEXT:    [[PHI_US_US_US:%.*]] = phi i64 [ 0, [[BB_SPLIT_US_SPLIT_US_SPLIT_US_SPLIT]] ], [ [[ADD_US_US_US:%.*]], [[BB20_US_US_US:%.*]] ]
+; CHECK-NEXT:    [[ADD_US_US_US]] = add nuw i64 [[PHI_US_US_US]], 1
+; CHECK-NEXT:    [[ICMP_US_US_US:%.*]] = icmp ult i64 [[PHI_US_US_US]], [[ARG]]
+; CHECK-NEXT:    br i1 [[ICMP_US_US_US]], label [[BB2_US_US_US:%.*]], label [[BB21_SPLIT_US_SPLIT_US_SPLIT_US_SPLIT:%.*]], !prof [[PROF0]]
+; CHECK:       bb2.us.us.us:
+; CHECK-NEXT:    [[ICMP3_US_US_US:%.*]] = icmp ult i64 [[PHI_US_US_US]], [[ARG]]
+; CHECK-NEXT:    br label [[BB4_US_US_US:%.*]]
+; CHECK:       bb4.us.us.us:
+; CHECK-NEXT:    [[ICMP5_US_US_US:%.*]] = icmp ult i64 [[PHI_US_US_US]], [[ARG]]
+; CHECK-NEXT:    br label [[BB6_US_US_US:%.*]]
+; CHECK:       bb6.us.us.us:
+; CHECK-NEXT:    [[ICMP7_US_US_US:%.*]] = icmp ult i64 [[PHI_US_US_US]], [[ARG]]
+; CHECK-NEXT:    br label [[BB8_US_US_US:%.*]]
+; CHECK:       bb8.us.us.us:
+; CHECK-NEXT:    [[ICMP9_US_US_US:%.*]] = icmp ult i64 [[PHI_US_US_US]], [[ARG]]
+; CHECK-NEXT:    br label [[BB8_US_US_US_CHECK:%.*]]
+; CHECK:       bb8.us.us.us.check:
+; CHECK-NEXT:    br i1 [[ICMP9_US_US_US]], label [[BB10_US_US_US:%.*]], label [[BB21_SPLIT_US_SPLIT_US_SPLIT_US_SPLIT]]
+; CHECK:       bb10.us.us.us:
+; CHECK-NEXT:    [[ICMP11_US_US_US:%.*]] = icmp ult i64 [[PHI_US_US_US]], [[ARG]]
+; CHECK-NEXT:    br i1 [[ICMP11_US_US_US]], label [[BB20_US_US_US]], label [[BB21_SPLIT_US_SPLIT_US_SPLIT_US_SPLIT]], !prof [[PROF0]]
+; CHECK:       bb20.us.us.us:
+; CHECK-NEXT:    br label [[BB1_US_US_US]], !llvm.loop [[LOOP3:![0-9]+]]
+; CHECK:       bb21.split.us.split.us.split.us.split:
+; CHECK-NEXT:    br label [[BB21_SPLIT_US_SPLIT_US_SPLIT_US]]
+; CHECK:       bb21.split.us.split.us.split.us:
+; CHECK-NEXT:    br label [[BB21_SPLIT_US_SPLIT_US:%.*]]
+; CHECK:       bb.split.us.split.us.split:
+; CHECK-NEXT:    br label [[BB1_US_US:%.*]]
+; CHECK:       bb1.us.us:
+; CHECK-NEXT:    [[PHI_US_US:%.*]] = phi i64 [ 0, [[BB_SPLIT_US_SPLIT_US_SPLIT]] ], [ [[ADD_US_US:%.*]], [[BB20_US_US:%.*]] ]
+; CHECK-NEXT:    [[ADD_US_US]] = add nuw i64 [[PHI_US_US]], 1
+; CHECK-NEXT:    [[ICMP_US_US:%.*]] = icmp ult i64 [[PHI_US_US]], [[ARG]]
+; CHECK-NEXT:    br i1 [[ICMP_US_US]], label [[BB2_US_US:%.*]], label [[BB21_SPLIT_US_SPLIT_US_SPLIT:%.*]], !prof [[PROF0]]
+; CHECK:       bb2.us.us:
+; CHECK-NEXT:    [[ICMP3_US_US:%.*]] = icmp ult i64 [[PHI_US_US]], [[ARG]]
+; CHECK-NEXT:    br label [[BB4_US_US:%.*]]
+; CHECK:       bb4.us.us:
+; CHECK-NEXT:    [[ICMP5_US_US:%.*]] = icmp ult i64 [[PHI_US_US]], [[ARG]]
+; CHECK-NEXT:    br label [[BB6_US_US:%.*]]
+; CHECK:       bb6.us.us:
+; CHECK-NEXT:    [[ICMP7_US_US:%.*]] = icmp ult i64 [[PHI_US_US]], [[ARG]]
+; CHECK-NEXT:    br label [[BB6_US_US_CHECK:%.*]]
+; CHECK:       bb6.us.us.check:
+; CHECK-NEXT:    br i1 [[ICMP7_US_US]], label [[BB8_US_US:%.*]], label [[BB21_SPLIT_US_SPLIT_US_SPLIT]]
+; CHECK:       bb8.us.us:
+; CHECK-NEXT:    [[ICMP9_US_US:%.*]] = icmp ult i64 [[PHI_US_US]], [[ARG]]
+; CHECK-NEXT:    br i1 [[ICMP9_US_US]], label [[BB10_US_US:%.*]], label [[BB21_SPLIT_US_SPLIT_US_SPLIT]], !prof [[PROF0]]
+; CHECK:       bb10.us.us:
+; CHECK-NEXT:    [[ICMP11_US_US:%.*]] = icmp ult i64 [[PHI_US_US]], [[ARG]]
+; CHECK-NEXT:    br i1 [[ICMP11_US_US]], label [[BB20_US_US]], label [[BB21_SPLIT_US_SPLIT_US_SPLIT]], !prof [[PROF0]]
+; CHECK:       bb20.us.us:
+; CHECK-NEXT:    br label [[BB1_US_US]], !llvm.loop [[LOOP4:![0-9]+]]
+; CHECK:       bb21.split.us.split.us.split:
+; CHECK-NEXT:    br label [[BB21_SPLIT_US_SPLIT_US]]
+; CHECK:       bb21.split.us.split.us:
+; CHECK-NEXT:    br label [[BB21_SPLIT_US:%.*]]
+; CHECK:       bb.split.us.split:
+; CHECK-NEXT:    br label [[BB1_US:%.*]]
+; CHECK:       bb1.us:
+; CHECK-NEXT:    [[PHI_US:%.*]] = phi i64 [ 0, [[BB_SPLIT_US_SPLIT]] ], [ [[ADD_US:%.*]], [[BB20_US:%.*]] ]
+; CHECK-NEXT:    [[ADD_US]] = add nuw i64 [[PHI_US]], 1
+; CHECK-NEXT:    [[ICMP_US:%.*]] = icmp ult i64 [[PHI_US]], [[ARG]]
+; CHECK-NEXT:    br i1 [[ICMP_US]], label [[BB2_US:%.*]], label [[BB21_SPLIT_US_SPLIT:%.*]], !prof [[PROF0]]
+; CHECK:       bb2.us:
+; CHECK-NEXT:    [[ICMP3_US:%.*]] = icmp ult i64 [[PHI_US]], [[ARG]]
+; CHECK-NEXT:    br label [[BB4_US:%.*]]
+; CHECK:       bb4.us:
+; CHECK-NEXT:    [[ICMP5_US:%.*]] = icmp ult i64 [[PHI_US]], [[ARG]]
+; CHECK-NEXT:    br label [[BB4_US_CHECK:%.*]]
+; CHECK:       bb4.us.check:
+; CHECK-NEXT:    br i1 [[ICMP5_US]], label [[BB6_US:%.*]], label [[BB22_SPLIT_US:%.*]]
+; CHECK:       bb6.us:
+; CHECK-NEXT:    [[ICMP7_US:%.*]] = icmp ult i64 [[PHI_US]], [[ARG]]
+; CHECK-NEXT:    br i1 [[ICMP7_US]], label [[BB8_US:%.*]], label [[BB21_SPLIT_US_SPLIT]], !prof [[PROF0]]
+; CHECK:       bb8.us:
+; CHECK-NEXT:    [[ICMP9_US:%.*]] = icmp ult i64 [[PHI_US]], [[ARG]]
+; CHECK-NEXT:    br i1 [[ICMP9_US]], label [[BB10_US:%.*]], label [[BB21_SPLIT_US_SPLIT]], !prof [[PROF0]]
+; CHECK:       bb10.us:
+; CHECK-NEXT:    [[ICMP11_US:%.*]] = icmp ult i64 [[PHI_US]], [[ARG]]
+; CHECK-NEXT:    br i1 [[ICMP11_US]], label [[BB20_US]], label [[BB21_SPLIT_US_SPLIT]], !prof [[PROF0]]
+; CHECK:       bb20.us:
+; CHECK-NEXT:    br label [[BB1_US]], !llvm.loop [[LOOP5:![0-9]+]]
+; CHECK:       bb21.split.us.split:
+; CHECK-NEXT:    br label [[BB21_SPLIT_US]]
+; CHECK:       bb21.split.us:
+; CHECK-NEXT:    br label [[BB21:%.*]]
+; CHECK:       bb22.split.us:
+; CHECK-NEXT:    br label [[BB22:%.*]]
+; CHECK:       bb.split:
+; CHECK-NEXT:    br label [[BB1:%.*]]
+; CHECK:       bb1:
+; CHECK-NEXT:    [[PHI:%.*]] = phi i64 [ 0, [[BB_SPLIT]] ], [ [[ADD:%.*]], [[BB20:%.*]] ]
+; CHECK-NEXT:    [[ADD]] = add nuw i64 [[PHI]], 1
+; CHECK-NEXT:    [[ICMP:%.*]] = icmp u...
[truncated]

Copy link
Contributor

@fhahn fhahn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@@ -3768,7 +3775,7 @@ bool SimpleLoopUnswitchLegacyPass::runOnLoop(Loop *L, LPPassManager &LPM) {
// If the current loop has been unswitched using a partially invariant
// condition, we should not re-add the current loop to avoid unswitching
// on the same condition again.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment above needs updating

When unswitching via invariant condition injection, we currently
mark the condition in the old loop, so that it does not get
unswitched again. However, if there are multiple branches for
which conditions can be injected, then we can do that for both
the old and new loop. This means that the number of unswitches
increases exponentially.

Change the handling to be more similar to partial unswitching,
where we instead mark the whole loop, rather than a single
condition. This means that we will only generate a linear number
of loops.

TBH I think even that is still highly undesirable, and we should
probably be unswitching all candidates at the same time, so that
we end up with only two loops. But at least this mitigates the
worst case.

The test case is a reduced variant that generates 1700 lines of IR
without this patch and 290 with it.

Fixes llvm#66868.
@nikic nikic force-pushed the simple-loop-unswitch-fix branch from 866577f to 8362cae Compare September 21, 2023 07:47
@nikic nikic merged commit 8362cae into llvm:main Sep 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Infinite/exponential loop unswitch due to invariant condition injection
3 participants