Skip to content

[DFAJumpThreading] Rewrite the way paths are enumerated #96127

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Aug 10, 2024

Conversation

UsmanNadeem
Copy link
Contributor

I tried to add a limit to number of blocks visited in the paths() function but even with a very high limit the transformation coverage was being reduced.

After looking at the code it seemed that the function was trying to create paths of the form SwitchBB...DeterminatorBB...SwitchPredecessor. This is inefficient because a lot of nodes in those paths (nodes before DeterminatorBB) would be irrelevant to the optimization. We only care about paths of the form DeterminatorBB_Pred DeterminatorBB...SwitchBB. This weeds out a lot of visited nodes.

In this patch I have added a hard limit to the number of nodes visited and changed the algorithm for path calculation. Primarily I am traversing the use-def chain for the PHI nodes that define the state. If we have a hole in the use-def chain (no immediate predecessors) then I call the paths() function.

I also had to the change the select instruction unfolding code to insert redundant one input PHIs to allow the use of the use-def chain in calculating the paths.

The test suite coverage with this patch (including a limit on nodes visited) is as follows:

Geomean diff:
  dfa-jump-threading.NumTransforms: +13.4%
  dfa-jump-threading.NumCloned: +34.1%
  dfa-jump-threading.NumPaths: -80.7%

Compile time effect vs baseline (pass enabled by default) is mostly positive: https://llvm-compile-time-tracker.com/compare.php?from=ad8705fda25f64dcfeb6264ac4d6bac36bee91ab&to=5a3af6ce7e852f0736f706b4a8663efad5bce6ea&stat=instructions:u

Change-Id: I0fba9e0f8aa079706f633089a8ccd4ecf57547ed

I tried to add a limit to number of blocks visited in the paths()
function but even with a very high limit the transformation coverage was
being reduced.

After looking at the code it seemed that the function was trying to
create paths of the form `SwitchBB...DeterminatorBB...SwitchPredecessor`.
This is inefficient because a lot of nodes in those paths (nodes before
DeterminatorBB) would be irrelevant to the optimization. We only care
about paths of the form `DeterminatorBB_Pred DeterminatorBB...SwitchBB`.
This weeds out a lot of visited nodes.

In this patch I have added a hard limit to the number of nodes visited
and changed the algorithm for path calculation. Primarily I am
traversing the use-def chain for the PHI nodes that define the state. If
we have a hole in the use-def chain (no immediate predecessors) then I
call the paths() function.

I also had to the change the select instruction unfolding code to insert
redundant one input PHIs to allow the use of the use-def chain in
calculating the paths.

The test suite coverage with this patch (including a limit on nodes
visited) is as follows:

    Geomean diff:
      dfa-jump-threading.NumTransforms: +13.4%
      dfa-jump-threading.NumCloned: +34.1%
      dfa-jump-threading.NumPaths: -80.7%

Compile time effect vs baseline (pass enabled by default) is mostly
positive: https://llvm-compile-time-tracker.com/compare.php?from=ad8705fda25f64dcfeb6264ac4d6bac36bee91ab&to=5a3af6ce7e852f0736f706b4a8663efad5bce6ea&stat=instructions:u

Change-Id: I0fba9e0f8aa079706f633089a8ccd4ecf57547ed
@llvmbot
Copy link
Member

llvmbot commented Jun 20, 2024

@llvm/pr-subscribers-llvm-transforms

Author: Usman Nadeem (UsmanNadeem)

Changes

I tried to add a limit to number of blocks visited in the paths() function but even with a very high limit the transformation coverage was being reduced.

After looking at the code it seemed that the function was trying to create paths of the form SwitchBB...DeterminatorBB...SwitchPredecessor. This is inefficient because a lot of nodes in those paths (nodes before DeterminatorBB) would be irrelevant to the optimization. We only care about paths of the form DeterminatorBB_Pred DeterminatorBB...SwitchBB. This weeds out a lot of visited nodes.

In this patch I have added a hard limit to the number of nodes visited and changed the algorithm for path calculation. Primarily I am traversing the use-def chain for the PHI nodes that define the state. If we have a hole in the use-def chain (no immediate predecessors) then I call the paths() function.

I also had to the change the select instruction unfolding code to insert redundant one input PHIs to allow the use of the use-def chain in calculating the paths.

The test suite coverage with this patch (including a limit on nodes visited) is as follows:

Geomean diff:
  dfa-jump-threading.NumTransforms: +13.4%
  dfa-jump-threading.NumCloned: +34.1%
  dfa-jump-threading.NumPaths: -80.7%

Compile time effect vs baseline (pass enabled by default) is mostly positive: https://llvm-compile-time-tracker.com/compare.php?from=ad8705fda25f64dcfeb6264ac4d6bac36bee91ab&to=5a3af6ce7e852f0736f706b4a8663efad5bce6ea&stat=instructions:u

Change-Id: I0fba9e0f8aa079706f633089a8ccd4ecf57547ed


Patch is 63.80 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/96127.diff

5 Files Affected:

  • (modified) llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp (+255-207)
  • (modified) llvm/test/Transforms/DFAJumpThreading/dfa-jump-threading-analysis.ll (+18-26)
  • (modified) llvm/test/Transforms/DFAJumpThreading/dfa-jump-threading-transform.ll (+46-38)
  • (modified) llvm/test/Transforms/DFAJumpThreading/dfa-unfold-select.ll (+133-71)
  • (modified) llvm/test/Transforms/DFAJumpThreading/max-path-length.ll (+13-58)
diff --git a/llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp b/llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp
index 4371b821eae63..42900642edb2c 100644
--- a/llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp
+++ b/llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp
@@ -106,6 +106,12 @@ static cl::opt<unsigned> MaxPathLength(
     cl::desc("Max number of blocks searched to find a threading path"),
     cl::Hidden, cl::init(20));
 
+static cl::opt<unsigned> MaxNumVisitiedPaths(
+    "dfa-max-num-visited-paths",
+    cl::desc(
+        "Max number of blocks visited while enumerating paths around a switch"),
+    cl::Hidden, cl::init(2000));
+
 static cl::opt<unsigned>
     MaxNumPaths("dfa-max-num-paths",
                 cl::desc("Max number of paths enumerated around a switch"),
@@ -177,24 +183,6 @@ class DFAJumpThreading {
 
 namespace {
 
-/// Create a new basic block and sink \p SIToSink into it.
-void createBasicBlockAndSinkSelectInst(
-    DomTreeUpdater *DTU, SelectInst *SI, PHINode *SIUse, SelectInst *SIToSink,
-    BasicBlock *EndBlock, StringRef NewBBName, BasicBlock **NewBlock,
-    BranchInst **NewBranch, std::vector<SelectInstToUnfold> *NewSIsToUnfold,
-    std::vector<BasicBlock *> *NewBBs) {
-  assert(SIToSink->hasOneUse());
-  assert(NewBlock);
-  assert(NewBranch);
-  *NewBlock = BasicBlock::Create(SI->getContext(), NewBBName,
-                                 EndBlock->getParent(), EndBlock);
-  NewBBs->push_back(*NewBlock);
-  *NewBranch = BranchInst::Create(EndBlock, *NewBlock);
-  SIToSink->moveBefore(*NewBranch);
-  NewSIsToUnfold->push_back(SelectInstToUnfold(SIToSink, SIUse));
-  DTU->applyUpdates({{DominatorTree::Insert, *NewBlock, EndBlock}});
-}
-
 /// Unfold the select instruction held in \p SIToUnfold by replacing it with
 /// control flow.
 ///
@@ -212,89 +200,42 @@ void unfold(DomTreeUpdater *DTU, LoopInfo *LI, SelectInstToUnfold SIToUnfold,
   BranchInst *StartBlockTerm =
       dyn_cast<BranchInst>(StartBlock->getTerminator());
 
-  assert(StartBlockTerm && StartBlockTerm->isUnconditional());
+  assert(StartBlockTerm);
   assert(SI->hasOneUse());
 
-  // These are the new basic blocks for the conditional branch.
-  // At least one will become an actual new basic block.
-  BasicBlock *TrueBlock = nullptr;
-  BasicBlock *FalseBlock = nullptr;
-  BranchInst *TrueBranch = nullptr;
-  BranchInst *FalseBranch = nullptr;
-
-  // Sink select instructions to be able to unfold them later.
-  if (SelectInst *SIOp = dyn_cast<SelectInst>(SI->getTrueValue())) {
-    createBasicBlockAndSinkSelectInst(DTU, SI, SIUse, SIOp, EndBlock,
-                                      "si.unfold.true", &TrueBlock, &TrueBranch,
-                                      NewSIsToUnfold, NewBBs);
-  }
-  if (SelectInst *SIOp = dyn_cast<SelectInst>(SI->getFalseValue())) {
-    createBasicBlockAndSinkSelectInst(DTU, SI, SIUse, SIOp, EndBlock,
-                                      "si.unfold.false", &FalseBlock,
-                                      &FalseBranch, NewSIsToUnfold, NewBBs);
-  }
-
-  // If there was nothing to sink, then arbitrarily choose the 'false' side
-  // for a new input value to the PHI.
-  if (!TrueBlock && !FalseBlock) {
-    FalseBlock = BasicBlock::Create(SI->getContext(), "si.unfold.false",
-                                    EndBlock->getParent(), EndBlock);
-    NewBBs->push_back(FalseBlock);
-    BranchInst::Create(EndBlock, FalseBlock);
-    DTU->applyUpdates({{DominatorTree::Insert, FalseBlock, EndBlock}});
-  }
-
-  // Insert the real conditional branch based on the original condition.
-  // If we did not create a new block for one of the 'true' or 'false' paths
-  // of the condition, it means that side of the branch goes to the end block
-  // directly and the path originates from the start block from the point of
-  // view of the new PHI.
-  BasicBlock *TT = EndBlock;
-  BasicBlock *FT = EndBlock;
-  if (TrueBlock && FalseBlock) {
-    // A diamond.
-    TT = TrueBlock;
-    FT = FalseBlock;
-
-    // Update the phi node of SI.
-    SIUse->addIncoming(SI->getTrueValue(), TrueBlock);
-    SIUse->addIncoming(SI->getFalseValue(), FalseBlock);
-
-    // Update any other PHI nodes in EndBlock.
-    for (PHINode &Phi : EndBlock->phis()) {
-      if (&Phi != SIUse) {
-        Value *OrigValue = Phi.getIncomingValueForBlock(StartBlock);
-        Phi.addIncoming(OrigValue, TrueBlock);
-        Phi.addIncoming(OrigValue, FalseBlock);
-      }
-
-      // Remove incoming place of original StartBlock, which comes in a indirect
-      // way (through TrueBlock and FalseBlock) now.
-      Phi.removeIncomingValue(StartBlock, /* DeletePHIIfEmpty = */ false);
-    }
-  } else {
-    BasicBlock *NewBlock = nullptr;
+  if (StartBlockTerm->isUnconditional()) {
+    // Arbitrarily choose the 'false' side for a new input value to the PHI.
+    BasicBlock *NewBlock = BasicBlock::Create(
+        SI->getContext(), Twine(SI->getName(), ".si.unfold.false"),
+        EndBlock->getParent(), EndBlock);
+    NewBBs->push_back(NewBlock);
+    BranchInst::Create(EndBlock, NewBlock);
+    DTU->applyUpdates({{DominatorTree::Insert, NewBlock, EndBlock}});
+
+    // StartBlock
+    //   |  \
+    //   |  NewBlock
+    //   |  /
+    // EndBlock
     Value *SIOp1 = SI->getTrueValue();
     Value *SIOp2 = SI->getFalseValue();
 
-    // A triangle pointing right.
-    if (!TrueBlock) {
-      NewBlock = FalseBlock;
-      FT = FalseBlock;
-    }
-    // A triangle pointing left.
-    else {
-      NewBlock = TrueBlock;
-      TT = TrueBlock;
-      std::swap(SIOp1, SIOp2);
-    }
+    PHINode *NewPhi = PHINode::Create(SIUse->getType(), 1,
+                                      Twine(SIOp2->getName(), ".si.unfold.phi"),
+                                      NewBlock->getFirstInsertionPt());
+    NewPhi->addIncoming(SIOp2, StartBlock);
+
+    if (auto *OpSi = dyn_cast<SelectInst>(SIOp1))
+      NewSIsToUnfold->push_back(SelectInstToUnfold(OpSi, SIUse));
+    if (auto *OpSi = dyn_cast<SelectInst>(SIOp2))
+      NewSIsToUnfold->push_back(SelectInstToUnfold(OpSi, NewPhi));
 
     // Update the phi node of SI.
     for (unsigned Idx = 0; Idx < SIUse->getNumIncomingValues(); ++Idx) {
       if (SIUse->getIncomingBlock(Idx) == StartBlock)
         SIUse->setIncomingValue(Idx, SIOp1);
     }
-    SIUse->addIncoming(SIOp2, NewBlock);
+    SIUse->addIncoming(NewPhi, NewBlock);
 
     // Update any other PHI nodes in EndBlock.
     for (auto II = EndBlock->begin(); PHINode *Phi = dyn_cast<PHINode>(II);
@@ -302,11 +243,87 @@ void unfold(DomTreeUpdater *DTU, LoopInfo *LI, SelectInstToUnfold SIToUnfold,
       if (Phi != SIUse)
         Phi->addIncoming(Phi->getIncomingValueForBlock(StartBlock), NewBlock);
     }
+
+    StartBlockTerm->eraseFromParent();
+
+    // Insert the real conditional branch based on the original condition.
+    BranchInst::Create(EndBlock, NewBlock, SI->getCondition(), StartBlock);
+    DTU->applyUpdates({{DominatorTree::Insert, StartBlock, EndBlock},
+                       {DominatorTree::Insert, StartBlock, NewBlock}});
+  } else {
+    BasicBlock *NewBlockT = BasicBlock::Create(
+        SI->getContext(), Twine(SI->getName(), ".si.unfold.true"),
+        EndBlock->getParent(), EndBlock);
+    BasicBlock *NewBlockF = BasicBlock::Create(
+        SI->getContext(), Twine(SI->getName(), ".si.unfold.false"),
+        EndBlock->getParent(), EndBlock);
+
+    NewBBs->push_back(NewBlockT);
+    NewBBs->push_back(NewBlockF);
+
+    // Def only has one use in EndBlock.
+    // Before transformation:
+    // StartBlock(Def)
+    //   |      \
+    // EndBlock  OtherBlock
+    //  (Use)
+    //
+    // After transformation:
+    // StartBlock(Def)
+    //   |      \
+    //   |       OtherBlock
+    // NewBlockT
+    //   |     \
+    //   |   NewBlockF
+    //   |      /
+    //   |     /
+    // EndBlock
+    //  (Use)
+    BranchInst::Create(EndBlock, NewBlockF);
+    // Insert the real conditional branch based on the original condition.
+    BranchInst::Create(EndBlock, NewBlockF, SI->getCondition(), NewBlockT);
+    DTU->applyUpdates({{DominatorTree::Insert, NewBlockT, NewBlockF},
+                       {DominatorTree::Insert, NewBlockT, EndBlock},
+                       {DominatorTree::Insert, NewBlockF, EndBlock}});
+
+    Value *TrueVal = SI->getTrueValue();
+    Value *FalseVal = SI->getFalseValue();
+
+    PHINode *NewPhiT = PHINode::Create(
+        SIUse->getType(), 1, Twine(TrueVal->getName(), ".si.unfold.phi"),
+        NewBlockT->getFirstInsertionPt());
+    PHINode *NewPhiF = PHINode::Create(
+        SIUse->getType(), 1, Twine(FalseVal->getName(), ".si.unfold.phi"),
+        NewBlockF->getFirstInsertionPt());
+    NewPhiT->addIncoming(TrueVal, StartBlock);
+    NewPhiF->addIncoming(FalseVal, NewBlockT);
+
+    if (auto *TrueSI = dyn_cast<SelectInst>(TrueVal))
+      NewSIsToUnfold->push_back(SelectInstToUnfold(TrueSI, NewPhiT));
+    if (auto *FalseSi = dyn_cast<SelectInst>(FalseVal))
+      NewSIsToUnfold->push_back(SelectInstToUnfold(FalseSi, NewPhiF));
+
+    SIUse->addIncoming(NewPhiT, NewBlockT);
+    SIUse->addIncoming(NewPhiF, NewBlockF);
+    SIUse->removeIncomingValue(StartBlock);
+
+    // Update any other PHI nodes in EndBlock.
+    for (auto II = EndBlock->begin(); PHINode *Phi = dyn_cast<PHINode>(II);
+         ++II) {
+      if (Phi != SIUse) {
+        Phi->addIncoming(Phi->getIncomingValueForBlock(StartBlock), NewBlockT);
+        Phi->addIncoming(Phi->getIncomingValueForBlock(StartBlock), NewBlockF);
+        Phi->removeIncomingValue(StartBlock);
+      }
+    }
+
+    // Update the appropriate successor of the start block to point to the new
+    // unfolded block.
+    unsigned SuccNum = StartBlockTerm->getSuccessor(1) == EndBlock ? 1 : 0;
+    StartBlockTerm->setSuccessor(SuccNum, NewBlockT);
+    DTU->applyUpdates({{DominatorTree::Delete, StartBlock, EndBlock},
+                       {DominatorTree::Insert, StartBlock, NewBlockT}});
   }
-  StartBlockTerm->eraseFromParent();
-  BranchInst::Create(TT, FT, SI->getCondition(), StartBlock);
-  DTU->applyUpdates({{DominatorTree::Insert, StartBlock, TT},
-                     {DominatorTree::Insert, StartBlock, FT}});
 
   // Preserve loop info
   if (Loop *L = LI->getLoopFor(SI->getParent())) {
@@ -372,6 +389,11 @@ struct ThreadingPath {
   /// Path is a list of basic blocks.
   const PathType &getPath() const { return Path; }
   void setPath(const PathType &NewPath) { Path = NewPath; }
+  void push_back(BasicBlock *BB) { Path.push_back(BB); }
+  void push_front(BasicBlock *BB) { Path.push_front(BB); }
+  void appendExcludingFirst(const PathType &OtherPath) {
+    Path.insert(Path.end(), OtherPath.begin() + 1, OtherPath.end());
+  }
 
   void print(raw_ostream &OS) const {
     OS << Path << " [ " << ExitVal << ", " << DBB->getName() << " ]";
@@ -530,9 +552,9 @@ struct MainSwitch {
 
 struct AllSwitchPaths {
   AllSwitchPaths(const MainSwitch *MSwitch, OptimizationRemarkEmitter *ORE,
-                 LoopInfo *LI)
+                 LoopInfo *LI, Loop *L)
       : Switch(MSwitch->getInstr()), SwitchBlock(Switch->getParent()), ORE(ORE),
-        LI(LI) {}
+        LI(LI), SwitchOuterLoop(L) {}
 
   std::vector<ThreadingPath> &getThreadingPaths() { return TPaths; }
   unsigned getNumThreadingPaths() { return TPaths.size(); }
@@ -540,10 +562,7 @@ struct AllSwitchPaths {
   BasicBlock *getSwitchBlock() { return SwitchBlock; }
 
   void run() {
-    VisitedBlocks Visited;
-    PathsType LoopPaths = paths(SwitchBlock, Visited, /* PathDepth = */ 1);
-    StateDefMap StateDef = getStateDefMap(LoopPaths);
-
+    StateDefMap StateDef = getStateDefMap();
     if (StateDef.empty()) {
       ORE->emit([&]() {
         return OptimizationRemarkMissed(DEBUG_TYPE, "SwitchNotPredictable",
@@ -553,42 +572,118 @@ struct AllSwitchPaths {
       return;
     }
 
-    for (const PathType &Path : LoopPaths) {
-      ThreadingPath TPath;
-
-      const BasicBlock *PrevBB = Path.back();
-      for (const BasicBlock *BB : Path) {
-        if (StateDef.contains(BB)) {
-          const PHINode *Phi = dyn_cast<PHINode>(StateDef[BB]);
-          assert(Phi && "Expected a state-defining instr to be a phi node.");
-
-          const Value *V = Phi->getIncomingValueForBlock(PrevBB);
-          if (const ConstantInt *C = dyn_cast<const ConstantInt>(V)) {
-            TPath.setExitValue(C);
-            TPath.setDeterminator(BB);
-            TPath.setPath(Path);
-          }
-        }
+    auto *SwitchPhi = cast<PHINode>(Switch->getOperand(0));
+    auto *SwitchPhiDefBB = SwitchPhi->getParent();
+    VisitedBlocks VB;
+    // Get paths from the determinator BBs to SwitchPhiDefBB
+    std::vector<ThreadingPath> PathsToPhiDef =
+        getPathsFromStateDefMap(StateDef, SwitchPhi, VB);
+    if (SwitchPhiDefBB == SwitchBlock) {
+      TPaths = std::move(PathsToPhiDef);
+      return;
+    }
 
-        // Switch block is the determinator, this is the final exit value.
-        if (TPath.isExitValueSet() && BB == Path.front())
-          break;
+    // Find and append paths from SwitchPhiDefBB to SwitchBlock.
+    PathsType PathsToSwitchBB =
+        paths(SwitchPhiDefBB, SwitchBlock, VB, /* PathDepth = */ 1);
+    if (PathsToSwitchBB.empty())
+      return;
 
-        PrevBB = BB;
+    std::vector<ThreadingPath> TempList;
+    for (const ThreadingPath &Path : PathsToPhiDef) {
+      for (const PathType &PathToSw : PathsToSwitchBB) {
+        ThreadingPath PathCopy(Path);
+        PathCopy.appendExcludingFirst(PathToSw);
+        TempList.push_back(PathCopy);
       }
-
-      if (TPath.isExitValueSet() && isSupported(TPath))
-        TPaths.push_back(TPath);
     }
+    TPaths = std::move(TempList);
   }
 
 private:
   // Value: an instruction that defines a switch state;
   // Key: the parent basic block of that instruction.
   typedef DenseMap<const BasicBlock *, const PHINode *> StateDefMap;
+  std::vector<ThreadingPath> getPathsFromStateDefMap(StateDefMap &StateDef,
+                                                     PHINode *Phi,
+                                                     VisitedBlocks &VB) {
+    std::vector<ThreadingPath> Res;
+    auto *PhiBB = Phi->getParent();
+    VB.insert(PhiBB);
+
+    VisitedBlocks UniqueBlocks;
+    for (auto *IncomingBB : Phi->blocks()) {
+      if (!UniqueBlocks.insert(IncomingBB).second)
+        continue;
+      if (!SwitchOuterLoop->contains(IncomingBB))
+        continue;
+
+      Value *IncomingValue = Phi->getIncomingValueForBlock(IncomingBB);
+      // We found the determinator. The is the start of our path.
+      if (auto *C = dyn_cast<ConstantInt>(IncomingValue)) {
+        // SwitchBlock is the determinator, unsupported unless its also the def.
+        if (PhiBB == SwitchBlock &&
+            SwitchBlock != cast<PHINode>(Switch->getOperand(0))->getParent())
+          continue;
+        ThreadingPath NewPath;
+        NewPath.setDeterminator(PhiBB);
+        NewPath.setExitValue(C);
+        // Don't add SwitchBlock at the start, this is handled later.
+        if (IncomingBB != SwitchBlock)
+          NewPath.push_back(IncomingBB);
+        NewPath.push_back(PhiBB);
+        Res.push_back(NewPath);
+        continue;
+      }
+      // Don't get into a cycle.
+      if (VB.contains(IncomingBB) || IncomingBB == SwitchBlock)
+        continue;
+      // Recurse up the PHI chain.
+      auto *IncomingPhi = dyn_cast<PHINode>(IncomingValue);
+      if (!IncomingPhi)
+        continue;
+      auto *IncomingPhiDefBB = IncomingPhi->getParent();
+      if (!StateDef.contains(IncomingPhiDefBB))
+        continue;
+
+      // Direct prececessor, just add to the path.
+      if (IncomingPhiDefBB == IncomingBB) {
+        std::vector<ThreadingPath> PredPaths =
+            getPathsFromStateDefMap(StateDef, IncomingPhi, VB);
+        for (ThreadingPath &Path : PredPaths) {
+          Path.push_back(PhiBB);
+          Res.push_back(std::move(Path));
+        }
+        continue;
+      }
+      // Not a direct prececessor, find intermediate paths to append to the
+      // existing path.
+      if (VB.contains(IncomingPhiDefBB))
+        continue;
 
-  PathsType paths(BasicBlock *BB, VisitedBlocks &Visited,
-                  unsigned PathDepth) const {
+      PathsType IntermediatePaths;
+      IntermediatePaths =
+          paths(IncomingPhiDefBB, IncomingBB, VB, /* PathDepth = */ 1);
+      if (IntermediatePaths.empty())
+        continue;
+
+      std::vector<ThreadingPath> PredPaths =
+          getPathsFromStateDefMap(StateDef, IncomingPhi, VB);
+      for (const ThreadingPath &Path : PredPaths) {
+        for (const PathType &IPath : IntermediatePaths) {
+          ThreadingPath NewPath(Path);
+          NewPath.appendExcludingFirst(IPath);
+          NewPath.push_back(PhiBB);
+          Res.push_back(NewPath);
+        }
+      }
+    }
+    VB.erase(PhiBB);
+    return Res;
+  }
+
+  PathsType paths(BasicBlock *BB, BasicBlock *ToBB, VisitedBlocks &Visited,
+                  unsigned PathDepth) {
     PathsType Res;
 
     // Stop exploring paths after visiting MaxPathLength blocks
@@ -603,11 +698,12 @@ struct AllSwitchPaths {
     }
 
     Visited.insert(BB);
+    if (++NumVisited > MaxNumVisitiedPaths)
+      return Res;
 
     // Stop if we have reached the BB out of loop, since its successors have no
     // impact on the DFA.
-    // TODO: Do we need to stop exploring if BB is the outer loop of the switch?
-    if (!LI->getLoopFor(BB))
+    if (!SwitchOuterLoop->contains(BB))
       return Res;
 
     // Some blocks have multiple edges to the same successor, and this set
@@ -617,9 +713,12 @@ struct AllSwitchPaths {
       if (!Successors.insert(Succ).second)
         continue;
 
-      // Found a cycle through the SwitchBlock
-      if (Succ == SwitchBlock) {
-        Res.push_back({BB});
+      // Found a cycle through the final block.
+      if (Succ == ToBB) {
+        PathType NewPath;
+        NewPath.push_back(BB);
+        NewPath.push_back(ToBB);
+        Res.push_back(NewPath);
         continue;
       }
 
@@ -627,11 +726,19 @@ struct AllSwitchPaths {
       if (Visited.contains(Succ))
         continue;
 
-      PathsType SuccPaths = paths(Succ, Visited, PathDepth + 1);
-      for (const PathType &Path : SuccPaths) {
-        PathType NewPath(Path);
-        NewPath.push_front(BB);
-        Res.push_back(NewPath);
+      auto *CurrLoop = LI->getLoopFor(BB);
+      // Unlikely to be beneficial.
+      if (Succ == CurrLoop->getHeader())
+        continue;
+      // Skip for now, revisit this condition later to see the impact on
+      // coverage and compile time.
+      if (LI->getLoopFor(Succ) != CurrLoop)
+        continue;
+
+      PathsType SuccPaths = paths(Succ, ToBB, Visited, PathDepth + 1);
+      for (PathType &Path : SuccPaths) {
+        Path.push_front(BB);
+        Res.push_back(Path);
         if (Res.size() >= MaxNumPaths) {
           return Res;
         }
@@ -648,18 +755,9 @@ struct AllSwitchPaths {
   ///
   /// Return an empty map if unpredictable values encountered inside the basic
   /// blocks of \p LoopPaths.
-  StateDefMap getStateDefMap(const PathsType &LoopPaths) const {
+  StateDefMap getStateDefMap() const {
     StateDefMap Res;
-
-    // Basic blocks belonging to any of the loops around the switch statement.
-    SmallPtrSet<BasicBlock *, 16> LoopBBs;
-    for (const PathType &Path : LoopPaths) {
-      for (BasicBlock *BB : Path)
-        LoopBBs.insert(BB);
-    }
-
     Value *FirstDef = Switch->getOperand(0);
-
     assert(isa<PHINode>(FirstDef) && "The first definition must be a phi.");
 
     SmallVector<PHINode *, 8> Stack;
@@ -674,7 +772,7 @@ struct AllSwitchPaths {
 
       for (BasicBlock *IncomingBB : CurPhi->blocks()) {
         Value *Incoming = CurPhi->getIncomingValueForBlock(IncomingBB);
-        bool IsOutsideLoops = LoopBBs.count(IncomingBB) == 0;
+        bool IsOutsideLoops = !SwitchOuterLoop->contains(IncomingBB);
         if (Incoming == FirstDef || isa<ConstantInt>(Incoming) ||
             SeenValues.contains(Incoming) || IsOutsideLoops) {
           continue;
@@ -691,67 +789,13 @@ struct AllSwitchPaths {
     return Res;
   }
 
-  /// The determinator BB should precede the switch-defining BB.
-  ///
-  /// Otherwise, it is possible that the state defined in the determinator block
-  /// defines the state for the next iteration of the loop, rather than for the
-  /// current one.
-  ///
-  /// Currently supported paths:
-...
[truncated]

@dtcxzyw dtcxzyw requested a review from nikic June 20, 2024 03:39
@XChy
Copy link
Member

XChy commented Jun 29, 2024

I'm busy these days. May look into the patch next week. By the way, there is compile-time regression in ClamAV.

@UsmanNadeem
Copy link
Contributor Author

I'm busy these days. May look into the patch next week. By the way, there is compile-time regression in ClamAV.

I looked at the O3 config, ClamAV has 12.2% more blocks that are cloned with this patch. So that might be the reason.

@UsmanNadeem
Copy link
Contributor Author

Ping!

Copy link
Member

@XChy XChy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for late reply. Look basically good to me.

Comment on lines 649 to 660
// Direct prececessor, just add to the path.
if (IncomingPhiDefBB == IncomingBB) {
std::vector<ThreadingPath> PredPaths =
getPathsFromStateDefMap(StateDef, IncomingPhi, VB);
for (ThreadingPath &Path : PredPaths) {
Path.push_back(PhiBB);
Res.push_back(std::move(Path));
}
continue;
}
// Not a direct prececessor, find intermediate paths to append to the
// existing path.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we handle incomingPhiDefBB separately for direct predecessor and non-direct predecessor? Is there any performance impact when handling it generally?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so. Doing it separately will save some time, not sure how much.

Change-Id: I1205e80ff94ec3da400536ddbf3bedf270e7a6d0
@UsmanNadeem UsmanNadeem requested a review from XChy July 29, 2024 16:33
@UsmanNadeem
Copy link
Contributor Author

Ping!

Copy link
Member

@XChy XChy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@UsmanNadeem UsmanNadeem merged commit b167ada into llvm:main Aug 10, 2024
7 checks passed
@mikaelholmen
Copy link
Collaborator

Hi @UsmanNadeem

The following starts crashing with this patch:

opt "-passes=dfa-jump-threading" bbi-98332.ll -o /dev/null

It hits an assert like

opt: ../include/llvm/IR/Instructions.h:2679: llvm::Value *llvm::PHINode::getIncomingValueForBlock(const llvm::BasicBlock *) const: Assertion `Idx >= 0 && "Invalid basic block argument!"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.	Program arguments: build-all/bin/opt -passes=dfa-jump-threading bbi-98332.ll -o /dev/null
1.	Running pass "function(dfa-jump-threading)" on module "bbi-98332.ll"
2.	Running pass "dfa-jump-threading" on function "func_17"
 #0 0x00005608ae38fc27 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (build-all/bin/opt+0x4142c27)
 #1 0x00005608ae38d70e llvm::sys::RunSignalHandlers() (build-all/bin/opt+0x414070e)
 #2 0x00005608ae39062f SignalHandler(int) Signals.cpp:0:0
 #3 0x00007f377811ccf0 __restore_rt (/lib64/libpthread.so.0+0x12cf0)
 #4 0x00007f3775cd5acf raise (/lib64/libc.so.6+0x4eacf)
 #5 0x00007f3775ca8ea5 abort (/lib64/libc.so.6+0x21ea5)
 #6 0x00007f3775ca8d79 _nl_load_domain.cold.0 (/lib64/libc.so.6+0x21d79)
 #7 0x00007f3775cce426 (/lib64/libc.so.6+0x47426)
 #8 0x00005608afa7b065 (anonymous namespace)::DFAJumpThreading::run(llvm::Function&) DFAJumpThreading.cpp:0:0
 #9 0x00005608afa73f8c llvm::DFAJumpThreadingPass::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (build-all/bin/opt+0x5826f8c)
#10 0x00005608af70d9ed llvm::detail::PassModel<llvm::Function, llvm::DFAJumpThreadingPass, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) PassBuilderPipelines.cpp:0:0
#11 0x00005608ae598cfa llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (build-all/bin/opt+0x434bcfa)
#12 0x00005608af70e41d llvm::detail::PassModel<llvm::Function, llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) PassBuilderPipelines.cpp:0:0
#13 0x00005608ae59d891 llvm::ModuleToFunctionPassAdaptor::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (build-all/bin/opt+0x4350891)
#14 0x00005608af70793d llvm::detail::PassModel<llvm::Module, llvm::ModuleToFunctionPassAdaptor, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) PassBuilderPipelines.cpp:0:0
#15 0x00005608ae59798a llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (build-all/bin/opt+0x434a98a)
#16 0x00005608af6b118b llvm::runPassPipeline(llvm::StringRef, llvm::Module&, llvm::TargetMachine*, llvm::TargetLibraryInfoImpl*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::StringRef, llvm::ArrayRef<llvm::PassPlugin>, llvm::ArrayRef<std::function<void (llvm::PassBuilder&)>>, llvm::opt_tool::OutputKind, llvm::opt_tool::VerifierKind, bool, bool, bool, bool, bool, bool, bool) (build-all/bin/opt+0x546418b)
#17 0x00005608ae357c3d optMain (build-all/bin/opt+0x410ac3d)
#18 0x00007f3775cc1d85 __libc_start_main (/lib64/libc.so.6+0x3ad85)
#19 0x00005608ae3517ae _start (build-all/bin/opt+0x41047ae)
Abort (core dumped)

Maybe it's the dead basic block if.then449 that trips things over?

bbi-98332.ll.gz

@UsmanNadeem
Copy link
Contributor Author

Hi @UsmanNadeem

The following starts crashing with this patch:

opt "-passes=dfa-jump-threading" bbi-98332.ll -o /dev/null

It hits an assert like

opt: ../include/llvm/IR/Instructions.h:2679: llvm::Value *llvm::PHINode::getIncomingValueForBlock(const llvm::BasicBlock *) const: Assertion `Idx >= 0 && "Invalid basic block argument!"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.	Program arguments: build-all/bin/opt -passes=dfa-jump-threading bbi-98332.ll -o /dev/null
1.	Running pass "function(dfa-jump-threading)" on module "bbi-98332.ll"
2.	Running pass "dfa-jump-threading" on function "func_17"
 #0 0x00005608ae38fc27 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (build-all/bin/opt+0x4142c27)
 #1 0x00005608ae38d70e llvm::sys::RunSignalHandlers() (build-all/bin/opt+0x414070e)
 #2 0x00005608ae39062f SignalHandler(int) Signals.cpp:0:0
 #3 0x00007f377811ccf0 __restore_rt (/lib64/libpthread.so.0+0x12cf0)
 #4 0x00007f3775cd5acf raise (/lib64/libc.so.6+0x4eacf)
 #5 0x00007f3775ca8ea5 abort (/lib64/libc.so.6+0x21ea5)
 #6 0x00007f3775ca8d79 _nl_load_domain.cold.0 (/lib64/libc.so.6+0x21d79)
 #7 0x00007f3775cce426 (/lib64/libc.so.6+0x47426)
 #8 0x00005608afa7b065 (anonymous namespace)::DFAJumpThreading::run(llvm::Function&) DFAJumpThreading.cpp:0:0
 #9 0x00005608afa73f8c llvm::DFAJumpThreadingPass::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (build-all/bin/opt+0x5826f8c)
#10 0x00005608af70d9ed llvm::detail::PassModel<llvm::Function, llvm::DFAJumpThreadingPass, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) PassBuilderPipelines.cpp:0:0
#11 0x00005608ae598cfa llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (build-all/bin/opt+0x434bcfa)
#12 0x00005608af70e41d llvm::detail::PassModel<llvm::Function, llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) PassBuilderPipelines.cpp:0:0
#13 0x00005608ae59d891 llvm::ModuleToFunctionPassAdaptor::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (build-all/bin/opt+0x4350891)
#14 0x00005608af70793d llvm::detail::PassModel<llvm::Module, llvm::ModuleToFunctionPassAdaptor, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) PassBuilderPipelines.cpp:0:0
#15 0x00005608ae59798a llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (build-all/bin/opt+0x434a98a)
#16 0x00005608af6b118b llvm::runPassPipeline(llvm::StringRef, llvm::Module&, llvm::TargetMachine*, llvm::TargetLibraryInfoImpl*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::StringRef, llvm::ArrayRef<llvm::PassPlugin>, llvm::ArrayRef<std::function<void (llvm::PassBuilder&)>>, llvm::opt_tool::OutputKind, llvm::opt_tool::VerifierKind, bool, bool, bool, bool, bool, bool, bool) (build-all/bin/opt+0x546418b)
#17 0x00005608ae357c3d optMain (build-all/bin/opt+0x410ac3d)
#18 0x00007f3775cc1d85 __libc_start_main (/lib64/libc.so.6+0x3ad85)
#19 0x00005608ae3517ae _start (build-all/bin/opt+0x41047ae)
Abort (core dumped)

Maybe it's the dead basic block if.then449 that trips things over?

bbi-98332.ll.gz

Thanks for the reproducer, I'll take a look!

@mikaelholmen
Copy link
Collaborator

I've seen another crash with this patch: opt -passes=dfa-jump-threading bbi-98705.ll -o /dev/null
Results in:

opt: ../lib/Transforms/Scalar/DFAJumpThreading.cpp:332: void (anonymous namespace)::unfold(llvm::DomTreeUpdater *, llvm::LoopInfo *, (anonymous namespace)::SelectInstToUnfold, std::vector<SelectInstToUnfold> *, std::vector<BasicBlock *> *): Assertion `SI->use_empty() && "Select must be dead now"' failed.

@UsmanNadeem : how is debugging of these crashes (also mentioned in #106083) going?

bbi-98705.ll.gz

@mshockwave
Copy link
Member

Just want to leave a note that this patch caused performance regression (>2% increase on cycle counts and ~0.23% increase on dynamic instruction counts) in 400.perlbench for RISC-V. More specifically, after this patch DFAJumpThreading stops optimizing some cases in the hot function (i.e. S_regmatch).

UsmanNadeem added a commit to UsmanNadeem/llvm-project that referenced this pull request Dec 12, 2024
…values

After llvm#96127 landed, mshockwave reported that the pass was no longer
threading SPEC2006/perlbench.

After 96127 we started bailing out in `getStateDefMap` and rejecting the
transformation because one of the unpredictable values was coming from
inside the loop. There was no fundamental change in that function except
that we started calling `Loop->contains(IncomingBB)` instead of
`LoopBBs.count(IncomingBB)`. After some analysis I came to the
conclusion that even before 96127 we would reject the transformation if
we provided large enough limits on the path traversal (large enough so
that LoopBBs contained blocks corresponding to that unpredictable value).

In this patch I changed `getStateDefMap` to not terminate early on
finding an unpredictable value, this is because `getPathsFromStateDefMap`,
later, actually has checks to ensure that the final list of paths only
have predictable values. As a result we can now partially thread functions
like `negative6` in the tests that have some predictable paths.

This patch does not really have any compile-time impact on the test suite
without `-dfa-early-exit-heuristic=false` (early exit is enabled by
default).

Change-Id: Ie1633b370ed4a0eda8dea52650b40f6f66ef49a3
@apazos
Copy link
Contributor

apazos commented Dec 12, 2024

Thanks for reporting the regression @mshockwave, let us know if the new patch from Usman resolves the issue.

UsmanNadeem added a commit that referenced this pull request Dec 25, 2024
…values (#119774)

After #96127 landed, mshockwave reported that the pass was no longer
threading SPEC2006/perlbench.

After 96127 we started bailing out in `getStateDefMap` and rejecting the
transformation because one of the unpredictable values was coming from
inside the loop. There was no fundamental change in that function except
that we started calling `Loop->contains(IncomingBB)` instead of
`LoopBBs.count(IncomingBB)`. After some analysis I came to the
conclusion that even before 96127 we would reject the transformation if
we provided large enough limits on the path traversal (large enough so
that LoopBBs contained blocks corresponding to that unpredictable
value).

In this patch I changed `getStateDefMap` to not terminate early on
finding an unpredictable value, this is because
`getPathsFromStateDefMap`, later, actually has checks to ensure that the
final list of paths only have predictable values. As a result we can now
partially thread functions like `negative6` in the tests that have some
predictable paths.

This patch does not really have any compile-time impact on the test
suite without `-dfa-early-exit-heuristic=false` (early exit is enabled
by default).

Change-Id: Ie1633b370ed4a0eda8dea52650b40f6f66ef49a3
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants