[DFAJumpThreading] Rewrite the way paths are enumerated #96127

UsmanNadeem · 2024-06-20T00:52:05Z

I tried to add a limit to the number of blocks visited in the paths() function, but even with a very high limit the transformation coverage was being reduced.

After looking at the code, it seemed that the function was trying to create paths of the form `SwitchBB...DeterminatorBB...SwitchPredecessor`. This is inefficient because many nodes in those paths (the nodes before `DeterminatorBB`) are irrelevant to the optimization. We only care about paths of the form `DeterminatorBB_Pred DeterminatorBB...SwitchBB`, which weeds out a lot of visited nodes.

In this patch I have added a hard limit to the number of nodes visited and changed the algorithm for path calculation. Primarily I am traversing the use-def chain for the PHI nodes that define the state. If we have a hole in the use-def chain (no immediate predecessors) then I call the paths() function.
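
As a rough illustration of the use-def walk described above, here is a toy sketch in plain C++. `ToyPhi`, `ToyIncoming`, `ToyPath`, and `pathsFromPhi` are hypothetical stand-ins invented for this example; they are not LLVM types and not the code from the patch. The "hole" case, where the patch falls back to paths(), is simply skipped here, and the budget is only a loose analogue of the visited-node limit.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Toy stand-ins for IR objects; none of these names exist in LLVM.
struct ToyPhi;
struct ToyIncoming {
  std::string PredBB; // predecessor block the value flows in from
  bool IsConstant;    // true when the incoming value is a constant state
  int Constant;       // the state value, meaningful only if IsConstant
  const ToyPhi *Phi;  // the defining PHI when the value is not a constant
};
struct ToyPhi {
  std::string BB; // block that holds this PHI
  std::vector<ToyIncoming> Incoming;
};
struct ToyPath {
  std::vector<std::string> Blocks; // DeterminatorBB_Pred, DeterminatorBB, ...
  std::string Determinator;
  int ExitValue;
};

// Walk the PHI use-def chain upwards from Phi. Budget caps how many PHIs are
// visited in total; OnStack holds the PHIs on the current chain.
std::vector<ToyPath> pathsFromPhi(const ToyPhi &Phi,
                                  std::vector<const ToyPhi *> &OnStack,
                                  unsigned &Budget) {
  assert(!Phi.BB.empty() && "a PHI must live in a block");
  std::vector<ToyPath> Res;
  if (Budget == 0)
    return Res;
  --Budget;
  OnStack.push_back(&Phi);
  for (const ToyIncoming &In : Phi.Incoming) {
    if (In.IsConstant) {
      // Found a determinator: the path starts at its predecessor.
      Res.push_back(ToyPath{{In.PredBB, Phi.BB}, Phi.BB, In.Constant});
      continue;
    }
    // A PHI defined somewhere other than the incoming block is a "hole" in
    // the chain; this sketch skips it instead of searching the CFG.
    if (!In.Phi || In.Phi->BB != In.PredBB)
      continue;
    bool OnCurrentChain = false;
    for (const ToyPhi *P : OnStack)
      OnCurrentChain = OnCurrentChain || P == In.Phi;
    if (OnCurrentChain)
      continue; // do not walk around a cycle
    for (ToyPath &P : pathsFromPhi(*In.Phi, OnStack, Budget)) {
      P.Blocks.push_back(Phi.BB); // extend the predecessor's path to this block
      Res.push_back(std::move(P));
    }
  }
  OnStack.pop_back();
  return Res;
}
```

For a switch PHI in `sw` fed by a constant from `case1` and by a PHI in `latch` that itself takes constants from `a` and `b`, the sketch yields three paths: `case1 sw`, `a latch sw`, and `b latch sw`. Every path starts at the predecessor of a determinator, so no blocks before the determinator are ever visited.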

I also had to change the select instruction unfolding code to insert redundant one-input PHIs, so that the use-def chain can be used when calculating the paths.
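
The effect of the redundant one-input PHI can be sketched with a toy model. `ToyPhiNode` and `unfoldSelect` below are hypothetical names made up for this illustration, values are modelled as plain strings, and only the case where the start block ends in an unconditional branch is covered. The block and PHI name suffixes follow the ones used in the diff.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Toy PHI: a block name plus (predecessor block, value name) pairs.
// This is a hypothetical model, not the LLVM PHINode class.
struct ToyPhiNode {
  std::string BB;
  std::vector<std::pair<std::string, std::string>> Incoming;
};

// Unfold "Sel = select Cond, TrueVal, FalseVal", defined in StartBB and used
// by EndPhi, for a StartBB that branched unconditionally to the end block.
// Returns the new one-input PHI that lives in the freshly created block.
ToyPhiNode unfoldSelect(ToyPhiNode &EndPhi, const std::string &StartBB,
                        const std::string &Sel, const std::string &TrueVal,
                        const std::string &FalseVal) {
  const std::string NewBB = Sel + ".si.unfold.false";
  const std::string NewPhiName = FalseVal + ".si.unfold.phi";
  // Redundant single-input PHI: it only forwards FalseVal, but it keeps the
  // state value on a PHI-to-PHI use-def chain.
  ToyPhiNode NewPhi{NewBB, {{StartBB, FalseVal}}};

  bool Rewrote = false;
  for (auto &In : EndPhi.Incoming) {
    if (In.first == StartBB && In.second == Sel) {
      In.second = TrueVal; // the true side still arrives straight from StartBB
      Rewrote = true;
    }
  }
  assert(Rewrote && "EndPhi must use the select on the edge from StartBB");
  (void)Rewrote;
  // The false side now arrives through the new block, via the new PHI.
  EndPhi.Incoming.push_back({NewBB, NewPhiName});
  return NewPhi;
}
```

After the call, every state value that reaches the end-block PHI through the new block is itself a PHI, so a walk like the one described above never has to look through a select.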

The test suite coverage with this patch (including a limit on nodes visited) is as follows:

Geomean diff:
  dfa-jump-threading.NumTransforms: +13.4%
  dfa-jump-threading.NumCloned: +34.1%
  dfa-jump-threading.NumPaths: -80.7%

Compile time effect vs baseline (pass enabled by default) is mostly positive: https://llvm-compile-time-tracker.com/compare.php?from=ad8705fda25f64dcfeb6264ac4d6bac36bee91ab&to=5a3af6ce7e852f0736f706b4a8663efad5bce6ea&stat=instructions:u

Change-Id: I0fba9e0f8aa079706f633089a8ccd4ecf57547ed


llvmbot · 2024-06-20T00:52:35Z

@llvm/pr-subscribers-llvm-transforms

Author: Usman Nadeem (UsmanNadeem)

Patch is 63.80 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/96127.diff

5 Files Affected:

(modified) llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp (+255-207)
(modified) llvm/test/Transforms/DFAJumpThreading/dfa-jump-threading-analysis.ll (+18-26)
(modified) llvm/test/Transforms/DFAJumpThreading/dfa-jump-threading-transform.ll (+46-38)
(modified) llvm/test/Transforms/DFAJumpThreading/dfa-unfold-select.ll (+133-71)
(modified) llvm/test/Transforms/DFAJumpThreading/max-path-length.ll (+13-58)

diff --git a/llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp b/llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp
index 4371b821eae63..42900642edb2c 100644
--- a/llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp
+++ b/llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp
@@ -106,6 +106,12 @@ static cl::opt<unsigned> MaxPathLength(
     cl::desc("Max number of blocks searched to find a threading path"),
     cl::Hidden, cl::init(20));
 
+static cl::opt<unsigned> MaxNumVisitiedPaths(
+    "dfa-max-num-visited-paths",
+    cl::desc(
+        "Max number of blocks visited while enumerating paths around a switch"),
+    cl::Hidden, cl::init(2000));
+
 static cl::opt<unsigned>
     MaxNumPaths("dfa-max-num-paths",
                 cl::desc("Max number of paths enumerated around a switch"),
@@ -177,24 +183,6 @@ class DFAJumpThreading {
 
 namespace {
 
-/// Create a new basic block and sink \p SIToSink into it.
-void createBasicBlockAndSinkSelectInst(
-    DomTreeUpdater *DTU, SelectInst *SI, PHINode *SIUse, SelectInst *SIToSink,
-    BasicBlock *EndBlock, StringRef NewBBName, BasicBlock **NewBlock,
-    BranchInst **NewBranch, std::vector<SelectInstToUnfold> *NewSIsToUnfold,
-    std::vector<BasicBlock *> *NewBBs) {
-  assert(SIToSink->hasOneUse());
-  assert(NewBlock);
-  assert(NewBranch);
-  *NewBlock = BasicBlock::Create(SI->getContext(), NewBBName,
-                                 EndBlock->getParent(), EndBlock);
-  NewBBs->push_back(*NewBlock);
-  *NewBranch = BranchInst::Create(EndBlock, *NewBlock);
-  SIToSink->moveBefore(*NewBranch);
-  NewSIsToUnfold->push_back(SelectInstToUnfold(SIToSink, SIUse));
-  DTU->applyUpdates({{DominatorTree::Insert, *NewBlock, EndBlock}});
-}
-
 /// Unfold the select instruction held in \p SIToUnfold by replacing it with
 /// control flow.
 ///
@@ -212,89 +200,42 @@ void unfold(DomTreeUpdater *DTU, LoopInfo *LI, SelectInstToUnfold SIToUnfold,
   BranchInst *StartBlockTerm =
       dyn_cast<BranchInst>(StartBlock->getTerminator());
 
-  assert(StartBlockTerm && StartBlockTerm->isUnconditional());
+  assert(StartBlockTerm);
   assert(SI->hasOneUse());
 
-  // These are the new basic blocks for the conditional branch.
-  // At least one will become an actual new basic block.
-  BasicBlock *TrueBlock = nullptr;
-  BasicBlock *FalseBlock = nullptr;
-  BranchInst *TrueBranch = nullptr;
-  BranchInst *FalseBranch = nullptr;
-
-  // Sink select instructions to be able to unfold them later.
-  if (SelectInst *SIOp = dyn_cast<SelectInst>(SI->getTrueValue())) {
-    createBasicBlockAndSinkSelectInst(DTU, SI, SIUse, SIOp, EndBlock,
-                                      "si.unfold.true", &TrueBlock, &TrueBranch,
-                                      NewSIsToUnfold, NewBBs);
-  }
-  if (SelectInst *SIOp = dyn_cast<SelectInst>(SI->getFalseValue())) {
-    createBasicBlockAndSinkSelectInst(DTU, SI, SIUse, SIOp, EndBlock,
-                                      "si.unfold.false", &FalseBlock,
-                                      &FalseBranch, NewSIsToUnfold, NewBBs);
-  }
-
-  // If there was nothing to sink, then arbitrarily choose the 'false' side
-  // for a new input value to the PHI.
-  if (!TrueBlock && !FalseBlock) {
-    FalseBlock = BasicBlock::Create(SI->getContext(), "si.unfold.false",
-                                    EndBlock->getParent(), EndBlock);
-    NewBBs->push_back(FalseBlock);
-    BranchInst::Create(EndBlock, FalseBlock);
-    DTU->applyUpdates({{DominatorTree::Insert, FalseBlock, EndBlock}});
-  }
-
-  // Insert the real conditional branch based on the original condition.
-  // If we did not create a new block for one of the 'true' or 'false' paths
-  // of the condition, it means that side of the branch goes to the end block
-  // directly and the path originates from the start block from the point of
-  // view of the new PHI.
-  BasicBlock *TT = EndBlock;
-  BasicBlock *FT = EndBlock;
-  if (TrueBlock && FalseBlock) {
-    // A diamond.
-    TT = TrueBlock;
-    FT = FalseBlock;
-
-    // Update the phi node of SI.
-    SIUse->addIncoming(SI->getTrueValue(), TrueBlock);
-    SIUse->addIncoming(SI->getFalseValue(), FalseBlock);
-
-    // Update any other PHI nodes in EndBlock.
-    for (PHINode &Phi : EndBlock->phis()) {
-      if (&Phi != SIUse) {
-        Value *OrigValue = Phi.getIncomingValueForBlock(StartBlock);
-        Phi.addIncoming(OrigValue, TrueBlock);
-        Phi.addIncoming(OrigValue, FalseBlock);
-      }
-
-      // Remove incoming place of original StartBlock, which comes in a indirect
-      // way (through TrueBlock and FalseBlock) now.
-      Phi.removeIncomingValue(StartBlock, /* DeletePHIIfEmpty = */ false);
-    }
-  } else {
-    BasicBlock *NewBlock = nullptr;
+  if (StartBlockTerm->isUnconditional()) {
+    // Arbitrarily choose the 'false' side for a new input value to the PHI.
+    BasicBlock *NewBlock = BasicBlock::Create(
+        SI->getContext(), Twine(SI->getName(), ".si.unfold.false"),
+        EndBlock->getParent(), EndBlock);
+    NewBBs->push_back(NewBlock);
+    BranchInst::Create(EndBlock, NewBlock);
+    DTU->applyUpdates({{DominatorTree::Insert, NewBlock, EndBlock}});
+
+    // StartBlock
+    //   |  \
+    //   |  NewBlock
+    //   |  /
+    // EndBlock
     Value *SIOp1 = SI->getTrueValue();
     Value *SIOp2 = SI->getFalseValue();
 
-    // A triangle pointing right.
-    if (!TrueBlock) {
-      NewBlock = FalseBlock;
-      FT = FalseBlock;
-    }
-    // A triangle pointing left.
-    else {
-      NewBlock = TrueBlock;
-      TT = TrueBlock;
-      std::swap(SIOp1, SIOp2);
-    }
+    PHINode *NewPhi = PHINode::Create(SIUse->getType(), 1,
+                                      Twine(SIOp2->getName(), ".si.unfold.phi"),
+                                      NewBlock->getFirstInsertionPt());
+    NewPhi->addIncoming(SIOp2, StartBlock);
+
+    if (auto *OpSi = dyn_cast<SelectInst>(SIOp1))
+      NewSIsToUnfold->push_back(SelectInstToUnfold(OpSi, SIUse));
+    if (auto *OpSi = dyn_cast<SelectInst>(SIOp2))
+      NewSIsToUnfold->push_back(SelectInstToUnfold(OpSi, NewPhi));
 
     // Update the phi node of SI.
     for (unsigned Idx = 0; Idx < SIUse->getNumIncomingValues(); ++Idx) {
       if (SIUse->getIncomingBlock(Idx) == StartBlock)
         SIUse->setIncomingValue(Idx, SIOp1);
     }
-    SIUse->addIncoming(SIOp2, NewBlock);
+    SIUse->addIncoming(NewPhi, NewBlock);
 
     // Update any other PHI nodes in EndBlock.
     for (auto II = EndBlock->begin(); PHINode *Phi = dyn_cast<PHINode>(II);
@@ -302,11 +243,87 @@ void unfold(DomTreeUpdater *DTU, LoopInfo *LI, SelectInstToUnfold SIToUnfold,
       if (Phi != SIUse)
         Phi->addIncoming(Phi->getIncomingValueForBlock(StartBlock), NewBlock);
     }
+
+    StartBlockTerm->eraseFromParent();
+
+    // Insert the real conditional branch based on the original condition.
+    BranchInst::Create(EndBlock, NewBlock, SI->getCondition(), StartBlock);
+    DTU->applyUpdates({{DominatorTree::Insert, StartBlock, EndBlock},
+                       {DominatorTree::Insert, StartBlock, NewBlock}});
+  } else {
+    BasicBlock *NewBlockT = BasicBlock::Create(
+        SI->getContext(), Twine(SI->getName(), ".si.unfold.true"),
+        EndBlock->getParent(), EndBlock);
+    BasicBlock *NewBlockF = BasicBlock::Create(
+        SI->getContext(), Twine(SI->getName(), ".si.unfold.false"),
+        EndBlock->getParent(), EndBlock);
+
+    NewBBs->push_back(NewBlockT);
+    NewBBs->push_back(NewBlockF);
+
+    // Def only has one use in EndBlock.
+    // Before transformation:
+    // StartBlock(Def)
+    //   |      \
+    // EndBlock  OtherBlock
+    //  (Use)
+    //
+    // After transformation:
+    // StartBlock(Def)
+    //   |      \
+    //   |       OtherBlock
+    // NewBlockT
+    //   |     \
+    //   |   NewBlockF
+    //   |      /
+    //   |     /
+    // EndBlock
+    //  (Use)
+    BranchInst::Create(EndBlock, NewBlockF);
+    // Insert the real conditional branch based on the original condition.
+    BranchInst::Create(EndBlock, NewBlockF, SI->getCondition(), NewBlockT);
+    DTU->applyUpdates({{DominatorTree::Insert, NewBlockT, NewBlockF},
+                       {DominatorTree::Insert, NewBlockT, EndBlock},
+                       {DominatorTree::Insert, NewBlockF, EndBlock}});
+
+    Value *TrueVal = SI->getTrueValue();
+    Value *FalseVal = SI->getFalseValue();
+
+    PHINode *NewPhiT = PHINode::Create(
+        SIUse->getType(), 1, Twine(TrueVal->getName(), ".si.unfold.phi"),
+        NewBlockT->getFirstInsertionPt());
+    PHINode *NewPhiF = PHINode::Create(
+        SIUse->getType(), 1, Twine(FalseVal->getName(), ".si.unfold.phi"),
+        NewBlockF->getFirstInsertionPt());
+    NewPhiT->addIncoming(TrueVal, StartBlock);
+    NewPhiF->addIncoming(FalseVal, NewBlockT);
+
+    if (auto *TrueSI = dyn_cast<SelectInst>(TrueVal))
+      NewSIsToUnfold->push_back(SelectInstToUnfold(TrueSI, NewPhiT));
+    if (auto *FalseSi = dyn_cast<SelectInst>(FalseVal))
+      NewSIsToUnfold->push_back(SelectInstToUnfold(FalseSi, NewPhiF));
+
+    SIUse->addIncoming(NewPhiT, NewBlockT);
+    SIUse->addIncoming(NewPhiF, NewBlockF);
+    SIUse->removeIncomingValue(StartBlock);
+
+    // Update any other PHI nodes in EndBlock.
+    for (auto II = EndBlock->begin(); PHINode *Phi = dyn_cast<PHINode>(II);
+         ++II) {
+      if (Phi != SIUse) {
+        Phi->addIncoming(Phi->getIncomingValueForBlock(StartBlock), NewBlockT);
+        Phi->addIncoming(Phi->getIncomingValueForBlock(StartBlock), NewBlockF);
+        Phi->removeIncomingValue(StartBlock);
+      }
+    }
+
+    // Update the appropriate successor of the start block to point to the new
+    // unfolded block.
+    unsigned SuccNum = StartBlockTerm->getSuccessor(1) == EndBlock ? 1 : 0;
+    StartBlockTerm->setSuccessor(SuccNum, NewBlockT);
+    DTU->applyUpdates({{DominatorTree::Delete, StartBlock, EndBlock},
+                       {DominatorTree::Insert, StartBlock, NewBlockT}});
   }
-  StartBlockTerm->eraseFromParent();
-  BranchInst::Create(TT, FT, SI->getCondition(), StartBlock);
-  DTU->applyUpdates({{DominatorTree::Insert, StartBlock, TT},
-                     {DominatorTree::Insert, StartBlock, FT}});
 
   // Preserve loop info
   if (Loop *L = LI->getLoopFor(SI->getParent())) {
@@ -372,6 +389,11 @@ struct ThreadingPath {
   /// Path is a list of basic blocks.
   const PathType &getPath() const { return Path; }
   void setPath(const PathType &NewPath) { Path = NewPath; }
+  void push_back(BasicBlock *BB) { Path.push_back(BB); }
+  void push_front(BasicBlock *BB) { Path.push_front(BB); }
+  void appendExcludingFirst(const PathType &OtherPath) {
+    Path.insert(Path.end(), OtherPath.begin() + 1, OtherPath.end());
+  }
 
   void print(raw_ostream &OS) const {
     OS << Path << " [ " << ExitVal << ", " << DBB->getName() << " ]";
@@ -530,9 +552,9 @@ struct MainSwitch {
 
 struct AllSwitchPaths {
   AllSwitchPaths(const MainSwitch *MSwitch, OptimizationRemarkEmitter *ORE,
-                 LoopInfo *LI)
+                 LoopInfo *LI, Loop *L)
       : Switch(MSwitch->getInstr()), SwitchBlock(Switch->getParent()), ORE(ORE),
-        LI(LI) {}
+        LI(LI), SwitchOuterLoop(L) {}
 
   std::vector<ThreadingPath> &getThreadingPaths() { return TPaths; }
   unsigned getNumThreadingPaths() { return TPaths.size(); }
@@ -540,10 +562,7 @@ struct AllSwitchPaths {
   BasicBlock *getSwitchBlock() { return SwitchBlock; }
 
   void run() {
-    VisitedBlocks Visited;
-    PathsType LoopPaths = paths(SwitchBlock, Visited, /* PathDepth = */ 1);
-    StateDefMap StateDef = getStateDefMap(LoopPaths);
-
+    StateDefMap StateDef = getStateDefMap();
     if (StateDef.empty()) {
       ORE->emit([&]() {
         return OptimizationRemarkMissed(DEBUG_TYPE, "SwitchNotPredictable",
@@ -553,42 +572,118 @@ struct AllSwitchPaths {
       return;
     }
 
-    for (const PathType &Path : LoopPaths) {
-      ThreadingPath TPath;
-
-      const BasicBlock *PrevBB = Path.back();
-      for (const BasicBlock *BB : Path) {
-        if (StateDef.contains(BB)) {
-          const PHINode *Phi = dyn_cast<PHINode>(StateDef[BB]);
-          assert(Phi && "Expected a state-defining instr to be a phi node.");
-
-          const Value *V = Phi->getIncomingValueForBlock(PrevBB);
-          if (const ConstantInt *C = dyn_cast<const ConstantInt>(V)) {
-            TPath.setExitValue(C);
-            TPath.setDeterminator(BB);
-            TPath.setPath(Path);
-          }
-        }
+    auto *SwitchPhi = cast<PHINode>(Switch->getOperand(0));
+    auto *SwitchPhiDefBB = SwitchPhi->getParent();
+    VisitedBlocks VB;
+    // Get paths from the determinator BBs to SwitchPhiDefBB
+    std::vector<ThreadingPath> PathsToPhiDef =
+        getPathsFromStateDefMap(StateDef, SwitchPhi, VB);
+    if (SwitchPhiDefBB == SwitchBlock) {
+      TPaths = std::move(PathsToPhiDef);
+      return;
+    }
 
-        // Switch block is the determinator, this is the final exit value.
-        if (TPath.isExitValueSet() && BB == Path.front())
-          break;
+    // Find and append paths from SwitchPhiDefBB to SwitchBlock.
+    PathsType PathsToSwitchBB =
+        paths(SwitchPhiDefBB, SwitchBlock, VB, /* PathDepth = */ 1);
+    if (PathsToSwitchBB.empty())
+      return;
 
-        PrevBB = BB;
+    std::vector<ThreadingPath> TempList;
+    for (const ThreadingPath &Path : PathsToPhiDef) {
+      for (const PathType &PathToSw : PathsToSwitchBB) {
+        ThreadingPath PathCopy(Path);
+        PathCopy.appendExcludingFirst(PathToSw);
+        TempList.push_back(PathCopy);
       }
-
-      if (TPath.isExitValueSet() && isSupported(TPath))
-        TPaths.push_back(TPath);
     }
+    TPaths = std::move(TempList);
   }
 
 private:
   // Value: an instruction that defines a switch state;
   // Key: the parent basic block of that instruction.
   typedef DenseMap<const BasicBlock *, const PHINode *> StateDefMap;
+  std::vector<ThreadingPath> getPathsFromStateDefMap(StateDefMap &StateDef,
+                                                     PHINode *Phi,
+                                                     VisitedBlocks &VB) {
+    std::vector<ThreadingPath> Res;
+    auto *PhiBB = Phi->getParent();
+    VB.insert(PhiBB);
+
+    VisitedBlocks UniqueBlocks;
+    for (auto *IncomingBB : Phi->blocks()) {
+      if (!UniqueBlocks.insert(IncomingBB).second)
+        continue;
+      if (!SwitchOuterLoop->contains(IncomingBB))
+        continue;
+
+      Value *IncomingValue = Phi->getIncomingValueForBlock(IncomingBB);
+      // We found the determinator. This is the start of our path.
+      if (auto *C = dyn_cast<ConstantInt>(IncomingValue)) {
+        // SwitchBlock is the determinator, unsupported unless its also the def.
+        if (PhiBB == SwitchBlock &&
+            SwitchBlock != cast<PHINode>(Switch->getOperand(0))->getParent())
+          continue;
+        ThreadingPath NewPath;
+        NewPath.setDeterminator(PhiBB);
+        NewPath.setExitValue(C);
+        // Don't add SwitchBlock at the start, this is handled later.
+        if (IncomingBB != SwitchBlock)
+          NewPath.push_back(IncomingBB);
+        NewPath.push_back(PhiBB);
+        Res.push_back(NewPath);
+        continue;
+      }
+      // Don't get into a cycle.
+      if (VB.contains(IncomingBB) || IncomingBB == SwitchBlock)
+        continue;
+      // Recurse up the PHI chain.
+      auto *IncomingPhi = dyn_cast<PHINode>(IncomingValue);
+      if (!IncomingPhi)
+        continue;
+      auto *IncomingPhiDefBB = IncomingPhi->getParent();
+      if (!StateDef.contains(IncomingPhiDefBB))
+        continue;
+
+      // Direct predecessor, just add to the path.
+      if (IncomingPhiDefBB == IncomingBB) {
+        std::vector<ThreadingPath> PredPaths =
+            getPathsFromStateDefMap(StateDef, IncomingPhi, VB);
+        for (ThreadingPath &Path : PredPaths) {
+          Path.push_back(PhiBB);
+          Res.push_back(std::move(Path));
+        }
+        continue;
+      }
+      // Not a direct predecessor, find intermediate paths to append to the
+      // existing path.
+      if (VB.contains(IncomingPhiDefBB))
+        continue;
 
-  PathsType paths(BasicBlock *BB, VisitedBlocks &Visited,
-                  unsigned PathDepth) const {
+      PathsType IntermediatePaths;
+      IntermediatePaths =
+          paths(IncomingPhiDefBB, IncomingBB, VB, /* PathDepth = */ 1);
+      if (IntermediatePaths.empty())
+        continue;
+
+      std::vector<ThreadingPath> PredPaths =
+          getPathsFromStateDefMap(StateDef, IncomingPhi, VB);
+      for (const ThreadingPath &Path : PredPaths) {
+        for (const PathType &IPath : IntermediatePaths) {
+          ThreadingPath NewPath(Path);
+          NewPath.appendExcludingFirst(IPath);
+          NewPath.push_back(PhiBB);
+          Res.push_back(NewPath);
+        }
+      }
+    }
+    VB.erase(PhiBB);
+    return Res;
+  }
+
+  PathsType paths(BasicBlock *BB, BasicBlock *ToBB, VisitedBlocks &Visited,
+                  unsigned PathDepth) {
     PathsType Res;
 
     // Stop exploring paths after visiting MaxPathLength blocks
@@ -603,11 +698,12 @@ struct AllSwitchPaths {
     }
 
     Visited.insert(BB);
+    if (++NumVisited > MaxNumVisitiedPaths)
+      return Res;
 
     // Stop if we have reached the BB out of loop, since its successors have no
     // impact on the DFA.
-    // TODO: Do we need to stop exploring if BB is the outer loop of the switch?
-    if (!LI->getLoopFor(BB))
+    if (!SwitchOuterLoop->contains(BB))
       return Res;
 
     // Some blocks have multiple edges to the same successor, and this set
@@ -617,9 +713,12 @@ struct AllSwitchPaths {
       if (!Successors.insert(Succ).second)
         continue;
 
-      // Found a cycle through the SwitchBlock
-      if (Succ == SwitchBlock) {
-        Res.push_back({BB});
+      // Found a cycle through the final block.
+      if (Succ == ToBB) {
+        PathType NewPath;
+        NewPath.push_back(BB);
+        NewPath.push_back(ToBB);
+        Res.push_back(NewPath);
         continue;
       }
 
@@ -627,11 +726,19 @@ struct AllSwitchPaths {
       if (Visited.contains(Succ))
         continue;
 
-      PathsType SuccPaths = paths(Succ, Visited, PathDepth + 1);
-      for (const PathType &Path : SuccPaths) {
-        PathType NewPath(Path);
-        NewPath.push_front(BB);
-        Res.push_back(NewPath);
+      auto *CurrLoop = LI->getLoopFor(BB);
+      // Unlikely to be beneficial.
+      if (Succ == CurrLoop->getHeader())
+        continue;
+      // Skip for now, revisit this condition later to see the impact on
+      // coverage and compile time.
+      if (LI->getLoopFor(Succ) != CurrLoop)
+        continue;
+
+      PathsType SuccPaths = paths(Succ, ToBB, Visited, PathDepth + 1);
+      for (PathType &Path : SuccPaths) {
+        Path.push_front(BB);
+        Res.push_back(Path);
         if (Res.size() >= MaxNumPaths) {
           return Res;
         }
@@ -648,18 +755,9 @@ struct AllSwitchPaths {
   ///
   /// Return an empty map if unpredictable values encountered inside the basic
   /// blocks of \p LoopPaths.
-  StateDefMap getStateDefMap(const PathsType &LoopPaths) const {
+  StateDefMap getStateDefMap() const {
     StateDefMap Res;
-
-    // Basic blocks belonging to any of the loops around the switch statement.
-    SmallPtrSet<BasicBlock *, 16> LoopBBs;
-    for (const PathType &Path : LoopPaths) {
-      for (BasicBlock *BB : Path)
-        LoopBBs.insert(BB);
-    }
-
     Value *FirstDef = Switch->getOperand(0);
-
     assert(isa<PHINode>(FirstDef) && "The first definition must be a phi.");
 
     SmallVector<PHINode *, 8> Stack;
@@ -674,7 +772,7 @@ struct AllSwitchPaths {
 
       for (BasicBlock *IncomingBB : CurPhi->blocks()) {
         Value *Incoming = CurPhi->getIncomingValueForBlock(IncomingBB);
-        bool IsOutsideLoops = LoopBBs.count(IncomingBB) == 0;
+        bool IsOutsideLoops = !SwitchOuterLoop->contains(IncomingBB);
         if (Incoming == FirstDef || isa<ConstantInt>(Incoming) ||
             SeenValues.contains(Incoming) || IsOutsideLoops) {
           continue;
@@ -691,67 +789,13 @@ struct AllSwitchPaths {
     return Res;
   }
 
-  /// The determinator BB should precede the switch-defining BB.
-  ///
-  /// Otherwise, it is possible that the state defined in the determinator block
-  /// defines the state for the next iteration of the loop, rather than for the
-  /// current one.
-  ///
-  /// Currently supported paths:
-...
[truncated]

XChy · 2024-06-29T09:19:57Z

I'm busy these days, but I may look into the patch next week. By the way, there is a compile-time regression in ClamAV.

UsmanNadeem · 2024-07-01T16:33:42Z

I looked at the O3 config: ClamAV has 12.2% more cloned blocks with this patch, so that might be the reason.

UsmanNadeem · 2024-07-17T07:01:14Z

Ping!

XChy

Sorry for the late reply. This looks basically good to me.

XChy · 2024-07-19T17:16:02Z

llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp

+      // Direct predecessor, just add to the path.
+      if (IncomingPhiDefBB == IncomingBB) {
+        std::vector<ThreadingPath> PredPaths =
+            getPathsFromStateDefMap(StateDef, IncomingPhi, VB);
+        for (ThreadingPath &Path : PredPaths) {
+          Path.push_back(PhiBB);
+          Res.push_back(std::move(Path));
+        }
+        continue;
+      }
+      // Not a direct predecessor, find intermediate paths to append to the
+      // existing path.


Why do we handle `IncomingPhiDefBB` separately for the direct-predecessor and non-direct-predecessor cases? Is there any performance impact from handling it generally?

I think so. Handling it separately will save some time, though I'm not sure how much.
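
To make the trade-off concrete, here is a toy sketch of the kind of bounded search the non-direct case relies on. `ToyCFG`, `ToyBlockPath`, and `ToyPathFinder` are hypothetical names for this illustration and the code is not taken from the patch; the two caps loosely mirror `MaxPathLength` (default 20) and the new visited-blocks limit (default 2000) from the diff. In the direct-predecessor case no such search is needed, because the defining block is already the incoming block.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

using ToyCFG = std::map<std::string, std::vector<std::string>>;
using ToyBlockPath = std::vector<std::string>;

// Hypothetical toy model of a bounded path search; not the patch's code.
struct ToyPathFinder {
  const ToyCFG &CFG;
  unsigned MaxPathLength; // cap on the depth of a single path
  unsigned MaxVisited;    // hard cap on blocks visited across the whole search
  unsigned NumVisited;    // running count of visited blocks
  std::set<std::string> Visited;

  // Enumerate simple paths of the form BB ... ToBB by depth-first search.
  std::vector<ToyBlockPath> paths(const std::string &BB,
                                  const std::string &ToBB, unsigned Depth) {
    std::vector<ToyBlockPath> Res;
    if (Depth > MaxPathLength || ++NumVisited > MaxVisited)
      return Res;
    Visited.insert(BB);
    // A block may have several edges to the same successor; visit it once.
    std::set<std::string> Seen;
    auto It = CFG.find(BB);
    if (It != CFG.end()) {
      for (const std::string &Succ : It->second) {
        if (!Seen.insert(Succ).second)
          continue;
        if (Succ == ToBB) {
          Res.push_back(ToyBlockPath{BB, ToBB}); // reached the target block
          continue;
        }
        if (Visited.count(Succ))
          continue; // already on the current path
        for (ToyBlockPath &P : paths(Succ, ToBB, Depth + 1)) {
          P.insert(P.begin(), BB); // prepend the current block
          Res.push_back(std::move(P));
        }
      }
    }
    Visited.erase(BB);
    return Res;
  }
};
```

On a CFG with edges `def -> a`, `def -> b`, `a -> sw`, `b -> c`, and `c -> sw`, an effectively unlimited search from `def` to `sw` visits four blocks and finds two paths, while a budget of two visited blocks returns only `def a sw`. Every block visited this way counts against the shared budget, which is one reason to skip the search whenever the predecessor is direct.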

llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp

Change-Id: I1205e80ff94ec3da400536ddbf3bedf270e7a6d0

UsmanNadeem · 2024-08-02T14:15:37Z

Ping!

XChy

LGTM.

mikaelholmen · 2024-08-22T06:11:44Z

Hi @UsmanNadeem

The following starts crashing with this patch:

opt "-passes=dfa-jump-threading" bbi-98332.ll -o /dev/null

It hits an assert like

opt: ../include/llvm/IR/Instructions.h:2679: llvm::Value *llvm::PHINode::getIncomingValueForBlock(const llvm::BasicBlock *) const: Assertion `Idx >= 0 && "Invalid basic block argument!"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.	Program arguments: build-all/bin/opt -passes=dfa-jump-threading bbi-98332.ll -o /dev/null
1.	Running pass "function(dfa-jump-threading)" on module "bbi-98332.ll"
2.	Running pass "dfa-jump-threading" on function "func_17"
 #0 0x00005608ae38fc27 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (build-all/bin/opt+0x4142c27)
 #1 0x00005608ae38d70e llvm::sys::RunSignalHandlers() (build-all/bin/opt+0x414070e)
 #2 0x00005608ae39062f SignalHandler(int) Signals.cpp:0:0
 #3 0x00007f377811ccf0 __restore_rt (/lib64/libpthread.so.0+0x12cf0)
 #4 0x00007f3775cd5acf raise (/lib64/libc.so.6+0x4eacf)
 #5 0x00007f3775ca8ea5 abort (/lib64/libc.so.6+0x21ea5)
 #6 0x00007f3775ca8d79 _nl_load_domain.cold.0 (/lib64/libc.so.6+0x21d79)
 #7 0x00007f3775cce426 (/lib64/libc.so.6+0x47426)
 #8 0x00005608afa7b065 (anonymous namespace)::DFAJumpThreading::run(llvm::Function&) DFAJumpThreading.cpp:0:0
 #9 0x00005608afa73f8c llvm::DFAJumpThreadingPass::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (build-all/bin/opt+0x5826f8c)
#10 0x00005608af70d9ed llvm::detail::PassModel<llvm::Function, llvm::DFAJumpThreadingPass, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) PassBuilderPipelines.cpp:0:0
#11 0x00005608ae598cfa llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (build-all/bin/opt+0x434bcfa)
#12 0x00005608af70e41d llvm::detail::PassModel<llvm::Function, llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) PassBuilderPipelines.cpp:0:0
#13 0x00005608ae59d891 llvm::ModuleToFunctionPassAdaptor::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (build-all/bin/opt+0x4350891)
#14 0x00005608af70793d llvm::detail::PassModel<llvm::Module, llvm::ModuleToFunctionPassAdaptor, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) PassBuilderPipelines.cpp:0:0
#15 0x00005608ae59798a llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (build-all/bin/opt+0x434a98a)
#16 0x00005608af6b118b llvm::runPassPipeline(llvm::StringRef, llvm::Module&, llvm::TargetMachine*, llvm::TargetLibraryInfoImpl*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::StringRef, llvm::ArrayRef<llvm::PassPlugin>, llvm::ArrayRef<std::function<void (llvm::PassBuilder&)>>, llvm::opt_tool::OutputKind, llvm::opt_tool::VerifierKind, bool, bool, bool, bool, bool, bool, bool) (build-all/bin/opt+0x546418b)
#17 0x00005608ae357c3d optMain (build-all/bin/opt+0x410ac3d)
#18 0x00007f3775cc1d85 __libc_start_main (/lib64/libc.so.6+0x3ad85)
#19 0x00005608ae3517ae _start (build-all/bin/opt+0x41047ae)
Abort (core dumped)

Maybe it's the dead basic block if.then449 that trips things up?

bbi-98332.ll.gz

UsmanNadeem · 2024-08-22T20:07:39Z

Thanks for the reproducer, I'll take a look!

mikaelholmen · 2024-09-02T13:46:10Z

I've seen another crash with this patch:

opt -passes=dfa-jump-threading bbi-98705.ll -o /dev/null

It results in:

opt: ../lib/Transforms/Scalar/DFAJumpThreading.cpp:332: void (anonymous namespace)::unfold(llvm::DomTreeUpdater *, llvm::LoopInfo *, (anonymous namespace)::SelectInstToUnfold, std::vector<SelectInstToUnfold> *, std::vector<BasicBlock *> *): Assertion `SI->use_empty() && "Select must be dead now"' failed.

@UsmanNadeem : how is debugging of these crashes (also mentioned in #106083) going?

bbi-98705.ll.gz

mshockwave · 2024-10-29T18:11:57Z

Just want to leave a note that this patch caused a performance regression (>2% increase in cycle count and ~0.23% increase in dynamic instruction count) in 400.perlbench for RISC-V. More specifically, after this patch DFAJumpThreading stops optimizing some cases in the hot function (i.e. `S_regmatch`).

…values

After llvm#96127 landed, mshockwave reported that the pass was no longer threading SPEC2006/perlbench. After 96127 we started bailing out in `getStateDefMap` and rejecting the transformation because one of the unpredictable values was coming from inside the loop. There was no fundamental change in that function except that we started calling `Loop->contains(IncomingBB)` instead of `LoopBBs.count(IncomingBB)`.

After some analysis I came to the conclusion that even before 96127 we would reject the transformation if we provided large enough limits on the path traversal (large enough that `LoopBBs` contained the blocks corresponding to that unpredictable value).

In this patch I changed `getStateDefMap` to not terminate early on finding an unpredictable value. This is safe because `getPathsFromStateDefMap` later checks that the final list of paths has only predictable values. As a result we can now partially thread functions like `negative6` in the tests that have some predictable paths.

This patch does not really have any compile-time impact on the test suite without `-dfa-early-exit-heuristic=false` (early exit is enabled by default).

Change-Id: Ie1633b370ed4a0eda8dea52650b40f6f66ef49a3
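
The difference between bailing out early and filtering later can be sketched with a toy model. This is only an illustration of the idea described above, under the assumption that each incoming state value is either a known constant or unpredictable. It is not the code in `getStateDefMap` or `getPathsFromStateDefMap`; `ToyIncoming`, `collectBailingEarly`, `collectAll`, and `keepPredictable` are hypothetical names.

```cpp
#include <string>
#include <vector>

// One incoming value of the state variable, as seen from some predecessor.
struct ToyIncoming {
  std::string FromBlock; // predecessor block name (illustrative only)
  bool IsConstant;       // true if the state value is a known constant
  int State;             // meaningful only when IsConstant is true
};

// Old behaviour in this model: one unpredictable value rejects everything.
std::vector<ToyIncoming>
collectBailingEarly(const std::vector<ToyIncoming> &In) {
  for (const ToyIncoming &V : In)
    if (!V.IsConstant)
      return {}; // give up on the whole switch
  return In;
}

// New behaviour in this model: collect everything, decide later.
std::vector<ToyIncoming> collectAll(const std::vector<ToyIncoming> &In) {
  return In;
}

// Later filtering step: only predictable entries become threadable paths.
std::vector<ToyIncoming> keepPredictable(const std::vector<ToyIncoming> &In) {
  std::vector<ToyIncoming> Out;
  for (const ToyIncoming &V : In)
    if (V.IsConstant)
      Out.push_back(V);
  return Out;
}
```

With one unpredictable value among three incoming values, the early-bailing collector yields nothing, while collecting everything and filtering afterwards keeps the two predictable entries. That corresponds to the partial threading mentioned for `negative6`.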

apazos · 2024-12-12T21:59:43Z

Thanks for reporting the regression @mshockwave, let us know if the new patch from Usman resolves the issue.

UsmanNadeem requested a review from XChy June 20, 2024 00:52

llvmbot added the llvm:transforms label Jun 20, 2024

dtcxzyw requested a review from nikic June 20, 2024 03:39

UsmanNadeem requested a review from efriedma-quic June 28, 2024 23:29

XChy reviewed Jul 19, 2024

Address comments

a8dad2c

Change-Id: I1205e80ff94ec3da400536ddbf3bedf270e7a6d0

UsmanNadeem requested a review from XChy July 29, 2024 16:33

XChy approved these changes Aug 8, 2024

UsmanNadeem merged commit b167ada into llvm:main Aug 10, 2024
7 checks passed

JonPsson1 mentioned this pull request Aug 26, 2024

[DFAJumpThreading] crash since b167ada #106083

Closed

UsmanNadeem mentioned this pull request Aug 28, 2024

[DFAJumpThreading] Enable the pass by default #83033

Open

mikaelholmen mentioned this pull request Sep 23, 2024

[DFAJumpThreading] Handle select unfolding when user phi is not a dir… #109511

Merged

UsmanNadeem mentioned this pull request Dec 12, 2024

[DFAJumpThreading] Don't bail early after encountering unpredictable values #119774

Merged
