Contributor

@fhahn fhahn commented Feb 23, 2025

This patch moves the logic to predicate and linearize a VPlan to a dedicated VPlan transform.
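
As a rough illustration of what "predicate and linearize" means, the sketch below takes a loop body with an if/else diamond and rewrites it as straight-line code: both arms run for every lane, and the phi at the join point becomes a per-lane select on the edge mask. This is a toy model, not VPlan code. `Lanes`, `Mask`, `scalarBody`, `predicatedBody` and the choice of `VF = 4` are all assumptions made for the example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t VF = 4; // illustrative vector width (assumption)
using Lanes = std::array<int, VF>;
using Mask = std::array<bool, VF>;

// Scalar reference with control flow: an if/else diamond joined by a phi.
int scalarBody(int A) {
  if (A > 0)
    return A * 2; // "then" block
  return -A;      // "else" block
}

// Predicated and linearized form: no branches between blocks remain.
Lanes predicatedBody(const Lanes &A) {
  Mask ThenEdge{};
  Lanes ThenVal{}, ElseVal{}, Result{};
  // Edge mask for header -> then. The header mask is all-true here, so the
  // edge mask is just the branch condition.
  for (std::size_t I = 0; I < VF; ++I)
    ThenEdge[I] = A[I] > 0;
  // Both arms execute unconditionally for all lanes (linearization).
  for (std::size_t I = 0; I < VF; ++I) {
    ThenVal[I] = A[I] * 2;
    ElseVal[I] = -A[I];
  }
  // The phi at the join block becomes a blend: select on the edge mask.
  for (std::size_t I = 0; I < VF; ++I)
    Result[I] = ThenEdge[I] ? ThenVal[I] : ElseVal[I];
  return Result;
}
```

Instructions with side effects (stores, divisions, calls) cannot simply run for all lanes; in the real transform they consume the block mask instead, which is why the masks need to stay accessible during recipe construction.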

The main logic to perform predication is ready for review, although there are a few things to note that should be improved, either directly in this PR or in the future:

  • Edge and block masks are cached in VPPredicator, but the block masks are still made available via VPRecipeBuilder, so they can be accessed during recipe construction. As a follow-up, this should be replaced by adding mask operands to all VPInstructions that need them and using those operands during recipe construction.
  • The mask caching in a map also means that this map needs updating each time a new recipe replaces a VPInstruction; this would also be handled by adding mask operands.
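
The cache-staleness issue in the second bullet can be sketched with a toy model. Replacing a value rewrites operands automatically, but a side map keyed by block still points at the old value until it is patched by hand. All types and functions below (`Value`, `User`, `replaceAllUsesWith`, `BlockMaskCache`, `updateCache`) are simplified stand-ins written for this example, not the LLVM classes of the same name.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Toy stand-ins for a value and a recipe that uses values as operands.
struct Value {
  std::string Name;
};
struct User {
  std::vector<Value *> Operands;
};

// Toy replace-all-uses: rewrites every operand slot that refers to Old.
void replaceAllUsesWith(std::vector<User *> &Users, Value *Old, Value *New) {
  for (User *U : Users)
    for (Value *&Op : U->Operands)
      if (Op == Old)
        Op = New;
}

// Side cache from a block id to its mask. It is not an operand list, so
// replaceAllUsesWith above never sees it.
using BlockMaskCache = std::unordered_map<int, Value *>;

// Manual fix-up that has to run after every replacement.
void updateCache(BlockMaskCache &Cache, Value *Old, Value *New) {
  for (auto &[Block, MaskValue] : Cache) {
    (void)Block; // only the cached mask matters here
    if (MaskValue == Old)
      MaskValue = New;
  }
}
```

If the mask were an operand of each instruction that needs it, the first function alone would keep everything consistent, which is the motivation given for adding mask operands.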

@llvmbot
Member

llvmbot commented Feb 23, 2025

@llvm/pr-subscribers-vectorizers

Author: Florian Hahn (fhahn)

Changes

This patch moves the logic to predicate and linearize a VPlan to a dedicated VPlan transform.

The main logic to perform predication is ready for review, although there are a few things to note that should be improved, either directly in this PR or in the future:

  • Edge and block masks are cached in VPRecipeBuilder, so they can be accessed during recipe construction. A better alternative may be to add mask operands to all VPInstructions that need them and use those operands during recipe construction.
  • The mask caching in a map also means that this map needs updating each time a new recipe replaces a VPInstruction; this would also be handled by adding mask operands.

This is currently still WIP: early-exit loop handling does not work yet because the exit conditions are not available in the initial VPlans. This will be fixed by #128419 and follow-ups.

All tests except those for early-exit loops are passing.


Patch is 38.23 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/128420.diff

8 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/CMakeLists.txt (+1)
  • (modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+27-259)
  • (modified) llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h (+18-27)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanHCFGBuilder.cpp (+13-11)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanHCFGBuilder.h (-12)
  • (added) llvm/lib/Transforms/Vectorize/VPlanPredicator.cpp (+274)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp (+3-2)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.h (+3)
diff --git a/llvm/lib/Transforms/Vectorize/CMakeLists.txt b/llvm/lib/Transforms/Vectorize/CMakeLists.txt
index 38670ba304e53..74ae61440327c 100644
--- a/llvm/lib/Transforms/Vectorize/CMakeLists.txt
+++ b/llvm/lib/Transforms/Vectorize/CMakeLists.txt
@@ -23,6 +23,7 @@ add_llvm_component_library(LLVMVectorize
   VPlan.cpp
   VPlanAnalysis.cpp
   VPlanHCFGBuilder.cpp
+  VPlanPredicator.cpp
   VPlanRecipes.cpp
   VPlanSLP.cpp
   VPlanTransforms.cpp
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index ced01df7b0d44..a2e20a701d612 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -8115,185 +8115,6 @@ void EpilogueVectorizerEpilogueLoop::printDebugTracesAtEnd() {
   });
 }
 
-void VPRecipeBuilder::createSwitchEdgeMasks(SwitchInst *SI) {
-  BasicBlock *Src = SI->getParent();
-  assert(!OrigLoop->isLoopExiting(Src) &&
-         all_of(successors(Src),
-                [this](BasicBlock *Succ) {
-                  return OrigLoop->getHeader() != Succ;
-                }) &&
-         "unsupported switch either exiting loop or continuing to header");
-  // Create masks where the terminator in Src is a switch. We create mask for
-  // all edges at the same time. This is more efficient, as we can create and
-  // collect compares for all cases once.
-  VPValue *Cond = getVPValueOrAddLiveIn(SI->getCondition());
-  BasicBlock *DefaultDst = SI->getDefaultDest();
-  MapVector<BasicBlock *, SmallVector<VPValue *>> Dst2Compares;
-  for (auto &C : SI->cases()) {
-    BasicBlock *Dst = C.getCaseSuccessor();
-    assert(!EdgeMaskCache.contains({Src, Dst}) && "Edge masks already created");
-    // Cases whose destination is the same as default are redundant and can be
-    // ignored - they will get there anyhow.
-    if (Dst == DefaultDst)
-      continue;
-    auto &Compares = Dst2Compares[Dst];
-    VPValue *V = getVPValueOrAddLiveIn(C.getCaseValue());
-    Compares.push_back(Builder.createICmp(CmpInst::ICMP_EQ, Cond, V));
-  }
-
-  // We need to handle 2 separate cases below for all entries in Dst2Compares,
-  // which excludes destinations matching the default destination.
-  VPValue *SrcMask = getBlockInMask(Src);
-  VPValue *DefaultMask = nullptr;
-  for (const auto &[Dst, Conds] : Dst2Compares) {
-    // 1. Dst is not the default destination. Dst is reached if any of the cases
-    // with destination == Dst are taken. Join the conditions for each case
-    // whose destination == Dst using an OR.
-    VPValue *Mask = Conds[0];
-    for (VPValue *V : ArrayRef<VPValue *>(Conds).drop_front())
-      Mask = Builder.createOr(Mask, V);
-    if (SrcMask)
-      Mask = Builder.createLogicalAnd(SrcMask, Mask);
-    EdgeMaskCache[{Src, Dst}] = Mask;
-
-    // 2. Create the mask for the default destination, which is reached if none
-    // of the cases with destination != default destination are taken. Join the
-    // conditions for each case where the destination is != Dst using an OR and
-    // negate it.
-    DefaultMask = DefaultMask ? Builder.createOr(DefaultMask, Mask) : Mask;
-  }
-
-  if (DefaultMask) {
-    DefaultMask = Builder.createNot(DefaultMask);
-    if (SrcMask)
-      DefaultMask = Builder.createLogicalAnd(SrcMask, DefaultMask);
-  }
-  EdgeMaskCache[{Src, DefaultDst}] = DefaultMask;
-}
-
-VPValue *VPRecipeBuilder::createEdgeMask(BasicBlock *Src, BasicBlock *Dst) {
-  assert(is_contained(predecessors(Dst), Src) && "Invalid edge");
-
-  // Look for cached value.
-  std::pair<BasicBlock *, BasicBlock *> Edge(Src, Dst);
-  EdgeMaskCacheTy::iterator ECEntryIt = EdgeMaskCache.find(Edge);
-  if (ECEntryIt != EdgeMaskCache.end())
-    return ECEntryIt->second;
-
-  if (auto *SI = dyn_cast<SwitchInst>(Src->getTerminator())) {
-    createSwitchEdgeMasks(SI);
-    assert(EdgeMaskCache.contains(Edge) && "Mask for Edge not created?");
-    return EdgeMaskCache[Edge];
-  }
-
-  VPValue *SrcMask = getBlockInMask(Src);
-
-  // The terminator has to be a branch inst!
-  BranchInst *BI = dyn_cast<BranchInst>(Src->getTerminator());
-  assert(BI && "Unexpected terminator found");
-  if (!BI->isConditional() || BI->getSuccessor(0) == BI->getSuccessor(1))
-    return EdgeMaskCache[Edge] = SrcMask;
-
-  // If source is an exiting block, we know the exit edge is dynamically dead
-  // in the vector loop, and thus we don't need to restrict the mask.  Avoid
-  // adding uses of an otherwise potentially dead instruction unless we are
-  // vectorizing a loop with uncountable exits. In that case, we always
-  // materialize the mask.
-  if (OrigLoop->isLoopExiting(Src) &&
-      Src != Legal->getUncountableEarlyExitingBlock())
-    return EdgeMaskCache[Edge] = SrcMask;
-
-  VPValue *EdgeMask = getVPValueOrAddLiveIn(BI->getCondition());
-  assert(EdgeMask && "No Edge Mask found for condition");
-
-  if (BI->getSuccessor(0) != Dst)
-    EdgeMask = Builder.createNot(EdgeMask, BI->getDebugLoc());
-
-  if (SrcMask) { // Otherwise block in-mask is all-one, no need to AND.
-    // The bitwise 'And' of SrcMask and EdgeMask introduces new UB if SrcMask
-    // is false and EdgeMask is poison. Avoid that by using 'LogicalAnd'
-    // instead which generates 'select i1 SrcMask, i1 EdgeMask, i1 false'.
-    EdgeMask = Builder.createLogicalAnd(SrcMask, EdgeMask, BI->getDebugLoc());
-  }
-
-  return EdgeMaskCache[Edge] = EdgeMask;
-}
-
-VPValue *VPRecipeBuilder::getEdgeMask(BasicBlock *Src, BasicBlock *Dst) const {
-  assert(is_contained(predecessors(Dst), Src) && "Invalid edge");
-
-  // Look for cached value.
-  std::pair<BasicBlock *, BasicBlock *> Edge(Src, Dst);
-  EdgeMaskCacheTy::const_iterator ECEntryIt = EdgeMaskCache.find(Edge);
-  assert(ECEntryIt != EdgeMaskCache.end() &&
-         "looking up mask for edge which has not been created");
-  return ECEntryIt->second;
-}
-
-void VPRecipeBuilder::createHeaderMask() {
-  BasicBlock *Header = OrigLoop->getHeader();
-
-  // When not folding the tail, use nullptr to model all-true mask.
-  if (!CM.foldTailByMasking()) {
-    BlockMaskCache[Header] = nullptr;
-    return;
-  }
-
-  // Introduce the early-exit compare IV <= BTC to form header block mask.
-  // This is used instead of IV < TC because TC may wrap, unlike BTC. Start by
-  // constructing the desired canonical IV in the header block as its first
-  // non-phi instructions.
-
-  VPBasicBlock *HeaderVPBB = Plan.getVectorLoopRegion()->getEntryBasicBlock();
-  auto NewInsertionPoint = HeaderVPBB->getFirstNonPhi();
-  auto *IV = new VPWidenCanonicalIVRecipe(Plan.getCanonicalIV());
-  HeaderVPBB->insert(IV, NewInsertionPoint);
-
-  VPBuilder::InsertPointGuard Guard(Builder);
-  Builder.setInsertPoint(HeaderVPBB, NewInsertionPoint);
-  VPValue *BlockMask = nullptr;
-  VPValue *BTC = Plan.getOrCreateBackedgeTakenCount();
-  BlockMask = Builder.createICmp(CmpInst::ICMP_ULE, IV, BTC);
-  BlockMaskCache[Header] = BlockMask;
-}
-
-VPValue *VPRecipeBuilder::getBlockInMask(BasicBlock *BB) const {
-  // Return the cached value.
-  BlockMaskCacheTy::const_iterator BCEntryIt = BlockMaskCache.find(BB);
-  assert(BCEntryIt != BlockMaskCache.end() &&
-         "Trying to access mask for block without one.");
-  return BCEntryIt->second;
-}
-
-void VPRecipeBuilder::createBlockInMask(BasicBlock *BB) {
-  assert(OrigLoop->contains(BB) && "Block is not a part of a loop");
-  assert(BlockMaskCache.count(BB) == 0 && "Mask for block already computed");
-  assert(OrigLoop->getHeader() != BB &&
-         "Loop header must have cached block mask");
-
-  // All-one mask is modelled as no-mask following the convention for masked
-  // load/store/gather/scatter. Initialize BlockMask to no-mask.
-  VPValue *BlockMask = nullptr;
-  // This is the block mask. We OR all unique incoming edges.
-  for (auto *Predecessor :
-       SetVector<BasicBlock *>(pred_begin(BB), pred_end(BB))) {
-    VPValue *EdgeMask = createEdgeMask(Predecessor, BB);
-    if (!EdgeMask) { // Mask of predecessor is all-one so mask of block is too.
-      BlockMaskCache[BB] = EdgeMask;
-      return;
-    }
-
-    if (!BlockMask) { // BlockMask has its initialized nullptr value.
-      BlockMask = EdgeMask;
-      continue;
-    }
-
-    BlockMask = Builder.createOr(BlockMask, EdgeMask, {});
-  }
-
-  BlockMaskCache[BB] = BlockMask;
-}
-
 VPWidenMemoryRecipe *
 VPRecipeBuilder::tryToWidenMemory(Instruction *I, ArrayRef<VPValue *> Operands,
                                   VFRange &Range) {
@@ -8318,7 +8139,7 @@ VPRecipeBuilder::tryToWidenMemory(Instruction *I, ArrayRef<VPValue *> Operands,
 
   VPValue *Mask = nullptr;
   if (Legal->isMaskRequired(I))
-    Mask = getBlockInMask(I->getParent());
+    Mask = getBlockInMask(Builder.getInsertBlock());
 
   // Determine if the pointer operand of the access is either consecutive or
   // reverse consecutive.
@@ -8437,38 +8258,6 @@ VPWidenIntOrFpInductionRecipe *VPRecipeBuilder::tryToOptimizeInductionTruncate(
   return nullptr;
 }
 
-VPBlendRecipe *VPRecipeBuilder::tryToBlend(PHINode *Phi,
-                                           ArrayRef<VPValue *> Operands) {
-  unsigned NumIncoming = Phi->getNumIncomingValues();
-
-  // We know that all PHIs in non-header blocks are converted into selects, so
-  // we don't have to worry about the insertion order and we can just use the
-  // builder. At this point we generate the predication tree. There may be
-  // duplications since this is a simple recursive scan, but future
-  // optimizations will clean it up.
-
-  // Map incoming IR BasicBlocks to incoming VPValues, for lookup below.
-  // TODO: Add operands and masks in order from the VPlan predecessors.
-  DenseMap<BasicBlock *, VPValue *> VPIncomingValues;
-  for (const auto &[Idx, Pred] : enumerate(predecessors(Phi->getParent())))
-    VPIncomingValues[Pred] = Operands[Idx];
-
-  SmallVector<VPValue *, 2> OperandsWithMask;
-  for (unsigned In = 0; In < NumIncoming; In++) {
-    BasicBlock *Pred = Phi->getIncomingBlock(In);
-    OperandsWithMask.push_back(VPIncomingValues.lookup(Pred));
-    VPValue *EdgeMask = getEdgeMask(Pred, Phi->getParent());
-    if (!EdgeMask) {
-      assert(In == 0 && "Both null and non-null edge masks found");
-      assert(all_equal(Operands) &&
-             "Distinct incoming values with one having a full mask");
-      break;
-    }
-    OperandsWithMask.push_back(EdgeMask);
-  }
-  return new VPBlendRecipe(Phi, OperandsWithMask);
-}
-
 VPSingleDefRecipe *VPRecipeBuilder::tryToWidenCall(CallInst *CI,
                                                    ArrayRef<VPValue *> Operands,
                                                    VFRange &Range) {
@@ -8544,7 +8333,7 @@ VPSingleDefRecipe *VPRecipeBuilder::tryToWidenCall(CallInst *CI,
       //      all-true mask.
       VPValue *Mask = nullptr;
       if (Legal->isMaskRequired(CI))
-        Mask = getBlockInMask(CI->getParent());
+        Mask = getBlockInMask(Builder.getInsertBlock());
       else
         Mask = Plan.getOrAddLiveIn(
             ConstantInt::getTrue(IntegerType::getInt1Ty(CI->getContext())));
@@ -8586,7 +8375,7 @@ VPWidenRecipe *VPRecipeBuilder::tryToWiden(Instruction *I,
     // div/rem operation itself.  Otherwise fall through to general handling below.
     if (CM.isPredicatedInst(I)) {
       SmallVector<VPValue *> Ops(Operands);
-      VPValue *Mask = getBlockInMask(I->getParent());
+      VPValue *Mask = getBlockInMask(Builder.getInsertBlock());
       VPValue *One =
           Plan.getOrAddLiveIn(ConstantInt::get(I->getType(), 1u, false));
       auto *SafeRHS = Builder.createSelect(Mask, Ops[1], One, I->getDebugLoc());
@@ -8668,7 +8457,7 @@ VPRecipeBuilder::tryToWidenHistogram(const HistogramInfo *HI,
   // In case of predicated execution (due to tail-folding, or conditional
   // execution, or both), pass the relevant mask.
   if (Legal->isMaskRequired(HI->Store))
-    HGramOps.push_back(getBlockInMask(HI->Store->getParent()));
+    HGramOps.push_back(getBlockInMask(Builder.getInsertBlock()));
 
   return new VPHistogramRecipe(Opcode,
                                make_range(HGramOps.begin(), HGramOps.end()),
@@ -8724,7 +8513,7 @@ VPRecipeBuilder::handleReplication(Instruction *I, ArrayRef<VPValue *> Operands,
     // added initially. Masked replicate recipes will later be placed under an
     // if-then construct to prevent side-effects. Generate recipes to compute
     // the block mask for this region.
-    BlockInMask = getBlockInMask(I->getParent());
+    BlockInMask = getBlockInMask(Builder.getInsertBlock());
   }
 
   // Note that there is some custom logic to mark some intrinsics as uniform
@@ -8857,9 +8646,8 @@ VPRecipeBase *VPRecipeBuilder::tryToCreateWidenRecipe(
   // nodes, calls and memory operations.
   VPRecipeBase *Recipe;
   if (auto *Phi = dyn_cast<PHINode>(Instr)) {
-    if (Phi->getParent() != OrigLoop->getHeader())
-      return tryToBlend(Phi, Operands);
-
+    assert(Phi->getParent() == OrigLoop->getHeader() &&
+           "Non-header phis should have been handled during predication");
     assert(Operands.size() == 2 && "Must have 2 operands for header phis");
     if ((Recipe = tryToOptimizeInductionPHI(Phi, Operands, Range)))
       return Recipe;
@@ -8964,7 +8752,7 @@ VPRecipeBuilder::tryToCreatePartialReduction(Instruction *Reduction,
             ReductionOpcode == Instruction::Sub) &&
            "Expected an ADD or SUB operation for predicated partial "
            "reductions (because the neutral element in the mask is zero)!");
-    VPValue *Mask = getBlockInMask(Reduction->getParent());
+    VPValue *Mask = getBlockInMask(Builder.getInsertBlock());
     VPValue *Zero =
         Plan.getOrAddLiveIn(ConstantInt::get(Reduction->getType(), 0));
     BinOp = Builder.createSelect(Mask, BinOp, Zero, Reduction->getDebugLoc());
@@ -9332,9 +9120,6 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
   bool HasNUW = !IVUpdateMayOverflow || Style == TailFoldingStyle::None;
   addCanonicalIVRecipes(*Plan, Legal->getWidestInductionType(), HasNUW, DL);
 
-  VPRecipeBuilder RecipeBuilder(*Plan, OrigLoop, TLI, &TTI, Legal, CM, PSE,
-                                Builder);
-
   // ---------------------------------------------------------------------------
   // Pre-construction: record ingredients whose recipes we'll need to further
   // process after constructing the initial VPlan.
@@ -9375,39 +9160,24 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
         return Legal->blockNeedsPredication(BB) || NeedsBlends;
       });
 
-  RecipeBuilder.collectScaledReductions(Range);
 
   auto *MiddleVPBB = Plan->getMiddleBlock();
 
+  VPRecipeBuilder RecipeBuilder(*Plan, OrigLoop, TLI, &TTI, Legal, CM, PSE,
+                                Builder);
+  if (NeedsMasks) {
+    VPlanTransforms::predicateAndLinearize(*Plan, CM.foldTailByMasking(),
+                                           RecipeBuilder);
+  }
+  RecipeBuilder.collectScaledReductions(Range);
+
   // Scan the body of the loop in a topological order to visit each basic block
   // after having visited its predecessor basic blocks.
   ReversePostOrderTraversal<VPBlockShallowTraversalWrapper<VPBlockBase *>> RPOT(
       HeaderVPBB);
 
   VPBasicBlock::iterator MBIP = MiddleVPBB->getFirstNonPhi();
-  VPBlockBase *PrevVPBB = nullptr;
   for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(RPOT)) {
-    // Handle VPBBs down to the latch.
-    if (VPBB == LoopRegion->getExiting()) {
-      assert(!HCFGBuilder.getIRBBForVPB(VPBB) &&
-             "the latch block shouldn't have a corresponding IRBB");
-      VPBlockUtils::connectBlocks(PrevVPBB, VPBB);
-      break;
-    }
-
-    // Create mask based on the IR BB corresponding to VPBB.
-    // TODO: Predicate directly based on VPlan.
-    Builder.setInsertPoint(VPBB, VPBB->begin());
-    if (VPBB == HeaderVPBB) {
-      Builder.setInsertPoint(VPBB, VPBB->getFirstNonPhi());
-      RecipeBuilder.createHeaderMask();
-    } else if (NeedsMasks) {
-      // FIXME: At the moment, masks need to be placed at the beginning of the
-      // block, as blends introduced for phi nodes need to use it. The created
-      // blends should be sunk after the mask recipes.
-      RecipeBuilder.createBlockInMask(HCFGBuilder.getIRBBForVPB(VPBB));
-    }
-
     // Convert input VPInstructions to widened recipes.
     for (VPRecipeBase &R : make_early_inc_range(*VPBB)) {
       auto *SingleDef = cast<VPSingleDefRecipe>(&R);
@@ -9417,7 +9187,8 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
       // latter are added above for masking.
       // FIXME: Migrate code relying on the underlying instruction from VPlan0
       // to construct recipes below to not use the underlying instruction.
-      if (isa<VPCanonicalIVPHIRecipe, VPWidenCanonicalIVRecipe>(&R) ||
+      if (isa<VPCanonicalIVPHIRecipe, VPWidenCanonicalIVRecipe, VPBlendRecipe>(
+              &R) ||
           (isa<VPInstruction>(&R) && !UnderlyingValue))
         continue;
 
@@ -9469,22 +9240,18 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
       } else {
         Builder.insert(Recipe);
       }
-      if (Recipe->getNumDefinedValues() == 1)
+      if (Recipe->getNumDefinedValues() == 1) {
         SingleDef->replaceAllUsesWith(Recipe->getVPSingleValue());
-      else
+        for (auto &[_, V] : RecipeBuilder.BlockMaskCache) {
+          if (V == SingleDef)
+            V = Recipe->getVPSingleValue();
+        }
+      } else
         assert(Recipe->getNumDefinedValues() == 0 &&
                "Unexpected multidef recipe");
       R.eraseFromParent();
     }
 
-    // Flatten the CFG in the loop. Masks for blocks have already been generated
-    // and added to recipes as needed. To do so, first disconnect VPBB from its
-    // successors. Then connect VPBB to the previously visited VPBB.
-    for (auto *Succ : to_vector(VPBB->getSuccessors()))
-      VPBlockUtils::disconnectBlocks(VPBB, Succ);
-    if (PrevVPBB)
-      VPBlockUtils::connectBlocks(PrevVPBB, VPBB);
-    PrevVPBB = VPBB;
   }
 
   assert(isa<VPRegionBlock>(Plan->getVectorLoopRegion()) &&
@@ -9783,7 +9550,7 @@ void LoopVectorizationPlanner::adjustRecipesForReductions(
       BasicBlock *BB = CurrentLinkI->getParent();
       VPValue *CondOp = nullptr;
       if (CM.blockNeedsPredicationForAnyReason(BB))
-        CondOp = RecipeBuilder.getBlockInMask(BB);
+        CondOp = RecipeBuilder.getBlockInMask(CurrentLink->getParent());
 
       auto *RedRecipe = new VPReductionRecipe(
           RdxDesc, CurrentLinkI, PreviousLink, VecOp, CondOp,
@@ -9818,7 +9585,8 @@ void LoopVectorizationPlanner::adjustRecipesForReductions(
     // different numbers of lanes. Partial reductions mask the input instead.
     if (!PhiR->isInLoop() && CM.foldTailByMasking() &&
         !isa<VPPartialReductionRecipe>(OrigExitingVPV->getDefiningRecipe())) {
-      VPValue *Cond = RecipeBuilder.getBlockInMask(OrigLoop->getHeader());
+      VPValue *Cond =
+          RecipeBuilder.getBlockInMask(VectorLoopRegion->getEntryBasicBlock());
       assert(OrigExitingVPV->getDefiningRecipe()->getParent() != LatchVPBB &&
              "reduction recipe must be defined before latch");
       Type *PhiTy = PhiR->getOperand(0)->getLiveInIRValue()->getType();
diff --git a/llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h b/llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h
index 334cfbad8bd7c..9900c4117c5f6 100644
--- a/llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h
+++ b/llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h
@@ -73,11 +73,14 @@ class VPRecipeBuilder {
   /// if-conversion currently takes place during VPlan-construction, so these
   /// caches are only used at that stage.
   using EdgeMaskCacheTy =
-      DenseMap<std::pair<BasicBlock *, BasicBlock *>, VPValue *>;
-  using BlockMaskCacheTy = DenseMap<BasicBlock *, VPValue *>;
+      DenseMap<std::pair<VPBasicBlock *, VPBasicBlock *>, VPValue *>;
+  using BlockMaskCacheTy = DenseMap<VPBasicBlock *, VPValue *>;
   EdgeMaskCacheTy EdgeMaskCache;
+
+public:
   BlockMaskCacheTy BlockMaskCache;
 
+private:
   // VPlan construction support: Hold a mapping from ingredients to
   // their recipe.
   DenseMap<Instruction *, VPRecipeBase *> Ingredient2Recipe;
@@ -114,11 +117,6 @@ class VPRecipeBuilder {
   tryToOptimizeInductionTruncate(TruncInst *I, ArrayRef<VPValue *> Operands,
                                  VFRange &Range);
 
-  /// Handle non-...
[truncated]
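
The comment in the removed `createHeaderMask` says the header mask compares `IV <= BTC` rather than `IV < TC` because the trip count may wrap while the backedge-taken count cannot. A small sketch of that wrap-around, using 8-bit counters purely to make the effect easy to see; `activeByBTC` and `activeByTC` are hypothetical helper names, not functions from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Lane is active if its induction value does not exceed the
// backedge-taken count (BTC).
bool activeByBTC(std::uint8_t IV, std::uint8_t BTC) { return IV <= BTC; }

// Alternative formulation against the trip count TC = BTC + 1.
// In 8 bits, BTC == 255 makes TC wrap to 0, so no lane would be active.
bool activeByTC(std::uint8_t IV, std::uint8_t BTC) {
  std::uint8_t TC = static_cast<std::uint8_t>(BTC + 1);
  return IV < TC;
}
```

For a loop that runs 256 times (BTC of 255), the first form keeps every lane active while the second deactivates all of them; for smaller counts the two agree.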

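The edge-mask construction for switches in the removed `createSwitchEdgeMasks` can be sketched for a single lane as follows. A non-default destination is reached if any of its cases matches. The default destination is reached if no case with a non-default destination matches. Every edge is additionally restricted by the mask of the source block. `SwitchCase` and `switchEdgeMasks` are hypothetical names and block ids are plain integers; this is a simplified model, not the code from the patch.

```cpp
#include <cassert>
#include <map>
#include <vector>

struct SwitchCase {
  int Value; // case value compared against the switch condition
  int Dst;   // id of the destination block
};

// Returns the edge mask (for one lane) of every destination of the switch.
std::map<int, bool> switchEdgeMasks(int Cond,
                                    const std::vector<SwitchCase> &Cases,
                                    int DefaultDst, bool SrcMask) {
  std::map<int, bool> Masks;
  bool AnyNonDefaultTaken = false;
  for (const SwitchCase &C : Cases) {
    // Cases that branch to the default destination are redundant.
    if (C.Dst == DefaultDst)
      continue;
    bool Hit = Cond == C.Value;
    // OR together all cases that share a destination.
    Masks[C.Dst] = Masks[C.Dst] || Hit;
    AnyNonDefaultTaken = AnyNonDefaultTaken || Hit;
  }
  // Restrict every edge by the mask of the source block.
  for (auto &Entry : Masks)
    Entry.second = SrcMask && Entry.second;
  // Default edge: taken when no non-default case matched.
  Masks[DefaultDst] = SrcMask && !AnyNonDefaultTaken;
  return Masks;
}
```

In the vectorizer the same structure is built from compare, OR, NOT and logical-AND recipes over vector masks; the logical AND (a select) is used instead of a bitwise AND so that a poison condition in an inactive lane does not introduce new undefined behavior.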
@llvmbot
Member

llvmbot commented Feb 23, 2025

@llvm/pr-subscribers-llvm-transforms

-  // we don't have to worry about the insertion order and we can just use the
-  // builder. At this point we generate the predication tree. There may be
-  // duplications since this is a simple recursive scan, but future
-  // optimizations will clean it up.
-
-  // Map incoming IR BasicBlocks to incoming VPValues, for lookup below.
-  // TODO: Add operands and masks in order from the VPlan predecessors.
-  DenseMap<BasicBlock *, VPValue *> VPIncomingValues;
-  for (const auto &[Idx, Pred] : enumerate(predecessors(Phi->getParent())))
-    VPIncomingValues[Pred] = Operands[Idx];
-
-  SmallVector<VPValue *, 2> OperandsWithMask;
-  for (unsigned In = 0; In < NumIncoming; In++) {
-    BasicBlock *Pred = Phi->getIncomingBlock(In);
-    OperandsWithMask.push_back(VPIncomingValues.lookup(Pred));
-    VPValue *EdgeMask = getEdgeMask(Pred, Phi->getParent());
-    if (!EdgeMask) {
-      assert(In == 0 && "Both null and non-null edge masks found");
-      assert(all_equal(Operands) &&
-             "Distinct incoming values with one having a full mask");
-      break;
-    }
-    OperandsWithMask.push_back(EdgeMask);
-  }
-  return new VPBlendRecipe(Phi, OperandsWithMask);
-}
-
 VPSingleDefRecipe *VPRecipeBuilder::tryToWidenCall(CallInst *CI,
                                                    ArrayRef<VPValue *> Operands,
                                                    VFRange &Range) {
@@ -8544,7 +8333,7 @@ VPSingleDefRecipe *VPRecipeBuilder::tryToWidenCall(CallInst *CI,
       //      all-true mask.
       VPValue *Mask = nullptr;
       if (Legal->isMaskRequired(CI))
-        Mask = getBlockInMask(CI->getParent());
+        Mask = getBlockInMask(Builder.getInsertBlock());
       else
         Mask = Plan.getOrAddLiveIn(
             ConstantInt::getTrue(IntegerType::getInt1Ty(CI->getContext())));
@@ -8586,7 +8375,7 @@ VPWidenRecipe *VPRecipeBuilder::tryToWiden(Instruction *I,
     // div/rem operation itself.  Otherwise fall through to general handling below.
     if (CM.isPredicatedInst(I)) {
       SmallVector<VPValue *> Ops(Operands);
-      VPValue *Mask = getBlockInMask(I->getParent());
+      VPValue *Mask = getBlockInMask(Builder.getInsertBlock());
       VPValue *One =
           Plan.getOrAddLiveIn(ConstantInt::get(I->getType(), 1u, false));
       auto *SafeRHS = Builder.createSelect(Mask, Ops[1], One, I->getDebugLoc());
@@ -8668,7 +8457,7 @@ VPRecipeBuilder::tryToWidenHistogram(const HistogramInfo *HI,
   // In case of predicated execution (due to tail-folding, or conditional
   // execution, or both), pass the relevant mask.
   if (Legal->isMaskRequired(HI->Store))
-    HGramOps.push_back(getBlockInMask(HI->Store->getParent()));
+    HGramOps.push_back(getBlockInMask(Builder.getInsertBlock()));
 
   return new VPHistogramRecipe(Opcode,
                                make_range(HGramOps.begin(), HGramOps.end()),
@@ -8724,7 +8513,7 @@ VPRecipeBuilder::handleReplication(Instruction *I, ArrayRef<VPValue *> Operands,
     // added initially. Masked replicate recipes will later be placed under an
     // if-then construct to prevent side-effects. Generate recipes to compute
     // the block mask for this region.
-    BlockInMask = getBlockInMask(I->getParent());
+    BlockInMask = getBlockInMask(Builder.getInsertBlock());
   }
 
   // Note that there is some custom logic to mark some intrinsics as uniform
@@ -8857,9 +8646,8 @@ VPRecipeBase *VPRecipeBuilder::tryToCreateWidenRecipe(
   // nodes, calls and memory operations.
   VPRecipeBase *Recipe;
   if (auto *Phi = dyn_cast<PHINode>(Instr)) {
-    if (Phi->getParent() != OrigLoop->getHeader())
-      return tryToBlend(Phi, Operands);
-
+    assert(Phi->getParent() == OrigLoop->getHeader() &&
+           "Non-header phis should have been handled during predication");
     assert(Operands.size() == 2 && "Must have 2 operands for header phis");
     if ((Recipe = tryToOptimizeInductionPHI(Phi, Operands, Range)))
       return Recipe;
@@ -8964,7 +8752,7 @@ VPRecipeBuilder::tryToCreatePartialReduction(Instruction *Reduction,
             ReductionOpcode == Instruction::Sub) &&
            "Expected an ADD or SUB operation for predicated partial "
            "reductions (because the neutral element in the mask is zero)!");
-    VPValue *Mask = getBlockInMask(Reduction->getParent());
+    VPValue *Mask = getBlockInMask(Builder.getInsertBlock());
     VPValue *Zero =
         Plan.getOrAddLiveIn(ConstantInt::get(Reduction->getType(), 0));
     BinOp = Builder.createSelect(Mask, BinOp, Zero, Reduction->getDebugLoc());
@@ -9332,9 +9120,6 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
   bool HasNUW = !IVUpdateMayOverflow || Style == TailFoldingStyle::None;
   addCanonicalIVRecipes(*Plan, Legal->getWidestInductionType(), HasNUW, DL);
 
-  VPRecipeBuilder RecipeBuilder(*Plan, OrigLoop, TLI, &TTI, Legal, CM, PSE,
-                                Builder);
-
   // ---------------------------------------------------------------------------
   // Pre-construction: record ingredients whose recipes we'll need to further
   // process after constructing the initial VPlan.
@@ -9375,39 +9160,24 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
         return Legal->blockNeedsPredication(BB) || NeedsBlends;
       });
 
-  RecipeBuilder.collectScaledReductions(Range);
 
   auto *MiddleVPBB = Plan->getMiddleBlock();
 
+  VPRecipeBuilder RecipeBuilder(*Plan, OrigLoop, TLI, &TTI, Legal, CM, PSE,
+                                Builder);
+  if (NeedsMasks) {
+    VPlanTransforms::predicateAndLinearize(*Plan, CM.foldTailByMasking(),
+                                           RecipeBuilder);
+  }
+  RecipeBuilder.collectScaledReductions(Range);
+
   // Scan the body of the loop in a topological order to visit each basic block
   // after having visited its predecessor basic blocks.
   ReversePostOrderTraversal<VPBlockShallowTraversalWrapper<VPBlockBase *>> RPOT(
       HeaderVPBB);
 
   VPBasicBlock::iterator MBIP = MiddleVPBB->getFirstNonPhi();
-  VPBlockBase *PrevVPBB = nullptr;
   for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(RPOT)) {
-    // Handle VPBBs down to the latch.
-    if (VPBB == LoopRegion->getExiting()) {
-      assert(!HCFGBuilder.getIRBBForVPB(VPBB) &&
-             "the latch block shouldn't have a corresponding IRBB");
-      VPBlockUtils::connectBlocks(PrevVPBB, VPBB);
-      break;
-    }
-
-    // Create mask based on the IR BB corresponding to VPBB.
-    // TODO: Predicate directly based on VPlan.
-    Builder.setInsertPoint(VPBB, VPBB->begin());
-    if (VPBB == HeaderVPBB) {
-      Builder.setInsertPoint(VPBB, VPBB->getFirstNonPhi());
-      RecipeBuilder.createHeaderMask();
-    } else if (NeedsMasks) {
-      // FIXME: At the moment, masks need to be placed at the beginning of the
-      // block, as blends introduced for phi nodes need to use it. The created
-      // blends should be sunk after the mask recipes.
-      RecipeBuilder.createBlockInMask(HCFGBuilder.getIRBBForVPB(VPBB));
-    }
-
     // Convert input VPInstructions to widened recipes.
     for (VPRecipeBase &R : make_early_inc_range(*VPBB)) {
       auto *SingleDef = cast<VPSingleDefRecipe>(&R);
@@ -9417,7 +9187,8 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
       // latter are added above for masking.
       // FIXME: Migrate code relying on the underlying instruction from VPlan0
       // to construct recipes below to not use the underlying instruction.
-      if (isa<VPCanonicalIVPHIRecipe, VPWidenCanonicalIVRecipe>(&R) ||
+      if (isa<VPCanonicalIVPHIRecipe, VPWidenCanonicalIVRecipe, VPBlendRecipe>(
+              &R) ||
           (isa<VPInstruction>(&R) && !UnderlyingValue))
         continue;
 
@@ -9469,22 +9240,18 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
       } else {
         Builder.insert(Recipe);
       }
-      if (Recipe->getNumDefinedValues() == 1)
+      if (Recipe->getNumDefinedValues() == 1) {
         SingleDef->replaceAllUsesWith(Recipe->getVPSingleValue());
-      else
+        for (auto &[_, V] : RecipeBuilder.BlockMaskCache) {
+          if (V == SingleDef)
+            V = Recipe->getVPSingleValue();
+        }
+      } else
         assert(Recipe->getNumDefinedValues() == 0 &&
                "Unexpected multidef recipe");
       R.eraseFromParent();
     }
 
-    // Flatten the CFG in the loop. Masks for blocks have already been generated
-    // and added to recipes as needed. To do so, first disconnect VPBB from its
-    // successors. Then connect VPBB to the previously visited VPBB.
-    for (auto *Succ : to_vector(VPBB->getSuccessors()))
-      VPBlockUtils::disconnectBlocks(VPBB, Succ);
-    if (PrevVPBB)
-      VPBlockUtils::connectBlocks(PrevVPBB, VPBB);
-    PrevVPBB = VPBB;
   }
 
   assert(isa<VPRegionBlock>(Plan->getVectorLoopRegion()) &&
@@ -9783,7 +9550,7 @@ void LoopVectorizationPlanner::adjustRecipesForReductions(
       BasicBlock *BB = CurrentLinkI->getParent();
       VPValue *CondOp = nullptr;
       if (CM.blockNeedsPredicationForAnyReason(BB))
-        CondOp = RecipeBuilder.getBlockInMask(BB);
+        CondOp = RecipeBuilder.getBlockInMask(CurrentLink->getParent());
 
       auto *RedRecipe = new VPReductionRecipe(
           RdxDesc, CurrentLinkI, PreviousLink, VecOp, CondOp,
@@ -9818,7 +9585,8 @@ void LoopVectorizationPlanner::adjustRecipesForReductions(
     // different numbers of lanes. Partial reductions mask the input instead.
     if (!PhiR->isInLoop() && CM.foldTailByMasking() &&
         !isa<VPPartialReductionRecipe>(OrigExitingVPV->getDefiningRecipe())) {
-      VPValue *Cond = RecipeBuilder.getBlockInMask(OrigLoop->getHeader());
+      VPValue *Cond =
+          RecipeBuilder.getBlockInMask(VectorLoopRegion->getEntryBasicBlock());
       assert(OrigExitingVPV->getDefiningRecipe()->getParent() != LatchVPBB &&
              "reduction recipe must be defined before latch");
       Type *PhiTy = PhiR->getOperand(0)->getLiveInIRValue()->getType();
diff --git a/llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h b/llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h
index 334cfbad8bd7c..9900c4117c5f6 100644
--- a/llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h
+++ b/llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h
@@ -73,11 +73,14 @@ class VPRecipeBuilder {
   /// if-conversion currently takes place during VPlan-construction, so these
   /// caches are only used at that stage.
   using EdgeMaskCacheTy =
-      DenseMap<std::pair<BasicBlock *, BasicBlock *>, VPValue *>;
-  using BlockMaskCacheTy = DenseMap<BasicBlock *, VPValue *>;
+      DenseMap<std::pair<VPBasicBlock *, VPBasicBlock *>, VPValue *>;
+  using BlockMaskCacheTy = DenseMap<VPBasicBlock *, VPValue *>;
   EdgeMaskCacheTy EdgeMaskCache;
+
+public:
   BlockMaskCacheTy BlockMaskCache;
 
+private:
   // VPlan construction support: Hold a mapping from ingredients to
   // their recipe.
   DenseMap<Instruction *, VPRecipeBase *> Ingredient2Recipe;
@@ -114,11 +117,6 @@ class VPRecipeBuilder {
   tryToOptimizeInductionTruncate(TruncInst *I, ArrayRef<VPValue *> Operands,
                                  VFRange &Range);
 
-  /// Handle non-...
[truncated]

github-actions bot commented Feb 23, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@fhahn
Contributor Author

fhahn commented Mar 30, 2025

Still WIP, but early-exits are now handled properly as well, by retaining exit branches during initial construction.

This needs to be split up, which I'll start once #129402 lands

@fhahn fhahn force-pushed the vplan-predication branch from 915b55b to a06af46 Compare April 5, 2025 13:19
@fhahn fhahn force-pushed the vplan-predication branch from a06af46 to 7f61860 Compare April 28, 2025 12:35
fhahn added a commit to fhahn/llvm-project that referenced this pull request Apr 28, 2025
Update initial VPlan construction to include exit conditions and
edges.

For now, all early exits are disconnected before forming the regions,
but a follow-up will update uncountable exit handling to also happen
here. This is required to enable VPlan predication and remove the
dependence on any IR BBs (llvm#128420).

This includes updates in a few places to use
replaceSuccessor/replacePredecessor to preserve the order of predecessors
and successors, to reduce the need of fixing up phi operand orderings.
This unfortunately required making them public, not sure if there's a
fhahn added a commit to fhahn/llvm-project that referenced this pull request May 3, 2025
fhahn added a commit to fhahn/llvm-project that referenced this pull request May 3, 2025
Move early-exit handling up front to original VPlan construction, before
introducing early exits.

This builds on llvm#137709, which
adds exiting edges to the original VPlan, instead of adding exit blocks
later.

This retains the exit conditions early, and means we can handle early
exits before forming regions, without the reliance on VPRecipeBuilder.

Once we retain all exits initially, handling early exits before region
construction ensures the regions are valid; otherwise we would leave
edges exiting the region from elsewhere than the latch.

Removing the reliance on VPRecipeBuilder removes the dependence on
mapping IR BBs to VPBBs and unblocks predication as VPlan transform:
llvm#128420.

Depends on llvm#137709.
fhahn added a commit to fhahn/llvm-project that referenced this pull request May 5, 2025
fhahn added a commit to fhahn/llvm-project that referenced this pull request May 6, 2025
fhahn added a commit that referenced this pull request May 8, 2025
…7709)

Update initial VPlan construction to include exit conditions and edges.

The loop region is now first constructed without entry/exiting. Those
are set after inserting the region in the CFG, to preserve the original
predecessor/successor order of blocks.

For now, all early exits are disconnected before forming the regions,
but a follow-up will update uncountable exit handling to also happen
here. This is required to enable VPlan predication and remove the
dependence on any IR BBs
(#128420).

PR: #137709
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request May 8, 2025
…(NFC). (#137709)

Update initial VPlan construction to include exit conditions and edges.

The loop region is now first constructed without entry/exiting. Those
are set after inserting the region in the CFG, to preserve the original
predecessor/successor order of blocks.

For now, all early exits are disconnected before forming the regions,
but a follow-up will update uncountable exit handling to also happen
here. This is required to enable VPlan predication and remove the
dependence on any IR BBs
(llvm/llvm-project#128420).

PR: llvm/llvm-project#137709
fhahn added a commit to fhahn/llvm-project that referenced this pull request May 8, 2025
@fhahn fhahn force-pushed the vplan-predication branch 2 times, most recently from fcfde33 to 4129042 Compare May 10, 2025 11:47
fhahn added a commit that referenced this pull request May 10, 2025
This allows migrating some more code to be based on VPBBs in
VPRecipeBuilder, in preparation for
#128420.
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request May 10, 2025
This allows migrating some more code to be based on VPBBs in
VPRecipeBuilder, in preparation for
llvm/llvm-project#128420.
fhahn added a commit that referenced this pull request May 11, 2025
Update recipe construction to use VPBBs to look up masks, in preparation
for #128420.
@fhahn
Contributor Author

fhahn commented May 18, 2025

May be worth reviewing the "native" VPlanPredicator logic introduced in https://reviews.llvm.org/D53349 and removed in https://reviews.llvm.org/D123017.

It might be good as a follow-up to potentially improve the predication implementation, once we have completed the NFC move and the transition? Although the original VPlanPredicator may need more work, as it was not enabled by default even in the native path and was only tested via C++ unit tests.

Collaborator

@ayalz ayalz left a comment

This LGTM, thanks!
Raised several suggestions, can also be addressed as follow-up.

// to remove the need to keep a map of masks beyond the predication
// transform.
RecipeBuilder.updateBlockMaskCache(Old2New);
for (const auto &[Old, New] : Old2New)
Collaborator

Suggested change
for (const auto &[Old, New] : Old2New)
for (const auto &[Old, _] : Old2New)

?

Contributor Author

Done thanks

void updateBlockMaskCache(const DenseMap<VPValue *, VPValue *> &Old2New) {
for (auto &[_, V] : BlockMaskCache) {
if (auto *New = Old2New.lookup(V)) {
V->replaceAllUsesWith(New);
Collaborator

nit: worth removing V from Old2New now?

Contributor Author

Cannot be done for now, as Old2New is used to erase old recipes after updateBlockMaskCache
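
The exchange above concerns remapping cached block masks after recipes have been replaced, while keeping the replacement map alive for the later erase step. A minimal standard-library sketch of that pattern follows. `Value`, `BlockMaskCache`, `ReplacementMap` and `updateMaskCache` are hypothetical stand-ins rather than the LLVM types, and the `replaceAllUsesWith` step from the snippet under review is deliberately left out.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a VPValue; not the LLVM class.
struct Value {
  std::string Name;
};

// Block name -> cached mask. A null mask models the all-true mask.
using BlockMaskCache = std::unordered_map<std::string, Value *>;
// Old value -> value that replaced it.
using ReplacementMap = std::unordered_map<Value *, Value *>;

// Redirect every cached mask that has a replacement to that replacement.
// Old2New is taken by const reference and left untouched, so the caller can
// still walk it afterwards, for example to erase the old values.
inline void updateMaskCache(BlockMaskCache &Cache,
                            const ReplacementMap &Old2New) {
  for (auto &Entry : Cache) {
    auto It = Old2New.find(Entry.second);
    if (It == Old2New.end())
      continue; // Mask was not replaced (or is the null all-true mask).
    assert(It->second && "replacement must not be null");
    Entry.second = It->second;
  }
}
```

Because the map is only read here, erasing entries from it inside the update loop would break any later pass that relies on it being complete, which seems to be the constraint described in the reply.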

@@ -66,8 +66,7 @@ class PlainCFGBuilder {
: TheLoop(Lp), LI(LI), Plan(std::make_unique<VPlan>(Lp)) {}

/// Build plain CFG for TheLoop and connects it to Plan's entry.
Collaborator

Suggested change
/// Build plain CFG for TheLoop and connects it to Plan's entry.
/// Build plain CFG for TheLoop and connect it to Plan's entry.

Contributor Author

Updated thanks.

@@ -9488,7 +9267,8 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range,
// latter are added above for masking.
Collaborator

Follow-up: have this stage take care of widening original scalar recipes, including canonical IV, blend, and masking recipes (underlying-less VPInstructions)?

});

// ---------------------------------------------------------------------------
// Construct recipes for the instructions in the loop
Collaborator

Suggested change
// Construct recipes for the instructions in the loop
// Construct wide recipes and apply predication for original scalar VPInstructions in the loop.

?

Follow-up: outline this into a VPlanTransform?

Contributor Author

Yep plan to do so, thanks

}

VPValue *VPPredicator::createBlockInMask(VPBasicBlock *VPBB) {
Builder.setInsertPoint(VPBB, VPBB->begin());
Collaborator

Perhaps better to

Suggested change
Builder.setInsertPoint(VPBB, VPBB->begin());
Builder.setInsertPoint(VPBB, VPBB->getFirstNonPhi());

as this keeps phi's in order and allows subsequent traversals of phis() to convert them into blends?

Contributor Author

It needs to stay as-is for now, as blends need masks that have been created earlier. Will check and adjust separately.

Comment on lines 243 to 247
SmallVector<VPWidenPHIRecipe *> Phis;
for (VPRecipeBase &R : VPBB->phis())
Phis.push_back(cast<VPWidenPHIRecipe>(&R));

Predicator.createBlockInMask(VPBB);
Collaborator

Suggested change
SmallVector<VPWidenPHIRecipe *> Phis;
for (VPRecipeBase &R : VPBB->phis())
Phis.push_back(cast<VPWidenPHIRecipe>(&R));
Predicator.createBlockInMask(VPBB);
Predicator.createBlockInMask(VPBB);
SmallVector<VPWidenPHIRecipe *> Phis;
for (VPRecipeBase &R : VPBB->phis())
Phis.push_back(cast<VPWidenPHIRecipe>(&R));

seems a bit more consistent as Phis are part of the "PhiToBlends" below while createBlockInMask() are part of "introducingBlockMasks" started above with header mask; provided createBlockInMask sets its insert point after all phi's.
Would make_early_inc_range suffice instead of copying into a SmallVector?

Contributor Author

Yep, adjusted to insert blends using Builder.insert, removing the need for a vector.
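
For readers unfamiliar with the early-increment idiom raised in this thread: `llvm::make_early_inc_range` advances the iterator before the loop body runs, so the body may erase the current element. The sketch below illustrates the same idea with `std::list` only. `forEachEarlyInc` is a hypothetical helper written for this note and is not part of LLVM or of the patch.

```cpp
#include <cassert>
#include <list>

// Visit every element of L, allowing Visit to erase the element it is
// handed. Erasing other elements is not supported by this sketch.
template <typename T, typename Fn>
void forEachEarlyInc(std::list<T> &L, Fn Visit) {
  for (auto It = L.begin(), End = L.end(); It != End;) {
    auto Cur = It++; // Step first, so erasing *Cur cannot invalidate It.
    Visit(L, Cur);
  }
}
```

A call such as `forEachEarlyInc(Recipes, [](auto &L, auto It) { if (*It % 2 == 0) L.erase(It); });` removes the even entries in a single pass without first copying the elements into a separate vector.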

// Linearize the blocks of the loop into one serial chain.
VPBlockBase *PrevVPBB = nullptr;
for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(RPOT)) {
// Handle VPBBs down to the latch.
Collaborator

This "Handle VPBBs down to latch" early-break is needed when traversing CFG to stop RPOT from going out of the loop. Is it still needed here where RPOT traverses the region, shallowly? If so, is it needed in the createBlockMasks/convertPhisToBlends loop above too?

Contributor Author

Removed, thanks
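
The linearization step discussed in this thread can be pictured with a small stand-alone model: given the blocks in reverse post-order, drop each block's outgoing edges and chain the blocks in visiting order. `Block` and `linearize` are hypothetical names for this sketch, not the VPlan classes, and the model ignores predecessors lists, regions and the special treatment of the latch, so the last block simply ends up with no successors.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a VPBasicBlock: only successor edges are modelled.
struct Block {
  std::vector<Block *> Succs;
};

// Replace all edges among the given blocks with one serial chain that
// follows the order of RPO (assumed to be a reverse post-order traversal).
inline void linearize(const std::vector<Block *> &RPO) {
  Block *Prev = nullptr;
  for (Block *B : RPO) {
    assert(B && "null block in traversal order");
    B->Succs.clear();           // Disconnect B from its old successors.
    if (Prev)
      Prev->Succs.push_back(B); // Connect the previously visited block to B.
    Prev = B;
  }
}
```

For a diamond `A -> {B, C} -> D` visited as `A, B, C, D`, the result is the chain `A -> B -> C -> D`; correctness of the flattened code then depends on the block masks computed beforehand.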

@@ -224,6 +222,16 @@ struct VPlanTransforms {
/// candidates.
static void narrowInterleaveGroups(VPlan &Plan, ElementCount VF,
unsigned VectorRegWidth);

/// Predicate and linearize the control-flow in the only loop region of
/// \p Plan. If \p FoldTail is true, also create a mask guarding the loop
Collaborator

Suggested change
/// \p Plan. If \p FoldTail is true, also create a mask guarding the loop
/// \p Plan. If \p FoldTail is true, create a mask guarding the loop

Contributor Author

done thanks

/// Predicate and linearize the control-flow in the only loop region of
/// \p Plan. If \p FoldTail is true, also create a mask guarding the loop
/// header, otherwise use all-true for the header mask. Masks for blocks are
/// added to \p BlockMaskCache, which in turn will temporarily be used later
Collaborator

Suggested change
/// added to \p BlockMaskCache, which in turn will temporarily be used later
/// added to \p BlockMaskCache in order to be used later
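
As a rough illustration of what the block masks mentioned in this doc comment represent, the sketch below models masks as 8-lane bit sets. It follows the convention visible in the removed `VPRecipeBuilder` code: an edge mask is the source block's mask ANDed with the branch condition, a block-in mask is the OR of the incoming edge masks, and an absent mask stands for all-true. `Mask`, `edgeMask` and `blockInMask` are hypothetical names for this note; the real transform builds VPlan recipes rather than computing bits.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Lane mask of a hypothetical 8-lane vector. std::nullopt models the
// all-true mask, mirroring the nullptr convention in the patch.
using Mask = std::optional<std::uint8_t>;

// Edge mask: lanes active in the source block that also take the edge.
// An absent condition models an unconditional branch.
inline Mask edgeMask(Mask SrcMask, Mask Cond) {
  if (!Cond)
    return SrcMask;
  if (!SrcMask)
    return Cond;
  return static_cast<std::uint8_t>(*SrcMask & *Cond);
}

// Block-in mask: OR of all incoming edge masks. If any incoming edge is
// all-true, the block mask is all-true as well.
inline Mask blockInMask(const std::vector<Mask> &EdgeMasks) {
  Mask Result; // No edge combined yet.
  for (const Mask &E : EdgeMasks) {
    if (!E)
      return std::nullopt;
    Result = Result ? static_cast<std::uint8_t>(*Result | *E) : *E;
  }
  return Result;
}
```

With an all-true header and a condition that is true in the low four lanes, the then-edge mask is `0b00001111` and the else-edge mask is `0b11110000`; ORing both at the join block yields all eight lanes again.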

@ayalz
Collaborator

ayalz commented May 20, 2025

May be worth reviewing the "native" VPlanPredicator logic introduced in https://reviews.llvm.org/D53349 and removed in https://reviews.llvm.org/D123017.

It might be good as a follow-up to potentially improve the predication implementation, once we have completed the NFC move and the transition? Although the original VPlanPredicator may need more work, as it was not enabled by default even in the native path and was only tested via C++ unit tests.

Sure, just noting that this revives an older VPlanPredicator.cpp (along with its log?), which could offer some directions for improvements and/or extensions.

@fhahn fhahn merged commit b263c08 into llvm:main May 21, 2025
11 checks passed
@fhahn fhahn deleted the vplan-predication branch May 21, 2025 14:47
@llvm-ci
Collaborator

llvm-ci commented May 21, 2025

LLVM Buildbot has detected a new failure on builder openmp-offload-amdgpu-runtime-2 running on rocm-worker-hw-02 while building llvm at step 6 "test-openmp".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/10/builds/5775

Here is the relevant piece of the build log for the reference
Step 6 (test-openmp) failure: test (failure)
******************** TEST 'libarcher :: races/lock-unrelated.c' FAILED ********************
Exit Code: 1

Command Output (stdout):
--
# RUN: at line 13
/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/./bin/clang -fopenmp  -gdwarf-4 -O1 -fsanitize=thread  -I /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests -I /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/runtime/src   /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests/races/lock-unrelated.c -o /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/tools/archer/tests/races/Output/lock-unrelated.c.tmp -latomic && env TSAN_OPTIONS='ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1' /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests/deflake.bash /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/tools/archer/tests/races/Output/lock-unrelated.c.tmp 2>&1 | tee /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/tools/archer/tests/races/Output/lock-unrelated.c.tmp.log | /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/./bin/FileCheck /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests/races/lock-unrelated.c
# executed command: /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/./bin/clang -fopenmp -gdwarf-4 -O1 -fsanitize=thread -I /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests -I /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/runtime/src /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests/races/lock-unrelated.c -o /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/tools/archer/tests/races/Output/lock-unrelated.c.tmp -latomic
# note: command had no output on stdout or stderr
# executed command: env TSAN_OPTIONS=ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1 /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests/deflake.bash /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/tools/archer/tests/races/Output/lock-unrelated.c.tmp
# note: command had no output on stdout or stderr
# executed command: tee /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/tools/archer/tests/races/Output/lock-unrelated.c.tmp.log
# note: command had no output on stdout or stderr
# executed command: /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/./bin/FileCheck /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests/races/lock-unrelated.c
# note: command had no output on stdout or stderr
# RUN: at line 14
/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/./bin/clang -fopenmp  -gdwarf-4 -O1 -fsanitize=thread  -I /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests -I /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/runtime/src   /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests/races/lock-unrelated.c -o /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/tools/archer/tests/races/Output/lock-unrelated.c.tmp -latomic && env ARCHER_OPTIONS="ignore_serial=1 report_data_leak=1" env TSAN_OPTIONS='ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1' /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests/deflake.bash /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/tools/archer/tests/races/Output/lock-unrelated.c.tmp 2>&1 | tee /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/tools/archer/tests/races/Output/lock-unrelated.c.tmp.log | /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/./bin/FileCheck /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests/races/lock-unrelated.c
# executed command: /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/./bin/clang -fopenmp -gdwarf-4 -O1 -fsanitize=thread -I /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests -I /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/runtime/src /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests/races/lock-unrelated.c -o /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/tools/archer/tests/races/Output/lock-unrelated.c.tmp -latomic
# note: command had no output on stdout or stderr
# executed command: env 'ARCHER_OPTIONS=ignore_serial=1 report_data_leak=1' env TSAN_OPTIONS=ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1 /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests/deflake.bash /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/tools/archer/tests/races/Output/lock-unrelated.c.tmp
# note: command had no output on stdout or stderr
# executed command: tee /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/tools/archer/tests/races/Output/lock-unrelated.c.tmp.log
# note: command had no output on stdout or stderr
# executed command: /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/./bin/FileCheck /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests/races/lock-unrelated.c
# .---command stderr------------
# | /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests/races/lock-unrelated.c:47:11: error: CHECK: expected string not found in input
# | // CHECK: ThreadSanitizer: reported {{[1-7]}} warnings
# |           ^
# | <stdin>:26:5: note: scanning from here
# | DONE
# |     ^
# | <stdin>:27:1: note: possible intended match here
# | ThreadSanitizer: thread T4 finished with ignores enabled, created at:
# | ^
# | 
# | Input file: <stdin>
# | Check file: /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests/races/lock-unrelated.c
# | 
# | -dump-input=help explains the following input dump.
# | 
# | Input was:
# | <<<<<<
# |             .
# |             .
# |             .
# |            21:  #0 pthread_create /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp:1045:3 (lock-unrelated.c.tmp+0xa2c2a) 
# |            22:  #1 __kmp_create_worker z_Linux_util.cpp (libomp.so+0xcac82) 
# |            23:  
# |            24: SUMMARY: ThreadSanitizer: data race /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests/races/lock-unrelated.c:31:8 in main.omp_outlined_debug__ 
# |            25: ================== 
...

llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request May 21, 2025
This patch moves the logic to predicate and linearize a VPlan to a
dedicated VPlan transform. It mostly ports the existing logic directly.

There are a number of follow-ups planned in the near future to
further improve on the implementation:
* Edge and block masks are cached in VPPredicator, but the block masks
are still made available to VPRecipeBuilder, so they can be accessed
during recipe construction. As a follow-up, this should be replaced by
adding mask operands to all VPInstructions that need them and using those
during recipe construction.
* The mask caching in a map also means that this map needs updating each
time a new recipe replaces a VPInstruction; this would also be handled
by adding mask operands.

PR: llvm/llvm-project#128420
@llvm-ci
Collaborator

llvm-ci commented May 21, 2025

LLVM Buildbot has detected a new failure on builder llvm-nvptx64-nvidia-ubuntu running on as-builder-7 while building llvm at step 2 "checkout".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/160/builds/17815

Here is the relevant piece of the build log for the reference
Step 2 (checkout) failure: update (failure)
...
Resolving deltas:  58% (92/157)
Resolving deltas:  59% (93/157)
Resolving deltas:  60% (95/157)
Resolving deltas:  61% (96/157)
Resolving deltas:  62% (98/157)
Resolving deltas:  63% (99/157)
Resolving deltas:  64% (101/157)
Resolving deltas:  65% (103/157)
Resolving deltas:  66% (104/157)
Resolving deltas:  67% (106/157)
Resolving deltas:  68% (107/157)
Resolving deltas:  69% (109/157)
Resolving deltas:  70% (110/157)
Resolving deltas:  71% (112/157)
Resolving deltas:  72% (114/157)
Resolving deltas:  73% (115/157)
Resolving deltas:  74% (117/157)
Resolving deltas:  75% (118/157)
Resolving deltas:  76% (120/157)
Resolving deltas:  77% (121/157)
Resolving deltas:  78% (123/157)
Resolving deltas:  79% (125/157)
Resolving deltas:  80% (126/157)
Resolving deltas:  81% (128/157)
Resolving deltas:  82% (129/157)
Resolving deltas:  83% (131/157)
Resolving deltas:  84% (132/157)
Resolving deltas:  85% (134/157)
Resolving deltas:  86% (136/157)
Resolving deltas:  87% (137/157)
Resolving deltas:  88% (139/157)
Resolving deltas:  89% (140/157)
Resolving deltas:  90% (142/157)
Resolving deltas:  91% (143/157)
Resolving deltas:  92% (145/157)
Resolving deltas:  93% (147/157)
Resolving deltas:  94% (148/157)
Resolving deltas:  95% (150/157)
Resolving deltas:  96% (151/157)
Resolving deltas:  97% (153/157)
Resolving deltas:  98% (154/157)
Resolving deltas:  99% (156/157)
Resolving deltas: 100% (157/157)
Resolving deltas: 100% (157/157), completed with 123 local objects.
From https://github.com/llvm/llvm-project
 * branch                      main       -> FETCH_HEAD
Auto packing the repository in background for optimum performance.
See "git help gc" for manual housekeeping.
fatal: sha1 file '/home/buildbot/worker/as-builder-7/ramdisk/llvm-nvptx64-nvidia-ubuntu/llvm-project/.git/index.lock' write error. Out of diskspace
fatal: sha1 file '/home/buildbot/worker/as-builder-7/ramdisk/llvm-nvptx64-nvidia-ubuntu/llvm-project/.git/index.lock' write error. Out of diskspace

@llvm-ci
Collaborator

llvm-ci commented May 21, 2025

LLVM Buildbot has detected a new failure on builder llvm-nvptx-nvidia-ubuntu running on as-builder-7 while building llvm at step 2 "checkout".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/180/builds/17958

Here is the relevant piece of the build log for the reference
Step 2 (checkout) failure: update (failure)
...
Resolving deltas:  58% (92/158)
Resolving deltas:  59% (94/158)
Resolving deltas:  60% (95/158)
Resolving deltas:  61% (97/158)
Resolving deltas:  62% (98/158)
Resolving deltas:  63% (100/158)
Resolving deltas:  64% (102/158)
Resolving deltas:  65% (103/158)
Resolving deltas:  66% (105/158)
Resolving deltas:  67% (106/158)
Resolving deltas:  68% (108/158)
Resolving deltas:  69% (110/158)
Resolving deltas:  70% (111/158)
Resolving deltas:  71% (113/158)
Resolving deltas:  72% (114/158)
Resolving deltas:  73% (116/158)
Resolving deltas:  74% (117/158)
Resolving deltas:  75% (119/158)
Resolving deltas:  76% (121/158)
Resolving deltas:  77% (122/158)
Resolving deltas:  78% (124/158)
Resolving deltas:  79% (125/158)
Resolving deltas:  80% (127/158)
Resolving deltas:  81% (128/158)
Resolving deltas:  82% (130/158)
Resolving deltas:  83% (132/158)
Resolving deltas:  84% (133/158)
Resolving deltas:  85% (135/158)
Resolving deltas:  86% (136/158)
Resolving deltas:  87% (138/158)
Resolving deltas:  88% (140/158)
Resolving deltas:  89% (141/158)
Resolving deltas:  90% (143/158)
Resolving deltas:  91% (144/158)
Resolving deltas:  92% (146/158)
Resolving deltas:  93% (147/158)
Resolving deltas:  94% (149/158)
Resolving deltas:  95% (151/158)
Resolving deltas:  96% (152/158)
Resolving deltas:  97% (154/158)
Resolving deltas:  98% (155/158)
Resolving deltas:  99% (157/158)
Resolving deltas: 100% (158/158)
Resolving deltas: 100% (158/158), completed with 124 local objects.
From https://github.com/llvm/llvm-project
 * branch                      main       -> FETCH_HEAD
Auto packing the repository in background for optimum performance.
See "git help gc" for manual housekeeping.
fatal: sha1 file '/home/buildbot/worker/as-builder-7/ramdisk/llvm-nvptx-nvidia-ubuntu/llvm-project/.git/index.lock' write error. Out of diskspace
fatal: sha1 file '/home/buildbot/worker/as-builder-7/ramdisk/llvm-nvptx-nvidia-ubuntu/llvm-project/.git/index.lock' write error. Out of diskspace

@llvm-ci
Collaborator

llvm-ci commented May 21, 2025

LLVM Buildbot has detected a new failure on builder flang-runtime-cuda-gcc running on as-builder-7 while building llvm at step 6 "build-flang-rt".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/152/builds/2711

Here is the relevant piece of the build log for the reference
Step 6 (build-flang-rt) failure: cmake (failure)
...
          detected during instantiation of "__nv_bool Fortran::runtime::io::ChildUnformattedIoStatementState<DIR>::Receive(char *, std::size_t, std::size_t) [with DIR=Fortran::runtime::io::Direction::Input]" at line 1101

8.957 [2/6/117] Building CUDA object flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/extrema.ptx
10.596 [2/5/118] Building CXX object flang-rt/lib/runtime/CMakeFiles/flang_rt.runtime.static.dir/reduce.cpp.o
11.028 [1/5/119] Linking CXX static library /home/buildbot/worker/as-builder-7/ramdisk/flang-runtime-cuda-gcc/build/lib/clang/21/lib/x86_64-unknown-linux-gnu/libflang_rt.runtime.a
11.033 [1/4/120] Building CUDA object flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/matmul.ptx
11.635 [1/3/121] Building CUDA object flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/findloc.ptx
12.041 [1/2/122] Building CUDA object flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/matmul-transpose.ptx
19.054 [1/1/123] Building CUDA object flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/dot-product.ptx
19.728 [0/1/124] Linking CUDA static library flang-rt/lib/runtime/libflang_rt.runtimePTX.a
FAILED: flang-rt/lib/runtime/libflang_rt.runtimePTX.a 
: && /usr/bin/cmake -E rm -f flang-rt/lib/runtime/libflang_rt.runtimePTX.a && /home/buildbot/worker/as-builder-7/ramdisk/flang-runtime-cuda-gcc/build/bin/llvm-ar qc flang-rt/lib/runtime/libflang_rt.runtimePTX.a  flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/home/buildbot/worker/as-builder-7/ramdisk/flang-runtime-cuda-gcc/llvm-project/flang/lib/Decimal/binary-to-decimal.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/home/buildbot/worker/as-builder-7/ramdisk/flang-runtime-cuda-gcc/llvm-project/flang/lib/Decimal/decimal-to-binary.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/ISO_Fortran_binding.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/allocator-registry.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/allocatable.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/array-constructor.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/assign.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/buffer.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/character.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/connection.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/copy.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/derived-api.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/derived.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/descriptor-io.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/descriptor.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/dot-product.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/edit-input.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/edit-output.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/environment.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/external-unit.ptx 
flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/extrema.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/file.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/findloc.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/format.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/inquiry.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/internal-unit.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/io-api.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/io-api-minimal.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/io-error.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/io-stmt.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/iostat.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/matmul-transpose.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/matmul.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/memory.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/misc-intrinsic.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/namelist.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/non-tbp-dio.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/numeric.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/pointer.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/product.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/pseudo-unit.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/ragged.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/stat.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/stop.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/sum.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/support.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/terminator.ptx 
flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.ru
t.runtimePTX.dir/type-code.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/type-info.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/unit.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/utf.ptx && /home/buildbot/worker/as-builder-7/ramdisk/flang-runtime-cuda-gcc/build/bin/llvm-ranlib flang-rt/lib/runtime/libflang_rt.runtimePTX.a && :
LLVM ERROR: IO failure on output stream: No space left on device
ninja: build stopped: subcommand failed.
FAILED: runtimes/CMakeFiles/flang-rt /home/buildbot/worker/as-builder-7/ramdisk/flang-runtime-cuda-gcc/build/runtimes/CMakeFiles/flang-rt 
cd /home/buildbot/worker/as-builder-7/ramdisk/flang-runtime-cuda-gcc/build/runtimes/runtimes-bins && /usr/bin/cmake --build /home/buildbot/worker/as-builder-7/ramdisk/flang-runtime-cuda-gcc/build/runtimes/runtimes-bins/ --target flang-rt --config Release
ninja: build stopped: subcommand failed.

@llvm-ci
Collaborator

llvm-ci commented May 21, 2025

LLVM Buildbot has detected a new failure on builder clang-aarch64-sve2-vla running on linaro-g4-02 while building llvm at step 14 "test-suite".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/198/builds/4652

Here is the relevant piece of the build log for the reference
Step 14 (test-suite) failure: test (failure)
...
size..init_array: 16 
size..interp: 27 
size..note.ABI-tag: 32 
size..plt: 928 
size..rela.dyn: 1176 
size..rela.plt: 1344 
size..rodata: 14491 
size..text: 198032 
**********
NOEXE: test-suite :: Fortran/gfortran/regression/gfortran-regression-execute-regression__inline_sum_3_f90.test (6089 of 10355)
******************** TEST 'test-suite :: Fortran/gfortran/regression/gfortran-regression-execute-regression__inline_sum_3_f90.test' FAILED ********************
Executable '/home/tcwg-buildbot/worker/clang-aarch64-sve2-vla/test/sandbox/build/Fortran/gfortran/regression/gfortran-regression-execute-regression__inline_sum_3_f90' is missing
********************
PASS: test-suite :: Fortran/gfortran/regression/gfortran-regression-execute-regression__implicit_12_f90.test (6090 of 10355)
********** TEST 'test-suite :: Fortran/gfortran/regression/gfortran-regression-execute-regression__implicit_12_f90.test' RESULTS **********
compile_time: 0.8763 
exec_time: 0.0005 
hash: "ac6b5721de1683371acd2a9be1d52f6f" 
link_time: 0.0000 
size: 565496 
size..bss: 176 
size..comment: 256 
size..data: 232 
size..data.rel.ro: 176 
size..dynamic: 496 
size..dynstr: 630 
size..dynsym: 1608 
size..eh_frame: 20608 
size..eh_frame_hdr: 5740 
size..fini: 20 
size..fini_array: 8 
size..gnu.hash: 28 
size..gnu.version: 134 
size..gnu.version_r: 96 
size..got: 160 
size..got.plt: 480 
size..init: 24 
size..init_array: 16 
size..interp: 27 
size..note.ABI-tag: 32 
size..plt: 944 
size..rela.dyn: 1248 
size..rela.plt: 1368 
size..rodata: 16035 
size..text: 353904 
**********
PASS: test-suite :: Fortran/gfortran/regression/gfortran-regression-execute-regression__implicit_class_1_f90.test (6091 of 10355)
********** TEST 'test-suite :: Fortran/gfortran/regression/gfortran-regression-execute-regression__implicit_class_1_f90.test' RESULTS **********
compile_time: 0.8763 

@llvm-ci
Collaborator

llvm-ci commented May 21, 2025

LLVM Buildbot has detected a new failure on builder clang-aarch64-sve-vla running on linaro-g3-02 while building llvm at step 14 "test-suite".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/17/builds/8225

Here is the relevant piece of the build log for the reference
Step 14 (test-suite) failure: test (failure)
...
size..init_array: 16 
size..interp: 27 
size..note.ABI-tag: 32 
size..plt: 928 
size..rela.dyn: 1176 
size..rela.plt: 1344 
size..rodata: 14480 
size..text: 198032 
**********
NOEXE: test-suite :: Fortran/gfortran/regression/gfortran-regression-execute-regression__inline_sum_3_f90.test (6111 of 10355)
******************** TEST 'test-suite :: Fortran/gfortran/regression/gfortran-regression-execute-regression__inline_sum_3_f90.test' FAILED ********************
Executable '/home/tcwg-buildbot/worker/clang-aarch64-sve-vla/test/sandbox/build/Fortran/gfortran/regression/gfortran-regression-execute-regression__inline_sum_3_f90' is missing
********************
PASS: test-suite :: Fortran/gfortran/regression/gfortran-regression-execute-regression__inline_transpose_1_f90.test (6112 of 10355)
********** TEST 'test-suite :: Fortran/gfortran/regression/gfortran-regression-execute-regression__inline_transpose_1_f90.test' RESULTS **********
compile_time: 1.1315 
exec_time: 0.0000 
hash: "3413d9388541ee8829fbee93834edce8" 
link_time: 0.0000 
size: 972144 
size..bss: 824 
size..comment: 256 
size..data: 688 
size..data.rel.ro: 176 
size..dynamic: 496 
size..dynstr: 666 
size..dynsym: 1704 
size..eh_frame: 34784 
size..eh_frame_hdr: 8908 
size..fini: 20 
size..fini_array: 8 
size..gnu.hash: 28 
size..gnu.version: 142 
size..gnu.version_r: 96 
size..got: 192 
size..got.plt: 512 
size..init: 24 
size..init_array: 16 
size..interp: 27 
size..note.ABI-tag: 32 
size..plt: 1008 
size..rela.dyn: 2376 
size..rela.plt: 1464 
size..rodata: 26854 
size..text: 636816 
**********
PASS: test-suite :: Fortran/gfortran/regression/gfortran-regression-execute-regression__inline_matmul_13_f90.test (6113 of 10355)
********** TEST 'test-suite :: Fortran/gfortran/regression/gfortran-regression-execute-regression__inline_matmul_13_f90.test' RESULTS **********
compile_time: 1.1315 

fhahn added a commit that referenced this pull request May 21, 2025
This reverts commit b263c08.

Looks like this triggers a crash in one of the Fortran tests. Reverting
while I investigate
    https://lab.llvm.org/buildbot/#/builders/41/builds/6825
@fhahn
Contributor Author

fhahn commented May 21, 2025

Reverted for now in 793bb6b, as this appears to trigger a crash in one of the Fortran tests; I'm investigating:
https://lab.llvm.org/buildbot/#/builders/41/builds/6825

@llvm-ci
Collaborator

llvm-ci commented May 21, 2025

LLVM Buildbot has detected a new failure on builder clang-ppc64-aix running on aix-ppc64 while building llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/64/builds/3723

Here is the relevant piece of the build log for the reference
Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'lit :: timeout-hang.py' FAILED ********************
Exit Code: 1

Command Output (stdout):
--
# RUN: at line 13
not env -u FILECHECK_OPTS "/home/llvm/llvm-external-buildbots/workers/env/bin/python3.11" /home/llvm/llvm-external-buildbots/workers/aix-ppc64/clang-ppc64-aix/llvm-project/llvm/utils/lit/lit.py -j1 --order=lexical Inputs/timeout-hang/run-nonexistent.txt  --timeout=1 --param external=0 | "/home/llvm/llvm-external-buildbots/workers/env/bin/python3.11" /home/llvm/llvm-external-buildbots/workers/aix-ppc64/clang-ppc64-aix/build/utils/lit/tests/timeout-hang.py 1
# executed command: not env -u FILECHECK_OPTS /home/llvm/llvm-external-buildbots/workers/env/bin/python3.11 /home/llvm/llvm-external-buildbots/workers/aix-ppc64/clang-ppc64-aix/llvm-project/llvm/utils/lit/lit.py -j1 --order=lexical Inputs/timeout-hang/run-nonexistent.txt --timeout=1 --param external=0
# .---command stderr------------
# | lit.py: /home/llvm/llvm-external-buildbots/workers/aix-ppc64/clang-ppc64-aix/llvm-project/llvm/utils/lit/lit/main.py:72: note: The test suite configuration requested an individual test timeout of 0 seconds but a timeout of 1 seconds was requested on the command line. Forcing timeout to be 1 seconds.
# `-----------------------------
# executed command: /home/llvm/llvm-external-buildbots/workers/env/bin/python3.11 /home/llvm/llvm-external-buildbots/workers/aix-ppc64/clang-ppc64-aix/build/utils/lit/tests/timeout-hang.py 1
# .---command stdout------------
# | Testing took as long or longer than timeout
# `-----------------------------
# error: command failed with exit status: 1

--

********************


@llvm-ci
Collaborator

llvm-ci commented May 21, 2025

LLVM Buildbot has detected a new failure on builder clang-aarch64-sve-vls running on linaro-g3-01 while building llvm at step 14 "test-suite".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/143/builds/7863

Here is the relevant piece of the build log for the reference
Step 14 (test-suite) failure: test (failure)
...
size..init_array: 16 
size..interp: 27 
size..note.ABI-tag: 32 
size..plt: 928 
size..rela.dyn: 1176 
size..rela.plt: 1344 
size..rodata: 14480 
size..text: 198032 
**********
NOEXE: test-suite :: Fortran/gfortran/regression/gfortran-regression-execute-regression__inline_sum_3_f90.test (6111 of 10355)
******************** TEST 'test-suite :: Fortran/gfortran/regression/gfortran-regression-execute-regression__inline_sum_3_f90.test' FAILED ********************
Executable '/home/tcwg-buildbot/worker/clang-aarch64-sve-vls/test/sandbox/build/Fortran/gfortran/regression/gfortran-regression-execute-regression__inline_sum_3_f90' is missing
********************
PASS: test-suite :: Fortran/gfortran/regression/gfortran-regression-execute-regression__inline_matmul_19_f90.test (6112 of 10355)
********** TEST 'test-suite :: Fortran/gfortran/regression/gfortran-regression-execute-regression__inline_matmul_19_f90.test' RESULTS **********
compile_time: 1.1250 
exec_time: 0.0000 
hash: "d0a43790b67f6ed5f40c19e41e794469" 
link_time: 0.0000 
size: 420824 
size..bss: 960 
size..comment: 256 
size..data: 232 
size..data.rel.ro: 176 
size..dynamic: 496 
size..dynstr: 655 
size..dynsym: 1680 
size..eh_frame: 19960 
size..eh_frame_hdr: 5532 
size..fini: 20 
size..fini_array: 8 
size..gnu.hash: 28 
size..gnu.version: 140 
size..gnu.version_r: 96 
size..got: 168 
size..got.plt: 504 
size..init: 24 
size..init_array: 24 
size..interp: 27 
size..note.ABI-tag: 32 
size..plt: 992 
size..rela.dyn: 1296 
size..rela.plt: 1440 
size..rodata: 15482 
size..text: 211600 
**********
PASS: test-suite :: Fortran/gfortran/regression/gfortran-regression-execute-regression__initialization_11_f90.test (6113 of 10355)
********** TEST 'test-suite :: Fortran/gfortran/regression/gfortran-regression-execute-regression__initialization_11_f90.test' RESULTS **********
compile_time: 1.1250 

fhahn added a commit that referenced this pull request May 22, 2025
This reverts commit 793bb6b.

The recommitted version contains a fix to make sure only the original
phis are processed in convertPhisToBlends by collecting them in a vector
first. This fixes a crash when no mask is needed, because there is only
a single incoming value.

Original message:
This patch moves the logic to predicate and linearize a VPlan to a
dedicated VPlan transform. It mostly ports the existing logic directly.

There are a number of follow-ups planned in the near future to
further improve on the implementation:
* Edge and block masks are cached in VPPredicator, but the block masks
are still made available to VPRecipeBuilder, so they can be accessed
during recipe construction. As a follow-up, this should be replaced by
adding mask operands to all VPInstructions that need them and using those
during recipe construction.
* The mask caching in a map also means that this map needs updating each
time a new recipe replaces a VPInstruction; this would also be handled
by adding mask operands.

PR: #128420
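
The fix described in the recommit message (snapshot the original phis in a vector, then transform them) follows a common pattern for mutating a block while walking it. Below is a minimal sketch of that pattern only; it is not the actual patch. `Node`, `convertPhis`, and the use of `std::list` are hypothetical stand-ins and do not correspond to VPlan types or to the real `convertPhisToBlends` signature. The sketch assumes that a phi with a single incoming value needs no mask and is simply dropped, while a phi with several incoming values is replaced by a blend-like node.

```cpp
#include <list>
#include <vector>

// Hypothetical stand-in for a recipe in a block; not an LLVM/VPlan type.
struct Node {
  bool IsPhi;      // true for phi-like nodes, false for everything else
  int NumIncoming; // number of incoming values
};

// Replaces every phi that was in Block on entry and returns the number of
// blend-like nodes created (illustrative only).
int convertPhis(std::list<Node> &Block) {
  // Snapshot the original phis first, so that nodes inserted or erased
  // below cannot disturb the traversal.
  std::vector<std::list<Node>::iterator> Phis;
  for (auto It = Block.begin(); It != Block.end(); ++It)
    if (It->IsPhi)
      Phis.push_back(It);

  int Blends = 0;
  for (auto It : Phis) {
    if (It->NumIncoming > 1) {
      // Several incoming values: insert a blend-like replacement before
      // the phi. std::list insertion keeps all other iterators valid.
      Block.insert(It, Node{false, It->NumIncoming});
      ++Blends;
    }
    // A single incoming value needs no mask and no blend. In both cases the
    // original phi is erased; only the erased iterator is invalidated.
    Block.erase(It);
  }
  return Blends;
}
```

For a block holding phis with 1, 2 and 3 incoming values plus one non-phi node, this sketch creates two blends and leaves three nodes, none of them phis.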
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request May 22, 2025
… (#128420)"

This reverts commit 793bb6b.

The recommitted version contains a fix to make sure only the original
phis are processed in convertPhisToBlends by collecting them in a vector
first. This fixes a crash when no mask is needed, because there is only
a single incoming value.

Original message:
This patch moves the logic to predicate and linearize a VPlan to a
dedicated VPlan transform. It mostly ports the existing logic directly.

There are a number of follow-ups planned in the near future to
further improve on the implementation:
* Edge and block masks are cached in VPPredicator, but the block masks
are still made available to VPRecipeBuilder, so they can be accessed
during recipe construction. As a follow-up, this should be replaced by
adding mask operands to all VPInstructions that need them and using those
during recipe construction.
* The mask caching in a map also means that this map needs updating each
time a new recipe replaces a VPInstruction; this would also be handled
by adding mask operands.

PR: llvm/llvm-project#128420
@fhahn
Contributor Author

fhahn commented May 22, 2025

Re-landed in 95ba550 this morning, looks like the flang bots are happy.

@llvm-ci
Collaborator

llvm-ci commented May 25, 2025

LLVM Buildbot has detected a new failure on builder bolt-x86_64-ubuntu-clang running on bolt-worker while building llvm at step 6 "test-build-clang-bolt-stage2-clang-bolt".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/113/builds/7336

Here is the relevant piece of the build log for the reference
Step 6 (test-build-clang-bolt-stage2-clang-bolt) failure: test (failure)
...
924.532 [12/6/3198] Linking CXX static library lib/libclangExtractAPI.a
924.932 [12/5/3199] Linking CXX static library lib/libclangStaticAnalyzerCore.a
925.001 [12/4/3200] Building CXX object tools/clang/lib/StaticAnalyzer/Checkers/CMakeFiles/obj.clangStaticAnalyzerCheckers.dir/WebKit/ForwardDeclChecker.cpp.o
925.159 [12/3/3201] Building CXX object tools/clang/tools/driver/CMakeFiles/clang.dir/driver.cpp.o
926.135 [12/2/3202] Building CXX object tools/clang/lib/StaticAnalyzer/Checkers/CMakeFiles/obj.clangStaticAnalyzerCheckers.dir/WebKit/RetainPtrCtorAdoptChecker.cpp.o
927.647 [11/2/3203] Building CXX object tools/clang/tools/driver/CMakeFiles/clang.dir/cc1_main.cpp.o
927.744 [11/1/3204] Linking CXX static library lib/libclangStaticAnalyzerCheckers.a
927.792 [10/1/3205] Linking CXX static library lib/libclangStaticAnalyzerFrontend.a
927.805 [9/1/3206] Linking CXX static library lib/libclangFrontendTool.a
1009.453 [8/1/3207] Linking CXX executable bin/clang-21
FAILED: bin/clang-21 
: && /home/worker/bolt-worker2/bolt-x86_64-ubuntu-clang/build/./bin/clang++ -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wno-unnecessary-virtual-specifier -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -fprofile-instr-generate="/home/worker/bolt-worker2/bolt-x86_64-ubuntu-clang/build/tools/clang/stage2-instrumented-bins/profiles/%4m.profraw" -flto=thin -fno-common -Woverloaded-virtual -Wno-nested-anon-types -O3 -DNDEBUG -Wl,--emit-relocs,-znow -fuse-ld=lld -Wl,--color-diagnostics -fprofile-instr-generate="/home/worker/bolt-worker2/bolt-x86_64-ubuntu-clang/build/tools/clang/stage2-instrumented-bins/profiles/%4m.profraw" -flto=thin -Wl,--thinlto-cache-dir=/home/worker/bolt-worker2/bolt-x86_64-ubuntu-clang/build/tools/clang/stage2-instrumented-bins/lto.cache   -Wl,--export-dynamic tools/clang/tools/driver/CMakeFiles/clang.dir/driver.cpp.o tools/clang/tools/driver/CMakeFiles/clang.dir/cc1_main.cpp.o tools/clang/tools/driver/CMakeFiles/clang.dir/cc1as_main.cpp.o tools/clang/tools/driver/CMakeFiles/clang.dir/cc1gen_reproducer_main.cpp.o tools/clang/tools/driver/CMakeFiles/clang.dir/clang-driver.cpp.o -o bin/clang-21  -Wl,-rpath,"\$ORIGIN/../lib:"  lib/libLLVMX86CodeGen.a  lib/libLLVMX86AsmParser.a  lib/libLLVMX86Desc.a  lib/libLLVMX86Disassembler.a  lib/libLLVMX86Info.a  lib/libLLVMAnalysis.a  lib/libLLVMCodeGen.a  lib/libLLVMCore.a  lib/libLLVMipo.a  lib/libLLVMAggressiveInstCombine.a  lib/libLLVMInstCombine.a  lib/libLLVMInstrumentation.a  lib/libLLVMMC.a  lib/libLLVMMCParser.a  lib/libLLVMObjCARCOpts.a  lib/libLLVMOption.a  lib/libLLVMScalarOpts.a  
lib/libLLVMSupport.a  lib/libLLVMTargetParser.a  lib/libLLVMTransformUtils.a  lib/libLLVMVectorize.a  lib/libclangBasic.a  lib/libclangCodeGen.a  lib/libclangDriver.a  lib/libclangFrontend.a  lib/libclangFrontendTool.a  lib/libclangSerialization.a  lib/libLLVMAsmPrinter.a  lib/libLLVMMCDisassembler.a  lib/libclangCodeGen.a  lib/libLLVMCoverage.a  lib/libLLVMFrontendDriver.a  lib/libLLVMLTO.a  lib/libLLVMExtensions.a  lib/libLLVMPasses.a  lib/libLLVMCFGuard.a  lib/libLLVMGlobalISel.a  lib/libLLVMSelectionDAG.a  lib/libLLVMCodeGen.a  lib/libLLVMObjCARCOpts.a  lib/libLLVMCGData.a  lib/libLLVMCodeGenTypes.a  lib/libLLVMIRPrinter.a  lib/libLLVMTarget.a  lib/libLLVMCoroutines.a  lib/libLLVMipo.a  lib/libLLVMInstrumentation.a  lib/libLLVMVectorize.a  lib/libLLVMSandboxIR.a  lib/libLLVMBitWriter.a  lib/libLLVMLinker.a  lib/libLLVMHipStdPar.a  lib/libclangExtractAPI.a  lib/libclangInstallAPI.a  lib/libLLVMTextAPIBinaryReader.a  lib/libclangRewriteFrontend.a  lib/libclangStaticAnalyzerFrontend.a  lib/libclangStaticAnalyzerCheckers.a  lib/libclangStaticAnalyzerCore.a  lib/libclangCrossTU.a  lib/libclangIndex.a  lib/libclangFrontend.a  lib/libclangDriver.a  lib/libLLVMWindowsDriver.a  lib/libLLVMOption.a  lib/libclangParse.a  lib/libclangSerialization.a  lib/libclangSema.a  lib/libclangAnalysis.a  lib/libclangASTMatchers.a  lib/libclangAPINotes.a  lib/libclangEdit.a  lib/libclangAST.a  lib/libLLVMFrontendHLSL.a  lib/libclangSupport.a  lib/libclangFormat.a  lib/libclangToolingInclusions.a  lib/libclangToolingCore.a  lib/libclangRewrite.a  lib/libclangLex.a  lib/libclangBasic.a  lib/libLLVMFrontendOpenMP.a  lib/libLLVMScalarOpts.a  lib/libLLVMAggressiveInstCombine.a  lib/libLLVMInstCombine.a  lib/libLLVMFrontendOffloading.a  lib/libLLVMTransformUtils.a  lib/libLLVMObjectYAML.a  lib/libLLVMFrontendAtomic.a  lib/libLLVMAnalysis.a  lib/libLLVMProfileData.a  lib/libLLVMSymbolize.a  lib/libLLVMDebugInfoGSYM.a  lib/libLLVMDebugInfoDWARF.a  lib/libLLVMDebugInfoPDB.a  
lib/libLLVMDebugInfoCodeView.a  lib/libLLVMDebugInfoMSF.a  lib/libLLVMDebugInfoBTF.a  lib/libLLVMObject.a  lib/libLLVMMCParser.a  lib/libLLVMMC.a  lib/libLLVMIRReader.a  lib/libLLVMBitReader.a  lib/libLLVMAsmParser.a  lib/libLLVMTextAPI.a  lib/libLLVMCore.a  lib/libLLVMBinaryFormat.a  lib/libLLVMTargetParser.a  lib/libLLVMRemarks.a  lib/libLLVMBitstreamReader.a  lib/libLLVMSupport.a  lib/libLLVMDemangle.a  -lrt  -ldl  -lm  /usr/lib/x86_64-linux-gnu/libz.so && :
ld.lld: /home/worker/bolt-worker2/bolt-x86_64-ubuntu-clang/llvm-project/llvm/include/llvm/Support/Casting.h:578: decltype(auto) llvm::cast(From*) [with To = llvm::VPWidenPHIRecipe; From = llvm::VPRecipeBase]: Assertion `isa<To>(Val) && "cast<Ty>() argument of incompatible type!"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.	Running pass "function<eager-inv>(float2int,lower-constant-intrinsics,chr,loop(loop-rotate<header-duplication;no-prepare-for-lto>,loop-deletion),loop-distribute,inject-tli-mappings,loop-vectorize<no-interleave-forced-only;no-vectorize-forced-only;>,infer-alignment,loop-load-elim,instcombine<max-iterations=1;no-verify-fixpoint>,simplifycfg<bonus-inst-threshold=1;forward-switch-cond;switch-range-to-icmp;switch-to-lookup;no-keep-loops;hoist-common-insts;no-hoist-loads-stores-with-cond-faulting;sink-common-insts;speculate-blocks;simplify-cond-branch;no-speculate-unpredictables>,slp-vectorizer,vector-combine,instcombine<max-iterations=1;no-verify-fixpoint>,loop-unroll<O3>,transform-warning,sroa<preserve-cfg>,infer-alignment,instcombine<max-iterations=1;no-verify-fixpoint>,loop-mssa(licm<allowspeculation>),alignment-from-assumptions,loop-sink,instsimplify,div-rem-pairs,tailcallelim,simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;switch-range-to-icmp;no-switch-to-lookup;keep-loops;no-hoist-common-insts;hoist-loads-stores-with-cond-faulting;no-sink-common-insts;speculate-blocks;simplify-cond-branch;speculate-unpredictables>)" on module "lib/libclangSema.a(SemaConcept.cpp.o at 73478474)"
1.	Running pass "loop-vectorize<no-interleave-forced-only;no-vectorize-forced-only;>" on function "_ZN5clang4Sema22IsAtLeastAsConstrainedEPKNS_9NamedDeclEN4llvm15MutableArrayRefINS_20AssociatedConstraintEEES3_S7_Rb"
 #0 0x000056202d775240 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/home/worker/bolt-worker2/bolt-x86_64-ubuntu-clang/build/bin/ld.lld+0x1b7c240)
 #1 0x000056202d77264f llvm::sys::RunSignalHandlers() (/home/worker/bolt-worker2/bolt-x86_64-ubuntu-clang/build/bin/ld.lld+0x1b7964f)
 #2 0x000056202d77279a SignalHandler(int, siginfo_t*, void*) Signals.cpp:0:0
 #3 0x00007fdab0442520 (/lib/x86_64-linux-gnu/libc.so.6+0x42520)
 #4 0x00007fdab04969fc __pthread_kill_implementation ./nptl/pthread_kill.c:44:76
 #5 0x00007fdab04969fc __pthread_kill_internal ./nptl/pthread_kill.c:78:10
 #6 0x00007fdab04969fc pthread_kill ./nptl/pthread_kill.c:89:10
 #7 0x00007fdab0442476 gsignal ./signal/../sysdeps/posix/raise.c:27:6
 #8 0x00007fdab04287f3 abort ./stdlib/abort.c:81:7
 #9 0x00007fdab042871b _nl_load_domain ./intl/loadmsgcat.c:1177:9
#10 0x00007fdab0439e96 (/lib/x86_64-linux-gnu/libc.so.6+0x39e96)
#11 0x000056202fd0f2af llvm::VPlanTransforms::introduceMasksAndLinearize(llvm::VPlan&, bool) (/home/worker/bolt-worker2/bolt-x86_64-ubuntu-clang/build/bin/ld.lld+0x41162af)
#12 0x000056202fb57266 llvm::LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(llvm::VFRange&, llvm::LoopVersioning*) (/home/worker/bolt-worker2/bolt-x86_64-ubuntu-clang/build/bin/ld.lld+0x3f5e266)
#13 0x000056202fb5989c llvm::LoopVectorizationPlanner::buildVPlansWithVPRecipes(llvm::ElementCount, llvm::ElementCount) (/home/worker/bolt-worker2/bolt-x86_64-ubuntu-clang/build/bin/ld.lld+0x3f6089c)
#14 0x000056202fb5a323 llvm::LoopVectorizationPlanner::plan(llvm::ElementCount, unsigned int) (/home/worker/bolt-worker2/bolt-x86_64-ubuntu-clang/build/bin/ld.lld+0x3f61323)
#15 0x000056202fb5c33a llvm::LoopVectorizePass::processLoop(llvm::Loop*) (/home/worker/bolt-worker2/bolt-x86_64-ubuntu-clang/build/bin/ld.lld+0x3f6333a)
#16 0x000056202fb5efb1 llvm::LoopVectorizePass::runImpl(llvm::Function&) (/home/worker/bolt-worker2/bolt-x86_64-ubuntu-clang/build/bin/ld.lld+0x3f65fb1)
#17 0x000056202fb5f606 llvm::LoopVectorizePass::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/home/worker/bolt-worker2/bolt-x86_64-ubuntu-clang/build/bin/ld.lld+0x3f66606)
#18 0x000056202e2d7286 llvm::detail::PassModel<llvm::Function, llvm::LoopVectorizePass, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) PassBuilder.cpp:0:0
#19 0x0000562030bac60f llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/home/worker/bolt-worker2/bolt-x86_64-ubuntu-clang/build/bin/ld.lld+0x4fb360f)
#20 0x000056202e0aced6 llvm::detail::PassModel<llvm::Function, llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) X86CodeGenPassBuilder.cpp:0:0
#21 0x0000562030bacb33 llvm::ModuleToFunctionPassAdaptor::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/home/worker/bolt-worker2/bolt-x86_64-ubuntu-clang/build/bin/ld.lld+0x4fb3b33)
#22 0x000056202e0ad896 llvm::detail::PassModel<llvm::Module, llvm::ModuleToFunctionPassAdaptor, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) X86CodeGenPassBuilder.cpp:0:0
#23 0x0000562030bae2ed llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/home/worker/bolt-worker2/bolt-x86_64-ubuntu-clang/build/bin/ld.lld+0x4fb52ed)
#24 0x000056202e2c55dc runNewPMPasses(llvm::lto::Config const&, llvm::Module&, llvm::TargetMachine*, unsigned int, bool, llvm::ModuleSummaryIndex*, llvm::ModuleSummaryIndex const*) LTOBackend.cpp:0:0
#25 0x000056202e2c7032 llvm::lto::opt(llvm::lto::Config const&, llvm::TargetMachine*, unsigned int, llvm::Module&, bool, llvm::ModuleSummaryIndex*, llvm::ModuleSummaryIndex const*, std::vector<unsigned char, std::allocator<unsigned char>> const&) (/home/worker/bolt-worker2/bolt-x86_64-ubuntu-clang/build/bin/ld.lld+0x26ce032)
#26 0x000056202e2c873e llvm::lto::thinBackend(llvm::lto::Config const&, unsigned int, std::function<llvm::Expected<std::unique_ptr<llvm::CachedFileStream, std::default_delete<llvm::CachedFileStream>>> (unsigned int, llvm::Twine const&)>, llvm::Module&, llvm::ModuleSummaryIndex const&, llvm::FunctionImporter::ImportMapTy const&, llvm::DenseMap<unsigned long, llvm::GlobalValueSummary*, llvm::DenseMapInfo<unsigned long, void>, llvm::detail::DenseMapPair<unsigned long, llvm::GlobalValueSummary*>> const&, llvm::MapVector<llvm::StringRef, llvm::BitcodeModule, llvm::DenseMap<llvm::StringRef, unsigned int, llvm::DenseMapInfo<llvm::StringRef, void>, llvm::detail::DenseMapPair<llvm::StringRef, unsigned int>>, llvm::SmallVector<std::pair<llvm::StringRef, llvm::BitcodeModule>, 0u>>*, bool, std::function<llvm::Expected<std::unique_ptr<llvm::CachedFileStream, std::default_delete<llvm::CachedFileStream>>> (unsigned int, llvm::Twine const&)>, std::vector<unsigned char, std::allocator<unsigned char>> const&)::'lambda'(llvm::Module&, llvm::TargetMachine*, std::unique_ptr<llvm::ToolOutputFile, std::default_delete<llvm::ToolOutputFile>>)::operator()(llvm::Module&, llvm::TargetMachine*, std::unique_ptr<llvm::ToolOutputFile, std::default_delete<llvm::ToolOutputFile>>) const LTOBackend.cpp:0:0
#27 0x000056202e2c95de llvm::lto::thinBackend(llvm::lto::Config const&, unsigned int, std::function<llvm::Expected<std::unique_ptr<llvm::CachedFileStream, std::default_delete<llvm::CachedFileStream>>> (unsigned int, llvm::Twine const&)>, llvm::Module&, llvm::ModuleSummaryIndex const&, llvm::FunctionImporter::ImportMapTy const&, llvm::DenseMap<unsigned long, llvm::GlobalValueSummary*, llvm::DenseMapInfo<unsigned long, void>, llvm::detail::DenseMapPair<unsigned long, llvm::GlobalValueSummary*>> const&, llvm::MapVector<llvm::StringRef, llvm::BitcodeModule, llvm::DenseMap<llvm::StringRef, unsigned int, llvm::DenseMapInfo<llvm::StringRef, void>, llvm::detail::DenseMapPair<llvm::StringRef, unsigned int>>, llvm::SmallVector<std::pair<llvm::StringRef, llvm::BitcodeModule>, 0u>>*, bool, std::function<llvm::Expected<std::unique_ptr<llvm::CachedFileStream, std::default_delete<llvm::CachedFileStream>>> (unsigned int, llvm::Twine const&)>, std::vector<unsigned char, std::allocator<unsigned char>> const&) (/home/worker/bolt-worker2/bolt-x86_64-ubuntu-clang/build/bin/ld.lld+0x26d05de)
#28 0x000056202e2a7c65 (anonymous namespace)::InProcessThinBackend::runThinLTOBackendThread(std::function<llvm::Expected<std::unique_ptr<llvm::CachedFileStream, std::default_delete<llvm::CachedFileStream>>> (unsigned int, llvm::Twine const&)>, llvm::FileCache, unsigned int, llvm::BitcodeModule, llvm::ModuleSummaryIndex&, llvm::FunctionImporter::ImportMapTy const&, llvm::DenseSet<llvm::ValueInfo, llvm::DenseMapInfo<llvm::ValueInfo, void>> const&, std::map<unsigned long, llvm::GlobalValue::LinkageTypes, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, llvm::GlobalValue::LinkageTypes>>> const&, llvm::DenseMap<unsigned long, llvm::GlobalValueSummary*, llvm::DenseMapInfo<unsigned long, void>, llvm::detail::DenseMapPair<unsigned long, llvm::GlobalValueSummary*>> const&, llvm::MapVector<llvm::StringRef, llvm::BitcodeModule, llvm::DenseMap<llvm::StringRef, unsigned int, llvm::DenseMapInfo<llvm::StringRef, void>, llvm::detail::DenseMapPair<llvm::StringRef, unsigned int>>, llvm::SmallVector<std::pair<llvm::StringRef, llvm::BitcodeModule>, 0u>>&)::'lambda'(std::function<llvm::Expected<std::unique_ptr<llvm::CachedFileStream, std::default_delete<llvm::CachedFileStream>>> (unsigned int, llvm::Twine const&)>)::operator()(std::function<llvm::Expected<std::unique_ptr<llvm::CachedFileStream, std::default_delete<llvm::CachedFileStream>>> (unsigned int, llvm::Twine const&)>) const LTO.cpp:0:0
#29 0x000056202e2b6833 (anonymous namespace)::InProcessThinBackend::runThinLTOBackendThread(std::function<llvm::Expected<std::unique_ptr<llvm::CachedFileStream, std::default_delete<llvm::CachedFileStream>>> (unsigned int, llvm::Twine const&)>, llvm::FileCache, unsigned int, llvm::BitcodeModule, llvm::ModuleSummaryIndex&, llvm::FunctionImporter::ImportMapTy const&, llvm::DenseSet<llvm::ValueInfo, llvm::DenseMapInfo<llvm::ValueInfo, void>> const&, std::map<unsigned long, llvm::GlobalValue::LinkageTypes, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, llvm::GlobalValue::LinkageTypes>>> const&, llvm::DenseMap<unsigned long, llvm::GlobalValueSummary*, llvm::DenseMapInfo<unsigned long, void>, llvm::detail::DenseMapPair<unsigned long, llvm::GlobalValueSummary*>> const&, llvm::MapVector<llvm::StringRef, llvm::BitcodeModule, llvm::DenseMap<llvm::StringRef, unsigned int, llvm::DenseMapInfo<llvm::StringRef, void>, llvm::detail::DenseMapPair<llvm::StringRef, unsigned int>>, llvm::SmallVector<std::pair<llvm::StringRef, llvm::BitcodeModule>, 0u>>&) LTO.cpp:0:0
#30 0x000056202e2a5e58 std::_Function_handler<void (), std::_Bind<(anonymous namespace)::InProcessThinBackend::start(unsigned int, llvm::BitcodeModule, llvm::FunctionImporter::ImportMapTy const&, llvm::DenseSet<llvm::ValueInfo, llvm::DenseMapInfo<llvm::ValueInfo, void>> const&, std::map<unsigned long, llvm::GlobalValue::LinkageTypes, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, llvm::GlobalValue::LinkageTypes>>> const&, llvm::MapVector<llvm::StringRef, llvm::BitcodeModule, llvm::DenseMap<llvm::StringRef, unsigned int, llvm::DenseMapInfo<llvm::StringRef, void>, llvm::detail::DenseMapPair<llvm::StringRef, unsigned int>>, llvm::SmallVector<std::pair<llvm::StringRef, llvm::BitcodeModule>, 0u>>&)::'lambda'(llvm::BitcodeModule, llvm::ModuleSummaryIndex&, llvm::FunctionImporter::ImportMapTy const&, llvm::DenseSet<llvm::ValueInfo, llvm::DenseMapInfo<llvm::ValueInfo, void>> const&, std::map<unsigned long, llvm::GlobalValue::LinkageTypes, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, llvm::GlobalValue::LinkageTypes>>> const&, llvm::DenseMap<unsigned long, llvm::GlobalValueSummary*, llvm::DenseMapInfo<unsigned long, void>, llvm::detail::DenseMapPair<unsigned long, llvm::GlobalValueSummary*>> const&, llvm::MapVector<llvm::StringRef, llvm::BitcodeModule, llvm::DenseMap<llvm::StringRef, unsigned int, llvm::DenseMapInfo<llvm::StringRef, void>, llvm::detail::DenseMapPair<llvm::StringRef, unsigned int>>, llvm::SmallVector<std::pair<llvm::StringRef, llvm::BitcodeModule>, 0u>>&) (llvm::BitcodeModule, std::reference_wrapper<llvm::ModuleSummaryIndex>, std::reference_wrapper<llvm::FunctionImporter::ImportMapTy const>, std::reference_wrapper<llvm::DenseSet<llvm::ValueInfo, llvm::DenseMapInfo<llvm::ValueInfo, void>> const>, std::reference_wrapper<std::map<unsigned long, llvm::GlobalValue::LinkageTypes, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, llvm::GlobalValue::LinkageTypes>>> const>, 
std::reference_wrapper<llvm::DenseMap<unsigned long, llvm::GlobalValueSummary*, llvm::DenseMapInfo<unsigned long, void>, llvm::detail::DenseMapPair<unsigned long, llvm::GlobalValueSummary*>> const>, std::reference_wrapper<llvm::MapVector<llvm::StringRef, llvm::BitcodeModule, llvm::DenseMap<llvm::StringRef, unsigned int, llvm::DenseMapInfo<llvm::StringRef, void>, llvm::detail::DenseMapPair<llvm::StringRef, unsigned int>>, llvm::SmallVector<std::pair<llvm::StringRef, llvm::BitcodeModule>, 0u>>>)>>::_M_invoke(std::_Any_data const&) LTO.cpp:0:0
#31 0x000056202db691d2 std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, std::thread::_Invoker<std::tuple<std::function<void ()>>>, void>>::_M_invoke(std::_Any_data const&) BalancedPartitioning.cpp:0:0
