[VPlan] Move predication to VPlanTransform (NFC). #128420

fhahn · 2025-02-23T14:04:42Z

This patch moves the logic to predicate and linearize a VPlan to a dedicated VPlan transform.
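
For readers new to the masks involved, the following is a minimal, self-contained sketch of the mask algebra that predication computes. It is a toy model, not the code in this patch: `Mask`, `conditionalEdgeMask`, `unconditionalEdgeMask` and `blockInMask` are hypothetical names, a mask is modelled as 8 lane bits, and `std::nullopt` plays the role that `nullptr` plays in the removed `VPRecipeBuilder` code (an all-true mask). Linearizing the CFG is not modelled.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Toy model only: one bit per vector lane (8 lanes). std::nullopt stands for
// the "all-true" mask, mirroring the nullptr convention in the removed code.
using Mask = std::optional<std::uint8_t>;

// Mask of an unconditional edge: simply the mask of the source block.
Mask unconditionalEdgeMask(const Mask &SrcMask) { return SrcMask; }

// Mask of a conditional edge: the branch condition (negated for the false
// successor), ANDed with the source block mask unless that mask is all-true.
Mask conditionalEdgeMask(const Mask &SrcMask, std::uint8_t Cond, bool OnTrue) {
  std::uint8_t C = OnTrue ? Cond : static_cast<std::uint8_t>(~Cond);
  return SrcMask ? static_cast<std::uint8_t>(*SrcMask & C) : C;
}

// Block-in mask: OR of all incoming edge masks. If any incoming edge is
// all-true, the block mask is all-true as well.
Mask blockInMask(const std::vector<Mask> &EdgeMasks) {
  Mask Result;
  for (const Mask &E : EdgeMasks) {
    if (!E)
      return std::nullopt;
    Result = Result ? static_cast<std::uint8_t>(*Result | *E) : *E;
  }
  return Result;
}
```

In this model, the join block of an if/else diamond gets the OR of both edge masks. That value has every lane set but is not folded back to the all-true representation, which matches the removed comment noting that later optimizations are expected to clean up such redundancy.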

The main logic to perform predication is ready for review, although there are a few things to note that should be improved, either directly in the PR or in the future:

Edge and block masks are cached in VPPredicator, but the block masks are still made available via VPRecipeBuilder, so they can be accessed during recipe construction. As a follow-up, this should be replaced by adding mask operands to all VPInstructions that need them and using those operands during recipe construction.
The mask caching in a map also means that this map needs updating each time a new recipe replaces a VPInstruction; this would also be handled by adding mask operands.
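
The map-update issue in the second point can be illustrated with a small sketch. It uses toy stand-ins (`Value`, `User`, `replaceOperandUses` and `updateCache` are hypothetical names, not LLVM classes) and assumes that replacing a value rewrites operand lists only, so a pointer held in a side map goes stale unless it is patched separately. The diff below does this patching with a loop over `RecipeBuilder.BlockMaskCache`.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy stand-ins for illustration; these are not the LLVM classes.
struct Value {
  std::string Name;
};

struct User {
  std::vector<Value *> Operands;
};

// Side cache mapping a block id to the value used as its mask.
using BlockMaskCache = std::map<int, Value *>;

// Rewrites operand lists only, in the spirit of a use-list based
// replaceAllUsesWith. Entries in side maps are not touched.
void replaceOperandUses(std::vector<User> &Users, Value *Old, Value *New) {
  for (User &U : Users)
    for (Value *&Op : U.Operands)
      if (Op == Old)
        Op = New;
}

// The extra fix-up a side map needs after a replacement.
void updateCache(BlockMaskCache &Cache, Value *Old, Value *New) {
  for (auto &Entry : Cache)
    if (Entry.second == Old)
      Entry.second = New;
}
```

If the mask were an operand of each instruction that needs it, the first function alone would keep everything consistent, which is the motivation for the follow-up suggested above.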

llvmbot · 2025-02-23T14:05:14Z

@llvm/pr-subscribers-vectorizers

Author: Florian Hahn (fhahn)

Changes

This patch moves the logic to predicate and linearize a VPlan to a dedicated VPlan transform.

The main logic to perform predication is ready for review, although there are a few things to note that should be improved, either directly in the PR or in the future:

Edge and block masks are cached in VPRecipeBuilder, so they can be accessed during recipe construction. A better alternative may be to add mask operands to all VPInstructions that need them and use those operands during recipe construction.
The mask caching in a map also means that this map needs updating each time a new recipe replaces a VPInstruction; this would also be handled by adding mask operands.

This is currently still WIP because early-exit loop handling does not work yet: the exit conditions are not available in the initial VPlans. This will be fixed with #128419 and follow-ups.

All tests except those for early-exit loops are passing.

Patch is 38.23 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/128420.diff

8 Files Affected:

(modified) llvm/lib/Transforms/Vectorize/CMakeLists.txt (+1)
(modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+27-259)
(modified) llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h (+18-27)
(modified) llvm/lib/Transforms/Vectorize/VPlanHCFGBuilder.cpp (+13-11)
(modified) llvm/lib/Transforms/Vectorize/VPlanHCFGBuilder.h (-12)
(added) llvm/lib/Transforms/Vectorize/VPlanPredicator.cpp (+274)
(modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp (+3-2)
(modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.h (+3)

diff --git a/llvm/lib/Transforms/Vectorize/CMakeLists.txt b/llvm/lib/Transforms/Vectorize/CMakeLists.txt
index 38670ba304e53..74ae61440327c 100644
--- a/llvm/lib/Transforms/Vectorize/CMakeLists.txt
+++ b/llvm/lib/Transforms/Vectorize/CMakeLists.txt
@@ -23,6 +23,7 @@ add_llvm_component_library(LLVMVectorize
   VPlan.cpp
   VPlanAnalysis.cpp
   VPlanHCFGBuilder.cpp
+  VPlanPredicator.cpp
   VPlanRecipes.cpp
   VPlanSLP.cpp
   VPlanTransforms.cpp
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index ced01df7b0d44..a2e20a701d612 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -8115,185 +8115,6 @@ void EpilogueVectorizerEpilogueLoop::printDebugTracesAtEnd() {
   });
 }
 
-void VPRecipeBuilder::createSwitchEdgeMasks(SwitchInst *SI) {
-  BasicBlock *Src = SI->getParent();
-  assert(!OrigLoop->isLoopExiting(Src) &&
-         all_of(successors(Src),
-                [this](BasicBlock *Succ) {
-                  return OrigLoop->getHeader() != Succ;
-                }) &&
-         "unsupported switch either exiting loop or continuing to header");
-  // Create masks where the terminator in Src is a switch. We create mask for
-  // all edges at the same time. This is more efficient, as we can create and
-  // collect compares for all cases once.
-  VPValue *Cond = getVPValueOrAddLiveIn(SI->getCondition());
-  BasicBlock *DefaultDst = SI->getDefaultDest();
-  MapVector<BasicBlock *, SmallVector<VPValue *>> Dst2Compares;
-  for (auto &C : SI->cases()) {
-    BasicBlock *Dst = C.getCaseSuccessor();
-    assert(!EdgeMaskCache.contains({Src, Dst}) && "Edge masks already created");
-    // Cases whose destination is the same as default are redundant and can be
-    // ignored - they will get there anyhow.
-    if (Dst == DefaultDst)
-      continue;
-    auto &Compares = Dst2Compares[Dst];
-    VPValue *V = getVPValueOrAddLiveIn(C.getCaseValue());
-    Compares.push_back(Builder.createICmp(CmpInst::ICMP_EQ, Cond, V));
-  }
-
-  // We need to handle 2 separate cases below for all entries in Dst2Compares,
-  // which excludes destinations matching the default destination.
-  VPValue *SrcMask = getBlockInMask(Src);
-  VPValue *DefaultMask = nullptr;
-  for (const auto &[Dst, Conds] : Dst2Compares) {
-    // 1. Dst is not the default destination. Dst is reached if any of the cases
-    // with destination == Dst are taken. Join the conditions for each case
-    // whose destination == Dst using an OR.
-    VPValue *Mask = Conds[0];
-    for (VPValue *V : ArrayRef<VPValue *>(Conds).drop_front())
-      Mask = Builder.createOr(Mask, V);
-    if (SrcMask)
-      Mask = Builder.createLogicalAnd(SrcMask, Mask);
-    EdgeMaskCache[{Src, Dst}] = Mask;
-
-    // 2. Create the mask for the default destination, which is reached if none
-    // of the cases with destination != default destination are taken. Join the
-    // conditions for each case where the destination is != Dst using an OR and
-    // negate it.
-    DefaultMask = DefaultMask ? Builder.createOr(DefaultMask, Mask) : Mask;
-  }
-
-  if (DefaultMask) {
-    DefaultMask = Builder.createNot(DefaultMask);
-    if (SrcMask)
-      DefaultMask = Builder.createLogicalAnd(SrcMask, DefaultMask);
-  }
-  EdgeMaskCache[{Src, DefaultDst}] = DefaultMask;
-}
-
-VPValue *VPRecipeBuilder::createEdgeMask(BasicBlock *Src, BasicBlock *Dst) {
-  assert(is_contained(predecessors(Dst), Src) && "Invalid edge");
-
-  // Look for cached value.
-  std::pair<BasicBlock *, BasicBlock *> Edge(Src, Dst);
-  EdgeMaskCacheTy::iterator ECEntryIt = EdgeMaskCache.find(Edge);
-  if (ECEntryIt != EdgeMaskCache.end())
-    return ECEntryIt->second;
-
-  if (auto *SI = dyn_cast<SwitchInst>(Src->getTerminator())) {
-    createSwitchEdgeMasks(SI);
-    assert(EdgeMaskCache.contains(Edge) && "Mask for Edge not created?");
-    return EdgeMaskCache[Edge];
-  }
-
-  VPValue *SrcMask = getBlockInMask(Src);
-
-  // The terminator has to be a branch inst!
-  BranchInst *BI = dyn_cast<BranchInst>(Src->getTerminator());
-  assert(BI && "Unexpected terminator found");
-  if (!BI->isConditional() || BI->getSuccessor(0) == BI->getSuccessor(1))
-    return EdgeMaskCache[Edge] = SrcMask;
-
-  // If source is an exiting block, we know the exit edge is dynamically dead
-  // in the vector loop, and thus we don't need to restrict the mask.  Avoid
-  // adding uses of an otherwise potentially dead instruction unless we are
-  // vectorizing a loop with uncountable exits. In that case, we always
-  // materialize the mask.
-  if (OrigLoop->isLoopExiting(Src) &&
-      Src != Legal->getUncountableEarlyExitingBlock())
-    return EdgeMaskCache[Edge] = SrcMask;
-
-  VPValue *EdgeMask = getVPValueOrAddLiveIn(BI->getCondition());
-  assert(EdgeMask && "No Edge Mask found for condition");
-
-  if (BI->getSuccessor(0) != Dst)
-    EdgeMask = Builder.createNot(EdgeMask, BI->getDebugLoc());
-
-  if (SrcMask) { // Otherwise block in-mask is all-one, no need to AND.
-    // The bitwise 'And' of SrcMask and EdgeMask introduces new UB if SrcMask
-    // is false and EdgeMask is poison. Avoid that by using 'LogicalAnd'
-    // instead which generates 'select i1 SrcMask, i1 EdgeMask, i1 false'.
-    EdgeMask = Builder.createLogicalAnd(SrcMask, EdgeMask, BI->getDebugLoc());
-  }
-
-  return EdgeMaskCache[Edge] = EdgeMask;
-}
-
-VPValue *VPRecipeBuilder::getEdgeMask(BasicBlock *Src, BasicBlock *Dst) const {
-  assert(is_contained(predecessors(Dst), Src) && "Invalid edge");
-
-  // Look for cached value.
-  std::pair<BasicBlock *, BasicBlock *> Edge(Src, Dst);
-  EdgeMaskCacheTy::const_iterator ECEntryIt = EdgeMaskCache.find(Edge);
-  assert(ECEntryIt != EdgeMaskCache.end() &&
-         "looking up mask for edge which has not been created");
-  return ECEntryIt->second;
-}
-
-void VPRecipeBuilder::createHeaderMask() {
-  BasicBlock *Header = OrigLoop->getHeader();
-
-  // When not folding the tail, use nullptr to model all-true mask.
-  if (!CM.foldTailByMasking()) {
-    BlockMaskCache[Header] = nullptr;
-    return;
-  }
-
-  // Introduce the early-exit compare IV <= BTC to form header block mask.
-  // This is used instead of IV < TC because TC may wrap, unlike BTC. Start by
-  // constructing the desired canonical IV in the header block as its first
-  // non-phi instructions.
-
-  VPBasicBlock *HeaderVPBB = Plan.getVectorLoopRegion()->getEntryBasicBlock();
-  auto NewInsertionPoint = HeaderVPBB->getFirstNonPhi();
-  auto *IV = new VPWidenCanonicalIVRecipe(Plan.getCanonicalIV());
-  HeaderVPBB->insert(IV, NewInsertionPoint);
-
-  VPBuilder::InsertPointGuard Guard(Builder);
-  Builder.setInsertPoint(HeaderVPBB, NewInsertionPoint);
-  VPValue *BlockMask = nullptr;
-  VPValue *BTC = Plan.getOrCreateBackedgeTakenCount();
-  BlockMask = Builder.createICmp(CmpInst::ICMP_ULE, IV, BTC);
-  BlockMaskCache[Header] = BlockMask;
-}
-
-VPValue *VPRecipeBuilder::getBlockInMask(BasicBlock *BB) const {
-  // Return the cached value.
-  BlockMaskCacheTy::const_iterator BCEntryIt = BlockMaskCache.find(BB);
-  assert(BCEntryIt != BlockMaskCache.end() &&
-         "Trying to access mask for block without one.");
-  return BCEntryIt->second;
-}
-
-void VPRecipeBuilder::createBlockInMask(BasicBlock *BB) {
-  assert(OrigLoop->contains(BB) && "Block is not a part of a loop");
-  assert(BlockMaskCache.count(BB) == 0 && "Mask for block already computed");
-  assert(OrigLoop->getHeader() != BB &&
-         "Loop header must have cached block mask");
-
-  // All-one mask is modelled as no-mask following the convention for masked
-  // load/store/gather/scatter. Initialize BlockMask to no-mask.
-  VPValue *BlockMask = nullptr;
-  // This is the block mask. We OR all unique incoming edges.
-  for (auto *Predecessor :
-       SetVector<BasicBlock *>(pred_begin(BB), pred_end(BB))) {
-    VPValue *EdgeMask = createEdgeMask(Predecessor, BB);
-    if (!EdgeMask) { // Mask of predecessor is all-one so mask of block is too.
-      BlockMaskCache[BB] = EdgeMask;
-      return;
-    }
-
-    if (!BlockMask) { // BlockMask has its initialized nullptr value.
-      BlockMask = EdgeMask;
-      continue;
-    }
-
-    BlockMask = Builder.createOr(BlockMask, EdgeMask, {});
-  }
-
-  BlockMaskCache[BB] = BlockMask;
-}
-
 VPWidenMemoryRecipe *
 VPRecipeBuilder::tryToWidenMemory(Instruction *I, ArrayRef<VPValue *> Operands,
                                   VFRange &Range) {
@@ -8318,7 +8139,7 @@ VPRecipeBuilder::tryToWidenMemory(Instruction *I, ArrayRef<VPValue *> Operands,
 
   VPValue *Mask = nullptr;
   if (Legal->isMaskRequired(I))
-    Mask = getBlockInMask(I->getParent());
+    Mask = getBlockInMask(Builder.getInsertBlock());
 
   // Determine if the pointer operand of the access is either consecutive or
   // reverse consecutive.
@@ -8437,38 +8258,6 @@ VPWidenIntOrFpInductionRecipe *VPRecipeBuilder::tryToOptimizeInductionTruncate(
   return nullptr;
 }
 
-VPBlendRecipe *VPRecipeBuilder::tryToBlend(PHINode *Phi,
-                                           ArrayRef<VPValue *> Operands) {
-  unsigned NumIncoming = Phi->getNumIncomingValues();
-
-  // We know that all PHIs in non-header blocks are converted into selects, so
-  // we don't have to worry about the insertion order and we can just use the
-  // builder. At this point we generate the predication tree. There may be
-  // duplications since this is a simple recursive scan, but future
-  // optimizations will clean it up.
-
-  // Map incoming IR BasicBlocks to incoming VPValues, for lookup below.
-  // TODO: Add operands and masks in order from the VPlan predecessors.
-  DenseMap<BasicBlock *, VPValue *> VPIncomingValues;
-  for (const auto &[Idx, Pred] : enumerate(predecessors(Phi->getParent())))
-    VPIncomingValues[Pred] = Operands[Idx];
-
-  SmallVector<VPValue *, 2> OperandsWithMask;
-  for (unsigned In = 0; In < NumIncoming; In++) {
-    BasicBlock *Pred = Phi->getIncomingBlock(In);
-    OperandsWithMask.push_back(VPIncomingValues.lookup(Pred));
-    VPValue *EdgeMask = getEdgeMask(Pred, Phi->getParent());
-    if (!EdgeMask) {
-      assert(In == 0 && "Both null and non-null edge masks found");
-      assert(all_equal(Operands) &&
-             "Distinct incoming values with one having a full mask");
-      break;
-    }
-    OperandsWithMask.push_back(EdgeMask);
-  }
-  return new VPBlendRecipe(Phi, OperandsWithMask);
-}
-
 VPSingleDefRecipe *VPRecipeBuilder::tryToWidenCall(CallInst *CI,
                                                    ArrayRef<VPValue *> Operands,
                                                    VFRange &Range) {
@@ -8544,7 +8333,7 @@ VPSingleDefRecipe *VPRecipeBuilder::tryToWidenCall(CallInst *CI,
       //      all-true mask.
       VPValue *Mask = nullptr;
       if (Legal->isMaskRequired(CI))
-        Mask = getBlockInMask(CI->getParent());
+        Mask = getBlockInMask(Builder.getInsertBlock());
       else
         Mask = Plan.getOrAddLiveIn(
             ConstantInt::getTrue(IntegerType::getInt1Ty(CI->getContext())));
@@ -8586,7 +8375,7 @@ VPWidenRecipe *VPRecipeBuilder::tryToWiden(Instruction *I,
     // div/rem operation itself.  Otherwise fall through to general handling below.
     if (CM.isPredicatedInst(I)) {
       SmallVector<VPValue *> Ops(Operands);
-      VPValue *Mask = getBlockInMask(I->getParent());
+      VPValue *Mask = getBlockInMask(Builder.getInsertBlock());
       VPValue *One =
           Plan.getOrAddLiveIn(ConstantInt::get(I->getType(), 1u, false));
       auto *SafeRHS = Builder.createSelect(Mask, Ops[1], One, I->getDebugLoc());
@@ -8668,7 +8457,7 @@ VPRecipeBuilder::tryToWidenHistogram(const HistogramInfo *HI,
   // In case of predicated execution (due to tail-folding, or conditional
   // execution, or both), pass the relevant mask.
   if (Legal->isMaskRequired(HI->Store))
-    HGramOps.push_back(getBlockInMask(HI->Store->getParent()));
+    HGramOps.push_back(getBlockInMask(Builder.getInsertBlock()));
 
   return new VPHistogramRecipe(Opcode,
                                make_range(HGramOps.begin(), HGramOps.end()),
@@ -8724,7 +8513,7 @@ VPRecipeBuilder::handleReplication(Instruction *I, ArrayRef<VPValue *> Operands,
     // added initially. Masked replicate recipes will later be placed under an
     // if-then construct to prevent side-effects. Generate recipes to compute
     // the block mask for this region.
-    BlockInMask = getBlockInMask(I->getParent());
+    BlockInMask = getBlockInMask(Builder.getInsertBlock());
   }
 
   // Note that there is some custom logic to mark some intrinsics as uniform
@@ -8857,9 +8646,8 @@ VPRecipeBase *VPRecipeBuilder::tryToCreateWidenRecipe(
   // nodes, calls and memory operations.
   VPRecipeBase *Recipe;
   if (auto *Phi = dyn_cast<PHINode>(Instr)) {
-    if (Phi->getParent() != OrigLoop->getHeader())
-      return tryToBlend(Phi, Operands);
-
+    assert(Phi->getParent() == OrigLoop->getHeader() &&
+           "Non-header phis should have been handled during predication");
     assert(Operands.size() == 2 && "Must have 2 operands for header phis");
     if ((Recipe = tryToOptimizeInductionPHI(Phi, Operands, Range)))
       return Recipe;
@@ -8964,7 +8752,7 @@ VPRecipeBuilder::tryToCreatePartialReduction(Instruction *Reduction,
             ReductionOpcode == Instruction::Sub) &&
            "Expected an ADD or SUB operation for predicated partial "
            "reductions (because the neutral element in the mask is zero)!");
-    VPValue *Mask = getBlockInMask(Reduction->getParent());
+    VPValue *Mask = getBlockInMask(Builder.getInsertBlock());
     VPValue *Zero =
         Plan.getOrAddLiveIn(ConstantInt::get(Reduction->getType(), 0));
     BinOp = Builder.createSelect(Mask, BinOp, Zero, Reduction->getDebugLoc());
@@ -9332,9 +9120,6 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
   bool HasNUW = !IVUpdateMayOverflow || Style == TailFoldingStyle::None;
   addCanonicalIVRecipes(*Plan, Legal->getWidestInductionType(), HasNUW, DL);
 
-  VPRecipeBuilder RecipeBuilder(*Plan, OrigLoop, TLI, &TTI, Legal, CM, PSE,
-                                Builder);
-
   // ---------------------------------------------------------------------------
   // Pre-construction: record ingredients whose recipes we'll need to further
   // process after constructing the initial VPlan.
@@ -9375,39 +9160,24 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
         return Legal->blockNeedsPredication(BB) || NeedsBlends;
       });
 
-  RecipeBuilder.collectScaledReductions(Range);
 
   auto *MiddleVPBB = Plan->getMiddleBlock();
 
+  VPRecipeBuilder RecipeBuilder(*Plan, OrigLoop, TLI, &TTI, Legal, CM, PSE,
+                                Builder);
+  if (NeedsMasks) {
+    VPlanTransforms::predicateAndLinearize(*Plan, CM.foldTailByMasking(),
+                                           RecipeBuilder);
+  }
+  RecipeBuilder.collectScaledReductions(Range);
+
   // Scan the body of the loop in a topological order to visit each basic block
   // after having visited its predecessor basic blocks.
   ReversePostOrderTraversal<VPBlockShallowTraversalWrapper<VPBlockBase *>> RPOT(
       HeaderVPBB);
 
   VPBasicBlock::iterator MBIP = MiddleVPBB->getFirstNonPhi();
-  VPBlockBase *PrevVPBB = nullptr;
   for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(RPOT)) {
-    // Handle VPBBs down to the latch.
-    if (VPBB == LoopRegion->getExiting()) {
-      assert(!HCFGBuilder.getIRBBForVPB(VPBB) &&
-             "the latch block shouldn't have a corresponding IRBB");
-      VPBlockUtils::connectBlocks(PrevVPBB, VPBB);
-      break;
-    }
-
-    // Create mask based on the IR BB corresponding to VPBB.
-    // TODO: Predicate directly based on VPlan.
-    Builder.setInsertPoint(VPBB, VPBB->begin());
-    if (VPBB == HeaderVPBB) {
-      Builder.setInsertPoint(VPBB, VPBB->getFirstNonPhi());
-      RecipeBuilder.createHeaderMask();
-    } else if (NeedsMasks) {
-      // FIXME: At the moment, masks need to be placed at the beginning of the
-      // block, as blends introduced for phi nodes need to use it. The created
-      // blends should be sunk after the mask recipes.
-      RecipeBuilder.createBlockInMask(HCFGBuilder.getIRBBForVPB(VPBB));
-    }
-
     // Convert input VPInstructions to widened recipes.
     for (VPRecipeBase &R : make_early_inc_range(*VPBB)) {
       auto *SingleDef = cast<VPSingleDefRecipe>(&R);
@@ -9417,7 +9187,8 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
       // latter are added above for masking.
       // FIXME: Migrate code relying on the underlying instruction from VPlan0
       // to construct recipes below to not use the underlying instruction.
-      if (isa<VPCanonicalIVPHIRecipe, VPWidenCanonicalIVRecipe>(&R) ||
+      if (isa<VPCanonicalIVPHIRecipe, VPWidenCanonicalIVRecipe, VPBlendRecipe>(
+              &R) ||
           (isa<VPInstruction>(&R) && !UnderlyingValue))
         continue;
 
@@ -9469,22 +9240,18 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
       } else {
         Builder.insert(Recipe);
       }
-      if (Recipe->getNumDefinedValues() == 1)
+      if (Recipe->getNumDefinedValues() == 1) {
         SingleDef->replaceAllUsesWith(Recipe->getVPSingleValue());
-      else
+        for (auto &[_, V] : RecipeBuilder.BlockMaskCache) {
+          if (V == SingleDef)
+            V = Recipe->getVPSingleValue();
+        }
+      } else
         assert(Recipe->getNumDefinedValues() == 0 &&
                "Unexpected multidef recipe");
       R.eraseFromParent();
     }
 
-    // Flatten the CFG in the loop. Masks for blocks have already been generated
-    // and added to recipes as needed. To do so, first disconnect VPBB from its
-    // successors. Then connect VPBB to the previously visited VPBB.
-    for (auto *Succ : to_vector(VPBB->getSuccessors()))
-      VPBlockUtils::disconnectBlocks(VPBB, Succ);
-    if (PrevVPBB)
-      VPBlockUtils::connectBlocks(PrevVPBB, VPBB);
-    PrevVPBB = VPBB;
   }
 
   assert(isa<VPRegionBlock>(Plan->getVectorLoopRegion()) &&
@@ -9783,7 +9550,7 @@ void LoopVectorizationPlanner::adjustRecipesForReductions(
       BasicBlock *BB = CurrentLinkI->getParent();
       VPValue *CondOp = nullptr;
       if (CM.blockNeedsPredicationForAnyReason(BB))
-        CondOp = RecipeBuilder.getBlockInMask(BB);
+        CondOp = RecipeBuilder.getBlockInMask(CurrentLink->getParent());
 
       auto *RedRecipe = new VPReductionRecipe(
           RdxDesc, CurrentLinkI, PreviousLink, VecOp, CondOp,
@@ -9818,7 +9585,8 @@ void LoopVectorizationPlanner::adjustRecipesForReductions(
     // different numbers of lanes. Partial reductions mask the input instead.
     if (!PhiR->isInLoop() && CM.foldTailByMasking() &&
         !isa<VPPartialReductionRecipe>(OrigExitingVPV->getDefiningRecipe())) {
-      VPValue *Cond = RecipeBuilder.getBlockInMask(OrigLoop->getHeader());
+      VPValue *Cond =
+          RecipeBuilder.getBlockInMask(VectorLoopRegion->getEntryBasicBlock());
       assert(OrigExitingVPV->getDefiningRecipe()->getParent() != LatchVPBB &&
              "reduction recipe must be defined before latch");
       Type *PhiTy = PhiR->getOperand(0)->getLiveInIRValue()->getType();
diff --git a/llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h b/llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h
index 334cfbad8bd7c..9900c4117c5f6 100644
--- a/llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h
+++ b/llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h
@@ -73,11 +73,14 @@ class VPRecipeBuilder {
   /// if-conversion currently takes place during VPlan-construction, so these
   /// caches are only used at that stage.
   using EdgeMaskCacheTy =
-      DenseMap<std::pair<BasicBlock *, BasicBlock *>, VPValue *>;
-  using BlockMaskCacheTy = DenseMap<BasicBlock *, VPValue *>;
+      DenseMap<std::pair<VPBasicBlock *, VPBasicBlock *>, VPValue *>;
+  using BlockMaskCacheTy = DenseMap<VPBasicBlock *, VPValue *>;
   EdgeMaskCacheTy EdgeMaskCache;
+
+public:
   BlockMaskCacheTy BlockMaskCache;
 
+private:
   // VPlan construction support: Hold a mapping from ingredients to
   // their recipe.
   DenseMap<Instruction *, VPRecipeBase *> Ingredient2Recipe;
@@ -114,11 +117,6 @@ class VPRecipeBuilder {
   tryToOptimizeInductionTruncate(TruncInst *I, ArrayRef<VPValue *> Operands,
                                  VFRange &Range);
 
-  /// Handle non-...
[truncated]

llvmbot · 2025-02-23T14:05:14Z

@llvm/pr-subscribers-llvm-transforms

-  // duplications since this is a simple recursive scan, but future
-  // optimizations will clean it up.
-
-  // Map incoming IR BasicBlocks to incoming VPValues, for lookup below.
-  // TODO: Add operands and masks in order from the VPlan predecessors.
-  DenseMap<BasicBlock *, VPValue *> VPIncomingValues;
-  for (const auto &[Idx, Pred] : enumerate(predecessors(Phi->getParent())))
-    VPIncomingValues[Pred] = Operands[Idx];
-
-  SmallVector<VPValue *, 2> OperandsWithMask;
-  for (unsigned In = 0; In < NumIncoming; In++) {
-    BasicBlock *Pred = Phi->getIncomingBlock(In);
-    OperandsWithMask.push_back(VPIncomingValues.lookup(Pred));
-    VPValue *EdgeMask = getEdgeMask(Pred, Phi->getParent());
-    if (!EdgeMask) {
-      assert(In == 0 && "Both null and non-null edge masks found");
-      assert(all_equal(Operands) &&
-             "Distinct incoming values with one having a full mask");
-      break;
-    }
-    OperandsWithMask.push_back(EdgeMask);
-  }
-  return new VPBlendRecipe(Phi, OperandsWithMask);
-}
-
 VPSingleDefRecipe *VPRecipeBuilder::tryToWidenCall(CallInst *CI,
                                                    ArrayRef<VPValue *> Operands,
                                                    VFRange &Range) {
@@ -8544,7 +8333,7 @@ VPSingleDefRecipe *VPRecipeBuilder::tryToWidenCall(CallInst *CI,
       //      all-true mask.
       VPValue *Mask = nullptr;
       if (Legal->isMaskRequired(CI))
-        Mask = getBlockInMask(CI->getParent());
+        Mask = getBlockInMask(Builder.getInsertBlock());
       else
         Mask = Plan.getOrAddLiveIn(
             ConstantInt::getTrue(IntegerType::getInt1Ty(CI->getContext())));
@@ -8586,7 +8375,7 @@ VPWidenRecipe *VPRecipeBuilder::tryToWiden(Instruction *I,
     // div/rem operation itself.  Otherwise fall through to general handling below.
     if (CM.isPredicatedInst(I)) {
       SmallVector<VPValue *> Ops(Operands);
-      VPValue *Mask = getBlockInMask(I->getParent());
+      VPValue *Mask = getBlockInMask(Builder.getInsertBlock());
       VPValue *One =
           Plan.getOrAddLiveIn(ConstantInt::get(I->getType(), 1u, false));
       auto *SafeRHS = Builder.createSelect(Mask, Ops[1], One, I->getDebugLoc());
@@ -8668,7 +8457,7 @@ VPRecipeBuilder::tryToWidenHistogram(const HistogramInfo *HI,
   // In case of predicated execution (due to tail-folding, or conditional
   // execution, or both), pass the relevant mask.
   if (Legal->isMaskRequired(HI->Store))
-    HGramOps.push_back(getBlockInMask(HI->Store->getParent()));
+    HGramOps.push_back(getBlockInMask(Builder.getInsertBlock()));
 
   return new VPHistogramRecipe(Opcode,
                                make_range(HGramOps.begin(), HGramOps.end()),
@@ -8724,7 +8513,7 @@ VPRecipeBuilder::handleReplication(Instruction *I, ArrayRef<VPValue *> Operands,
     // added initially. Masked replicate recipes will later be placed under an
     // if-then construct to prevent side-effects. Generate recipes to compute
     // the block mask for this region.
-    BlockInMask = getBlockInMask(I->getParent());
+    BlockInMask = getBlockInMask(Builder.getInsertBlock());
   }
 
   // Note that there is some custom logic to mark some intrinsics as uniform
@@ -8857,9 +8646,8 @@ VPRecipeBase *VPRecipeBuilder::tryToCreateWidenRecipe(
   // nodes, calls and memory operations.
   VPRecipeBase *Recipe;
   if (auto *Phi = dyn_cast<PHINode>(Instr)) {
-    if (Phi->getParent() != OrigLoop->getHeader())
-      return tryToBlend(Phi, Operands);
-
+    assert(Phi->getParent() == OrigLoop->getHeader() &&
+           "Non-header phis should have been handled during predication");
     assert(Operands.size() == 2 && "Must have 2 operands for header phis");
     if ((Recipe = tryToOptimizeInductionPHI(Phi, Operands, Range)))
       return Recipe;
@@ -8964,7 +8752,7 @@ VPRecipeBuilder::tryToCreatePartialReduction(Instruction *Reduction,
             ReductionOpcode == Instruction::Sub) &&
            "Expected an ADD or SUB operation for predicated partial "
            "reductions (because the neutral element in the mask is zero)!");
-    VPValue *Mask = getBlockInMask(Reduction->getParent());
+    VPValue *Mask = getBlockInMask(Builder.getInsertBlock());
     VPValue *Zero =
         Plan.getOrAddLiveIn(ConstantInt::get(Reduction->getType(), 0));
     BinOp = Builder.createSelect(Mask, BinOp, Zero, Reduction->getDebugLoc());
@@ -9332,9 +9120,6 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
   bool HasNUW = !IVUpdateMayOverflow || Style == TailFoldingStyle::None;
   addCanonicalIVRecipes(*Plan, Legal->getWidestInductionType(), HasNUW, DL);
 
-  VPRecipeBuilder RecipeBuilder(*Plan, OrigLoop, TLI, &TTI, Legal, CM, PSE,
-                                Builder);
-
   // ---------------------------------------------------------------------------
   // Pre-construction: record ingredients whose recipes we'll need to further
   // process after constructing the initial VPlan.
@@ -9375,39 +9160,24 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
         return Legal->blockNeedsPredication(BB) || NeedsBlends;
       });
 
-  RecipeBuilder.collectScaledReductions(Range);
 
   auto *MiddleVPBB = Plan->getMiddleBlock();
 
+  VPRecipeBuilder RecipeBuilder(*Plan, OrigLoop, TLI, &TTI, Legal, CM, PSE,
+                                Builder);
+  if (NeedsMasks) {
+    VPlanTransforms::predicateAndLinearize(*Plan, CM.foldTailByMasking(),
+                                           RecipeBuilder);
+  }
+  RecipeBuilder.collectScaledReductions(Range);
+
   // Scan the body of the loop in a topological order to visit each basic block
   // after having visited its predecessor basic blocks.
   ReversePostOrderTraversal<VPBlockShallowTraversalWrapper<VPBlockBase *>> RPOT(
       HeaderVPBB);
 
   VPBasicBlock::iterator MBIP = MiddleVPBB->getFirstNonPhi();
-  VPBlockBase *PrevVPBB = nullptr;
   for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(RPOT)) {
-    // Handle VPBBs down to the latch.
-    if (VPBB == LoopRegion->getExiting()) {
-      assert(!HCFGBuilder.getIRBBForVPB(VPBB) &&
-             "the latch block shouldn't have a corresponding IRBB");
-      VPBlockUtils::connectBlocks(PrevVPBB, VPBB);
-      break;
-    }
-
-    // Create mask based on the IR BB corresponding to VPBB.
-    // TODO: Predicate directly based on VPlan.
-    Builder.setInsertPoint(VPBB, VPBB->begin());
-    if (VPBB == HeaderVPBB) {
-      Builder.setInsertPoint(VPBB, VPBB->getFirstNonPhi());
-      RecipeBuilder.createHeaderMask();
-    } else if (NeedsMasks) {
-      // FIXME: At the moment, masks need to be placed at the beginning of the
-      // block, as blends introduced for phi nodes need to use it. The created
-      // blends should be sunk after the mask recipes.
-      RecipeBuilder.createBlockInMask(HCFGBuilder.getIRBBForVPB(VPBB));
-    }
-
     // Convert input VPInstructions to widened recipes.
     for (VPRecipeBase &R : make_early_inc_range(*VPBB)) {
       auto *SingleDef = cast<VPSingleDefRecipe>(&R);
@@ -9417,7 +9187,8 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
       // latter are added above for masking.
       // FIXME: Migrate code relying on the underlying instruction from VPlan0
       // to construct recipes below to not use the underlying instruction.
-      if (isa<VPCanonicalIVPHIRecipe, VPWidenCanonicalIVRecipe>(&R) ||
+      if (isa<VPCanonicalIVPHIRecipe, VPWidenCanonicalIVRecipe, VPBlendRecipe>(
+              &R) ||
           (isa<VPInstruction>(&R) && !UnderlyingValue))
         continue;
 
@@ -9469,22 +9240,18 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
       } else {
         Builder.insert(Recipe);
       }
-      if (Recipe->getNumDefinedValues() == 1)
+      if (Recipe->getNumDefinedValues() == 1) {
         SingleDef->replaceAllUsesWith(Recipe->getVPSingleValue());
-      else
+        for (auto &[_, V] : RecipeBuilder.BlockMaskCache) {
+          if (V == SingleDef)
+            V = Recipe->getVPSingleValue();
+        }
+      } else
         assert(Recipe->getNumDefinedValues() == 0 &&
                "Unexpected multidef recipe");
       R.eraseFromParent();
     }
 
-    // Flatten the CFG in the loop. Masks for blocks have already been generated
-    // and added to recipes as needed. To do so, first disconnect VPBB from its
-    // successors. Then connect VPBB to the previously visited VPBB.
-    for (auto *Succ : to_vector(VPBB->getSuccessors()))
-      VPBlockUtils::disconnectBlocks(VPBB, Succ);
-    if (PrevVPBB)
-      VPBlockUtils::connectBlocks(PrevVPBB, VPBB);
-    PrevVPBB = VPBB;
   }
 
   assert(isa<VPRegionBlock>(Plan->getVectorLoopRegion()) &&
@@ -9783,7 +9550,7 @@ void LoopVectorizationPlanner::adjustRecipesForReductions(
       BasicBlock *BB = CurrentLinkI->getParent();
       VPValue *CondOp = nullptr;
       if (CM.blockNeedsPredicationForAnyReason(BB))
-        CondOp = RecipeBuilder.getBlockInMask(BB);
+        CondOp = RecipeBuilder.getBlockInMask(CurrentLink->getParent());
 
       auto *RedRecipe = new VPReductionRecipe(
           RdxDesc, CurrentLinkI, PreviousLink, VecOp, CondOp,
@@ -9818,7 +9585,8 @@ void LoopVectorizationPlanner::adjustRecipesForReductions(
     // different numbers of lanes. Partial reductions mask the input instead.
     if (!PhiR->isInLoop() && CM.foldTailByMasking() &&
         !isa<VPPartialReductionRecipe>(OrigExitingVPV->getDefiningRecipe())) {
-      VPValue *Cond = RecipeBuilder.getBlockInMask(OrigLoop->getHeader());
+      VPValue *Cond =
+          RecipeBuilder.getBlockInMask(VectorLoopRegion->getEntryBasicBlock());
       assert(OrigExitingVPV->getDefiningRecipe()->getParent() != LatchVPBB &&
              "reduction recipe must be defined before latch");
       Type *PhiTy = PhiR->getOperand(0)->getLiveInIRValue()->getType();
diff --git a/llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h b/llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h
index 334cfbad8bd7c..9900c4117c5f6 100644
--- a/llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h
+++ b/llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h
@@ -73,11 +73,14 @@ class VPRecipeBuilder {
   /// if-conversion currently takes place during VPlan-construction, so these
   /// caches are only used at that stage.
   using EdgeMaskCacheTy =
-      DenseMap<std::pair<BasicBlock *, BasicBlock *>, VPValue *>;
-  using BlockMaskCacheTy = DenseMap<BasicBlock *, VPValue *>;
+      DenseMap<std::pair<VPBasicBlock *, VPBasicBlock *>, VPValue *>;
+  using BlockMaskCacheTy = DenseMap<VPBasicBlock *, VPValue *>;
   EdgeMaskCacheTy EdgeMaskCache;
+
+public:
   BlockMaskCacheTy BlockMaskCache;
 
+private:
   // VPlan construction support: Hold a mapping from ingredients to
   // their recipe.
   DenseMap<Instruction *, VPRecipeBase *> Ingredient2Recipe;
@@ -114,11 +117,6 @@ class VPRecipeBuilder {
   tryToOptimizeInductionTruncate(TruncInst *I, ArrayRef<VPValue *> Operands,
                                  VFRange &Range);
 
-  /// Handle non-...
[truncated]

github-actions · 2025-02-23T14:08:32Z

✅ With the latest revision this PR passed the C/C++ code formatter.

fhahn · 2025-03-30T16:24:51Z

Still WIP, but early-exits are now handled properly as well, by retaining exit branches during initial construction.

This needs to be split up, which I'll start once #129402 lands

Update initial VPlan construction to include exit conditions and edges. For now, all early exits are disconnected before forming the regions, but a follow-up will update uncountable exit handling to also happen here. This is required to enable VPlan predication and remove the dependence on any IR BBs (llvm#128420). This includes updates in a few places to use replaceSuccessor/replacePredecessor to preserve the order of predecessors and successors, to reduce the need of fixing up phi operand orderings. This unfortunately required making them public, not sure if there's a

Move early-exit handling up front to original VPlan construction, before introducing early exits. This builds on llvm#137709, which adds exiting edges to the original VPlan, instead of adding exit blocks later. This retains the exit conditions early, and means we can handle early exits before forming regions, without the reliance on VPRecipeBuilder. Once we retain all exits initially, handling early exits before region construction ensures the regions are valid; otherwise we would leave edges exiting the region from elsewhere than the latch. Removing the reliance on VPRecipeBuilder removes the dependence on mapping IR BBs to VPBBs and unblocks predication as VPlan transform: llvm#128420. Depends on llvm#137709.

…7709) Update initial VPlan construction to include exit conditions and edges. The loop region is now first constructed without entry/exiting. Those are set after inserting the region in the CFG, to preserve the original predecessor/successor order of blocks. For now, all early exits are disconnected before forming the regions, but a follow-up will update uncountable exit handling to also happen here. This is required to enable VPlan predication and remove the dependence on any IR BBs (#128420). PR: #137709

…(NFC). (#137709) Update initial VPlan construction to include exit conditions and edges. The loop region is now first constructed without entry/exiting. Those are set after inserting the region in the CFG, to preserve the original predecessor/successor order of blocks. For now, all early exits are disconnected before forming the regions, but a follow-up will update uncountable exit handling to also happen here. This is required to enable VPlan predication and remove the dependence on any IR BBs (llvm/llvm-project#128420). PR: llvm/llvm-project#137709

This allows migrating some more code to be based on VPBBs in VPRecipeBuilder, in preparation for #128420.

This allows migrating some more code to be based on VPBBs in VPRecipeBuilder, in preparation for llvm/llvm-project#128420.

Update recipe construction to use VPBBs to look up masks, in preparation for #128420.

fhahn · 2025-05-18T18:41:02Z

May be worth reviewing the "native" VPlanPredicator logic introduced in https://reviews.llvm.org/D53349 and removed in https://reviews.llvm.org/D123017.

Might be good as a follow-up to potentially improve the predication implementation, once we have completed the NFC move and the transition? Although the original VPlanPredicator may need more work, as it was not enabled by default even in the native path and was only tested via C++ unit tests.

ayalz

This LGTM, thanks!
Raised several suggestions, can also be addressed as follow-up.

ayalz · 2025-05-18T18:16:49Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

+  // to remove the need to keep a map of masks beyond the predication
+  // transform.
+  RecipeBuilder.updateBlockMaskCache(Old2New);
+  for (const auto &[Old, New] : Old2New)


Suggested change

for (const auto &[Old, New] : Old2New)

for (const auto &[Old, _] : Old2New)

?

Done thanks

ayalz · 2025-05-18T18:20:45Z

llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h

+  void updateBlockMaskCache(const DenseMap<VPValue *, VPValue *> &Old2New) {
+    for (auto &[_, V] : BlockMaskCache) {
+      if (auto *New = Old2New.lookup(V)) {
+        V->replaceAllUsesWith(New);


nit: worth removing V from Old2New now?

Cannot be done for now, as Old2New is used to erase old recipes after updateBlockMaskCache
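
The ordering constraint in this reply can be illustrated with a small stand-alone model. This is only a sketch under assumptions, not the LLVM code: `Value`, `remapCache` and `eraseReplaced` are hypothetical names, and standard containers stand in for `DenseMap` and for recipe ownership. The point it shows is that the old-to-new map is read twice, first to retarget cached masks and then to erase the replaced values, so entries cannot be dropped during the first pass.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a VPValue; not the LLVM class.
struct Value {
  std::string Name;
};

using MaskCache = std::map<std::string, Value *>; // block name -> mask
using Old2NewMap = std::map<Value *, Value *>;    // replaced -> replacement

// Pass 1: point every cached mask at its replacement, if it has one.
// Old2New is left untouched because pass 2 still needs every entry.
inline void remapCache(MaskCache &Cache, const Old2NewMap &Old2New) {
  for (auto &Entry : Cache) {
    auto It = Old2New.find(Entry.second);
    if (It == Old2New.end())
      continue; // Includes the null (all-true) mask, which has no entry.
    assert(It->second && "replacement must not be null");
    Entry.second = It->second;
  }
}

// Pass 2: erase every replaced value from the owning storage, driven by the
// same map. Had pass 1 removed entries, these values would be leaked here.
inline void eraseReplaced(std::vector<std::unique_ptr<Value>> &Storage,
                          const Old2NewMap &Old2New) {
  Storage.erase(std::remove_if(Storage.begin(), Storage.end(),
                               [&](const std::unique_ptr<Value> &V) {
                                 return Old2New.count(V.get()) != 0;
                               }),
                Storage.end());
}
```

Calling `remapCache` followed by `eraseReplaced` leaves the cache pointing only at live values; swapping the order would leave dangling pointers in the cache.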

ayalz · 2025-05-18T18:21:28Z

llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp

@@ -66,8 +66,7 @@ class PlainCFGBuilder {
      : TheLoop(Lp), LI(LI), Plan(std::make_unique<VPlan>(Lp)) {}

  /// Build plain CFG for TheLoop  and connects it to Plan's entry.


Suggested change

/// Build plain CFG for TheLoop and connects it to Plan's entry.

/// Build plain CFG for TheLoop and connect it to Plan's entry.

Updated thanks.

ayalz · 2025-05-19T08:17:41Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

@@ -9488,7 +9267,8 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range,
      // latter are added above for masking.


Follow-up: have this stage take care of widening original scalar recipes, including canonical IV, blend, and masking recipes (underlying-less VPInstructions)?

ayalz · 2025-05-19T08:19:00Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

-      });
-
+  // ---------------------------------------------------------------------------
+  // Construct recipes for the instructions in the loop


Suggested change

// Construct recipes for the instructions in the loop

// Construct wide recipes and apply predication for original scalar VPInstructions in the loop.

?

Follow-up: outline this into a VPlanTransform?

Yep plan to do so, thanks

ayalz · 2025-05-20T13:26:47Z

llvm/lib/Transforms/Vectorize/VPlanPredicator.cpp

+}
+
+VPValue *VPPredicator::createBlockInMask(VPBasicBlock *VPBB) {
+  Builder.setInsertPoint(VPBB, VPBB->begin());


Perhaps better to

Suggested change

Builder.setInsertPoint(VPBB, VPBB->begin());

Builder.setInsertPoint(VPBB, VPBB->getFirstNonPhi());

as this keeps phi's in order and allows subsequent traversals of phis() to convert them into blends?

It needs to stay as-is for now, as blends need masks that have been created earlier. Will check and adjust separately.

ayalz · 2025-05-20T13:30:53Z

llvm/lib/Transforms/Vectorize/VPlanPredicator.cpp

+    SmallVector<VPWidenPHIRecipe *> Phis;
+    for (VPRecipeBase &R : VPBB->phis())
+      Phis.push_back(cast<VPWidenPHIRecipe>(&R));
+
+    Predicator.createBlockInMask(VPBB);


Suggested change

SmallVector<VPWidenPHIRecipe *> Phis;

for (VPRecipeBase &R : VPBB->phis())

Phis.push_back(cast<VPWidenPHIRecipe>(&R));

Predicator.createBlockInMask(VPBB);

Predicator.createBlockInMask(VPBB);

SmallVector<VPWidenPHIRecipe *> Phis;

for (VPRecipeBase &R : VPBB->phis())

Phis.push_back(cast<VPWidenPHIRecipe>(&R));

seems a bit more consistent as Phis are part of the "PhiToBlends" below while createBlockInMask() are part of "introducingBlockMasks" started above with header mask; provided createBlockInMask sets its insert point after all phi's.
Would make_early_inc_range suffice instead of copying into a SmallVector?

Yep, adjusted to insert blends using Builder.insert, removing the need for a vector.

ayalz · 2025-05-20T13:49:07Z

llvm/lib/Transforms/Vectorize/VPlanPredicator.cpp

+  // Linearize the blocks of the loop into one serial chain.
+  VPBlockBase *PrevVPBB = nullptr;
+  for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(RPOT)) {
+    // Handle VPBBs down to the latch.


This "Handle VPBBs down to latch" early-break is needed when traversing CFG to stop RPOT from going out of the loop. Is it still needed here where RPOT traverses the region, shallowly? If so, is it needed in the createBlockMasks/convertPhisToBlends loop above too?

Removed, thanks
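
For readers following the linearization step discussed here, the following is a minimal stand-alone sketch of the idea: visit blocks in reverse post-order, drop each block's outgoing edges, and chain it to the previously visited block. `Block`, `connect`, `disconnect` and `linearize` are hypothetical names for this illustration and only loosely model `VPBlockUtils::connectBlocks` and `VPBlockUtils::disconnectBlocks`; the traversal order is assumed to be given.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for a VPlan block; not the LLVM class.
struct Block {
  int Id;
  std::vector<Block *> Succs;
  std::vector<Block *> Preds;
};

inline void connect(Block *From, Block *To) {
  From->Succs.push_back(To);
  To->Preds.push_back(From);
}

inline void disconnect(Block *From, Block *To) {
  auto &S = From->Succs;
  S.erase(std::remove(S.begin(), S.end(), To), S.end());
  auto &P = To->Preds;
  P.erase(std::remove(P.begin(), P.end(), From), P.end());
}

// Turn the blocks, given in reverse post-order, into one serial chain.
inline void linearize(const std::vector<Block *> &RPO) {
  Block *Prev = nullptr;
  for (Block *B : RPO) {
    assert(B && "null block in traversal order");
    // Copy the successor list first, since disconnect() mutates it.
    std::vector<Block *> Succs = B->Succs;
    for (Block *S : Succs)
      disconnect(B, S);
    if (Prev)
      connect(Prev, B);
    Prev = B;
  }
}
```

On a diamond (A branches to B and C, which both reach D), the order A, B, C, D yields the chain A, B, C, D with exactly one predecessor and one successor per interior block. Because the order already covers only the blocks of the region, no early exit from the loop over blocks is needed in this model.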

ayalz · 2025-05-20T13:57:31Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.h

@@ -224,6 +222,16 @@ struct VPlanTransforms {
  /// candidates.
  static void narrowInterleaveGroups(VPlan &Plan, ElementCount VF,
                                     unsigned VectorRegWidth);
+
+  /// Predicate and linearize the control-flow in the only loop region of
+  /// \p Plan. If \p FoldTail is true, also create a mask guarding the loop


Suggested change

/// \p Plan. If \p FoldTail is true, also create a mask guarding the loop

/// \p Plan. If \p FoldTail is true, create a mask guarding the loop

done thanks
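
As a rough illustration of the header mask mentioned in this doc comment: the code being moved forms it with the compare IV <= BTC, where BTC is the backedge-taken count, rather than IV < TC, because TC may wrap. The sketch below models that compare per lane with plain integers. `headerMask` is a hypothetical helper for this illustration only; the real transform emits a `VPWidenCanonicalIVRecipe` and an ICmp recipe instead of computing values.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Lane L of the vector iteration starting at IV is active iff IV + L <= BTC,
// where BTC is the backedge-taken count (trip count minus one).
// This scalar model assumes IV + L does not overflow.
inline std::vector<bool> headerMask(uint64_t IV, uint64_t BTC, unsigned VF) {
  assert(VF > 0 && "need at least one lane");
  std::vector<bool> Mask(VF);
  for (unsigned L = 0; L < VF; ++L)
    Mask[L] = IV + L <= BTC;
  return Mask;
}
```

For a loop with 10 iterations (BTC = 9) and VF = 4, the vector iteration starting at IV = 8 keeps lanes 0 and 1 active and masks off lanes 2 and 3, which is how the tail is folded into the vector loop.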

ayalz · 2025-05-20T13:58:45Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.h

+  /// Predicate and linearize the control-flow in the only loop region of
+  /// \p Plan. If \p FoldTail is true, also create a mask guarding the loop
+  /// header, otherwise use all-true for the header mask. Masks for blocks are
+  /// added to \p BlockMaskCache, which in turn will temporarily be used later


Suggested change

/// added to \p BlockMaskCache, which in turn will temporarily be used later

/// added to \p BlockMaskCache in order to be used later

ayalz · 2025-05-20T14:17:40Z

May be worth reviewing the "native" VPlanPredicator logic introduced in https://reviews.llvm.org/D53349 and removed in https://reviews.llvm.org/D123017.

Might be good as a follow-up to potentially improve the predication implementation, once we have completed the NFC move and the transition? Although the original VPlanPredicator may need more work, as it was not enabled by default even in the native path and was only tested via C++ unit tests.

Sure, just noting that this revives an older VPlanPredicator.cpp (along with its log?), which could offer some directions for improvements and/or extensions.

llvm-ci · 2025-05-21T15:02:11Z

LLVM Buildbot has detected a new failure on builder openmp-offload-amdgpu-runtime-2 running on rocm-worker-hw-02 while building llvm at step 6 "test-openmp".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/10/builds/5775

Here is the relevant piece of the build log for the reference

Step 6 (test-openmp) failure: test (failure)
******************** TEST 'libarcher :: races/lock-unrelated.c' FAILED ********************
Exit Code: 1

Command Output (stdout):
--
# RUN: at line 13
/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/./bin/clang -fopenmp  -gdwarf-4 -O1 -fsanitize=thread  -I /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests -I /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/runtime/src   /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests/races/lock-unrelated.c -o /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/tools/archer/tests/races/Output/lock-unrelated.c.tmp -latomic && env TSAN_OPTIONS='ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1' /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests/deflake.bash /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/tools/archer/tests/races/Output/lock-unrelated.c.tmp 2>&1 | tee /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/tools/archer/tests/races/Output/lock-unrelated.c.tmp.log | /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/./bin/FileCheck /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests/races/lock-unrelated.c
# executed command: /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/./bin/clang -fopenmp -gdwarf-4 -O1 -fsanitize=thread -I /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests -I /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/runtime/src /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests/races/lock-unrelated.c -o /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/tools/archer/tests/races/Output/lock-unrelated.c.tmp -latomic
# note: command had no output on stdout or stderr
# executed command: env TSAN_OPTIONS=ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1 /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests/deflake.bash /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/tools/archer/tests/races/Output/lock-unrelated.c.tmp
# note: command had no output on stdout or stderr
# executed command: tee /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/tools/archer/tests/races/Output/lock-unrelated.c.tmp.log
# note: command had no output on stdout or stderr
# executed command: /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/./bin/FileCheck /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests/races/lock-unrelated.c
# note: command had no output on stdout or stderr
# RUN: at line 14
/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/./bin/clang -fopenmp  -gdwarf-4 -O1 -fsanitize=thread  -I /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests -I /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/runtime/src   /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests/races/lock-unrelated.c -o /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/tools/archer/tests/races/Output/lock-unrelated.c.tmp -latomic && env ARCHER_OPTIONS="ignore_serial=1 report_data_leak=1" env TSAN_OPTIONS='ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1' /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests/deflake.bash /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/tools/archer/tests/races/Output/lock-unrelated.c.tmp 2>&1 | tee /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/tools/archer/tests/races/Output/lock-unrelated.c.tmp.log | /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/./bin/FileCheck /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests/races/lock-unrelated.c
# executed command: /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/./bin/clang -fopenmp -gdwarf-4 -O1 -fsanitize=thread -I /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests -I /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/runtime/src /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests/races/lock-unrelated.c -o /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/tools/archer/tests/races/Output/lock-unrelated.c.tmp -latomic
# note: command had no output on stdout or stderr
# executed command: env 'ARCHER_OPTIONS=ignore_serial=1 report_data_leak=1' env TSAN_OPTIONS=ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1 /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests/deflake.bash /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/tools/archer/tests/races/Output/lock-unrelated.c.tmp
# note: command had no output on stdout or stderr
# executed command: tee /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/tools/archer/tests/races/Output/lock-unrelated.c.tmp.log
# note: command had no output on stdout or stderr
# executed command: /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/./bin/FileCheck /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests/races/lock-unrelated.c
# .---command stderr------------
# | /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests/races/lock-unrelated.c:47:11: error: CHECK: expected string not found in input
# | // CHECK: ThreadSanitizer: reported {{[1-7]}} warnings
# |           ^
# | <stdin>:26:5: note: scanning from here
# | DONE
# |     ^
# | <stdin>:27:1: note: possible intended match here
# | ThreadSanitizer: thread T4 finished with ignores enabled, created at:
# | ^
# | 
# | Input file: <stdin>
# | Check file: /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests/races/lock-unrelated.c
# | 
# | -dump-input=help explains the following input dump.
# | 
# | Input was:
# | <<<<<<
# |             .
# |             .
# |             .
# |            21:  #0 pthread_create /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp:1045:3 (lock-unrelated.c.tmp+0xa2c2a) 
# |            22:  #1 __kmp_create_worker z_Linux_util.cpp (libomp.so+0xcac82) 
# |            23:  
# |            24: SUMMARY: ThreadSanitizer: data race /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests/races/lock-unrelated.c:31:8 in main.omp_outlined_debug__ 
# |            25: ================== 
...

This patch moves the logic to predicate and linearize a VPlan to a dedicated VPlan transform. It mostly ports the existing logic directly.

There are a number of follow-ups planned in the near future to further improve on the implementation:

* Edge and block masks are cached in VPPredicator, but the block masks are still made available to VPRecipeBuilder, so they can be accessed during recipe construction. As a follow-up, this should be replaced by adding mask operands to all VPInstructions that need them and use that during recipe construction.
* The mask caching in a map also means that this map needs updating each time a new recipe replaces a VPInstruction; this would also be handled by adding mask operands.

PR: llvm/llvm-project#128420
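To illustrate the second follow-up point, here is a minimal, self-contained sketch of why a mask cache keyed by instruction pointer needs updating on every replacement. `ToyInstr`, `ToyMask`, and `ToyMaskCache` are hypothetical stand-ins invented for this example; they are not the real VPlan classes, and the actual VPPredicator code may look quite different.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-ins for illustration only; not the real VPlan types.
struct ToyInstr {
  std::string Name;
};
using ToyMask = std::string;

// A side table mapping instructions to their masks, keyed by pointer.
struct ToyMaskCache {
  std::unordered_map<const ToyInstr *, ToyMask> Masks;

  void set(const ToyInstr *I, ToyMask M) { Masks[I] = std::move(M); }

  // Returns nullptr if no mask is cached for I.
  const ToyMask *lookup(const ToyInstr *I) const {
    auto It = Masks.find(I);
    return It == Masks.end() ? nullptr : &It->second;
  }

  // Has to be called whenever New replaces Old. If a caller forgets,
  // lookups for New find nothing and the entry for Old goes stale.
  void replaced(const ToyInstr *Old, const ToyInstr *New) {
    auto It = Masks.find(Old);
    if (It == Masks.end())
      return;
    ToyMask M = std::move(It->second);
    Masks.erase(It);
    Masks[New] = std::move(M);
  }
};
```

If the mask were instead stored as an operand of the instruction itself, as the follow-up proposes, the replacement would carry it along and no separate bookkeeping step would be needed.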

llvm-ci · 2025-05-21T16:02:43Z

LLVM Buildbot has detected a new failure on builder llvm-nvptx64-nvidia-ubuntu running on as-builder-7 while building llvm at step 2 "checkout".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/160/builds/17815

Here is the relevant piece of the build log for the reference

Step 2 (checkout) failure: update (failure)
...
Resolving deltas:  58% (92/157)
Resolving deltas:  59% (93/157)
Resolving deltas:  60% (95/157)
Resolving deltas:  61% (96/157)
Resolving deltas:  62% (98/157)
Resolving deltas:  63% (99/157)
Resolving deltas:  64% (101/157)
Resolving deltas:  65% (103/157)
Resolving deltas:  66% (104/157)
Resolving deltas:  67% (106/157)
Resolving deltas:  68% (107/157)
Resolving deltas:  69% (109/157)
Resolving deltas:  70% (110/157)
Resolving deltas:  71% (112/157)
Resolving deltas:  72% (114/157)
Resolving deltas:  73% (115/157)
Resolving deltas:  74% (117/157)
Resolving deltas:  75% (118/157)
Resolving deltas:  76% (120/157)
Resolving deltas:  77% (121/157)
Resolving deltas:  78% (123/157)
Resolving deltas:  79% (125/157)
Resolving deltas:  80% (126/157)
Resolving deltas:  81% (128/157)
Resolving deltas:  82% (129/157)
Resolving deltas:  83% (131/157)
Resolving deltas:  84% (132/157)
Resolving deltas:  85% (134/157)
Resolving deltas:  86% (136/157)
Resolving deltas:  87% (137/157)
Resolving deltas:  88% (139/157)
Resolving deltas:  89% (140/157)
Resolving deltas:  90% (142/157)
Resolving deltas:  91% (143/157)
Resolving deltas:  92% (145/157)
Resolving deltas:  93% (147/157)
Resolving deltas:  94% (148/157)
Resolving deltas:  95% (150/157)
Resolving deltas:  96% (151/157)
Resolving deltas:  97% (153/157)
Resolving deltas:  98% (154/157)
Resolving deltas:  99% (156/157)
Resolving deltas: 100% (157/157)
Resolving deltas: 100% (157/157), completed with 123 local objects.
From https://github.com/llvm/llvm-project
 * branch                      main       -> FETCH_HEAD
Auto packing the repository in background for optimum performance.
See "git help gc" for manual housekeeping.
fatal: sha1 file '/home/buildbot/worker/as-builder-7/ramdisk/llvm-nvptx64-nvidia-ubuntu/llvm-project/.git/index.lock' write error. Out of diskspace
fatal: sha1 file '/home/buildbot/worker/as-builder-7/ramdisk/llvm-nvptx64-nvidia-ubuntu/llvm-project/.git/index.lock' write error. Out of diskspace

llvm-ci · 2025-05-21T16:03:15Z

LLVM Buildbot has detected a new failure on builder llvm-nvptx-nvidia-ubuntu running on as-builder-7 while building llvm at step 2 "checkout".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/180/builds/17958

Here is the relevant piece of the build log for the reference

Step 2 (checkout) failure: update (failure)
...
Resolving deltas:  58% (92/158)
Resolving deltas:  59% (94/158)
Resolving deltas:  60% (95/158)
Resolving deltas:  61% (97/158)
Resolving deltas:  62% (98/158)
Resolving deltas:  63% (100/158)
Resolving deltas:  64% (102/158)
Resolving deltas:  65% (103/158)
Resolving deltas:  66% (105/158)
Resolving deltas:  67% (106/158)
Resolving deltas:  68% (108/158)
Resolving deltas:  69% (110/158)
Resolving deltas:  70% (111/158)
Resolving deltas:  71% (113/158)
Resolving deltas:  72% (114/158)
Resolving deltas:  73% (116/158)
Resolving deltas:  74% (117/158)
Resolving deltas:  75% (119/158)
Resolving deltas:  76% (121/158)
Resolving deltas:  77% (122/158)
Resolving deltas:  78% (124/158)
Resolving deltas:  79% (125/158)
Resolving deltas:  80% (127/158)
Resolving deltas:  81% (128/158)
Resolving deltas:  82% (130/158)
Resolving deltas:  83% (132/158)
Resolving deltas:  84% (133/158)
Resolving deltas:  85% (135/158)
Resolving deltas:  86% (136/158)
Resolving deltas:  87% (138/158)
Resolving deltas:  88% (140/158)
Resolving deltas:  89% (141/158)
Resolving deltas:  90% (143/158)
Resolving deltas:  91% (144/158)
Resolving deltas:  92% (146/158)
Resolving deltas:  93% (147/158)
Resolving deltas:  94% (149/158)
Resolving deltas:  95% (151/158)
Resolving deltas:  96% (152/158)
Resolving deltas:  97% (154/158)
Resolving deltas:  98% (155/158)
Resolving deltas:  99% (157/158)
Resolving deltas: 100% (158/158)
Resolving deltas: 100% (158/158), completed with 124 local objects.
From https://github.com/llvm/llvm-project
 * branch                      main       -> FETCH_HEAD
Auto packing the repository in background for optimum performance.
See "git help gc" for manual housekeeping.
fatal: sha1 file '/home/buildbot/worker/as-builder-7/ramdisk/llvm-nvptx-nvidia-ubuntu/llvm-project/.git/index.lock' write error. Out of diskspace
fatal: sha1 file '/home/buildbot/worker/as-builder-7/ramdisk/llvm-nvptx-nvidia-ubuntu/llvm-project/.git/index.lock' write error. Out of diskspace

llvm-ci · 2025-05-21T16:25:45Z

LLVM Buildbot has detected a new failure on builder flang-runtime-cuda-gcc running on as-builder-7 while building llvm at step 6 "build-flang-rt".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/152/builds/2711

Here is the relevant piece of the build log for the reference

Step 6 (build-flang-rt) failure: cmake (failure)
...
          detected during instantiation of "__nv_bool Fortran::runtime::io::ChildUnformattedIoStatementState<DIR>::Receive(char *, std::size_t, std::size_t) [with DIR=Fortran::runtime::io::Direction::Input]" at line 1101

8.957 [2/6/117] Building CUDA object flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/extrema.ptx
10.596 [2/5/118] Building CXX object flang-rt/lib/runtime/CMakeFiles/flang_rt.runtime.static.dir/reduce.cpp.o
11.028 [1/5/119] Linking CXX static library /home/buildbot/worker/as-builder-7/ramdisk/flang-runtime-cuda-gcc/build/lib/clang/21/lib/x86_64-unknown-linux-gnu/libflang_rt.runtime.a
11.033 [1/4/120] Building CUDA object flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/matmul.ptx
11.635 [1/3/121] Building CUDA object flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/findloc.ptx
12.041 [1/2/122] Building CUDA object flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/matmul-transpose.ptx
19.054 [1/1/123] Building CUDA object flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/dot-product.ptx
19.728 [0/1/124] Linking CUDA static library flang-rt/lib/runtime/libflang_rt.runtimePTX.a
FAILED: flang-rt/lib/runtime/libflang_rt.runtimePTX.a 
: && /usr/bin/cmake -E rm -f flang-rt/lib/runtime/libflang_rt.runtimePTX.a && /home/buildbot/worker/as-builder-7/ramdisk/flang-runtime-cuda-gcc/build/bin/llvm-ar qc flang-rt/lib/runtime/libflang_rt.runtimePTX.a  flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/home/buildbot/worker/as-builder-7/ramdisk/flang-runtime-cuda-gcc/llvm-project/flang/lib/Decimal/binary-to-decimal.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/home/buildbot/worker/as-builder-7/ramdisk/flang-runtime-cuda-gcc/llvm-project/flang/lib/Decimal/decimal-to-binary.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/ISO_Fortran_binding.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/allocator-registry.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/allocatable.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/array-constructor.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/assign.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/buffer.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/character.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/connection.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/copy.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/derived-api.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/derived.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/descriptor-io.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/descriptor.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/dot-product.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/edit-input.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/edit-output.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/environment.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/external-unit.ptx 
flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/extrema.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/file.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/findloc.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/format.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/inquiry.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/internal-unit.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/io-api.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/io-api-minimal.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/io-error.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/io-stmt.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/iostat.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/matmul-transpose.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/matmul.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/memory.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/misc-intrinsic.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/namelist.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/non-tbp-dio.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/numeric.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/pointer.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/product.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/pseudo-unit.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/ragged.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/stat.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/stop.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/sum.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/support.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/terminator.ptx 
flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.ru
t.runtimePTX.dir/type-code.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/type-info.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/unit.ptx flang-rt/lib/runtime/CMakeFiles/obj.flang_rt.runtimePTX.dir/utf.ptx && /home/buildbot/worker/as-builder-7/ramdisk/flang-runtime-cuda-gcc/build/bin/llvm-ranlib flang-rt/lib/runtime/libflang_rt.runtimePTX.a && :
LLVM ERROR: IO failure on output stream: No space left on device
ninja: build stopped: subcommand failed.
FAILED: runtimes/CMakeFiles/flang-rt /home/buildbot/worker/as-builder-7/ramdisk/flang-runtime-cuda-gcc/build/runtimes/CMakeFiles/flang-rt 
cd /home/buildbot/worker/as-builder-7/ramdisk/flang-runtime-cuda-gcc/build/runtimes/runtimes-bins && /usr/bin/cmake --build /home/buildbot/worker/as-builder-7/ramdisk/flang-runtime-cuda-gcc/build/runtimes/runtimes-bins/ --target flang-rt --config Release
ninja: build stopped: subcommand failed.

llvm-ci · 2025-05-21T17:19:07Z

LLVM Buildbot has detected a new failure on builder clang-aarch64-sve2-vla running on linaro-g4-02 while building llvm at step 14 "test-suite".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/198/builds/4652

Here is the relevant piece of the build log for the reference

Step 14 (test-suite) failure: test (failure)
...
size..init_array: 16 
size..interp: 27 
size..note.ABI-tag: 32 
size..plt: 928 
size..rela.dyn: 1176 
size..rela.plt: 1344 
size..rodata: 14491 
size..text: 198032 
**********
NOEXE: test-suite :: Fortran/gfortran/regression/gfortran-regression-execute-regression__inline_sum_3_f90.test (6089 of 10355)
******************** TEST 'test-suite :: Fortran/gfortran/regression/gfortran-regression-execute-regression__inline_sum_3_f90.test' FAILED ********************
Executable '/home/tcwg-buildbot/worker/clang-aarch64-sve2-vla/test/sandbox/build/Fortran/gfortran/regression/gfortran-regression-execute-regression__inline_sum_3_f90' is missing
********************
PASS: test-suite :: Fortran/gfortran/regression/gfortran-regression-execute-regression__implicit_12_f90.test (6090 of 10355)
********** TEST 'test-suite :: Fortran/gfortran/regression/gfortran-regression-execute-regression__implicit_12_f90.test' RESULTS **********
compile_time: 0.8763 
exec_time: 0.0005 
hash: "ac6b5721de1683371acd2a9be1d52f6f" 
link_time: 0.0000 
size: 565496 
size..bss: 176 
size..comment: 256 
size..data: 232 
size..data.rel.ro: 176 
size..dynamic: 496 
size..dynstr: 630 
size..dynsym: 1608 
size..eh_frame: 20608 
size..eh_frame_hdr: 5740 
size..fini: 20 
size..fini_array: 8 
size..gnu.hash: 28 
size..gnu.version: 134 
size..gnu.version_r: 96 
size..got: 160 
size..got.plt: 480 
size..init: 24 
size..init_array: 16 
size..interp: 27 
size..note.ABI-tag: 32 
size..plt: 944 
size..rela.dyn: 1248 
size..rela.plt: 1368 
size..rodata: 16035 
size..text: 353904 
**********
PASS: test-suite :: Fortran/gfortran/regression/gfortran-regression-execute-regression__implicit_class_1_f90.test (6091 of 10355)
********** TEST 'test-suite :: Fortran/gfortran/regression/gfortran-regression-execute-regression__implicit_class_1_f90.test' RESULTS **********
compile_time: 0.8763

llvm-ci · 2025-05-21T18:14:26Z

LLVM Buildbot has detected a new failure on builder clang-aarch64-sve-vla running on linaro-g3-02 while building llvm at step 14 "test-suite".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/17/builds/8225

Here is the relevant piece of the build log for the reference

Step 14 (test-suite) failure: test (failure)
...
size..init_array: 16 
size..interp: 27 
size..note.ABI-tag: 32 
size..plt: 928 
size..rela.dyn: 1176 
size..rela.plt: 1344 
size..rodata: 14480 
size..text: 198032 
**********
NOEXE: test-suite :: Fortran/gfortran/regression/gfortran-regression-execute-regression__inline_sum_3_f90.test (6111 of 10355)
******************** TEST 'test-suite :: Fortran/gfortran/regression/gfortran-regression-execute-regression__inline_sum_3_f90.test' FAILED ********************
Executable '/home/tcwg-buildbot/worker/clang-aarch64-sve-vla/test/sandbox/build/Fortran/gfortran/regression/gfortran-regression-execute-regression__inline_sum_3_f90' is missing
********************
PASS: test-suite :: Fortran/gfortran/regression/gfortran-regression-execute-regression__inline_transpose_1_f90.test (6112 of 10355)
********** TEST 'test-suite :: Fortran/gfortran/regression/gfortran-regression-execute-regression__inline_transpose_1_f90.test' RESULTS **********
compile_time: 1.1315 
exec_time: 0.0000 
hash: "3413d9388541ee8829fbee93834edce8" 
link_time: 0.0000 
size: 972144 
size..bss: 824 
size..comment: 256 
size..data: 688 
size..data.rel.ro: 176 
size..dynamic: 496 
size..dynstr: 666 
size..dynsym: 1704 
size..eh_frame: 34784 
size..eh_frame_hdr: 8908 
size..fini: 20 
size..fini_array: 8 
size..gnu.hash: 28 
size..gnu.version: 142 
size..gnu.version_r: 96 
size..got: 192 
size..got.plt: 512 
size..init: 24 
size..init_array: 16 
size..interp: 27 
size..note.ABI-tag: 32 
size..plt: 1008 
size..rela.dyn: 2376 
size..rela.plt: 1464 
size..rodata: 26854 
size..text: 636816 
**********
PASS: test-suite :: Fortran/gfortran/regression/gfortran-regression-execute-regression__inline_matmul_13_f90.test (6113 of 10355)
********** TEST 'test-suite :: Fortran/gfortran/regression/gfortran-regression-execute-regression__inline_matmul_13_f90.test' RESULTS **********
compile_time: 1.1315

This reverts commit b263c08. Looks like this triggers a crash in one of the Fortran tests. Reverting while I investigate https://lab.llvm.org/buildbot/#/builders/41/builds/6825

fhahn · 2025-05-21T18:25:13Z

Reverted for now in 793bb6b, as it looks like this triggers a crash in one of the Fortran tests. I'll investigate:
https://lab.llvm.org/buildbot/#/builders/41/builds/6825

llvm-ci · 2025-05-21T18:44:03Z

LLVM Buildbot has detected a new failure on builder clang-ppc64-aix running on aix-ppc64 while building llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/64/builds/3723

Here is the relevant piece of the build log for the reference

Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'lit :: timeout-hang.py' FAILED ********************
Exit Code: 1

Command Output (stdout):
--
# RUN: at line 13
not env -u FILECHECK_OPTS "/home/llvm/llvm-external-buildbots/workers/env/bin/python3.11" /home/llvm/llvm-external-buildbots/workers/aix-ppc64/clang-ppc64-aix/llvm-project/llvm/utils/lit/lit.py -j1 --order=lexical Inputs/timeout-hang/run-nonexistent.txt  --timeout=1 --param external=0 | "/home/llvm/llvm-external-buildbots/workers/env/bin/python3.11" /home/llvm/llvm-external-buildbots/workers/aix-ppc64/clang-ppc64-aix/build/utils/lit/tests/timeout-hang.py 1
# executed command: not env -u FILECHECK_OPTS /home/llvm/llvm-external-buildbots/workers/env/bin/python3.11 /home/llvm/llvm-external-buildbots/workers/aix-ppc64/clang-ppc64-aix/llvm-project/llvm/utils/lit/lit.py -j1 --order=lexical Inputs/timeout-hang/run-nonexistent.txt --timeout=1 --param external=0
# .---command stderr------------
# | lit.py: /home/llvm/llvm-external-buildbots/workers/aix-ppc64/clang-ppc64-aix/llvm-project/llvm/utils/lit/lit/main.py:72: note: The test suite configuration requested an individual test timeout of 0 seconds but a timeout of 1 seconds was requested on the command line. Forcing timeout to be 1 seconds.
# `-----------------------------
# executed command: /home/llvm/llvm-external-buildbots/workers/env/bin/python3.11 /home/llvm/llvm-external-buildbots/workers/aix-ppc64/clang-ppc64-aix/build/utils/lit/tests/timeout-hang.py 1
# .---command stdout------------
# | Testing took as long or longer than timeout
# `-----------------------------
# error: command failed with exit status: 1

--

********************

llvm-ci · 2025-05-21T19:02:51Z

LLVM Buildbot has detected a new failure on builder clang-aarch64-sve-vls running on linaro-g3-01 while building llvm at step 14 "test-suite".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/143/builds/7863

Here is the relevant piece of the build log for the reference

Step 14 (test-suite) failure: test (failure)
...
size..init_array: 16 
size..interp: 27 
size..note.ABI-tag: 32 
size..plt: 928 
size..rela.dyn: 1176 
size..rela.plt: 1344 
size..rodata: 14480 
size..text: 198032 
**********
NOEXE: test-suite :: Fortran/gfortran/regression/gfortran-regression-execute-regression__inline_sum_3_f90.test (6111 of 10355)
******************** TEST 'test-suite :: Fortran/gfortran/regression/gfortran-regression-execute-regression__inline_sum_3_f90.test' FAILED ********************
Executable '/home/tcwg-buildbot/worker/clang-aarch64-sve-vls/test/sandbox/build/Fortran/gfortran/regression/gfortran-regression-execute-regression__inline_sum_3_f90' is missing
********************
PASS: test-suite :: Fortran/gfortran/regression/gfortran-regression-execute-regression__inline_matmul_19_f90.test (6112 of 10355)
********** TEST 'test-suite :: Fortran/gfortran/regression/gfortran-regression-execute-regression__inline_matmul_19_f90.test' RESULTS **********
compile_time: 1.1250 
exec_time: 0.0000 
hash: "d0a43790b67f6ed5f40c19e41e794469" 
link_time: 0.0000 
size: 420824 
size..bss: 960 
size..comment: 256 
size..data: 232 
size..data.rel.ro: 176 
size..dynamic: 496 
size..dynstr: 655 
size..dynsym: 1680 
size..eh_frame: 19960 
size..eh_frame_hdr: 5532 
size..fini: 20 
size..fini_array: 8 
size..gnu.hash: 28 
size..gnu.version: 140 
size..gnu.version_r: 96 
size..got: 168 
size..got.plt: 504 
size..init: 24 
size..init_array: 24 
size..interp: 27 
size..note.ABI-tag: 32 
size..plt: 992 
size..rela.dyn: 1296 
size..rela.plt: 1440 
size..rodata: 15482 
size..text: 211600 
**********
PASS: test-suite :: Fortran/gfortran/regression/gfortran-regression-execute-regression__initialization_11_f90.test (6113 of 10355)
********** TEST 'test-suite :: Fortran/gfortran/regression/gfortran-regression-execute-regression__initialization_11_f90.test' RESULTS **********
compile_time: 1.1250

This reverts commit 793bb6b.

The recommitted version contains a fix to make sure only the original phis are processed in convertPhisToBlends by collecting them in a vector first. This fixes a crash when no mask is needed, because there is only a single incoming value.

Original message:

This patch moves the logic to predicate and linearize a VPlan to a dedicated VPlan transform. It mostly ports the existing logic directly.

There are a number of follow-ups planned in the near future to further improve on the implementation:

* Edge and block masks are cached in VPPredicator, but the block masks are still made available to VPRecipeBuilder, so they can be accessed during recipe construction. As a follow-up, this should be replaced by adding mask operands to all VPInstructions that need them and use that during recipe construction.
* The mask caching in a map also means that this map needs updating each time a new recipe replaces a VPInstruction; this would also be handled by adding mask operands.

PR: #128420

… (#128420)"

This reverts commit 793bb6b.

The recommitted version contains a fix to make sure only the original phis are processed in convertPhisToBlends by collecting them in a vector first. This fixes a crash when no mask is needed, because there is only a single incoming value.

Original message:

This patch moves the logic to predicate and linearize a VPlan to a dedicated VPlan transform. It mostly ports the existing logic directly.

There are a number of follow-ups planned in the near future to further improve on the implementation:

* Edge and block masks are cached in VPPredicator, but the block masks are still made available to VPRecipeBuilder, so they can be accessed during recipe construction. As a follow-up, this should be replaced by adding mask operands to all VPInstructions that need them and use that during recipe construction.
* The mask caching in a map also means that this map needs updating each time a new recipe replaces a VPInstruction; this would also be handled by adding mask operands.

PR: llvm/llvm-project#128420
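The "collect them in a vector first" fix mentioned in the recommit message follows a general pattern: snapshot the elements to process before mutating the container they live in. The sketch below shows that pattern on a toy block. `ToyRecipe` and `convertPhisSketch` are hypothetical names made up for this example, and the handling of the single-incoming-value case is an assumption for illustration, not the actual convertPhisToBlends implementation.

```cpp
#include <cassert>
#include <list>
#include <string>
#include <vector>

// Hypothetical stand-in for a recipe; not the real VPlan type.
struct ToyRecipe {
  bool IsPhi;
  std::string Name;
  unsigned NumIncoming;
};

// Replaces every original phi in Block and returns how many were processed.
unsigned convertPhisSketch(std::list<ToyRecipe> &Block) {
  // Step 1: snapshot the original phis before touching the block, so that
  // recipes inserted in step 2 are never mistaken for phis to process.
  std::vector<std::list<ToyRecipe>::iterator> Phis;
  for (auto It = Block.begin(); It != Block.end(); ++It)
    if (It->IsPhi)
      Phis.push_back(It);

  // Step 2: mutate the block. std::list keeps iterators to other elements
  // valid across insert and erase, so the snapshot stays usable.
  unsigned Converted = 0;
  for (auto It : Phis) {
    // Assumption: with a single incoming value no mask or blend is needed,
    // so the phi is simply dropped; otherwise a blend takes its place.
    if (It->NumIncoming > 1)
      Block.insert(It, ToyRecipe{false, "blend." + It->Name, It->NumIncoming});
    Block.erase(It);
    ++Converted;
  }
  return Converted;
}
```

Separating the collection step from the mutation step makes it explicit which recipes the loop is allowed to touch, regardless of what gets inserted or erased along the way.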

fhahn · 2025-05-22T17:39:03Z

Re-landed in 95ba550 this morning, looks like the flang bots are happy.

llvm-ci · 2025-05-25T11:44:59Z

LLVM Buildbot has detected a new failure on builder bolt-x86_64-ubuntu-clang running on bolt-worker while building llvm at step 6 "test-build-clang-bolt-stage2-clang-bolt".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/113/builds/7336

Here is the relevant piece of the build log for the reference

Step 6 (test-build-clang-bolt-stage2-clang-bolt) failure: test (failure)
...
924.532 [12/6/3198] Linking CXX static library lib/libclangExtractAPI.a
924.932 [12/5/3199] Linking CXX static library lib/libclangStaticAnalyzerCore.a
925.001 [12/4/3200] Building CXX object tools/clang/lib/StaticAnalyzer/Checkers/CMakeFiles/obj.clangStaticAnalyzerCheckers.dir/WebKit/ForwardDeclChecker.cpp.o
925.159 [12/3/3201] Building CXX object tools/clang/tools/driver/CMakeFiles/clang.dir/driver.cpp.o
926.135 [12/2/3202] Building CXX object tools/clang/lib/StaticAnalyzer/Checkers/CMakeFiles/obj.clangStaticAnalyzerCheckers.dir/WebKit/RetainPtrCtorAdoptChecker.cpp.o
927.647 [11/2/3203] Building CXX object tools/clang/tools/driver/CMakeFiles/clang.dir/cc1_main.cpp.o
927.744 [11/1/3204] Linking CXX static library lib/libclangStaticAnalyzerCheckers.a
927.792 [10/1/3205] Linking CXX static library lib/libclangStaticAnalyzerFrontend.a
927.805 [9/1/3206] Linking CXX static library lib/libclangFrontendTool.a
1009.453 [8/1/3207] Linking CXX executable bin/clang-21
FAILED: bin/clang-21 
: && /home/worker/bolt-worker2/bolt-x86_64-ubuntu-clang/build/./bin/clang++ -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wno-unnecessary-virtual-specifier -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -fprofile-instr-generate="/home/worker/bolt-worker2/bolt-x86_64-ubuntu-clang/build/tools/clang/stage2-instrumented-bins/profiles/%4m.profraw" -flto=thin -fno-common -Woverloaded-virtual -Wno-nested-anon-types -O3 -DNDEBUG -Wl,--emit-relocs,-znow -fuse-ld=lld -Wl,--color-diagnostics -fprofile-instr-generate="/home/worker/bolt-worker2/bolt-x86_64-ubuntu-clang/build/tools/clang/stage2-instrumented-bins/profiles/%4m.profraw" -flto=thin -Wl,--thinlto-cache-dir=/home/worker/bolt-worker2/bolt-x86_64-ubuntu-clang/build/tools/clang/stage2-instrumented-bins/lto.cache   -Wl,--export-dynamic tools/clang/tools/driver/CMakeFiles/clang.dir/driver.cpp.o tools/clang/tools/driver/CMakeFiles/clang.dir/cc1_main.cpp.o tools/clang/tools/driver/CMakeFiles/clang.dir/cc1as_main.cpp.o tools/clang/tools/driver/CMakeFiles/clang.dir/cc1gen_reproducer_main.cpp.o tools/clang/tools/driver/CMakeFiles/clang.dir/clang-driver.cpp.o -o bin/clang-21  -Wl,-rpath,"\$ORIGIN/../lib:"  lib/libLLVMX86CodeGen.a  lib/libLLVMX86AsmParser.a  lib/libLLVMX86Desc.a  lib/libLLVMX86Disassembler.a  lib/libLLVMX86Info.a  lib/libLLVMAnalysis.a  lib/libLLVMCodeGen.a  lib/libLLVMCore.a  lib/libLLVMipo.a  lib/libLLVMAggressiveInstCombine.a  lib/libLLVMInstCombine.a  lib/libLLVMInstrumentation.a  lib/libLLVMMC.a  lib/libLLVMMCParser.a  lib/libLLVMObjCARCOpts.a  lib/libLLVMOption.a  lib/libLLVMScalarOpts.a  
lib/libLLVMSupport.a  lib/libLLVMTargetParser.a  lib/libLLVMTransformUtils.a  lib/libLLVMVectorize.a  lib/libclangBasic.a  lib/libclangCodeGen.a  lib/libclangDriver.a  lib/libclangFrontend.a  lib/libclangFrontendTool.a  lib/libclangSerialization.a  lib/libLLVMAsmPrinter.a  lib/libLLVMMCDisassembler.a  lib/libclangCodeGen.a  lib/libLLVMCoverage.a  lib/libLLVMFrontendDriver.a  lib/libLLVMLTO.a  lib/libLLVMExtensions.a  lib/libLLVMPasses.a  lib/libLLVMCFGuard.a  lib/libLLVMGlobalISel.a  lib/libLLVMSelectionDAG.a  lib/libLLVMCodeGen.a  lib/libLLVMObjCARCOpts.a  lib/libLLVMCGData.a  lib/libLLVMCodeGenTypes.a  lib/libLLVMIRPrinter.a  lib/libLLVMTarget.a  lib/libLLVMCoroutines.a  lib/libLLVMipo.a  lib/libLLVMInstrumentation.a  lib/libLLVMVectorize.a  lib/libLLVMSandboxIR.a  lib/libLLVMBitWriter.a  lib/libLLVMLinker.a  lib/libLLVMHipStdPar.a  lib/libclangExtractAPI.a  lib/libclangInstallAPI.a  lib/libLLVMTextAPIBinaryReader.a  lib/libclangRewriteFrontend.a  lib/libclangStaticAnalyzerFrontend.a  lib/libclangStaticAnalyzerCheckers.a  lib/libclangStaticAnalyzerCore.a  lib/libclangCrossTU.a  lib/libclangIndex.a  lib/libclangFrontend.a  lib/libclangDriver.a  lib/libLLVMWindowsDriver.a  lib/libLLVMOption.a  lib/libclangParse.a  lib/libclangSerialization.a  lib/libclangSema.a  lib/libclangAnalysis.a  lib/libclangASTMatchers.a  lib/libclangAPINotes.a  lib/libclangEdit.a  lib/libclangAST.a  lib/libLLVMFrontendHLSL.a  lib/libclangSupport.a  lib/libclangFormat.a  lib/libclangToolingInclusions.a  lib/libclangToolingCore.a  lib/libclangRewrite.a  lib/libclangLex.a  lib/libclangBasic.a  lib/libLLVMFrontendOpenMP.a  lib/libLLVMScalarOpts.a  lib/libLLVMAggressiveInstCombine.a  lib/libLLVMInstCombine.a  lib/libLLVMFrontendOffloading.a  lib/libLLVMTransformUtils.a  lib/libLLVMObjectYAML.a  lib/libLLVMFrontendAtomic.a  lib/libLLVMAnalysis.a  lib/libLLVMProfileData.a  lib/libLLVMSymbolize.a  lib/libLLVMDebugInfoGSYM.a  lib/libLLVMDebugInfoDWARF.a  lib/libLLVMDebugInfoPDB.a  
lib/libLLVMDebugInfoCodeView.a  lib/libLLVMDebugInfoMSF.a  lib/libLLVMDebugInfoBTF.a  lib/libLLVMObject.a  lib/libLLVMMCParser.a  lib/libLLVMMC.a  lib/libLLVMIRReader.a  lib/libLLVMBitReader.a  lib/libLLVMAsmParser.a  lib/libLLVMTextAPI.a  lib/libLLVMCore.a  lib/libLLVMBinaryFormat.a  lib/libLLVMTargetParser.a  lib/libLLVMRemarks.a  lib/libLLVMBitstreamReader.a  lib/libLLVMSupport.a  lib/libLLVMDemangle.a  -lrt  -ldl  -lm  /usr/lib/x86_64-linux-gnu/libz.so && :
ld.lld: /home/worker/bolt-worker2/bolt-x86_64-ubuntu-clang/llvm-project/llvm/include/llvm/Support/Casting.h:578: decltype(auto) llvm::cast(From*) [with To = llvm::VPWidenPHIRecipe; From = llvm::VPRecipeBase]: Assertion `isa<To>(Val) && "cast<Ty>() argument of incompatible type!"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.	Running pass "function<eager-inv>(float2int,lower-constant-intrinsics,chr,loop(loop-rotate<header-duplication;no-prepare-for-lto>,loop-deletion),loop-distribute,inject-tli-mappings,loop-vectorize<no-interleave-forced-only;no-vectorize-forced-only;>,infer-alignment,loop-load-elim,instcombine<max-iterations=1;no-verify-fixpoint>,simplifycfg<bonus-inst-threshold=1;forward-switch-cond;switch-range-to-icmp;switch-to-lookup;no-keep-loops;hoist-common-insts;no-hoist-loads-stores-with-cond-faulting;sink-common-insts;speculate-blocks;simplify-cond-branch;no-speculate-unpredictables>,slp-vectorizer,vector-combine,instcombine<max-iterations=1;no-verify-fixpoint>,loop-unroll<O3>,transform-warning,sroa<preserve-cfg>,infer-alignment,instcombine<max-iterations=1;no-verify-fixpoint>,loop-mssa(licm<allowspeculation>),alignment-from-assumptions,loop-sink,instsimplify,div-rem-pairs,tailcallelim,simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;switch-range-to-icmp;no-switch-to-lookup;keep-loops;no-hoist-common-insts;hoist-loads-stores-with-cond-faulting;no-sink-common-insts;speculate-blocks;simplify-cond-branch;speculate-unpredictables>)" on module "lib/libclangSema.a(SemaConcept.cpp.o at 73478474)"
1.	Running pass "loop-vectorize<no-interleave-forced-only;no-vectorize-forced-only;>" on function "_ZN5clang4Sema22IsAtLeastAsConstrainedEPKNS_9NamedDeclEN4llvm15MutableArrayRefINS_20AssociatedConstraintEEES3_S7_Rb"
 #0 0x000056202d775240 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/home/worker/bolt-worker2/bolt-x86_64-ubuntu-clang/build/bin/ld.lld+0x1b7c240)
 #1 0x000056202d77264f llvm::sys::RunSignalHandlers() (/home/worker/bolt-worker2/bolt-x86_64-ubuntu-clang/build/bin/ld.lld+0x1b7964f)
 #2 0x000056202d77279a SignalHandler(int, siginfo_t*, void*) Signals.cpp:0:0
 #3 0x00007fdab0442520 (/lib/x86_64-linux-gnu/libc.so.6+0x42520)
 #4 0x00007fdab04969fc __pthread_kill_implementation ./nptl/pthread_kill.c:44:76
 #5 0x00007fdab04969fc __pthread_kill_internal ./nptl/pthread_kill.c:78:10
 #6 0x00007fdab04969fc pthread_kill ./nptl/pthread_kill.c:89:10
 #7 0x00007fdab0442476 gsignal ./signal/../sysdeps/posix/raise.c:27:6
 #8 0x00007fdab04287f3 abort ./stdlib/abort.c:81:7
 #9 0x00007fdab042871b _nl_load_domain ./intl/loadmsgcat.c:1177:9
#10 0x00007fdab0439e96 (/lib/x86_64-linux-gnu/libc.so.6+0x39e96)
#11 0x000056202fd0f2af llvm::VPlanTransforms::introduceMasksAndLinearize(llvm::VPlan&, bool) (/home/worker/bolt-worker2/bolt-x86_64-ubuntu-clang/build/bin/ld.lld+0x41162af)
#12 0x000056202fb57266 llvm::LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(llvm::VFRange&, llvm::LoopVersioning*) (/home/worker/bolt-worker2/bolt-x86_64-ubuntu-clang/build/bin/ld.lld+0x3f5e266)
#13 0x000056202fb5989c llvm::LoopVectorizationPlanner::buildVPlansWithVPRecipes(llvm::ElementCount, llvm::ElementCount) (/home/worker/bolt-worker2/bolt-x86_64-ubuntu-clang/build/bin/ld.lld+0x3f6089c)
#14 0x000056202fb5a323 llvm::LoopVectorizationPlanner::plan(llvm::ElementCount, unsigned int) (/home/worker/bolt-worker2/bolt-x86_64-ubuntu-clang/build/bin/ld.lld+0x3f61323)
#15 0x000056202fb5c33a llvm::LoopVectorizePass::processLoop(llvm::Loop*) (/home/worker/bolt-worker2/bolt-x86_64-ubuntu-clang/build/bin/ld.lld+0x3f6333a)
#16 0x000056202fb5efb1 llvm::LoopVectorizePass::runImpl(llvm::Function&) (/home/worker/bolt-worker2/bolt-x86_64-ubuntu-clang/build/bin/ld.lld+0x3f65fb1)
#17 0x000056202fb5f606 llvm::LoopVectorizePass::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/home/worker/bolt-worker2/bolt-x86_64-ubuntu-clang/build/bin/ld.lld+0x3f66606)
#18 0x000056202e2d7286 llvm::detail::PassModel<llvm::Function, llvm::LoopVectorizePass, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) PassBuilder.cpp:0:0
#19 0x0000562030bac60f llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/home/worker/bolt-worker2/bolt-x86_64-ubuntu-clang/build/bin/ld.lld+0x4fb360f)
#20 0x000056202e0aced6 llvm::detail::PassModel<llvm::Function, llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) X86CodeGenPassBuilder.cpp:0:0
#21 0x0000562030bacb33 llvm::ModuleToFunctionPassAdaptor::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/home/worker/bolt-worker2/bolt-x86_64-ubuntu-clang/build/bin/ld.lld+0x4fb3b33)
#22 0x000056202e0ad896 llvm::detail::PassModel<llvm::Module, llvm::ModuleToFunctionPassAdaptor, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) X86CodeGenPassBuilder.cpp:0:0
#23 0x0000562030bae2ed llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/home/worker/bolt-worker2/bolt-x86_64-ubuntu-clang/build/bin/ld.lld+0x4fb52ed)
#24 0x000056202e2c55dc runNewPMPasses(llvm::lto::Config const&, llvm::Module&, llvm::TargetMachine*, unsigned int, bool, llvm::ModuleSummaryIndex*, llvm::ModuleSummaryIndex const*) LTOBackend.cpp:0:0
#25 0x000056202e2c7032 llvm::lto::opt(llvm::lto::Config const&, llvm::TargetMachine*, unsigned int, llvm::Module&, bool, llvm::ModuleSummaryIndex*, llvm::ModuleSummaryIndex const*, std::vector<unsigned char, std::allocator<unsigned char>> const&) (/home/worker/bolt-worker2/bolt-x86_64-ubuntu-clang/build/bin/ld.lld+0x26ce032)
#26 0x000056202e2c873e llvm::lto::thinBackend(llvm::lto::Config const&, unsigned int, std::function<llvm::Expected<std::unique_ptr<llvm::CachedFileStream, std::default_delete<llvm::CachedFileStream>>> (unsigned int, llvm::Twine const&)>, llvm::Module&, llvm::ModuleSummaryIndex const&, llvm::FunctionImporter::ImportMapTy const&, llvm::DenseMap<unsigned long, llvm::GlobalValueSummary*, llvm::DenseMapInfo<unsigned long, void>, llvm::detail::DenseMapPair<unsigned long, llvm::GlobalValueSummary*>> const&, llvm::MapVector<llvm::StringRef, llvm::BitcodeModule, llvm::DenseMap<llvm::StringRef, unsigned int, llvm::DenseMapInfo<llvm::StringRef, void>, llvm::detail::DenseMapPair<llvm::StringRef, unsigned int>>, llvm::SmallVector<std::pair<llvm::StringRef, llvm::BitcodeModule>, 0u>>*, bool, std::function<llvm::Expected<std::unique_ptr<llvm::CachedFileStream, std::default_delete<llvm::CachedFileStream>>> (unsigned int, llvm::Twine const&)>, std::vector<unsigned char, std::allocator<unsigned char>> const&)::'lambda'(llvm::Module&, llvm::TargetMachine*, std::unique_ptr<llvm::ToolOutputFile, std::default_delete<llvm::ToolOutputFile>>)::operator()(llvm::Module&, llvm::TargetMachine*, std::unique_ptr<llvm::ToolOutputFile, std::default_delete<llvm::ToolOutputFile>>) const LTOBackend.cpp:0:0
#27 0x000056202e2c95de llvm::lto::thinBackend(llvm::lto::Config const&, unsigned int, std::function<llvm::Expected<std::unique_ptr<llvm::CachedFileStream, std::default_delete<llvm::CachedFileStream>>> (unsigned int, llvm::Twine const&)>, llvm::Module&, llvm::ModuleSummaryIndex const&, llvm::FunctionImporter::ImportMapTy const&, llvm::DenseMap<unsigned long, llvm::GlobalValueSummary*, llvm::DenseMapInfo<unsigned long, void>, llvm::detail::DenseMapPair<unsigned long, llvm::GlobalValueSummary*>> const&, llvm::MapVector<llvm::StringRef, llvm::BitcodeModule, llvm::DenseMap<llvm::StringRef, unsigned int, llvm::DenseMapInfo<llvm::StringRef, void>, llvm::detail::DenseMapPair<llvm::StringRef, unsigned int>>, llvm::SmallVector<std::pair<llvm::StringRef, llvm::BitcodeModule>, 0u>>*, bool, std::function<llvm::Expected<std::unique_ptr<llvm::CachedFileStream, std::default_delete<llvm::CachedFileStream>>> (unsigned int, llvm::Twine const&)>, std::vector<unsigned char, std::allocator<unsigned char>> const&) (/home/worker/bolt-worker2/bolt-x86_64-ubuntu-clang/build/bin/ld.lld+0x26d05de)
#28 0x000056202e2a7c65 (anonymous namespace)::InProcessThinBackend::runThinLTOBackendThread(std::function<llvm::Expected<std::unique_ptr<llvm::CachedFileStream, std::default_delete<llvm::CachedFileStream>>> (unsigned int, llvm::Twine const&)>, llvm::FileCache, unsigned int, llvm::BitcodeModule, llvm::ModuleSummaryIndex&, llvm::FunctionImporter::ImportMapTy const&, llvm::DenseSet<llvm::ValueInfo, llvm::DenseMapInfo<llvm::ValueInfo, void>> const&, std::map<unsigned long, llvm::GlobalValue::LinkageTypes, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, llvm::GlobalValue::LinkageTypes>>> const&, llvm::DenseMap<unsigned long, llvm::GlobalValueSummary*, llvm::DenseMapInfo<unsigned long, void>, llvm::detail::DenseMapPair<unsigned long, llvm::GlobalValueSummary*>> const&, llvm::MapVector<llvm::StringRef, llvm::BitcodeModule, llvm::DenseMap<llvm::StringRef, unsigned int, llvm::DenseMapInfo<llvm::StringRef, void>, llvm::detail::DenseMapPair<llvm::StringRef, unsigned int>>, llvm::SmallVector<std::pair<llvm::StringRef, llvm::BitcodeModule>, 0u>>&)::'lambda'(std::function<llvm::Expected<std::unique_ptr<llvm::CachedFileStream, std::default_delete<llvm::CachedFileStream>>> (unsigned int, llvm::Twine const&)>)::operator()(std::function<llvm::Expected<std::unique_ptr<llvm::CachedFileStream, std::default_delete<llvm::CachedFileStream>>> (unsigned int, llvm::Twine const&)>) const LTO.cpp:0:0
#29 0x000056202e2b6833 (anonymous namespace)::InProcessThinBackend::runThinLTOBackendThread(std::function<llvm::Expected<std::unique_ptr<llvm::CachedFileStream, std::default_delete<llvm::CachedFileStream>>> (unsigned int, llvm::Twine const&)>, llvm::FileCache, unsigned int, llvm::BitcodeModule, llvm::ModuleSummaryIndex&, llvm::FunctionImporter::ImportMapTy const&, llvm::DenseSet<llvm::ValueInfo, llvm::DenseMapInfo<llvm::ValueInfo, void>> const&, std::map<unsigned long, llvm::GlobalValue::LinkageTypes, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, llvm::GlobalValue::LinkageTypes>>> const&, llvm::DenseMap<unsigned long, llvm::GlobalValueSummary*, llvm::DenseMapInfo<unsigned long, void>, llvm::detail::DenseMapPair<unsigned long, llvm::GlobalValueSummary*>> const&, llvm::MapVector<llvm::StringRef, llvm::BitcodeModule, llvm::DenseMap<llvm::StringRef, unsigned int, llvm::DenseMapInfo<llvm::StringRef, void>, llvm::detail::DenseMapPair<llvm::StringRef, unsigned int>>, llvm::SmallVector<std::pair<llvm::StringRef, llvm::BitcodeModule>, 0u>>&) LTO.cpp:0:0
#30 0x000056202e2a5e58 std::_Function_handler<void (), std::_Bind<(anonymous namespace)::InProcessThinBackend::start(unsigned int, llvm::BitcodeModule, llvm::FunctionImporter::ImportMapTy const&, llvm::DenseSet<llvm::ValueInfo, llvm::DenseMapInfo<llvm::ValueInfo, void>> const&, std::map<unsigned long, llvm::GlobalValue::LinkageTypes, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, llvm::GlobalValue::LinkageTypes>>> const&, llvm::MapVector<llvm::StringRef, llvm::BitcodeModule, llvm::DenseMap<llvm::StringRef, unsigned int, llvm::DenseMapInfo<llvm::StringRef, void>, llvm::detail::DenseMapPair<llvm::StringRef, unsigned int>>, llvm::SmallVector<std::pair<llvm::StringRef, llvm::BitcodeModule>, 0u>>&)::'lambda'(llvm::BitcodeModule, llvm::ModuleSummaryIndex&, llvm::FunctionImporter::ImportMapTy const&, llvm::DenseSet<llvm::ValueInfo, llvm::DenseMapInfo<llvm::ValueInfo, void>> const&, std::map<unsigned long, llvm::GlobalValue::LinkageTypes, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, llvm::GlobalValue::LinkageTypes>>> const&, llvm::DenseMap<unsigned long, llvm::GlobalValueSummary*, llvm::DenseMapInfo<unsigned long, void>, llvm::detail::DenseMapPair<unsigned long, llvm::GlobalValueSummary*>> const&, llvm::MapVector<llvm::StringRef, llvm::BitcodeModule, llvm::DenseMap<llvm::StringRef, unsigned int, llvm::DenseMapInfo<llvm::StringRef, void>, llvm::detail::DenseMapPair<llvm::StringRef, unsigned int>>, llvm::SmallVector<std::pair<llvm::StringRef, llvm::BitcodeModule>, 0u>>&) (llvm::BitcodeModule, std::reference_wrapper<llvm::ModuleSummaryIndex>, std::reference_wrapper<llvm::FunctionImporter::ImportMapTy const>, std::reference_wrapper<llvm::DenseSet<llvm::ValueInfo, llvm::DenseMapInfo<llvm::ValueInfo, void>> const>, std::reference_wrapper<std::map<unsigned long, llvm::GlobalValue::LinkageTypes, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, llvm::GlobalValue::LinkageTypes>>> const>, 
std::reference_wrapper<llvm::DenseMap<unsigned long, llvm::GlobalValueSummary*, llvm::DenseMapInfo<unsigned long, void>, llvm::detail::DenseMapPair<unsigned long, llvm::GlobalValueSummary*>> const>, std::reference_wrapper<llvm::MapVector<llvm::StringRef, llvm::BitcodeModule, llvm::DenseMap<llvm::StringRef, unsigned int, llvm::DenseMapInfo<llvm::StringRef, void>, llvm::detail::DenseMapPair<llvm::StringRef, unsigned int>>, llvm::SmallVector<std::pair<llvm::StringRef, llvm::BitcodeModule>, 0u>>>)>>::_M_invoke(std::_Any_data const&) LTO.cpp:0:0
#31 0x000056202db691d2 std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, std::thread::_Invoker<std::tuple<std::function<void ()>>>, void>>::_M_invoke(std::_Any_data const&) BalancedPartitioning.cpp:0:0
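
For readers unfamiliar with the assertion above: `llvm::cast<To>` asserts `isa<To>(Val)`, so casting a recipe that is not a `VPWidenPHIRecipe` aborts in assertion-enabled builds, whereas `dyn_cast` returns null instead. The following is a minimal, self-contained sketch of that difference. The classes `Recipe`, `WidenPHI`, `OtherRecipe` and the helpers `isaRecipe`, `castRecipe`, `dynCastRecipe`, `countLeadingPhis` are hypothetical stand-ins written for illustration; they are not LLVM's implementation and not the fix applied for this crash.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical recipe hierarchy with an LLVM-style classof() hook.
struct Recipe {
  enum Kind { K_WidenPHI, K_Other };
  explicit Recipe(Kind K) : TheKind(K) {}
  virtual ~Recipe() = default;
  Kind getKind() const { return TheKind; }

private:
  Kind TheKind;
};

struct WidenPHI : Recipe {
  WidenPHI() : Recipe(K_WidenPHI) {}
  static bool classof(const Recipe *R) { return R->getKind() == K_WidenPHI; }
};

struct OtherRecipe : Recipe {
  OtherRecipe() : Recipe(K_Other) {}
  static bool classof(const Recipe *R) { return R->getKind() == K_Other; }
};

// Type test, analogous in spirit to isa<To>(Val).
template <typename To> bool isaRecipe(const Recipe *R) {
  return To::classof(R);
}

// Asserting cast: aborts (in builds with assertions) on a type mismatch.
template <typename To> To *castRecipe(Recipe *R) {
  assert(isaRecipe<To>(R) && "cast argument of incompatible type!");
  return static_cast<To *>(R);
}

// Checked cast: returns nullptr on a type mismatch instead of asserting.
template <typename To> To *dynCastRecipe(Recipe *R) {
  return isaRecipe<To>(R) ? static_cast<To *>(R) : nullptr;
}

// Counts the phi-like recipes at the start of a block. It stops at the first
// recipe of another kind rather than assuming every recipe is a WidenPHI.
unsigned countLeadingPhis(const std::vector<std::unique_ptr<Recipe>> &Block) {
  unsigned N = 0;
  for (const auto &R : Block) {
    if (!dynCastRecipe<WidenPHI>(R.get()))
      break;
    ++N;
  }
  return N;
}
```

Calling `castRecipe<WidenPHI>` on an `OtherRecipe` would trip the same kind of assertion as in the log, while the `dynCastRecipe` loop simply stops at the first non-phi recipe.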

fhahn requested review from rengolin, ayalz and aniragil February 23, 2025 14:04

llvmbot added vectorizers llvm:transforms labels Feb 23, 2025

fhahn mentioned this pull request Feb 28, 2025

[LV] Optionally preserve uniform branches when vectorizing #128187

Open

fhahn force-pushed the vplan-predication branch from 92e45cd to 915b55b Compare March 30, 2025 16:22

fhahn force-pushed the vplan-predication branch from 915b55b to a06af46 Compare April 5, 2025 13:19

fhahn force-pushed the vplan-predication branch from a06af46 to 7f61860 Compare April 28, 2025 12:35

fhahn mentioned this pull request Apr 28, 2025

[VPlan] Retain exit conditions and edges in initial VPlan (NFC). #137709

Merged

fhahn mentioned this pull request May 3, 2025

[VPlan] Handle early exit before forming regions. (NFC) #138393

Merged

fhahn force-pushed the vplan-predication branch 2 times, most recently from fcfde33 to 4129042 Compare May 10, 2025 11:47

fhahn added a commit that referenced this pull request May 10, 2025

[VPlan] Sink VPB2IRBB lookups to VPRecipeBuilder (NFC).

cfde685

This allows migrating some more code to be based on VPBBs in VPRecipeBuilder, in preparation for #128420.

fhahn added a commit that referenced this pull request May 11, 2025

[VPlan] Use VPBBs to look up masks for newly created recipes (NFC).

2acecfe

Update recipe construction to use VPBBs to look up masks, in preparation for #128420.

ayalz approved these changes May 20, 2025

fhahn added 2 commits May 21, 2025 12:30

Merge remote-tracking branch 'origin/main' into vplan-predication

91423f6

!fixup address latest comments, thanks

763d667

fhahn merged commit b263c08 into llvm:main May 21, 2025
11 checks passed

fhahn deleted the vplan-predication branch May 21, 2025 14:47

fhahn mentioned this pull request May 21, 2025

VP Recipe cast assertion error in loop vectorize #140931

Closed

	for (const auto &[Old, New] : Old2New)
	for (const auto &[Old, _] : Old2New)

		@@ -66,8 +66,7 @@ class PlainCFGBuilder {
		: TheLoop(Lp), LI(LI), Plan(std::make_unique<VPlan>(Lp)) {}

		/// Build plain CFG for TheLoop and connects it to Plan's entry.

		@@ -9488,7 +9267,8 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range,
		// latter are added above for masking.

	// Construct recipes for the instructions in the loop
	// Construct wide recipes and apply predication for original scalar VPInstructions in the loop.

	Builder.setInsertPoint(VPBB, VPBB->begin());
	Builder.setInsertPoint(VPBB, VPBB->getFirstNonPhi());

	/// \p Plan. If \p FoldTail is true, also create a mask guarding the loop
	/// \p Plan. If \p FoldTail is true, create a mask guarding the loop

	/// added to \p BlockMaskCache, which in turn will temporarily be used later
	/// added to \p BlockMaskCache in order to be used later
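
The block-mask and edge-mask terminology used in this PR can be illustrated with a small sketch. It assumes the usual if-conversion rule: the mask of an edge is the source block's mask combined with the branch condition (or its negation), and a block's mask is the OR of its incoming edge masks. The types `Mask` and `Block` and the function `computeBlockMasks` are hypothetical names for illustration only; the actual transform operates on VPlan blocks and emits mask recipes rather than computing booleans.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Mask = std::vector<bool>; // one bit per vector lane

// Hypothetical CFG node. Blocks are identified by index and must be listed in
// topological order, with index 0 as the loop header.
struct Block {
  int TrueSucc = -1;  // -1: no successor
  int FalseSucc = -1; // -1: the branch to TrueSucc is unconditional
  Mask Cond;          // per-lane branch condition; used only when conditional
};

// Returns one mask per block; lane L of mask B is true if lane L executes B.
std::vector<Mask> computeBlockMasks(const std::vector<Block> &Blocks,
                                    unsigned Lanes) {
  assert(!Blocks.empty() && "expected at least a header block");
  std::vector<Mask> BlockMasks(Blocks.size(), Mask(Lanes, false));
  BlockMasks[0] = Mask(Lanes, true); // every lane enters the header

  for (std::size_t Src = 0; Src < Blocks.size(); ++Src) {
    const Block &B = Blocks[Src];
    const bool Conditional = B.FalseSucc >= 0;
    assert((!Conditional || B.Cond.size() == Lanes) && "bad condition width");

    // OR the mask of the edge Src -> Dst into Dst's block mask.
    auto AddEdge = [&](int Dst, bool Negate) {
      if (Dst < 0)
        return;
      assert(static_cast<std::size_t>(Dst) > Src && "not topological order");
      for (unsigned L = 0; L < Lanes; ++L) {
        bool EdgeBit = BlockMasks[Src][L];
        if (Conditional)
          EdgeBit = EdgeBit && (B.Cond[L] != Negate);
        BlockMasks[Dst][L] = BlockMasks[Dst][L] || EdgeBit;
      }
    };
    AddEdge(B.TrueSucc, /*Negate=*/false);
    AddEdge(B.FalseSucc, /*Negate=*/true);
  }
  return BlockMasks;
}
```

For a diamond-shaped CFG, the two arms receive complementary masks and the join block's mask is all-true again, which is why a join needs no mask beyond that of the block entering the diamond.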
