[LV] Improve AnyOf reduction codegen. #78304

fhahn · 2024-01-16T15:59:44Z

Update AnyOf reduction code generation to only keep track of the AnyOf property in a boolean vector in the loop, only selecting either the new or start value in the middle block.

The patch incorporates feedback from https://reviews.llvm.org/D153697.

This fixes #62565, as there are no longer multiple uses of the start/new values.

Fixes #62565
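
As a rough illustration of the description above, here is a minimal C++ model of the codegen shape, not the vectorizer's actual output. It assumes a fixed vector width of 4, a loop condition `x == 3`, and a true-side update. The names `anyOfScalar` and `anyOfBoolVector` are hypothetical and exist only for this sketch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar reference: an AnyOf reduction only ever holds the start value or a
// loop-invariant new value.
int anyOfScalar(const std::vector<int> &v, int start, int newVal) {
  int rdx = start;
  for (int x : v)
    rdx = (x == 3) ? newVal : rdx;
  return rdx;
}

// Model of the shape described in the patch, with an assumed VF of 4: the loop
// only ORs the condition into a boolean vector; the start/new value is
// selected once, after the loop.
int anyOfBoolVector(const std::vector<int> &v, int start, int newVal) {
  constexpr std::size_t VF = 4;
  std::array<bool, VF> mask{}; // boolean reduction phi, starts all-false
  std::size_t i = 0;
  for (; i + VF <= v.size(); i += VF)
    for (std::size_t lane = 0; lane < VF; ++lane)
      mask[lane] = mask[lane] || (v[i + lane] == 3);

  // "Middle block": OR-reduce the lanes, then select a single time.
  bool any = false;
  for (bool b : mask)
    any = any || b;
  int rdx = any ? newVal : start;

  // Scalar remainder loop, resuming from the selected value.
  for (; i < v.size(); ++i)
    rdx = (v[i] == 3) ? newVal : rdx;
  return rdx;
}
```

Both functions should agree for any input, including the case where `start == newVal`, because the boolean form never compares the reduced value against the start value.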

llvmbot · 2024-01-16T16:00:15Z

@llvm/pr-subscribers-llvm-transforms

Author: Florian Hahn (fhahn)

Changes

Patch is 78.76 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/78304.diff

10 Files Affected:

(modified) llvm/include/llvm/Transforms/Utils/LoopUtils.h (-9)
(modified) llvm/lib/Transforms/Utils/LoopUtils.cpp (+6-18)
(modified) llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h (+1)
(modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+41-3)
(modified) llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp (+4-3)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-select-cmp.ll (+15-23)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/select-cmp-reduction.ll (+44-76)
(modified) llvm/test/Transforms/LoopVectorize/select-cmp-predicated.ll (+15-14)
(modified) llvm/test/Transforms/LoopVectorize/select-cmp.ll (+74-72)
(modified) llvm/test/Transforms/LoopVectorize/select-reduction-start-value-may-be-undef-or-poison.ll (+18-25)

diff --git a/llvm/include/llvm/Transforms/Utils/LoopUtils.h b/llvm/include/llvm/Transforms/Utils/LoopUtils.h
index 5a1385d01d8e44..3bad7b616d9d75 100644
--- a/llvm/include/llvm/Transforms/Utils/LoopUtils.h
+++ b/llvm/include/llvm/Transforms/Utils/LoopUtils.h
@@ -363,15 +363,6 @@ Intrinsic::ID getMinMaxReductionIntrinsicOp(RecurKind RK);
 /// Returns the comparison predicate used when expanding a min/max reduction.
 CmpInst::Predicate getMinMaxReductionPredicate(RecurKind RK);
 
-/// See RecurrenceDescriptor::isAnyOfPattern for a description of the pattern we
-/// are trying to match. In this pattern, we are only ever selecting between two
-/// values: 1) an initial start value \p StartVal of the reduction PHI, and 2) a
-/// loop invariant value. If any of lane value in \p Left, \p Right is not equal
-/// to \p StartVal, select the loop invariant value. This is done by selecting
-/// \p Right iff \p Left is equal to \p StartVal.
-Value *createAnyOfOp(IRBuilderBase &Builder, Value *StartVal, RecurKind RK,
-                     Value *Left, Value *Right);
-
 /// Returns a Min/Max operation corresponding to MinMaxRecurrenceKind.
 /// The Builder's fast-math-flags must be set to propagate the expected values.
 Value *createMinMaxOp(IRBuilderBase &Builder, RecurKind RK, Value *Left,
diff --git a/llvm/lib/Transforms/Utils/LoopUtils.cpp b/llvm/lib/Transforms/Utils/LoopUtils.cpp
index 59485126b280ab..c0582fb7d7e150 100644
--- a/llvm/lib/Transforms/Utils/LoopUtils.cpp
+++ b/llvm/lib/Transforms/Utils/LoopUtils.cpp
@@ -962,15 +962,6 @@ CmpInst::Predicate llvm::getMinMaxReductionPredicate(RecurKind RK) {
   }
 }
 
-Value *llvm::createAnyOfOp(IRBuilderBase &Builder, Value *StartVal,
-                           RecurKind RK, Value *Left, Value *Right) {
-  if (auto VTy = dyn_cast<VectorType>(Left->getType()))
-    StartVal = Builder.CreateVectorSplat(VTy->getElementCount(), StartVal);
-  Value *Cmp =
-      Builder.CreateCmp(CmpInst::ICMP_NE, Left, StartVal, "rdx.select.cmp");
-  return Builder.CreateSelect(Cmp, Left, Right, "rdx.select");
-}
-
 Value *llvm::createMinMaxOp(IRBuilderBase &Builder, RecurKind RK, Value *Left,
                             Value *Right) {
   Type *Ty = Left->getType();
@@ -1079,16 +1070,13 @@ Value *llvm::createAnyOfTargetReduction(IRBuilderBase &Builder, Value *Src,
     NewVal = SI->getTrueValue();
   }
 
-  // Create a splat vector with the new value and compare this to the vector
-  // we want to reduce.
-  ElementCount EC = cast<VectorType>(Src->getType())->getElementCount();
-  Value *Right = Builder.CreateVectorSplat(EC, InitVal);
-  Value *Cmp =
-      Builder.CreateCmp(CmpInst::ICMP_NE, Src, Right, "rdx.select.cmp");
-
   // If any predicate is true it means that we want to select the new value.
-  Cmp = Builder.CreateOrReduce(Cmp);
-  return Builder.CreateSelect(Cmp, NewVal, InitVal, "rdx.select");
+  Value *AnyOf =
+      Src->getType()->isVectorTy() ? Builder.CreateOrReduce(Src) : Src;
+  // The compares in the loop may yield poison, which propagates through the
+  // bitwise ORs. Freeze it here before the condition is used.
+  AnyOf = Builder.CreateFreeze(AnyOf);
+  return Builder.CreateSelect(AnyOf, NewVal, InitVal, "rdx.select");
 }
 
 Value *llvm::createSimpleTargetReduction(IRBuilderBase &Builder, Value *Src,
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h b/llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
index a7ebf78e54ceb6..9d3ef5b96c72fb 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
@@ -68,6 +68,7 @@ class VPBuilder {
 public:
   VPBuilder() = default;
   VPBuilder(VPBasicBlock *InsertBB) { setInsertPoint(InsertBB); }
+  VPBuilder(VPRecipeBase *InsertPt) { setInsertPoint(InsertPt); }
 
   /// Clear the insertion point: created instructions will not be inserted into
   /// a block.
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index aa5d1bfa57d535..973bcf4f6c2d7c 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -7428,7 +7428,8 @@ static void createAndCollectMergePhiForReduction(
   auto *PhiR = cast<VPReductionPHIRecipe>(RedResult->getOperand(0));
   const RecurrenceDescriptor &RdxDesc = PhiR->getRecurrenceDescriptor();
 
-  TrackingVH<Value> ReductionStartValue = RdxDesc.getRecurrenceStartValue();
+  TrackingVH<Value> ReductionStartValue =
+      State.get(PhiR->getStartValue(), VPIteration(0, 0));
   Value *FinalValue =
       State.get(RedResult, VPIteration(State.UF - 1, VPLane::getFirstLane()));
   auto *ResumePhi =
@@ -7452,7 +7453,7 @@ static void createAndCollectMergePhiForReduction(
       BCBlockPhi->addIncoming(ResumePhi->getIncomingValueForBlock(Incoming),
                               Incoming);
     else
-      BCBlockPhi->addIncoming(ReductionStartValue, Incoming);
+      BCBlockPhi->addIncoming(RdxDesc.getRecurrenceStartValue(), Incoming);
   }
 
   auto *OrigPhi = cast<PHINode>(PhiR->getUnderlyingValue());
@@ -9054,6 +9055,41 @@ void LoopVectorizationPlanner::adjustRecipesForReductions(
       continue;
 
     const RecurrenceDescriptor &RdxDesc = PhiR->getRecurrenceDescriptor();
+    // Adjust AnyOf reductions; replace the reduction phi for the selected value
+    // with a boolean reduction phi node to check if the condition is true in
+    // any iteration. The final value is selected by the final
+    // ComputeReductionResult.
+    if (RecurrenceDescriptor::isAnyOfRecurrenceKind(
+            RdxDesc.getRecurrenceKind())) {
+      auto *Select = cast<VPRecipeBase>(*find_if(PhiR->users(), [](VPUser *U) {
+        return isa<VPWidenSelectRecipe>(U) ||
+               (isa<VPReplicateRecipe>(U) &&
+                cast<VPReplicateRecipe>(U)->getUnderlyingInstr()->getOpcode() ==
+                    Instruction::Select);
+      }));
+      VPValue *Cmp = Select->getOperand(0);
+      // If the compare is checking the reduction PHI node, adjust it to check
+      // the start value.
+      if (VPRecipeBase *CmpR = Cmp->getDefiningRecipe()) {
+        for (unsigned I = 0; I != CmpR->getNumOperands(); ++I)
+          if (CmpR->getOperand(I) == PhiR)
+            CmpR->setOperand(I, PhiR->getStartValue());
+      }
+      VPBuilder::InsertPointGuard Guard(Builder);
+      Builder.setInsertPoint(Select);
+
+      // If the true value of the select is the reduction phi, the new value is
+      // selected if the negated condition is true in any iteration.
+      if (Select->getOperand(1) == PhiR)
+        Cmp = Builder.createNot(Cmp);
+      VPValue *Or = Builder.createOr(PhiR, Cmp);
+      Select->getVPSingleValue()->replaceAllUsesWith(Or);
+
+      // Convert the reduction phi to operate on bools.
+      PhiR->setOperand(0, Plan->getVPValueOrAddLiveIn(ConstantInt::getFalse(
+                              OrigLoop->getHeader()->getContext())));
+    }
+
     // If tail is folded by masking, introduce selects between the phi
     // and the live-out instruction of each reduction, at the beginning of the
     // dedicated latch block.
@@ -9086,7 +9122,9 @@ void LoopVectorizationPlanner::adjustRecipesForReductions(
     // then extend the loop exit value to enable InstCombine to evaluate the
     // entire expression in the smaller type.
     Type *PhiTy = PhiR->getStartValue()->getLiveInIRValue()->getType();
-    if (MinVF.isVector() && PhiTy != RdxDesc.getRecurrenceType()) {
+    if (MinVF.isVector() && PhiTy != RdxDesc.getRecurrenceType() &&
+        !RecurrenceDescriptor::isAnyOfRecurrenceKind(
+            RdxDesc.getRecurrenceKind())) {
       assert(!PhiR->isInLoop() && "Unexpected truncated inloop reduction!");
       Type *RdxTy = RdxDesc.getRecurrenceType();
       auto *Trunc =
diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index c553e2c9e76839..f87461fd548bd7 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -452,8 +452,7 @@ Value *VPInstruction::generateInstruction(VPTransformState &State,
         else if (RecurrenceDescriptor::isAnyOfRecurrenceKind(RK)) {
           TrackingVH<Value> ReductionStartValue =
               RdxDesc.getRecurrenceStartValue();
-          ReducedPartRdx = createAnyOfOp(Builder, ReductionStartValue, RK,
-                                         ReducedPartRdx, RdxPart);
+          ReducedPartRdx = Builder.CreateOr(ReducedPartRdx, RdxPart);
         } else
           ReducedPartRdx = createMinMaxOp(Builder, RK, ReducedPartRdx, RdxPart);
       }
@@ -461,7 +460,9 @@ Value *VPInstruction::generateInstruction(VPTransformState &State,
 
     // Create the reduction after the loop. Note that inloop reductions create
     // the target reduction in the loop using a Reduction recipe.
-    if (State.VF.isVector() && !PhiR->isInLoop()) {
+    if ((State.VF.isVector() ||
+         RecurrenceDescriptor::isAnyOfRecurrenceKind(RK)) &&
+        !PhiR->isInLoop()) {
       ReducedPartRdx =
           createTargetReduction(Builder, RdxDesc, ReducedPartRdx, OrigPhi);
       // If the reduction can be performed in a smaller type, we need to extend
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/sve-select-cmp.ll b/llvm/test/Transforms/LoopVectorize/AArch64/sve-select-cmp.ll
index 1c26ee8479e578..493ffd2ca569de 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/sve-select-cmp.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/sve-select-cmp.ll
@@ -8,13 +8,12 @@ target triple = "aarch64-linux-gnu"
 define i32 @select_const_i32_from_icmp(ptr nocapture readonly %v, i64 %n) #0 {
 ; CHECK-VF4IC1-LABEL: @select_const_i32_from_icmp
 ; CHECK-VF4IC1:      vector.body:
-; CHECK-VF4IC1:        [[VEC_PHI:%.*]] = phi <vscale x 4 x i32> [ shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 3, i64 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer), %vector.ph ], [ [[VEC_SEL:%.*]], %vector.body ]
+; CHECK-VF4IC1:        [[VEC_PHI:%.*]] = phi <vscale x 4 x i1> [ zeroinitializer, %vector.ph ], [ [[VEC_SEL:%.*]], %vector.body ]
 ; CHECK-VF4IC1:        [[VEC_LOAD:%.*]] = load <vscale x 4 x i32>
 ; CHECK-VF4IC1-NEXT:   [[VEC_ICMP:%.*]] = icmp eq <vscale x 4 x i32> [[VEC_LOAD]], shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 3, i64 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
-; CHECK-VF4IC1-NEXT:   [[VEC_SEL]] = select <vscale x 4 x i1> [[VEC_ICMP]], <vscale x 4 x i32> [[VEC_PHI]], <vscale x 4 x i32> shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 7, i64 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
+; CHECK-VF4IC1-NEXT:   [[VEC_SEL]] = or <vscale x 4 x i1> [[VEC_PHI]], [[VEC_ICMP]]
 ; CHECK-VF4IC1:      middle.block:
-; CHECK-VF4IC1-NEXT:   [[FIN_ICMP:%.*]] = icmp ne <vscale x 4 x i32> [[VEC_SEL]], shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 3, i64 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
-; CHECK-VF4IC1-NEXT:   [[OR_RDX:%.*]] = call i1 @llvm.vector.reduce.or.nxv4i1(<vscale x 4 x i1> [[FIN_ICMP]])
+; CHECK-VF4IC1-NEXT:   [[OR_RDX:%.*]] = call i1 @llvm.vector.reduce.or.nxv4i1(<vscale x 4 x i1> [[VEC_SEL]])
 ; CHECK-VF4IC1-NEXT:   {{.*}} = select i1 [[OR_RDX]], i32 7, i32 3
 
 ; CHECK-VF4IC4-LABEL: @select_const_i32_from_icmp
@@ -62,20 +61,15 @@ exit:                                     ; preds = %for.body
 define i32 @select_i32_from_icmp(ptr nocapture readonly %v, i32 %a, i32 %b, i64 %n) #0 {
 ; CHECK-VF4IC1-LABEL: @select_i32_from_icmp
 ; CHECK-VF4IC1:      vector.ph:
-; CHECK-VF4IC1:        [[TMP1:%.*]] = insertelement <vscale x 4 x i32> poison, i32 %a, i64 0
-; CHECK-VF4IC1-NEXT:   [[SPLAT_OF_A:%.*]] = shufflevector <vscale x 4 x i32> [[TMP1]], <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer
-; CHECK-VF4IC1-NEXT:   [[TMP2:%.*]] = insertelement <vscale x 4 x i32> poison, i32 %b, i64 0
-; CHECK-VF4IC1-NEXT:   [[SPLAT_OF_B:%.*]] = shufflevector <vscale x 4 x i32> [[TMP2]], <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer
+; CHECK-VF4IC1-NOT:    shufflevector <vscale x 4 x i32>
+; CHECK-VF4IC1-NOT:    shufflevector <vscale x 4 x i32>
 ; CHECK-VF4IC1:      vector.body:
-; CHECK-VF4IC1:        [[VEC_PHI:%.*]] = phi <vscale x 4 x i32> [ [[SPLAT_OF_A]], %vector.ph ], [ [[VEC_SEL:%.*]], %vector.body ]
+; CHECK-VF4IC1:        [[VEC_PHI:%.*]] = phi <vscale x 4 x i1> [ zeroinitializer, %vector.ph ], [ [[VEC_SEL:%.*]], %vector.body ]
 ; CHECK-VF4IC1:        [[VEC_LOAD:%.*]] = load <vscale x 4 x i32>
 ; CHECK-VF4IC1-NEXT:   [[VEC_ICMP:%.*]] = icmp eq <vscale x 4 x i32> [[VEC_LOAD]], shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 3, i64 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
-; CHECK-VF4IC1-NEXT:   [[VEC_SEL]] = select <vscale x 4 x i1> [[VEC_ICMP]], <vscale x 4 x i32> [[VEC_PHI]], <vscale x 4 x i32> [[SPLAT_OF_B]]
+; CHECK-VF4IC1-NEXT:   [[VEC_SEL]] = or <vscale x 4 x i1> [[VEC_PHI]], [[VEC_ICMP]]
 ; CHECK-VF4IC1:      middle.block:
-; CHECK-VF4IC1-NEXT:   [[FIN_INS:%.*]] = insertelement <vscale x 4 x i32> poison, i32 %a, i64 0
-; CHECK-VF4IC1-NEXT:   [[FIN_SPLAT:%.*]] = shufflevector <vscale x 4 x i32> [[FIN_INS]], <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer
-; CHECK-VF4IC1-NEXT:   [[FIN_CMP:%.*]] = icmp ne <vscale x 4 x i32> [[VEC_SEL]], [[FIN_SPLAT]]
-; CHECK-VF4IC1-NEXT:   [[OR_RDX:%.*]] = call i1 @llvm.vector.reduce.or.nxv4i1(<vscale x 4 x i1> [[FIN_CMP]])
+; CHECK-VF4IC1-NEXT:   [[OR_RDX:%.*]] = call i1 @llvm.vector.reduce.or.nxv4i1(<vscale x 4 x i1> [[VEC_SEL]])
 ; CHECK-VF4IC1-NEXT:   {{.*}} = select i1 [[OR_RDX]], i32 %b, i32 %a
 
 ; CHECK-VF4IC4-LABEL: @select_i32_from_icmp
@@ -101,13 +95,12 @@ exit:                                     ; preds = %for.body
 define i32 @select_const_i32_from_fcmp(ptr nocapture readonly %v, i64 %n) #0 {
 ; CHECK-VF4IC1-LABEL: @select_const_i32_from_fcmp
 ; CHECK-VF4IC1:      vector.body:
-; CHECK-VF4IC1:        [[VEC_PHI:%.*]] = phi <vscale x 4 x i32> [ shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 2, i64 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer), %vector.ph ], [ [[VEC_SEL:%.*]], %vector.body ]
+; CHECK-VF4IC1:        [[VEC_PHI:%.*]] = phi <vscale x 4 x i1> [ zeroinitializer, %vector.ph ], [ [[VEC_SEL:%.*]], %vector.body ]
 ; CHECK-VF4IC1:        [[VEC_LOAD:%.*]] = load <vscale x 4 x float>
 ; CHECK-VF4IC1-NEXT:   [[VEC_ICMP:%.*]] = fcmp fast ueq <vscale x 4 x float> [[VEC_LOAD]], shufflevector (<vscale x 4 x float> insertelement (<vscale x 4 x float> poison, float 3.000000e+00, i64 0), <vscale x 4 x float> poison, <vscale x 4 x i32> zeroinitializer)
-; CHECK-VF4IC1-NEXT:   [[VEC_SEL]] = select <vscale x 4 x i1> [[VEC_ICMP]], <vscale x 4 x i32> [[VEC_PHI]], <vscale x 4 x i32> shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 1, i64 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
+; CHECK-VF4IC1-NEXT:   [[VEC_SEL]] = or <vscale x 4 x i1> [[VEC_PHI]], [[VEC_ICMP]]
 ; CHECK-VF4IC1:      middle.block:
-; CHECK-VF4IC1-NEXT:   [[FIN_ICMP:%.*]] = icmp ne <vscale x 4 x i32> [[VEC_SEL]], shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 2, i64 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
-; CHECK-VF4IC1-NEXT:   [[OR_RDX:%.*]] = call i1 @llvm.vector.reduce.or.nxv4i1(<vscale x 4 x i1> [[FIN_ICMP]])
+; CHECK-VF4IC1-NEXT:   [[OR_RDX:%.*]] = call i1 @llvm.vector.reduce.or.nxv4i1(<vscale x 4 x i1> [[VEC_SEL]])
 ; CHECK-VF4IC1-NEXT:   {{.*}} = select i1 [[OR_RDX]], i32 1, i32 2
 
 ; CHECK-VF4IC4-LABEL: @select_const_i32_from_fcmp
@@ -156,16 +149,15 @@ exit:                                     ; preds = %for.body
 define i32 @pred_select_const_i32_from_icmp(ptr noalias nocapture readonly %src1, ptr noalias nocapture readonly %src2, i64 %n) #0 {
 ; CHECK-VF4IC1-LABEL: @pred_select_const_i32_from_icmp
 ; CHECK-VF4IC1:      vector.body:
-; CHECK-VF4IC1:        [[VEC_PHI:%.*]] = phi <vscale x 4 x i32> [ zeroinitializer, %vector.ph ], [ [[VEC_SEL:%.*]], %vector.body ]
+; CHECK-VF4IC1:        [[VEC_PHI:%.*]] = phi <vscale x 4 x i1> [ zeroinitializer, %vector.ph ], [ [[VEC_SEL:%.*]], %vector.body ]
 ; CHECK-VF4IC1:        [[VEC_LOAD:%.*]] = load <vscale x 4 x i32>
 ; CHECK-VF4IC1:        [[MASK:%.*]] = icmp sgt <vscale x 4 x i32> [[VEC_LOAD]], shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 35, i64 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
 ; CHECK-VF4IC1:        [[MASKED_LOAD:%.*]] = call <vscale x 4 x i32> @llvm.masked.load.nxv4i32.p0(ptr {{%.*}}, i32 4, <vscale x 4 x i1> [[MASK]], <vscale x 4 x i32> poison)
 ; CHECK-VF4IC1-NEXT:   [[VEC_ICMP:%.*]] = icmp eq <vscale x 4 x i32> [[MASKED_LOAD]], shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 2, i64 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
-; CHECK-VF4IC1-NEXT:   [[VEC_SEL_TMP:%.*]] = select <vscale x 4 x i1> [[VEC_ICMP]], <vscale x 4 x i32> shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 1, i64 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer), <vscale x 4 x i32> [[VEC_PHI]]
-; CHECK-VF4IC1:        [[VEC_SEL:%.*]] = select <vscale x 4 x i1> [[MASK]], <vscale x 4 x i32> [[VEC_SEL_TMP]], <vscale x 4 x i32> [[VEC_PHI]]
+; CHECK-VF4IC1-NEXT:   [[VEC_SEL_TMP:%.*]] = or <vscale x 4 x i1> [[VEC_PHI]], [[VEC_ICMP]]
+; CHECK-VF4IC1:        [[VEC_SEL:%.*]] = select <vscale x 4 x i1> [[MASK]], <vscale x 4 x i1> [[VEC_SEL_TMP]], <vscale x 4 x i1> [[VEC_PHI]]
 ; CHECK-VF4IC1:      middle.block:
-; CHECK-VF4IC1-NEXT:   [[FIN_ICMP:%.*]] = icmp ne <vscale x 4 x i32> [[VEC_SEL]], zeroinitializer
-; CHECK-VF4IC1-NEXT:   [[OR_RDX:%.*]] = call i1 @llvm.vector.reduce.or.nxv4i1(<vscale x 4 x i1> [[FIN_ICMP]])
+; CHECK-VF4IC1-NEXT:   [[OR_RDX:%.*]] = call i1 @llvm.vector.reduce.or.nxv4i1(<vscale x 4 x i1> [[VEC_SEL]])
 ; CHECK-VF4IC1-NEXT:   {{.*}} = select i1 [[OR_RDX]], i32 1, i32 0
 
 ; CHECK-VF4IC4-LABEL: @pred_select_const_i32_from_icmp
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/select-cmp-reduction.ll b/llvm/test/Transforms/LoopVectorize/RISCV/select-cmp-reduction.ll
index 34a7987bb40abe..6c50915c88e824 100644
--- a/llvm/test/Transforms/LoopVectorize/RISCV/select-cmp-reduction.ll
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/select-cmp-reduction.ll
@@ -12,25 +12,22 @@ define i32 @select_icmp(i32 %x, i32 %y, ptr nocapture readonly %c, i64 %n) #0 {
 ; CHECK-NEXT:    [[N_VEC:%.*]] = sub i64 %n, [[N_MOD_VF]]
 ; CHECK-NEXT:    [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x i32> poison, i32 [[X:%.*]], i64 0
 ; CHECK-NEXT:    [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT]], <4 x i32> poison, <4 x i32> zeroinitializer
-; CHECK-NEXT:    [[BROADCAST_SPLATINSERT1:%.*]] = insertelement <4 x i32> poison, i32 [[Y:%.*]], i64 0
-; CHECK-NEXT:    [[BROADCAST_SPLAT2:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT1]], <4 x i32> poison, <4 x i32> zeroinitializer
 ; CHECK-NEXT:    br label [[VECTOR_BODY:%.*]]
 ; CHECK:       vector.body:
 ; CHECK-NEXT:    [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH:%.*]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
-; CHECK-NEXT:    [[VEC_PHI:%.*]] = phi <4 x i32> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP5:%.*]], [[VECTOR_BODY]] ]
+; CHECK-NEXT:    [[VEC_PHI:%.*]] = phi <4 x i1> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP5:%.*]], [[VECTOR_BODY]] ]
 ; CHECK-NEXT:    [[TMP0:%.*]] = add i64 [[INDEX]], 0
 ; CHECK-NEXT:    [[TMP1:%.*]] = getelementptr inbounds i32, ptr [[C:%.*]], i64 [[TMP0]]
 ; CHECK-NEXT:    [[TMP2:%.*]] = getelementptr inbounds i32, ptr [[TMP1]], i32 0
 ; CHECK-NEXT:    [[WIDE_LOAD:%.*]] = load <4 x i32>, ptr [[TMP2]], align 4
 ; CHECK-NEXT:    [[TMP4:%.*]] = icmp slt <4 x i32> [[WIDE_LOAD]], [[BROADCAST_SPLAT]]
-; CHECK-NEXT:    [[TMP5]] = select <4 x i1> [[TMP4]], <4 x i32> [[VEC_PHI]], <4 x i32> [[BROADCAST_SPLAT2]]
+; CHECK-NEXT:    [[TMP5]] = or <4 x i1> [[VEC_PHI]], [[TMP4]]
 ; CHECK-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
 ; CHECK-NEXT:    [[TMP6:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
 ; CHECK-NEXT:    br i1 [[TMP6]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [...
[truncated]

Mel-Chen · 2024-01-17T10:11:40Z

llvm/test/Transforms/LoopVectorize/AArch64/sve-select-cmp.ll

+; CHECK-VF4IC1:        [[VEC_PHI:%.*]] = phi <vscale x 4 x i1> [ zeroinitializer, %vector.ph ], [ [[VEC_SEL:%.*]], %vector.body ]
 ; CHECK-VF4IC1:        [[VEC_LOAD:%.*]] = load <vscale x 4 x i32>
 ; CHECK-VF4IC1-NEXT:   [[VEC_ICMP:%.*]] = icmp eq <vscale x 4 x i32> [[VEC_LOAD]], shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 3, i64 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
-; CHECK-VF4IC1-NEXT:   [[VEC_SEL]] = select <vscale x 4 x i1> [[VEC_ICMP]], <vscale x 4 x i32> [[VEC_PHI]], <vscale x 4 x i32> shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 7, i64 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
+; CHECK-VF4IC1-NEXT:   [[VEC_SEL]] = or <vscale x 4 x i1> [[VEC_PHI]], [[VEC_ICMP]]
 ; CHECK-VF4IC1:      middle.block:
-; CHECK-VF4IC1-NEXT:   [[FIN_ICMP:%.*]] = icmp ne <vscale x 4 x i32> [[VEC_SEL]], shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 3, i64 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
-; CHECK-VF4IC1-NEXT:   [[OR_RDX:%.*]] = call i1 @llvm.vector.reduce.or.nxv4i1(<vscale x 4 x i1> [[FIN_ICMP]])
+; CHECK-VF4IC1-NEXT:   [[OR_RDX:%.*]] = call i1 @llvm.vector.reduce.or.nxv4i1(<vscale x 4 x i1> [[VEC_SEL]])
 ; CHECK-VF4IC1-NEXT:   {{.*}} = select i1 [[OR_RDX]], i32 7, i32 3


I'm glad to see someone willing to address this issue. It was also in my task list, so I have some information to share with you, hoping it will be helpful. I've divided the cases of AnyOf into two types:

True-side update:

rdx = <condition> ? <loop-invariant-rhs> : rdx

It can be transformed to OR reduction:

    %vmask = <all-false>
    for () {
      %pred = <widened-predicate>
      %vmask |= %pred
    }
    %rdx = reduce.or %vmask
    %red = select %rdx, <loop-invariant-rhs>, <initial-value>

False-side update:

rdx = <condition> ? rdx : <loop-invariant-rhs>

It can be transformed to OR reduction with NOT condition:

    %vmask = <all-false>
    for () {
      %pred = <widened-predicate>
      %pred.not = (%pred == <all-false>)
      %vmask |= %pred.not
    }
    %rdx = reduce.or %vmask
    %red = select %rdx, <loop-invariant-rhs>, <initial-value>

Or, to AND reduction:

    %vmask = <all-true>
    for () {
      %pred = <widened-predicate>
      %vmask &= %pred
    }
    %rdx = reduce.and %vmask
    %red = select %rdx, <initial-value>, <loop-invariant-rhs>

Take @select_const_i32_from_icmp as an example: it should be the second type (false-side update), so there may be a bug in the current transformation.
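
The equivalences listed above can be checked with a small scalar C++ sketch. It models only the predicate stream, not vector lanes, and all function names are hypothetical; it is an illustration of the reasoning rather than the vectorizer's implementation.

```cpp
#include <cassert>
#include <vector>

// True-side update: rdx = pred ? rhs : rdx
int trueSideScalar(const std::vector<bool> &preds, int init, int rhs) {
  int rdx = init;
  for (bool p : preds)
    rdx = p ? rhs : rdx;
  return rdx;
}

// Equivalent OR reduction: select once after the loop.
int trueSideOr(const std::vector<bool> &preds, int init, int rhs) {
  bool any = false;
  for (bool p : preds)
    any = any || p;
  return any ? rhs : init;
}

// False-side update: rdx = pred ? rdx : rhs
int falseSideScalar(const std::vector<bool> &preds, int init, int rhs) {
  int rdx = init;
  for (bool p : preds)
    rdx = p ? rdx : rhs;
  return rdx;
}

// OR reduction over the negated predicate.
int falseSideOrNot(const std::vector<bool> &preds, int init, int rhs) {
  bool any = false;
  for (bool p : preds)
    any = any || !p;
  return any ? rhs : init;
}

// AND reduction over the predicate; note the swapped select operands.
int falseSideAnd(const std::vector<bool> &preds, int init, int rhs) {
  bool all = true;
  for (bool p : preds)
    all = all && p;
  return all ? init : rhs;
}
```

Dropping the negation in the false-side case would compute the true-side result instead, which is the kind of mismatch being pointed out for the stale test checks.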

Thanks, the updates in this particular test file are still stale and need updating. The current codegen generates OR (NOT ..), as should be visible in llvm/test/Transforms/LoopVectorize/select-cmp.ll

I see.
But besides this case, llvm/test/Transforms/LoopVectorize/RISCV/select-cmp-reduction.ll also lacks the xor operation. I haven't thoroughly checked if other test cases have similar situations.
Could you please update it? Thank you.

fhahn · 2024-01-25T12:15:49Z

ping :)

Mel-Chen · 2024-01-26T09:42:09Z

llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp

          TrackingVH<Value> ReductionStartValue =
              RdxDesc.getRecurrenceStartValue();


This can be removed. And after it is cleaned up, the braces can be removed too.

Folded into the CreateBinaryOp above, thanks!

Mel-Chen · 2024-01-26T09:59:22Z

llvm/test/Transforms/LoopVectorize/AArch64/sve-select-cmp.ll

+; CHECK-VF4IC1:        [[VEC_PHI:%.*]] = phi <vscale x 4 x i1> [ zeroinitializer, %vector.ph ], [ [[VEC_SEL:%.*]], %vector.body ]
 ; CHECK-VF4IC1:        [[VEC_LOAD:%.*]] = load <vscale x 4 x i32>
 ; CHECK-VF4IC1-NEXT:   [[VEC_ICMP:%.*]] = icmp eq <vscale x 4 x i32> [[VEC_LOAD]], shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 3, i64 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
-; CHECK-VF4IC1-NEXT:   [[VEC_SEL]] = select <vscale x 4 x i1> [[VEC_ICMP]], <vscale x 4 x i32> [[VEC_PHI]], <vscale x 4 x i32> shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 7, i64 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
+; CHECK-VF4IC1-NEXT:   [[VEC_SEL]] = or <vscale x 4 x i1> [[VEC_PHI]], [[VEC_ICMP]]
 ; CHECK-VF4IC1:      middle.block:
-; CHECK-VF4IC1-NEXT:   [[FIN_ICMP:%.*]] = icmp ne <vscale x 4 x i32> [[VEC_SEL]], shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 3, i64 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
-; CHECK-VF4IC1-NEXT:   [[OR_RDX:%.*]] = call i1 @llvm.vector.reduce.or.nxv4i1(<vscale x 4 x i1> [[FIN_ICMP]])
+; CHECK-VF4IC1-NEXT:   [[OR_RDX:%.*]] = call i1 @llvm.vector.reduce.or.nxv4i1(<vscale x 4 x i1> [[VEC_SEL]])
 ; CHECK-VF4IC1-NEXT:   {{.*}} = select i1 [[OR_RDX]], i32 7, i32 3


I see.
But besides this case, llvm/test/Transforms/LoopVectorize/RISCV/select-cmp-reduction.ll also lacks the xor operation. I haven't thoroughly checked if other test cases have similar situations.
Could you please update it? Thank you.

ayalz

Thanks for following up on this!

ayalz · 2024-01-30T21:51:49Z

llvm/lib/Transforms/Utils/LoopUtils.cpp

+      Src->getType()->isVectorTy() ? Builder.CreateOrReduce(Src) : Src;
+  // The compares in the loop may yield poison, which propagates through the
+  // bitwise ORs. Freeze it here before the condition is used.
+  AnyOf = Builder.CreateFreeze(AnyOf);


Further refactoring thought: the AnyOf boolean reduction completes with the CreateOrReduce(), and freezing it. Similar to a plain "result |= value[i]" OR reduction, on i1 values, where freezing may also be needed? The subsequent select deserves a separate recipe, which post-processes the result of the AnyOf reduction and also depends on the "IfAny" and "Else" (or "Start" and "Other") Live-In VP/Values directly, rather than looking for them here and now.

Similar to a plain "result |= value[i]" OR reduction,

Freeze won't be needed in that case; if there's already a binary OR in the input, poison from the compare already gets propagated. It is only needed when converting from the select form (which doesn't propagate poison from the condition to its result).

Yes, but this will need a bit of additional refactoring, in particular how createAndCollectMergePhiForReduction looks up the reduction result value and when ComputeReductionResult VPInstructions are created.

ayalz · 2024-01-30T21:56:34Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

@@ -7486,7 +7486,8 @@ static void createAndCollectMergePhiForReduction(
  auto *PhiR = cast<VPReductionPHIRecipe>(RedResult->getOperand(0));
  const RecurrenceDescriptor &RdxDesc = PhiR->getRecurrenceDescriptor();

-  TrackingVH<Value> ReductionStartValue = RdxDesc.getRecurrenceStartValue();
+  TrackingVH<Value> ReductionStartValue =


Is ReductionStartValue still used, after inlining it below? Why/is this change needed?

Not needed any longer, removed, thanks!

ayalz · 2024-01-30T22:01:29Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

@@ -9110,6 +9111,41 @@ void LoopVectorizationPlanner::adjustRecipesForReductions(
      continue;

    const RecurrenceDescriptor &RdxDesc = PhiR->getRecurrenceDescriptor();
+    // Adjust AnyOf reductions; replace the reduction phi for the selected value


adjustRecipesForReductions() is getting excessively long, should be refactored. Its documentation above should (also) include the adjustment of AnyOf reductions described here.

Agreed, probably best as followup?

ayalz · 2024-01-30T22:03:04Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

+               (isa<VPReplicateRecipe>(U) &&
+                cast<VPReplicateRecipe>(U)->getUnderlyingInstr()->getOpcode() ==
+                    Instruction::Select);
+      }));


nit: assert(Select && "a meaningful error message");

cast should already assert to check for non-null.

ayalz · 2024-01-31T07:20:06Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

-    if (MinVF.isVector() && PhiTy != RdxDesc.getRecurrenceType()) {
+    if (MinVF.isVector() && PhiTy != RdxDesc.getRecurrenceType() &&
+        !RecurrenceDescriptor::isAnyOfRecurrenceKind(
+            RdxDesc.getRecurrenceKind())) {


Would it be better to truncate AnyOf reductions to a smaller type (boolean) here instead of above?

We only reach this path now because we adjust PhiR's start value to a bool. It requires more than plain truncates as below, so I think it's probably worth keeping it separate. I think it also needs handling before introducing selects for tail-folding; otherwise those selects would also need updating.

ayalz · 2024-01-31T07:23:01Z

llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp

@@ -453,16 +453,17 @@ Value *VPInstruction::generateInstruction(VPTransformState &State,
        else if (RecurrenceDescriptor::isAnyOfRecurrenceKind(RK)) {
          TrackingVH<Value> ReductionStartValue =
              RdxDesc.getRecurrenceStartValue();
-          ReducedPartRdx = createAnyOfOp(Builder, ReductionStartValue, RK,
-                                         ReducedPartRdx, RdxPart);
+          ReducedPartRdx = Builder.CreateOr(ReducedPartRdx, RdxPart);


Should Op be set to OR and treated by CreateBinOp() above?

Adjusted, thanks!

ayalz · 2024-01-31T07:58:56Z

llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp

-    if (State.VF.isVector() && !PhiR->isInLoop()) {
+    if ((State.VF.isVector() ||
+         RecurrenceDescriptor::isAnyOfRecurrenceKind(RK)) &&
+        !PhiR->isInLoop()) {
      ReducedPartRdx =
          createTargetReduction(Builder, RdxDesc, ReducedPartRdx, OrigPhi);


As raised above, it may be better to have the ComputeReductionResult recipe take care only of reducing AnyOf to a boolean here, followed by a Select recipe to choose between Start and Other live-in values.

Yes, but this will need a bit of additional refactoring, in particular how createAndCollectMergePhiForReduction looks up the reduction result value and when ComputeReductionResult VPInstructions are created.

ayalz · 2024-01-31T09:03:10Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

+      }));
+      VPValue *Cmp = Select->getOperand(0);
+      // If the compare is checking the reduction PHI node, adjust it to check
+      // the start value.


Hmm, either the true or false values of the select should use PhiR, but only it? The condition of an AnyOf reduction should be any boolean-expression predicate that depends on the current iteration only, i.e., should be a recipe (if it's live-in the entire select should be LICM'd), independent of AnyOf's PhiR, neither directly nor indirectly?

At the moment, AnyOf reductions are also formed for code like

define i32 @select_i32_from_icmp_same_inputs(i32 %a, i32 %b, i64 %n) {
entry:
  br label %for.body

for.body:                                         ; preds = %entry, %for.body
  %0 = phi i64 [ 0, %entry ], [ %4, %for.body ]
  %1 = phi i32 [ %a, %entry ], [ %3, %for.body ]
  %2 = icmp eq i32 %1, 3
  %3 = select i1 %2, i32 %1, i32 %b
  %4 = add nuw nsw i64 %0, 1
  %5 = icmp eq i64 %4, %n
  br i1 %5, label %exit, label %for.body

exit:                                             ; preds = %for.body
  ret i32 %3
}
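A scalar reading of this loop may help show why the compare can be rewritten to check the start value, as the code comment in the diff describes. The following is only a C++ sketch, not code from the patch: `selectLoop` mirrors the IR under the assumption that `n >= 1`, and `selectClosedForm` is a hypothetical helper that evaluates the compare on the start value alone.

```cpp
#include <cassert>
#include <cstdint>

// Scalar model of @select_i32_from_icmp_same_inputs. The IR loop is
// do-while shaped, so the body runs at least once; n >= 1 is assumed.
int32_t selectLoop(int32_t a, int32_t b, uint64_t n) {
  assert(n >= 1 && "the IR loop body executes at least once");
  int32_t red = a; // %1: the reduction phi, starting at %a
  for (uint64_t i = 0; i < n; ++i)
    red = (red == 3) ? red : b; // %2 (icmp) and %3 (select)
  return red;
}

// Hypothetical closed form: the compare is evaluated on the start value.
// Once red leaves the start value it equals b, and from then on both
// select arms yield b, so later compares no longer change the result.
int32_t selectClosedForm(int32_t a, int32_t b) {
  bool anyNotThree = (a != 3); // negated condition, loop-invariant here
  return anyNotThree ? b : a;
}
```

Under these assumptions both functions return the same value for any `a`, `b` and `n >= 1`.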

@ayalz The current recognition for AnyOf reductions involves a set of cmp-select instructions and cannot identify cases with only a single select. Supporting this requires expanding isAnyOfPattern or applying conditional reduction to model the AnyOf idiom.

ayalz · 2024-01-31T09:25:49Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

+      // If the true value of the select is the reduction phi, the new value is
+      // selected if the negated condition is true in any iteration.
+      if (Select->getOperand(1) == PhiR)
+        Cmp = Builder.createNot(Cmp);


As Mel pointed out, this case of "Red = cond ? PhiR : Other" where a single false cond suffices for the result to be Other, could be considered an AllOf reduction starting with true using createAnd() instead of createOr(), resulting in "AND(conds) ? Start : Other".

Negating cond swaps the operands and translates into the "Red = !cond ? Other : PhiR" form of an AnyOf reduction starting with false and using createOr(), where a single true !cond suffices for the result to be Other, resulting in "OR(!conds) ? Other : Start".

Yes, this could be handled like that, but I think then we would need to support both patterns here and also in codegen, so negating seems simpler (and the negation should be removable by instcombine). Left as is for now, but happy to adjust if needed. But then it should probably be modeled as AllOf directly in the reduction descriptor.
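The equivalence of the two forms discussed here can be sketched in plain C++. This is an illustration of the De Morgan argument only, not the vectorizer's code; the function names (`allOfForm`, `anyOfNegatedForm`, `checkEquivalent`) are hypothetical.

```cpp
#include <cassert>
#include <vector>

// "Red = cond ? PhiR : Other", read as an AllOf reduction that starts at
// true and combines with AND: AND(conds) ? Start : Other.
int allOfForm(const std::vector<bool> &conds, int start, int other) {
  bool all = true;
  for (bool c : conds)
    all = all && c;
  return all ? start : other;
}

// The same loop after negating cond: "Red = !cond ? Other : PhiR", an AnyOf
// reduction that starts at false and combines with OR:
// OR(!conds) ? Other : Start.
int anyOfNegatedForm(const std::vector<bool> &conds, int start, int other) {
  bool any = false;
  for (bool c : conds)
    any = any || !c;
  return any ? other : start;
}

// Aborts via assert if the two forms disagree on the given input.
void checkEquivalent(const std::vector<bool> &conds, int start, int other) {
  assert(allOfForm(conds, start, other) ==
         anyOfNegatedForm(conds, start, other));
}
```

Since `AND(conds)` is the negation of `OR(!conds)`, the two selects pick the same value with their arms swapped, which is why negating the condition lets the existing OR-based codegen be reused.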

fhahn · 2024-02-25T12:13:37Z

ping :)

Mel-Chen

LGTM

RKSimon · 2024-03-14T14:22:48Z

@fhahn I'm not certain (and I haven't been able to repro as I don't have a LTO stage2 build) but do you think this patch could be causing the failure on https://lab.llvm.org/buildbot/#/builders/67/builds/14877 ?

Broke sanitizer bots: https://lab.llvm.org/buildbot/#/builders/74/builds/26697 This reverts commit 95fef1d.

kstoimenov · 2024-03-14T15:01:21Z

@fhahn reverted because of sanitizer bot failure. There is a stack trace here if it helps: https://lab.llvm.org/buildbot/#/builders/74/builds/26697/steps/9/logs/stdio

aleks-tmb · 2024-03-29T16:12:27Z

@fhahn Hi, this patch has been reverted, but before it we got a failure in our internal testing.
The reduced IR is:

define void @test(i64 %limit, i8 %guard) {
entry:
  br label %pre
pre:                                              ; preds = %loop, %entry
  %iv0 = phi i8 [ 0, %entry ], [ %select, %loop ]
  %cmp = icmp sgt i8 %iv0, %guard
  br label %loop

loop:                                             ; preds = %loop, %pre
  %iv1 = phi i64 [ %iv1.next, %loop ], [ 0, %pre ]
  %iv2 = phi i8 [ %select, %loop ], [ %iv0, %pre ]
  %select = select i1 %cmp, i8 %iv2, i8 0
  %iv1.next = add nuw nsw i64 %iv1, 1
  %check = icmp slt i64 %iv1.next, %limit
  br i1 %check, label %loop, label %pre
}

To reproduce, take this patch and run bin/opt -mtriple=x86_64-unknown-linux-gnu -passes=loop-vectorize test.ll

The crash backtrace:

opt: /home/apopov/llvm-project/llvm/lib/IR/Instructions.cpp:3327: static llvm::BinaryOperator* llvm::BinaryOperator::Create(llvm::Instruction::BinaryOps, llvm::Value*, llvm::Value*, const llvm::Twine&, llvm::Instruction*): Assertion `S1->getType() == S2->getType() && "Cannot create binary operator with two operands of differing type!"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.      Program arguments: bin/opt -mtriple=x86_64-unknown-linux-gnu -passes=loop-vectorize ../../orca/build/test.ll -S
Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it):
0  libLLVM.so.19.0git 0x00007f1528cb3fc8 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) + 82
1  libLLVM.so.19.0git 0x00007f1528cb43de
2  libLLVM.so.19.0git 0x00007f1528cb1964 llvm::sys::RunSignalHandlers() + 159
3  libLLVM.so.19.0git 0x00007f1528cb3882
4  libc.so.6          0x00007f1525b91090
5  libc.so.6          0x00007f1525b9100b gsignal + 203
6  libc.so.6          0x00007f1525b70859 abort + 299
7  libc.so.6          0x00007f1525b70729
8  libc.so.6          0x00007f1525b81fd6
9  libLLVM.so.19.0git 0x00007f1528f9b1cb llvm::BinaryOperator::Create(llvm::Instruction::BinaryOps, llvm::Value*, llvm::Value*, llvm::Twine const&, llvm::Instruction*) + 97
10 libLLVM.so.19.0git 0x00007f1528dff294
11 libLLVM.so.19.0git 0x00007f152b5487c1 llvm::VPInstruction::generateInstruction(llvm::VPTransformState&, unsigned int) + 497
12 libLLVM.so.19.0git 0x00007f152b549c12 llvm::VPInstruction::execute(llvm::VPTransformState&) + 290
13 libLLVM.so.19.0git 0x00007f152b4fd82d llvm::VPBasicBlock::execute(llvm::VPTransformState*) + 1079
14 libLLVM.so.19.0git 0x00007f152b4fefd0 llvm::VPRegionBlock::execute(llvm::VPTransformState*) + 616
15 libLLVM.so.19.0git 0x00007f152b4fffdf llvm::VPlan::execute(llvm::VPTransformState*) + 335
16 libLLVM.so.19.0git 0x00007f152b344b3f llvm::LoopVectorizationPlanner::executePlan(llvm::ElementCount, unsigned int, llvm::VPlan&, llvm::InnerLoopVectorizer&, llvm::DominatorTree*, bool, llvm::DenseMap<llvm::SCEV const*, llvm::Value*, llvm::DenseMapInfo<llvm::SCEV const*, void>, llvm::detail::DenseMapPair<llvm::SCEV const*, llvm::Value*>> const*) + 1627
17 libLLVM.so.19.0git 0x00007f152b353822 llvm::LoopVectorizePass::processLoop(llvm::Loop*) + 8712
18 libLLVM.so.19.0git 0x00007f152b354112 llvm::LoopVectorizePass::runImpl(llvm::Function&, llvm::ScalarEvolution&, llvm::LoopInfo&, llvm::TargetTransformInfo&, llvm::DominatorTree&, llvm::BlockFrequencyInfo*, llvm::TargetLibraryInfo*, llvm::DemandedBits&, llvm::AssumptionCache&, llvm::LoopAccessInfoManager&, llvm::OptimizationRemarkEmitter&, llvm::ProfileSummaryInfo*) + 1118
19 libLLVM.so.19.0git 0x00007f152b354464 llvm::LoopVectorizePass::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) + 650
20 libLLVM.so.19.0git 0x00007f152d8885a7
21 libLLVM.so.19.0git 0x00007f1529076902 llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) + 366
22 libLLVM.so.19.0git 0x00007f152cc20f61
23 libLLVM.so.19.0git 0x00007f1529075b5e llvm::ModuleToFunctionPassAdaptor::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) + 390
24 libLLVM.so.19.0git 0x00007f152cc20e91
25 libLLVM.so.19.0git 0x00007f1529076580 llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) + 366
26 opt                0x0000562e2118153c llvm::runPassPipeline(llvm::StringRef, llvm::Module&, llvm::TargetMachine*, llvm::TargetLibraryInfoImpl*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::StringRef, llvm::ArrayRef<llvm::PassPlugin>, llvm::ArrayRef<std::function<void (llvm::PassBuilder&)>>, llvm::opt_tool::OutputKind, llvm::opt_tool::VerifierKind, bool, bool, bool, bool, bool, bool, bool) + 6348
27 opt                0x0000562e211531a7 optMain + 6524
28 opt                0x0000562e21150c41 main + 88
29 libc.so.6          0x00007f1525b72083 __libc_start_main + 243
30 opt                0x0000562e21150b2e _start + 46
Aborted

fhahn · 2024-04-03T12:38:21Z

Thanks for the heads-up. The issue is related to epilogue vectorization; I added additional test coverage in 399ff08.

This reverts the revert commit 589c7ab.

This patch includes a fix for any-of reductions and epilogue vectorization. Extra test coverage for the issue that caused the revert has been added in 399ff08.

--------------------------------
Original commit message:

Update AnyOf reduction code generation to only keep track of the AnyOf property in a boolean vector in the loop, only selecting either the new or start value in the middle block.

The patch incorporates feedback from https://reviews.llvm.org/D153697.

This fixes the #62565, as now there aren't multiple uses of the start/new values.

Fixes #62565

PR: #78304
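The codegen change described in the original commit message can be modeled lane by lane in C++. This is a sketch under stated assumptions rather than the generated IR: the vectorization factor `VF = 4`, the predicate `x > 0`, and the function names are all illustrative, and the scalar remainder loop is omitted.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t VF = 4; // illustrative vectorization factor

// Scalar reference: Red = cond(x) ? NewVal : Red, starting from Start.
int anyOfScalar(const std::vector<int> &data, int start, int newVal) {
  int red = start;
  for (int x : data)
    red = (x > 0) ? newVal : red; // "x > 0" stands in for the loop's compare
  return red;
}

// Lane-wise model: only a boolean vector is carried through the loop, and
// the start/new values are used once, in a single select after the loop.
int anyOfVectorModel(const std::vector<int> &data, int start, int newVal) {
  assert(data.size() % VF == 0 && "sketch omits the scalar remainder loop");
  std::array<bool, VF> anyLanes{}; // boolean vector phi, starts all-false
  for (std::size_t i = 0; i < data.size(); i += VF)
    for (std::size_t lane = 0; lane < VF; ++lane)
      anyLanes[lane] = anyLanes[lane] || (data[i + lane] > 0);
  // "Middle block": OR-reduce the lanes, then select once.
  bool any = false;
  for (bool b : anyLanes)
    any = any || b;
  return any ? newVal : start;
}
```

Because the start and new values appear only in the final select, they have a single use each, which matches the motivation given for fixing #62565.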

fhahn · 2024-04-05T13:33:29Z

Recommitted with a fix, please let me know if you see any further issues!

joanahalili · 2024-04-12T16:17:19Z

Just a heads-up: we have a few targets breaking on our end due to a miscompile caused by this change. We will try to provide a reproducer on Monday.

aeubanks · 2024-04-15T21:22:52Z

c.ll.txt

I believe running this through the loop vectorizer results in a miscompile, although I haven't found exactly where yet. I'm trying to compare the IR after the loop-vectorizer with and without this commit.

aeubanks · 2024-04-15T22:18:26Z

a bit more reduced:
reduced.ll.txt

aeubanks · 2024-04-15T22:29:29Z

it looks like the C++ code getting miscompiled is

  uint32_t character_bytes = 0;
  bool all_same_length = true;
  for (const auto& value : input_data) {
    if (value.size() != input_data.front().size()) {
      all_same_length = false;
    }
    character_bytes += value.size();
  }

all_same_length is false with -fno-vectorize but true when vectorizing

aeubanks · 2024-04-15T23:00:39Z

toward the end of the IR, we branch based on all_same_length. in the good IR after loop vectorization with this patch reverted, I believe that's

%bc.merge.rdx22 = phi i1 [ true, %iter.check ], [ %rdx.select, %vec.epilog.iter.check ], [ %rdx.select20, %vec.epilog.middle.block ]

and in the bad IR I believe it's

%bc.merge.rdx21 = phi i1 [ true, %iter.check ], [ true, %vec.epilog.iter.check ], [ %rdx.select19, %vec.epilog.middle.block ]

the [ true, %vec.epilog.iter.check ] is suspicious
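Why a constant `true` on that edge would be wrong can be sketched with a small C++ model. It assumes the usual epilogue-vectorization layout, in which the `%vec.epilog.iter.check` edge is taken after the main vector loop has run and the epilogue vector loop is skipped; that reading, the function names, and the `forwardMainResult` switch are assumptions for illustration, not taken from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Models "all_same = (x == ref) ? all_same : false" over data[begin, end),
// starting from `start`.
bool reduceAllSame(const std::vector<int> &data, std::size_t begin,
                   std::size_t end, int ref, bool start) {
  assert(begin <= end && end <= data.size());
  bool red = start;
  for (std::size_t i = begin; i < end; ++i)
    red = (data[i] == ref) ? red : false;
  return red;
}

// Path where the main vector loop covers [0, mainEnd), the epilogue vector
// loop is skipped, and the scalar remainder covers [mainEnd, size).
// `forwardMainResult` picks the incoming value of the merge phi on the
// %vec.epilog.iter.check edge.
bool allSameSkippingEpilogue(const std::vector<int> &data, std::size_t mainEnd,
                             int ref, bool forwardMainResult) {
  const bool start = true;
  bool mainResult = reduceAllSame(data, 0, mainEnd, ref, start);
  // Good IR: [ %rdx.select, %vec.epilog.iter.check ]
  // Bad IR:  [ true,        %vec.epilog.iter.check ]
  bool resume = forwardMainResult ? mainResult : start;
  return reduceAllSame(data, mainEnd, data.size(), ref, resume);
}
```

In this model, a mismatch found by the main vector loop is lost whenever the start value is forwarded instead of the main loop's result, so the reduction can report `true` even though the elements differ.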

aeubanks · 2024-04-16T17:38:31Z

much more reduced IR that exhibits the same issue:

target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
target triple = "aarch64-grtev4-linux-gnu"

define void @barney(ptr %arg, ptr %arg1) {
bb:
  %icmp = icmp eq ptr %arg, %arg1
  br i1 %icmp, label %bb11, label %bb2

bb2:                                              ; preds = %bb
  %getelementptr = getelementptr inbounds i8, ptr %arg, i64 8
  %load = load i64, ptr %getelementptr, align 8
  br label %bb3

bb3:                                              ; preds = %bb3, %bb2
  %phi = phi i1 [ %select, %bb3 ], [ true, %bb2 ]
  %phi4 = phi ptr [ %getelementptr8, %bb3 ], [ %arg, %bb2 ]
  %getelementptr5 = getelementptr inbounds i8, ptr %phi4, i64 8
  %load6 = load i64, ptr %getelementptr5, align 8
  %icmp7 = icmp eq i64 %load6, %load
  %select = select i1 %icmp7, i1 %phi, i1 false
  %getelementptr8 = getelementptr inbounds i8, ptr %phi4, i64 16
  %icmp9 = icmp eq ptr %getelementptr8, %arg1
  br i1 %icmp9, label %bb10, label %bb3

bb10:                                             ; preds = %bb3
  br i1 %select, label %bb11, label %bb12

bb11:                                             ; preds = %bb10, %bb
  call void @spam()
  br label %bb13

bb12:                                             ; preds = %bb10
  call void @spam.1()
  br label %bb13

bb13:                                             ; preds = %bb12, %bb11
  ret void
}

declare void @spam()

declare void @spam.1()

this is roughly

void f(A* begin, A* end) {
  bool all_same_val = true;
  for (A* a = begin; a != end; ++a) {
    if (a->value != begin->value) {
      all_same_val = false;
    }
  }
  if (all_same_val) {
    g();
  } else {
    h();
  }
}

in the good (with this reverted), we see

%bc.merge.rdx24 = phi i1 [ true, %iter.check ], [ %rdx.select9, %vec.epilog.iter.check ], [ %rdx.select23, %vec.epilog.middle.block ]

and in the bad (at ToT) we see

%bc.merge.rdx21 = phi i1 [ true, %iter.check ], [ true, %vec.epilog.iter.check ], [ %rdx.select20, %vec.epilog.middle.block ]

@fhahn can we revert this for now?

fhahn · 2024-04-16T20:27:29Z

@aeubanks thanks for the reproducer! Will take a look/revert tomorrow morning when I am back in the office. Feel free to revert in the meantime if it helps to unblock you.

This reverts commit c6e38b9. Causes miscompiles, see comments on #78304.

aeubanks · 2024-04-16T20:44:03Z

thanks, reverted in c6e0162

This patch adds an assert to createAndCollectMergePhiForReduction to make sure there is a resume phi when vectorizing the epilogue loop. This is needed to set the resume value from the main vector loop. This assertion guards against the issue that caused the revert of #78304.

Additional test case from #78304.

This reverts the revert commit c6e0162.

This patch includes a fix for any-of reductions and epilogue vectorization. Extra test coverage for the issue that caused the revert has been added in bce3bfc and an assertion has been added in c7209cb.

--------------------------------
Original commit message:

Update AnyOf reduction code generation to only keep track of the AnyOf property in a boolean vector in the loop, only selecting either the new or start value in the middle block.

The patch incorporates feedback from https://reviews.llvm.org/D153697.

This fixes the #62565, as now there aren't multiple uses of the start/new values.

Fixes #62565

PR: #78304

fhahn · 2024-05-03T15:04:24Z

@aeubanks thanks for the reproducer, should be fixed in the recommitted version bccb7ed. Also added an assert to catch the issue.

fhahn requested review from Mel-Chen, ayalz and aniragil January 16, 2024 15:59

llvmbot added vectorizers llvm:transforms labels Jan 16, 2024

fhahn mentioned this pull request Jan 16, 2024

[LoopVectorize] Failure to compare multiple inputs as the same (splatted) value #62565

Closed

Mel-Chen reviewed Jan 17, 2024

Mel-Chen reviewed Jan 26, 2024

fhahn added 3 commits January 28, 2024 21:32

Merge branch 'main' into lv-anyof-red

bfd5d93

Merge branch 'main' into lv-anyof-red

0732641

!fixup update remainng tests.

0057eb2

ayalz reviewed Jan 31, 2024

fhahn added 2 commits February 3, 2024 20:26

Merge branch 'main' into lv-anyof-red

eafe294

!fixup address latest comments, thanks!

f9fea8e

Mel-Chen approved these changes Mar 7, 2024

fhahn added 2 commits March 13, 2024 16:13

Merge branch 'main' into lv-anyof-red

fa841cd

Merge branch 'main' into lv-anyof-red

ba35741

fhahn merged commit 95fef1d into llvm:main Mar 14, 2024

fhahn deleted the lv-anyof-red branch March 14, 2024 11:22

kstoimenov added a commit that referenced this pull request Mar 14, 2024

Revert "[LV] Improve AnyOf reduction codegen. (#78304)"

589c7ab

aeubanks added a commit that referenced this pull request Apr 16, 2024

Revert "Reapply "[LV] Improve AnyOf reduction codegen. (#78304)""

c6e0162

fhahn added a commit that referenced this pull request May 2, 2024

[LV] Add another epilogue test with an AnyOfReduction of i1.

bce3bfc

Additional test case from #78304.
