Skip to content

[LV] Improve AnyOf reduction codegen. #78304

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Mar 14, 2024
Merged

[LV] Improve AnyOf reduction codegen. #78304

merged 8 commits into from
Mar 14, 2024

Conversation

fhahn
Copy link
Contributor

@fhahn fhahn commented Jan 16, 2024

Update AnyOf reduction code generation to only keep track of the AnyOf property in a boolean vector in the loop, only selecting either the new or start value in the middle block.

The patch incorporates feedback from https://reviews.llvm.org/D153697.

This fixes the #62565, as now there aren't multiple uses of the start/new values.

Fixes #62565

Update AnyOf reduction code generation to only keep track of the AnyOf
property in a boolean vector in the loop, only selecting either the new
or start value in the middle block.

This fixes the llvm#62565, as now there aren't multiple uses of the
start/new values.

Fixes llvm#62565
@llvmbot
Copy link
Member

llvmbot commented Jan 16, 2024

@llvm/pr-subscribers-llvm-transforms

Author: Florian Hahn (fhahn)

Changes

Update AnyOf reduction code generation to only keep track of the AnyOf property in a boolean vector in the loop, only selecting either the new or start value in the middle block.

The patch incorporates feedback from https://reviews.llvm.org/D153697.

This fixes the #62565, as now there aren't multiple uses of the start/new values.

Fixes #62565


Patch is 78.76 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/78304.diff

10 Files Affected:

  • (modified) llvm/include/llvm/Transforms/Utils/LoopUtils.h (-9)
  • (modified) llvm/lib/Transforms/Utils/LoopUtils.cpp (+6-18)
  • (modified) llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h (+1)
  • (modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+41-3)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp (+4-3)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-select-cmp.ll (+15-23)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/select-cmp-reduction.ll (+44-76)
  • (modified) llvm/test/Transforms/LoopVectorize/select-cmp-predicated.ll (+15-14)
  • (modified) llvm/test/Transforms/LoopVectorize/select-cmp.ll (+74-72)
  • (modified) llvm/test/Transforms/LoopVectorize/select-reduction-start-value-may-be-undef-or-poison.ll (+18-25)
diff --git a/llvm/include/llvm/Transforms/Utils/LoopUtils.h b/llvm/include/llvm/Transforms/Utils/LoopUtils.h
index 5a1385d01d8e44..3bad7b616d9d75 100644
--- a/llvm/include/llvm/Transforms/Utils/LoopUtils.h
+++ b/llvm/include/llvm/Transforms/Utils/LoopUtils.h
@@ -363,15 +363,6 @@ Intrinsic::ID getMinMaxReductionIntrinsicOp(RecurKind RK);
 /// Returns the comparison predicate used when expanding a min/max reduction.
 CmpInst::Predicate getMinMaxReductionPredicate(RecurKind RK);
 
-/// See RecurrenceDescriptor::isAnyOfPattern for a description of the pattern we
-/// are trying to match. In this pattern, we are only ever selecting between two
-/// values: 1) an initial start value \p StartVal of the reduction PHI, and 2) a
-/// loop invariant value. If any of lane value in \p Left, \p Right is not equal
-/// to \p StartVal, select the loop invariant value. This is done by selecting
-/// \p Right iff \p Left is equal to \p StartVal.
-Value *createAnyOfOp(IRBuilderBase &Builder, Value *StartVal, RecurKind RK,
-                     Value *Left, Value *Right);
-
 /// Returns a Min/Max operation corresponding to MinMaxRecurrenceKind.
 /// The Builder's fast-math-flags must be set to propagate the expected values.
 Value *createMinMaxOp(IRBuilderBase &Builder, RecurKind RK, Value *Left,
diff --git a/llvm/lib/Transforms/Utils/LoopUtils.cpp b/llvm/lib/Transforms/Utils/LoopUtils.cpp
index 59485126b280ab..c0582fb7d7e150 100644
--- a/llvm/lib/Transforms/Utils/LoopUtils.cpp
+++ b/llvm/lib/Transforms/Utils/LoopUtils.cpp
@@ -962,15 +962,6 @@ CmpInst::Predicate llvm::getMinMaxReductionPredicate(RecurKind RK) {
   }
 }
 
-Value *llvm::createAnyOfOp(IRBuilderBase &Builder, Value *StartVal,
-                           RecurKind RK, Value *Left, Value *Right) {
-  if (auto VTy = dyn_cast<VectorType>(Left->getType()))
-    StartVal = Builder.CreateVectorSplat(VTy->getElementCount(), StartVal);
-  Value *Cmp =
-      Builder.CreateCmp(CmpInst::ICMP_NE, Left, StartVal, "rdx.select.cmp");
-  return Builder.CreateSelect(Cmp, Left, Right, "rdx.select");
-}
-
 Value *llvm::createMinMaxOp(IRBuilderBase &Builder, RecurKind RK, Value *Left,
                             Value *Right) {
   Type *Ty = Left->getType();
@@ -1079,16 +1070,13 @@ Value *llvm::createAnyOfTargetReduction(IRBuilderBase &Builder, Value *Src,
     NewVal = SI->getTrueValue();
   }
 
-  // Create a splat vector with the new value and compare this to the vector
-  // we want to reduce.
-  ElementCount EC = cast<VectorType>(Src->getType())->getElementCount();
-  Value *Right = Builder.CreateVectorSplat(EC, InitVal);
-  Value *Cmp =
-      Builder.CreateCmp(CmpInst::ICMP_NE, Src, Right, "rdx.select.cmp");
-
   // If any predicate is true it means that we want to select the new value.
-  Cmp = Builder.CreateOrReduce(Cmp);
-  return Builder.CreateSelect(Cmp, NewVal, InitVal, "rdx.select");
+  Value *AnyOf =
+      Src->getType()->isVectorTy() ? Builder.CreateOrReduce(Src) : Src;
+  // The compares in the loop may yield poison, which propagates through the
+  // bitwise ORs. Freeze it here before the condition is used.
+  AnyOf = Builder.CreateFreeze(AnyOf);
+  return Builder.CreateSelect(AnyOf, NewVal, InitVal, "rdx.select");
 }
 
 Value *llvm::createSimpleTargetReduction(IRBuilderBase &Builder, Value *Src,
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h b/llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
index a7ebf78e54ceb6..9d3ef5b96c72fb 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
@@ -68,6 +68,7 @@ class VPBuilder {
 public:
   VPBuilder() = default;
   VPBuilder(VPBasicBlock *InsertBB) { setInsertPoint(InsertBB); }
+  VPBuilder(VPRecipeBase *InsertPt) { setInsertPoint(InsertPt); }
 
   /// Clear the insertion point: created instructions will not be inserted into
   /// a block.
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index aa5d1bfa57d535..973bcf4f6c2d7c 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -7428,7 +7428,8 @@ static void createAndCollectMergePhiForReduction(
   auto *PhiR = cast<VPReductionPHIRecipe>(RedResult->getOperand(0));
   const RecurrenceDescriptor &RdxDesc = PhiR->getRecurrenceDescriptor();
 
-  TrackingVH<Value> ReductionStartValue = RdxDesc.getRecurrenceStartValue();
+  TrackingVH<Value> ReductionStartValue =
+      State.get(PhiR->getStartValue(), VPIteration(0, 0));
   Value *FinalValue =
       State.get(RedResult, VPIteration(State.UF - 1, VPLane::getFirstLane()));
   auto *ResumePhi =
@@ -7452,7 +7453,7 @@ static void createAndCollectMergePhiForReduction(
       BCBlockPhi->addIncoming(ResumePhi->getIncomingValueForBlock(Incoming),
                               Incoming);
     else
-      BCBlockPhi->addIncoming(ReductionStartValue, Incoming);
+      BCBlockPhi->addIncoming(RdxDesc.getRecurrenceStartValue(), Incoming);
   }
 
   auto *OrigPhi = cast<PHINode>(PhiR->getUnderlyingValue());
@@ -9054,6 +9055,41 @@ void LoopVectorizationPlanner::adjustRecipesForReductions(
       continue;
 
     const RecurrenceDescriptor &RdxDesc = PhiR->getRecurrenceDescriptor();
+    // Adjust AnyOf reductions; replace the reduction phi for the selected value
+    // with a boolean reduction phi node to check if the condition is true in
+    // any iteration. The final value is selected by the final
+    // ComputeReductionResult.
+    if (RecurrenceDescriptor::isAnyOfRecurrenceKind(
+            RdxDesc.getRecurrenceKind())) {
+      auto *Select = cast<VPRecipeBase>(*find_if(PhiR->users(), [](VPUser *U) {
+        return isa<VPWidenSelectRecipe>(U) ||
+               (isa<VPReplicateRecipe>(U) &&
+                cast<VPReplicateRecipe>(U)->getUnderlyingInstr()->getOpcode() ==
+                    Instruction::Select);
+      }));
+      VPValue *Cmp = Select->getOperand(0);
+      // If the compare is checking the reduction PHI node, adjust it to check
+      // the start value.
+      if (VPRecipeBase *CmpR = Cmp->getDefiningRecipe()) {
+        for (unsigned I = 0; I != CmpR->getNumOperands(); ++I)
+          if (CmpR->getOperand(I) == PhiR)
+            CmpR->setOperand(I, PhiR->getStartValue());
+      }
+      VPBuilder::InsertPointGuard Guard(Builder);
+      Builder.setInsertPoint(Select);
+
+      // If the true value of the select is the reduction phi, the new value is
+      // selected if the negated condition is true in any iteration.
+      if (Select->getOperand(1) == PhiR)
+        Cmp = Builder.createNot(Cmp);
+      VPValue *Or = Builder.createOr(PhiR, Cmp);
+      Select->getVPSingleValue()->replaceAllUsesWith(Or);
+
+      // Convert the reduction phi to operate on bools.
+      PhiR->setOperand(0, Plan->getVPValueOrAddLiveIn(ConstantInt::getFalse(
+                              OrigLoop->getHeader()->getContext())));
+    }
+
     // If tail is folded by masking, introduce selects between the phi
     // and the live-out instruction of each reduction, at the beginning of the
     // dedicated latch block.
@@ -9086,7 +9122,9 @@ void LoopVectorizationPlanner::adjustRecipesForReductions(
     // then extend the loop exit value to enable InstCombine to evaluate the
     // entire expression in the smaller type.
     Type *PhiTy = PhiR->getStartValue()->getLiveInIRValue()->getType();
-    if (MinVF.isVector() && PhiTy != RdxDesc.getRecurrenceType()) {
+    if (MinVF.isVector() && PhiTy != RdxDesc.getRecurrenceType() &&
+        !RecurrenceDescriptor::isAnyOfRecurrenceKind(
+            RdxDesc.getRecurrenceKind())) {
       assert(!PhiR->isInLoop() && "Unexpected truncated inloop reduction!");
       Type *RdxTy = RdxDesc.getRecurrenceType();
       auto *Trunc =
diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index c553e2c9e76839..f87461fd548bd7 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -452,8 +452,7 @@ Value *VPInstruction::generateInstruction(VPTransformState &State,
         else if (RecurrenceDescriptor::isAnyOfRecurrenceKind(RK)) {
           TrackingVH<Value> ReductionStartValue =
               RdxDesc.getRecurrenceStartValue();
-          ReducedPartRdx = createAnyOfOp(Builder, ReductionStartValue, RK,
-                                         ReducedPartRdx, RdxPart);
+          ReducedPartRdx = Builder.CreateOr(ReducedPartRdx, RdxPart);
         } else
           ReducedPartRdx = createMinMaxOp(Builder, RK, ReducedPartRdx, RdxPart);
       }
@@ -461,7 +460,9 @@ Value *VPInstruction::generateInstruction(VPTransformState &State,
 
     // Create the reduction after the loop. Note that inloop reductions create
     // the target reduction in the loop using a Reduction recipe.
-    if (State.VF.isVector() && !PhiR->isInLoop()) {
+    if ((State.VF.isVector() ||
+         RecurrenceDescriptor::isAnyOfRecurrenceKind(RK)) &&
+        !PhiR->isInLoop()) {
       ReducedPartRdx =
           createTargetReduction(Builder, RdxDesc, ReducedPartRdx, OrigPhi);
       // If the reduction can be performed in a smaller type, we need to extend
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/sve-select-cmp.ll b/llvm/test/Transforms/LoopVectorize/AArch64/sve-select-cmp.ll
index 1c26ee8479e578..493ffd2ca569de 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/sve-select-cmp.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/sve-select-cmp.ll
@@ -8,13 +8,12 @@ target triple = "aarch64-linux-gnu"
 define i32 @select_const_i32_from_icmp(ptr nocapture readonly %v, i64 %n) #0 {
 ; CHECK-VF4IC1-LABEL: @select_const_i32_from_icmp
 ; CHECK-VF4IC1:      vector.body:
-; CHECK-VF4IC1:        [[VEC_PHI:%.*]] = phi <vscale x 4 x i32> [ shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 3, i64 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer), %vector.ph ], [ [[VEC_SEL:%.*]], %vector.body ]
+; CHECK-VF4IC1:        [[VEC_PHI:%.*]] = phi <vscale x 4 x i1> [ zeroinitializer, %vector.ph ], [ [[VEC_SEL:%.*]], %vector.body ]
 ; CHECK-VF4IC1:        [[VEC_LOAD:%.*]] = load <vscale x 4 x i32>
 ; CHECK-VF4IC1-NEXT:   [[VEC_ICMP:%.*]] = icmp eq <vscale x 4 x i32> [[VEC_LOAD]], shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 3, i64 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
-; CHECK-VF4IC1-NEXT:   [[VEC_SEL]] = select <vscale x 4 x i1> [[VEC_ICMP]], <vscale x 4 x i32> [[VEC_PHI]], <vscale x 4 x i32> shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 7, i64 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
+; CHECK-VF4IC1-NEXT:   [[VEC_SEL]] = or <vscale x 4 x i1> [[VEC_PHI]], [[VEC_ICMP]]
 ; CHECK-VF4IC1:      middle.block:
-; CHECK-VF4IC1-NEXT:   [[FIN_ICMP:%.*]] = icmp ne <vscale x 4 x i32> [[VEC_SEL]], shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 3, i64 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
-; CHECK-VF4IC1-NEXT:   [[OR_RDX:%.*]] = call i1 @llvm.vector.reduce.or.nxv4i1(<vscale x 4 x i1> [[FIN_ICMP]])
+; CHECK-VF4IC1-NEXT:   [[OR_RDX:%.*]] = call i1 @llvm.vector.reduce.or.nxv4i1(<vscale x 4 x i1> [[VEC_SEL]])
 ; CHECK-VF4IC1-NEXT:   {{.*}} = select i1 [[OR_RDX]], i32 7, i32 3
 
 ; CHECK-VF4IC4-LABEL: @select_const_i32_from_icmp
@@ -62,20 +61,15 @@ exit:                                     ; preds = %for.body
 define i32 @select_i32_from_icmp(ptr nocapture readonly %v, i32 %a, i32 %b, i64 %n) #0 {
 ; CHECK-VF4IC1-LABEL: @select_i32_from_icmp
 ; CHECK-VF4IC1:      vector.ph:
-; CHECK-VF4IC1:        [[TMP1:%.*]] = insertelement <vscale x 4 x i32> poison, i32 %a, i64 0
-; CHECK-VF4IC1-NEXT:   [[SPLAT_OF_A:%.*]] = shufflevector <vscale x 4 x i32> [[TMP1]], <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer
-; CHECK-VF4IC1-NEXT:   [[TMP2:%.*]] = insertelement <vscale x 4 x i32> poison, i32 %b, i64 0
-; CHECK-VF4IC1-NEXT:   [[SPLAT_OF_B:%.*]] = shufflevector <vscale x 4 x i32> [[TMP2]], <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer
+; CHECK-VF4IC1-NOT:    shufflevector <vscale x 4 x i32>
+; CHECK-VF4IC1-NOT:    shufflevector <vscale x 4 x i32>
 ; CHECK-VF4IC1:      vector.body:
-; CHECK-VF4IC1:        [[VEC_PHI:%.*]] = phi <vscale x 4 x i32> [ [[SPLAT_OF_A]], %vector.ph ], [ [[VEC_SEL:%.*]], %vector.body ]
+; CHECK-VF4IC1:        [[VEC_PHI:%.*]] = phi <vscale x 4 x i1> [ zeroinitializer, %vector.ph ], [ [[VEC_SEL:%.*]], %vector.body ]
 ; CHECK-VF4IC1:        [[VEC_LOAD:%.*]] = load <vscale x 4 x i32>
 ; CHECK-VF4IC1-NEXT:   [[VEC_ICMP:%.*]] = icmp eq <vscale x 4 x i32> [[VEC_LOAD]], shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 3, i64 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
-; CHECK-VF4IC1-NEXT:   [[VEC_SEL]] = select <vscale x 4 x i1> [[VEC_ICMP]], <vscale x 4 x i32> [[VEC_PHI]], <vscale x 4 x i32> [[SPLAT_OF_B]]
+; CHECK-VF4IC1-NEXT:   [[VEC_SEL]] = or <vscale x 4 x i1> [[VEC_PHI]], [[VEC_ICMP]]
 ; CHECK-VF4IC1:      middle.block:
-; CHECK-VF4IC1-NEXT:   [[FIN_INS:%.*]] = insertelement <vscale x 4 x i32> poison, i32 %a, i64 0
-; CHECK-VF4IC1-NEXT:   [[FIN_SPLAT:%.*]] = shufflevector <vscale x 4 x i32> [[FIN_INS]], <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer
-; CHECK-VF4IC1-NEXT:   [[FIN_CMP:%.*]] = icmp ne <vscale x 4 x i32> [[VEC_SEL]], [[FIN_SPLAT]]
-; CHECK-VF4IC1-NEXT:   [[OR_RDX:%.*]] = call i1 @llvm.vector.reduce.or.nxv4i1(<vscale x 4 x i1> [[FIN_CMP]])
+; CHECK-VF4IC1-NEXT:   [[OR_RDX:%.*]] = call i1 @llvm.vector.reduce.or.nxv4i1(<vscale x 4 x i1> [[VEC_SEL]])
 ; CHECK-VF4IC1-NEXT:   {{.*}} = select i1 [[OR_RDX]], i32 %b, i32 %a
 
 ; CHECK-VF4IC4-LABEL: @select_i32_from_icmp
@@ -101,13 +95,12 @@ exit:                                     ; preds = %for.body
 define i32 @select_const_i32_from_fcmp(ptr nocapture readonly %v, i64 %n) #0 {
 ; CHECK-VF4IC1-LABEL: @select_const_i32_from_fcmp
 ; CHECK-VF4IC1:      vector.body:
-; CHECK-VF4IC1:        [[VEC_PHI:%.*]] = phi <vscale x 4 x i32> [ shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 2, i64 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer), %vector.ph ], [ [[VEC_SEL:%.*]], %vector.body ]
+; CHECK-VF4IC1:        [[VEC_PHI:%.*]] = phi <vscale x 4 x i1> [ zeroinitializer, %vector.ph ], [ [[VEC_SEL:%.*]], %vector.body ]
 ; CHECK-VF4IC1:        [[VEC_LOAD:%.*]] = load <vscale x 4 x float>
 ; CHECK-VF4IC1-NEXT:   [[VEC_ICMP:%.*]] = fcmp fast ueq <vscale x 4 x float> [[VEC_LOAD]], shufflevector (<vscale x 4 x float> insertelement (<vscale x 4 x float> poison, float 3.000000e+00, i64 0), <vscale x 4 x float> poison, <vscale x 4 x i32> zeroinitializer)
-; CHECK-VF4IC1-NEXT:   [[VEC_SEL]] = select <vscale x 4 x i1> [[VEC_ICMP]], <vscale x 4 x i32> [[VEC_PHI]], <vscale x 4 x i32> shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 1, i64 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
+; CHECK-VF4IC1-NEXT:   [[VEC_SEL]] = or <vscale x 4 x i1> [[VEC_PHI]], [[VEC_ICMP]]
 ; CHECK-VF4IC1:      middle.block:
-; CHECK-VF4IC1-NEXT:   [[FIN_ICMP:%.*]] = icmp ne <vscale x 4 x i32> [[VEC_SEL]], shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 2, i64 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
-; CHECK-VF4IC1-NEXT:   [[OR_RDX:%.*]] = call i1 @llvm.vector.reduce.or.nxv4i1(<vscale x 4 x i1> [[FIN_ICMP]])
+; CHECK-VF4IC1-NEXT:   [[OR_RDX:%.*]] = call i1 @llvm.vector.reduce.or.nxv4i1(<vscale x 4 x i1> [[VEC_SEL]])
 ; CHECK-VF4IC1-NEXT:   {{.*}} = select i1 [[OR_RDX]], i32 1, i32 2
 
 ; CHECK-VF4IC4-LABEL: @select_const_i32_from_fcmp
@@ -156,16 +149,15 @@ exit:                                     ; preds = %for.body
 define i32 @pred_select_const_i32_from_icmp(ptr noalias nocapture readonly %src1, ptr noalias nocapture readonly %src2, i64 %n) #0 {
 ; CHECK-VF4IC1-LABEL: @pred_select_const_i32_from_icmp
 ; CHECK-VF4IC1:      vector.body:
-; CHECK-VF4IC1:        [[VEC_PHI:%.*]] = phi <vscale x 4 x i32> [ zeroinitializer, %vector.ph ], [ [[VEC_SEL:%.*]], %vector.body ]
+; CHECK-VF4IC1:        [[VEC_PHI:%.*]] = phi <vscale x 4 x i1> [ zeroinitializer, %vector.ph ], [ [[VEC_SEL:%.*]], %vector.body ]
 ; CHECK-VF4IC1:        [[VEC_LOAD:%.*]] = load <vscale x 4 x i32>
 ; CHECK-VF4IC1:        [[MASK:%.*]] = icmp sgt <vscale x 4 x i32> [[VEC_LOAD]], shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 35, i64 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
 ; CHECK-VF4IC1:        [[MASKED_LOAD:%.*]] = call <vscale x 4 x i32> @llvm.masked.load.nxv4i32.p0(ptr {{%.*}}, i32 4, <vscale x 4 x i1> [[MASK]], <vscale x 4 x i32> poison)
 ; CHECK-VF4IC1-NEXT:   [[VEC_ICMP:%.*]] = icmp eq <vscale x 4 x i32> [[MASKED_LOAD]], shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 2, i64 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
-; CHECK-VF4IC1-NEXT:   [[VEC_SEL_TMP:%.*]] = select <vscale x 4 x i1> [[VEC_ICMP]], <vscale x 4 x i32> shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 1, i64 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer), <vscale x 4 x i32> [[VEC_PHI]]
-; CHECK-VF4IC1:        [[VEC_SEL:%.*]] = select <vscale x 4 x i1> [[MASK]], <vscale x 4 x i32> [[VEC_SEL_TMP]], <vscale x 4 x i32> [[VEC_PHI]]
+; CHECK-VF4IC1-NEXT:   [[VEC_SEL_TMP:%.*]] = or <vscale x 4 x i1> [[VEC_PHI]], [[VEC_ICMP]]
+; CHECK-VF4IC1:        [[VEC_SEL:%.*]] = select <vscale x 4 x i1> [[MASK]], <vscale x 4 x i1> [[VEC_SEL_TMP]], <vscale x 4 x i1> [[VEC_PHI]]
 ; CHECK-VF4IC1:      middle.block:
-; CHECK-VF4IC1-NEXT:   [[FIN_ICMP:%.*]] = icmp ne <vscale x 4 x i32> [[VEC_SEL]], zeroinitializer
-; CHECK-VF4IC1-NEXT:   [[OR_RDX:%.*]] = call i1 @llvm.vector.reduce.or.nxv4i1(<vscale x 4 x i1> [[FIN_ICMP]])
+; CHECK-VF4IC1-NEXT:   [[OR_RDX:%.*]] = call i1 @llvm.vector.reduce.or.nxv4i1(<vscale x 4 x i1> [[VEC_SEL]])
 ; CHECK-VF4IC1-NEXT:   {{.*}} = select i1 [[OR_RDX]], i32 1, i32 0
 
 ; CHECK-VF4IC4-LABEL: @pred_select_const_i32_from_icmp
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/select-cmp-reduction.ll b/llvm/test/Transforms/LoopVectorize/RISCV/select-cmp-reduction.ll
index 34a7987bb40abe..6c50915c88e824 100644
--- a/llvm/test/Transforms/LoopVectorize/RISCV/select-cmp-reduction.ll
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/select-cmp-reduction.ll
@@ -12,25 +12,22 @@ define i32 @select_icmp(i32 %x, i32 %y, ptr nocapture readonly %c, i64 %n) #0 {
 ; CHECK-NEXT:    [[N_VEC:%.*]] = sub i64 %n, [[N_MOD_VF]]
 ; CHECK-NEXT:    [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x i32> poison, i32 [[X:%.*]], i64 0
 ; CHECK-NEXT:    [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT]], <4 x i32> poison, <4 x i32> zeroinitializer
-; CHECK-NEXT:    [[BROADCAST_SPLATINSERT1:%.*]] = insertelement <4 x i32> poison, i32 [[Y:%.*]], i64 0
-; CHECK-NEXT:    [[BROADCAST_SPLAT2:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT1]], <4 x i32> poison, <4 x i32> zeroinitializer
 ; CHECK-NEXT:    br label [[VECTOR_BODY:%.*]]
 ; CHECK:       vector.body:
 ; CHECK-NEXT:    [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH:%.*]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
-; CHECK-NEXT:    [[VEC_PHI:%.*]] = phi <4 x i32> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP5:%.*]], [[VECTOR_BODY]] ]
+; CHECK-NEXT:    [[VEC_PHI:%.*]] = phi <4 x i1> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP5:%.*]], [[VECTOR_BODY]] ]
 ; CHECK-NEXT:    [[TMP0:%.*]] = add i64 [[INDEX]], 0
 ; CHECK-NEXT:    [[TMP1:%.*]] = getelementptr inbounds i32, ptr [[C:%.*]], i64 [[TMP0]]
 ; CHECK-NEXT:    [[TMP2:%.*]] = getelementptr inbounds i32, ptr [[TMP1]], i32 0
 ; CHECK-NEXT:    [[WIDE_LOAD:%.*]] = load <4 x i32>, ptr [[TMP2]], align 4
 ; CHECK-NEXT:    [[TMP4:%.*]] = icmp slt <4 x i32> [[WIDE_LOAD]], [[BROADCAST_SPLAT]]
-; CHECK-NEXT:    [[TMP5]] = select <4 x i1> [[TMP4]], <4 x i32> [[VEC_PHI]], <4 x i32> [[BROADCAST_SPLAT2]]
+; CHECK-NEXT:    [[TMP5]] = or <4 x i1> [[VEC_PHI]], [[TMP4]]
 ; CHECK-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
 ; CHECK-NEXT:    [[TMP6:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
 ; CHECK-NEXT:    br i1 [[TMP6]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [...
[truncated]

Comment on lines 11 to 17
; CHECK-VF4IC1: [[VEC_PHI:%.*]] = phi <vscale x 4 x i1> [ zeroinitializer, %vector.ph ], [ [[VEC_SEL:%.*]], %vector.body ]
; CHECK-VF4IC1: [[VEC_LOAD:%.*]] = load <vscale x 4 x i32>
; CHECK-VF4IC1-NEXT: [[VEC_ICMP:%.*]] = icmp eq <vscale x 4 x i32> [[VEC_LOAD]], shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 3, i64 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
; CHECK-VF4IC1-NEXT: [[VEC_SEL]] = select <vscale x 4 x i1> [[VEC_ICMP]], <vscale x 4 x i32> [[VEC_PHI]], <vscale x 4 x i32> shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 7, i64 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
; CHECK-VF4IC1-NEXT: [[VEC_SEL]] = or <vscale x 4 x i1> [[VEC_PHI]], [[VEC_ICMP]]
; CHECK-VF4IC1: middle.block:
; CHECK-VF4IC1-NEXT: [[FIN_ICMP:%.*]] = icmp ne <vscale x 4 x i32> [[VEC_SEL]], shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 3, i64 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
; CHECK-VF4IC1-NEXT: [[OR_RDX:%.*]] = call i1 @llvm.vector.reduce.or.nxv4i1(<vscale x 4 x i1> [[FIN_ICMP]])
; CHECK-VF4IC1-NEXT: [[OR_RDX:%.*]] = call i1 @llvm.vector.reduce.or.nxv4i1(<vscale x 4 x i1> [[VEC_SEL]])
; CHECK-VF4IC1-NEXT: {{.*}} = select i1 [[OR_RDX]], i32 7, i32 3
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm glad to see someone willing to address this issue. It was also in my task list, so I have some information to share with you, hoping it will be helpful. I've divided the cases of AnyOf into two types:

  1. True-side update
rdx = <condition> ? <loop-invariant-rhs> : rdx

It can be transformed to OR reduction:

%vmask = <all-false>
for () {
  %pred = <widened-predicate>
  %vmask |= %pred
}
%rdx = reduce.or %vmask 
%red = select %rdx, <loop-invariant-rhs>, <initial-value>
  1. False-side update:
rdx = <condition> ? rdx :  <loop-invariant-rhs>

It can be transformed to OR reduction with NOT condition:

%vmask = <all-false>
for () {
  %pred = <widened-predicate>
  %pred.not = (%pred == <all-false>)
  %vmask |= %pred.not
}
%rdx = reduce.or %vmask 
%red = select %rdx, <loop-invariant-rhs>, <initial-value>

Or, to AND reduction:

%vmask = <all-true>
for () {
  %pred = <widened-predicate>
  %vmask &= %pred
}
%rdx = reduce.and %vmask
%red = select %rdx, <initial-value>, <loop-invariant-rhs>

Take @select_const_i32_from_icmp as an example, this should be the type 2, so there may be bugs with the current transformation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, the updates in this particular test file are still stale and need updating. The current codegen generates OR (NOT ..), as should be visible in llvm/test/Transforms/LoopVectorize/select-cmp.ll

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see.
But besides this case, llvm/test/Transforms/LoopVectorize/RISCV/select-cmp-reduction.ll also lacks the xor operation. I haven't thoroughly checked if other test cases have similar situations.
Could you please update it? Thank you.

@fhahn
Copy link
Contributor Author

fhahn commented Jan 25, 2024

ping :)

Comment on lines 453 to 454
TrackingVH<Value> ReductionStartValue =
RdxDesc.getRecurrenceStartValue();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be removed. And after it is clean up, the braces can be removed too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Folded into the CreateBinaryOp above, thanks!

Comment on lines 11 to 17
; CHECK-VF4IC1: [[VEC_PHI:%.*]] = phi <vscale x 4 x i1> [ zeroinitializer, %vector.ph ], [ [[VEC_SEL:%.*]], %vector.body ]
; CHECK-VF4IC1: [[VEC_LOAD:%.*]] = load <vscale x 4 x i32>
; CHECK-VF4IC1-NEXT: [[VEC_ICMP:%.*]] = icmp eq <vscale x 4 x i32> [[VEC_LOAD]], shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 3, i64 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
; CHECK-VF4IC1-NEXT: [[VEC_SEL]] = select <vscale x 4 x i1> [[VEC_ICMP]], <vscale x 4 x i32> [[VEC_PHI]], <vscale x 4 x i32> shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 7, i64 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
; CHECK-VF4IC1-NEXT: [[VEC_SEL]] = or <vscale x 4 x i1> [[VEC_PHI]], [[VEC_ICMP]]
; CHECK-VF4IC1: middle.block:
; CHECK-VF4IC1-NEXT: [[FIN_ICMP:%.*]] = icmp ne <vscale x 4 x i32> [[VEC_SEL]], shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 3, i64 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
; CHECK-VF4IC1-NEXT: [[OR_RDX:%.*]] = call i1 @llvm.vector.reduce.or.nxv4i1(<vscale x 4 x i1> [[FIN_ICMP]])
; CHECK-VF4IC1-NEXT: [[OR_RDX:%.*]] = call i1 @llvm.vector.reduce.or.nxv4i1(<vscale x 4 x i1> [[VEC_SEL]])
; CHECK-VF4IC1-NEXT: {{.*}} = select i1 [[OR_RDX]], i32 7, i32 3
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see.
But besides this case, llvm/test/Transforms/LoopVectorize/RISCV/select-cmp-reduction.ll also lacks the xor operation. I haven't thoroughly checked if other test cases have similar situations.
Could you please update it? Thank you.

Copy link
Collaborator

@ayalz ayalz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for following up on this!

Src->getType()->isVectorTy() ? Builder.CreateOrReduce(Src) : Src;
// The compares in the loop may yield poison, which propagates through the
// bitwise ORs. Freeze it here before the condition is used.
AnyOf = Builder.CreateFreeze(AnyOf);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Further refactoring thought: the AnyOf boolean reduction completes with the CreateOrReduce(), and freezing it. Similar to a plain "result |= value[i]" OR reduction, on i1 values, where freezing may also be needed? The subsequent select deserves a separate recipe, which post-processes the result of the AnyOf reduction and also depends on the "IfAny" and "Else" (or "Start" and "Other") Live-In VP/Values directly, rather than looking for them here and now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to a plain "result |= value[i]" OR reduction,

Freeze won't be needed in that case; if there's already a binary OR in the input, poison from the compare gets already propagated. It is only needed when converting from the select form (which doesn't propagate poison from the condition to its result)

Yes, but this will need a bit of additional refactoring, in particular how createAndCollectMergePhiForReduction looks up the reduction result value and when ComputeReductionResult VPInstructions are created.

@@ -7486,7 +7486,8 @@ static void createAndCollectMergePhiForReduction(
auto *PhiR = cast<VPReductionPHIRecipe>(RedResult->getOperand(0));
const RecurrenceDescriptor &RdxDesc = PhiR->getRecurrenceDescriptor();

TrackingVH<Value> ReductionStartValue = RdxDesc.getRecurrenceStartValue();
TrackingVH<Value> ReductionStartValue =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is ReductionStartValue still used, after inlining it below? Why/is this change needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not needed any longer, removed, thanks!

@@ -9110,6 +9111,41 @@ void LoopVectorizationPlanner::adjustRecipesForReductions(
continue;

const RecurrenceDescriptor &RdxDesc = PhiR->getRecurrenceDescriptor();
// Adjust AnyOf reductions; replace the reduction phi for the selected value
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adjustRecipesForReductions() is getting excessively long, should be refactored. Its documentation above should (also) include the adjustment of AnyOf reductions described here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, probably best as followup?

(isa<VPReplicateRecipe>(U) &&
cast<VPReplicateRecipe>(U)->getUnderlyingInstr()->getOpcode() ==
Instruction::Select);
}));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: assert(Select && "a meaningful error message");

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cast should already assert to check for non-null.

if (MinVF.isVector() && PhiTy != RdxDesc.getRecurrenceType()) {
if (MinVF.isVector() && PhiTy != RdxDesc.getRecurrenceType() &&
!RecurrenceDescriptor::isAnyOfRecurrenceKind(
RdxDesc.getRecurrenceKind())) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be it better to truncate AnyOf reductions to smaller type (boolean) here instead of above?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only reach this path now because we adjust PhiR's start value to a bool. It requires more than plain truncates as below, so I think it's probably worth to keep it separate. I think it also needs handling before introducing selects for tail-folding; otherwise those selects would also need updating.

@@ -453,16 +453,17 @@ Value *VPInstruction::generateInstruction(VPTransformState &State,
else if (RecurrenceDescriptor::isAnyOfRecurrenceKind(RK)) {
TrackingVH<Value> ReductionStartValue =
RdxDesc.getRecurrenceStartValue();
ReducedPartRdx = createAnyOfOp(Builder, ReductionStartValue, RK,
ReducedPartRdx, RdxPart);
ReducedPartRdx = Builder.CreateOr(ReducedPartRdx, RdxPart);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should Op be set to OR and treated by CreateBinOp() above?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adjusted, thanks!

if (State.VF.isVector() && !PhiR->isInLoop()) {
if ((State.VF.isVector() ||
RecurrenceDescriptor::isAnyOfRecurrenceKind(RK)) &&
!PhiR->isInLoop()) {
ReducedPartRdx =
createTargetReduction(Builder, RdxDesc, ReducedPartRdx, OrigPhi);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As raised above, it may be better to have ComputeReductionResult recipe take care only of reducing AnyOf to a boolean here, followed by a Select recipe to chose between Start and Other live-in values.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but this will need a bit of additional refactoring, in particular how createAndCollectMergePhiForReduction looks up the reduction result value and when ComputeReductionResult VPInstructions are created.

}));
VPValue *Cmp = Select->getOperand(0);
// If the compare is checking the reduction PHI node, adjust it to check
// the start value.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, either the true or false values of the select should use PhiR, but only it? The condition of an AnyOf reduction should be any boolean-expression predicate that depends on the current iteration only, i.e., should be a recipe (if it's live-in the entire select should be LICM'd), independent of AnyOf's PhiR, neither directly nor indirectly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment, AnyOf reduction are also formed for code like

define i32 @select_i32_from_icmp_same_inputs(i32 %a, i32 %b, i64 %n) {                                   
entry:
  br label %for.body

for.body:                                      ; preds = %entry, %for.body
  %0 = phi i64 [ 0, %entry ], [ %4, %for.body ]
  %1 = phi i32 [ %a, %entry ], [ %3, %for.body ]
  %2 = icmp eq i32 %1, 3
  %3 = select i1 %2, i32 %1, i32 %b
  %4 = add nuw nsw i64 %0, 1
  %5 = icmp eq i64 %4, %n
  br i1 %5, label %exit, label %for.body

exit:                                     ; preds = %for.body
  ret i32 %3
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ayalz The current recognition for AnyOf reduction involves a set of cmp- select instruction and cannot identify cases with only a single select. This requires expanding isAnyOfPattern or applying conditional reduction to model the AnyOf idiom for support.

// If the true value of the select is the reduction phi, the new value is
// selected if the negated condition is true in any iteration.
if (Select->getOperand(1) == PhiR)
Cmp = Builder.createNot(Cmp);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As Mel pointed out, this case of "Red = cond ? PhiR : Other" where a single false cond suffices for the result to be Other, could be considered an AllOf reduction starting with true using createAnd() instead of createOr(), resulting in "AND(conds) ? Start : Other".

Negating cond swaps the operands and translates into the "Red = !cond ? Other : PhiR" form of an AnyOf reduction starting with false and using createOr(), where a single true !cond suffices for the result to be Other, resulting in "OR(!conds) ? Other : Start".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this could be handled like that, but I think then we would need to support both patterns here and also in codegen, so negating seems simpler (and the negation should be removable by instcombine). Left as is for now, but happy to adjust if needed. But then it should probably be modeled as AllOf directly in the reduction descriptor.

@fhahn
Copy link
Contributor Author

fhahn commented Feb 25, 2024

ping :)

Copy link
Contributor

@Mel-Chen Mel-Chen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@fhahn fhahn merged commit 95fef1d into llvm:main Mar 14, 2024
@fhahn fhahn deleted the lv-anyof-red branch March 14, 2024 11:22
@RKSimon
Copy link
Collaborator

RKSimon commented Mar 14, 2024

@fhahn I'm not certain (and I haven't been able to repro as I don't have a LTO stage2 build) but do you think this patch could be causing the failure on https://lab.llvm.org/buildbot/#/builders/67/builds/14877 ?

kstoimenov added a commit that referenced this pull request Mar 14, 2024
@kstoimenov
Copy link
Contributor

@fhahn reverted because of sanitizer bot failure. There is a stack trace here if it helps: https://lab.llvm.org/buildbot/#/builders/74/builds/26697/steps/9/logs/stdio

@aleks-tmb
Copy link
Contributor

@fhahn Hi, this patch has been reverted, but before it we got a failure in our internal testing.
The reduced IR is:

define void @test(i64 %limit, i8 %guard) {
entry:
  br label %pre
pre:                                              ; preds = %loop, %entry
  %iv0 = phi i8 [ 0, %entry ], [ %select, %loop ]
  %cmp = icmp sgt i8 %iv0, %guard
  br label %loop

loop:                                             ; preds = %loop, %pre
  %iv1 = phi i64 [ %iv1.next, %loop ], [ 0, %pre ]
  %iv2 = phi i8 [ %select, %loop ], [ %iv0, %pre ]
  %select = select i1 %cmp, i8 %iv2, i8 0
  %iv1.next = add nuw nsw i64 %iv1, 1
  %check = icmp slt i64 %iv1.next, %limit
  br i1 %check, label %loop, label %pre
}

To reproduce, take this patch and run bin/opt -mtriple=x86_64-unknown-linux-gnu -passes=loop-vectorize test.ll

The crash backtrace:

opt: /home/apopov/llvm-project/llvm/lib/IR/Instructions.cpp:3327: static llvm::BinaryOperator* llvm::BinaryOperator::Create(llvm::Instruction::BinaryOps, llvm::Value*, llvm::Value*, const llvm::Twine&, llvm::Instruction*): Assertion `S1->getType() == S2->getType() && "Cannot create binary operator with two operands of differing type!"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.      Program arguments: bin/opt -mtriple=x86_64-unknown-linux-gnu -passes=loop-vectorize ../../orca/build/test.ll -S
Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it):
0  libLLVM.so.19.0git 0x00007f1528cb3fc8 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) + 82
1  libLLVM.so.19.0git 0x00007f1528cb43de
2  libLLVM.so.19.0git 0x00007f1528cb1964 llvm::sys::RunSignalHandlers() + 159
3  libLLVM.so.19.0git 0x00007f1528cb3882
4  libc.so.6          0x00007f1525b91090
5  libc.so.6          0x00007f1525b9100b gsignal + 203
6  libc.so.6          0x00007f1525b70859 abort + 299
7  libc.so.6          0x00007f1525b70729
8  libc.so.6          0x00007f1525b81fd6
9  libLLVM.so.19.0git 0x00007f1528f9b1cb llvm::BinaryOperator::Create(llvm::Instruction::BinaryOps, llvm::Value*, llvm::Value*, llvm::Twine const&, llvm::Instruction*) + 97
10 libLLVM.so.19.0git 0x00007f1528dff294
11 libLLVM.so.19.0git 0x00007f152b5487c1 llvm::VPInstruction::generateInstruction(llvm::VPTransformState&, unsigned int) + 497
12 libLLVM.so.19.0git 0x00007f152b549c12 llvm::VPInstruction::execute(llvm::VPTransformState&) + 290
13 libLLVM.so.19.0git 0x00007f152b4fd82d llvm::VPBasicBlock::execute(llvm::VPTransformState*) + 1079
14 libLLVM.so.19.0git 0x00007f152b4fefd0 llvm::VPRegionBlock::execute(llvm::VPTransformState*) + 616
15 libLLVM.so.19.0git 0x00007f152b4fffdf llvm::VPlan::execute(llvm::VPTransformState*) + 335
16 libLLVM.so.19.0git 0x00007f152b344b3f llvm::LoopVectorizationPlanner::executePlan(llvm::ElementCount, unsigned int, llvm::VPlan&, llvm::InnerLoopVectorizer&, llvm::DominatorTree*, bool, llvm::DenseMap<llvm::SCEV const*, llvm::Value*, llvm::DenseMapInfo<llvm::SCEV const*, void>, llvm::detail::DenseMapPair<llvm::SCEV const*, llvm::Value*>> const*) + 1627
17 libLLVM.so.19.0git 0x00007f152b353822 llvm::LoopVectorizePass::processLoop(llvm::Loop*) + 8712
18 libLLVM.so.19.0git 0x00007f152b354112 llvm::LoopVectorizePass::runImpl(llvm::Function&, llvm::ScalarEvolution&, llvm::LoopInfo&, llvm::TargetTransformInfo&, llvm::DominatorTree&, llvm::BlockFrequencyInfo*, llvm::TargetLibraryInfo*, llvm::DemandedBits&, llvm::AssumptionCache&, llvm::LoopAccessInfoManager&, llvm::OptimizationRemarkEmitter&, llvm::ProfileSummaryInfo*) + 1118
19 libLLVM.so.19.0git 0x00007f152b354464 llvm::LoopVectorizePass::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) + 650
20 libLLVM.so.19.0git 0x00007f152d8885a7
21 libLLVM.so.19.0git 0x00007f1529076902 llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) + 366
22 libLLVM.so.19.0git 0x00007f152cc20f61
23 libLLVM.so.19.0git 0x00007f1529075b5e llvm::ModuleToFunctionPassAdaptor::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) + 390
24 libLLVM.so.19.0git 0x00007f152cc20e91
25 libLLVM.so.19.0git 0x00007f1529076580 llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) + 366
26 opt                0x0000562e2118153c llvm::runPassPipeline(llvm::StringRef, llvm::Module&, llvm::TargetMachine*, llvm::TargetLibraryInfoImpl*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::StringRef, llvm::ArrayRef<llvm::PassPlugin>, llvm::ArrayRef<std::function<void (llvm::PassBuilder&)>>, llvm::opt_tool::OutputKind, llvm::opt_tool::VerifierKind, bool, bool, bool, bool, bool, bool, bool) + 6348
27 opt                0x0000562e211531a7 optMain + 6524
28 opt                0x0000562e21150c41 main + 88
29 libc.so.6          0x00007f1525b72083 __libc_start_main + 243
30 opt                0x0000562e21150b2e _start + 46
Aborted

@fhahn
Copy link
Contributor Author

fhahn commented Apr 3, 2024

Thanks for the heads-up. The issue is related to epilogue vectorization, added additional test coverage in 399ff08

fhahn added a commit that referenced this pull request Apr 5, 2024
This reverts the revert commit 589c7ab.

This patch includes a fix for any-of reductions and epilogue
vectorization. Extra test coverage for the issue that caused the revert
has been added in 399ff08.

--------------------------------
Original commit message:

Update AnyOf reduction code generation to only keep track of the AnyOf
property in a boolean vector in the loop, only selecting either the new
or start value in the middle block.

The patch incorporates feedback from https://reviews.llvm.org/D153697.

This fixes the #62565, as now there aren't multiple uses of the
start/new values.

Fixes #62565

PR: #78304
@fhahn
Copy link
Contributor Author

fhahn commented Apr 5, 2024

Recommitted with a fix, please let me know if you see any further issues!

@joanahalili
Copy link

just a heads-up , we are have a few targets breaking on our end due to a miscompile caused by this change. We will try to provide a reproducer on Monday.

@aeubanks
Copy link
Contributor

aeubanks commented Apr 15, 2024

c.ll.txt

I believe running this through the loop vectorizer results in a miscompile, although I haven't found exactly where yet. I'm trying to compare the IR after the loop-vectorizer with and without this commit.

@aeubanks
Copy link
Contributor

a bit more reduced:
reduced.ll.txt

@aeubanks
Copy link
Contributor

it looks like the C++ code getting miscompiled is

  uint32_t character_bytes = 0;
  bool all_same_length = true;
  for (const auto& value : input_data) {
    if (value.size() != input_data.front().size()) {
      all_same_length = false;
    }
    character_bytes += value.size();
  }

all_same_length is false with -fno-vectorize but true when vectorizing

@aeubanks
Copy link
Contributor

aeubanks commented Apr 15, 2024

toward the end of the IR, we branch based on all_same_length. in the good IR after loop vectorization with this patch reverted, I believe that's

%bc.merge.rdx22 = phi i1 [ true, %iter.check ], [ %rdx.select, %vec.epilog.iter.check ], [ %rdx.select20, %vec.epilog.middle.block ]

and in the bad IR I believe it's

%bc.merge.rdx21 = phi i1 [ true, %iter.check ], [ true, %vec.epilog.iter.check ], [ %rdx.select19, %vec.epilog.middle.block ]

the [ true, %vec.epilog.iter.check ] is suspicious

@aeubanks
Copy link
Contributor

much more reduced IR that exhibits the same issue:

target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
target triple = "aarch64-grtev4-linux-gnu"

define void @barney(ptr %arg, ptr %arg1) {
bb:
  %icmp = icmp eq ptr %arg, %arg1
  br i1 %icmp, label %bb11, label %bb2

bb2:                                              ; preds = %bb
  %getelementptr = getelementptr inbounds i8, ptr %arg, i64 8
  %load = load i64, ptr %getelementptr, align 8
  br label %bb3

bb3:                                              ; preds = %bb3, %bb2
  %phi = phi i1 [ %select, %bb3 ], [ true, %bb2 ]
  %phi4 = phi ptr [ %getelementptr8, %bb3 ], [ %arg, %bb2 ]
  %getelementptr5 = getelementptr inbounds i8, ptr %phi4, i64 8
  %load6 = load i64, ptr %getelementptr5, align 8
  %icmp7 = icmp eq i64 %load6, %load
  %select = select i1 %icmp7, i1 %phi, i1 false
  %getelementptr8 = getelementptr inbounds i8, ptr %phi4, i64 16
  %icmp9 = icmp eq ptr %getelementptr8, %arg1
  br i1 %icmp9, label %bb10, label %bb3

bb10:                                             ; preds = %bb3
  br i1 %select, label %bb11, label %bb12

bb11:                                             ; preds = %bb10, %bb
  call void @spam()
  br label %bb13

bb12:                                             ; preds = %bb10
  call void @spam.1()
  br label %bb13

bb13:                                             ; preds = %bb12, %bb11
  ret void
}

declare void @spam()

declare void @spam.1()

this is roughly

void f(A* begin, A* end) {
  bool all_same_val = true;
  for (A* a : begin to end) {
    if (a.value != begin->value) {
      all_same_val = false;
    }
  }
  if (all_same_val) {
    g();
  } else {
    h();
  }
}

in the good (with this reverted), we see

%bc.merge.rdx24 = phi i1 [ true, %iter.check ], [ %rdx.select9, %vec.epilog.iter.check ], [ %rdx.select23, %vec.epilog.middle.block ]

and in the bad (at ToT) we see

%bc.merge.rdx21 = phi i1 [ true, %iter.check ], [ true, %vec.epilog.iter.check ], [ %rdx.select20, %vec.epilog.middle.block ]

@fhahn can we revert this for now?

@fhahn
Copy link
Contributor Author

fhahn commented Apr 16, 2024

@aeubanks thanks for the reproducer! Will take a look/revert tomorrow morning when I am back in the office. Feel free to revert in the meantime if it helps to unblock you.

aeubanks added a commit that referenced this pull request Apr 16, 2024
This reverts commit c6e38b9.

Causes miscompiles, see comments on #78304.
@aeubanks
Copy link
Contributor

thanks, reverted in c6e0162

fhahn added a commit that referenced this pull request May 2, 2024
This patch adds an assert to createAndCollectMergePhiForReduction to
make sure there is a resume phi when vectorizing the epilogue loop. This
is needed to set the resume value from the main vector loop.

This assertion guards against the issue caused the revert of
#78304.
fhahn added a commit that referenced this pull request May 2, 2024
fhahn added a commit that referenced this pull request May 3, 2024
This reverts the revert commit c6e0162.

This patch includes a fix for any-of reductions and epilogue
vectorization. Extra test coverage for the issue that caused the revert
has been added in bce3bfc and an assertion has been added in
c7209cb.

--------------------------------
Original commit message:

Update AnyOf reduction code generation to only keep track of the AnyOf
property in a boolean vector in the loop, only selecting either the new
or start value in the middle block.

The patch incorporates feedback from https://reviews.llvm.org/D153697.

This fixes the #62565, as now there aren't multiple uses of the
start/new values.

Fixes #62565

PR: #78304
@fhahn
Copy link
Contributor Author

fhahn commented May 3, 2024

@aeubanks thanks for the reproducer, should be fixed in the recommitted version bccb7ed. Also added an assert to catch the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[LoopVectorize] Failure to compare multiple inputs as the same (splatted) value
9 participants