[LV] Improve AnyOf reduction codegen. #78304
Update AnyOf reduction code generation to track only the AnyOf property in a boolean vector inside the loop, and to select between the new value and the start value only in the middle block. This fixes llvm#62565, because the start/new values no longer have multiple uses. Fixes llvm#62565
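To make the description above concrete, here is a minimal standalone sketch in plain C++ (not LLVM API code). It models a 4-lane vectorized loop for the true-side pattern `rdx = (v[i] == needle) ? new_val : rdx`. The names `any_of_scalar`, `any_of_vectorized`, `needle`, `start` and `new_val`, and the fixed `VF = 4`, are assumptions made for illustration only; the real patch operates on VPlan recipes and IR, not on arrays.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Scalar reference loop: rdx = (v[i] == needle) ? new_val : rdx.
int any_of_scalar(const std::vector<int> &v, int needle, int start,
                  int new_val) {
  int rdx = start;
  for (int x : v)
    rdx = (x == needle) ? new_val : rdx;
  return rdx;
}

// Hypothetical model of the vectorized form with VF = 4: the loop body only
// ORs the compare result into a boolean mask; start/new_val are not touched
// until after the vector loop.
int any_of_vectorized(const std::vector<int> &v, int needle, int start,
                      int new_val) {
  constexpr std::size_t VF = 4;
  std::array<bool, VF> mask{}; // models: phi <4 x i1> [ zeroinitializer, ... ]
  std::size_t i = 0;
  for (; i + VF <= v.size(); i += VF)
    for (std::size_t lane = 0; lane < VF; ++lane)
      mask[lane] = mask[lane] || (v[i + lane] == needle); // models: or <4 x i1>

  // Middle block: OR-reduce the mask, then do a single scalar select.
  bool any = false;
  for (bool b : mask)
    any = any || b;
  int rdx = any ? new_val : start;

  // Scalar remainder loop resumes from the reduced value.
  for (; i < v.size(); ++i)
    rdx = (v[i] == needle) ? new_val : rdx;
  return rdx;
}
```

In this model the start and new values are each used exactly once after the vector loop, which matches the motivation given in the description.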
@llvm/pr-subscribers-llvm-transforms
Author: Florian Hahn (fhahn)
Changes
Update AnyOf reduction code generation to only keep track of the AnyOf property in a boolean vector in the loop, only selecting either the new or start value in the middle block.
The patch incorporates feedback from https://reviews.llvm.org/D153697.
This fixes #62565, as now there aren't multiple uses of the start/new values.
Fixes #62565
Patch is 78.76 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/78304.diff
10 Files Affected:
diff --git a/llvm/include/llvm/Transforms/Utils/LoopUtils.h b/llvm/include/llvm/Transforms/Utils/LoopUtils.h
index 5a1385d01d8e44..3bad7b616d9d75 100644
--- a/llvm/include/llvm/Transforms/Utils/LoopUtils.h
+++ b/llvm/include/llvm/Transforms/Utils/LoopUtils.h
@@ -363,15 +363,6 @@ Intrinsic::ID getMinMaxReductionIntrinsicOp(RecurKind RK);
/// Returns the comparison predicate used when expanding a min/max reduction.
CmpInst::Predicate getMinMaxReductionPredicate(RecurKind RK);
-/// See RecurrenceDescriptor::isAnyOfPattern for a description of the pattern we
-/// are trying to match. In this pattern, we are only ever selecting between two
-/// values: 1) an initial start value \p StartVal of the reduction PHI, and 2) a
-/// loop invariant value. If any of lane value in \p Left, \p Right is not equal
-/// to \p StartVal, select the loop invariant value. This is done by selecting
-/// \p Right iff \p Left is equal to \p StartVal.
-Value *createAnyOfOp(IRBuilderBase &Builder, Value *StartVal, RecurKind RK,
- Value *Left, Value *Right);
-
/// Returns a Min/Max operation corresponding to MinMaxRecurrenceKind.
/// The Builder's fast-math-flags must be set to propagate the expected values.
Value *createMinMaxOp(IRBuilderBase &Builder, RecurKind RK, Value *Left,
diff --git a/llvm/lib/Transforms/Utils/LoopUtils.cpp b/llvm/lib/Transforms/Utils/LoopUtils.cpp
index 59485126b280ab..c0582fb7d7e150 100644
--- a/llvm/lib/Transforms/Utils/LoopUtils.cpp
+++ b/llvm/lib/Transforms/Utils/LoopUtils.cpp
@@ -962,15 +962,6 @@ CmpInst::Predicate llvm::getMinMaxReductionPredicate(RecurKind RK) {
}
}
-Value *llvm::createAnyOfOp(IRBuilderBase &Builder, Value *StartVal,
- RecurKind RK, Value *Left, Value *Right) {
- if (auto VTy = dyn_cast<VectorType>(Left->getType()))
- StartVal = Builder.CreateVectorSplat(VTy->getElementCount(), StartVal);
- Value *Cmp =
- Builder.CreateCmp(CmpInst::ICMP_NE, Left, StartVal, "rdx.select.cmp");
- return Builder.CreateSelect(Cmp, Left, Right, "rdx.select");
-}
-
Value *llvm::createMinMaxOp(IRBuilderBase &Builder, RecurKind RK, Value *Left,
Value *Right) {
Type *Ty = Left->getType();
@@ -1079,16 +1070,13 @@ Value *llvm::createAnyOfTargetReduction(IRBuilderBase &Builder, Value *Src,
NewVal = SI->getTrueValue();
}
- // Create a splat vector with the new value and compare this to the vector
- // we want to reduce.
- ElementCount EC = cast<VectorType>(Src->getType())->getElementCount();
- Value *Right = Builder.CreateVectorSplat(EC, InitVal);
- Value *Cmp =
- Builder.CreateCmp(CmpInst::ICMP_NE, Src, Right, "rdx.select.cmp");
-
// If any predicate is true it means that we want to select the new value.
- Cmp = Builder.CreateOrReduce(Cmp);
- return Builder.CreateSelect(Cmp, NewVal, InitVal, "rdx.select");
+ Value *AnyOf =
+ Src->getType()->isVectorTy() ? Builder.CreateOrReduce(Src) : Src;
+ // The compares in the loop may yield poison, which propagates through the
+ // bitwise ORs. Freeze it here before the condition is used.
+ AnyOf = Builder.CreateFreeze(AnyOf);
+ return Builder.CreateSelect(AnyOf, NewVal, InitVal, "rdx.select");
}
Value *llvm::createSimpleTargetReduction(IRBuilderBase &Builder, Value *Src,
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h b/llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
index a7ebf78e54ceb6..9d3ef5b96c72fb 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
@@ -68,6 +68,7 @@ class VPBuilder {
public:
VPBuilder() = default;
VPBuilder(VPBasicBlock *InsertBB) { setInsertPoint(InsertBB); }
+ VPBuilder(VPRecipeBase *InsertPt) { setInsertPoint(InsertPt); }
/// Clear the insertion point: created instructions will not be inserted into
/// a block.
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index aa5d1bfa57d535..973bcf4f6c2d7c 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -7428,7 +7428,8 @@ static void createAndCollectMergePhiForReduction(
auto *PhiR = cast<VPReductionPHIRecipe>(RedResult->getOperand(0));
const RecurrenceDescriptor &RdxDesc = PhiR->getRecurrenceDescriptor();
- TrackingVH<Value> ReductionStartValue = RdxDesc.getRecurrenceStartValue();
+ TrackingVH<Value> ReductionStartValue =
+ State.get(PhiR->getStartValue(), VPIteration(0, 0));
Value *FinalValue =
State.get(RedResult, VPIteration(State.UF - 1, VPLane::getFirstLane()));
auto *ResumePhi =
@@ -7452,7 +7453,7 @@ static void createAndCollectMergePhiForReduction(
BCBlockPhi->addIncoming(ResumePhi->getIncomingValueForBlock(Incoming),
Incoming);
else
- BCBlockPhi->addIncoming(ReductionStartValue, Incoming);
+ BCBlockPhi->addIncoming(RdxDesc.getRecurrenceStartValue(), Incoming);
}
auto *OrigPhi = cast<PHINode>(PhiR->getUnderlyingValue());
@@ -9054,6 +9055,41 @@ void LoopVectorizationPlanner::adjustRecipesForReductions(
continue;
const RecurrenceDescriptor &RdxDesc = PhiR->getRecurrenceDescriptor();
+ // Adjust AnyOf reductions; replace the reduction phi for the selected value
+ // with a boolean reduction phi node to check if the condition is true in
+ // any iteration. The final value is selected by the final
+ // ComputeReductionResult.
+ if (RecurrenceDescriptor::isAnyOfRecurrenceKind(
+ RdxDesc.getRecurrenceKind())) {
+ auto *Select = cast<VPRecipeBase>(*find_if(PhiR->users(), [](VPUser *U) {
+ return isa<VPWidenSelectRecipe>(U) ||
+ (isa<VPReplicateRecipe>(U) &&
+ cast<VPReplicateRecipe>(U)->getUnderlyingInstr()->getOpcode() ==
+ Instruction::Select);
+ }));
+ VPValue *Cmp = Select->getOperand(0);
+ // If the compare is checking the reduction PHI node, adjust it to check
+ // the start value.
+ if (VPRecipeBase *CmpR = Cmp->getDefiningRecipe()) {
+ for (unsigned I = 0; I != CmpR->getNumOperands(); ++I)
+ if (CmpR->getOperand(I) == PhiR)
+ CmpR->setOperand(I, PhiR->getStartValue());
+ }
+ VPBuilder::InsertPointGuard Guard(Builder);
+ Builder.setInsertPoint(Select);
+
+ // If the true value of the select is the reduction phi, the new value is
+ // selected if the negated condition is true in any iteration.
+ if (Select->getOperand(1) == PhiR)
+ Cmp = Builder.createNot(Cmp);
+ VPValue *Or = Builder.createOr(PhiR, Cmp);
+ Select->getVPSingleValue()->replaceAllUsesWith(Or);
+
+ // Convert the reduction phi to operate on bools.
+ PhiR->setOperand(0, Plan->getVPValueOrAddLiveIn(ConstantInt::getFalse(
+ OrigLoop->getHeader()->getContext())));
+ }
+
// If tail is folded by masking, introduce selects between the phi
// and the live-out instruction of each reduction, at the beginning of the
// dedicated latch block.
@@ -9086,7 +9122,9 @@ void LoopVectorizationPlanner::adjustRecipesForReductions(
// then extend the loop exit value to enable InstCombine to evaluate the
// entire expression in the smaller type.
Type *PhiTy = PhiR->getStartValue()->getLiveInIRValue()->getType();
- if (MinVF.isVector() && PhiTy != RdxDesc.getRecurrenceType()) {
+ if (MinVF.isVector() && PhiTy != RdxDesc.getRecurrenceType() &&
+ !RecurrenceDescriptor::isAnyOfRecurrenceKind(
+ RdxDesc.getRecurrenceKind())) {
assert(!PhiR->isInLoop() && "Unexpected truncated inloop reduction!");
Type *RdxTy = RdxDesc.getRecurrenceType();
auto *Trunc =
diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index c553e2c9e76839..f87461fd548bd7 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -452,8 +452,7 @@ Value *VPInstruction::generateInstruction(VPTransformState &State,
else if (RecurrenceDescriptor::isAnyOfRecurrenceKind(RK)) {
TrackingVH<Value> ReductionStartValue =
RdxDesc.getRecurrenceStartValue();
- ReducedPartRdx = createAnyOfOp(Builder, ReductionStartValue, RK,
- ReducedPartRdx, RdxPart);
+ ReducedPartRdx = Builder.CreateOr(ReducedPartRdx, RdxPart);
} else
ReducedPartRdx = createMinMaxOp(Builder, RK, ReducedPartRdx, RdxPart);
}
@@ -461,7 +460,9 @@ Value *VPInstruction::generateInstruction(VPTransformState &State,
// Create the reduction after the loop. Note that inloop reductions create
// the target reduction in the loop using a Reduction recipe.
- if (State.VF.isVector() && !PhiR->isInLoop()) {
+ if ((State.VF.isVector() ||
+ RecurrenceDescriptor::isAnyOfRecurrenceKind(RK)) &&
+ !PhiR->isInLoop()) {
ReducedPartRdx =
createTargetReduction(Builder, RdxDesc, ReducedPartRdx, OrigPhi);
// If the reduction can be performed in a smaller type, we need to extend
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/sve-select-cmp.ll b/llvm/test/Transforms/LoopVectorize/AArch64/sve-select-cmp.ll
index 1c26ee8479e578..493ffd2ca569de 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/sve-select-cmp.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/sve-select-cmp.ll
@@ -8,13 +8,12 @@ target triple = "aarch64-linux-gnu"
define i32 @select_const_i32_from_icmp(ptr nocapture readonly %v, i64 %n) #0 {
; CHECK-VF4IC1-LABEL: @select_const_i32_from_icmp
; CHECK-VF4IC1: vector.body:
-; CHECK-VF4IC1: [[VEC_PHI:%.*]] = phi <vscale x 4 x i32> [ shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 3, i64 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer), %vector.ph ], [ [[VEC_SEL:%.*]], %vector.body ]
+; CHECK-VF4IC1: [[VEC_PHI:%.*]] = phi <vscale x 4 x i1> [ zeroinitializer, %vector.ph ], [ [[VEC_SEL:%.*]], %vector.body ]
; CHECK-VF4IC1: [[VEC_LOAD:%.*]] = load <vscale x 4 x i32>
; CHECK-VF4IC1-NEXT: [[VEC_ICMP:%.*]] = icmp eq <vscale x 4 x i32> [[VEC_LOAD]], shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 3, i64 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
-; CHECK-VF4IC1-NEXT: [[VEC_SEL]] = select <vscale x 4 x i1> [[VEC_ICMP]], <vscale x 4 x i32> [[VEC_PHI]], <vscale x 4 x i32> shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 7, i64 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
+; CHECK-VF4IC1-NEXT: [[VEC_SEL]] = or <vscale x 4 x i1> [[VEC_PHI]], [[VEC_ICMP]]
; CHECK-VF4IC1: middle.block:
-; CHECK-VF4IC1-NEXT: [[FIN_ICMP:%.*]] = icmp ne <vscale x 4 x i32> [[VEC_SEL]], shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 3, i64 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
-; CHECK-VF4IC1-NEXT: [[OR_RDX:%.*]] = call i1 @llvm.vector.reduce.or.nxv4i1(<vscale x 4 x i1> [[FIN_ICMP]])
+; CHECK-VF4IC1-NEXT: [[OR_RDX:%.*]] = call i1 @llvm.vector.reduce.or.nxv4i1(<vscale x 4 x i1> [[VEC_SEL]])
; CHECK-VF4IC1-NEXT: {{.*}} = select i1 [[OR_RDX]], i32 7, i32 3
; CHECK-VF4IC4-LABEL: @select_const_i32_from_icmp
@@ -62,20 +61,15 @@ exit: ; preds = %for.body
define i32 @select_i32_from_icmp(ptr nocapture readonly %v, i32 %a, i32 %b, i64 %n) #0 {
; CHECK-VF4IC1-LABEL: @select_i32_from_icmp
; CHECK-VF4IC1: vector.ph:
-; CHECK-VF4IC1: [[TMP1:%.*]] = insertelement <vscale x 4 x i32> poison, i32 %a, i64 0
-; CHECK-VF4IC1-NEXT: [[SPLAT_OF_A:%.*]] = shufflevector <vscale x 4 x i32> [[TMP1]], <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer
-; CHECK-VF4IC1-NEXT: [[TMP2:%.*]] = insertelement <vscale x 4 x i32> poison, i32 %b, i64 0
-; CHECK-VF4IC1-NEXT: [[SPLAT_OF_B:%.*]] = shufflevector <vscale x 4 x i32> [[TMP2]], <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer
+; CHECK-VF4IC1-NOT: shufflevector <vscale x 4 x i32>
+; CHECK-VF4IC1-NOT: shufflevector <vscale x 4 x i32>
; CHECK-VF4IC1: vector.body:
-; CHECK-VF4IC1: [[VEC_PHI:%.*]] = phi <vscale x 4 x i32> [ [[SPLAT_OF_A]], %vector.ph ], [ [[VEC_SEL:%.*]], %vector.body ]
+; CHECK-VF4IC1: [[VEC_PHI:%.*]] = phi <vscale x 4 x i1> [ zeroinitializer, %vector.ph ], [ [[VEC_SEL:%.*]], %vector.body ]
; CHECK-VF4IC1: [[VEC_LOAD:%.*]] = load <vscale x 4 x i32>
; CHECK-VF4IC1-NEXT: [[VEC_ICMP:%.*]] = icmp eq <vscale x 4 x i32> [[VEC_LOAD]], shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 3, i64 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
-; CHECK-VF4IC1-NEXT: [[VEC_SEL]] = select <vscale x 4 x i1> [[VEC_ICMP]], <vscale x 4 x i32> [[VEC_PHI]], <vscale x 4 x i32> [[SPLAT_OF_B]]
+; CHECK-VF4IC1-NEXT: [[VEC_SEL]] = or <vscale x 4 x i1> [[VEC_PHI]], [[VEC_ICMP]]
; CHECK-VF4IC1: middle.block:
-; CHECK-VF4IC1-NEXT: [[FIN_INS:%.*]] = insertelement <vscale x 4 x i32> poison, i32 %a, i64 0
-; CHECK-VF4IC1-NEXT: [[FIN_SPLAT:%.*]] = shufflevector <vscale x 4 x i32> [[FIN_INS]], <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer
-; CHECK-VF4IC1-NEXT: [[FIN_CMP:%.*]] = icmp ne <vscale x 4 x i32> [[VEC_SEL]], [[FIN_SPLAT]]
-; CHECK-VF4IC1-NEXT: [[OR_RDX:%.*]] = call i1 @llvm.vector.reduce.or.nxv4i1(<vscale x 4 x i1> [[FIN_CMP]])
+; CHECK-VF4IC1-NEXT: [[OR_RDX:%.*]] = call i1 @llvm.vector.reduce.or.nxv4i1(<vscale x 4 x i1> [[VEC_SEL]])
; CHECK-VF4IC1-NEXT: {{.*}} = select i1 [[OR_RDX]], i32 %b, i32 %a
; CHECK-VF4IC4-LABEL: @select_i32_from_icmp
@@ -101,13 +95,12 @@ exit: ; preds = %for.body
define i32 @select_const_i32_from_fcmp(ptr nocapture readonly %v, i64 %n) #0 {
; CHECK-VF4IC1-LABEL: @select_const_i32_from_fcmp
; CHECK-VF4IC1: vector.body:
-; CHECK-VF4IC1: [[VEC_PHI:%.*]] = phi <vscale x 4 x i32> [ shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 2, i64 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer), %vector.ph ], [ [[VEC_SEL:%.*]], %vector.body ]
+; CHECK-VF4IC1: [[VEC_PHI:%.*]] = phi <vscale x 4 x i1> [ zeroinitializer, %vector.ph ], [ [[VEC_SEL:%.*]], %vector.body ]
; CHECK-VF4IC1: [[VEC_LOAD:%.*]] = load <vscale x 4 x float>
; CHECK-VF4IC1-NEXT: [[VEC_ICMP:%.*]] = fcmp fast ueq <vscale x 4 x float> [[VEC_LOAD]], shufflevector (<vscale x 4 x float> insertelement (<vscale x 4 x float> poison, float 3.000000e+00, i64 0), <vscale x 4 x float> poison, <vscale x 4 x i32> zeroinitializer)
-; CHECK-VF4IC1-NEXT: [[VEC_SEL]] = select <vscale x 4 x i1> [[VEC_ICMP]], <vscale x 4 x i32> [[VEC_PHI]], <vscale x 4 x i32> shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 1, i64 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
+; CHECK-VF4IC1-NEXT: [[VEC_SEL]] = or <vscale x 4 x i1> [[VEC_PHI]], [[VEC_ICMP]]
; CHECK-VF4IC1: middle.block:
-; CHECK-VF4IC1-NEXT: [[FIN_ICMP:%.*]] = icmp ne <vscale x 4 x i32> [[VEC_SEL]], shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 2, i64 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
-; CHECK-VF4IC1-NEXT: [[OR_RDX:%.*]] = call i1 @llvm.vector.reduce.or.nxv4i1(<vscale x 4 x i1> [[FIN_ICMP]])
+; CHECK-VF4IC1-NEXT: [[OR_RDX:%.*]] = call i1 @llvm.vector.reduce.or.nxv4i1(<vscale x 4 x i1> [[VEC_SEL]])
; CHECK-VF4IC1-NEXT: {{.*}} = select i1 [[OR_RDX]], i32 1, i32 2
; CHECK-VF4IC4-LABEL: @select_const_i32_from_fcmp
@@ -156,16 +149,15 @@ exit: ; preds = %for.body
define i32 @pred_select_const_i32_from_icmp(ptr noalias nocapture readonly %src1, ptr noalias nocapture readonly %src2, i64 %n) #0 {
; CHECK-VF4IC1-LABEL: @pred_select_const_i32_from_icmp
; CHECK-VF4IC1: vector.body:
-; CHECK-VF4IC1: [[VEC_PHI:%.*]] = phi <vscale x 4 x i32> [ zeroinitializer, %vector.ph ], [ [[VEC_SEL:%.*]], %vector.body ]
+; CHECK-VF4IC1: [[VEC_PHI:%.*]] = phi <vscale x 4 x i1> [ zeroinitializer, %vector.ph ], [ [[VEC_SEL:%.*]], %vector.body ]
; CHECK-VF4IC1: [[VEC_LOAD:%.*]] = load <vscale x 4 x i32>
; CHECK-VF4IC1: [[MASK:%.*]] = icmp sgt <vscale x 4 x i32> [[VEC_LOAD]], shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 35, i64 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
; CHECK-VF4IC1: [[MASKED_LOAD:%.*]] = call <vscale x 4 x i32> @llvm.masked.load.nxv4i32.p0(ptr {{%.*}}, i32 4, <vscale x 4 x i1> [[MASK]], <vscale x 4 x i32> poison)
; CHECK-VF4IC1-NEXT: [[VEC_ICMP:%.*]] = icmp eq <vscale x 4 x i32> [[MASKED_LOAD]], shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 2, i64 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
-; CHECK-VF4IC1-NEXT: [[VEC_SEL_TMP:%.*]] = select <vscale x 4 x i1> [[VEC_ICMP]], <vscale x 4 x i32> shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 1, i64 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer), <vscale x 4 x i32> [[VEC_PHI]]
-; CHECK-VF4IC1: [[VEC_SEL:%.*]] = select <vscale x 4 x i1> [[MASK]], <vscale x 4 x i32> [[VEC_SEL_TMP]], <vscale x 4 x i32> [[VEC_PHI]]
+; CHECK-VF4IC1-NEXT: [[VEC_SEL_TMP:%.*]] = or <vscale x 4 x i1> [[VEC_PHI]], [[VEC_ICMP]]
+; CHECK-VF4IC1: [[VEC_SEL:%.*]] = select <vscale x 4 x i1> [[MASK]], <vscale x 4 x i1> [[VEC_SEL_TMP]], <vscale x 4 x i1> [[VEC_PHI]]
; CHECK-VF4IC1: middle.block:
-; CHECK-VF4IC1-NEXT: [[FIN_ICMP:%.*]] = icmp ne <vscale x 4 x i32> [[VEC_SEL]], zeroinitializer
-; CHECK-VF4IC1-NEXT: [[OR_RDX:%.*]] = call i1 @llvm.vector.reduce.or.nxv4i1(<vscale x 4 x i1> [[FIN_ICMP]])
+; CHECK-VF4IC1-NEXT: [[OR_RDX:%.*]] = call i1 @llvm.vector.reduce.or.nxv4i1(<vscale x 4 x i1> [[VEC_SEL]])
; CHECK-VF4IC1-NEXT: {{.*}} = select i1 [[OR_RDX]], i32 1, i32 0
; CHECK-VF4IC4-LABEL: @pred_select_const_i32_from_icmp
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/select-cmp-reduction.ll b/llvm/test/Transforms/LoopVectorize/RISCV/select-cmp-reduction.ll
index 34a7987bb40abe..6c50915c88e824 100644
--- a/llvm/test/Transforms/LoopVectorize/RISCV/select-cmp-reduction.ll
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/select-cmp-reduction.ll
@@ -12,25 +12,22 @@ define i32 @select_icmp(i32 %x, i32 %y, ptr nocapture readonly %c, i64 %n) #0 {
; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 %n, [[N_MOD_VF]]
; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x i32> poison, i32 [[X:%.*]], i64 0
; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT]], <4 x i32> poison, <4 x i32> zeroinitializer
-; CHECK-NEXT: [[BROADCAST_SPLATINSERT1:%.*]] = insertelement <4 x i32> poison, i32 [[Y:%.*]], i64 0
-; CHECK-NEXT: [[BROADCAST_SPLAT2:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT1]], <4 x i32> poison, <4 x i32> zeroinitializer
; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
; CHECK: vector.body:
; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH:%.*]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
-; CHECK-NEXT: [[VEC_PHI:%.*]] = phi <4 x i32> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP5:%.*]], [[VECTOR_BODY]] ]
+; CHECK-NEXT: [[VEC_PHI:%.*]] = phi <4 x i1> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP5:%.*]], [[VECTOR_BODY]] ]
; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i32, ptr [[C:%.*]], i64 [[TMP0]]
; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i32, ptr [[TMP1]], i32 0
; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i32>, ptr [[TMP2]], align 4
; CHECK-NEXT: [[TMP4:%.*]] = icmp slt <4 x i32> [[WIDE_LOAD]], [[BROADCAST_SPLAT]]
-; CHECK-NEXT: [[TMP5]] = select <4 x i1> [[TMP4]], <4 x i32> [[VEC_PHI]], <4 x i32> [[BROADCAST_SPLAT2]]
+; CHECK-NEXT: [[TMP5]] = or <4 x i1> [[VEC_PHI]], [[TMP4]]
; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
; CHECK-NEXT: [[TMP6:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
; CHECK-NEXT: br i1 [[TMP6]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [...
[truncated]
+; CHECK-VF4IC1: [[VEC_PHI:%.*]] = phi <vscale x 4 x i1> [ zeroinitializer, %vector.ph ], [ [[VEC_SEL:%.*]], %vector.body ]
; CHECK-VF4IC1: [[VEC_LOAD:%.*]] = load <vscale x 4 x i32>
; CHECK-VF4IC1-NEXT: [[VEC_ICMP:%.*]] = icmp eq <vscale x 4 x i32> [[VEC_LOAD]], shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 3, i64 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
-; CHECK-VF4IC1-NEXT: [[VEC_SEL]] = select <vscale x 4 x i1> [[VEC_ICMP]], <vscale x 4 x i32> [[VEC_PHI]], <vscale x 4 x i32> shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 7, i64 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
+; CHECK-VF4IC1-NEXT: [[VEC_SEL]] = or <vscale x 4 x i1> [[VEC_PHI]], [[VEC_ICMP]]
; CHECK-VF4IC1: middle.block:
-; CHECK-VF4IC1-NEXT: [[FIN_ICMP:%.*]] = icmp ne <vscale x 4 x i32> [[VEC_SEL]], shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 3, i64 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
-; CHECK-VF4IC1-NEXT: [[OR_RDX:%.*]] = call i1 @llvm.vector.reduce.or.nxv4i1(<vscale x 4 x i1> [[FIN_ICMP]])
+; CHECK-VF4IC1-NEXT: [[OR_RDX:%.*]] = call i1 @llvm.vector.reduce.or.nxv4i1(<vscale x 4 x i1> [[VEC_SEL]])
; CHECK-VF4IC1-NEXT: {{.*}} = select i1 [[OR_RDX]], i32 7, i32 3
I'm glad to see someone willing to address this issue. It was also in my task list, so I have some information to share with you, hoping it will be helpful. I've divided the cases of AnyOf into two types:
- True-side update
rdx = <condition> ? <loop-invariant-rhs> : rdx
It can be transformed to OR reduction:
%vmask = <all-false>
for () {
%pred = <widened-predicate>
%vmask |= %pred
}
%rdx = reduce.or %vmask
%red = select %rdx, <loop-invariant-rhs>, <initial-value>
- False-side update:
rdx = <condition> ? rdx : <loop-invariant-rhs>
It can be transformed to OR reduction with NOT condition:
%vmask = <all-false>
for () {
%pred = <widened-predicate>
%pred.not = (%pred == <all-false>)
%vmask |= %pred.not
}
%rdx = reduce.or %vmask
%red = select %rdx, <loop-invariant-rhs>, <initial-value>
Or, to AND reduction:
%vmask = <all-true>
for () {
%pred = <widened-predicate>
%vmask &= %pred
}
%rdx = reduce.and %vmask
%red = select %rdx, <initial-value>, <loop-invariant-rhs>
Take @select_const_i32_from_icmp as an example: it should be type 2 (false-side update), so there may be a bug in the current transformation.
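The equivalences in the pseudocode above can be checked with a small standalone C++ model. This is only a sketch under simplifying assumptions: the per-lane mask is collapsed into a single `bool`, the widened predicate is given as a precomputed `std::vector<bool>`, and all function names are hypothetical.

```cpp
#include <vector>

// True-side update: rdx = cond ? rhs : rdx.
int true_side_scalar(const std::vector<bool> &cond, int init, int rhs) {
  int rdx = init;
  for (bool c : cond)
    rdx = c ? rhs : rdx;
  return rdx;
}

// True-side update as an OR reduction over the predicate.
int true_side_or(const std::vector<bool> &cond, int init, int rhs) {
  bool mask = false; // %vmask = <all-false>
  for (bool c : cond)
    mask = mask || c; // %vmask |= %pred
  return mask ? rhs : init;
}

// False-side update: rdx = cond ? rdx : rhs.
int false_side_scalar(const std::vector<bool> &cond, int init, int rhs) {
  int rdx = init;
  for (bool c : cond)
    rdx = c ? rdx : rhs;
  return rdx;
}

// False-side update as an OR reduction over the negated predicate.
int false_side_or_not(const std::vector<bool> &cond, int init, int rhs) {
  bool mask = false; // %vmask = <all-false>
  for (bool c : cond)
    mask = mask || !c; // %vmask |= %pred.not
  return mask ? rhs : init;
}

// False-side update as an AND reduction; note the swapped select operands.
int false_side_and(const std::vector<bool> &cond, int init, int rhs) {
  bool mask = true; // %vmask = <all-true>
  for (bool c : cond)
    mask = mask && c; // %vmask &= %pred
  return mask ? init : rhs;
}
```

In this model, applying the plain OR form (without the negation) to a false-side loop gives the wrong answer whenever the predicate is true in some iterations and never false, which is the kind of bug the comment is pointing at.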
Thanks, the updates in this particular test file are still stale and need updating. The current codegen generates OR (NOT ..), as should be visible in llvm/test/Transforms/LoopVectorize/select-cmp.ll.
I see.
But besides this case, llvm/test/Transforms/LoopVectorize/RISCV/select-cmp-reduction.ll also lacks the xor operation. I haven't thoroughly checked whether other test cases have the same problem.
Could you please update it? Thank you.
ping :)
TrackingVH<Value> ReductionStartValue =
RdxDesc.getRecurrenceStartValue();
This can be removed. And after it is cleaned up, the braces can be removed too.
Folded into the CreateBinaryOp above, thanks!
I see.
But besides this case, llvm/test/Transforms/LoopVectorize/RISCV/select-cmp-reduction.ll also lacks the xor operation. I haven't thoroughly checked whether other test cases have the same problem.
Could you please update it? Thank you.
Thanks for following up on this!
+ Src->getType()->isVectorTy() ? Builder.CreateOrReduce(Src) : Src;
+ // The compares in the loop may yield poison, which propagates through the
+ // bitwise ORs. Freeze it here before the condition is used.
+ AnyOf = Builder.CreateFreeze(AnyOf);
Further refactoring thought: the AnyOf boolean reduction completes with the CreateOrReduce(), and freezing it. Similar to a plain "result |= value[i]" OR reduction, on i1 values, where freezing may also be needed? The subsequent select deserves a separate recipe, which post-processes the result of the AnyOf reduction and also depends on the "IfAny" and "Else" (or "Start" and "Other") Live-In VP/Values directly, rather than looking for them here and now.
Similar to a plain "result |= value[i]" OR reduction,
Freeze won't be needed in that case; if there's already a binary OR in the input, poison from the compare already gets propagated. It is only needed when converting from the select form (which doesn't propagate poison from the condition to its result).
Yes, but this will need a bit of additional refactoring, in particular how createAndCollectMergePhiForReduction looks up the reduction result value and when ComputeReductionResult VPInstructions are created.
@@ -7486,7 +7486,8 @@ static void createAndCollectMergePhiForReduction(
auto *PhiR = cast<VPReductionPHIRecipe>(RedResult->getOperand(0));
const RecurrenceDescriptor &RdxDesc = PhiR->getRecurrenceDescriptor();
- TrackingVH<Value> ReductionStartValue = RdxDesc.getRecurrenceStartValue();
+ TrackingVH<Value> ReductionStartValue =
Is ReductionStartValue still used, after inlining it below? Why/is this change needed?
Not needed any longer, removed, thanks!
@@ -9110,6 +9111,41 @@ void LoopVectorizationPlanner::adjustRecipesForReductions(
continue;
const RecurrenceDescriptor &RdxDesc = PhiR->getRecurrenceDescriptor();
+ // Adjust AnyOf reductions; replace the reduction phi for the selected value
adjustRecipesForReductions() is getting excessively long, should be refactored. Its documentation above should (also) include the adjustment of AnyOf reductions described here.
Agreed, probably best as followup?
+ (isa<VPReplicateRecipe>(U) &&
+ cast<VPReplicateRecipe>(U)->getUnderlyingInstr()->getOpcode() ==
+ Instruction::Select);
+ }));
nit: assert(Select && "a meaningful error message");
cast should already assert to check for non-null.
- if (MinVF.isVector() && PhiTy != RdxDesc.getRecurrenceType()) {
+ if (MinVF.isVector() && PhiTy != RdxDesc.getRecurrenceType() &&
+ !RecurrenceDescriptor::isAnyOfRecurrenceKind(
+ RdxDesc.getRecurrenceKind())) {
Would it be better to truncate AnyOf reductions to a smaller type (boolean) here instead of above?
We only reach this path now because we adjust PhiR's start value to a bool. It requires more than plain truncates as below, so I think it's probably worth keeping it separate. I think it also needs handling before introducing selects for tail-folding; otherwise those selects would also need updating.
@@ -453,16 +453,17 @@ Value *VPInstruction::generateInstruction(VPTransformState &State,
else if (RecurrenceDescriptor::isAnyOfRecurrenceKind(RK)) {
TrackingVH<Value> ReductionStartValue =
RdxDesc.getRecurrenceStartValue();
- ReducedPartRdx = createAnyOfOp(Builder, ReductionStartValue, RK,
- ReducedPartRdx, RdxPart);
+ ReducedPartRdx = Builder.CreateOr(ReducedPartRdx, RdxPart);
Should Op be set to OR and treated by CreateBinOp() above?
Adjusted, thanks!
- if (State.VF.isVector() && !PhiR->isInLoop()) {
+ if ((State.VF.isVector() ||
+ RecurrenceDescriptor::isAnyOfRecurrenceKind(RK)) &&
+ !PhiR->isInLoop()) {
ReducedPartRdx =
createTargetReduction(Builder, RdxDesc, ReducedPartRdx, OrigPhi);
As raised above, it may be better to have the ComputeReductionResult recipe take care only of reducing AnyOf to a boolean here, followed by a Select recipe to choose between the Start and Other live-in values.
Yes, but this will need a bit of additional refactoring, in particular of how createAndCollectMergePhiForReduction looks up the reduction result value and of when ComputeReductionResult VPInstructions are created.
}));
VPValue *Cmp = Select->getOperand(0);
// If the compare is checking the reduction PHI node, adjust it to check
// the start value.
Hmm, either the true or the false value of the select should use PhiR, but only that value? The condition of an AnyOf reduction should be any boolean predicate that depends on the current iteration only, i.e., it should be a recipe (if it's live-in, the entire select should be LICM'd) that depends on AnyOf's PhiR neither directly nor indirectly?
At the moment, AnyOf reductions are also formed for code like:
define i32 @select_i32_from_icmp_same_inputs(i32 %a, i32 %b, i64 %n) {
entry:
br label %for.body
for.body: ; preds = %entry, %for.body
%0 = phi i64 [ 0, %entry ], [ %4, %for.body ]
%1 = phi i32 [ %a, %entry ], [ %3, %for.body ]
%2 = icmp eq i32 %1, 3
%3 = select i1 %2, i32 %1, i32 %b
%4 = add nuw nsw i64 %0, 1
%5 = icmp eq i64 %4, %n
br i1 %5, label %exit, label %for.body
exit: ; preds = %for.body
ret i32 %3
}
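For readers following along, here is a minimal scalar sketch in C++ of why checking the start value is plausible for this particular function. The names selectLoop and selectOnStart are hypothetical and not part of the patch; the sketch models only this one example, not the vectorizer's general transformation.

```cpp
#include <cassert>
#include <cstdint>

// Scalar model of @select_i32_from_icmp_same_inputs: the compare reads the
// reduction phi itself. The IR loop body runs at least once, so n >= 1.
int32_t selectLoop(int32_t a, int32_t b, uint64_t n) {
  assert(n >= 1 && "the IR loop always executes its body at least once");
  int32_t red = a;                 // %1 = phi [ %a, %entry ], [ %3, %for.body ]
  for (uint64_t i = 0; i < n; ++i)
    red = (red == 3) ? red : b;    // %3 = select (icmp eq %1, 3), %1, %b
  return red;
}

// Assumption being illustrated: the same result is obtained when the compare
// checks the start value %a instead of the phi.
int32_t selectOnStart(int32_t a, int32_t b) {
  return (a == 3) ? a : b;
}
```

If a is 3, the phi keeps the value 3 in every iteration; otherwise the first iteration selects b and every later iteration also yields b, so both functions agree for any n >= 1.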
@ayalz The current recognition of AnyOf reductions matches a cmp-select instruction pair and cannot identify cases with only a single select. Supporting those would require expanding isAnyOfPattern or using a conditional reduction to model the AnyOf idiom.
// If the true value of the select is the reduction phi, the new value is
// selected if the negated condition is true in any iteration.
if (Select->getOperand(1) == PhiR)
Cmp = Builder.createNot(Cmp);
As Mel pointed out, this case of "Red = cond ? PhiR : Other" where a single false cond suffices for the result to be Other, could be considered an AllOf reduction starting with true using createAnd() instead of createOr(), resulting in "AND(conds) ? Start : Other".
Negating cond swaps the operands and translates into the "Red = !cond ? Other : PhiR" form of an AnyOf reduction starting with false and using createOr(), where a single true !cond suffices for the result to be Other, resulting in "OR(!conds) ? Other : Start".
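The equivalence of the two forms is De Morgan's law applied across iterations. A small standalone C++ sketch, using the hypothetical helper names allOfForm, anyOfForm, and checkFormsAgree (none of which exist in LLVM), might look like this:

```cpp
#include <cassert>
#include <vector>

// "Red = cond ? PhiR : Other" viewed as an AllOf reduction: start with true,
// combine with AND, and pick Start only if every cond held.
int allOfForm(const std::vector<bool> &conds, int start, int other) {
  bool all = true;
  for (bool c : conds)
    all = all && c;
  return all ? start : other;
}

// "Red = !cond ? Other : PhiR" viewed as an AnyOf reduction: start with false,
// combine the negated conds with OR, and pick Other if any !cond held.
int anyOfForm(const std::vector<bool> &conds, int start, int other) {
  bool any = false;
  for (bool c : conds)
    any = any || !c;
  return any ? other : start;
}

// Checks that both forms select the same value for one input.
void checkFormsAgree(const std::vector<bool> &conds, int start, int other) {
  assert(allOfForm(conds, start, other) == anyOfForm(conds, start, other));
}
```

Since AND(conds) is the negation of OR(!conds), the two selects always pick the same operand; only the accumulator's initial value and combining operation differ.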
Yes, this could be handled like that, but I think then we would need to support both patterns here and also in codegen, so negating seems simpler (and the negation should be removable by instcombine). Left as is for now, but happy to adjust if needed. But then it should probably be modeled as AllOf directly in the reduction descriptor.
ping :)
LGTM
@fhahn I'm not certain (and I haven't been able to repro as I don't have an LTO stage2 build) but do you think this patch could be causing the failure on https://lab.llvm.org/buildbot/#/builders/67/builds/14877 ?
Broke sanitizer bots: https://lab.llvm.org/buildbot/#/builders/74/builds/26697 This reverts commit 95fef1d.
@fhahn reverted because of sanitizer bot failure. There is a stack trace here if it helps: https://lab.llvm.org/buildbot/#/builders/74/builds/26697/steps/9/logs/stdio
@fhahn Hi, this patch has been reverted, but before it we got a failure in our internal testing.
define void @test(i64 %limit, i8 %guard) {
entry:
br label %pre
pre: ; preds = %loop, %entry
%iv0 = phi i8 [ 0, %entry ], [ %select, %loop ]
%cmp = icmp sgt i8 %iv0, %guard
br label %loop
loop: ; preds = %loop, %pre
%iv1 = phi i64 [ %iv1.next, %loop ], [ 0, %pre ]
%iv2 = phi i8 [ %select, %loop ], [ %iv0, %pre ]
%select = select i1 %cmp, i8 %iv2, i8 0
%iv1.next = add nuw nsw i64 %iv1, 1
%check = icmp slt i64 %iv1.next, %limit
br i1 %check, label %loop, label %pre
}
To reproduce, take this patch and run
The crash backtrace:
Thanks for the heads-up. The issue is related to epilogue vectorization; I added additional test coverage in 399ff08.
This reverts the revert commit 589c7ab.
This patch includes a fix for any-of reductions and epilogue vectorization. Extra test coverage for the issue that caused the revert has been added in 399ff08.
--------------------------------
Original commit message:
Update AnyOf reduction code generation to only keep track of the AnyOf property in a boolean vector in the loop, only selecting either the new or start value in the middle block.
The patch incorporates feedback from https://reviews.llvm.org/D153697.
This fixes #62565, as now there aren't multiple uses of the start/new values.
Fixes #62565
PR: #78304
Recommitted with a fix, please let me know if you see any further issues!
Just a heads-up, we have a few targets breaking on our end due to a miscompile caused by this change. We will try to provide a reproducer on Monday.
I believe running this through the loop vectorizer results in a miscompile, although I haven't found exactly where yet. I'm trying to compare the IR after the loop-vectorizer with and without this commit.
a bit more reduced:
it looks like the C++ code getting miscompiled is
toward the end of the IR, we branch based on
and in the bad IR I believe it's
the |
much more reduced IR that exhibits the same issue:
this is roughly
in the good (with this reverted), we see
and in the bad (at ToT) we see
@fhahn can we revert this for now?
@aeubanks thanks for the reproducer! Will take a look/revert tomorrow morning when I am back in the office. Feel free to revert in the meantime if it helps to unblock you.
thanks, reverted in c6e0162
This patch adds an assert to createAndCollectMergePhiForReduction to make sure there is a resume phi when vectorizing the epilogue loop. This is needed to set the resume value from the main vector loop. This assertion guards against the issue that caused the revert of #78304.
Additional test case from #78304.
This reverts the revert commit c6e0162.
This patch includes a fix for any-of reductions and epilogue vectorization. Extra test coverage for the issue that caused the revert has been added in bce3bfc and an assertion has been added in c7209cb.
--------------------------------
Original commit message:
Update AnyOf reduction code generation to only keep track of the AnyOf property in a boolean vector in the loop, only selecting either the new or start value in the middle block.
The patch incorporates feedback from https://reviews.llvm.org/D153697.
This fixes #62565, as now there aren't multiple uses of the start/new values.
Fixes #62565
PR: #78304
Update AnyOf reduction code generation to only keep track of the AnyOf property in a boolean vector in the loop, only selecting either the new or start value in the middle block.
The patch incorporates feedback from https://reviews.llvm.org/D153697.
This fixes #62565, as now there aren't multiple uses of the start/new values.
Fixes #62565
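As a rough illustration of the approach described in the commit message, the following standalone C++ sketch models the loop with a fixed-width boolean "vector" and a single select after the loop. The names anyOfScalar, anyOfVectorModel, and the VF constant are hypothetical; this is a simplified model under those assumptions, not the code the vectorizer emits.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t VF = 4; // hypothetical vectorization factor

// Scalar reference: Red = cond[i] ? NewVal : Red, starting from Start.
int anyOfScalar(const std::vector<bool> &conds, int start, int newVal) {
  int red = start;
  for (bool c : conds)
    red = c ? newVal : red;
  return red;
}

// Model of the updated codegen: the loop only tracks a boolean per lane,
// and Start/NewVal are each used once, after the loop.
int anyOfVectorModel(const std::vector<bool> &conds, int start, int newVal) {
  assert(conds.size() % VF == 0 && "sketch assumes no scalar remainder");
  std::array<bool, VF> anyLanes{}; // boolean vector phi, starts all-false
  for (std::size_t i = 0; i < conds.size(); i += VF)
    for (std::size_t lane = 0; lane < VF; ++lane)
      anyLanes[lane] = anyLanes[lane] || conds[i + lane];
  // "Middle block": OR-reduce the lanes, then perform a single select.
  bool any = false;
  for (bool lane : anyLanes)
    any = any || lane;
  return any ? newVal : start;
}
```

Keeping only booleans inside the loop means the start and new values have one use each in the final select, which matches the commit message's rationale for fixing #62565.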