[VPlan] Don't convert widen recipes to VP intrinsics in EVL transform #127180


Merged: 5 commits into llvm:main on Feb 22, 2025

Conversation

@lukel97 (Contributor) commented on Feb 14, 2025

This is a copy of #126177, since it was automatically and permanently closed because I messed up the source branch on my remote

This patch proposes to avoid converting widening recipes to VP intrinsics during the EVL transform.

IIUC we initially did this to avoid vl toggles on RISC-V. However, we now have the RISCVVLOptimizer pass, which mostly makes this redundant.

Emitting regular IR instead of VP intrinsics allows more generic optimisations, both in the middle end and DAGCombiner, and we generally have better patterns in the RISC-V backend for non-VP nodes. Sticking to regular IR instructions is likely a lot less work than reimplementing all of these optimisations for VP intrinsics.
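
Schematically (adapted from the test diffs in this patch; %a, %b and %evl are placeholder names), a widened add that the EVL transform used to lower to an all-ones-masked VP intrinsic:

      %x = call <vscale x 4 x i32> @llvm.vp.add.nxv4i32(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b, <vscale x 4 x i1> splat (i1 true), i32 %evl)

now stays a plain instruction, with the EVL still applied at the loads/stores and at the reduction merge:

      %x = add <vscale x 4 x i32> %a, %b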

On SPEC CPU 2017 we get noticeably better code generation:

  • Better matching of mixed-width arithmetic (a hypothetical IR sketch of this pattern follows the list):
-       vzext.vf2       v16, v14
-       vzext.vf2       v14, v15
-       vwsubu.vv       v15, v16, v14
-       vsetvli zero, zero, e32, m1, ta, ma
-       vmul.vv v14, v15, v15
+       vwsubu.vv       v16, v14, v15
+       vsetvli zero, zero, e16, mf2, ta, ma
+       vwmul.vv        v14, v16, v16
  • Ability to match saturating arithmetic:
@@ -6896,24 +6825,19 @@
        sub     s5, t4, t6
        add     s1, a2, t6
        vsetvli s5, s5, e32, m2, ta, ma
-       vle8.v  v8, (s1)
+       vle8.v  v10, (s1)
        add     s1, t5, t6
        sub     s0, s0, t1
-       vmv.v.i v12, 0
-       vzext.vf4       v14, v8
-       vmadd.vx        v14, s2, v10
-       vsra.vx v8, v14, s3
-       vadd.vx v14, v8, s4
-       vmsgt.vi        v0, v14, 0
-       vmsltu.vx       v8, v14, t0
-       vmerge.vim      v12, v12, -1, v0
-       vmv1r.v v0, v8
-       vmerge.vvm      v8, v12, v14, v0
-       vsetvli zero, zero, e16, m1, ta, ma
-       vnsrl.wi        v12, v8, 0
+       vzext.vf4       v12, v10
+       vmadd.vx        v12, s2, v8
+       vsra.vx v10, v12, s3
+       vadd.vx v10, v10, s4
+       vmax.vx v10, v10, zero
+       vsetvli zero, zero, e16, m1, ta, ma
+       vnclipu.wi      v12, v10, 0
  • Strength reduction on div:
        vluxei64.v      v10, (a6), v12
        sub     a5, a5, a4
        vmadd.vv        v14, v10, v9
-       vdiv.vx v10, v14, s1
+       vmulh.vx        v10, v14, s0
+       vsra.vi v10, v10, 12
+       vsrl.vi v11, v10, 31
+       vadd.vv v10, v10, v11
  • Better mask code gen:
        vand.vx v9, v10, s4
        vmsne.vi        v0, v9, 0
-       vmv.v.i v9, 0
-       vmerge.vim      v9, v9, 1, v0
-       vsetvli zero, zero, e32, m1, tu, ma
-       vadd.vv v8, v8, v9
+       sub     a4, a4, a3
+       vsetvli zero, zero, e32, m1, tu, mu
+       vadd.vi v8, v8, 1, v0.t
  • Fewer constraints on vl, which allows RISCVVLOptimizer to remove some vsetvlis:
-       vsetvli s1, a4, e32, m1, ta, ma
+       vsetvli a4, a4, e32, m1, ta, ma
        vle64.v v10, (a5)
        sub     a2, a2, s8
        vluxei64.v      v9, (zero), v10
        vsetvli a5, zero, e64, m2, ta, ma
-       vmv.v.x v10, s1
+       vmv.v.x v10, a4
        vmsleu.vv       v12, v10, v14
-       vmsltu.vx       v10, v14, s1
+       vmsltu.vx       v10, v14, a4
        vmand.mm        v11, v8, v12
-       vsetvli zero, a4, e32, m1, ta, ma
+       vsetvli zero, zero, e32, m1, ta, ma
        vand.vx v9, v9, s6
-       vsetvli a4, zero, e32, m1, ta, ma
  • Narrowing of indexed load indices:
@@ -3049,18 +3049,17 @@
                                         # =>This Inner Loop Header: Depth=1
        sub     a2, a7, a1
        add     a4, t1, a1
-       vsetvli a2, a2, e64, m4, ta, ma
+       vsetvli a2, a2, e8, mf2, ta, ma
        vle8.v  v8, (a4)
-       vzext.vf8       v12, v8
-       vadd.vv v8, v12, v12
+       vwaddu.vv       v9, v8, v8
        vsetvli zero, zero, e16, m1, ta, ma
-       vluxei64.v      v12, (a3), v8
+       vluxei16.v      v8, (a3), v9
        sh2add  a4, a1, a0
        sub     a5, a5, t2
        addi    a4, a4, 128
        vsetvli zero, zero, e32, m2, ta, ma
-       vzext.vf2       v8, v12
-       vse32.v v8, (a4)
+       vzext.vf2       v10, v8
+       vse32.v v10, (a4)
        add     a1, a1, a2
        bnez    a5, .LBB16_1
  • Better demanded bits analysis(?):
@@ -9556,11 +9556,7 @@
        sh1add  s1, a1, a2
        sub     a5, a5, a3
        vfncvt.rtz.x.f.w        v10, v8
-       vwmulsu.vx      v8, v10, a6
-       vnsrl.wi        v10, v8, 0
-       vsrl.vi v8, v10, 8
-       vsll.vi v9, v10, 8
-       vor.vv  v8, v9, v8
+       vand.vx v8, v10, a6
        vse16.v v8, (s1)
        add     a1, a1, a0
        bnez    a5, .LBB17_83
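
As a rough illustration of the mixed-width bullet above, the scalar pattern being vectorized looks something like this (hypothetical IR, not copied from the benchmark sources):

      %a.w  = zext i8 %a to i32
      %b.w  = zext i8 %b to i32
      %diff = sub i32 %a.w, %b.w          ; folds to vwsubu.vv (e8 -> e16)
      %sq   = mul i32 %diff, %diff        ; folds to vwmul.vv (e16 -> e32)

With plain IR the backend can narrow the intermediates and fold the extends into the widening instructions; the VP forms of these operations generally lack equivalent combines.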

I've removed the VPWidenEVLRecipe in this patch since it's no longer used, but if people would prefer to keep it for other targets (PPC?) then I'd be happy to add it back in and gate it under a target hook.

@llvmbot (Member) commented on Feb 14, 2025

@llvm/pr-subscribers-vectorizers

@llvm/pr-subscribers-llvm-transforms

Author: Luke Lau (lukel97)


Patch is 131.42 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/127180.diff

24 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/VPlan.h (+1-52)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp (+2-3)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp (-48)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp (-30)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanValue.h (-1)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanVerifier.cpp (-4)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/inloop-reduction.ll (+3-3)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/truncate-to-minimal-bitwidth-evl-crash.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/type-info-cache-evl-crash.ll (+7-7)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-bin-unary-ops-args.ll (+18-18)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-call-intrinsics.ll (+45-23)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-cast-intrinsics.ll (+10-10)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-cond-reduction.ll (+2-2)
  • (added) llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-div.ll (+407)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-intermediate-store.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-known-no-overflow.ll (+3-3)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-masked-loadstore.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-reduction.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/vectorize-vp-intrinsics.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/vplan-vp-call-intrinsics.ll (+22-22)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/vplan-vp-cast-intrinsics.ll (+20-20)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/vplan-vp-intrinsics-reduction.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/vplan-vp-intrinsics.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/vplan-vp-select-intrinsics.ll (+2-2)
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index fac207287e0bc..064e916d6d817 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -523,7 +523,6 @@ class VPSingleDefRecipe : public VPRecipeBase, public VPValue {
     case VPRecipeBase::VPWidenGEPSC:
     case VPRecipeBase::VPWidenIntrinsicSC:
     case VPRecipeBase::VPWidenSC:
-    case VPRecipeBase::VPWidenEVLSC:
     case VPRecipeBase::VPWidenSelectSC:
     case VPRecipeBase::VPBlendSC:
     case VPRecipeBase::VPPredInstPHISC:
@@ -710,7 +709,6 @@ class VPRecipeWithIRFlags : public VPSingleDefRecipe {
   static inline bool classof(const VPRecipeBase *R) {
     return R->getVPDefID() == VPRecipeBase::VPInstructionSC ||
            R->getVPDefID() == VPRecipeBase::VPWidenSC ||
-           R->getVPDefID() == VPRecipeBase::VPWidenEVLSC ||
            R->getVPDefID() == VPRecipeBase::VPWidenGEPSC ||
            R->getVPDefID() == VPRecipeBase::VPWidenCastSC ||
            R->getVPDefID() == VPRecipeBase::VPWidenIntrinsicSC ||
@@ -1116,8 +1114,7 @@ class VPWidenRecipe : public VPRecipeWithIRFlags {
   }
 
   static inline bool classof(const VPRecipeBase *R) {
-    return R->getVPDefID() == VPRecipeBase::VPWidenSC ||
-           R->getVPDefID() == VPRecipeBase::VPWidenEVLSC;
+    return R->getVPDefID() == VPRecipeBase::VPWidenSC;
   }
 
   static inline bool classof(const VPUser *U) {
@@ -1142,54 +1139,6 @@ class VPWidenRecipe : public VPRecipeWithIRFlags {
 #endif
 };
 
-/// A recipe for widening operations with vector-predication intrinsics with
-/// explicit vector length (EVL).
-class VPWidenEVLRecipe : public VPWidenRecipe {
-  using VPRecipeWithIRFlags::transferFlags;
-
-public:
-  template <typename IterT>
-  VPWidenEVLRecipe(Instruction &I, iterator_range<IterT> Operands, VPValue &EVL)
-      : VPWidenRecipe(VPDef::VPWidenEVLSC, I, Operands) {
-    addOperand(&EVL);
-  }
-  VPWidenEVLRecipe(VPWidenRecipe &W, VPValue &EVL)
-      : VPWidenEVLRecipe(*W.getUnderlyingInstr(), W.operands(), EVL) {
-    transferFlags(W);
-  }
-
-  ~VPWidenEVLRecipe() override = default;
-
-  VPWidenRecipe *clone() override final {
-    llvm_unreachable("VPWidenEVLRecipe cannot be cloned");
-    return nullptr;
-  }
-
-  VP_CLASSOF_IMPL(VPDef::VPWidenEVLSC);
-
-  VPValue *getEVL() { return getOperand(getNumOperands() - 1); }
-  const VPValue *getEVL() const { return getOperand(getNumOperands() - 1); }
-
-  /// Produce a vp-intrinsic using the opcode and operands of the recipe,
-  /// processing EVL elements.
-  void execute(VPTransformState &State) override final;
-
-  /// Returns true if the recipe only uses the first lane of operand \p Op.
-  bool onlyFirstLaneUsed(const VPValue *Op) const override {
-    assert(is_contained(operands(), Op) &&
-           "Op must be an operand of the recipe");
-    // EVL in that recipe is always the last operand, thus any use before means
-    // the VPValue should be vectorized.
-    return getEVL() == Op;
-  }
-
-#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
-  /// Print the recipe.
-  void print(raw_ostream &O, const Twine &Indent,
-             VPSlotTracker &SlotTracker) const override final;
-#endif
-};
-
 /// VPWidenCastRecipe is a recipe to create vector cast instructions.
 class VPWidenCastRecipe : public VPRecipeWithIRFlags {
   /// Cast instruction opcode.
diff --git a/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp b/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
index 71fb6d42116cf..dcd8eeecdc15b 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
@@ -245,9 +245,8 @@ Type *VPTypeAnalysis::inferScalarType(const VPValue *V) {
                 VPPartialReductionRecipe>([this](const VPRecipeBase *R) {
             return inferScalarType(R->getOperand(0));
           })
-          .Case<VPBlendRecipe, VPInstruction, VPWidenRecipe, VPWidenEVLRecipe,
-                VPReplicateRecipe, VPWidenCallRecipe, VPWidenMemoryRecipe,
-                VPWidenSelectRecipe>(
+          .Case<VPBlendRecipe, VPInstruction, VPWidenRecipe, VPReplicateRecipe,
+                VPWidenCallRecipe, VPWidenMemoryRecipe, VPWidenSelectRecipe>(
               [this](const auto *R) { return inferScalarTypeForRecipe(R); })
           .Case<VPWidenIntrinsicRecipe>([](const VPWidenIntrinsicRecipe *R) {
             return R->getResultType();
diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index c84a93d7398f7..83c764b87953d 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -85,7 +85,6 @@ bool VPRecipeBase::mayWriteToMemory() const {
   case VPWidenLoadSC:
   case VPWidenPHISC:
   case VPWidenSC:
-  case VPWidenEVLSC:
   case VPWidenSelectSC: {
     const Instruction *I =
         dyn_cast_or_null<Instruction>(getVPSingleValue()->getUnderlyingValue());
@@ -131,7 +130,6 @@ bool VPRecipeBase::mayReadFromMemory() const {
   case VPWidenIntOrFpInductionSC:
   case VPWidenPHISC:
   case VPWidenSC:
-  case VPWidenEVLSC:
   case VPWidenSelectSC: {
     const Instruction *I =
         dyn_cast_or_null<Instruction>(getVPSingleValue()->getUnderlyingValue());
@@ -172,7 +170,6 @@ bool VPRecipeBase::mayHaveSideEffects() const {
   case VPWidenPHISC:
   case VPWidenPointerInductionSC:
   case VPWidenSC:
-  case VPWidenEVLSC:
   case VPWidenSelectSC: {
     const Instruction *I =
         dyn_cast_or_null<Instruction>(getVPSingleValue()->getUnderlyingValue());
@@ -1550,42 +1547,6 @@ InstructionCost VPWidenRecipe::computeCost(ElementCount VF,
   }
 }
 
-void VPWidenEVLRecipe::execute(VPTransformState &State) {
-  unsigned Opcode = getOpcode();
-  // TODO: Support other opcodes
-  if (!Instruction::isBinaryOp(Opcode) && !Instruction::isUnaryOp(Opcode))
-    llvm_unreachable("Unsupported opcode in VPWidenEVLRecipe::execute");
-
-  State.setDebugLocFrom(getDebugLoc());
-
-  assert(State.get(getOperand(0))->getType()->isVectorTy() &&
-         "VPWidenEVLRecipe should not be used for scalars");
-
-  VPValue *EVL = getEVL();
-  Value *EVLArg = State.get(EVL, /*NeedsScalar=*/true);
-  IRBuilderBase &BuilderIR = State.Builder;
-  VectorBuilder Builder(BuilderIR);
-  Value *Mask = BuilderIR.CreateVectorSplat(State.VF, BuilderIR.getTrue());
-
-  SmallVector<Value *, 4> Ops;
-  for (unsigned I = 0, E = getNumOperands() - 1; I < E; ++I) {
-    VPValue *VPOp = getOperand(I);
-    Ops.push_back(State.get(VPOp));
-  }
-
-  Builder.setMask(Mask).setEVL(EVLArg);
-  Value *VPInst =
-      Builder.createVectorInstruction(Opcode, Ops[0]->getType(), Ops, "vp.op");
-  // Currently vp-intrinsics only accept FMF flags.
-  // TODO: Enable other flags when support is added.
-  if (isa<FPMathOperator>(VPInst))
-    setFlags(cast<Instruction>(VPInst));
-
-  State.set(this, VPInst);
-  State.addMetadata(VPInst,
-                    dyn_cast_or_null<Instruction>(getUnderlyingValue()));
-}
-
 #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
 void VPWidenRecipe::print(raw_ostream &O, const Twine &Indent,
                           VPSlotTracker &SlotTracker) const {
@@ -1595,15 +1556,6 @@ void VPWidenRecipe::print(raw_ostream &O, const Twine &Indent,
   printFlags(O);
   printOperands(O, SlotTracker);
 }
-
-void VPWidenEVLRecipe::print(raw_ostream &O, const Twine &Indent,
-                             VPSlotTracker &SlotTracker) const {
-  O << Indent << "WIDEN ";
-  printAsOperand(O, SlotTracker);
-  O << " = vp." << Instruction::getOpcodeName(getOpcode());
-  printFlags(O);
-  printOperands(O, SlotTracker);
-}
 #endif
 
 void VPWidenCastRecipe::execute(VPTransformState &State) {
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index 7e9ef46133936..18323a078f1ee 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -1665,40 +1665,10 @@ static VPRecipeBase *createEVLRecipe(VPValue *HeaderMask,
         VPValue *NewMask = GetNewMask(S->getMask());
         return new VPWidenStoreEVLRecipe(*S, EVL, NewMask);
       })
-      .Case<VPWidenRecipe>([&](VPWidenRecipe *W) -> VPRecipeBase * {
-        unsigned Opcode = W->getOpcode();
-        if (!Instruction::isBinaryOp(Opcode) && !Instruction::isUnaryOp(Opcode))
-          return nullptr;
-        return new VPWidenEVLRecipe(*W, EVL);
-      })
       .Case<VPReductionRecipe>([&](VPReductionRecipe *Red) {
         VPValue *NewMask = GetNewMask(Red->getCondOp());
         return new VPReductionEVLRecipe(*Red, EVL, NewMask);
       })
-      .Case<VPWidenIntrinsicRecipe, VPWidenCastRecipe>(
-          [&](auto *CR) -> VPRecipeBase * {
-            Intrinsic::ID VPID;
-            if (auto *CallR = dyn_cast<VPWidenIntrinsicRecipe>(CR)) {
-              VPID =
-                  VPIntrinsic::getForIntrinsic(CallR->getVectorIntrinsicID());
-            } else {
-              auto *CastR = cast<VPWidenCastRecipe>(CR);
-              VPID = VPIntrinsic::getForOpcode(CastR->getOpcode());
-            }
-
-            // Not all intrinsics have a corresponding VP intrinsic.
-            if (VPID == Intrinsic::not_intrinsic)
-              return nullptr;
-            assert(VPIntrinsic::getMaskParamPos(VPID) &&
-                   VPIntrinsic::getVectorLengthParamPos(VPID) &&
-                   "Expected VP intrinsic to have mask and EVL");
-
-            SmallVector<VPValue *> Ops(CR->operands());
-            Ops.push_back(&AllOneMask);
-            Ops.push_back(&EVL);
-            return new VPWidenIntrinsicRecipe(
-                VPID, Ops, TypeInfo.inferScalarType(CR), CR->getDebugLoc());
-          })
       .Case<VPWidenSelectRecipe>([&](VPWidenSelectRecipe *Sel) {
         SmallVector<VPValue *> Ops(Sel->operands());
         Ops.push_back(&EVL);
diff --git a/llvm/lib/Transforms/Vectorize/VPlanValue.h b/llvm/lib/Transforms/Vectorize/VPlanValue.h
index aabc4ab571e7a..a058b2a121d59 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanValue.h
+++ b/llvm/lib/Transforms/Vectorize/VPlanValue.h
@@ -351,7 +351,6 @@ class VPDef {
     VPWidenStoreEVLSC,
     VPWidenStoreSC,
     VPWidenSC,
-    VPWidenEVLSC,
     VPWidenSelectSC,
     VPBlendSC,
     VPHistogramSC,
diff --git a/llvm/lib/Transforms/Vectorize/VPlanVerifier.cpp b/llvm/lib/Transforms/Vectorize/VPlanVerifier.cpp
index 96156de444f88..a4b309d6dcd9f 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanVerifier.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanVerifier.cpp
@@ -145,10 +145,6 @@ bool VPlanVerifier::verifyEVLRecipe(const VPInstruction &EVL) const {
             [&](const VPRecipeBase *S) { return VerifyEVLUse(*S, 2); })
         .Case<VPWidenLoadEVLRecipe, VPReverseVectorPointerRecipe>(
             [&](const VPRecipeBase *R) { return VerifyEVLUse(*R, 1); })
-        .Case<VPWidenEVLRecipe>([&](const VPWidenEVLRecipe *W) {
-          return VerifyEVLUse(*W,
-                              Instruction::isUnaryOp(W->getOpcode()) ? 1 : 2);
-        })
         .Case<VPScalarCastRecipe>(
             [&](const VPScalarCastRecipe *S) { return VerifyEVLUse(*S, 0); })
         .Case<VPInstruction>([&](const VPInstruction *I) {
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/inloop-reduction.ll b/llvm/test/Transforms/LoopVectorize/RISCV/inloop-reduction.ll
index 14818199072c2..b96a44a546a14 100644
--- a/llvm/test/Transforms/LoopVectorize/RISCV/inloop-reduction.ll
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/inloop-reduction.ll
@@ -143,8 +143,8 @@ define i32 @add_i16_i32(ptr nocapture readonly %x, i32 %n) {
 ; IF-EVL-OUTLOOP-NEXT:    [[TMP7:%.*]] = getelementptr inbounds i16, ptr [[X:%.*]], i32 [[TMP6]]
 ; IF-EVL-OUTLOOP-NEXT:    [[TMP8:%.*]] = getelementptr inbounds i16, ptr [[TMP7]], i32 0
 ; IF-EVL-OUTLOOP-NEXT:    [[VP_OP_LOAD:%.*]] = call <vscale x 4 x i16> @llvm.vp.load.nxv4i16.p0(ptr align 2 [[TMP8]], <vscale x 4 x i1> splat (i1 true), i32 [[TMP5]])
-; IF-EVL-OUTLOOP-NEXT:    [[TMP9:%.*]] = call <vscale x 4 x i32> @llvm.vp.sext.nxv4i32.nxv4i16(<vscale x 4 x i16> [[VP_OP_LOAD]], <vscale x 4 x i1> splat (i1 true), i32 [[TMP5]])
-; IF-EVL-OUTLOOP-NEXT:    [[VP_OP:%.*]] = call <vscale x 4 x i32> @llvm.vp.add.nxv4i32(<vscale x 4 x i32> [[VEC_PHI]], <vscale x 4 x i32> [[TMP9]], <vscale x 4 x i1> splat (i1 true), i32 [[TMP5]])
+; IF-EVL-OUTLOOP-NEXT:    [[TMP9:%.*]] = sext <vscale x 4 x i16> [[VP_OP_LOAD]] to <vscale x 4 x i32>
+; IF-EVL-OUTLOOP-NEXT:    [[VP_OP:%.*]] = add <vscale x 4 x i32> [[VEC_PHI]], [[TMP9]]
 ; IF-EVL-OUTLOOP-NEXT:    [[TMP10]] = call <vscale x 4 x i32> @llvm.vp.merge.nxv4i32(<vscale x 4 x i1> splat (i1 true), <vscale x 4 x i32> [[VP_OP]], <vscale x 4 x i32> [[VEC_PHI]], i32 [[TMP5]])
 ; IF-EVL-OUTLOOP-NEXT:    [[INDEX_EVL_NEXT]] = add nuw i32 [[TMP5]], [[EVL_BASED_IV]]
 ; IF-EVL-OUTLOOP-NEXT:    [[INDEX_NEXT]] = add nuw i32 [[INDEX]], [[TMP4]]
@@ -200,7 +200,7 @@ define i32 @add_i16_i32(ptr nocapture readonly %x, i32 %n) {
 ; IF-EVL-INLOOP-NEXT:    [[TMP8:%.*]] = getelementptr inbounds i16, ptr [[X:%.*]], i32 [[TMP7]]
 ; IF-EVL-INLOOP-NEXT:    [[TMP9:%.*]] = getelementptr inbounds i16, ptr [[TMP8]], i32 0
 ; IF-EVL-INLOOP-NEXT:    [[VP_OP_LOAD:%.*]] = call <vscale x 8 x i16> @llvm.vp.load.nxv8i16.p0(ptr align 2 [[TMP9]], <vscale x 8 x i1> splat (i1 true), i32 [[TMP6]])
-; IF-EVL-INLOOP-NEXT:    [[TMP14:%.*]] = call <vscale x 8 x i32> @llvm.vp.sext.nxv8i32.nxv8i16(<vscale x 8 x i16> [[VP_OP_LOAD]], <vscale x 8 x i1> splat (i1 true), i32 [[TMP6]])
+; IF-EVL-INLOOP-NEXT:    [[TMP14:%.*]] = sext <vscale x 8 x i16> [[VP_OP_LOAD]] to <vscale x 8 x i32>
 ; IF-EVL-INLOOP-NEXT:    [[TMP10:%.*]] = call i32 @llvm.vp.reduce.add.nxv8i32(i32 0, <vscale x 8 x i32> [[TMP14]], <vscale x 8 x i1> splat (i1 true), i32 [[TMP6]])
 ; IF-EVL-INLOOP-NEXT:    [[TMP11]] = add i32 [[TMP10]], [[VEC_PHI]]
 ; IF-EVL-INLOOP-NEXT:    [[INDEX_EVL_NEXT]] = add nuw i32 [[TMP6]], [[EVL_BASED_IV]]
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/truncate-to-minimal-bitwidth-evl-crash.ll b/llvm/test/Transforms/LoopVectorize/RISCV/truncate-to-minimal-bitwidth-evl-crash.ll
index 68b36f23de4b0..ba7158eb02d90 100644
--- a/llvm/test/Transforms/LoopVectorize/RISCV/truncate-to-minimal-bitwidth-evl-crash.ll
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/truncate-to-minimal-bitwidth-evl-crash.ll
@@ -27,10 +27,10 @@ define void @truncate_to_minimal_bitwidths_widen_cast_recipe(ptr %src) {
 ; CHECK-NEXT:    [[TMP5:%.*]] = getelementptr i8, ptr [[SRC]], i64 [[TMP4]]
 ; CHECK-NEXT:    [[TMP6:%.*]] = getelementptr i8, ptr [[TMP5]], i32 0
 ; CHECK-NEXT:    [[VP_OP_LOAD:%.*]] = call <vscale x 1 x i8> @llvm.vp.load.nxv1i8.p0(ptr align 1 [[TMP6]], <vscale x 1 x i1> splat (i1 true), i32 [[TMP3]])
-; CHECK-NEXT:    [[TMP7:%.*]] = call <vscale x 1 x i16> @llvm.vp.zext.nxv1i16.nxv1i8(<vscale x 1 x i8> [[VP_OP_LOAD]], <vscale x 1 x i1> splat (i1 true), i32 [[TMP3]])
-; CHECK-NEXT:    [[VP_OP:%.*]] = call <vscale x 1 x i16> @llvm.vp.mul.nxv1i16(<vscale x 1 x i16> zeroinitializer, <vscale x 1 x i16> [[TMP7]], <vscale x 1 x i1> splat (i1 true), i32 [[TMP3]])
-; CHECK-NEXT:    [[VP_OP1:%.*]] = call <vscale x 1 x i16> @llvm.vp.lshr.nxv1i16(<vscale x 1 x i16> [[VP_OP]], <vscale x 1 x i16> trunc (<vscale x 1 x i32> splat (i32 1) to <vscale x 1 x i16>), <vscale x 1 x i1> splat (i1 true), i32 [[TMP3]])
-; CHECK-NEXT:    [[TMP8:%.*]] = call <vscale x 1 x i8> @llvm.vp.trunc.nxv1i8.nxv1i16(<vscale x 1 x i16> [[VP_OP1]], <vscale x 1 x i1> splat (i1 true), i32 [[TMP3]])
+; CHECK-NEXT:    [[TMP7:%.*]] = zext <vscale x 1 x i8> [[VP_OP_LOAD]] to <vscale x 1 x i16>
+; CHECK-NEXT:    [[TMP12:%.*]] = mul <vscale x 1 x i16> zeroinitializer, [[TMP7]]
+; CHECK-NEXT:    [[VP_OP1:%.*]] = lshr <vscale x 1 x i16> [[TMP12]], trunc (<vscale x 1 x i32> splat (i32 1) to <vscale x 1 x i16>)
+; CHECK-NEXT:    [[TMP8:%.*]] = trunc <vscale x 1 x i16> [[VP_OP1]] to <vscale x 1 x i8>
 ; CHECK-NEXT:    call void @llvm.vp.scatter.nxv1i8.nxv1p0(<vscale x 1 x i8> [[TMP8]], <vscale x 1 x ptr> align 1 zeroinitializer, <vscale x 1 x i1> splat (i1 true), i32 [[TMP3]])
 ; CHECK-NEXT:    [[TMP9:%.*]] = zext i32 [[TMP3]] to i64
 ; CHECK-NEXT:    [[INDEX_EVL_NEXT]] = add nuw i64 [[TMP9]], [[EVL_BASED_IV]]
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/type-info-cache-evl-crash.ll b/llvm/test/Transforms/LoopVectorize/RISCV/type-info-cache-evl-crash.ll
index 48b73c7f1a4de..03e4b661a941b 100644
--- a/llvm/test/Transforms/LoopVectorize/RISCV/type-info-cache-evl-crash.ll
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/type-info-cache-evl-crash.ll
@@ -44,15 +44,15 @@ define void @type_info_cache_clobber(ptr %dstv, ptr %src, i64 %wide.trip.count)
 ; CHECK-NEXT:    [[TMP13:%.*]] = getelementptr i8, ptr [[SRC]], i64 [[TMP12]]
 ; CHECK-NEXT:    [[TMP14:%.*]] = getelementptr i8, ptr [[TMP13]], i32 0
 ; CHECK-NEXT:    [[VP_OP_LOAD:%.*]] = call <vscale x 8 x i8> @llvm.vp.load.nxv8i8.p0(ptr align 1 [[TMP14]], <vscale x 8 x i1> splat (i1 true), i32 [[TMP11]]), !alias.scope [[META0:![0-9]+]]
-; CHECK-NEXT:    [[TMP15:%.*]] = call <vscale x 8 x i32> @llvm.vp.zext.nxv8i32.nxv8i8(<vscale x 8 x i8> [[VP_OP_LOAD]], <vscale x 8 x i1> splat (i1 true), i32 [[TMP11]])
-; CHECK-NEXT:    [[VP_OP:%.*]] = call <vscale x 8 x i32> @llvm.vp.mul.nxv8i32(<vscale x 8 x i32> [[TMP15]], <vscale x 8 x i32> zeroinitializer, <vscale x 8 x i1> splat (i1 true), i32 [[TMP11]])
-; CHECK-NEXT:    [[VP_OP2:%.*]] = call <vscale x 8 x i32> @llvm.vp.ashr.nxv8i32(<vscale x 8 x i32> [[TMP15]], <vscale x 8 x i32> zeroinitializer, <vscale x 8 x i1> splat (i1 true), i32 [[TMP11]])
-; CHECK-NEXT:    [[VP_OP3:%.*]] = call <vscale x 8 x i32> @llvm.vp.or.nxv8i32(<vscale x 8 x i32> [[VP_OP2]], <vscale x 8 x i32> zeroinitializer, <vscale x 8 x i1> splat (i1 true), i32 [[TMP11]])
+; CHECK-NEXT:    [[TMP15:%.*]] = zext <vscale x 8 x i8> [[VP_OP_LOAD]] to <vscale x 8 x i32>
+; CHECK-NEXT:    [[VP_OP:%.*]] = mul <vscale x 8 x i32> [[TMP15]], zeroinitializer
+; CHECK-NEXT:    [[TMP23:%.*]] = ashr <vscale x 8 x i32> [[TMP15]], zeroinitializer
+; CHECK-NEXT:    [[VP_OP3:%.*]] = or <vscale x 8 x i32> [[TMP23]], zeroinitializer
 ; CHECK-NEXT:    [[TMP16:%.*]] = icmp ult <vscale x 8 x i32> [[TMP15]], zeroinitializer
 ; CHECK-NEXT:    [[TMP17:%.*]] = call <vscale x 8 x i32> @llvm.vp.select.nxv8i32(<vscale x 8 x i1> [[TMP16]], <vscale x 8 x i32> [[VP_OP3]], <vscale x 8 x i32> zeroinitializer, i32 [[TMP11]])
-; CHECK-NEXT:    [[TMP18:%.*]] = call <vscale x 8 x i8> @llvm.vp.trunc.nxv8i8.nxv8i32(<vscale x 8 x i32> [[TMP17]], <vscale x 8 x i1> splat (i1 true), i32 [[TMP11]])
-; CHECK-NEXT:    call void @llvm.vp.scatter.nxv8i8.nxv8p0(<vscale x 8 x i8> [[TMP18]], <vscale x 8 x ptr> align 1 [[BROADCAST_SPLAT]], <vscale x 8 x i1> splat (i1 true), i32 [[TMP11]]), !alias.scope [[META3:![0-9]+]], !noalias [[META0]]
-; CHECK-NEXT:    [[TMP19:%.*]] = call <vscale x 8 x i16> @llvm.vp.trunc.nxv8i16.nxv8i32(<vscale x 8 x i32> [[VP_OP]], <vscale x 8 x i1> splat (i1 true), i32 [[TMP11]])
+; CHECK-NEXT:    [[TMP24:%.*]] = trunc <vscale x 8 x i32> [[TMP17]] to <vscale x 8 x i8>
+; CHECK-NEXT:    call void @llvm.vp.scatter.nxv8i8.nxv8p0(<vscale x 8 x i8> [[TMP24]], <vscale x 8 x ptr> align 1 [[BROADCAST_SPLAT]], <vscale x 8 x i1> splat (i1 true), i32 [[TMP11]]), !alias.scope [[META3:![0-9]+]], !noalias [[META0]]
+; CHECK-NEXT:    [[TMP19:%.*]] = trunc <vscale x 8 x i32> [[VP_OP]] to <vscale x 8 x i16>
 ; CHECK-NEXT:    call void @llvm.vp.scatter.nxv8i16.nxv8p0(<vscale x 8 x i16> [[TMP19]], <vscale x 8 x ptr> align 2 zeroinitializer, <vscale x 8 x i1> splat (i1 true), i32 [[TMP11]])
 ; CHECK-NEXT:    [[TMP20:%.*]] = zext i32 [[TMP11]] to i64
 ; CHECK-NEXT:    [[INDEX_EVL_NEXT]] = add i64 [[TMP20]], [[EVL_BASED_IV]]
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-bin-unary-ops-args.ll b/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-bin-unary-ops-args.ll
index df9ca218aad70..2c111ff674eae 100644
--- a/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-bin-unary-ops-args.ll
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-bin-unary-ops-args.ll
@@ -42,7 +42,7 @@ define void @test_and(ptr nocapture %a, ptr nocapture readonly %b) {
 ; IF-EVL-NEXT:    [[TMP13:%.*]] = getelementptr inbounds i8, ptr [[A]], i64 [[TMP12]]
 ; IF-EVL-NEXT:    [[TMP14:%.*]] = getelementptr inbounds i8, ptr [[TMP13]], i32 0
 ; IF-EVL-NEXT:    [[VP_OP_LOAD:%.*]] = call <vs...
[truncated]

@lukel97 (Contributor, Author) commented on Feb 14, 2025

@Mel-Chen re #126177 (review), I've added tests in llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-div.ll
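
For reference, a minimal sketch of the shape of those tests (the RUN-line flags are assumed to be the usual ones for the EVL tests in this directory, and the loop body is illustrative rather than copied from the file):

      ; RUN: opt -passes=loop-vectorize -mtriple=riscv64 -mattr=+v \
      ; RUN:   -prefer-predicate-over-epilogue=predicate-dont-vectorize \
      ; RUN:   -force-tail-folding-style=data-with-evl -S %s | FileCheck %s

      define void @test_sdiv(ptr noalias %a, ptr noalias %b, ptr noalias %c) {
      entry:
        br label %loop

      loop:
        %iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
        %gep.a = getelementptr i64, ptr %a, i64 %iv
        %x = load i64, ptr %gep.a
        %gep.b = getelementptr i64, ptr %b, i64 %iv
        %y = load i64, ptr %gep.b
        ; the point of the test: after this patch the division should be
        ; emitted as a plain sdiv (guarded by a safe-divisor select under
        ; tail folding) rather than an @llvm.vp.sdiv call
        %q = sdiv i64 %x, %y
        %gep.c = getelementptr i64, ptr %c, i64 %iv
        store i64 %q, ptr %gep.c
        %iv.next = add i64 %iv, 1
        %done = icmp eq i64 %iv.next, 1024
        br i1 %done, label %exit, label %loop

      exit:
        ret void
      }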

@Mel-Chen (Contributor) left a comment:

Then I don't have any other concerns.
LGTM, but please wait a couple of days for other reviewers.

@alexey-bataev (Member) commented:

Do we have the tests anywhere that VLOptimizer does what's required with instructions in EVL-vectorized loops?

@lukel97 (Contributor, Author) commented on Feb 17, 2025

Do we have the tests anywhere that VLOptimizer does what's required with instructions in EVL-vectorized loops?

I don't think so, but they would be good to have. I'll add some.

@wangpc-pp (Contributor) left a comment:

LGTM.

@LiqinWeng (Contributor) left a comment:

LGTM

@lukel97 force-pushed the loop-vectorize/no-vp-widen branch from a37125d to f9ce240 on February 18, 2025 14:07
@fhahn (Contributor) left a comment:

Very happy to see this cleanup!

Code changes look good, as long as everyone involved with EVL vectorization is happy with the direction.

@lukel97 (Contributor, Author) commented on Feb 20, 2025

@alexey-bataev I've added a test in 77ab274

I've also uploaded a gist of the changes with this PR on SPEC CPU 2017 if people are curious to see the difference: https://gist.github.com/lukel97/9bc5023226504f604cb84342ab152f39

@alexey-bataev (Member) left a comment:

LG

@lukel97 merged commit e23ab73 into llvm:main on Feb 22, 2025. 11 checks passed.