Skip to content

[RISCV][GISEL] Legalize G_INSERT_SUBVECTOR #108859

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Oct 21, 2024

Conversation

michaelmaitland
Copy link
Contributor

This code is heavily based on the SelectionDAG lowerINSERT_SUBVECTOR code.

@llvmbot
Copy link
Member

llvmbot commented Sep 16, 2024

@llvm/pr-subscribers-llvm-globalisel

@llvm/pr-subscribers-backend-risc-v

Author: Michael Maitland (michaelmaitland)

Changes

This code is heavily based on the SelectionDAG lowerINSERT_SUBVECTOR code.


Patch is 33.04 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/108859.diff

4 Files Affected:

  • (modified) llvm/lib/Target/RISCV/GISel/RISCVLegalizerInfo.cpp (+156)
  • (modified) llvm/lib/Target/RISCV/GISel/RISCVLegalizerInfo.h (+1)
  • (modified) llvm/lib/Target/RISCV/RISCVInstrGISel.td (+18)
  • (added) llvm/test/CodeGen/RISCV/GlobalISel/legalizer/rvv/legalize-insert-subvector.mir (+402)
diff --git a/llvm/lib/Target/RISCV/GISel/RISCVLegalizerInfo.cpp b/llvm/lib/Target/RISCV/GISel/RISCVLegalizerInfo.cpp
index 8de908d20d245c..ff9ebac02b0dc2 100644
--- a/llvm/lib/Target/RISCV/GISel/RISCVLegalizerInfo.cpp
+++ b/llvm/lib/Target/RISCV/GISel/RISCVLegalizerInfo.cpp
@@ -581,6 +581,12 @@ RISCVLegalizerInfo::RISCVLegalizerInfo(const RISCVSubtarget &ST)
 
   SplatActions.clampScalar(1, sXLen, sXLen);
 
+  getActionDefinitionsBuilder(G_INSERT_SUBVECTOR)
+      .customIf(all(typeIsLegalBoolVec(0, BoolVecTys, ST),
+                    typeIsLegalBoolVec(1, BoolVecTys, ST)))
+      .customIf(all(typeIsLegalIntOrFPVec(0, IntOrFPVecTys, ST),
+                    typeIsLegalIntOrFPVec(1, IntOrFPVecTys, ST)));
+
   getLegacyLegalizerInfo().computeTables();
 }
 
@@ -915,6 +921,154 @@ bool RISCVLegalizerInfo::legalizeSplatVector(MachineInstr &MI,
   return true;
 }
 
+static LLT getLMUL1Ty(LLT VecTy) {
+  assert(VecTy.getElementType().getSizeInBits() <= 64 &&
+         "Unexpected vector LLT");
+  return LLT::scalable_vector(RISCV::RVVBitsPerBlock /
+                                  VecTy.getElementType().getSizeInBits(),
+                              VecTy.getElementType());
+}
+
+bool RISCVLegalizerInfo::legalizeInsertSubvector(MachineInstr &MI,
+                                                 MachineIRBuilder &MIB) const {
+  assert(MI.getOpcode() == TargetOpcode::G_INSERT_SUBVECTOR);
+
+  MachineRegisterInfo &MRI = *MIB.getMRI();
+
+  Register Dst = MI.getOperand(0).getReg();
+  Register Src1 = MI.getOperand(1).getReg();
+  Register Src2 = MI.getOperand(2).getReg();
+  uint64_t Idx = MI.getOperand(3).getImm();
+
+  LLT BigTy = MRI.getType(Src1);
+  LLT LitTy = MRI.getType(Src2);
+  Register BigVec = Src1;
+  Register LitVec = Src2;
+
+  // We don't have the ability to slide mask vectors up indexed by their i1
+  // elements; the smallest we can do is i8. Often we are able to bitcast to
+  // equivalent i8 vectors. Otherwise, we can must zeroextend to equivalent i8
+  // vectors and truncate down after the insert.
+  if (LitTy.getElementType() == LLT::scalar(1) &&
+      (Idx != 0 ||
+       MRI.getVRegDef(BigVec)->getOpcode() != TargetOpcode::G_IMPLICIT_DEF)) {
+    auto BigTyMinElts = BigTy.getElementCount().getKnownMinValue();
+    auto LitTyMinElts = LitTy.getElementCount().getKnownMinValue();
+    if (BigTyMinElts >= 8 && LitTyMinElts >= 8) {
+      assert(Idx % 8 == 0 && "Invalid index");
+      assert(BigTyMinElts % 8 == 0 && LitTyMinElts % 8 == 0 &&
+             "Unexpected mask vector lowering");
+      Idx /= 8;
+      BigTy = LLT::vector(BigTy.getElementCount().divideCoefficientBy(8), 8);
+      LitTy = LLT::vector(LitTy.getElementCount().divideCoefficientBy(8), 8);
+      BigVec = MIB.buildBitcast(BigTy, BigVec).getReg(0);
+      LitVec = MIB.buildBitcast(LitTy, LitVec).getReg(0);
+    } else {
+      // We can't slide this mask vector up indexed by its i1 elements.
+      // This poses a problem when we wish to insert a scalable vector which
+      // can't be re-expressed as a larger type. Just choose the slow path and
+      // extend to a larger type, then truncate back down.
+      LLT ExtBigTy = BigTy.changeElementType(LLT::scalar(8));
+      LLT ExtLitTy = LitTy.changeElementType(LLT::scalar(8));
+      auto BigZExt = MIB.buildZExt(ExtBigTy, BigVec);
+      auto LitZExt = MIB.buildZExt(ExtLitTy, LitVec);
+      auto Insert = MIB.buildInsertSubvector(ExtBigTy, BigZExt, LitZExt, Idx);
+      auto SplatZero = MIB.buildSplatVector(
+          ExtBigTy, MIB.buildConstant(ExtBigTy.getElementType(), 0));
+      MIB.buildICmp(CmpInst::Predicate::ICMP_NE, Dst, Insert, SplatZero);
+      MI.eraseFromParent();
+      return true;
+    }
+  }
+
+  const RISCVRegisterInfo *TRI = STI.getRegisterInfo();
+  MVT LitTyMVT = getMVTForLLT(LitTy);
+  unsigned SubRegIdx, RemIdx;
+  std::tie(SubRegIdx, RemIdx) =
+      RISCVTargetLowering::decomposeSubvectorInsertExtractToSubRegs(
+          getMVTForLLT(BigTy), LitTyMVT, Idx, TRI);
+
+  RISCVII::VLMUL SubVecLMUL = RISCVTargetLowering::getLMUL(getMVTForLLT(LitTy));
+  bool IsSubVecPartReg = SubVecLMUL == RISCVII::VLMUL::LMUL_F2 ||
+                         SubVecLMUL == RISCVII::VLMUL::LMUL_F4 ||
+                         SubVecLMUL == RISCVII::VLMUL::LMUL_F8;
+
+  // If the Idx has been completely eliminated and this subvector's size is a
+  // vector register or a multiple thereof, or the surrounding elements are
+  // undef, then this is a subvector insert which naturally aligns to a vector
+  // register. These can easily be handled using subregister manipulation.
+  if (RemIdx == 0 && (!IsSubVecPartReg || MRI.getVRegDef(Src1)->getOpcode() ==
+                                              TargetOpcode::G_IMPLICIT_DEF))
+    return true;
+
+  // If the subvector is smaller than a vector register, then the insertion
+  // must preserve the undisturbed elements of the register. We do this by
+  // lowering to an EXTRACT_SUBVECTOR grabbing the nearest LMUL=1 vector type
+  // (which resolves to a subregister copy), performing a VSLIDEUP to place the
+  // subvector within the vector register, and an INSERT_SUBVECTOR of that
+  // LMUL=1 type back into the larger vector (resolving to another subregister
+  // operation). See below for how our VSLIDEUP works. We go via a LMUL=1 type
+  // to avoid allocating a large register group to hold our subvector.
+
+  // VSLIDEUP works by leaving elements 0<i<OFFSET undisturbed, elements
+  // OFFSET<=i<VL set to the "subvector" and vl<=i<VLMAX set to the tail policy
+  // (in our case undisturbed). This means we can set up a subvector insertion
+  // where OFFSET is the insertion offset, and the VL is the OFFSET plus the
+  // size of the subvector.
+  const LLT XLenTy(STI.getXLenVT());
+  LLT InterLitTy = BigTy;
+  Register AlignedExtract = Src1;
+  unsigned AlignedIdx = Idx - RemIdx;
+  if (TypeSize::isKnownGT(BigTy.getSizeInBits(),
+                          getLMUL1Ty(BigTy).getSizeInBits())) {
+    InterLitTy = getLMUL1Ty(BigTy);
+    // Extract a subvector equal to the nearest full vector register type. This
+    // should resolve to a G_EXTRACT on a subreg.
+    AlignedExtract =
+        MIB.buildExtractSubvector(InterLitTy, BigVec, AlignedIdx).getReg(0);
+  }
+
+  auto Insert = MIB.buildInsertSubvector(InterLitTy, MIB.buildUndef(InterLitTy),
+                                         LitVec, 0);
+
+  auto [Mask, _] = buildDefaultVLOps(BigTy, MIB, MRI);
+  auto VL = MIB.buildVScale(XLenTy, LitTy.getElementCount().getKnownMinValue());
+
+  // Use tail agnostic policy if we're inserting over InterLitTy's tail.
+  ElementCount EndIndex =
+      ElementCount::getScalable(RemIdx) + LitTy.getElementCount();
+  uint64_t Policy = RISCVII::TAIL_UNDISTURBED_MASK_UNDISTURBED;
+  if (EndIndex == InterLitTy.getElementCount())
+    Policy = RISCVII::TAIL_AGNOSTIC;
+
+  // If we're inserting into the lowest elements, use a tail undisturbed
+  // vmv.v.v.
+  MachineInstrBuilder Inserted;
+  if (RemIdx == 0) {
+    Inserted = MIB.buildInstr(RISCV::G_VMV_V_V_VL, {InterLitTy},
+                              {AlignedExtract, Insert, VL});
+  } else {
+    auto SlideupAmt = MIB.buildVScale(XLenTy, RemIdx);
+    // Construct the vector length corresponding to RemIdx + length(LitTy).
+    VL = MIB.buildAdd(XLenTy, SlideupAmt, VL);
+    Inserted =
+        MIB.buildInstr(RISCV::G_VSLIDEUP_VL, {InterLitTy},
+                       {AlignedExtract, LitVec, SlideupAmt, Mask, VL, Policy});
+  }
+
+  // If required, insert this subvector back into the correct vector register.
+  // This should resolve to an INSERT_SUBREG instruction.
+  if (TypeSize::isKnownGT(BigTy.getSizeInBits(), InterLitTy.getSizeInBits()))
+    Inserted = MIB.buildInsert(BigTy, BigVec, LitVec, AlignedIdx);
+
+  // We might have bitcast from a mask type: cast back to the original type if
+  // required.
+  MIB.buildBitcast(Dst, Inserted);
+
+  MI.eraseFromParent();
+  return true;
+}
+
 bool RISCVLegalizerInfo::legalizeCustom(
     LegalizerHelper &Helper, MachineInstr &MI,
     LostDebugLocObserver &LocObserver) const {
@@ -985,6 +1139,8 @@ bool RISCVLegalizerInfo::legalizeCustom(
     return legalizeExt(MI, MIRBuilder);
   case TargetOpcode::G_SPLAT_VECTOR:
     return legalizeSplatVector(MI, MIRBuilder);
+  case TargetOpcode::G_INSERT_SUBVECTOR:
+    return legalizeInsertSubvector(MI, MIRBuilder);
   case TargetOpcode::G_LOAD:
   case TargetOpcode::G_STORE:
     return legalizeLoadStore(MI, Helper, MIRBuilder);
diff --git a/llvm/lib/Target/RISCV/GISel/RISCVLegalizerInfo.h b/llvm/lib/Target/RISCV/GISel/RISCVLegalizerInfo.h
index 2fc28615e7630d..ccd8ac9fe4ec90 100644
--- a/llvm/lib/Target/RISCV/GISel/RISCVLegalizerInfo.h
+++ b/llvm/lib/Target/RISCV/GISel/RISCVLegalizerInfo.h
@@ -46,6 +46,7 @@ class RISCVLegalizerInfo : public LegalizerInfo {
   bool legalizeVScale(MachineInstr &MI, MachineIRBuilder &MIB) const;
   bool legalizeExt(MachineInstr &MI, MachineIRBuilder &MIRBuilder) const;
   bool legalizeSplatVector(MachineInstr &MI, MachineIRBuilder &MIB) const;
+  bool legalizeInsertSubvector(MachineInstr &MI, MachineIRBuilder &MIB) const;
   bool legalizeLoadStore(MachineInstr &MI, LegalizerHelper &Helper,
                          MachineIRBuilder &MIB) const;
 };
diff --git a/llvm/lib/Target/RISCV/RISCVInstrGISel.td b/llvm/lib/Target/RISCV/RISCVInstrGISel.td
index ba40662c49c1df..948167023d50a2 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrGISel.td
+++ b/llvm/lib/Target/RISCV/RISCVInstrGISel.td
@@ -57,3 +57,21 @@ def G_SPLAT_VECTOR_SPLIT_I64_VL : RISCVGenericInstruction {
   let InOperandList = (ins type0:$passthru, type1:$hi, type1:$lo, type2:$vl);
   let hasSideEffects = false;
 }
+
+// Pseudo equivalent to a RISCVISD::VMV_V_V_VL
+def G_VMV_V_V_VL : RISCVGenericInstruction {
+  let OutOperandList = (outs type0:$dst);
+  let InOperandList = (ins type0:$vec, type2:$vl);
+  let hasSideEffects = false;
+}
+def : GINodeEquiv<G_VMV_V_V_VL, riscv_vmv_v_v_vl>;
+
+// Pseudo equivalent to a RISCVISD::VSLIDEUP_VL
+def G_VSLIDEUP_VL : RISCVGenericInstruction {
+  let OutOperandList = (outs type0:$dst);
+  let InOperandList = (ins type0:$merge, type0:$vec, type1:$idx, type2:$mask,
+                       type3:$vl, type4:$policy);
+  let hasSideEffects = false;
+}
+def : GINodeEquiv<G_VSLIDEUP_VL, riscv_slideup_vl>;
+
diff --git a/llvm/test/CodeGen/RISCV/GlobalISel/legalizer/rvv/legalize-insert-subvector.mir b/llvm/test/CodeGen/RISCV/GlobalISel/legalizer/rvv/legalize-insert-subvector.mir
new file mode 100644
index 00000000000000..a5f1228b8f8ca6
--- /dev/null
+++ b/llvm/test/CodeGen/RISCV/GlobalISel/legalizer/rvv/legalize-insert-subvector.mir
@@ -0,0 +1,402 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
+# RUN: llc -mtriple=riscv64 -mattr=+v -run-pass=legalizer %s -o - | FileCheck %s -check-prefixes=CHECK,RV32
+# RUN: llc -mtriple=riscv32 -mattr=+v -run-pass=legalizer %s -o - | FileCheck %s -check-prefixes=CHECK,RV64
+
+# Special handling for i1-element vectors with non-zero index
+---
+name:            insert_subvector_nxv2i1_nxv4i1
+legalized:       false
+tracksRegLiveness: true
+body:             |
+  bb.0.entry:
+    ; RV32-LABEL: name: insert_subvector_nxv2i1_nxv4i1
+    ; RV32: [[DEF:%[0-9]+]]:_(<vscale x 4 x s1>) = G_IMPLICIT_DEF
+    ; RV32-NEXT: [[DEF1:%[0-9]+]]:_(<vscale x 2 x s1>) = G_IMPLICIT_DEF
+    ; RV32-NEXT: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 0
+    ; RV32-NEXT: [[ANYEXT:%[0-9]+]]:_(s64) = G_ANYEXT [[C]](s32)
+    ; RV32-NEXT: [[SPLAT_VECTOR:%[0-9]+]]:_(<vscale x 4 x s8>) = G_SPLAT_VECTOR [[ANYEXT]](s64)
+    ; RV32-NEXT: [[C1:%[0-9]+]]:_(s32) = G_CONSTANT i32 1
+    ; RV32-NEXT: [[ANYEXT1:%[0-9]+]]:_(s64) = G_ANYEXT [[C1]](s32)
+    ; RV32-NEXT: [[SPLAT_VECTOR1:%[0-9]+]]:_(<vscale x 4 x s8>) = G_SPLAT_VECTOR [[ANYEXT1]](s64)
+    ; RV32-NEXT: [[SELECT:%[0-9]+]]:_(<vscale x 4 x s8>) = G_SELECT [[DEF]](<vscale x 4 x s1>), [[SPLAT_VECTOR1]], [[SPLAT_VECTOR]]
+    ; RV32-NEXT: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 0
+    ; RV32-NEXT: [[ANYEXT2:%[0-9]+]]:_(s64) = G_ANYEXT [[C2]](s32)
+    ; RV32-NEXT: [[SPLAT_VECTOR2:%[0-9]+]]:_(<vscale x 2 x s8>) = G_SPLAT_VECTOR [[ANYEXT2]](s64)
+    ; RV32-NEXT: [[C3:%[0-9]+]]:_(s32) = G_CONSTANT i32 1
+    ; RV32-NEXT: [[ANYEXT3:%[0-9]+]]:_(s64) = G_ANYEXT [[C3]](s32)
+    ; RV32-NEXT: [[SPLAT_VECTOR3:%[0-9]+]]:_(<vscale x 2 x s8>) = G_SPLAT_VECTOR [[ANYEXT3]](s64)
+    ; RV32-NEXT: [[SELECT1:%[0-9]+]]:_(<vscale x 2 x s8>) = G_SELECT [[DEF1]](<vscale x 2 x s1>), [[SPLAT_VECTOR3]], [[SPLAT_VECTOR2]]
+    ; RV32-NEXT: [[VMSET_VL:%[0-9]+]]:_(<vscale x 4 x s1>) = G_VMSET_VL $x0
+    ; RV32-NEXT: [[READ_VLENB:%[0-9]+]]:_(s64) = G_READ_VLENB
+    ; RV32-NEXT: [[C4:%[0-9]+]]:_(s64) = G_CONSTANT i64 2
+    ; RV32-NEXT: [[LSHR:%[0-9]+]]:_(s64) = G_LSHR [[READ_VLENB]], [[C4]](s64)
+    ; RV32-NEXT: [[READ_VLENB1:%[0-9]+]]:_(s64) = G_READ_VLENB
+    ; RV32-NEXT: [[C5:%[0-9]+]]:_(s64) = G_CONSTANT i64 2
+    ; RV32-NEXT: [[LSHR1:%[0-9]+]]:_(s64) = G_LSHR [[READ_VLENB1]], [[C5]](s64)
+    ; RV32-NEXT: [[ADD:%[0-9]+]]:_(s64) = G_ADD [[LSHR1]], [[LSHR]]
+    ; RV32-NEXT: [[VSLIDEUP_VL:%[0-9]+]]:_(<vscale x 4 x s8>) = G_VSLIDEUP_VL [[SELECT]], [[SELECT1]], [[LSHR1]](s64), [[VMSET_VL]](<vscale x 4 x s1>), [[ADD]](s64), 1
+    ; RV32-NEXT: [[BITCAST:%[0-9]+]]:_(<vscale x 4 x s8>) = G_BITCAST [[VSLIDEUP_VL]](<vscale x 4 x s8>)
+    ; RV32-NEXT: [[C6:%[0-9]+]]:_(s32) = G_CONSTANT i32 0
+    ; RV32-NEXT: [[ANYEXT4:%[0-9]+]]:_(s64) = G_ANYEXT [[C6]](s32)
+    ; RV32-NEXT: [[SPLAT_VECTOR4:%[0-9]+]]:_(<vscale x 4 x s8>) = G_SPLAT_VECTOR [[ANYEXT4]](s64)
+    ; RV32-NEXT: [[ICMP:%[0-9]+]]:_(<vscale x 4 x s1>) = G_ICMP intpred(ne), [[BITCAST]](<vscale x 4 x s8>), [[SPLAT_VECTOR4]]
+    ; RV32-NEXT: $v8 = COPY [[ICMP]](<vscale x 4 x s1>)
+    ; RV32-NEXT: PseudoRET implicit $v8
+    ;
+    ; RV64-LABEL: name: insert_subvector_nxv2i1_nxv4i1
+    ; RV64: [[DEF:%[0-9]+]]:_(<vscale x 4 x s1>) = G_IMPLICIT_DEF
+    ; RV64-NEXT: [[DEF1:%[0-9]+]]:_(<vscale x 2 x s1>) = G_IMPLICIT_DEF
+    ; RV64-NEXT: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 0
+    ; RV64-NEXT: [[SPLAT_VECTOR:%[0-9]+]]:_(<vscale x 4 x s8>) = G_SPLAT_VECTOR [[C]](s32)
+    ; RV64-NEXT: [[C1:%[0-9]+]]:_(s32) = G_CONSTANT i32 1
+    ; RV64-NEXT: [[SPLAT_VECTOR1:%[0-9]+]]:_(<vscale x 4 x s8>) = G_SPLAT_VECTOR [[C1]](s32)
+    ; RV64-NEXT: [[SELECT:%[0-9]+]]:_(<vscale x 4 x s8>) = G_SELECT [[DEF]](<vscale x 4 x s1>), [[SPLAT_VECTOR1]], [[SPLAT_VECTOR]]
+    ; RV64-NEXT: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 0
+    ; RV64-NEXT: [[SPLAT_VECTOR2:%[0-9]+]]:_(<vscale x 2 x s8>) = G_SPLAT_VECTOR [[C2]](s32)
+    ; RV64-NEXT: [[C3:%[0-9]+]]:_(s32) = G_CONSTANT i32 1
+    ; RV64-NEXT: [[SPLAT_VECTOR3:%[0-9]+]]:_(<vscale x 2 x s8>) = G_SPLAT_VECTOR [[C3]](s32)
+    ; RV64-NEXT: [[SELECT1:%[0-9]+]]:_(<vscale x 2 x s8>) = G_SELECT [[DEF1]](<vscale x 2 x s1>), [[SPLAT_VECTOR3]], [[SPLAT_VECTOR2]]
+    ; RV64-NEXT: [[VMSET_VL:%[0-9]+]]:_(<vscale x 4 x s1>) = G_VMSET_VL $x0
+    ; RV64-NEXT: [[READ_VLENB:%[0-9]+]]:_(s32) = G_READ_VLENB
+    ; RV64-NEXT: [[C4:%[0-9]+]]:_(s32) = G_CONSTANT i32 2
+    ; RV64-NEXT: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[READ_VLENB]], [[C4]](s32)
+    ; RV64-NEXT: [[READ_VLENB1:%[0-9]+]]:_(s32) = G_READ_VLENB
+    ; RV64-NEXT: [[C5:%[0-9]+]]:_(s32) = G_CONSTANT i32 2
+    ; RV64-NEXT: [[LSHR1:%[0-9]+]]:_(s32) = G_LSHR [[READ_VLENB1]], [[C5]](s32)
+    ; RV64-NEXT: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[LSHR1]], [[LSHR]]
+    ; RV64-NEXT: [[VSLIDEUP_VL:%[0-9]+]]:_(<vscale x 4 x s8>) = G_VSLIDEUP_VL [[SELECT]], [[SELECT1]], [[LSHR1]](s32), [[VMSET_VL]](<vscale x 4 x s1>), [[ADD]](s32), 1
+    ; RV64-NEXT: [[BITCAST:%[0-9]+]]:_(<vscale x 4 x s8>) = G_BITCAST [[VSLIDEUP_VL]](<vscale x 4 x s8>)
+    ; RV64-NEXT: [[C6:%[0-9]+]]:_(s32) = G_CONSTANT i32 0
+    ; RV64-NEXT: [[SPLAT_VECTOR4:%[0-9]+]]:_(<vscale x 4 x s8>) = G_SPLAT_VECTOR [[C6]](s32)
+    ; RV64-NEXT: [[ICMP:%[0-9]+]]:_(<vscale x 4 x s1>) = G_ICMP intpred(ne), [[BITCAST]](<vscale x 4 x s8>), [[SPLAT_VECTOR4]]
+    ; RV64-NEXT: $v8 = COPY [[ICMP]](<vscale x 4 x s1>)
+    ; RV64-NEXT: PseudoRET implicit $v8
+    %0:_(<vscale x 4 x s1>) = G_IMPLICIT_DEF
+    %1:_(<vscale x 2 x s1>) = G_IMPLICIT_DEF
+    %2:_(<vscale x 4 x s1>) = G_INSERT_SUBVECTOR %0(<vscale x 4 x s1>), %1, 2
+    $v8 = COPY %2(<vscale x 4 x s1>)
+    PseudoRET implicit $v8
+...
+---
+name:            insert_subvector_nxv4i1_nxv8i1
+legalized:       false
+tracksRegLiveness: true
+body:             |
+  bb.0.entry:
+    ; RV32-LABEL: name: insert_subvector_nxv4i1_nxv8i1
+    ; RV32: [[DEF:%[0-9]+]]:_(<vscale x 8 x s1>) = G_IMPLICIT_DEF
+    ; RV32-NEXT: [[DEF1:%[0-9]+]]:_(<vscale x 2 x s1>) = G_IMPLICIT_DEF
+    ; RV32-NEXT: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 0
+    ; RV32-NEXT: [[ANYEXT:%[0-9]+]]:_(s64) = G_ANYEXT [[C]](s32)
+    ; RV32-NEXT: [[SPLAT_VECTOR:%[0-9]+]]:_(<vscale x 8 x s8>) = G_SPLAT_VECTOR [[ANYEXT]](s64)
+    ; RV32-NEXT: [[C1:%[0-9]+]]:_(s32) = G_CONSTANT i32 1
+    ; RV32-NEXT: [[ANYEXT1:%[0-9]+]]:_(s64) = G_ANYEXT [[C1]](s32)
+    ; RV32-NEXT: [[SPLAT_VECTOR1:%[0-9]+]]:_(<vscale x 8 x s8>) = G_SPLAT_VECTOR [[ANYEXT1]](s64)
+    ; RV32-NEXT: [[SELECT:%[0-9]+]]:_(<vscale x 8 x s8>) = G_SELECT [[DEF]](<vscale x 8 x s1>), [[SPLAT_VECTOR1]], [[SPLAT_VECTOR]]
+    ; RV32-NEXT: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 0
+    ; RV32-NEXT: [[ANYEXT2:%[0-9]+]]:_(s64) = G_ANYEXT [[C2]](s32)
+    ; RV32-NEXT: [[SPLAT_VECTOR2:%[0-9]+]]:_(<vscale x 2 x s8>) = G_SPLAT_VECTOR [[ANYEXT2]](s64)
+    ; RV32-NEXT: [[C3:%[0-9]+]]:_(s32) = G_CONSTANT i32 1
+    ; RV32-NEXT: [[ANYEXT3:%[0-9]+]]:_(s64) = G_ANYEXT [[C3]](s32)
+    ; RV32-NEXT: [[SPLAT_VECTOR3:%[0-9]+]]:_(<vscale x 2 x s8>) = G_SPLAT_VECTOR [[ANYEXT3]](s64)
+    ; RV32-NEXT: [[SELECT1:%[0-9]+]]:_(<vscale x 2 x s8>) = G_SELECT [[DEF1]](<vscale x 2 x s1>), [[SPLAT_VECTOR3]], [[SPLAT_VECTOR2]]
+    ; RV32-NEXT: [[VMSET_VL:%[0-9]+]]:_(<vscale x 8 x s1>) = G_VMSET_VL $x0
+    ; RV32-NEXT: [[READ_VLENB:%[0-9]+]]:_(s64) = G_READ_VLENB
+    ; RV32-NEXT: [[C4:%[0-9]+]]:_(s64) = G_CONSTANT i64 2
+    ; RV32-NEXT: [[LSHR:%[0-9]+]]:_(s64) = G_LSHR [[READ_VLENB]], [[C4]](s64)
+    ; RV32-NEXT: [[READ_VLENB1:%[0-9]+]]:_(s64) = G_READ_VLENB
+    ; RV32-NEXT: [[C5:%[0-9]+]]:_(s64) = G_CONSTANT i64 2
+    ; RV32-NEXT: [[LSHR1:%[0-9]+]]:_(s64) = G_LSHR [[READ_VLENB1]], [[C5]](s64)
+    ; RV32-NEXT: [[ADD:%[0-9]+]]:_(s64) = G_ADD [[LSHR1]], [[LSHR]]
+    ; RV32-NEXT: [[VSLIDEUP_VL:%[0-9]+]]:_(<vscale x 8 x s8>) = G_VSLIDEUP_VL [[SELECT]], [[SELECT1]], [[LSHR1]](s64), [[VMSET_VL]](<vscale x 8 x s1>), [[ADD]](s64), 0
+    ; RV32-NEXT: [[BITCAST:%[0-9]+]]:_(<vscale x 8 x s8>) = G_BITCAST [[VSLIDEUP_VL]](<vscale x 8 x s8>)
+    ; RV32-NEXT: [[C6:%[0-9]+]]:_(s32) = G_CONSTANT i32 0
+    ; RV32-NEXT: [[ANYEXT4:%[0-9]+]]:_(s64) = G_ANYEXT [[C6]](s32)
+    ; RV32-NEXT: [[SPLAT_VECTOR4:%[0-9]+]]:_(<vscale x 8 x s8>) = G_SPLAT_VECTOR [[ANYEXT4]](s64)
+    ; RV32-NEXT: [[ICMP:%[0-9]+]]:_(<vscale x 8 x s1>) = G_ICMP intpred(ne), [[BITCAST]](<vscale x 8 x s8>), [[SPLAT_VECTOR4]]
+    ; RV32-NEXT: $v8 = COPY [[ICMP]](<vscale x 8 x s1>)
+    ; RV32-NEXT: PseudoRET implicit $v8
+    ;
+    ; RV64-LABEL: name: insert_subvector_nxv4i1_nxv8i1
+    ; RV64: [[DEF:%[0-9]+]]:_(<vscale x 8 x s1>) = G_IMPLICIT_DEF
+    ; RV64-NEXT: [[DEF1:%[0-9]+]]:_(<vscale x 2 x s1>) = G_IMPLICIT_DEF
+    ; RV64-NEXT: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 0
+    ; RV64-NEXT: [[SPLAT_VECTOR:%[0-9]+]]:_(<vscale x 8 x s8>) = G_SPLAT_VECTOR [[C]](s32)
+    ; RV64-NEXT: [[C1:%[0-9]+]]:_(s32) = G_CONSTANT i32 1
+    ; RV64-NEXT: [[SPLAT_VECTOR1:%[0-9]+]]:_(<vscale x 8 x s8>) = G_SPLAT_VECTOR [[C1]](s32)
+    ; RV64-NEXT: [[SELECT:%[0-9]+]]:_(<vscale x 8 x s8>) = G_SELECT [[DEF]](<vscale x 8 x s1>), [[SPLAT_VECTOR1]], [[SPLAT_VECTOR]]
+    ; RV64-NEXT: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 0
+    ; RV64-NEXT: [[SPLAT_VECTOR2:%[0-9]+]]:_(<vscale x 2 x s8>) = G_SPLAT_VECTOR [[C2]](s32)
+    ; RV64-NEXT: [[C3:%[0-9]+]]:_(s32) = G_CONSTANT i32 1
+    ; RV64-NEXT: [[SPLAT_VECTOR3:%[0-9]+]]:_(<vscale x 2 x s8>) = G_SPLAT_VECTOR [[C3]](s32)
+    ; RV64-NEXT: [[SELECT1:%[0-9]+]]:_(<vscale x 2 x s8>) = G_SELECT [[DEF1]](<vscale x 2 x s1>), [[SPLAT_VECTOR3]], [[SPLAT_VECTOR2]]
+    ; RV64-NEXT: [[VMSET_VL:%[0-9]+]]:_(<vscale x 8 x s1>) = G_VMSET_VL $x0
+    ; RV64-NEXT: [[READ_VLENB:%[0-9]+]]:_(s32) = G_READ_VLENB
+    ; RV64-NEXT: [[C4:%[0-9]+]]:_(s32) = G_CONSTANT i32 2
+    ; RV64-NEXT: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[READ_...
[truncated]

@topperc
Copy link
Collaborator

topperc commented Sep 16, 2024

How does one create a G_INSERT_SUBVECTOR today? There are no callers of MachineIRBuilder::buildInsertSubvector

@michaelmaitland
Copy link
Contributor Author

How does one create a G_INSERT_SUBVECTOR today? There are no callers of MachineIRBuilder::buildInsertSubvector

It will be needed by G_INSERT_VECTOR_ELT. This is a precursor to that patch.

@tschuett
Copy link

IRTranslator support for G_INSERT_SUBVECTOR is missing.

@michaelmaitland michaelmaitland force-pushed the legalize-insert-subvector branch from d6902ef to 19bf34d Compare September 16, 2024 19:02
@topperc
Copy link
Collaborator

topperc commented Sep 17, 2024

I'm not sure I believe this belongs in the legalizer. This feels like custom instruction selection. I think its similar to the reasons AArch64 created AArch64PostLegalizerLowering. Perhaps we should be doing the same?

@michaelmaitland michaelmaitland force-pushed the legalize-insert-subvector branch 3 times, most recently from 5d48e79 to e2b473d Compare September 17, 2024 17:20
@michaelmaitland michaelmaitland changed the title [RISCV][GISEL] Legalize G_INSERT_SUBVECTOR [RISCV][GISEL] Legalize and post-legalize lower G_INSERT_SUBVECTOR Sep 17, 2024
Copy link

github-actions bot commented Sep 17, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@tschuett
Copy link

What is the difference between hosting this code at:

  • the legalizer
  • the post legalizer lower combiner
  • the instruction selector?

@topperc
Copy link
Collaborator

topperc commented Sep 17, 2024

What is the difference between hosting this code at:

  • the legalizer

  • the post legalizer lower combiner

  • the instruction selector?

Emitting a bunch of target specific opcodes from the legalizer makes it very hard to do any optimizations that may be exposed by legalizing the types.

Doing it in instruction selection means we can't reuse tablegen patterns. There are up to 34 different variants of instructions considering element width and vector width.

Lowering allows us to select target specific generic nodes after after post legalizer optimizations and still use tablegen patterns.

; RV64-NEXT: [[VSCALE1:%[0-9]+]]:_(s64) = G_VSCALE i64 1
; RV64-NEXT: [[ADD:%[0-9]+]]:_(s64) = G_ADD [[VSCALE1]], [[VSCALE]]
; RV64-NEXT: [[VSLIDEUP_VL:%[0-9]+]]:_(<vscale x 4 x s16>) = G_VSLIDEUP_VL [[DEF]], [[DEF]], [[VSCALE1]](s64), [[VMSET_VL]](<vscale x 4 x s1>), [[ADD]](s64), 0
; RV64-NEXT: [[BITCAST:%[0-9]+]]:_(<vscale x 4 x s16>) = G_BITCAST [[VSLIDEUP_VL]](<vscale x 4 x s16>)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bitcast is unnecessary

@tschuett
Copy link

I stated before: the odd thing about the lower passes: they are free-standing functions. There is no CombinerHelper and no access to LegalizerInfo.
In the combiner, we have the pattern:

if (!isLegal(op))
  return false;
build(Op);

The lowering combines cannot really fail. How about:

assert(isLegal(Op) && "unexpected"):
build(Op);

@arsenm
Copy link
Contributor

arsenm commented Sep 20, 2024

I stated before: the odd thing about the lower passes: they are free-standing functions. There is no CombinerHelper and no access to LegalizerInfo. In the combiner, we have the pattern:

I don't understand the lowering passes. They seem like they're just a hack and I don't understand why they are needed.

@tschuett
Copy link

See discussion in #108991.

@michaelmaitland michaelmaitland force-pushed the legalize-insert-subvector branch from 13ac1af to c2ebba9 Compare September 20, 2024 13:22
@tschuett tschuett changed the title [RISCV][GISEL] Legalize and post-legalize lower G_INSERT_SUBVECTOR [RISCV][GISEL] Legalize G_INSERT_SUBVECTOR Sep 20, 2024
@michaelmaitland michaelmaitland force-pushed the legalize-insert-subvector branch from c2ebba9 to 4616b37 Compare September 20, 2024 13:33
@michaelmaitland
Copy link
Contributor Author

Thanks @tschuett, I forgot to commit the changes you mention. And thanks also for updating the title. I have to post my patch for legalize extract subvector, because this patch depends on that one.

@michaelmaitland michaelmaitland force-pushed the legalize-insert-subvector branch from 9168d88 to 5d0a49d Compare October 1, 2024 19:59
@michaelmaitland
Copy link
Contributor Author

I've updated this patch based on the discussions we had during the legalization of EXTRACT_SUBVECTOR. Now ready for review :)

LitTy = LitTy.changeElementType(LLT::scalar(8));
auto BigZExt = MIB.buildZExt(BigTy, BigVec);
auto LitZExt = MIB.buildZExt(LitTy, LitVec);
auto Insert = MIB.buildInsertSubvector(BigTy, BigZExt, LitZExt, Idx);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could make this widenscalar?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But we're widening a vector?

Copy link
Collaborator

@topperc topperc Oct 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Widening a vector" would mean changing the element count. Widening scalar means changing the element size even if its part of a vector.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  /// The operation should be implemented in terms of a wider scalar             
  /// base-type. For example a <2 x s8> add could be implemented as a <2         
  /// x s32> add (ignoring the high bits).                                       
  WidenScalar,                                                                   
                

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, thanks for the clarification.

Comment on lines +1084 to +1058
if (LitTy.getElementType() == LLT::scalar(1)) {
auto BigTyMinElts = BigTy.getElementCount().getKnownMinValue();
auto LitTyMinElts = LitTy.getElementCount().getKnownMinValue();
if (BigTyMinElts >= 8 && LitTyMinElts >= 8)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could do this in the legalization rule instead of in the custom logic

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure?

Above is this code is

if (Idx == 0 ||
      MRI.getVRegDef(BigVec)->getOpcode() == TargetOpcode::G_IMPLICIT_DEF)

This means that the legalization rule needs to check the immediate value to make sure its not zero, which I couldn't figure out how to do, and we need to check to see if the BigVec is not a G_IMPLICIT_DEF, which I also don't know how to do as part of the legalization rule.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, if you require that you can't do it in the rule. But do you really need this to be in the rule? I would expect that folding out the undef part to only require an optimizing combine

Copy link
Contributor Author

@michaelmaitland michaelmaitland Oct 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can do the undef check in a combine, but I still don’t think it’s possible to check that the index is not zero is it? I don't think its possible to have a legalIf that checks when the idx is zero.

Copy link
Contributor Author

@michaelmaitland michaelmaitland Oct 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, if we promote the INSERT_SUBVECTOR to do the combine later, I am concerned that we will be left with an INSERT_SUBVECTOR that has a larger type than needed.

if (TypeSize::isKnownGT(BigTy.getSizeInBits(), InterLitTy.getSizeInBits()))
MIB.buildInsertSubvector(Dst, BigVec, LitVec, AlignedIdx);
else
Inserted->getOperand(0).setReg(Dst);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DIdn't call the observer before modifying instruction in place. Can you build the instruction above with the correct register?

@michaelmaitland michaelmaitland force-pushed the legalize-insert-subvector branch from 4c47dc0 to 0c558af Compare October 15, 2024 16:27
@michaelmaitland michaelmaitland merged commit 6bac414 into llvm:main Oct 21, 2024
8 checks passed
@llvm-ci
Copy link
Collaborator

llvm-ci commented Oct 21, 2024

LLVM Buildbot has detected a new failure on builder llvm-clang-x86_64-expensive-checks-debian running on gribozavr4 while building llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/16/builds/7470

Here is the relevant piece of the build log for the reference
Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: CodeGen/RISCV/GlobalISel/legalizer/rvv/legalize-insert-subvector.mir' FAILED ********************
Exit Code: 2

Command Output (stderr):
--
RUN: at line 2: /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/llc -mtriple=riscv64 -mattr=+v -run-pass=legalizer /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/RISCV/GlobalISel/legalizer/rvv/legalize-insert-subvector.mir -o - | /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/FileCheck /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/RISCV/GlobalISel/legalizer/rvv/legalize-insert-subvector.mir -check-prefixes=CHECK,RV32
+ /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/FileCheck /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/RISCV/GlobalISel/legalizer/rvv/legalize-insert-subvector.mir -check-prefixes=CHECK,RV32
+ /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/llc -mtriple=riscv64 -mattr=+v -run-pass=legalizer /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/RISCV/GlobalISel/legalizer/rvv/legalize-insert-subvector.mir -o -

# After Legalizer
# Machine code for function insert_subvector_nxv8i16_nxv1i16: IsSSA, TracksLiveness, Legalized

bb.0.entry:
  liveins: $v8m8
  %0:_(<vscale x 8 x s16>) = COPY $v8
  %1:_(<vscale x 1 x s16>) = G_IMPLICIT_DEF
  %3:_(<vscale x 4 x s16>) = G_EXTRACT_SUBVECTOR %0:_(<vscale x 8 x s16>), 4
  %4:_(<vscale x 4 x s16>) = G_IMPLICIT_DEF
  %5:_(<vscale x 4 x s16>) = G_INSERT_SUBVECTOR %4:_, %1:_(<vscale x 1 x s16>), 0
  %6:_(s64) = G_CONSTANT i64 -1
  %7:_(<vscale x 8 x s1>) = G_VMSET_VL %6:_(s64)
  %10:_(s64) = G_READ_VLENB
  %11:_(s64) = G_CONSTANT i64 3
  %8:_(s64) = G_LSHR %10:_, %11:_(s64)
  %9:_(<vscale x 4 x s16>) = G_VMV_V_V_VL %3:_, %5:_(<vscale x 4 x s16>), %8:_(s64)
  %2:_(<vscale x 8 x s16>) = G_INSERT_SUBVECTOR %0:_, %9:_(<vscale x 4 x s16>), 4
  $v8 = COPY %2:_(<vscale x 8 x s16>)
  PseudoRET implicit $v8

# End machine code for function insert_subvector_nxv8i16_nxv1i16.

*** Bad machine code: Extra explicit operand on non-variadic instruction ***
- function:    insert_subvector_nxv8i16_nxv1i16
- basic block: %bb.0 entry (0xa76f218)
- instruction: %9:_(<vscale x 4 x s16>) = G_VMV_V_V_VL %3:_, %5:_(<vscale x 4 x s16>), %8:_(s64)
- operand 3:   %8:_
LLVM ERROR: Found 1 machine code errors.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.	Program arguments: /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/llc -mtriple=riscv64 -mattr=+v -run-pass=legalizer /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/RISCV/GlobalISel/legalizer/rvv/legalize-insert-subvector.mir -o -
1.	Running pass 'Function Pass Manager' on module '/b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/RISCV/GlobalISel/legalizer/rvv/legalize-insert-subvector.mir'.
2.	Running pass 'Verify generated machine code' on function '@insert_subvector_nxv8i16_nxv1i16'
 #0 0x00000000035d3317 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/llc+0x35d3317)
 #1 0x00000000035d0dce llvm::sys::RunSignalHandlers() (/b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/llc+0x35d0dce)
 #2 0x00000000035d39cf SignalHandler(int) Signals.cpp:0:0
 #3 0x00007fa797fbe140 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x13140)
 #4 0x00007fa797ad2d51 raise (/lib/x86_64-linux-gnu/libc.so.6+0x38d51)
 #5 0x00007fa797abc537 abort (/lib/x86_64-linux-gnu/libc.so.6+0x22537)
 #6 0x0000000003543f2a llvm::report_fatal_error(llvm::Twine const&, bool) (/b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/llc+0x3543f2a)
 #7 0x00000000028009ba (/b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/llc+0x28009ba)
...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants