[RISCV][GISEL] Legalize G_INSERT_SUBVECTOR #108859
Conversation
@llvm/pr-subscribers-llvm-globalisel @llvm/pr-subscribers-backend-risc-v

Author: Michael Maitland (michaelmaitland)

Changes: This code is heavily based on the SelectionDAG lowerINSERT_SUBVECTOR code.

Patch is 33.04 KiB, truncated to 20.00 KiB below; full version: https://github.com/llvm/llvm-project/pull/108859.diff

4 Files Affected:
diff --git a/llvm/lib/Target/RISCV/GISel/RISCVLegalizerInfo.cpp b/llvm/lib/Target/RISCV/GISel/RISCVLegalizerInfo.cpp
index 8de908d20d245c..ff9ebac02b0dc2 100644
--- a/llvm/lib/Target/RISCV/GISel/RISCVLegalizerInfo.cpp
+++ b/llvm/lib/Target/RISCV/GISel/RISCVLegalizerInfo.cpp
@@ -581,6 +581,12 @@ RISCVLegalizerInfo::RISCVLegalizerInfo(const RISCVSubtarget &ST)
SplatActions.clampScalar(1, sXLen, sXLen);
+ getActionDefinitionsBuilder(G_INSERT_SUBVECTOR)
+ .customIf(all(typeIsLegalBoolVec(0, BoolVecTys, ST),
+ typeIsLegalBoolVec(1, BoolVecTys, ST)))
+ .customIf(all(typeIsLegalIntOrFPVec(0, IntOrFPVecTys, ST),
+ typeIsLegalIntOrFPVec(1, IntOrFPVecTys, ST)));
+
getLegacyLegalizerInfo().computeTables();
}
@@ -915,6 +921,154 @@ bool RISCVLegalizerInfo::legalizeSplatVector(MachineInstr &MI,
return true;
}
+static LLT getLMUL1Ty(LLT VecTy) {
+ assert(VecTy.getElementType().getSizeInBits() <= 64 &&
+ "Unexpected vector LLT");
+ return LLT::scalable_vector(RISCV::RVVBitsPerBlock /
+ VecTy.getElementType().getSizeInBits(),
+ VecTy.getElementType());
+}
+
+bool RISCVLegalizerInfo::legalizeInsertSubvector(MachineInstr &MI,
+ MachineIRBuilder &MIB) const {
+ assert(MI.getOpcode() == TargetOpcode::G_INSERT_SUBVECTOR);
+
+ MachineRegisterInfo &MRI = *MIB.getMRI();
+
+ Register Dst = MI.getOperand(0).getReg();
+ Register Src1 = MI.getOperand(1).getReg();
+ Register Src2 = MI.getOperand(2).getReg();
+ uint64_t Idx = MI.getOperand(3).getImm();
+
+ LLT BigTy = MRI.getType(Src1);
+ LLT LitTy = MRI.getType(Src2);
+ Register BigVec = Src1;
+ Register LitVec = Src2;
+
+ // We don't have the ability to slide mask vectors up indexed by their i1
+ // elements; the smallest we can do is i8. Often we are able to bitcast to
+ // equivalent i8 vectors. Otherwise, we must zeroextend to equivalent i8
+ // vectors and truncate down after the insert.
+ if (LitTy.getElementType() == LLT::scalar(1) &&
+ (Idx != 0 ||
+ MRI.getVRegDef(BigVec)->getOpcode() != TargetOpcode::G_IMPLICIT_DEF)) {
+ auto BigTyMinElts = BigTy.getElementCount().getKnownMinValue();
+ auto LitTyMinElts = LitTy.getElementCount().getKnownMinValue();
+ if (BigTyMinElts >= 8 && LitTyMinElts >= 8) {
+ assert(Idx % 8 == 0 && "Invalid index");
+ assert(BigTyMinElts % 8 == 0 && LitTyMinElts % 8 == 0 &&
+ "Unexpected mask vector lowering");
+ Idx /= 8;
+ BigTy = LLT::vector(BigTy.getElementCount().divideCoefficientBy(8), 8);
+ LitTy = LLT::vector(LitTy.getElementCount().divideCoefficientBy(8), 8);
+ BigVec = MIB.buildBitcast(BigTy, BigVec).getReg(0);
+ LitVec = MIB.buildBitcast(LitTy, LitVec).getReg(0);
+ } else {
+ // We can't slide this mask vector up indexed by its i1 elements.
+ // This poses a problem when we wish to insert a scalable vector which
+ // can't be re-expressed as a larger type. Just choose the slow path and
+ // extend to a larger type, then truncate back down.
+ LLT ExtBigTy = BigTy.changeElementType(LLT::scalar(8));
+ LLT ExtLitTy = LitTy.changeElementType(LLT::scalar(8));
+ auto BigZExt = MIB.buildZExt(ExtBigTy, BigVec);
+ auto LitZExt = MIB.buildZExt(ExtLitTy, LitVec);
+ auto Insert = MIB.buildInsertSubvector(ExtBigTy, BigZExt, LitZExt, Idx);
+ auto SplatZero = MIB.buildSplatVector(
+ ExtBigTy, MIB.buildConstant(ExtBigTy.getElementType(), 0));
+ MIB.buildICmp(CmpInst::Predicate::ICMP_NE, Dst, Insert, SplatZero);
+ MI.eraseFromParent();
+ return true;
+ }
+ }
+
+ const RISCVRegisterInfo *TRI = STI.getRegisterInfo();
+ MVT LitTyMVT = getMVTForLLT(LitTy);
+ unsigned SubRegIdx, RemIdx;
+ std::tie(SubRegIdx, RemIdx) =
+ RISCVTargetLowering::decomposeSubvectorInsertExtractToSubRegs(
+ getMVTForLLT(BigTy), LitTyMVT, Idx, TRI);
+
+ RISCVII::VLMUL SubVecLMUL = RISCVTargetLowering::getLMUL(getMVTForLLT(LitTy));
+ bool IsSubVecPartReg = SubVecLMUL == RISCVII::VLMUL::LMUL_F2 ||
+ SubVecLMUL == RISCVII::VLMUL::LMUL_F4 ||
+ SubVecLMUL == RISCVII::VLMUL::LMUL_F8;
+
+ // If the Idx has been completely eliminated and this subvector's size is a
+ // vector register or a multiple thereof, or the surrounding elements are
+ // undef, then this is a subvector insert which naturally aligns to a vector
+ // register. These can easily be handled using subregister manipulation.
+ if (RemIdx == 0 && (!IsSubVecPartReg || MRI.getVRegDef(Src1)->getOpcode() ==
+ TargetOpcode::G_IMPLICIT_DEF))
+ return true;
+
+ // If the subvector is smaller than a vector register, then the insertion
+ // must preserve the undisturbed elements of the register. We do this by
+ // lowering to an EXTRACT_SUBVECTOR grabbing the nearest LMUL=1 vector type
+ // (which resolves to a subregister copy), performing a VSLIDEUP to place the
+ // subvector within the vector register, and an INSERT_SUBVECTOR of that
+ // LMUL=1 type back into the larger vector (resolving to another subregister
+ // operation). See below for how our VSLIDEUP works. We go via a LMUL=1 type
+ // to avoid allocating a large register group to hold our subvector.
+
+ // VSLIDEUP works by leaving elements 0<i<OFFSET undisturbed, elements
+ // OFFSET<=i<VL set to the "subvector" and vl<=i<VLMAX set to the tail policy
+ // (in our case undisturbed). This means we can set up a subvector insertion
+ // where OFFSET is the insertion offset, and the VL is the OFFSET plus the
+ // size of the subvector.
+ const LLT XLenTy(STI.getXLenVT());
+ LLT InterLitTy = BigTy;
+ Register AlignedExtract = Src1;
+ unsigned AlignedIdx = Idx - RemIdx;
+ if (TypeSize::isKnownGT(BigTy.getSizeInBits(),
+ getLMUL1Ty(BigTy).getSizeInBits())) {
+ InterLitTy = getLMUL1Ty(BigTy);
+ // Extract a subvector equal to the nearest full vector register type. This
+ // should resolve to a G_EXTRACT on a subreg.
+ AlignedExtract =
+ MIB.buildExtractSubvector(InterLitTy, BigVec, AlignedIdx).getReg(0);
+ }
+
+ auto Insert = MIB.buildInsertSubvector(InterLitTy, MIB.buildUndef(InterLitTy),
+ LitVec, 0);
+
+ auto [Mask, _] = buildDefaultVLOps(BigTy, MIB, MRI);
+ auto VL = MIB.buildVScale(XLenTy, LitTy.getElementCount().getKnownMinValue());
+
+ // Use tail agnostic policy if we're inserting over InterLitTy's tail.
+ ElementCount EndIndex =
+ ElementCount::getScalable(RemIdx) + LitTy.getElementCount();
+ uint64_t Policy = RISCVII::TAIL_UNDISTURBED_MASK_UNDISTURBED;
+ if (EndIndex == InterLitTy.getElementCount())
+ Policy = RISCVII::TAIL_AGNOSTIC;
+
+ // If we're inserting into the lowest elements, use a tail undisturbed
+ // vmv.v.v.
+ MachineInstrBuilder Inserted;
+ if (RemIdx == 0) {
+ Inserted = MIB.buildInstr(RISCV::G_VMV_V_V_VL, {InterLitTy},
+ {AlignedExtract, Insert, VL});
+ } else {
+ auto SlideupAmt = MIB.buildVScale(XLenTy, RemIdx);
+ // Construct the vector length corresponding to RemIdx + length(LitTy).
+ VL = MIB.buildAdd(XLenTy, SlideupAmt, VL);
+ Inserted =
+ MIB.buildInstr(RISCV::G_VSLIDEUP_VL, {InterLitTy},
+ {AlignedExtract, LitVec, SlideupAmt, Mask, VL, Policy});
+ }
+
+ // If required, insert this subvector back into the correct vector register.
+ // This should resolve to an INSERT_SUBREG instruction.
+ if (TypeSize::isKnownGT(BigTy.getSizeInBits(), InterLitTy.getSizeInBits()))
+ Inserted = MIB.buildInsert(BigTy, BigVec, LitVec, AlignedIdx);
+
+ // We might have bitcast from a mask type: cast back to the original type if
+ // required.
+ MIB.buildBitcast(Dst, Inserted);
+
+ MI.eraseFromParent();
+ return true;
+}
+
bool RISCVLegalizerInfo::legalizeCustom(
LegalizerHelper &Helper, MachineInstr &MI,
LostDebugLocObserver &LocObserver) const {
@@ -985,6 +1139,8 @@ bool RISCVLegalizerInfo::legalizeCustom(
return legalizeExt(MI, MIRBuilder);
case TargetOpcode::G_SPLAT_VECTOR:
return legalizeSplatVector(MI, MIRBuilder);
+ case TargetOpcode::G_INSERT_SUBVECTOR:
+ return legalizeInsertSubvector(MI, MIRBuilder);
case TargetOpcode::G_LOAD:
case TargetOpcode::G_STORE:
return legalizeLoadStore(MI, Helper, MIRBuilder);
diff --git a/llvm/lib/Target/RISCV/GISel/RISCVLegalizerInfo.h b/llvm/lib/Target/RISCV/GISel/RISCVLegalizerInfo.h
index 2fc28615e7630d..ccd8ac9fe4ec90 100644
--- a/llvm/lib/Target/RISCV/GISel/RISCVLegalizerInfo.h
+++ b/llvm/lib/Target/RISCV/GISel/RISCVLegalizerInfo.h
@@ -46,6 +46,7 @@ class RISCVLegalizerInfo : public LegalizerInfo {
bool legalizeVScale(MachineInstr &MI, MachineIRBuilder &MIB) const;
bool legalizeExt(MachineInstr &MI, MachineIRBuilder &MIRBuilder) const;
bool legalizeSplatVector(MachineInstr &MI, MachineIRBuilder &MIB) const;
+ bool legalizeInsertSubvector(MachineInstr &MI, MachineIRBuilder &MIB) const;
bool legalizeLoadStore(MachineInstr &MI, LegalizerHelper &Helper,
MachineIRBuilder &MIB) const;
};
diff --git a/llvm/lib/Target/RISCV/RISCVInstrGISel.td b/llvm/lib/Target/RISCV/RISCVInstrGISel.td
index ba40662c49c1df..948167023d50a2 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrGISel.td
+++ b/llvm/lib/Target/RISCV/RISCVInstrGISel.td
@@ -57,3 +57,21 @@ def G_SPLAT_VECTOR_SPLIT_I64_VL : RISCVGenericInstruction {
let InOperandList = (ins type0:$passthru, type1:$hi, type1:$lo, type2:$vl);
let hasSideEffects = false;
}
+
+// Pseudo equivalent to a RISCVISD::VMV_V_V_VL
+def G_VMV_V_V_VL : RISCVGenericInstruction {
+ let OutOperandList = (outs type0:$dst);
+ let InOperandList = (ins type0:$passthru, type0:$vec, type1:$vl);
+ let hasSideEffects = false;
+}
+def : GINodeEquiv<G_VMV_V_V_VL, riscv_vmv_v_v_vl>;
+
+// Pseudo equivalent to a RISCVISD::VSLIDEUP_VL
+def G_VSLIDEUP_VL : RISCVGenericInstruction {
+ let OutOperandList = (outs type0:$dst);
+ let InOperandList = (ins type0:$merge, type0:$vec, type1:$idx, type2:$mask,
+ type3:$vl, type4:$policy);
+ let hasSideEffects = false;
+}
+def : GINodeEquiv<G_VSLIDEUP_VL, riscv_slideup_vl>;
+
diff --git a/llvm/test/CodeGen/RISCV/GlobalISel/legalizer/rvv/legalize-insert-subvector.mir b/llvm/test/CodeGen/RISCV/GlobalISel/legalizer/rvv/legalize-insert-subvector.mir
new file mode 100644
index 00000000000000..a5f1228b8f8ca6
--- /dev/null
+++ b/llvm/test/CodeGen/RISCV/GlobalISel/legalizer/rvv/legalize-insert-subvector.mir
@@ -0,0 +1,402 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
+# RUN: llc -mtriple=riscv64 -mattr=+v -run-pass=legalizer %s -o - | FileCheck %s -check-prefixes=CHECK,RV32
+# RUN: llc -mtriple=riscv32 -mattr=+v -run-pass=legalizer %s -o - | FileCheck %s -check-prefixes=CHECK,RV64
+
+# Special handling for i1-element vectors with non-zero index
+---
+name: insert_subvector_nxv2i1_nxv4i1
+legalized: false
+tracksRegLiveness: true
+body: |
+ bb.0.entry:
+ ; RV32-LABEL: name: insert_subvector_nxv2i1_nxv4i1
+ ; RV32: [[DEF:%[0-9]+]]:_(<vscale x 4 x s1>) = G_IMPLICIT_DEF
+ ; RV32-NEXT: [[DEF1:%[0-9]+]]:_(<vscale x 2 x s1>) = G_IMPLICIT_DEF
+ ; RV32-NEXT: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 0
+ ; RV32-NEXT: [[ANYEXT:%[0-9]+]]:_(s64) = G_ANYEXT [[C]](s32)
+ ; RV32-NEXT: [[SPLAT_VECTOR:%[0-9]+]]:_(<vscale x 4 x s8>) = G_SPLAT_VECTOR [[ANYEXT]](s64)
+ ; RV32-NEXT: [[C1:%[0-9]+]]:_(s32) = G_CONSTANT i32 1
+ ; RV32-NEXT: [[ANYEXT1:%[0-9]+]]:_(s64) = G_ANYEXT [[C1]](s32)
+ ; RV32-NEXT: [[SPLAT_VECTOR1:%[0-9]+]]:_(<vscale x 4 x s8>) = G_SPLAT_VECTOR [[ANYEXT1]](s64)
+ ; RV32-NEXT: [[SELECT:%[0-9]+]]:_(<vscale x 4 x s8>) = G_SELECT [[DEF]](<vscale x 4 x s1>), [[SPLAT_VECTOR1]], [[SPLAT_VECTOR]]
+ ; RV32-NEXT: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 0
+ ; RV32-NEXT: [[ANYEXT2:%[0-9]+]]:_(s64) = G_ANYEXT [[C2]](s32)
+ ; RV32-NEXT: [[SPLAT_VECTOR2:%[0-9]+]]:_(<vscale x 2 x s8>) = G_SPLAT_VECTOR [[ANYEXT2]](s64)
+ ; RV32-NEXT: [[C3:%[0-9]+]]:_(s32) = G_CONSTANT i32 1
+ ; RV32-NEXT: [[ANYEXT3:%[0-9]+]]:_(s64) = G_ANYEXT [[C3]](s32)
+ ; RV32-NEXT: [[SPLAT_VECTOR3:%[0-9]+]]:_(<vscale x 2 x s8>) = G_SPLAT_VECTOR [[ANYEXT3]](s64)
+ ; RV32-NEXT: [[SELECT1:%[0-9]+]]:_(<vscale x 2 x s8>) = G_SELECT [[DEF1]](<vscale x 2 x s1>), [[SPLAT_VECTOR3]], [[SPLAT_VECTOR2]]
+ ; RV32-NEXT: [[VMSET_VL:%[0-9]+]]:_(<vscale x 4 x s1>) = G_VMSET_VL $x0
+ ; RV32-NEXT: [[READ_VLENB:%[0-9]+]]:_(s64) = G_READ_VLENB
+ ; RV32-NEXT: [[C4:%[0-9]+]]:_(s64) = G_CONSTANT i64 2
+ ; RV32-NEXT: [[LSHR:%[0-9]+]]:_(s64) = G_LSHR [[READ_VLENB]], [[C4]](s64)
+ ; RV32-NEXT: [[READ_VLENB1:%[0-9]+]]:_(s64) = G_READ_VLENB
+ ; RV32-NEXT: [[C5:%[0-9]+]]:_(s64) = G_CONSTANT i64 2
+ ; RV32-NEXT: [[LSHR1:%[0-9]+]]:_(s64) = G_LSHR [[READ_VLENB1]], [[C5]](s64)
+ ; RV32-NEXT: [[ADD:%[0-9]+]]:_(s64) = G_ADD [[LSHR1]], [[LSHR]]
+ ; RV32-NEXT: [[VSLIDEUP_VL:%[0-9]+]]:_(<vscale x 4 x s8>) = G_VSLIDEUP_VL [[SELECT]], [[SELECT1]], [[LSHR1]](s64), [[VMSET_VL]](<vscale x 4 x s1>), [[ADD]](s64), 1
+ ; RV32-NEXT: [[BITCAST:%[0-9]+]]:_(<vscale x 4 x s8>) = G_BITCAST [[VSLIDEUP_VL]](<vscale x 4 x s8>)
+ ; RV32-NEXT: [[C6:%[0-9]+]]:_(s32) = G_CONSTANT i32 0
+ ; RV32-NEXT: [[ANYEXT4:%[0-9]+]]:_(s64) = G_ANYEXT [[C6]](s32)
+ ; RV32-NEXT: [[SPLAT_VECTOR4:%[0-9]+]]:_(<vscale x 4 x s8>) = G_SPLAT_VECTOR [[ANYEXT4]](s64)
+ ; RV32-NEXT: [[ICMP:%[0-9]+]]:_(<vscale x 4 x s1>) = G_ICMP intpred(ne), [[BITCAST]](<vscale x 4 x s8>), [[SPLAT_VECTOR4]]
+ ; RV32-NEXT: $v8 = COPY [[ICMP]](<vscale x 4 x s1>)
+ ; RV32-NEXT: PseudoRET implicit $v8
+ ;
+ ; RV64-LABEL: name: insert_subvector_nxv2i1_nxv4i1
+ ; RV64: [[DEF:%[0-9]+]]:_(<vscale x 4 x s1>) = G_IMPLICIT_DEF
+ ; RV64-NEXT: [[DEF1:%[0-9]+]]:_(<vscale x 2 x s1>) = G_IMPLICIT_DEF
+ ; RV64-NEXT: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 0
+ ; RV64-NEXT: [[SPLAT_VECTOR:%[0-9]+]]:_(<vscale x 4 x s8>) = G_SPLAT_VECTOR [[C]](s32)
+ ; RV64-NEXT: [[C1:%[0-9]+]]:_(s32) = G_CONSTANT i32 1
+ ; RV64-NEXT: [[SPLAT_VECTOR1:%[0-9]+]]:_(<vscale x 4 x s8>) = G_SPLAT_VECTOR [[C1]](s32)
+ ; RV64-NEXT: [[SELECT:%[0-9]+]]:_(<vscale x 4 x s8>) = G_SELECT [[DEF]](<vscale x 4 x s1>), [[SPLAT_VECTOR1]], [[SPLAT_VECTOR]]
+ ; RV64-NEXT: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 0
+ ; RV64-NEXT: [[SPLAT_VECTOR2:%[0-9]+]]:_(<vscale x 2 x s8>) = G_SPLAT_VECTOR [[C2]](s32)
+ ; RV64-NEXT: [[C3:%[0-9]+]]:_(s32) = G_CONSTANT i32 1
+ ; RV64-NEXT: [[SPLAT_VECTOR3:%[0-9]+]]:_(<vscale x 2 x s8>) = G_SPLAT_VECTOR [[C3]](s32)
+ ; RV64-NEXT: [[SELECT1:%[0-9]+]]:_(<vscale x 2 x s8>) = G_SELECT [[DEF1]](<vscale x 2 x s1>), [[SPLAT_VECTOR3]], [[SPLAT_VECTOR2]]
+ ; RV64-NEXT: [[VMSET_VL:%[0-9]+]]:_(<vscale x 4 x s1>) = G_VMSET_VL $x0
+ ; RV64-NEXT: [[READ_VLENB:%[0-9]+]]:_(s32) = G_READ_VLENB
+ ; RV64-NEXT: [[C4:%[0-9]+]]:_(s32) = G_CONSTANT i32 2
+ ; RV64-NEXT: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[READ_VLENB]], [[C4]](s32)
+ ; RV64-NEXT: [[READ_VLENB1:%[0-9]+]]:_(s32) = G_READ_VLENB
+ ; RV64-NEXT: [[C5:%[0-9]+]]:_(s32) = G_CONSTANT i32 2
+ ; RV64-NEXT: [[LSHR1:%[0-9]+]]:_(s32) = G_LSHR [[READ_VLENB1]], [[C5]](s32)
+ ; RV64-NEXT: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[LSHR1]], [[LSHR]]
+ ; RV64-NEXT: [[VSLIDEUP_VL:%[0-9]+]]:_(<vscale x 4 x s8>) = G_VSLIDEUP_VL [[SELECT]], [[SELECT1]], [[LSHR1]](s32), [[VMSET_VL]](<vscale x 4 x s1>), [[ADD]](s32), 1
+ ; RV64-NEXT: [[BITCAST:%[0-9]+]]:_(<vscale x 4 x s8>) = G_BITCAST [[VSLIDEUP_VL]](<vscale x 4 x s8>)
+ ; RV64-NEXT: [[C6:%[0-9]+]]:_(s32) = G_CONSTANT i32 0
+ ; RV64-NEXT: [[SPLAT_VECTOR4:%[0-9]+]]:_(<vscale x 4 x s8>) = G_SPLAT_VECTOR [[C6]](s32)
+ ; RV64-NEXT: [[ICMP:%[0-9]+]]:_(<vscale x 4 x s1>) = G_ICMP intpred(ne), [[BITCAST]](<vscale x 4 x s8>), [[SPLAT_VECTOR4]]
+ ; RV64-NEXT: $v8 = COPY [[ICMP]](<vscale x 4 x s1>)
+ ; RV64-NEXT: PseudoRET implicit $v8
+ %0:_(<vscale x 4 x s1>) = G_IMPLICIT_DEF
+ %1:_(<vscale x 2 x s1>) = G_IMPLICIT_DEF
+ %2:_(<vscale x 4 x s1>) = G_INSERT_SUBVECTOR %0(<vscale x 4 x s1>), %1, 2
+ $v8 = COPY %2(<vscale x 4 x s1>)
+ PseudoRET implicit $v8
+...
+---
+name: insert_subvector_nxv4i1_nxv8i1
+legalized: false
+tracksRegLiveness: true
+body: |
+ bb.0.entry:
+ ; RV32-LABEL: name: insert_subvector_nxv4i1_nxv8i1
+ ; RV32: [[DEF:%[0-9]+]]:_(<vscale x 8 x s1>) = G_IMPLICIT_DEF
+ ; RV32-NEXT: [[DEF1:%[0-9]+]]:_(<vscale x 2 x s1>) = G_IMPLICIT_DEF
+ ; RV32-NEXT: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 0
+ ; RV32-NEXT: [[ANYEXT:%[0-9]+]]:_(s64) = G_ANYEXT [[C]](s32)
+ ; RV32-NEXT: [[SPLAT_VECTOR:%[0-9]+]]:_(<vscale x 8 x s8>) = G_SPLAT_VECTOR [[ANYEXT]](s64)
+ ; RV32-NEXT: [[C1:%[0-9]+]]:_(s32) = G_CONSTANT i32 1
+ ; RV32-NEXT: [[ANYEXT1:%[0-9]+]]:_(s64) = G_ANYEXT [[C1]](s32)
+ ; RV32-NEXT: [[SPLAT_VECTOR1:%[0-9]+]]:_(<vscale x 8 x s8>) = G_SPLAT_VECTOR [[ANYEXT1]](s64)
+ ; RV32-NEXT: [[SELECT:%[0-9]+]]:_(<vscale x 8 x s8>) = G_SELECT [[DEF]](<vscale x 8 x s1>), [[SPLAT_VECTOR1]], [[SPLAT_VECTOR]]
+ ; RV32-NEXT: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 0
+ ; RV32-NEXT: [[ANYEXT2:%[0-9]+]]:_(s64) = G_ANYEXT [[C2]](s32)
+ ; RV32-NEXT: [[SPLAT_VECTOR2:%[0-9]+]]:_(<vscale x 2 x s8>) = G_SPLAT_VECTOR [[ANYEXT2]](s64)
+ ; RV32-NEXT: [[C3:%[0-9]+]]:_(s32) = G_CONSTANT i32 1
+ ; RV32-NEXT: [[ANYEXT3:%[0-9]+]]:_(s64) = G_ANYEXT [[C3]](s32)
+ ; RV32-NEXT: [[SPLAT_VECTOR3:%[0-9]+]]:_(<vscale x 2 x s8>) = G_SPLAT_VECTOR [[ANYEXT3]](s64)
+ ; RV32-NEXT: [[SELECT1:%[0-9]+]]:_(<vscale x 2 x s8>) = G_SELECT [[DEF1]](<vscale x 2 x s1>), [[SPLAT_VECTOR3]], [[SPLAT_VECTOR2]]
+ ; RV32-NEXT: [[VMSET_VL:%[0-9]+]]:_(<vscale x 8 x s1>) = G_VMSET_VL $x0
+ ; RV32-NEXT: [[READ_VLENB:%[0-9]+]]:_(s64) = G_READ_VLENB
+ ; RV32-NEXT: [[C4:%[0-9]+]]:_(s64) = G_CONSTANT i64 2
+ ; RV32-NEXT: [[LSHR:%[0-9]+]]:_(s64) = G_LSHR [[READ_VLENB]], [[C4]](s64)
+ ; RV32-NEXT: [[READ_VLENB1:%[0-9]+]]:_(s64) = G_READ_VLENB
+ ; RV32-NEXT: [[C5:%[0-9]+]]:_(s64) = G_CONSTANT i64 2
+ ; RV32-NEXT: [[LSHR1:%[0-9]+]]:_(s64) = G_LSHR [[READ_VLENB1]], [[C5]](s64)
+ ; RV32-NEXT: [[ADD:%[0-9]+]]:_(s64) = G_ADD [[LSHR1]], [[LSHR]]
+ ; RV32-NEXT: [[VSLIDEUP_VL:%[0-9]+]]:_(<vscale x 8 x s8>) = G_VSLIDEUP_VL [[SELECT]], [[SELECT1]], [[LSHR1]](s64), [[VMSET_VL]](<vscale x 8 x s1>), [[ADD]](s64), 0
+ ; RV32-NEXT: [[BITCAST:%[0-9]+]]:_(<vscale x 8 x s8>) = G_BITCAST [[VSLIDEUP_VL]](<vscale x 8 x s8>)
+ ; RV32-NEXT: [[C6:%[0-9]+]]:_(s32) = G_CONSTANT i32 0
+ ; RV32-NEXT: [[ANYEXT4:%[0-9]+]]:_(s64) = G_ANYEXT [[C6]](s32)
+ ; RV32-NEXT: [[SPLAT_VECTOR4:%[0-9]+]]:_(<vscale x 8 x s8>) = G_SPLAT_VECTOR [[ANYEXT4]](s64)
+ ; RV32-NEXT: [[ICMP:%[0-9]+]]:_(<vscale x 8 x s1>) = G_ICMP intpred(ne), [[BITCAST]](<vscale x 8 x s8>), [[SPLAT_VECTOR4]]
+ ; RV32-NEXT: $v8 = COPY [[ICMP]](<vscale x 8 x s1>)
+ ; RV32-NEXT: PseudoRET implicit $v8
+ ;
+ ; RV64-LABEL: name: insert_subvector_nxv4i1_nxv8i1
+ ; RV64: [[DEF:%[0-9]+]]:_(<vscale x 8 x s1>) = G_IMPLICIT_DEF
+ ; RV64-NEXT: [[DEF1:%[0-9]+]]:_(<vscale x 2 x s1>) = G_IMPLICIT_DEF
+ ; RV64-NEXT: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 0
+ ; RV64-NEXT: [[SPLAT_VECTOR:%[0-9]+]]:_(<vscale x 8 x s8>) = G_SPLAT_VECTOR [[C]](s32)
+ ; RV64-NEXT: [[C1:%[0-9]+]]:_(s32) = G_CONSTANT i32 1
+ ; RV64-NEXT: [[SPLAT_VECTOR1:%[0-9]+]]:_(<vscale x 8 x s8>) = G_SPLAT_VECTOR [[C1]](s32)
+ ; RV64-NEXT: [[SELECT:%[0-9]+]]:_(<vscale x 8 x s8>) = G_SELECT [[DEF]](<vscale x 8 x s1>), [[SPLAT_VECTOR1]], [[SPLAT_VECTOR]]
+ ; RV64-NEXT: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 0
+ ; RV64-NEXT: [[SPLAT_VECTOR2:%[0-9]+]]:_(<vscale x 2 x s8>) = G_SPLAT_VECTOR [[C2]](s32)
+ ; RV64-NEXT: [[C3:%[0-9]+]]:_(s32) = G_CONSTANT i32 1
+ ; RV64-NEXT: [[SPLAT_VECTOR3:%[0-9]+]]:_(<vscale x 2 x s8>) = G_SPLAT_VECTOR [[C3]](s32)
+ ; RV64-NEXT: [[SELECT1:%[0-9]+]]:_(<vscale x 2 x s8>) = G_SELECT [[DEF1]](<vscale x 2 x s1>), [[SPLAT_VECTOR3]], [[SPLAT_VECTOR2]]
+ ; RV64-NEXT: [[VMSET_VL:%[0-9]+]]:_(<vscale x 8 x s1>) = G_VMSET_VL $x0
+ ; RV64-NEXT: [[READ_VLENB:%[0-9]+]]:_(s32) = G_READ_VLENB
+ ; RV64-NEXT: [[C4:%[0-9]+]]:_(s32) = G_CONSTANT i32 2
+ ; RV64-NEXT: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[READ_...
[truncated]
How does one create a G_INSERT_SUBVECTOR today? There are no callers of it.
It will be needed by G_INSERT_VECTOR_ELT. This is a precursor to that patch.
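For what it's worth, here is a minimal sketch of how a producer could build one through MachineIRBuilder; the types, index, and header path are illustrative, not taken from this patch.

// Sketch only: builds %dst:_(<vscale x 4 x s1>) = G_INSERT_SUBVECTOR %big, %lit, 2.
#include "llvm/CodeGen/GlobalISel/MachineIRBuilder.h"
#include "llvm/CodeGenTypes/LowLevelType.h"

using namespace llvm;

static Register buildExampleInsertSubvector(MachineIRBuilder &MIB) {
  LLT BigTy = LLT::scalable_vector(4, LLT::scalar(1));
  LLT LitTy = LLT::scalable_vector(2, LLT::scalar(1));
  auto Big = MIB.buildUndef(BigTy);
  auto Lit = MIB.buildUndef(LitTy);
  // The index is a compile-time immediate, counted in elements.
  return MIB.buildInsertSubvector(BigTy, Big, Lit, /*Index=*/2).getReg(0);
}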
IRTranslator support for G_INSERT_SUBVECTOR is missing.
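A hedged sketch of what that missing support might look like, assuming llvm.vector.insert with a constant index is the producer and that it would be handled in IRTranslator::translateKnownIntrinsic (both assumptions, not code from this PR):

// Illustrative case inside IRTranslator::translateKnownIntrinsic:
case Intrinsic::vector_insert: {
  Register Dst = getOrCreateVReg(CI);
  Register Vec = getOrCreateVReg(*CI.getOperand(0));
  Register Sub = getOrCreateVReg(*CI.getOperand(1));
  // llvm.vector.insert takes a constant element index.
  uint64_t Idx = cast<ConstantInt>(CI.getOperand(2))->getZExtValue();
  MIRBuilder.buildInsertSubvector(Dst, Vec, Sub, Idx);
  return true;
}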
I'm not sure I believe this belongs in the legalizer. This feels like custom instruction selection. I think it's similar to the reasons AArch64 created AArch64PostLegalizerLowering. Perhaps we should be doing the same?
✅ With the latest revision this PR passed the C/C++ code formatter.
What is the difference between hosting this code in the legalizer, in a post-legalizer lowering pass, or in instruction selection?
Emitting a bunch of target-specific opcodes from the legalizer makes it very hard to do any optimizations that may be exposed by legalizing the types. Doing it in instruction selection means we can't reuse tablegen patterns; there are up to 34 different variants of instructions considering element width and vector width. Lowering allows us to select target-specific generic nodes after post-legalizer optimizations and still use tablegen patterns.
; RV64-NEXT: [[VSCALE1:%[0-9]+]]:_(s64) = G_VSCALE i64 1
; RV64-NEXT: [[ADD:%[0-9]+]]:_(s64) = G_ADD [[VSCALE1]], [[VSCALE]]
; RV64-NEXT: [[VSLIDEUP_VL:%[0-9]+]]:_(<vscale x 4 x s16>) = G_VSLIDEUP_VL [[DEF]], [[DEF]], [[VSCALE1]](s64), [[VMSET_VL]](<vscale x 4 x s1>), [[ADD]](s64), 0
; RV64-NEXT: [[BITCAST:%[0-9]+]]:_(<vscale x 4 x s16>) = G_BITCAST [[VSLIDEUP_VL]](<vscale x 4 x s16>)
This bitcast is unnecessary
I stated before: the odd thing about the lowering passes is that they are free-standing functions. There is no CombinerHelper and no access to the LegalizerInfo.
The lowering combines cannot really fail. How about:
I don't understand the lowering passes. They seem like they're just a hack, and I don't understand why they are needed.
See discussion in #108991.
Thanks @tschuett, I forgot to commit the changes you mention. And thanks also for updating the title. I have to post my patch to legalize extract subvector, because this patch depends on that one.
I've updated this patch based on the discussions we had during the legalization of EXTRACT_SUBVECTOR. Now ready for review :)
LitTy = LitTy.changeElementType(LLT::scalar(8));
auto BigZExt = MIB.buildZExt(BigTy, BigVec);
auto LitZExt = MIB.buildZExt(LitTy, LitVec);
auto Insert = MIB.buildInsertSubvector(BigTy, BigZExt, LitZExt, Idx);
Could make this widenScalar?
But we're widening a vector?
"Widening a vector" would mean changing the element count. Widening scalar means changing the element size even if its part of a vector.
/// The operation should be implemented in terms of a wider scalar
/// base-type. For example a <2 x s8> add could be implemented as a <2
/// x s32> add (ignoring the high bits).
WidenScalar,
I see, thanks for the clarification.
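To make the widenScalar suggestion concrete, here is a sketch of promoting s1 elements to s8 through the rule set; the predicate is illustrative and deliberately ignores the index/undef question discussed below.

// Illustrative rule: widen s1 vector elements to s8 instead of hand-written
// zero-extension in the custom legalizer.
getActionDefinitionsBuilder(TargetOpcode::G_INSERT_SUBVECTOR)
    .widenScalarIf(
        [](const LegalityQuery &Query) {
          return Query.Types[0].isVector() &&
                 Query.Types[0].getElementType() == LLT::scalar(1);
        },
        LegalizeMutations::changeElementTo(0, LLT::scalar(8)));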
if (LitTy.getElementType() == LLT::scalar(1)) {
  auto BigTyMinElts = BigTy.getElementCount().getKnownMinValue();
  auto LitTyMinElts = LitTy.getElementCount().getKnownMinValue();
  if (BigTyMinElts >= 8 && LitTyMinElts >= 8)
Could do this in the legalization rule instead of in the custom logic
Are you sure? Above this code is:
if (Idx == 0 ||
    MRI.getVRegDef(BigVec)->getOpcode() == TargetOpcode::G_IMPLICIT_DEF)
This means that the legalization rule needs to check the immediate value to make sure it's not zero, which I couldn't figure out how to do, and we need to check that BigVec is not a G_IMPLICIT_DEF, which I also don't know how to do as part of the legalization rule.
Yes, if you require that, you can't do it in the rule. But do you really need this to be in the rule? I would expect folding out the undef part to only require an optimizing combine.
Maybe we can do the undef check in a combine, but I still don't think it's possible to check that the index is not zero, is it? I don't think it's possible to have a legalIf that checks whether the idx is zero.
Also, if we promote the INSERT_SUBVECTOR to do the combine later, I am concerned that we will be left with an INSERT_SUBVECTOR that has a larger type than needed.
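For reference, a sketch of the check such an optimizing combine might use; the function name is hypothetical, while the MRI queries are standard.

// Hypothetical matcher: an insert at index 0, or into an undef base, can keep
// its i1 element type, so no widening is required.
static bool insertSubvectorCanStayI1(const MachineInstr &MI,
                                     const MachineRegisterInfo &MRI) {
  assert(MI.getOpcode() == TargetOpcode::G_INSERT_SUBVECTOR);
  const MachineInstr *BaseDef = MRI.getVRegDef(MI.getOperand(1).getReg());
  return MI.getOperand(3).getImm() == 0 ||
         BaseDef->getOpcode() == TargetOpcode::G_IMPLICIT_DEF;
}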
if (TypeSize::isKnownGT(BigTy.getSizeInBits(), InterLitTy.getSizeInBits()))
  MIB.buildInsertSubvector(Dst, BigVec, LitVec, AlignedIdx);
else
  Inserted->getOperand(0).setReg(Dst);
Didn't call the observer before modifying the instruction in place. Can you build the instruction above with the correct register?
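One way to address that, sketched under the assumption that Inserted already holds the final value in the else case: build the result directly on Dst instead of rewriting the def afterwards.

// Create the final instruction with Dst up front; no in-place operand
// mutation, so no observer notification is needed.
if (TypeSize::isKnownGT(BigTy.getSizeInBits(), InterLitTy.getSizeInBits()))
  MIB.buildInsertSubvector(Dst, BigVec, LitVec, AlignedIdx);
else
  MIB.buildCopy(Dst, Inserted.getReg(0));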
This code is heavily based on the SelectionDAG lowerINSERT_SUBVECTOR code.
LLVM Buildbot has detected a new failure on a builder. Full details are available at: https://lab.llvm.org/buildbot/#/builders/16/builds/7470. Here is the relevant piece of the build log for reference: