[RISCV] Shrink vslidedown when lowering fixed extract_subvector

lukel97 · lukel97 · commit e3534d365e3f · 2023-09-11T17:18:18.000+01:00
As noted in llvm#65392 (comment), when lowering an extract of a fixed length vector from another vector, we don't need to perform the vslidedown on the full vector type. Instead we can extract the smallest subregister that contains the subvector to be extracted and perform the vslidedown with a smaller LMUL. E.g, with +Zvl128b: v2i64 = extract_subvector nxv4i64, 2 is currently lowered as vsetivli zero, 2, e64, m4, ta, ma vslidedown.vi v8, v8, 2 This patch shrinks the vslidedown to LMUL=2: vsetivli zero, 2, e64, m2, ta, ma vslidedown.vi v8, v8, 2 Because we know that there's at least 128*2=256 bits in v8 at LMUL=2, and we only need the first 256 bits to extract a v2i64 at index 2. lowerEXTRACT_VECTOR_ELT already has this logic, so this extracts it out and reuses it. I've split this out into a separate PR rather than include it in llvm#65392, with the hope that we'll be able to generalize it later.
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -8751,6 +8751,39 @@ SDValue RISCVTargetLowering::lowerEXTRACT_SUBVECTOR(SDValue Op,
       ContainerVT = getContainerForFixedLengthVector(VecVT);
       Vec = convertToScalableVector(ContainerVT, Vec, DAG, Subtarget);
     }
+
+    // The minimum number of elements for a scalable vector type, e.g. nxv1i32
+    // is not legal on Zve32x.
+    const uint64_t MinLegalNumElts =
+        RISCV::RVVBitsPerBlock / Subtarget.getELen();
+    const uint64_t MinVscale =
+        Subtarget.getRealMinVLen() / RISCV::RVVBitsPerBlock;
+
+    // Even if we don't know the exact subregister the subvector is going to
+    // reside in, we know that the subvector is located within the first N bits
+    // of Vec:
+    //
+    // N = (OrigIdx + SubVecVT.getVectorNumElements()) * EltSizeInBits
+    //   = MinVscale * MinEltsNeeded * EltSizeInBits
+    //
+    // From this we can work out the smallest type that contains everything we
+    // need to extract, <vscale x MinEltsNeeded x Elt>
+    uint64_t MinEltsNeeded =
+        (OrigIdx + SubVecVT.getVectorNumElements()) / MinVscale;
+
+    // Round up the number of elements so it's a valid power of 2 scalable
+    // vector type, and make sure it's not less than smallest legal vector type.
+    MinEltsNeeded = std::max(MinLegalNumElts, PowerOf2Ceil(MinEltsNeeded));
+
+    assert(MinEltsNeeded <= ContainerVT.getVectorMinNumElements());
+
+    // Shrink down Vec so we're performing the slidedown on the smallest
+    // possible type.
+    ContainerVT = MVT::getScalableVectorVT(ContainerVT.getVectorElementType(),
+                                           MinEltsNeeded);
+    Vec = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, ContainerVT, Vec,
+                      DAG.getVectorIdxConstant(0, DL));
+
     SDValue Mask =
         getDefaultVLOps(VecVT, ContainerVT, DL, DAG, Subtarget).first;
     // Set the vector length to only the number of elements we care about. This