[LegalizeVectorTypes] When widening don't check for libcalls if promoted #111297
When widening some FP ops, LegalizeVectorTypes checks whether the widened op may be scalarized and then turned into a series of libcalls; if so, it unrolls early to avoid unnecessary libcalls on the padded undef elements. To decide whether the op will be scalarized, it checks if the widened op is legal or custom, but promoted ops also avoid scalarization. This patch relaxes the check to account for promotion, which prevents some illegal vector types on RISC-V from being scalarized when they could be widened.
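The change in the condition can be sketched outside of LLVM as a small predicate. This is only a toy model: the `Action` enum and the functions `unrollEarlyOld`, `unrollEarlyNew`, and `checkPromotedExample` are hypothetical stand-ins for the `TargetLowering` queries named in the patch, and the assumption that the widened type is marked Promote while the scalar type is marked Expand is inferred from the description, not taken from the RISC-V backend source.

```cpp
#include <cassert>

// Hypothetical stand-in for the target's per-(opcode, type) legalize action.
enum class Action { Legal, Promote, Expand, LibCall, Custom };

// Old condition: unroll early unless the widened op is Legal or Custom.
bool unrollEarlyOld(Action wideVecAction, Action scalarAction) {
  bool legalOrCustom =
      wideVecAction == Action::Legal || wideVecAction == Action::Custom;
  return !legalOrCustom && scalarAction == Action::Expand;
}

// New condition: a Promote action on the widened type also means the op
// will not be scalarized, so it no longer triggers early unrolling.
bool unrollEarlyNew(Action wideVecAction, Action scalarAction) {
  bool legalOrCustomOrPromote = wideVecAction == Action::Legal ||
                                wideVecAction == Action::Custom ||
                                wideVecAction == Action::Promote;
  return !legalOrCustomOrPromote && scalarAction == Action::Expand;
}

// Assumed scenario: widened vector op is Promote, scalar op is Expand.
void checkPromotedExample() {
  assert(unrollEarlyOld(Action::Promote, Action::Expand));   // unrolled before
  assert(!unrollEarlyNew(Action::Promote, Action::Expand));  // widened now
}
```

The two predicates differ only when the widened type's action is Promote; for every other combination they agree, so the relaxation is narrow.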
@llvm/pr-subscribers-llvm-selectiondag
Author: Luke Lau (lukel97)
Patch is 78.13 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/111297.diff
2 Files Affected:
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
index 0a22f06271984e..e7ae989fcc3494 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
@@ -4441,7 +4441,7 @@ void DAGTypeLegalizer::WidenVectorResult(SDNode *N, unsigned ResNo) {
// libcalls on the undef elements.
EVT VT = N->getValueType(0);
EVT WideVecVT = TLI.getTypeToTransformTo(*DAG.getContext(), VT);
- if (!TLI.isOperationLegalOrCustom(N->getOpcode(), WideVecVT) &&
+ if (!TLI.isOperationLegalOrCustomOrPromote(N->getOpcode(), WideVecVT) &&
TLI.isOperationExpand(N->getOpcode(), VT.getScalarType())) {
Res = DAG.UnrollVectorOp(N, WideVecVT.getVectorNumElements());
return true;
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp.ll
index ea7829f2d6c658..297afd9fc96f9d 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp.ll
@@ -1192,259 +1192,18 @@ define void @sqrt_v6f16(ptr %x) {
; ZVFH-NEXT: vse16.v v8, (a0)
; ZVFH-NEXT: ret
;
-; RV32-ZVFHMIN-LABEL: sqrt_v6f16:
-; RV32-ZVFHMIN: # %bb.0:
-; RV32-ZVFHMIN-NEXT: addi sp, sp, -48
-; RV32-ZVFHMIN-NEXT: .cfi_def_cfa_offset 48
-; RV32-ZVFHMIN-NEXT: sw ra, 44(sp) # 4-byte Folded Spill
-; RV32-ZVFHMIN-NEXT: sw s0, 40(sp) # 4-byte Folded Spill
-; RV32-ZVFHMIN-NEXT: sw s1, 36(sp) # 4-byte Folded Spill
-; RV32-ZVFHMIN-NEXT: fsd fs0, 24(sp) # 8-byte Folded Spill
-; RV32-ZVFHMIN-NEXT: .cfi_offset ra, -4
-; RV32-ZVFHMIN-NEXT: .cfi_offset s0, -8
-; RV32-ZVFHMIN-NEXT: .cfi_offset s1, -12
-; RV32-ZVFHMIN-NEXT: .cfi_offset fs0, -24
-; RV32-ZVFHMIN-NEXT: csrr a1, vlenb
-; RV32-ZVFHMIN-NEXT: slli a1, a1, 1
-; RV32-ZVFHMIN-NEXT: sub sp, sp, a1
-; RV32-ZVFHMIN-NEXT: .cfi_escape 0x0f, 0x0d, 0x72, 0x00, 0x11, 0x30, 0x22, 0x11, 0x02, 0x92, 0xa2, 0x38, 0x00, 0x1e, 0x22 # sp + 48 + 2 * vlenb
-; RV32-ZVFHMIN-NEXT: mv s0, a0
-; RV32-ZVFHMIN-NEXT: vsetivli zero, 6, e16, m1, ta, ma
-; RV32-ZVFHMIN-NEXT: vle16.v v8, (a0)
-; RV32-ZVFHMIN-NEXT: csrr a0, vlenb
-; RV32-ZVFHMIN-NEXT: add a0, sp, a0
-; RV32-ZVFHMIN-NEXT: addi a0, a0, 16
-; RV32-ZVFHMIN-NEXT: vs1r.v v8, (a0) # Unknown-size Folded Spill
-; RV32-ZVFHMIN-NEXT: vmv.x.s a0, v8
-; RV32-ZVFHMIN-NEXT: fmv.w.x fa0, a0
-; RV32-ZVFHMIN-NEXT: call __extendhfsf2
-; RV32-ZVFHMIN-NEXT: fmv.s fs0, fa0
-; RV32-ZVFHMIN-NEXT: csrr a0, vlenb
-; RV32-ZVFHMIN-NEXT: add a0, sp, a0
-; RV32-ZVFHMIN-NEXT: addi a0, a0, 16
-; RV32-ZVFHMIN-NEXT: vl1r.v v8, (a0) # Unknown-size Folded Reload
-; RV32-ZVFHMIN-NEXT: vsetivli zero, 1, e16, m1, ta, ma
-; RV32-ZVFHMIN-NEXT: vslidedown.vi v8, v8, 1
-; RV32-ZVFHMIN-NEXT: vmv.x.s a0, v8
-; RV32-ZVFHMIN-NEXT: fmv.w.x fa0, a0
-; RV32-ZVFHMIN-NEXT: call __extendhfsf2
-; RV32-ZVFHMIN-NEXT: fsqrt.s fa0, fa0
-; RV32-ZVFHMIN-NEXT: call __truncsfhf2
-; RV32-ZVFHMIN-NEXT: fsqrt.s fa5, fs0
-; RV32-ZVFHMIN-NEXT: fmv.x.w s1, fa0
-; RV32-ZVFHMIN-NEXT: fmv.s fa0, fa5
-; RV32-ZVFHMIN-NEXT: call __truncsfhf2
-; RV32-ZVFHMIN-NEXT: fmv.x.w a0, fa0
-; RV32-ZVFHMIN-NEXT: vsetivli zero, 8, e16, m1, ta, ma
-; RV32-ZVFHMIN-NEXT: vmv.v.x v8, a0
-; RV32-ZVFHMIN-NEXT: vslide1down.vx v8, v8, s1
-; RV32-ZVFHMIN-NEXT: addi a0, sp, 16
-; RV32-ZVFHMIN-NEXT: vs1r.v v8, (a0) # Unknown-size Folded Spill
-; RV32-ZVFHMIN-NEXT: csrr a0, vlenb
-; RV32-ZVFHMIN-NEXT: add a0, sp, a0
-; RV32-ZVFHMIN-NEXT: addi a0, a0, 16
-; RV32-ZVFHMIN-NEXT: vl1r.v v8, (a0) # Unknown-size Folded Reload
-; RV32-ZVFHMIN-NEXT: vslidedown.vi v8, v8, 2
-; RV32-ZVFHMIN-NEXT: vmv.x.s a0, v8
-; RV32-ZVFHMIN-NEXT: fmv.w.x fa0, a0
-; RV32-ZVFHMIN-NEXT: call __extendhfsf2
-; RV32-ZVFHMIN-NEXT: fsqrt.s fa0, fa0
-; RV32-ZVFHMIN-NEXT: call __truncsfhf2
-; RV32-ZVFHMIN-NEXT: fmv.x.w a0, fa0
-; RV32-ZVFHMIN-NEXT: addi a1, sp, 16
-; RV32-ZVFHMIN-NEXT: vl1r.v v8, (a1) # Unknown-size Folded Reload
-; RV32-ZVFHMIN-NEXT: vsetivli zero, 8, e16, m1, ta, ma
-; RV32-ZVFHMIN-NEXT: vslide1down.vx v8, v8, a0
-; RV32-ZVFHMIN-NEXT: addi a0, sp, 16
-; RV32-ZVFHMIN-NEXT: vs1r.v v8, (a0) # Unknown-size Folded Spill
-; RV32-ZVFHMIN-NEXT: csrr a0, vlenb
-; RV32-ZVFHMIN-NEXT: add a0, sp, a0
-; RV32-ZVFHMIN-NEXT: addi a0, a0, 16
-; RV32-ZVFHMIN-NEXT: vl1r.v v8, (a0) # Unknown-size Folded Reload
-; RV32-ZVFHMIN-NEXT: vslidedown.vi v8, v8, 3
-; RV32-ZVFHMIN-NEXT: vmv.x.s a0, v8
-; RV32-ZVFHMIN-NEXT: fmv.w.x fa0, a0
-; RV32-ZVFHMIN-NEXT: call __extendhfsf2
-; RV32-ZVFHMIN-NEXT: fsqrt.s fa0, fa0
-; RV32-ZVFHMIN-NEXT: call __truncsfhf2
-; RV32-ZVFHMIN-NEXT: fmv.x.w a0, fa0
-; RV32-ZVFHMIN-NEXT: addi a1, sp, 16
-; RV32-ZVFHMIN-NEXT: vl1r.v v8, (a1) # Unknown-size Folded Reload
-; RV32-ZVFHMIN-NEXT: vsetivli zero, 8, e16, m1, ta, ma
-; RV32-ZVFHMIN-NEXT: vslide1down.vx v8, v8, a0
-; RV32-ZVFHMIN-NEXT: addi a0, sp, 16
-; RV32-ZVFHMIN-NEXT: vs1r.v v8, (a0) # Unknown-size Folded Spill
-; RV32-ZVFHMIN-NEXT: csrr a0, vlenb
-; RV32-ZVFHMIN-NEXT: add a0, sp, a0
-; RV32-ZVFHMIN-NEXT: addi a0, a0, 16
-; RV32-ZVFHMIN-NEXT: vl1r.v v8, (a0) # Unknown-size Folded Reload
-; RV32-ZVFHMIN-NEXT: vslidedown.vi v8, v8, 4
-; RV32-ZVFHMIN-NEXT: vmv.x.s a0, v8
-; RV32-ZVFHMIN-NEXT: fmv.w.x fa0, a0
-; RV32-ZVFHMIN-NEXT: call __extendhfsf2
-; RV32-ZVFHMIN-NEXT: fsqrt.s fa0, fa0
-; RV32-ZVFHMIN-NEXT: call __truncsfhf2
-; RV32-ZVFHMIN-NEXT: fmv.x.w a0, fa0
-; RV32-ZVFHMIN-NEXT: addi a1, sp, 16
-; RV32-ZVFHMIN-NEXT: vl1r.v v8, (a1) # Unknown-size Folded Reload
-; RV32-ZVFHMIN-NEXT: vsetivli zero, 8, e16, m1, ta, ma
-; RV32-ZVFHMIN-NEXT: vslide1down.vx v8, v8, a0
-; RV32-ZVFHMIN-NEXT: addi a0, sp, 16
-; RV32-ZVFHMIN-NEXT: vs1r.v v8, (a0) # Unknown-size Folded Spill
-; RV32-ZVFHMIN-NEXT: csrr a0, vlenb
-; RV32-ZVFHMIN-NEXT: add a0, sp, a0
-; RV32-ZVFHMIN-NEXT: addi a0, a0, 16
-; RV32-ZVFHMIN-NEXT: vl1r.v v8, (a0) # Unknown-size Folded Reload
-; RV32-ZVFHMIN-NEXT: vslidedown.vi v8, v8, 5
-; RV32-ZVFHMIN-NEXT: vmv.x.s a0, v8
-; RV32-ZVFHMIN-NEXT: fmv.w.x fa0, a0
-; RV32-ZVFHMIN-NEXT: call __extendhfsf2
-; RV32-ZVFHMIN-NEXT: fsqrt.s fa0, fa0
-; RV32-ZVFHMIN-NEXT: call __truncsfhf2
-; RV32-ZVFHMIN-NEXT: fmv.x.w a0, fa0
-; RV32-ZVFHMIN-NEXT: addi a1, sp, 16
-; RV32-ZVFHMIN-NEXT: vl1r.v v8, (a1) # Unknown-size Folded Reload
-; RV32-ZVFHMIN-NEXT: vsetivli zero, 8, e16, m1, ta, ma
-; RV32-ZVFHMIN-NEXT: vslide1down.vx v8, v8, a0
-; RV32-ZVFHMIN-NEXT: vsetivli zero, 6, e16, m1, ta, ma
-; RV32-ZVFHMIN-NEXT: vslidedown.vi v8, v8, 2
-; RV32-ZVFHMIN-NEXT: vse16.v v8, (s0)
-; RV32-ZVFHMIN-NEXT: csrr a0, vlenb
-; RV32-ZVFHMIN-NEXT: slli a0, a0, 1
-; RV32-ZVFHMIN-NEXT: add sp, sp, a0
-; RV32-ZVFHMIN-NEXT: lw ra, 44(sp) # 4-byte Folded Reload
-; RV32-ZVFHMIN-NEXT: lw s0, 40(sp) # 4-byte Folded Reload
-; RV32-ZVFHMIN-NEXT: lw s1, 36(sp) # 4-byte Folded Reload
-; RV32-ZVFHMIN-NEXT: fld fs0, 24(sp) # 8-byte Folded Reload
-; RV32-ZVFHMIN-NEXT: addi sp, sp, 48
-; RV32-ZVFHMIN-NEXT: ret
-;
-; RV64-ZVFHMIN-LABEL: sqrt_v6f16:
-; RV64-ZVFHMIN: # %bb.0:
-; RV64-ZVFHMIN-NEXT: addi sp, sp, -48
-; RV64-ZVFHMIN-NEXT: .cfi_def_cfa_offset 48
-; RV64-ZVFHMIN-NEXT: sd ra, 40(sp) # 8-byte Folded Spill
-; RV64-ZVFHMIN-NEXT: sd s0, 32(sp) # 8-byte Folded Spill
-; RV64-ZVFHMIN-NEXT: sd s1, 24(sp) # 8-byte Folded Spill
-; RV64-ZVFHMIN-NEXT: fsd fs0, 16(sp) # 8-byte Folded Spill
-; RV64-ZVFHMIN-NEXT: .cfi_offset ra, -8
-; RV64-ZVFHMIN-NEXT: .cfi_offset s0, -16
-; RV64-ZVFHMIN-NEXT: .cfi_offset s1, -24
-; RV64-ZVFHMIN-NEXT: .cfi_offset fs0, -32
-; RV64-ZVFHMIN-NEXT: csrr a1, vlenb
-; RV64-ZVFHMIN-NEXT: slli a1, a1, 1
-; RV64-ZVFHMIN-NEXT: sub sp, sp, a1
-; RV64-ZVFHMIN-NEXT: .cfi_escape 0x0f, 0x0d, 0x72, 0x00, 0x11, 0x30, 0x22, 0x11, 0x02, 0x92, 0xa2, 0x38, 0x00, 0x1e, 0x22 # sp + 48 + 2 * vlenb
-; RV64-ZVFHMIN-NEXT: mv s0, a0
-; RV64-ZVFHMIN-NEXT: vsetivli zero, 6, e16, m1, ta, ma
-; RV64-ZVFHMIN-NEXT: vle16.v v8, (a0)
-; RV64-ZVFHMIN-NEXT: csrr a0, vlenb
-; RV64-ZVFHMIN-NEXT: add a0, sp, a0
-; RV64-ZVFHMIN-NEXT: addi a0, a0, 16
-; RV64-ZVFHMIN-NEXT: vs1r.v v8, (a0) # Unknown-size Folded Spill
-; RV64-ZVFHMIN-NEXT: vmv.x.s a0, v8
-; RV64-ZVFHMIN-NEXT: fmv.w.x fa0, a0
-; RV64-ZVFHMIN-NEXT: call __extendhfsf2
-; RV64-ZVFHMIN-NEXT: fmv.s fs0, fa0
-; RV64-ZVFHMIN-NEXT: csrr a0, vlenb
-; RV64-ZVFHMIN-NEXT: add a0, sp, a0
-; RV64-ZVFHMIN-NEXT: addi a0, a0, 16
-; RV64-ZVFHMIN-NEXT: vl1r.v v8, (a0) # Unknown-size Folded Reload
-; RV64-ZVFHMIN-NEXT: vsetivli zero, 1, e16, m1, ta, ma
-; RV64-ZVFHMIN-NEXT: vslidedown.vi v8, v8, 1
-; RV64-ZVFHMIN-NEXT: vmv.x.s a0, v8
-; RV64-ZVFHMIN-NEXT: fmv.w.x fa0, a0
-; RV64-ZVFHMIN-NEXT: call __extendhfsf2
-; RV64-ZVFHMIN-NEXT: fsqrt.s fa0, fa0
-; RV64-ZVFHMIN-NEXT: call __truncsfhf2
-; RV64-ZVFHMIN-NEXT: fsqrt.s fa5, fs0
-; RV64-ZVFHMIN-NEXT: fmv.x.w s1, fa0
-; RV64-ZVFHMIN-NEXT: fmv.s fa0, fa5
-; RV64-ZVFHMIN-NEXT: call __truncsfhf2
-; RV64-ZVFHMIN-NEXT: fmv.x.w a0, fa0
-; RV64-ZVFHMIN-NEXT: vsetivli zero, 8, e16, m1, ta, ma
-; RV64-ZVFHMIN-NEXT: vmv.v.x v8, a0
-; RV64-ZVFHMIN-NEXT: vslide1down.vx v8, v8, s1
-; RV64-ZVFHMIN-NEXT: addi a0, sp, 16
-; RV64-ZVFHMIN-NEXT: vs1r.v v8, (a0) # Unknown-size Folded Spill
-; RV64-ZVFHMIN-NEXT: csrr a0, vlenb
-; RV64-ZVFHMIN-NEXT: add a0, sp, a0
-; RV64-ZVFHMIN-NEXT: addi a0, a0, 16
-; RV64-ZVFHMIN-NEXT: vl1r.v v8, (a0) # Unknown-size Folded Reload
-; RV64-ZVFHMIN-NEXT: vslidedown.vi v8, v8, 2
-; RV64-ZVFHMIN-NEXT: vmv.x.s a0, v8
-; RV64-ZVFHMIN-NEXT: fmv.w.x fa0, a0
-; RV64-ZVFHMIN-NEXT: call __extendhfsf2
-; RV64-ZVFHMIN-NEXT: fsqrt.s fa0, fa0
-; RV64-ZVFHMIN-NEXT: call __truncsfhf2
-; RV64-ZVFHMIN-NEXT: fmv.x.w a0, fa0
-; RV64-ZVFHMIN-NEXT: addi a1, sp, 16
-; RV64-ZVFHMIN-NEXT: vl1r.v v8, (a1) # Unknown-size Folded Reload
-; RV64-ZVFHMIN-NEXT: vsetivli zero, 8, e16, m1, ta, ma
-; RV64-ZVFHMIN-NEXT: vslide1down.vx v8, v8, a0
-; RV64-ZVFHMIN-NEXT: addi a0, sp, 16
-; RV64-ZVFHMIN-NEXT: vs1r.v v8, (a0) # Unknown-size Folded Spill
-; RV64-ZVFHMIN-NEXT: csrr a0, vlenb
-; RV64-ZVFHMIN-NEXT: add a0, sp, a0
-; RV64-ZVFHMIN-NEXT: addi a0, a0, 16
-; RV64-ZVFHMIN-NEXT: vl1r.v v8, (a0) # Unknown-size Folded Reload
-; RV64-ZVFHMIN-NEXT: vslidedown.vi v8, v8, 3
-; RV64-ZVFHMIN-NEXT: vmv.x.s a0, v8
-; RV64-ZVFHMIN-NEXT: fmv.w.x fa0, a0
-; RV64-ZVFHMIN-NEXT: call __extendhfsf2
-; RV64-ZVFHMIN-NEXT: fsqrt.s fa0, fa0
-; RV64-ZVFHMIN-NEXT: call __truncsfhf2
-; RV64-ZVFHMIN-NEXT: fmv.x.w a0, fa0
-; RV64-ZVFHMIN-NEXT: addi a1, sp, 16
-; RV64-ZVFHMIN-NEXT: vl1r.v v8, (a1) # Unknown-size Folded Reload
-; RV64-ZVFHMIN-NEXT: vsetivli zero, 8, e16, m1, ta, ma
-; RV64-ZVFHMIN-NEXT: vslide1down.vx v8, v8, a0
-; RV64-ZVFHMIN-NEXT: addi a0, sp, 16
-; RV64-ZVFHMIN-NEXT: vs1r.v v8, (a0) # Unknown-size Folded Spill
-; RV64-ZVFHMIN-NEXT: csrr a0, vlenb
-; RV64-ZVFHMIN-NEXT: add a0, sp, a0
-; RV64-ZVFHMIN-NEXT: addi a0, a0, 16
-; RV64-ZVFHMIN-NEXT: vl1r.v v8, (a0) # Unknown-size Folded Reload
-; RV64-ZVFHMIN-NEXT: vslidedown.vi v8, v8, 4
-; RV64-ZVFHMIN-NEXT: vmv.x.s a0, v8
-; RV64-ZVFHMIN-NEXT: fmv.w.x fa0, a0
-; RV64-ZVFHMIN-NEXT: call __extendhfsf2
-; RV64-ZVFHMIN-NEXT: fsqrt.s fa0, fa0
-; RV64-ZVFHMIN-NEXT: call __truncsfhf2
-; RV64-ZVFHMIN-NEXT: fmv.x.w a0, fa0
-; RV64-ZVFHMIN-NEXT: addi a1, sp, 16
-; RV64-ZVFHMIN-NEXT: vl1r.v v8, (a1) # Unknown-size Folded Reload
-; RV64-ZVFHMIN-NEXT: vsetivli zero, 8, e16, m1, ta, ma
-; RV64-ZVFHMIN-NEXT: vslide1down.vx v8, v8, a0
-; RV64-ZVFHMIN-NEXT: addi a0, sp, 16
-; RV64-ZVFHMIN-NEXT: vs1r.v v8, (a0) # Unknown-size Folded Spill
-; RV64-ZVFHMIN-NEXT: csrr a0, vlenb
-; RV64-ZVFHMIN-NEXT: add a0, sp, a0
-; RV64-ZVFHMIN-NEXT: addi a0, a0, 16
-; RV64-ZVFHMIN-NEXT: vl1r.v v8, (a0) # Unknown-size Folded Reload
-; RV64-ZVFHMIN-NEXT: vslidedown.vi v8, v8, 5
-; RV64-ZVFHMIN-NEXT: vmv.x.s a0, v8
-; RV64-ZVFHMIN-NEXT: fmv.w.x fa0, a0
-; RV64-ZVFHMIN-NEXT: call __extendhfsf2
-; RV64-ZVFHMIN-NEXT: fsqrt.s fa0, fa0
-; RV64-ZVFHMIN-NEXT: call __truncsfhf2
-; RV64-ZVFHMIN-NEXT: fmv.x.w a0, fa0
-; RV64-ZVFHMIN-NEXT: addi a1, sp, 16
-; RV64-ZVFHMIN-NEXT: vl1r.v v8, (a1) # Unknown-size Folded Reload
-; RV64-ZVFHMIN-NEXT: vsetivli zero, 8, e16, m1, ta, ma
-; RV64-ZVFHMIN-NEXT: vslide1down.vx v8, v8, a0
-; RV64-ZVFHMIN-NEXT: vsetivli zero, 6, e16, m1, ta, ma
-; RV64-ZVFHMIN-NEXT: vslidedown.vi v8, v8, 2
-; RV64-ZVFHMIN-NEXT: vse16.v v8, (s0)
-; RV64-ZVFHMIN-NEXT: csrr a0, vlenb
-; RV64-ZVFHMIN-NEXT: slli a0, a0, 1
-; RV64-ZVFHMIN-NEXT: add sp, sp, a0
-; RV64-ZVFHMIN-NEXT: ld ra, 40(sp) # 8-byte Folded Reload
-; RV64-ZVFHMIN-NEXT: ld s0, 32(sp) # 8-byte Folded Reload
-; RV64-ZVFHMIN-NEXT: ld s1, 24(sp) # 8-byte Folded Reload
-; RV64-ZVFHMIN-NEXT: fld fs0, 16(sp) # 8-byte Folded Reload
-; RV64-ZVFHMIN-NEXT: addi sp, sp, 48
-; RV64-ZVFHMIN-NEXT: ret
+; ZVFHMIN-LABEL: sqrt_v6f16:
+; ZVFHMIN: # %bb.0:
+; ZVFHMIN-NEXT: vsetivli zero, 6, e16, m1, ta, ma
+; ZVFHMIN-NEXT: vle16.v v8, (a0)
+; ZVFHMIN-NEXT: vsetivli zero, 8, e16, m1, ta, ma
+; ZVFHMIN-NEXT: vfwcvt.f.f.v v10, v8
+; ZVFHMIN-NEXT: vsetvli zero, zero, e32, m2, ta, ma
+; ZVFHMIN-NEXT: vfsqrt.v v8, v10
+; ZVFHMIN-NEXT: vsetivli zero, 6, e16, m1, ta, ma
+; ZVFHMIN-NEXT: vfncvt.f.f.w v10, v8
+; ZVFHMIN-NEXT: vse16.v v10, (a0)
+; ZVFHMIN-NEXT: ret
%a = load <6 x half>, ptr %x
%b = call <6 x half> @llvm.sqrt.v6f16(<6 x half> %a)
store <6 x half> %b, ptr %x
@@ -3264,337 +3023,25 @@ define void @trunc_v6f16(ptr %x) {
; ZVFH-NEXT: vse16.v v8, (a0)
; ZVFH-NEXT: ret
;
-; RV32-ZVFHMIN-LABEL: trunc_v6f16:
-; RV32-ZVFHMIN: # %bb.0:
-; RV32-ZVFHMIN-NEXT: addi sp, sp, -48
-; RV32-ZVFHMIN-NEXT: .cfi_def_cfa_offset 48
-; RV32-ZVFHMIN-NEXT: sw ra, 44(sp) # 4-byte Folded Spill
-; RV32-ZVFHMIN-NEXT: sw s0, 40(sp) # 4-byte Folded Spill
-; RV32-ZVFHMIN-NEXT: sw s1, 36(sp) # 4-byte Folded Spill
-; RV32-ZVFHMIN-NEXT: fsd fs0, 24(sp) # 8-byte Folded Spill
-; RV32-ZVFHMIN-NEXT: .cfi_offset ra, -4
-; RV32-ZVFHMIN-NEXT: .cfi_offset s0, -8
-; RV32-ZVFHMIN-NEXT: .cfi_offset s1, -12
-; RV32-ZVFHMIN-NEXT: .cfi_offset fs0, -24
-; RV32-ZVFHMIN-NEXT: csrr a1, vlenb
-; RV32-ZVFHMIN-NEXT: slli a1, a1, 1
-; RV32-ZVFHMIN-NEXT: sub sp, sp, a1
-; RV32-ZVFHMIN-NEXT: .cfi_escape 0x0f, 0x0d, 0x72, 0x00, 0x11, 0x30, 0x22, 0x11, 0x02, 0x92, 0xa2, 0x38, 0x00, 0x1e, 0x22 # sp + 48 + 2 * vlenb
-; RV32-ZVFHMIN-NEXT: mv s0, a0
-; RV32-ZVFHMIN-NEXT: vsetivli zero, 6, e16, m1, ta, ma
-; RV32-ZVFHMIN-NEXT: vle16.v v8, (a0)
-; RV32-ZVFHMIN-NEXT: csrr a0, vlenb
-; RV32-ZVFHMIN-NEXT: add a0, sp, a0
-; RV32-ZVFHMIN-NEXT: addi a0, a0, 16
-; RV32-ZVFHMIN-NEXT: vs1r.v v8, (a0) # Unknown-size Folded Spill
-; RV32-ZVFHMIN-NEXT: vslidedown.vi v8, v8, 1
-; RV32-ZVFHMIN-NEXT: vmv.x.s a0, v8
-; RV32-ZVFHMIN-NEXT: fmv.w.x fa0, a0
-; RV32-ZVFHMIN-NEXT: call __extendhfsf2
-; RV32-ZVFHMIN-NEXT: lui a0, 307200
-; RV32-ZVFHMIN-NEXT: fmv.w.x fs0, a0
-; RV32-ZVFHMIN-NEXT: fabs.s fa5, fa0
-; RV32-ZVFHMIN-NEXT: flt.s a0, fa5, fs0
-; RV32-ZVFHMIN-NEXT: beqz a0, .LBB116_2
-; RV32-ZVFHMIN-NEXT: # %bb.1:
-; RV32-ZVFHMIN-NEXT: fcvt.w.s a0, fa0, rtz
-; RV32-ZVFHMIN-NEXT: fcvt.s.w fa5, a0, rtz
-; RV32-ZVFHMIN-NEXT: fsgnj.s fa0, fa5, fa0
-; RV32-ZVFHMIN-NEXT: .LBB116_2:
-; RV32-ZVFHMIN-NEXT: call __truncsfhf2
-; RV32-ZVFHMIN-NEXT: fmv.x.w s1, fa0
-; RV32-ZVFHMIN-NEXT: csrr a0, vlenb
-; RV32-ZVFHMIN-NEXT: add a0, sp, a0
-; RV32-ZVFHMIN-NEXT: lh a0, 16(a0) # 8-byte Folded Reload
-; RV32-ZVFHMIN-NEXT: fmv.w.x fa0, a0
-; RV32-ZVFHMIN-NEXT: call __extendhfsf2
-; RV32-ZVFHMIN-NEXT: fabs.s fa5, fa0
-; RV32-ZVFHMIN-NEXT: flt.s a0, fa5, fs0
-; RV32-ZVFHMIN-NEXT: beqz a0, .LBB116_4
-; RV32-ZVFHMIN-NEXT: # %bb.3:
-; RV32-ZVFHMIN-NEXT: fcvt.w.s a0, fa0, rtz
-; RV32-ZVFHMIN-NEXT: fcvt.s.w fa5, a0, rtz
-; RV32-ZVFHMIN-NEXT: fsgnj.s fa0, fa5, fa0
-; RV32-ZVFHMIN-NEXT: .LBB116_4:
-; RV32-ZVFHMIN-NEXT: call __truncsfhf2
-; RV32-ZVFHMIN-NEXT: fmv.x.w a0, fa0
-; RV32-ZVFHMIN-NEXT: vsetivli zero, 8, e16, m1, ta, ma
-; RV32-ZVFHMIN-NEXT: vmv.v.x v8, a0
-; RV32-ZVFHMIN-NEXT: vslide1down.vx v8, v8, s1
-; RV32-ZVFHMIN-NEXT: addi a0, sp, 16
-; RV32-ZVFHMIN-NEXT: vs1r.v v8, (a0) # Unknown-size Folded Spill
-; RV32-ZVFHMIN-NEXT: csrr a0, vlenb
-; RV32-ZVFHMIN-NEXT: add a0, sp, a0
-; RV32-ZVFHMIN-NEXT: addi a0, a0, 16
-; RV32-ZVFHMIN-NEXT: vl1r.v v8, (a0) # Unknown-size Folded Reload
-; RV32-ZVFHMIN-NEXT: vslidedown.vi v8, v8, 2
-; RV32-ZVFHMIN-NEXT: vmv.x.s a0, v8
-; RV32-ZVFHMIN-NEXT: fmv.w.x fa0, a0
-; RV32-ZVFHMIN-NEXT: call __extendhfsf2
-; RV32-ZVFHMIN-NEXT: fabs.s fa5, fa0
-; RV32-ZVFHMIN-NEXT: flt.s a0, fa5, fs0
-; RV32-ZVFHMIN-NEXT: beqz a0, .LBB116_6
-; RV32-ZVFHMIN-NEXT: # %bb.5:
-; RV32-ZVFHMIN-NEXT: fcvt.w.s a0, fa0, rtz
-; RV32-ZVFHMIN-NEXT: fcvt.s.w fa5, a0, rtz
-; RV32-ZVFHMIN-NEXT: fsgnj.s fa0, fa5, fa0
-; RV32-ZVFHMIN-NEXT: .LBB116_6:
-; RV32-ZVFHMIN-NEXT: call __truncsfhf2
-; RV32-ZVFHMIN-NEXT: fmv.x.w a0, fa0
-; RV32-ZVFHMIN-NEXT: addi a1, sp, 16
-; RV32-ZVFHMIN-NEXT: vl1r.v v8, (a1) # Unknown-size Folded Reload
-; RV32-ZVFHMIN-NEXT: vsetivli zero, 8, e16, m1, ta, ma
-; RV32-ZVFHMIN-NEXT: vslide1down.vx v8, v8, a0
-; RV32-ZVFHMIN-NEXT: addi a0, sp, 16
-; RV32-ZVFHMIN-NEXT: vs1r.v v8, (a0) # Unknown-size Folded Spill
-; RV32-ZVFHMIN-NEXT: csrr a0, vlenb
-; RV32-ZVFHMIN-NEXT: add a0, sp, a0
-; RV32-ZVFHMIN-NEXT: addi a0, a0, 16
-; RV32-ZVFHMIN-NEXT: vl1r.v v8, (a0) # Unknown-size Folded Reload
-; RV32-ZVFHMIN-NEXT: vslidedown.vi v8, v8, 3
-; RV32-ZVFHMIN-NEXT: vmv.x.s a0, v8
-; RV32-ZVFHMIN-NEXT: fmv.w.x fa0, a0
-; RV32-ZVFHMIN-NEXT: call __extendhfsf2
-; RV32-ZVFHMIN-NEXT: fabs.s fa5, fa0
-; RV32-ZVFHMIN-NEXT: flt.s a0, fa5, fs0
-; RV32-ZVFHMIN-NEXT: beqz a0, .LBB116_8
-; RV32-ZVFHMIN-NEXT: # %bb.7:
-; RV32-ZVFHMIN-NEXT: fcvt.w.s a0, fa0, rtz
-; RV32-ZVFHMIN-NEXT: fcvt.s.w fa5, a0, rtz
-; RV32-ZVFHMIN-NEXT: fsgnj.s fa0, fa5, fa0
-; RV32-ZVFHMIN-NEXT: .LBB116_8:
-; RV32-ZVFHMIN-NEXT: call __truncsfhf2
-; RV32-ZVFHMIN-NEXT: fmv.x.w a0, fa0
-; RV32-ZVFHMIN-NEXT: addi a1, sp, 16
-; RV32-ZVFHMIN-NEXT: vl1r.v v8, (a1) # Unknown-size Folded Reload
-; RV32-ZVFHMIN-NEXT: vsetivli zero, 8, e16, m1, ta, ma
-; RV32-ZVFHMIN-NEXT: vslide1down.vx v8, v8, a0
-; RV32-ZVFHMIN-NEXT: addi a0, sp, 16
-; RV32-ZVFHMIN-NEXT: vs1r.v v8, (a0) # Unknown-size Folded Spill
-; RV32-ZVFHMIN-NEXT: csrr a0, vlenb
-; RV32-ZVFHMIN-NEXT: add a0, sp, a0
-; RV32-ZVFHMIN-NEXT: addi a0, a0, 16
-; RV32-ZVFHMIN-NEXT: vl1r.v v8, (a0) # Unknown-size Folded Reload
-; RV32-ZVFHMIN-NEXT: vslidedown.vi v8, v8, 4
-; RV32-ZVFHMIN-NEXT: vmv.x.s a0, v8
-; RV32-ZVFHMIN-NEXT: fmv.w.x fa0, a0
-; RV32-ZVFHMIN-NEXT: call __extendhfsf2
-; RV32-ZVFHMIN-NEXT: fabs.s fa5, fa0
-; RV32-ZVFHMIN-NEXT: flt.s a0, fa5, fs0
-; RV32-ZVFHMIN-NEXT: beqz a0, .LBB116_10
-; RV32-ZVFHMIN-NEXT: # %bb.9:
-; RV32-ZVFHMIN-NEXT: fcvt.w.s a0, fa0, rtz
-; RV32-ZVFHMIN-NEXT: fcvt.s.w fa5, a0, rtz
-; RV32-ZVFHMIN-NEXT: fsgnj.s fa0, fa5, fa0
-; RV32-ZVFHMIN-NEXT: .LBB116_10:
-; RV32-ZVFHMIN-NEXT: call __truncsfhf2
-; RV32-ZVFHMIN-NEXT: fmv.x.w a0, fa0
-; RV32-ZVFHMIN-NEXT: addi a1, sp, 16
-; RV32-ZVFHMIN-NEXT: vl1r.v v8, (a1) # Unknown-size Folded Reload
-; RV32-ZVFHMIN-NEXT: vsetivli zero, 8, e16, m1, ta, ma
-; RV32-ZVFHMIN-NEXT: vslide1down.vx v8, v8, a0
-; RV32-ZVFHMIN-NEXT: addi a0, sp, 16
-; RV32-ZVFHMIN-NEXT: vs1r.v v8, (a0) # Unknown-size Folded Spill
-; RV32-ZVFHMIN-NEXT: csrr a0, vlenb
-; RV32-ZVFHMIN-NEXT: add a0, sp, a0
-; RV32-ZVFHMIN-NEXT: addi a0, a0, 16
-; RV32-ZVFHMIN-NEXT: vl1r.v v8, (a0) # Unknown-size Folded Reload
-; RV32-ZVFHMIN-NEXT: vslidedown.vi v8, v8, 5
-; RV32-ZVFHMIN-N...
[truncated]
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/18/builds/5015 Here is the relevant piece of the build log for the reference
* commit 'FETCH_HEAD':
  [X86] getIntImmCostInst - pull out repeated Imm.getBitWidth() calls. NFC.
  [X86] Add test coverage for llvm#111323
  [Driver] Use empty multilib file in another test (llvm#111352)
  [clang][OpenMP][test] Use x86_64-linux-gnu triple for test referencing avx512f feature (llvm#111337)
  [doc] Fix Kaleidoscope tutorial chapter 3 code snippet and full listing discrepancies (llvm#111289)
  [Flang][OpenMP] Improve entry block argument creation and binding (llvm#110267)
  [x86] combineMul - handle 0/-1 KnownBits cases before MUL_IMM logic (REAPPLIED)
  [llvm-dis] Fix non-deterministic disassembly across multiple inputs (llvm#110988)
  [lldb][test] TestDataFormatterLibcxxOptionalSimulator.py: change order of ifdefs
  [lldb][test] Add libcxx-simulators test for std::optional (llvm#111133)
  [x86] combineMul - use computeKnownBits directly to find MUL_IMM constant splat. (REAPPLIED)
  Reland "[lldb][test] TestDataFormatterLibcxxStringSimulator.py: add new padding layout" (llvm#111123)
  Revert "[x86] combineMul - use computeKnownBits directly to find MUL_IMM constant splat."
  update_test_checks: fix a simple regression (llvm#111347)
  [LegalizeVectorTypes] Always widen fabs (llvm#111298)
  [lsan] Make ReportUnsuspendedThreads return bool also for Fuchsia
  [mlir][vector] Add more tests for ConvertVectorToLLVM (6/n) (llvm#111121)
  [bazel] port 9144fed
  [SystemZ] Remove inlining threshold multiplier. (llvm#106058)
  [LegalizeVectorTypes] When widening don't check for libcalls if promoted (llvm#111297)
  [clang][Driver] Improve multilib custom error reporting (llvm#110804)
  [clang][Driver] Rename "FatalError" key to "Error" in multilib.yaml (llvm#110804)
  [LLVM][Maintainers] Update release managers (llvm#111164)
  [Clang][Driver] Add option to provide path for multilib's YAML config file (llvm#109640)
  [LoopVectorize] Remove redundant code in emitSCEVChecks (llvm#111132)
  [AMDGPU] Only emit SCOPE_SYS global_wb (llvm#110636)
  [ELF] Change Ctx::target to unique_ptr (llvm#111260)
  [ELF] Pass Ctx & to some free functions
  [RISCV] Only disassemble fcvtmod.w.d if the rounding mode is rtz. (llvm#111308)
  [Clang] Remove the special-casing for RequiresExprBodyDecl in BuildResolvedCallExpr() after fd87d76 (llvm#111277)
  [ELF] Pass Ctx & to InputFile
  [clang-format] Add AlignFunctionDeclarations to AlignConsecutiveDeclarations (llvm#108241)
  [AMDGPU] Support preloading hidden kernel arguments (llvm#98861)
  [ELF] Move static nextGroupId isInGroup to LinkerDriver
  [clangd] Add ArgumentLists config option under Completion (llvm#111322)
  [ELF] Pass Ctx & to SyntheticSections
  [ELF] Pass Ctx & to Symbols
  [ELF] Pass Ctx & to Symbols
  [ELF] getRelocTargetVA: pass Ctx and Relocation. NFC
  [clang-tidy] Avoid capturing a local variable in a static lambda in UseRangesCheck (llvm#111282)
  [VPlan] Use pointer to member 0 as VPInterleaveRecipe's pointer arg. (llvm#106431)
  [clangd] Simplify ternary expressions with std::optional::value_or (NFC) (llvm#111309)
  [libc++][format][2/3] Optimizes c-string arguments. (llvm#101805)
  [RISCV] Combine RVBUnary and RVKUnary into classes that are more similar to ALU(W)_r(r/i). NFC (llvm#111279)
  [ELF] Pass Ctx & to InputFiles
  [libc] GPU RPC interface: add return value to `rpc_host_call` (llvm#111288)

Signed-off-by: kyvangka1610 <[email protected]>
When widening some FP ops, LegalizeVectorTypes will check to see if the widened op may be scalarized and then turned into a bunch of libcalls, and if so unroll early to avoid unnecessary libcalls of the padded undef elements.
It checks if the widened op is legal or custom to see if it will be scalarized, but promoted ops will also avoid scalarization.
This relaxes the check to account for this which fixes some illegal vector types on RISC-V from being scalarized when they could be widened.
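
The "unroll early" step mentioned in the commit message can be illustrated with a toy model of what unrolling to the widened element count means: the scalar op runs once per original element, and the extra lanes are filled with undef instead of being computed. The `Lane`, `UnrollResult`, and `unrollSqrt` names below are hypothetical and only loosely mirror `DAG.UnrollVectorOp(N, WideVecVT.getVectorNumElements())`; the real function operates on SelectionDAG nodes, not on floats.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical model of a vector lane: a value, or undef (std::nullopt).
using Lane = std::optional<float>;

struct UnrollResult {
  std::vector<Lane> lanes;   // result padded to the widened element count
  std::size_t scalarCalls;   // how many scalar ops (potential libcalls) ran
};

// Apply the scalar op to each original element, then pad with undef lanes
// up to wideNumElts, so no scalar op is spent on padding.
UnrollResult unrollSqrt(const std::vector<float>& in, std::size_t wideNumElts) {
  UnrollResult r{{}, 0};
  r.lanes.reserve(wideNumElts);
  std::size_t n = std::min(in.size(), wideNumElts);
  for (std::size_t i = 0; i < n; ++i) {
    r.lanes.push_back(std::sqrt(in[i]));
    ++r.scalarCalls;
  }
  while (r.lanes.size() < wideNumElts)
    r.lanes.push_back(std::nullopt);  // undef padding, never computed
  assert(r.lanes.size() == wideNumElts);
  return r;
}
```

For a 6-element input widened to 8 lanes, this model performs 6 scalar operations rather than 8, which is the saving the early unroll aims for when the scalar op would become a libcall. With the relaxed check, promoted ops skip this path entirely and stay vectorized.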