-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[DAG] Fold fdiv X, c2 -> fmul X, 1/c2 without AllowReciprocal if exact #93882
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-llvm-adt @llvm/pr-subscribers-backend-amdgpu Author: David Green (davemgreen) ChangesThis moves the combine of fdiv by constant to fmul out of an 'if (Options.UnsafeFPMath || Flags.hasAllowReciprocal()' block, so that it still triggers if the divide is exact. An extra check for Recip.isDenormal() is added as multiple places make reference to it being unsafe or slow on certain platforms. Full diff: https://github.com/llvm/llvm-project/pull/93882.diff 6 Files Affected:
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index 42e861e61201c..f32c7da43b05d 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -17258,26 +17258,29 @@ SDValue DAGCombiner::visitFDIV(SDNode *N) {
if (SDValue V = combineRepeatedFPDivisors(N))
return V;
- if (Options.UnsafeFPMath || Flags.hasAllowReciprocal()) {
- // fold (fdiv X, c2) -> fmul X, 1/c2 if losing precision is acceptable.
- if (auto *N1CFP = dyn_cast<ConstantFPSDNode>(N1)) {
- // Compute the reciprocal 1.0 / c2.
- const APFloat &N1APF = N1CFP->getValueAPF();
- APFloat Recip(N1APF.getSemantics(), 1); // 1.0
- APFloat::opStatus st = Recip.divide(N1APF, APFloat::rmNearestTiesToEven);
- // Only do the transform if the reciprocal is a legal fp immediate that
- // isn't too nasty (eg NaN, denormal, ...).
- if ((st == APFloat::opOK || st == APFloat::opInexact) && // Not too nasty
- (!LegalOperations ||
- // FIXME: custom lowering of ConstantFP might fail (see e.g. ARM
- // backend)... we should handle this gracefully after Legalize.
- // TLI.isOperationLegalOrCustom(ISD::ConstantFP, VT) ||
- TLI.isOperationLegal(ISD::ConstantFP, VT) ||
- TLI.isFPImmLegal(Recip, VT, ForCodeSize)))
- return DAG.getNode(ISD::FMUL, DL, VT, N0,
- DAG.getConstantFP(Recip, DL, VT));
- }
+ // fold (fdiv X, c2) -> (fmul X, 1/c2) if there is no loss in precision, or
+ // the loss is acceptable with AllowReciprocal.
+ if (auto *N1CFP = dyn_cast<ConstantFPSDNode>(N1)) {
+ // Compute the reciprocal 1.0 / c2.
+ const APFloat &N1APF = N1CFP->getValueAPF();
+ APFloat Recip(N1APF.getSemantics(), 1); // 1.0
+ APFloat::opStatus st = Recip.divide(N1APF, APFloat::rmNearestTiesToEven);
+ // Only do the transform if the reciprocal is a legal fp immediate that
+ // isn't too nasty (eg NaN, denormal, ...).
+ if (((st == APFloat::opOK && !Recip.isDenormal()) ||
+ (st == APFloat::opInexact &&
+ (Options.UnsafeFPMath || Flags.hasAllowReciprocal()))) &&
+ (!LegalOperations ||
+ // FIXME: custom lowering of ConstantFP might fail (see e.g. ARM
+ // backend)... we should handle this gracefully after Legalize.
+ // TLI.isOperationLegalOrCustom(ISD::ConstantFP, VT) ||
+ TLI.isOperationLegal(ISD::ConstantFP, VT) ||
+ TLI.isFPImmLegal(Recip, VT, ForCodeSize)))
+ return DAG.getNode(ISD::FMUL, DL, VT, N0,
+ DAG.getConstantFP(Recip, DL, VT));
+ }
+ if (Options.UnsafeFPMath || Flags.hasAllowReciprocal()) {
// If this FDIV is part of a reciprocal square root, it may be folded
// into a target-specific square root estimate instruction.
if (N1.getOpcode() == ISD::FSQRT) {
diff --git a/llvm/test/CodeGen/AArch64/fcvt-fixed.ll b/llvm/test/CodeGen/AArch64/fcvt-fixed.ll
index 296be831da762..7056a4d28fed3 100644
--- a/llvm/test/CodeGen/AArch64/fcvt-fixed.ll
+++ b/llvm/test/CodeGen/AArch64/fcvt-fixed.ll
@@ -412,10 +412,10 @@ define half @scvtf_f16_i32_7(i32 %int) {
; CHECK-NO16-LABEL: scvtf_f16_i32_7:
; CHECK-NO16: // %bb.0:
; CHECK-NO16-NEXT: scvtf s1, w0
-; CHECK-NO16-NEXT: movi v0.2s, #67, lsl #24
+; CHECK-NO16-NEXT: movi v0.2s, #60, lsl #24
; CHECK-NO16-NEXT: fcvt h1, s1
; CHECK-NO16-NEXT: fcvt s1, h1
-; CHECK-NO16-NEXT: fdiv s0, s1, s0
+; CHECK-NO16-NEXT: fmul s0, s1, s0
; CHECK-NO16-NEXT: fcvt h0, s0
; CHECK-NO16-NEXT: ret
;
@@ -432,10 +432,10 @@ define half @scvtf_f16_i32_15(i32 %int) {
; CHECK-NO16-LABEL: scvtf_f16_i32_15:
; CHECK-NO16: // %bb.0:
; CHECK-NO16-NEXT: scvtf s1, w0
-; CHECK-NO16-NEXT: movi v0.2s, #71, lsl #24
+; CHECK-NO16-NEXT: movi v0.2s, #56, lsl #24
; CHECK-NO16-NEXT: fcvt h1, s1
; CHECK-NO16-NEXT: fcvt s1, h1
-; CHECK-NO16-NEXT: fdiv s0, s1, s0
+; CHECK-NO16-NEXT: fmul s0, s1, s0
; CHECK-NO16-NEXT: fcvt h0, s0
; CHECK-NO16-NEXT: ret
;
@@ -452,10 +452,10 @@ define half @scvtf_f16_i64_7(i64 %long) {
; CHECK-NO16-LABEL: scvtf_f16_i64_7:
; CHECK-NO16: // %bb.0:
; CHECK-NO16-NEXT: scvtf s1, x0
-; CHECK-NO16-NEXT: movi v0.2s, #67, lsl #24
+; CHECK-NO16-NEXT: movi v0.2s, #60, lsl #24
; CHECK-NO16-NEXT: fcvt h1, s1
; CHECK-NO16-NEXT: fcvt s1, h1
-; CHECK-NO16-NEXT: fdiv s0, s1, s0
+; CHECK-NO16-NEXT: fmul s0, s1, s0
; CHECK-NO16-NEXT: fcvt h0, s0
; CHECK-NO16-NEXT: ret
;
@@ -472,10 +472,10 @@ define half @scvtf_f16_i64_15(i64 %long) {
; CHECK-NO16-LABEL: scvtf_f16_i64_15:
; CHECK-NO16: // %bb.0:
; CHECK-NO16-NEXT: scvtf s1, x0
-; CHECK-NO16-NEXT: movi v0.2s, #71, lsl #24
+; CHECK-NO16-NEXT: movi v0.2s, #56, lsl #24
; CHECK-NO16-NEXT: fcvt h1, s1
; CHECK-NO16-NEXT: fcvt s1, h1
-; CHECK-NO16-NEXT: fdiv s0, s1, s0
+; CHECK-NO16-NEXT: fmul s0, s1, s0
; CHECK-NO16-NEXT: fcvt h0, s0
; CHECK-NO16-NEXT: ret
;
@@ -574,10 +574,10 @@ define half @ucvtf_f16_i32_7(i32 %int) {
; CHECK-NO16-LABEL: ucvtf_f16_i32_7:
; CHECK-NO16: // %bb.0:
; CHECK-NO16-NEXT: ucvtf s1, w0
-; CHECK-NO16-NEXT: movi v0.2s, #67, lsl #24
+; CHECK-NO16-NEXT: movi v0.2s, #60, lsl #24
; CHECK-NO16-NEXT: fcvt h1, s1
; CHECK-NO16-NEXT: fcvt s1, h1
-; CHECK-NO16-NEXT: fdiv s0, s1, s0
+; CHECK-NO16-NEXT: fmul s0, s1, s0
; CHECK-NO16-NEXT: fcvt h0, s0
; CHECK-NO16-NEXT: ret
;
@@ -594,10 +594,10 @@ define half @ucvtf_f16_i32_15(i32 %int) {
; CHECK-NO16-LABEL: ucvtf_f16_i32_15:
; CHECK-NO16: // %bb.0:
; CHECK-NO16-NEXT: ucvtf s1, w0
-; CHECK-NO16-NEXT: movi v0.2s, #71, lsl #24
+; CHECK-NO16-NEXT: movi v0.2s, #56, lsl #24
; CHECK-NO16-NEXT: fcvt h1, s1
; CHECK-NO16-NEXT: fcvt s1, h1
-; CHECK-NO16-NEXT: fdiv s0, s1, s0
+; CHECK-NO16-NEXT: fmul s0, s1, s0
; CHECK-NO16-NEXT: fcvt h0, s0
; CHECK-NO16-NEXT: ret
;
@@ -614,10 +614,10 @@ define half @ucvtf_f16_i64_7(i64 %long) {
; CHECK-NO16-LABEL: ucvtf_f16_i64_7:
; CHECK-NO16: // %bb.0:
; CHECK-NO16-NEXT: ucvtf s1, x0
-; CHECK-NO16-NEXT: movi v0.2s, #67, lsl #24
+; CHECK-NO16-NEXT: movi v0.2s, #60, lsl #24
; CHECK-NO16-NEXT: fcvt h1, s1
; CHECK-NO16-NEXT: fcvt s1, h1
-; CHECK-NO16-NEXT: fdiv s0, s1, s0
+; CHECK-NO16-NEXT: fmul s0, s1, s0
; CHECK-NO16-NEXT: fcvt h0, s0
; CHECK-NO16-NEXT: ret
;
@@ -634,10 +634,10 @@ define half @ucvtf_f16_i64_15(i64 %long) {
; CHECK-NO16-LABEL: ucvtf_f16_i64_15:
; CHECK-NO16: // %bb.0:
; CHECK-NO16-NEXT: ucvtf s1, x0
-; CHECK-NO16-NEXT: movi v0.2s, #71, lsl #24
+; CHECK-NO16-NEXT: movi v0.2s, #56, lsl #24
; CHECK-NO16-NEXT: fcvt h1, s1
; CHECK-NO16-NEXT: fcvt s1, h1
-; CHECK-NO16-NEXT: fdiv s0, s1, s0
+; CHECK-NO16-NEXT: fmul s0, s1, s0
; CHECK-NO16-NEXT: fcvt h0, s0
; CHECK-NO16-NEXT: ret
;
diff --git a/llvm/test/CodeGen/AArch64/frem-power2.ll b/llvm/test/CodeGen/AArch64/frem-power2.ll
index 4192745abd347..6aa419a0f419a 100644
--- a/llvm/test/CodeGen/AArch64/frem-power2.ll
+++ b/llvm/test/CodeGen/AArch64/frem-power2.ll
@@ -5,11 +5,12 @@
define float @frem2(float %x) {
; CHECK-SD-LABEL: frem2:
; CHECK-SD: // %bb.0: // %entry
-; CHECK-SD-NEXT: fmov s1, #2.00000000
+; CHECK-SD-NEXT: fmov s1, #0.50000000
; CHECK-SD-NEXT: // kill: def $s0 killed $s0 def $q0
-; CHECK-SD-NEXT: fdiv s2, s0, s1
-; CHECK-SD-NEXT: frintz s2, s2
-; CHECK-SD-NEXT: fmsub s1, s2, s1, s0
+; CHECK-SD-NEXT: fmov s2, #-2.00000000
+; CHECK-SD-NEXT: fmul s1, s0, s1
+; CHECK-SD-NEXT: frintz s1, s1
+; CHECK-SD-NEXT: fmadd s1, s1, s2, s0
; CHECK-SD-NEXT: mvni v2.4s, #128, lsl #24
; CHECK-SD-NEXT: bit v0.16b, v1.16b, v2.16b
; CHECK-SD-NEXT: // kill: def $s0 killed $s0 killed $q0
@@ -27,10 +28,11 @@ entry:
define float @frem2_nsz(float %x) {
; CHECK-SD-LABEL: frem2_nsz:
; CHECK-SD: // %bb.0: // %entry
-; CHECK-SD-NEXT: fmov s1, #2.00000000
-; CHECK-SD-NEXT: fdiv s2, s0, s1
-; CHECK-SD-NEXT: frintz s2, s2
-; CHECK-SD-NEXT: fmsub s0, s2, s1, s0
+; CHECK-SD-NEXT: fmov s1, #0.50000000
+; CHECK-SD-NEXT: fmov s2, #-2.00000000
+; CHECK-SD-NEXT: fmul s1, s0, s1
+; CHECK-SD-NEXT: frintz s1, s1
+; CHECK-SD-NEXT: fmadd s0, s1, s2, s0
; CHECK-SD-NEXT: ret
;
; CHECK-GI-LABEL: frem2_nsz:
@@ -65,10 +67,11 @@ define float @frem2_abs(float %x) {
; CHECK-SD-LABEL: frem2_abs:
; CHECK-SD: // %bb.0: // %entry
; CHECK-SD-NEXT: fabs s0, s0
-; CHECK-SD-NEXT: fmov s1, #2.00000000
-; CHECK-SD-NEXT: fdiv s2, s0, s1
-; CHECK-SD-NEXT: frintz s2, s2
-; CHECK-SD-NEXT: fmsub s0, s2, s1, s0
+; CHECK-SD-NEXT: fmov s1, #0.50000000
+; CHECK-SD-NEXT: fmov s2, #-2.00000000
+; CHECK-SD-NEXT: fmul s1, s0, s1
+; CHECK-SD-NEXT: frintz s1, s1
+; CHECK-SD-NEXT: fmadd s0, s1, s2, s0
; CHECK-SD-NEXT: ret
;
; CHECK-GI-LABEL: frem2_abs:
@@ -85,9 +88,9 @@ entry:
define half @hrem2_nsz(half %x) {
; CHECK-SD-LABEL: hrem2_nsz:
; CHECK-SD: // %bb.0: // %entry
-; CHECK-SD-NEXT: fmov h1, #2.00000000
+; CHECK-SD-NEXT: fmov h1, #0.50000000
; CHECK-SD-NEXT: fmov h2, #-2.00000000
-; CHECK-SD-NEXT: fdiv h1, h0, h1
+; CHECK-SD-NEXT: fmul h1, h0, h1
; CHECK-SD-NEXT: frintz h1, h1
; CHECK-SD-NEXT: fmadd h0, h1, h2, h0
; CHECK-SD-NEXT: ret
@@ -112,10 +115,11 @@ entry:
define double @drem2_nsz(double %x) {
; CHECK-SD-LABEL: drem2_nsz:
; CHECK-SD: // %bb.0: // %entry
-; CHECK-SD-NEXT: fmov d1, #2.00000000
-; CHECK-SD-NEXT: fdiv d2, d0, d1
-; CHECK-SD-NEXT: frintz d2, d2
-; CHECK-SD-NEXT: fmsub d0, d2, d1, d0
+; CHECK-SD-NEXT: fmov d1, #0.50000000
+; CHECK-SD-NEXT: fmov d2, #-2.00000000
+; CHECK-SD-NEXT: fmul d1, d0, d1
+; CHECK-SD-NEXT: frintz d1, d1
+; CHECK-SD-NEXT: fmadd d0, d1, d2, d0
; CHECK-SD-NEXT: ret
;
; CHECK-GI-LABEL: drem2_nsz:
@@ -176,10 +180,11 @@ entry:
define float @fremm2_nsz(float %x) {
; CHECK-SD-LABEL: fremm2_nsz:
; CHECK-SD: // %bb.0: // %entry
-; CHECK-SD-NEXT: fmov s1, #-2.00000000
-; CHECK-SD-NEXT: fdiv s2, s0, s1
-; CHECK-SD-NEXT: frintz s2, s2
-; CHECK-SD-NEXT: fmsub s0, s2, s1, s0
+; CHECK-SD-NEXT: fmov s1, #-0.50000000
+; CHECK-SD-NEXT: fmov s2, #2.00000000
+; CHECK-SD-NEXT: fmul s1, s0, s1
+; CHECK-SD-NEXT: frintz s1, s1
+; CHECK-SD-NEXT: fmadd s0, s1, s2, s0
; CHECK-SD-NEXT: ret
;
; CHECK-GI-LABEL: fremm2_nsz:
@@ -195,10 +200,11 @@ define float @frem4_abs(float %x) {
; CHECK-SD-LABEL: frem4_abs:
; CHECK-SD: // %bb.0: // %entry
; CHECK-SD-NEXT: fabs s0, s0
-; CHECK-SD-NEXT: fmov s1, #4.00000000
-; CHECK-SD-NEXT: fdiv s2, s0, s1
-; CHECK-SD-NEXT: frintz s2, s2
-; CHECK-SD-NEXT: fmsub s0, s2, s1, s0
+; CHECK-SD-NEXT: fmov s1, #0.25000000
+; CHECK-SD-NEXT: fmov s2, #-4.00000000
+; CHECK-SD-NEXT: fmul s1, s0, s1
+; CHECK-SD-NEXT: frintz s1, s1
+; CHECK-SD-NEXT: fmadd s0, s1, s2, s0
; CHECK-SD-NEXT: ret
;
; CHECK-GI-LABEL: frem4_abs:
@@ -216,10 +222,12 @@ define float @frem16_abs(float %x) {
; CHECK-SD-LABEL: frem16_abs:
; CHECK-SD: // %bb.0: // %entry
; CHECK-SD-NEXT: fabs s0, s0
-; CHECK-SD-NEXT: fmov s1, #16.00000000
-; CHECK-SD-NEXT: fdiv s2, s0, s1
-; CHECK-SD-NEXT: frintz s2, s2
-; CHECK-SD-NEXT: fmsub s0, s2, s1, s0
+; CHECK-SD-NEXT: mov w8, #1031798784 // =0x3d800000
+; CHECK-SD-NEXT: fmov s2, #-16.00000000
+; CHECK-SD-NEXT: fmov s1, w8
+; CHECK-SD-NEXT: fmul s1, s0, s1
+; CHECK-SD-NEXT: frintz s1, s1
+; CHECK-SD-NEXT: fmadd s0, s1, s2, s0
; CHECK-SD-NEXT: ret
;
; CHECK-GI-LABEL: frem16_abs:
@@ -237,11 +245,13 @@ define float @frem4294967296_abs(float %x) {
; CHECK-SD-LABEL: frem4294967296_abs:
; CHECK-SD: // %bb.0: // %entry
; CHECK-SD-NEXT: fabs s0, s0
-; CHECK-SD-NEXT: mov w8, #1333788672 // =0x4f800000
+; CHECK-SD-NEXT: mov w8, #796917760 // =0x2f800000
; CHECK-SD-NEXT: fmov s1, w8
-; CHECK-SD-NEXT: fdiv s2, s0, s1
-; CHECK-SD-NEXT: frintz s2, s2
-; CHECK-SD-NEXT: fmsub s0, s2, s1, s0
+; CHECK-SD-NEXT: mov w8, #-813694976 // =0xcf800000
+; CHECK-SD-NEXT: fmov s2, w8
+; CHECK-SD-NEXT: fmul s1, s0, s1
+; CHECK-SD-NEXT: frintz s1, s1
+; CHECK-SD-NEXT: fmadd s0, s1, s2, s0
; CHECK-SD-NEXT: ret
;
; CHECK-GI-LABEL: frem4294967296_abs:
@@ -260,11 +270,13 @@ define float @frem1152921504606846976_abs(float %x) {
; CHECK-SD-LABEL: frem1152921504606846976_abs:
; CHECK-SD: // %bb.0: // %entry
; CHECK-SD-NEXT: fabs s0, s0
-; CHECK-SD-NEXT: mov w8, #1568669696 // =0x5d800000
+; CHECK-SD-NEXT: mov w8, #562036736 // =0x21800000
; CHECK-SD-NEXT: fmov s1, w8
-; CHECK-SD-NEXT: fdiv s2, s0, s1
-; CHECK-SD-NEXT: frintz s2, s2
-; CHECK-SD-NEXT: fmsub s0, s2, s1, s0
+; CHECK-SD-NEXT: mov w8, #-578813952 // =0xdd800000
+; CHECK-SD-NEXT: fmov s2, w8
+; CHECK-SD-NEXT: fmul s1, s0, s1
+; CHECK-SD-NEXT: frintz s1, s1
+; CHECK-SD-NEXT: fmadd s0, s1, s2, s0
; CHECK-SD-NEXT: ret
;
; CHECK-GI-LABEL: frem1152921504606846976_abs:
@@ -283,11 +295,13 @@ define float @frem4611686018427387904_abs(float %x) {
; CHECK-SD-LABEL: frem4611686018427387904_abs:
; CHECK-SD: // %bb.0: // %entry
; CHECK-SD-NEXT: fabs s0, s0
-; CHECK-SD-NEXT: mov w8, #1585446912 // =0x5e800000
+; CHECK-SD-NEXT: mov w8, #545259520 // =0x20800000
; CHECK-SD-NEXT: fmov s1, w8
-; CHECK-SD-NEXT: fdiv s2, s0, s1
-; CHECK-SD-NEXT: frintz s2, s2
-; CHECK-SD-NEXT: fmsub s0, s2, s1, s0
+; CHECK-SD-NEXT: mov w8, #-562036736 // =0xde800000
+; CHECK-SD-NEXT: fmov s2, w8
+; CHECK-SD-NEXT: fmul s1, s0, s1
+; CHECK-SD-NEXT: frintz s1, s1
+; CHECK-SD-NEXT: fmadd s0, s1, s2, s0
; CHECK-SD-NEXT: ret
;
; CHECK-GI-LABEL: frem4611686018427387904_abs:
@@ -305,11 +319,12 @@ entry:
define float @frem9223372036854775808_abs(float %x) {
; CHECK-SD-LABEL: frem9223372036854775808_abs:
; CHECK-SD: // %bb.0: // %entry
-; CHECK-SD-NEXT: movi v1.2s, #95, lsl #24
+; CHECK-SD-NEXT: movi v1.2s, #32, lsl #24
; CHECK-SD-NEXT: fabs s0, s0
-; CHECK-SD-NEXT: fdiv s2, s0, s1
-; CHECK-SD-NEXT: frintz s2, s2
-; CHECK-SD-NEXT: fmsub s0, s2, s1, s0
+; CHECK-SD-NEXT: movi v2.2s, #223, lsl #24
+; CHECK-SD-NEXT: fmul s1, s0, s1
+; CHECK-SD-NEXT: frintz s1, s1
+; CHECK-SD-NEXT: fmadd s0, s1, s2, s0
; CHECK-SD-NEXT: ret
;
; CHECK-GI-LABEL: frem9223372036854775808_abs:
diff --git a/llvm/test/CodeGen/AMDGPU/uniform-phi-with-undef.ll b/llvm/test/CodeGen/AMDGPU/uniform-phi-with-undef.ll
index 5386ef425dcb5..64d4a0cf78501 100644
--- a/llvm/test/CodeGen/AMDGPU/uniform-phi-with-undef.ll
+++ b/llvm/test/CodeGen/AMDGPU/uniform-phi-with-undef.ll
@@ -17,7 +17,7 @@ define amdgpu_ps float @uniform_phi_with_undef(float inreg %c, float %v, i32 %x,
; GCN-NEXT: s_mov_b32 exec_lo, s2
; GCN-NEXT: s_cbranch_execz .LBB0_2
; GCN-NEXT: ; %bb.1: ; %if
-; GCN-NEXT: s_mov_b32 s2, 2.0
+; GCN-NEXT: s_mov_b32 s2, 0x40400000
; GCN-NEXT: v_div_scale_f32 v1, s3, s2, s2, v0
; GCN-NEXT: v_rcp_f32_e64 v2, v1
; GCN-NEXT: s_mov_b32 s3, 1.0
@@ -39,7 +39,7 @@ entry:
br i1 %cc, label %if, label %end
if:
- %v.if = fdiv float %v, 2.0
+ %v.if = fdiv float %v, 3.0
br label %end
end:
diff --git a/llvm/test/CodeGen/ARM/frem-power2.ll b/llvm/test/CodeGen/ARM/frem-power2.ll
index 71c2c09c0105c..63ecd9fec7883 100644
--- a/llvm/test/CodeGen/ARM/frem-power2.ll
+++ b/llvm/test/CodeGen/ARM/frem-power2.ll
@@ -14,26 +14,28 @@ define float @frem4(float %x) {
;
; CHECK-FP-LABEL: frem4:
; CHECK-FP: @ %bb.0: @ %entry
-; CHECK-FP-NEXT: vmov.f32 s0, #4.000000e+00
-; CHECK-FP-NEXT: vmov s2, r0
+; CHECK-FP-NEXT: vmov.f32 s0, #2.500000e-01
+; CHECK-FP-NEXT: vmov.f32 s2, #-4.000000e+00
+; CHECK-FP-NEXT: vmov s4, r0
; CHECK-FP-NEXT: lsrs r0, r0, #31
-; CHECK-FP-NEXT: vdiv.f32 s4, s2, s0
-; CHECK-FP-NEXT: vrintz.f32 s4, s4
-; CHECK-FP-NEXT: vfms.f32 s2, s4, s0
-; CHECK-FP-NEXT: vmov r1, s2
+; CHECK-FP-NEXT: vmul.f32 s0, s4, s0
+; CHECK-FP-NEXT: vrintz.f32 s0, s0
+; CHECK-FP-NEXT: vfma.f32 s4, s0, s2
+; CHECK-FP-NEXT: vmov r1, s4
; CHECK-FP-NEXT: bfi r1, r0, #31, #1
; CHECK-FP-NEXT: mov r0, r1
; CHECK-FP-NEXT: bx lr
;
; CHECK-M33-LABEL: frem4:
; CHECK-M33: @ %bb.0: @ %entry
-; CHECK-M33-NEXT: vmov.f32 s0, #4.000000e+00
-; CHECK-M33-NEXT: vmov s2, r0
+; CHECK-M33-NEXT: vmov.f32 s0, #2.500000e-01
+; CHECK-M33-NEXT: vmov.f32 s2, #-4.000000e+00
+; CHECK-M33-NEXT: vmov s4, r0
; CHECK-M33-NEXT: lsrs r0, r0, #31
-; CHECK-M33-NEXT: vdiv.f32 s4, s2, s0
-; CHECK-M33-NEXT: vrintz.f32 s4, s4
-; CHECK-M33-NEXT: vmls.f32 s2, s4, s0
-; CHECK-M33-NEXT: vmov r1, s2
+; CHECK-M33-NEXT: vmul.f32 s0, s4, s0
+; CHECK-M33-NEXT: vrintz.f32 s0, s0
+; CHECK-M33-NEXT: vmla.f32 s4, s0, s2
+; CHECK-M33-NEXT: vmov r1, s4
; CHECK-M33-NEXT: bfi r1, r0, #31, #1
; CHECK-M33-NEXT: mov r0, r1
; CHECK-M33-NEXT: bx lr
@@ -53,22 +55,24 @@ define float @frem4_nsz(float %x) {
;
; CHECK-FP-LABEL: frem4_nsz:
; CHECK-FP: @ %bb.0: @ %entry
-; CHECK-FP-NEXT: vmov.f32 s0, #4.000000e+00
-; CHECK-FP-NEXT: vmov s2, r0
-; CHECK-FP-NEXT: vdiv.f32 s4, s2, s0
-; CHECK-FP-NEXT: vrintz.f32 s4, s4
-; CHECK-FP-NEXT: vfms.f32 s2, s4, s0
-; CHECK-FP-NEXT: vmov r0, s2
+; CHECK-FP-NEXT: vmov.f32 s0, #2.500000e-01
+; CHECK-FP-NEXT: vmov.f32 s2, #-4.000000e+00
+; CHECK-FP-NEXT: vmov s4, r0
+; CHECK-FP-NEXT: vmul.f32 s0, s4, s0
+; CHECK-FP-NEXT: vrintz.f32 s0, s0
+; CHECK-FP-NEXT: vfma.f32 s4, s0, s2
+; CHECK-FP-NEXT: vmov r0, s4
; CHECK-FP-NEXT: bx lr
;
; CHECK-M33-LABEL: frem4_nsz:
; CHECK-M33: @ %bb.0: @ %entry
-; CHECK-M33-NEXT: vmov.f32 s0, #4.000000e+00
-; CHECK-M33-NEXT: vmov s2, r0
-; CHECK-M33-NEXT: vdiv.f32 s4, s2, s0
-; CHECK-M33-NEXT: vrintz.f32 s4, s4
-; CHECK-M33-NEXT: vmls.f32 s2, s4, s0
-; CHECK-M33-NEXT: vmov r0, s2
+; CHECK-M33-NEXT: vmov.f32 s0, #2.500000e-01
+; CHECK-M33-NEXT: vmov.f32 s2, #-4.000000e+00
+; CHECK-M33-NEXT: vmov s4, r0
+; CHECK-M33-NEXT: vmul.f32 s0, s4, s0
+; CHECK-M33-NEXT: vrintz.f32 s0, s0
+; CHECK-M33-NEXT: vmla.f32 s4, s0, s2
+; CHECK-M33-NEXT: vmov r0, s4
; CHECK-M33-NEXT: bx lr
entry:
%fmod = frem nsz float %x, 4.0
diff --git a/llvm/test/CodeGen/X86/change-unsafe-fp-math.ll b/llvm/test/CodeGen/X86/change-unsafe-fp-math.ll
index 33a7ec9bfc794..ba09ba8b6402b 100644
--- a/llvm/test/CodeGen/X86/change-unsafe-fp-math.ll
+++ b/llvm/test/CodeGen/X86/change-unsafe-fp-math.ll
@@ -14,7 +14,7 @@
define double @unsafe_fp_math_default0(double %x) {
; SAFE: divsd
; UNSAFE: mulsd
- %div = fdiv double %x, 2.0
+ %div = fdiv double %x, 3.0
ret double %div
}
@@ -22,7 +22,7 @@ define double @unsafe_fp_math_default0(double %x) {
define double @unsafe_fp_math_off(double %x) #0 {
; SAFE: divsd
; UNSAFE: divsd
- %div = fdiv double %x, 2.0
+ %div = fdiv double %x, 3.0
ret double %div
}
@@ -31,7 +31,7 @@ define double @unsafe_fp_math_default1(double %x) {
; With unsafe math enabled, can change this div to a mul.
; SAFE: divsd
; UNSAFE: mulsd
- %div = fdiv double %x, 2.0
+ %div = fdiv double %x, 3.0
ret double %div
}
@@ -39,7 +39,7 @@ define double @unsafe_fp_math_default1(double %x) {
define double @unsafe_fp_math_on(double %x) #1 {
; SAFE: mulsd
; UNSAFE: mulsd
- %div = fdiv double %x, 2.0
+ %div = fdiv double %x, 3.0
ret double %div
}
@@ -48,7 +48,7 @@ define double @unsafe_fp_math_default2(double %x) {
; With unsafe math enabled, can change this div to a mul.
; SAFE: divsd
; UNSAFE: mulsd
- %div = fdiv double %x, 2.0
+ %div = fdiv double %x, 3.0
ret double %div
}
|
APFloat::opStatus st = Recip.divide(N1APF, APFloat::rmNearestTiesToEven); | ||
// Only do the transform if the reciprocal is a legal fp immediate that | ||
// isn't too nasty (eg NaN, denormal, ...). | ||
if (((st == APFloat::opOK && !Recip.isDenormal()) || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs test for this !denormal part at least
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. In practice I think this only affects dividing by the max power of 2, e.g. for single precision dividing by 2^127 would be the same as multiplying by 2^-127, but 2^-127 is denormal.
I think you've also changed the behaviour of the arcp
case here. Previously we would optimize fdiv arcp float %x, 2^127
but with this patch we will not. That might be OK but it should be a deliberate change and it needs a test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are now specific tests in the fdiv-const.ll test file, including some vectors.
We could change it to fold denormals in more cases, but it would require some additional changes for the fixed-point lowering we have under Arm/AArch64. I don't believe that multiplying by a denormal is ever a problem under Arm/AArch64 but I'm not sure about other architectures.
Instcombine will convert fdiv by a power-2 to fmul, this converts the PerformVDIVCombine that converts fdiv+fcvt to fixed-point fcvt to fmul+fcvt. The fdiv tests will look worse, but won't appear in practice (and should be improved again by #93882).
This moves the combine of fdiv by constant to fmul out of an 'if (Options.UnsafeFPMath || Flags.hasAllowReciprocal()' block, so that it still triggers if the divide is exact. An extra check for Recip.isDenormal() is added as multiple places make reference to it being unsafe or slow on certain platforms.
347bc63
to
cf57613
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Letting the denormal through is probably fine
return DAG.getNode(ISD::FMUL, DL, VT, N0, | ||
DAG.getConstantFP(Recip, DL, VT)); | ||
} | ||
// fold (fdiv X, c2) -> (fmul X, 1/c2) if there is no loss in precision, or |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we don't do this in instcombine already, we probably should also do this there
APFloat Val(Sem, Negative ? -1 : 1); | ||
return Val; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit
APFloat Val(Sem, Negative ? -1 : 1); | |
return Val; | |
return APFloat(Sem, Negative ? -1.0 : 1.0); |
?
@@ -1367,12 +1367,11 @@ define void @bcast_unfold_fdiv_v16f32(ptr nocapture %arg) { | |||
; CHECK-LABEL: bcast_unfold_fdiv_v16f32: | |||
; CHECK: # %bb.0: # %bb | |||
; CHECK-NEXT: movq $-4096, %rax # imm = 0xF000 | |||
; CHECK-NEXT: vbroadcastss {{.*#+}} zmm0 = [2.0E+0,2.0E+0,2.0E+0,2.0E+0,2.0E+0,2.0E+0,2.0E+0,2.0E+0,2.0E+0,2.0E+0,2.0E+0,2.0E+0,2.0E+0,2.0E+0,2.0E+0,2.0E+0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The 2.000000e+00 splat value needs to be changed to something that can't be easily inverted - the test needs to show the second operand can be hoisted out of the loop, which fails for a commutable op like fmul.
#93882) This moves the combine of fdiv by constant to fmul out of an 'if (Options.UnsafeFPMath || Flags.hasAllowReciprocal()' block, so that it triggers if the divide is exact. An extra check for Recip.isDenormal() is added as multiple places make reference to it being unsafe or slow on certain platforms.
This was in a284bdb |
This moves the combine of fdiv by constant to fmul out of an 'if (Options.UnsafeFPMath || Flags.hasAllowReciprocal()' block, so that it still triggers if the divide is exact. An extra check for Recip.isDenormal() is added as multiple places make reference to it being unsafe or slow on certain platforms.