Skip to content

[DAG] Fold fdiv X, c2 -> fmul X, 1/c2 without AllowReciprocal if exact #93882

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

davemgreen
Copy link
Collaborator

This moves the combine of fdiv by constant to fmul out of an 'if (Options.UnsafeFPMath || Flags.hasAllowReciprocal()' block, so that it still triggers if the divide is exact. An extra check for Recip.isDenormal() is added as multiple places make reference to it being unsafe or slow on certain platforms.

@llvmbot
Copy link
Member

llvmbot commented May 30, 2024

@llvm/pr-subscribers-llvm-adt
@llvm/pr-subscribers-backend-aarch64
@llvm/pr-subscribers-llvm-selectiondag
@llvm/pr-subscribers-backend-arm
@llvm/pr-subscribers-backend-x86

@llvm/pr-subscribers-backend-amdgpu

Author: David Green (davemgreen)

Changes

This moves the combine of fdiv by constant to fmul out of an 'if (Options.UnsafeFPMath || Flags.hasAllowReciprocal()' block, so that it still triggers if the divide is exact. An extra check for Recip.isDenormal() is added as multiple places make reference to it being unsafe or slow on certain platforms.


Full diff: https://github.com/llvm/llvm-project/pull/93882.diff

6 Files Affected:

  • (modified) llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (+22-19)
  • (modified) llvm/test/CodeGen/AArch64/fcvt-fixed.ll (+16-16)
  • (modified) llvm/test/CodeGen/AArch64/frem-power2.ll (+61-46)
  • (modified) llvm/test/CodeGen/AMDGPU/uniform-phi-with-undef.ll (+2-2)
  • (modified) llvm/test/CodeGen/ARM/frem-power2.ll (+28-24)
  • (modified) llvm/test/CodeGen/X86/change-unsafe-fp-math.ll (+5-5)
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index 42e861e61201c..f32c7da43b05d 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -17258,26 +17258,29 @@ SDValue DAGCombiner::visitFDIV(SDNode *N) {
   if (SDValue V = combineRepeatedFPDivisors(N))
     return V;
 
-  if (Options.UnsafeFPMath || Flags.hasAllowReciprocal()) {
-    // fold (fdiv X, c2) -> fmul X, 1/c2 if losing precision is acceptable.
-    if (auto *N1CFP = dyn_cast<ConstantFPSDNode>(N1)) {
-      // Compute the reciprocal 1.0 / c2.
-      const APFloat &N1APF = N1CFP->getValueAPF();
-      APFloat Recip(N1APF.getSemantics(), 1); // 1.0
-      APFloat::opStatus st = Recip.divide(N1APF, APFloat::rmNearestTiesToEven);
-      // Only do the transform if the reciprocal is a legal fp immediate that
-      // isn't too nasty (eg NaN, denormal, ...).
-      if ((st == APFloat::opOK || st == APFloat::opInexact) && // Not too nasty
-          (!LegalOperations ||
-           // FIXME: custom lowering of ConstantFP might fail (see e.g. ARM
-           // backend)... we should handle this gracefully after Legalize.
-           // TLI.isOperationLegalOrCustom(ISD::ConstantFP, VT) ||
-           TLI.isOperationLegal(ISD::ConstantFP, VT) ||
-           TLI.isFPImmLegal(Recip, VT, ForCodeSize)))
-        return DAG.getNode(ISD::FMUL, DL, VT, N0,
-                           DAG.getConstantFP(Recip, DL, VT));
-    }
+  // fold (fdiv X, c2) -> (fmul X, 1/c2) if there is no loss in precision, or
+  // the loss is acceptable with AllowReciprocal.
+  if (auto *N1CFP = dyn_cast<ConstantFPSDNode>(N1)) {
+    // Compute the reciprocal 1.0 / c2.
+    const APFloat &N1APF = N1CFP->getValueAPF();
+    APFloat Recip(N1APF.getSemantics(), 1); // 1.0
+    APFloat::opStatus st = Recip.divide(N1APF, APFloat::rmNearestTiesToEven);
+    // Only do the transform if the reciprocal is a legal fp immediate that
+    // isn't too nasty (eg NaN, denormal, ...).
+    if (((st == APFloat::opOK && !Recip.isDenormal()) ||
+         (st == APFloat::opInexact &&
+          (Options.UnsafeFPMath || Flags.hasAllowReciprocal()))) &&
+        (!LegalOperations ||
+         // FIXME: custom lowering of ConstantFP might fail (see e.g. ARM
+         // backend)... we should handle this gracefully after Legalize.
+         // TLI.isOperationLegalOrCustom(ISD::ConstantFP, VT) ||
+         TLI.isOperationLegal(ISD::ConstantFP, VT) ||
+         TLI.isFPImmLegal(Recip, VT, ForCodeSize)))
+      return DAG.getNode(ISD::FMUL, DL, VT, N0,
+                         DAG.getConstantFP(Recip, DL, VT));
+  }
 
+  if (Options.UnsafeFPMath || Flags.hasAllowReciprocal()) {
     // If this FDIV is part of a reciprocal square root, it may be folded
     // into a target-specific square root estimate instruction.
     if (N1.getOpcode() == ISD::FSQRT) {
diff --git a/llvm/test/CodeGen/AArch64/fcvt-fixed.ll b/llvm/test/CodeGen/AArch64/fcvt-fixed.ll
index 296be831da762..7056a4d28fed3 100644
--- a/llvm/test/CodeGen/AArch64/fcvt-fixed.ll
+++ b/llvm/test/CodeGen/AArch64/fcvt-fixed.ll
@@ -412,10 +412,10 @@ define half @scvtf_f16_i32_7(i32 %int) {
 ; CHECK-NO16-LABEL: scvtf_f16_i32_7:
 ; CHECK-NO16:       // %bb.0:
 ; CHECK-NO16-NEXT:    scvtf s1, w0
-; CHECK-NO16-NEXT:    movi v0.2s, #67, lsl #24
+; CHECK-NO16-NEXT:    movi v0.2s, #60, lsl #24
 ; CHECK-NO16-NEXT:    fcvt h1, s1
 ; CHECK-NO16-NEXT:    fcvt s1, h1
-; CHECK-NO16-NEXT:    fdiv s0, s1, s0
+; CHECK-NO16-NEXT:    fmul s0, s1, s0
 ; CHECK-NO16-NEXT:    fcvt h0, s0
 ; CHECK-NO16-NEXT:    ret
 ;
@@ -432,10 +432,10 @@ define half @scvtf_f16_i32_15(i32 %int) {
 ; CHECK-NO16-LABEL: scvtf_f16_i32_15:
 ; CHECK-NO16:       // %bb.0:
 ; CHECK-NO16-NEXT:    scvtf s1, w0
-; CHECK-NO16-NEXT:    movi v0.2s, #71, lsl #24
+; CHECK-NO16-NEXT:    movi v0.2s, #56, lsl #24
 ; CHECK-NO16-NEXT:    fcvt h1, s1
 ; CHECK-NO16-NEXT:    fcvt s1, h1
-; CHECK-NO16-NEXT:    fdiv s0, s1, s0
+; CHECK-NO16-NEXT:    fmul s0, s1, s0
 ; CHECK-NO16-NEXT:    fcvt h0, s0
 ; CHECK-NO16-NEXT:    ret
 ;
@@ -452,10 +452,10 @@ define half @scvtf_f16_i64_7(i64 %long) {
 ; CHECK-NO16-LABEL: scvtf_f16_i64_7:
 ; CHECK-NO16:       // %bb.0:
 ; CHECK-NO16-NEXT:    scvtf s1, x0
-; CHECK-NO16-NEXT:    movi v0.2s, #67, lsl #24
+; CHECK-NO16-NEXT:    movi v0.2s, #60, lsl #24
 ; CHECK-NO16-NEXT:    fcvt h1, s1
 ; CHECK-NO16-NEXT:    fcvt s1, h1
-; CHECK-NO16-NEXT:    fdiv s0, s1, s0
+; CHECK-NO16-NEXT:    fmul s0, s1, s0
 ; CHECK-NO16-NEXT:    fcvt h0, s0
 ; CHECK-NO16-NEXT:    ret
 ;
@@ -472,10 +472,10 @@ define half @scvtf_f16_i64_15(i64 %long) {
 ; CHECK-NO16-LABEL: scvtf_f16_i64_15:
 ; CHECK-NO16:       // %bb.0:
 ; CHECK-NO16-NEXT:    scvtf s1, x0
-; CHECK-NO16-NEXT:    movi v0.2s, #71, lsl #24
+; CHECK-NO16-NEXT:    movi v0.2s, #56, lsl #24
 ; CHECK-NO16-NEXT:    fcvt h1, s1
 ; CHECK-NO16-NEXT:    fcvt s1, h1
-; CHECK-NO16-NEXT:    fdiv s0, s1, s0
+; CHECK-NO16-NEXT:    fmul s0, s1, s0
 ; CHECK-NO16-NEXT:    fcvt h0, s0
 ; CHECK-NO16-NEXT:    ret
 ;
@@ -574,10 +574,10 @@ define half @ucvtf_f16_i32_7(i32 %int) {
 ; CHECK-NO16-LABEL: ucvtf_f16_i32_7:
 ; CHECK-NO16:       // %bb.0:
 ; CHECK-NO16-NEXT:    ucvtf s1, w0
-; CHECK-NO16-NEXT:    movi v0.2s, #67, lsl #24
+; CHECK-NO16-NEXT:    movi v0.2s, #60, lsl #24
 ; CHECK-NO16-NEXT:    fcvt h1, s1
 ; CHECK-NO16-NEXT:    fcvt s1, h1
-; CHECK-NO16-NEXT:    fdiv s0, s1, s0
+; CHECK-NO16-NEXT:    fmul s0, s1, s0
 ; CHECK-NO16-NEXT:    fcvt h0, s0
 ; CHECK-NO16-NEXT:    ret
 ;
@@ -594,10 +594,10 @@ define half @ucvtf_f16_i32_15(i32 %int) {
 ; CHECK-NO16-LABEL: ucvtf_f16_i32_15:
 ; CHECK-NO16:       // %bb.0:
 ; CHECK-NO16-NEXT:    ucvtf s1, w0
-; CHECK-NO16-NEXT:    movi v0.2s, #71, lsl #24
+; CHECK-NO16-NEXT:    movi v0.2s, #56, lsl #24
 ; CHECK-NO16-NEXT:    fcvt h1, s1
 ; CHECK-NO16-NEXT:    fcvt s1, h1
-; CHECK-NO16-NEXT:    fdiv s0, s1, s0
+; CHECK-NO16-NEXT:    fmul s0, s1, s0
 ; CHECK-NO16-NEXT:    fcvt h0, s0
 ; CHECK-NO16-NEXT:    ret
 ;
@@ -614,10 +614,10 @@ define half @ucvtf_f16_i64_7(i64 %long) {
 ; CHECK-NO16-LABEL: ucvtf_f16_i64_7:
 ; CHECK-NO16:       // %bb.0:
 ; CHECK-NO16-NEXT:    ucvtf s1, x0
-; CHECK-NO16-NEXT:    movi v0.2s, #67, lsl #24
+; CHECK-NO16-NEXT:    movi v0.2s, #60, lsl #24
 ; CHECK-NO16-NEXT:    fcvt h1, s1
 ; CHECK-NO16-NEXT:    fcvt s1, h1
-; CHECK-NO16-NEXT:    fdiv s0, s1, s0
+; CHECK-NO16-NEXT:    fmul s0, s1, s0
 ; CHECK-NO16-NEXT:    fcvt h0, s0
 ; CHECK-NO16-NEXT:    ret
 ;
@@ -634,10 +634,10 @@ define half @ucvtf_f16_i64_15(i64 %long) {
 ; CHECK-NO16-LABEL: ucvtf_f16_i64_15:
 ; CHECK-NO16:       // %bb.0:
 ; CHECK-NO16-NEXT:    ucvtf s1, x0
-; CHECK-NO16-NEXT:    movi v0.2s, #71, lsl #24
+; CHECK-NO16-NEXT:    movi v0.2s, #56, lsl #24
 ; CHECK-NO16-NEXT:    fcvt h1, s1
 ; CHECK-NO16-NEXT:    fcvt s1, h1
-; CHECK-NO16-NEXT:    fdiv s0, s1, s0
+; CHECK-NO16-NEXT:    fmul s0, s1, s0
 ; CHECK-NO16-NEXT:    fcvt h0, s0
 ; CHECK-NO16-NEXT:    ret
 ;
diff --git a/llvm/test/CodeGen/AArch64/frem-power2.ll b/llvm/test/CodeGen/AArch64/frem-power2.ll
index 4192745abd347..6aa419a0f419a 100644
--- a/llvm/test/CodeGen/AArch64/frem-power2.ll
+++ b/llvm/test/CodeGen/AArch64/frem-power2.ll
@@ -5,11 +5,12 @@
 define float @frem2(float %x) {
 ; CHECK-SD-LABEL: frem2:
 ; CHECK-SD:       // %bb.0: // %entry
-; CHECK-SD-NEXT:    fmov s1, #2.00000000
+; CHECK-SD-NEXT:    fmov s1, #0.50000000
 ; CHECK-SD-NEXT:    // kill: def $s0 killed $s0 def $q0
-; CHECK-SD-NEXT:    fdiv s2, s0, s1
-; CHECK-SD-NEXT:    frintz s2, s2
-; CHECK-SD-NEXT:    fmsub s1, s2, s1, s0
+; CHECK-SD-NEXT:    fmov s2, #-2.00000000
+; CHECK-SD-NEXT:    fmul s1, s0, s1
+; CHECK-SD-NEXT:    frintz s1, s1
+; CHECK-SD-NEXT:    fmadd s1, s1, s2, s0
 ; CHECK-SD-NEXT:    mvni v2.4s, #128, lsl #24
 ; CHECK-SD-NEXT:    bit v0.16b, v1.16b, v2.16b
 ; CHECK-SD-NEXT:    // kill: def $s0 killed $s0 killed $q0
@@ -27,10 +28,11 @@ entry:
 define float @frem2_nsz(float %x) {
 ; CHECK-SD-LABEL: frem2_nsz:
 ; CHECK-SD:       // %bb.0: // %entry
-; CHECK-SD-NEXT:    fmov s1, #2.00000000
-; CHECK-SD-NEXT:    fdiv s2, s0, s1
-; CHECK-SD-NEXT:    frintz s2, s2
-; CHECK-SD-NEXT:    fmsub s0, s2, s1, s0
+; CHECK-SD-NEXT:    fmov s1, #0.50000000
+; CHECK-SD-NEXT:    fmov s2, #-2.00000000
+; CHECK-SD-NEXT:    fmul s1, s0, s1
+; CHECK-SD-NEXT:    frintz s1, s1
+; CHECK-SD-NEXT:    fmadd s0, s1, s2, s0
 ; CHECK-SD-NEXT:    ret
 ;
 ; CHECK-GI-LABEL: frem2_nsz:
@@ -65,10 +67,11 @@ define float @frem2_abs(float %x) {
 ; CHECK-SD-LABEL: frem2_abs:
 ; CHECK-SD:       // %bb.0: // %entry
 ; CHECK-SD-NEXT:    fabs s0, s0
-; CHECK-SD-NEXT:    fmov s1, #2.00000000
-; CHECK-SD-NEXT:    fdiv s2, s0, s1
-; CHECK-SD-NEXT:    frintz s2, s2
-; CHECK-SD-NEXT:    fmsub s0, s2, s1, s0
+; CHECK-SD-NEXT:    fmov s1, #0.50000000
+; CHECK-SD-NEXT:    fmov s2, #-2.00000000
+; CHECK-SD-NEXT:    fmul s1, s0, s1
+; CHECK-SD-NEXT:    frintz s1, s1
+; CHECK-SD-NEXT:    fmadd s0, s1, s2, s0
 ; CHECK-SD-NEXT:    ret
 ;
 ; CHECK-GI-LABEL: frem2_abs:
@@ -85,9 +88,9 @@ entry:
 define half @hrem2_nsz(half %x) {
 ; CHECK-SD-LABEL: hrem2_nsz:
 ; CHECK-SD:       // %bb.0: // %entry
-; CHECK-SD-NEXT:    fmov h1, #2.00000000
+; CHECK-SD-NEXT:    fmov h1, #0.50000000
 ; CHECK-SD-NEXT:    fmov h2, #-2.00000000
-; CHECK-SD-NEXT:    fdiv h1, h0, h1
+; CHECK-SD-NEXT:    fmul h1, h0, h1
 ; CHECK-SD-NEXT:    frintz h1, h1
 ; CHECK-SD-NEXT:    fmadd h0, h1, h2, h0
 ; CHECK-SD-NEXT:    ret
@@ -112,10 +115,11 @@ entry:
 define double @drem2_nsz(double %x) {
 ; CHECK-SD-LABEL: drem2_nsz:
 ; CHECK-SD:       // %bb.0: // %entry
-; CHECK-SD-NEXT:    fmov d1, #2.00000000
-; CHECK-SD-NEXT:    fdiv d2, d0, d1
-; CHECK-SD-NEXT:    frintz d2, d2
-; CHECK-SD-NEXT:    fmsub d0, d2, d1, d0
+; CHECK-SD-NEXT:    fmov d1, #0.50000000
+; CHECK-SD-NEXT:    fmov d2, #-2.00000000
+; CHECK-SD-NEXT:    fmul d1, d0, d1
+; CHECK-SD-NEXT:    frintz d1, d1
+; CHECK-SD-NEXT:    fmadd d0, d1, d2, d0
 ; CHECK-SD-NEXT:    ret
 ;
 ; CHECK-GI-LABEL: drem2_nsz:
@@ -176,10 +180,11 @@ entry:
 define float @fremm2_nsz(float %x) {
 ; CHECK-SD-LABEL: fremm2_nsz:
 ; CHECK-SD:       // %bb.0: // %entry
-; CHECK-SD-NEXT:    fmov s1, #-2.00000000
-; CHECK-SD-NEXT:    fdiv s2, s0, s1
-; CHECK-SD-NEXT:    frintz s2, s2
-; CHECK-SD-NEXT:    fmsub s0, s2, s1, s0
+; CHECK-SD-NEXT:    fmov s1, #-0.50000000
+; CHECK-SD-NEXT:    fmov s2, #2.00000000
+; CHECK-SD-NEXT:    fmul s1, s0, s1
+; CHECK-SD-NEXT:    frintz s1, s1
+; CHECK-SD-NEXT:    fmadd s0, s1, s2, s0
 ; CHECK-SD-NEXT:    ret
 ;
 ; CHECK-GI-LABEL: fremm2_nsz:
@@ -195,10 +200,11 @@ define float @frem4_abs(float %x) {
 ; CHECK-SD-LABEL: frem4_abs:
 ; CHECK-SD:       // %bb.0: // %entry
 ; CHECK-SD-NEXT:    fabs s0, s0
-; CHECK-SD-NEXT:    fmov s1, #4.00000000
-; CHECK-SD-NEXT:    fdiv s2, s0, s1
-; CHECK-SD-NEXT:    frintz s2, s2
-; CHECK-SD-NEXT:    fmsub s0, s2, s1, s0
+; CHECK-SD-NEXT:    fmov s1, #0.25000000
+; CHECK-SD-NEXT:    fmov s2, #-4.00000000
+; CHECK-SD-NEXT:    fmul s1, s0, s1
+; CHECK-SD-NEXT:    frintz s1, s1
+; CHECK-SD-NEXT:    fmadd s0, s1, s2, s0
 ; CHECK-SD-NEXT:    ret
 ;
 ; CHECK-GI-LABEL: frem4_abs:
@@ -216,10 +222,12 @@ define float @frem16_abs(float %x) {
 ; CHECK-SD-LABEL: frem16_abs:
 ; CHECK-SD:       // %bb.0: // %entry
 ; CHECK-SD-NEXT:    fabs s0, s0
-; CHECK-SD-NEXT:    fmov s1, #16.00000000
-; CHECK-SD-NEXT:    fdiv s2, s0, s1
-; CHECK-SD-NEXT:    frintz s2, s2
-; CHECK-SD-NEXT:    fmsub s0, s2, s1, s0
+; CHECK-SD-NEXT:    mov w8, #1031798784 // =0x3d800000
+; CHECK-SD-NEXT:    fmov s2, #-16.00000000
+; CHECK-SD-NEXT:    fmov s1, w8
+; CHECK-SD-NEXT:    fmul s1, s0, s1
+; CHECK-SD-NEXT:    frintz s1, s1
+; CHECK-SD-NEXT:    fmadd s0, s1, s2, s0
 ; CHECK-SD-NEXT:    ret
 ;
 ; CHECK-GI-LABEL: frem16_abs:
@@ -237,11 +245,13 @@ define float @frem4294967296_abs(float %x) {
 ; CHECK-SD-LABEL: frem4294967296_abs:
 ; CHECK-SD:       // %bb.0: // %entry
 ; CHECK-SD-NEXT:    fabs s0, s0
-; CHECK-SD-NEXT:    mov w8, #1333788672 // =0x4f800000
+; CHECK-SD-NEXT:    mov w8, #796917760 // =0x2f800000
 ; CHECK-SD-NEXT:    fmov s1, w8
-; CHECK-SD-NEXT:    fdiv s2, s0, s1
-; CHECK-SD-NEXT:    frintz s2, s2
-; CHECK-SD-NEXT:    fmsub s0, s2, s1, s0
+; CHECK-SD-NEXT:    mov w8, #-813694976 // =0xcf800000
+; CHECK-SD-NEXT:    fmov s2, w8
+; CHECK-SD-NEXT:    fmul s1, s0, s1
+; CHECK-SD-NEXT:    frintz s1, s1
+; CHECK-SD-NEXT:    fmadd s0, s1, s2, s0
 ; CHECK-SD-NEXT:    ret
 ;
 ; CHECK-GI-LABEL: frem4294967296_abs:
@@ -260,11 +270,13 @@ define float @frem1152921504606846976_abs(float %x) {
 ; CHECK-SD-LABEL: frem1152921504606846976_abs:
 ; CHECK-SD:       // %bb.0: // %entry
 ; CHECK-SD-NEXT:    fabs s0, s0
-; CHECK-SD-NEXT:    mov w8, #1568669696 // =0x5d800000
+; CHECK-SD-NEXT:    mov w8, #562036736 // =0x21800000
 ; CHECK-SD-NEXT:    fmov s1, w8
-; CHECK-SD-NEXT:    fdiv s2, s0, s1
-; CHECK-SD-NEXT:    frintz s2, s2
-; CHECK-SD-NEXT:    fmsub s0, s2, s1, s0
+; CHECK-SD-NEXT:    mov w8, #-578813952 // =0xdd800000
+; CHECK-SD-NEXT:    fmov s2, w8
+; CHECK-SD-NEXT:    fmul s1, s0, s1
+; CHECK-SD-NEXT:    frintz s1, s1
+; CHECK-SD-NEXT:    fmadd s0, s1, s2, s0
 ; CHECK-SD-NEXT:    ret
 ;
 ; CHECK-GI-LABEL: frem1152921504606846976_abs:
@@ -283,11 +295,13 @@ define float @frem4611686018427387904_abs(float %x) {
 ; CHECK-SD-LABEL: frem4611686018427387904_abs:
 ; CHECK-SD:       // %bb.0: // %entry
 ; CHECK-SD-NEXT:    fabs s0, s0
-; CHECK-SD-NEXT:    mov w8, #1585446912 // =0x5e800000
+; CHECK-SD-NEXT:    mov w8, #545259520 // =0x20800000
 ; CHECK-SD-NEXT:    fmov s1, w8
-; CHECK-SD-NEXT:    fdiv s2, s0, s1
-; CHECK-SD-NEXT:    frintz s2, s2
-; CHECK-SD-NEXT:    fmsub s0, s2, s1, s0
+; CHECK-SD-NEXT:    mov w8, #-562036736 // =0xde800000
+; CHECK-SD-NEXT:    fmov s2, w8
+; CHECK-SD-NEXT:    fmul s1, s0, s1
+; CHECK-SD-NEXT:    frintz s1, s1
+; CHECK-SD-NEXT:    fmadd s0, s1, s2, s0
 ; CHECK-SD-NEXT:    ret
 ;
 ; CHECK-GI-LABEL: frem4611686018427387904_abs:
@@ -305,11 +319,12 @@ entry:
 define float @frem9223372036854775808_abs(float %x) {
 ; CHECK-SD-LABEL: frem9223372036854775808_abs:
 ; CHECK-SD:       // %bb.0: // %entry
-; CHECK-SD-NEXT:    movi v1.2s, #95, lsl #24
+; CHECK-SD-NEXT:    movi v1.2s, #32, lsl #24
 ; CHECK-SD-NEXT:    fabs s0, s0
-; CHECK-SD-NEXT:    fdiv s2, s0, s1
-; CHECK-SD-NEXT:    frintz s2, s2
-; CHECK-SD-NEXT:    fmsub s0, s2, s1, s0
+; CHECK-SD-NEXT:    movi v2.2s, #223, lsl #24
+; CHECK-SD-NEXT:    fmul s1, s0, s1
+; CHECK-SD-NEXT:    frintz s1, s1
+; CHECK-SD-NEXT:    fmadd s0, s1, s2, s0
 ; CHECK-SD-NEXT:    ret
 ;
 ; CHECK-GI-LABEL: frem9223372036854775808_abs:
diff --git a/llvm/test/CodeGen/AMDGPU/uniform-phi-with-undef.ll b/llvm/test/CodeGen/AMDGPU/uniform-phi-with-undef.ll
index 5386ef425dcb5..64d4a0cf78501 100644
--- a/llvm/test/CodeGen/AMDGPU/uniform-phi-with-undef.ll
+++ b/llvm/test/CodeGen/AMDGPU/uniform-phi-with-undef.ll
@@ -17,7 +17,7 @@ define amdgpu_ps float @uniform_phi_with_undef(float inreg %c, float %v, i32 %x,
 ; GCN-NEXT:    s_mov_b32 exec_lo, s2
 ; GCN-NEXT:    s_cbranch_execz .LBB0_2
 ; GCN-NEXT:  ; %bb.1: ; %if
-; GCN-NEXT:    s_mov_b32 s2, 2.0
+; GCN-NEXT:    s_mov_b32 s2, 0x40400000
 ; GCN-NEXT:    v_div_scale_f32 v1, s3, s2, s2, v0
 ; GCN-NEXT:    v_rcp_f32_e64 v2, v1
 ; GCN-NEXT:    s_mov_b32 s3, 1.0
@@ -39,7 +39,7 @@ entry:
   br i1 %cc, label %if, label %end
 
 if:
-  %v.if = fdiv float %v, 2.0
+  %v.if = fdiv float %v, 3.0
   br label %end
 
 end:
diff --git a/llvm/test/CodeGen/ARM/frem-power2.ll b/llvm/test/CodeGen/ARM/frem-power2.ll
index 71c2c09c0105c..63ecd9fec7883 100644
--- a/llvm/test/CodeGen/ARM/frem-power2.ll
+++ b/llvm/test/CodeGen/ARM/frem-power2.ll
@@ -14,26 +14,28 @@ define float @frem4(float %x) {
 ;
 ; CHECK-FP-LABEL: frem4:
 ; CHECK-FP:       @ %bb.0: @ %entry
-; CHECK-FP-NEXT:    vmov.f32 s0, #4.000000e+00
-; CHECK-FP-NEXT:    vmov s2, r0
+; CHECK-FP-NEXT:    vmov.f32 s0, #2.500000e-01
+; CHECK-FP-NEXT:    vmov.f32 s2, #-4.000000e+00
+; CHECK-FP-NEXT:    vmov s4, r0
 ; CHECK-FP-NEXT:    lsrs r0, r0, #31
-; CHECK-FP-NEXT:    vdiv.f32 s4, s2, s0
-; CHECK-FP-NEXT:    vrintz.f32 s4, s4
-; CHECK-FP-NEXT:    vfms.f32 s2, s4, s0
-; CHECK-FP-NEXT:    vmov r1, s2
+; CHECK-FP-NEXT:    vmul.f32 s0, s4, s0
+; CHECK-FP-NEXT:    vrintz.f32 s0, s0
+; CHECK-FP-NEXT:    vfma.f32 s4, s0, s2
+; CHECK-FP-NEXT:    vmov r1, s4
 ; CHECK-FP-NEXT:    bfi r1, r0, #31, #1
 ; CHECK-FP-NEXT:    mov r0, r1
 ; CHECK-FP-NEXT:    bx lr
 ;
 ; CHECK-M33-LABEL: frem4:
 ; CHECK-M33:       @ %bb.0: @ %entry
-; CHECK-M33-NEXT:    vmov.f32 s0, #4.000000e+00
-; CHECK-M33-NEXT:    vmov s2, r0
+; CHECK-M33-NEXT:    vmov.f32 s0, #2.500000e-01
+; CHECK-M33-NEXT:    vmov.f32 s2, #-4.000000e+00
+; CHECK-M33-NEXT:    vmov s4, r0
 ; CHECK-M33-NEXT:    lsrs r0, r0, #31
-; CHECK-M33-NEXT:    vdiv.f32 s4, s2, s0
-; CHECK-M33-NEXT:    vrintz.f32 s4, s4
-; CHECK-M33-NEXT:    vmls.f32 s2, s4, s0
-; CHECK-M33-NEXT:    vmov r1, s2
+; CHECK-M33-NEXT:    vmul.f32 s0, s4, s0
+; CHECK-M33-NEXT:    vrintz.f32 s0, s0
+; CHECK-M33-NEXT:    vmla.f32 s4, s0, s2
+; CHECK-M33-NEXT:    vmov r1, s4
 ; CHECK-M33-NEXT:    bfi r1, r0, #31, #1
 ; CHECK-M33-NEXT:    mov r0, r1
 ; CHECK-M33-NEXT:    bx lr
@@ -53,22 +55,24 @@ define float @frem4_nsz(float %x) {
 ;
 ; CHECK-FP-LABEL: frem4_nsz:
 ; CHECK-FP:       @ %bb.0: @ %entry
-; CHECK-FP-NEXT:    vmov.f32 s0, #4.000000e+00
-; CHECK-FP-NEXT:    vmov s2, r0
-; CHECK-FP-NEXT:    vdiv.f32 s4, s2, s0
-; CHECK-FP-NEXT:    vrintz.f32 s4, s4
-; CHECK-FP-NEXT:    vfms.f32 s2, s4, s0
-; CHECK-FP-NEXT:    vmov r0, s2
+; CHECK-FP-NEXT:    vmov.f32 s0, #2.500000e-01
+; CHECK-FP-NEXT:    vmov.f32 s2, #-4.000000e+00
+; CHECK-FP-NEXT:    vmov s4, r0
+; CHECK-FP-NEXT:    vmul.f32 s0, s4, s0
+; CHECK-FP-NEXT:    vrintz.f32 s0, s0
+; CHECK-FP-NEXT:    vfma.f32 s4, s0, s2
+; CHECK-FP-NEXT:    vmov r0, s4
 ; CHECK-FP-NEXT:    bx lr
 ;
 ; CHECK-M33-LABEL: frem4_nsz:
 ; CHECK-M33:       @ %bb.0: @ %entry
-; CHECK-M33-NEXT:    vmov.f32 s0, #4.000000e+00
-; CHECK-M33-NEXT:    vmov s2, r0
-; CHECK-M33-NEXT:    vdiv.f32 s4, s2, s0
-; CHECK-M33-NEXT:    vrintz.f32 s4, s4
-; CHECK-M33-NEXT:    vmls.f32 s2, s4, s0
-; CHECK-M33-NEXT:    vmov r0, s2
+; CHECK-M33-NEXT:    vmov.f32 s0, #2.500000e-01
+; CHECK-M33-NEXT:    vmov.f32 s2, #-4.000000e+00
+; CHECK-M33-NEXT:    vmov s4, r0
+; CHECK-M33-NEXT:    vmul.f32 s0, s4, s0
+; CHECK-M33-NEXT:    vrintz.f32 s0, s0
+; CHECK-M33-NEXT:    vmla.f32 s4, s0, s2
+; CHECK-M33-NEXT:    vmov r0, s4
 ; CHECK-M33-NEXT:    bx lr
 entry:
   %fmod = frem nsz float %x, 4.0
diff --git a/llvm/test/CodeGen/X86/change-unsafe-fp-math.ll b/llvm/test/CodeGen/X86/change-unsafe-fp-math.ll
index 33a7ec9bfc794..ba09ba8b6402b 100644
--- a/llvm/test/CodeGen/X86/change-unsafe-fp-math.ll
+++ b/llvm/test/CodeGen/X86/change-unsafe-fp-math.ll
@@ -14,7 +14,7 @@
 define double @unsafe_fp_math_default0(double %x) {
 ; SAFE:      divsd
 ; UNSAFE:    mulsd
-  %div = fdiv double %x, 2.0
+  %div = fdiv double %x, 3.0
   ret double %div
 }
 
@@ -22,7 +22,7 @@ define double @unsafe_fp_math_default0(double %x) {
 define double @unsafe_fp_math_off(double %x) #0 {
 ; SAFE:      divsd
 ; UNSAFE:    divsd
-  %div = fdiv double %x, 2.0
+  %div = fdiv double %x, 3.0
   ret double %div
 }
 
@@ -31,7 +31,7 @@ define double @unsafe_fp_math_default1(double %x) {
 ; With unsafe math enabled, can change this div to a mul.
 ; SAFE:      divsd
 ; UNSAFE:    mulsd
-  %div = fdiv double %x, 2.0
+  %div = fdiv double %x, 3.0
   ret double %div
 }
 
@@ -39,7 +39,7 @@ define double @unsafe_fp_math_default1(double %x) {
 define double @unsafe_fp_math_on(double %x) #1 {
 ; SAFE:      mulsd
 ; UNSAFE:    mulsd
-  %div = fdiv double %x, 2.0
+  %div = fdiv double %x, 3.0
   ret double %div
 }
 
@@ -48,7 +48,7 @@ define double @unsafe_fp_math_default2(double %x) {
 ; With unsafe math enabled, can change this div to a mul.
 ; SAFE:      divsd
 ; UNSAFE:    mulsd
-  %div = fdiv double %x, 2.0
+  %div = fdiv double %x, 3.0
   ret double %div
 }
 

APFloat::opStatus st = Recip.divide(N1APF, APFloat::rmNearestTiesToEven);
// Only do the transform if the reciprocal is a legal fp immediate that
// isn't too nasty (eg NaN, denormal, ...).
if (((st == APFloat::opOK && !Recip.isDenormal()) ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs test for this !denormal part at least

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. In practice I think this only affects dividing by the max power of 2, e.g. for single precision dividing by 2^127 would be the same as multiplying by 2^-127, but 2^-127 is denormal.

I think you've also changed the behaviour of the arcp case here. Previously we would optimize fdiv arcp float %x, 2^127 but with this patch we will not. That might be OK but it should be a deliberate change and it needs a test.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are now specific tests in the fdiv-const.ll test file, including some vectors.

We could change it to fold denormals in more cases, but it would require some additional changes for the fixed-point lowering we have under Arm/AArch64. I don't believe that multiplying by a denormal is ever a problem under Arm/AArch64 but I'm not sure about other architectures.

davemgreen added a commit that referenced this pull request Jun 3, 2024
Instcombine will convert fdiv by a power-2 to fmul, this converts the
PerformVDIVCombine that converts fdiv+fcvt to fixed-point fcvt to fmul+fcvt.
The fdiv tests will look worse, but won't appear in practice (and should be
improved again by #93882).
This moves the combine of fdiv by constant to fmul out of an
'if (Options.UnsafeFPMath || Flags.hasAllowReciprocal()' block,
so that it still triggers if the divide is exact. An extra check
for Recip.isDenormal() is added as multiple places make reference
to it being unsafe or slow on certain platforms.
Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Letting the denormal through is probably fine

return DAG.getNode(ISD::FMUL, DL, VT, N0,
DAG.getConstantFP(Recip, DL, VT));
}
// fold (fdiv X, c2) -> (fmul X, 1/c2) if there is no loss in precision, or
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't do this in instcombine already, we probably should also do this there

Comment on lines +971 to +972
APFloat Val(Sem, Negative ? -1 : 1);
return Val;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit

Suggested change
APFloat Val(Sem, Negative ? -1 : 1);
return Val;
return APFloat(Sem, Negative ? -1.0 : 1.0);

?

@@ -1367,12 +1367,11 @@ define void @bcast_unfold_fdiv_v16f32(ptr nocapture %arg) {
; CHECK-LABEL: bcast_unfold_fdiv_v16f32:
; CHECK: # %bb.0: # %bb
; CHECK-NEXT: movq $-4096, %rax # imm = 0xF000
; CHECK-NEXT: vbroadcastss {{.*#+}} zmm0 = [2.0E+0,2.0E+0,2.0E+0,2.0E+0,2.0E+0,2.0E+0,2.0E+0,2.0E+0,2.0E+0,2.0E+0,2.0E+0,2.0E+0,2.0E+0,2.0E+0,2.0E+0,2.0E+0]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 2.000000e+00 splat value needs to be changed to something that can't be easily inverted - the test needs to show the second operand can be hoisted out of the loop, which fails for a commutable op like fmul.

davemgreen added a commit that referenced this pull request Jun 9, 2024
#93882)

This moves the combine of fdiv by constant to fmul out of an
'if (Options.UnsafeFPMath || Flags.hasAllowReciprocal()' block,
so that it triggers if the divide is exact. An extra check for
Recip.isDenormal() is added as multiple places make reference
to it being unsafe or slow on certain platforms.
@HerrCai0907 HerrCai0907 mentioned this pull request Jun 13, 2024
@davemgreen
Copy link
Collaborator Author

This was in a284bdb

@davemgreen davemgreen closed this Aug 28, 2024
@davemgreen davemgreen deleted the gh-dag-fdivconst branch August 28, 2024 16:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants