[IR] Allow fast math flags on fptrunc and fpext #115894


Merged: 5 commits into llvm:main on Dec 4, 2024

Conversation

@john-brawn-arm (Collaborator) commented Nov 12, 2024

This consists of:

  • Make these instructions part of FPMathOperator.
  • Adjust bitcode/ir readers/writers to expect fast math flags on these instructions.
  • Make IRBuilder set the fast math flags on these instructions.
  • Update langref and release notes.
  • Update a bunch of tests. Some of these are due to InstCombineCasts incorrectly adding fast math flags to fptrunc, which will be fixed in a later patch.
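For illustration, the new syntax accepts the same fast math flags as other FPMathOperator instructions (a minimal sketch in the style of the tests updated by this patch; the value names are placeholders):

%g = fptrunc nnan ninf float %x to half
%h = fpext reassoc nnan ninf nsz arcp afn half %y to float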

@llvmbot (Member) commented Nov 12, 2024

@llvm/pr-subscribers-llvm-transforms
@llvm/pr-subscribers-clang
@llvm/pr-subscribers-llvm-ir

@llvm/pr-subscribers-backend-amdgpu

Author: John Brawn (john-brawn-arm)

Changes

This consists of:

  • Make these instructions part of FPMathOperator.
  • Adjust bitcode/ir readers/writers to expect fast math flags on these instructions.
  • Make IRBuilder set the fast math flags on these instructions.
  • Adjust InstCombinerImpl::visitFPTrunc, as currently the fast math flags on all of the output instructions are set from the source operation. Instead we want each output instruction to get the fast math flags from its corresponding input instruction.
  • Update langref and release notes.
  • Update a bunch of tests.

Patch is 95.24 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/115894.diff

15 Files Affected:

  • (modified) clang/test/CodeGen/X86/cx-complex-range.c (+88-88)
  • (modified) clang/test/CodeGen/cx-complex-range.c (+34-34)
  • (modified) clang/test/CodeGen/matrix-type-operators-fast-math.c (+6-6)
  • (modified) clang/test/CodeGen/nofpclass.c (+30-30)
  • (modified) clang/test/CodeGenCUDA/amdgpu-atomic-ops.cu (+1-1)
  • (modified) clang/test/CodeGenHIP/printf_nonhostcall.cpp (+2-2)
  • (modified) clang/test/Headers/__clang_hip_math_ocml_rounded_ops.hip (+16-16)
  • (modified) llvm/docs/LangRef.rst (+8-4)
  • (modified) llvm/docs/ReleaseNotes.md (+2)
  • (modified) llvm/include/llvm/IR/IRBuilder.h (+15-10)
  • (modified) llvm/include/llvm/IR/Operator.h (+2)
  • (modified) llvm/lib/AsmParser/LLParser.cpp (+10-2)
  • (modified) llvm/lib/Bitcode/Reader/BitcodeReader.cpp (+5)
  • (modified) llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp (+5-3)
  • (modified) llvm/test/Assembler/fast-math-flags.ll (+32)
diff --git a/clang/test/CodeGen/X86/cx-complex-range.c b/clang/test/CodeGen/X86/cx-complex-range.c
index a0e6dc219b36f7..f87091427df713 100644
--- a/clang/test/CodeGen/X86/cx-complex-range.c
+++ b/clang/test/CodeGen/X86/cx-complex-range.c
@@ -266,14 +266,14 @@
 // BASIC_FAST-NEXT:    [[A_REAL:%.*]] = load half, ptr [[A_REALP]], align 2
 // BASIC_FAST-NEXT:    [[A_IMAGP:%.*]] = getelementptr inbounds nuw { half, half }, ptr [[A]], i32 0, i32 1
 // BASIC_FAST-NEXT:    [[A_IMAG:%.*]] = load half, ptr [[A_IMAGP]], align 2
-// BASIC_FAST-NEXT:    [[EXT:%.*]] = fpext half [[A_REAL]] to float
-// BASIC_FAST-NEXT:    [[EXT1:%.*]] = fpext half [[A_IMAG]] to float
+// BASIC_FAST-NEXT:    [[EXT:%.*]] = fpext reassoc nnan ninf nsz arcp afn half [[A_REAL]] to float
+// BASIC_FAST-NEXT:    [[EXT1:%.*]] = fpext reassoc nnan ninf nsz arcp afn half [[A_IMAG]] to float
 // BASIC_FAST-NEXT:    [[B_REALP:%.*]] = getelementptr inbounds nuw { half, half }, ptr [[B]], i32 0, i32 0
 // BASIC_FAST-NEXT:    [[B_REAL:%.*]] = load half, ptr [[B_REALP]], align 2
 // BASIC_FAST-NEXT:    [[B_IMAGP:%.*]] = getelementptr inbounds nuw { half, half }, ptr [[B]], i32 0, i32 1
 // BASIC_FAST-NEXT:    [[B_IMAG:%.*]] = load half, ptr [[B_IMAGP]], align 2
-// BASIC_FAST-NEXT:    [[EXT2:%.*]] = fpext half [[B_REAL]] to float
-// BASIC_FAST-NEXT:    [[EXT3:%.*]] = fpext half [[B_IMAG]] to float
+// BASIC_FAST-NEXT:    [[EXT2:%.*]] = fpext reassoc nnan ninf nsz arcp afn half [[B_REAL]] to float
+// BASIC_FAST-NEXT:    [[EXT3:%.*]] = fpext reassoc nnan ninf nsz arcp afn half [[B_IMAG]] to float
 // BASIC_FAST-NEXT:    [[TMP0:%.*]] = fmul reassoc nnan ninf nsz arcp afn float [[EXT]], [[EXT2]]
 // BASIC_FAST-NEXT:    [[TMP1:%.*]] = fmul reassoc nnan ninf nsz arcp afn float [[EXT1]], [[EXT3]]
 // BASIC_FAST-NEXT:    [[TMP2:%.*]] = fadd reassoc nnan ninf nsz arcp afn float [[TMP0]], [[TMP1]]
@@ -285,8 +285,8 @@
 // BASIC_FAST-NEXT:    [[TMP8:%.*]] = fsub reassoc nnan ninf nsz arcp afn float [[TMP6]], [[TMP7]]
 // BASIC_FAST-NEXT:    [[TMP9:%.*]] = fdiv reassoc nnan ninf nsz arcp afn float [[TMP2]], [[TMP5]]
 // BASIC_FAST-NEXT:    [[TMP10:%.*]] = fdiv reassoc nnan ninf nsz arcp afn float [[TMP8]], [[TMP5]]
-// BASIC_FAST-NEXT:    [[UNPROMOTION:%.*]] = fptrunc float [[TMP9]] to half
-// BASIC_FAST-NEXT:    [[UNPROMOTION4:%.*]] = fptrunc float [[TMP10]] to half
+// BASIC_FAST-NEXT:    [[UNPROMOTION:%.*]] = fptrunc reassoc nnan ninf nsz arcp afn float [[TMP9]] to half
+// BASIC_FAST-NEXT:    [[UNPROMOTION4:%.*]] = fptrunc reassoc nnan ninf nsz arcp afn float [[TMP10]] to half
 // BASIC_FAST-NEXT:    [[RETVAL_REALP:%.*]] = getelementptr inbounds nuw { half, half }, ptr [[RETVAL]], i32 0, i32 0
 // BASIC_FAST-NEXT:    [[RETVAL_IMAGP:%.*]] = getelementptr inbounds nuw { half, half }, ptr [[RETVAL]], i32 0, i32 1
 // BASIC_FAST-NEXT:    store half [[UNPROMOTION]], ptr [[RETVAL_REALP]], align 2
@@ -307,22 +307,22 @@
 // FULL_FAST-NEXT:    [[A_REAL:%.*]] = load half, ptr [[A_REALP]], align 2
 // FULL_FAST-NEXT:    [[A_IMAGP:%.*]] = getelementptr inbounds nuw { half, half }, ptr [[A]], i32 0, i32 1
 // FULL_FAST-NEXT:    [[A_IMAG:%.*]] = load half, ptr [[A_IMAGP]], align 2
-// FULL_FAST-NEXT:    [[EXT:%.*]] = fpext half [[A_REAL]] to float
-// FULL_FAST-NEXT:    [[EXT1:%.*]] = fpext half [[A_IMAG]] to float
+// FULL_FAST-NEXT:    [[EXT:%.*]] = fpext reassoc nnan ninf nsz arcp afn half [[A_REAL]] to float
+// FULL_FAST-NEXT:    [[EXT1:%.*]] = fpext reassoc nnan ninf nsz arcp afn half [[A_IMAG]] to float
 // FULL_FAST-NEXT:    [[B_REALP:%.*]] = getelementptr inbounds nuw { half, half }, ptr [[B]], i32 0, i32 0
 // FULL_FAST-NEXT:    [[B_REAL:%.*]] = load half, ptr [[B_REALP]], align 2
 // FULL_FAST-NEXT:    [[B_IMAGP:%.*]] = getelementptr inbounds nuw { half, half }, ptr [[B]], i32 0, i32 1
 // FULL_FAST-NEXT:    [[B_IMAG:%.*]] = load half, ptr [[B_IMAGP]], align 2
-// FULL_FAST-NEXT:    [[EXT2:%.*]] = fpext half [[B_REAL]] to float
-// FULL_FAST-NEXT:    [[EXT3:%.*]] = fpext half [[B_IMAG]] to float
+// FULL_FAST-NEXT:    [[EXT2:%.*]] = fpext reassoc nnan ninf nsz arcp afn half [[B_REAL]] to float
+// FULL_FAST-NEXT:    [[EXT3:%.*]] = fpext reassoc nnan ninf nsz arcp afn half [[B_IMAG]] to float
 // FULL_FAST-NEXT:    [[CALL:%.*]] = call reassoc nnan ninf nsz arcp afn nofpclass(nan inf) <2 x float> @__divsc3(float noundef nofpclass(nan inf) [[EXT]], float noundef nofpclass(nan inf) [[EXT1]], float noundef nofpclass(nan inf) [[EXT2]], float noundef nofpclass(nan inf) [[EXT3]]) #[[ATTR1:[0-9]+]]
 // FULL_FAST-NEXT:    store <2 x float> [[CALL]], ptr [[COERCE]], align 4
 // FULL_FAST-NEXT:    [[COERCE_REALP:%.*]] = getelementptr inbounds nuw { float, float }, ptr [[COERCE]], i32 0, i32 0
 // FULL_FAST-NEXT:    [[COERCE_REAL:%.*]] = load float, ptr [[COERCE_REALP]], align 4
 // FULL_FAST-NEXT:    [[COERCE_IMAGP:%.*]] = getelementptr inbounds nuw { float, float }, ptr [[COERCE]], i32 0, i32 1
 // FULL_FAST-NEXT:    [[COERCE_IMAG:%.*]] = load float, ptr [[COERCE_IMAGP]], align 4
-// FULL_FAST-NEXT:    [[UNPROMOTION:%.*]] = fptrunc float [[COERCE_REAL]] to half
-// FULL_FAST-NEXT:    [[UNPROMOTION4:%.*]] = fptrunc float [[COERCE_IMAG]] to half
+// FULL_FAST-NEXT:    [[UNPROMOTION:%.*]] = fptrunc reassoc nnan ninf nsz arcp afn float [[COERCE_REAL]] to half
+// FULL_FAST-NEXT:    [[UNPROMOTION4:%.*]] = fptrunc reassoc nnan ninf nsz arcp afn float [[COERCE_IMAG]] to half
 // FULL_FAST-NEXT:    [[RETVAL_REALP:%.*]] = getelementptr inbounds nuw { half, half }, ptr [[RETVAL]], i32 0, i32 0
 // FULL_FAST-NEXT:    [[RETVAL_IMAGP:%.*]] = getelementptr inbounds nuw { half, half }, ptr [[RETVAL]], i32 0, i32 1
 // FULL_FAST-NEXT:    store half [[UNPROMOTION]], ptr [[RETVAL_REALP]], align 2
@@ -342,14 +342,14 @@
 // IMPRVD_FAST-NEXT:    [[A_REAL:%.*]] = load half, ptr [[A_REALP]], align 2
 // IMPRVD_FAST-NEXT:    [[A_IMAGP:%.*]] = getelementptr inbounds nuw { half, half }, ptr [[A]], i32 0, i32 1
 // IMPRVD_FAST-NEXT:    [[A_IMAG:%.*]] = load half, ptr [[A_IMAGP]], align 2
-// IMPRVD_FAST-NEXT:    [[EXT:%.*]] = fpext half [[A_REAL]] to float
-// IMPRVD_FAST-NEXT:    [[EXT1:%.*]] = fpext half [[A_IMAG]] to float
+// IMPRVD_FAST-NEXT:    [[EXT:%.*]] = fpext reassoc nnan ninf nsz arcp afn half [[A_REAL]] to float
+// IMPRVD_FAST-NEXT:    [[EXT1:%.*]] = fpext reassoc nnan ninf nsz arcp afn half [[A_IMAG]] to float
 // IMPRVD_FAST-NEXT:    [[B_REALP:%.*]] = getelementptr inbounds nuw { half, half }, ptr [[B]], i32 0, i32 0
 // IMPRVD_FAST-NEXT:    [[B_REAL:%.*]] = load half, ptr [[B_REALP]], align 2
 // IMPRVD_FAST-NEXT:    [[B_IMAGP:%.*]] = getelementptr inbounds nuw { half, half }, ptr [[B]], i32 0, i32 1
 // IMPRVD_FAST-NEXT:    [[B_IMAG:%.*]] = load half, ptr [[B_IMAGP]], align 2
-// IMPRVD_FAST-NEXT:    [[EXT2:%.*]] = fpext half [[B_REAL]] to float
-// IMPRVD_FAST-NEXT:    [[EXT3:%.*]] = fpext half [[B_IMAG]] to float
+// IMPRVD_FAST-NEXT:    [[EXT2:%.*]] = fpext reassoc nnan ninf nsz arcp afn half [[B_REAL]] to float
+// IMPRVD_FAST-NEXT:    [[EXT3:%.*]] = fpext reassoc nnan ninf nsz arcp afn half [[B_IMAG]] to float
 // IMPRVD_FAST-NEXT:    [[TMP0:%.*]] = call reassoc nnan ninf nsz arcp afn float @llvm.fabs.f32(float [[EXT2]])
 // IMPRVD_FAST-NEXT:    [[TMP1:%.*]] = call reassoc nnan ninf nsz arcp afn float @llvm.fabs.f32(float [[EXT3]])
 // IMPRVD_FAST-NEXT:    [[ABS_CMP:%.*]] = fcmp reassoc nnan ninf nsz arcp afn ugt float [[TMP0]], [[TMP1]]
@@ -379,8 +379,8 @@
 // IMPRVD_FAST:       complex_div:
 // IMPRVD_FAST-NEXT:    [[TMP20:%.*]] = phi reassoc nnan ninf nsz arcp afn float [ [[TMP7]], [[ABS_RHSR_GREATER_OR_EQUAL_ABS_RHSI]] ], [ [[TMP16]], [[ABS_RHSR_LESS_THAN_ABS_RHSI]] ]
 // IMPRVD_FAST-NEXT:    [[TMP21:%.*]] = phi reassoc nnan ninf nsz arcp afn float [ [[TMP10]], [[ABS_RHSR_GREATER_OR_EQUAL_ABS_RHSI]] ], [ [[TMP19]], [[ABS_RHSR_LESS_THAN_ABS_RHSI]] ]
-// IMPRVD_FAST-NEXT:    [[UNPROMOTION:%.*]] = fptrunc float [[TMP20]] to half
-// IMPRVD_FAST-NEXT:    [[UNPROMOTION4:%.*]] = fptrunc float [[TMP21]] to half
+// IMPRVD_FAST-NEXT:    [[UNPROMOTION:%.*]] = fptrunc reassoc nnan ninf nsz arcp afn float [[TMP20]] to half
+// IMPRVD_FAST-NEXT:    [[UNPROMOTION4:%.*]] = fptrunc reassoc nnan ninf nsz arcp afn float [[TMP21]] to half
 // IMPRVD_FAST-NEXT:    [[RETVAL_REALP:%.*]] = getelementptr inbounds nuw { half, half }, ptr [[RETVAL]], i32 0, i32 0
 // IMPRVD_FAST-NEXT:    [[RETVAL_IMAGP:%.*]] = getelementptr inbounds nuw { half, half }, ptr [[RETVAL]], i32 0, i32 1
 // IMPRVD_FAST-NEXT:    store half [[UNPROMOTION]], ptr [[RETVAL_REALP]], align 2
@@ -400,14 +400,14 @@
 // PRMTD_FAST-NEXT:    [[A_REAL:%.*]] = load half, ptr [[A_REALP]], align 2
 // PRMTD_FAST-NEXT:    [[A_IMAGP:%.*]] = getelementptr inbounds nuw { half, half }, ptr [[A]], i32 0, i32 1
 // PRMTD_FAST-NEXT:    [[A_IMAG:%.*]] = load half, ptr [[A_IMAGP]], align 2
-// PRMTD_FAST-NEXT:    [[EXT:%.*]] = fpext half [[A_REAL]] to float
-// PRMTD_FAST-NEXT:    [[EXT1:%.*]] = fpext half [[A_IMAG]] to float
+// PRMTD_FAST-NEXT:    [[EXT:%.*]] = fpext reassoc nnan ninf nsz arcp afn half [[A_REAL]] to float
+// PRMTD_FAST-NEXT:    [[EXT1:%.*]] = fpext reassoc nnan ninf nsz arcp afn half [[A_IMAG]] to float
 // PRMTD_FAST-NEXT:    [[B_REALP:%.*]] = getelementptr inbounds nuw { half, half }, ptr [[B]], i32 0, i32 0
 // PRMTD_FAST-NEXT:    [[B_REAL:%.*]] = load half, ptr [[B_REALP]], align 2
 // PRMTD_FAST-NEXT:    [[B_IMAGP:%.*]] = getelementptr inbounds nuw { half, half }, ptr [[B]], i32 0, i32 1
 // PRMTD_FAST-NEXT:    [[B_IMAG:%.*]] = load half, ptr [[B_IMAGP]], align 2
-// PRMTD_FAST-NEXT:    [[EXT2:%.*]] = fpext half [[B_REAL]] to float
-// PRMTD_FAST-NEXT:    [[EXT3:%.*]] = fpext half [[B_IMAG]] to float
+// PRMTD_FAST-NEXT:    [[EXT2:%.*]] = fpext reassoc nnan ninf nsz arcp afn half [[B_REAL]] to float
+// PRMTD_FAST-NEXT:    [[EXT3:%.*]] = fpext reassoc nnan ninf nsz arcp afn half [[B_IMAG]] to float
 // PRMTD_FAST-NEXT:    [[TMP0:%.*]] = fmul reassoc nnan ninf nsz arcp afn float [[EXT]], [[EXT2]]
 // PRMTD_FAST-NEXT:    [[TMP1:%.*]] = fmul reassoc nnan ninf nsz arcp afn float [[EXT1]], [[EXT3]]
 // PRMTD_FAST-NEXT:    [[TMP2:%.*]] = fadd reassoc nnan ninf nsz arcp afn float [[TMP0]], [[TMP1]]
@@ -419,8 +419,8 @@
 // PRMTD_FAST-NEXT:    [[TMP8:%.*]] = fsub reassoc nnan ninf nsz arcp afn float [[TMP6]], [[TMP7]]
 // PRMTD_FAST-NEXT:    [[TMP9:%.*]] = fdiv reassoc nnan ninf nsz arcp afn float [[TMP2]], [[TMP5]]
 // PRMTD_FAST-NEXT:    [[TMP10:%.*]] = fdiv reassoc nnan ninf nsz arcp afn float [[TMP8]], [[TMP5]]
-// PRMTD_FAST-NEXT:    [[UNPROMOTION:%.*]] = fptrunc float [[TMP9]] to half
-// PRMTD_FAST-NEXT:    [[UNPROMOTION4:%.*]] = fptrunc float [[TMP10]] to half
+// PRMTD_FAST-NEXT:    [[UNPROMOTION:%.*]] = fptrunc reassoc nnan ninf nsz arcp afn float [[TMP9]] to half
+// PRMTD_FAST-NEXT:    [[UNPROMOTION4:%.*]] = fptrunc reassoc nnan ninf nsz arcp afn float [[TMP10]] to half
 // PRMTD_FAST-NEXT:    [[RETVAL_REALP:%.*]] = getelementptr inbounds nuw { half, half }, ptr [[RETVAL]], i32 0, i32 0
 // PRMTD_FAST-NEXT:    [[RETVAL_IMAGP:%.*]] = getelementptr inbounds nuw { half, half }, ptr [[RETVAL]], i32 0, i32 1
 // PRMTD_FAST-NEXT:    store half [[UNPROMOTION]], ptr [[RETVAL_REALP]], align 2
@@ -636,22 +636,22 @@ _Complex _Float16 divf16(_Complex _Float16 a, _Complex _Float16 b) {
 // BASIC_FAST-NEXT:    [[A_REAL:%.*]] = load half, ptr [[A_REALP]], align 2
 // BASIC_FAST-NEXT:    [[A_IMAGP:%.*]] = getelementptr inbounds nuw { half, half }, ptr [[A]], i32 0, i32 1
 // BASIC_FAST-NEXT:    [[A_IMAG:%.*]] = load half, ptr [[A_IMAGP]], align 2
-// BASIC_FAST-NEXT:    [[EXT:%.*]] = fpext half [[A_REAL]] to float
-// BASIC_FAST-NEXT:    [[EXT1:%.*]] = fpext half [[A_IMAG]] to float
+// BASIC_FAST-NEXT:    [[EXT:%.*]] = fpext reassoc nnan ninf nsz arcp afn half [[A_REAL]] to float
+// BASIC_FAST-NEXT:    [[EXT1:%.*]] = fpext reassoc nnan ninf nsz arcp afn half [[A_IMAG]] to float
 // BASIC_FAST-NEXT:    [[B_REALP:%.*]] = getelementptr inbounds nuw { half, half }, ptr [[B]], i32 0, i32 0
 // BASIC_FAST-NEXT:    [[B_REAL:%.*]] = load half, ptr [[B_REALP]], align 2
 // BASIC_FAST-NEXT:    [[B_IMAGP:%.*]] = getelementptr inbounds nuw { half, half }, ptr [[B]], i32 0, i32 1
 // BASIC_FAST-NEXT:    [[B_IMAG:%.*]] = load half, ptr [[B_IMAGP]], align 2
-// BASIC_FAST-NEXT:    [[EXT2:%.*]] = fpext half [[B_REAL]] to float
-// BASIC_FAST-NEXT:    [[EXT3:%.*]] = fpext half [[B_IMAG]] to float
+// BASIC_FAST-NEXT:    [[EXT2:%.*]] = fpext reassoc nnan ninf nsz arcp afn half [[B_REAL]] to float
+// BASIC_FAST-NEXT:    [[EXT3:%.*]] = fpext reassoc nnan ninf nsz arcp afn half [[B_IMAG]] to float
 // BASIC_FAST-NEXT:    [[MUL_AC:%.*]] = fmul reassoc nnan ninf nsz arcp afn float [[EXT]], [[EXT2]]
 // BASIC_FAST-NEXT:    [[MUL_BD:%.*]] = fmul reassoc nnan ninf nsz arcp afn float [[EXT1]], [[EXT3]]
 // BASIC_FAST-NEXT:    [[MUL_AD:%.*]] = fmul reassoc nnan ninf nsz arcp afn float [[EXT]], [[EXT3]]
 // BASIC_FAST-NEXT:    [[MUL_BC:%.*]] = fmul reassoc nnan ninf nsz arcp afn float [[EXT1]], [[EXT2]]
 // BASIC_FAST-NEXT:    [[MUL_R:%.*]] = fsub reassoc nnan ninf nsz arcp afn float [[MUL_AC]], [[MUL_BD]]
 // BASIC_FAST-NEXT:    [[MUL_I:%.*]] = fadd reassoc nnan ninf nsz arcp afn float [[MUL_AD]], [[MUL_BC]]
-// BASIC_FAST-NEXT:    [[UNPROMOTION:%.*]] = fptrunc float [[MUL_R]] to half
-// BASIC_FAST-NEXT:    [[UNPROMOTION4:%.*]] = fptrunc float [[MUL_I]] to half
+// BASIC_FAST-NEXT:    [[UNPROMOTION:%.*]] = fptrunc reassoc nnan ninf nsz arcp afn float [[MUL_R]] to half
+// BASIC_FAST-NEXT:    [[UNPROMOTION4:%.*]] = fptrunc reassoc nnan ninf nsz arcp afn float [[MUL_I]] to half
 // BASIC_FAST-NEXT:    [[RETVAL_REALP:%.*]] = getelementptr inbounds nuw { half, half }, ptr [[RETVAL]], i32 0, i32 0
 // BASIC_FAST-NEXT:    [[RETVAL_IMAGP:%.*]] = getelementptr inbounds nuw { half, half }, ptr [[RETVAL]], i32 0, i32 1
 // BASIC_FAST-NEXT:    store half [[UNPROMOTION]], ptr [[RETVAL_REALP]], align 2
@@ -672,14 +672,14 @@ _Complex _Float16 divf16(_Complex _Float16 a, _Complex _Float16 b) {
 // FULL_FAST-NEXT:    [[A_REAL:%.*]] = load half, ptr [[A_REALP]], align 2
 // FULL_FAST-NEXT:    [[A_IMAGP:%.*]] = getelementptr inbounds nuw { half, half }, ptr [[A]], i32 0, i32 1
 // FULL_FAST-NEXT:    [[A_IMAG:%.*]] = load half, ptr [[A_IMAGP]], align 2
-// FULL_FAST-NEXT:    [[EXT:%.*]] = fpext half [[A_REAL]] to float
-// FULL_FAST-NEXT:    [[EXT1:%.*]] = fpext half [[A_IMAG]] to float
+// FULL_FAST-NEXT:    [[EXT:%.*]] = fpext reassoc nnan ninf nsz arcp afn half [[A_REAL]] to float
+// FULL_FAST-NEXT:    [[EXT1:%.*]] = fpext reassoc nnan ninf nsz arcp afn half [[A_IMAG]] to float
 // FULL_FAST-NEXT:    [[B_REALP:%.*]] = getelementptr inbounds nuw { half, half }, ptr [[B]], i32 0, i32 0
 // FULL_FAST-NEXT:    [[B_REAL:%.*]] = load half, ptr [[B_REALP]], align 2
 // FULL_FAST-NEXT:    [[B_IMAGP:%.*]] = getelementptr inbounds nuw { half, half }, ptr [[B]], i32 0, i32 1
 // FULL_FAST-NEXT:    [[B_IMAG:%.*]] = load half, ptr [[B_IMAGP]], align 2
-// FULL_FAST-NEXT:    [[EXT2:%.*]] = fpext half [[B_REAL]] to float
-// FULL_FAST-NEXT:    [[EXT3:%.*]] = fpext half [[B_IMAG]] to float
+// FULL_FAST-NEXT:    [[EXT2:%.*]] = fpext reassoc nnan ninf nsz arcp afn half [[B_REAL]] to float
+// FULL_FAST-NEXT:    [[EXT3:%.*]] = fpext reassoc nnan ninf nsz arcp afn half [[B_IMAG]] to float
 // FULL_FAST-NEXT:    [[MUL_AC:%.*]] = fmul reassoc nnan ninf nsz arcp afn float [[EXT]], [[EXT2]]
 // FULL_FAST-NEXT:    [[MUL_BD:%.*]] = fmul reassoc nnan ninf nsz arcp afn float [[EXT1]], [[EXT3]]
 // FULL_FAST-NEXT:    [[MUL_AD:%.*]] = fmul reassoc nnan ninf nsz arcp afn float [[EXT]], [[EXT3]]
@@ -702,8 +702,8 @@ _Complex _Float16 divf16(_Complex _Float16 a, _Complex _Float16 b) {
 // FULL_FAST:       complex_mul_cont:
 // FULL_FAST-NEXT:    [[REAL_MUL_PHI:%.*]] = phi reassoc nnan ninf nsz arcp afn float [ [[MUL_R]], [[ENTRY:%.*]] ], [ [[MUL_R]], [[COMPLEX_MUL_IMAG_NAN]] ], [ [[COERCE_REAL]], [[COMPLEX_MUL_LIBCALL]] ]
 // FULL_FAST-NEXT:    [[IMAG_MUL_PHI:%.*]] = phi reassoc nnan ninf nsz arcp afn float [ [[MUL_I]], [[ENTRY]] ], [ [[MUL_I]], [[COMPLEX_MUL_IMAG_NAN]] ], [ [[COERCE_IMAG]], [[COMPLEX_MUL_LIBCALL]] ]
-// FULL_FAST-NEXT:    [[UNPROMOTION:%.*]] = fptrunc float [[REAL_MUL_PHI]] to half
-// FULL_FAST-NEXT:    [[UNPROMOTION5:%.*]] = fptrunc float [[IMAG_MUL_PHI]] to half
+// FULL_FAST-NEXT:    [[UNPROMOTION:%.*]] = fptrunc reassoc nnan ninf nsz arcp afn float [[REAL_MUL_PHI]] to half
+// FULL_FAST-NEXT:    [[UNPROMOTION5:%.*]] = fptrunc reassoc nnan ninf nsz arcp afn float [[IMAG_MUL_PHI]] to half
 // FULL_FAST-NEXT:    [[RETVAL_REALP:%.*]] = getelementptr inbounds nuw { half, half }, ptr [[RETVAL]], i32 0, i32 0
 // FULL_FAST-NEXT:    [[RETVAL_IMAGP:%.*]] = getelementptr inbounds nuw { half, half }, ptr [[RETVAL]], i32 0, i32 1
 // FULL_FAST-NEXT:    store half [[UNPROMOTION]], ptr [[RETVAL_REALP]], align 2
@@ -723,22 +723,22 @@ _Complex _Float16 divf16(_Complex _Float16 a, _Complex _Float16 b) {
 // IMPRVD_FAST-NEXT:    [[A_REAL:%.*]] = load half, ptr [[A_REALP]], align 2
 // IMPRVD_FAST-NEXT:    [[A_IMAGP:%.*]] = getelementptr inbounds nuw { half, half }, ptr [[A]], i32 0, i32 1
 // IMPRVD_FAST-NEXT:    [[A_IMAG:%.*]] = load half, ptr [[A_IMAGP]], align 2
-// IMPRVD_FAST-NEXT:    [[EXT:%.*]] = fpext half [[A_REAL]] to float
-// IMPRVD_FAST-NEXT:    [[EXT1:%.*]] = fpext half [[A_IMAG]] to float
+// IMPRVD_FAST-NEXT:    [[EXT:%.*]] = fpext reassoc nnan ninf nsz arcp afn half [[A_REAL]] to float
+// IMPRVD_FAST-NEXT:    [[EXT1:%.*]] = fpext reassoc nnan ninf nsz arcp afn half [[A_IMAG]] to float
 // IMPRVD_FAST-NEXT:    [[B_REALP:%.*]] = getelementptr inbounds nuw { half, half }, ptr [[B]], i32 0, i32 0
 // IMPRVD_FAST-NEXT:    [[B_REAL:%.*]] = load half, ptr [[B_REALP]], align 2
 // IMPRVD_FAST-NEXT:    [[B_IMAGP:%.*]] = getelementptr inbounds nuw { half, half }, ptr [[B]], i32 0, i32 1
 // IMPRVD_FAST-NEXT:    [[B_IMAG:%.*]] = load half, ptr [[B_IMAGP]], align 2
-// IMPRVD_FAST-NEXT:    [[EXT2:%.*]] = fpext half [[B_REAL]] to float
-// IMPRVD_FAST-NEXT:    [[EXT3:%.*]] = fpext half [[B_IMAG]] to float
+// IMPRVD_FAST-NEXT:    [[EXT2:%.*]] = fpext reassoc nnan ninf nsz arcp afn half [[B_REAL]] to float
+// IMPRVD_FAST-NEXT:    [[EXT3:%.*]] = fpext reassoc nnan ninf nsz arcp afn half [[B_IMAG]] to float
 // IMPRVD_FAST-NEXT:    [[MUL_AC:%.*]] = fmul reassoc nnan ninf nsz arcp afn float [[EXT]], [[EXT2]]
 // IMPRVD_FAST-NEXT:    [[MUL_BD:%.*]] = fmul reassoc nnan ninf nsz arcp afn float [[EXT1]], [[EXT3]]
 // IMPRVD_FAST-NEXT:    [[MUL_AD:%.*]] = fmul reassoc nnan ninf nsz arcp afn float [[EXT]], [[EXT3]]
 // IMPRVD_FAST-NEXT:    [[MUL_BC:%.*]] = fmul reassoc nnan ninf nsz arcp afn float [[EXT1]], [[EXT2]]
 // IMPRVD_FAST-NEXT:    [[MUL_R:%.*]] = fsub reassoc nnan ninf nsz arcp afn float [[MUL_AC]], [[MUL_BD]]
 // IMPRVD_FAST-NEXT:    [[MUL_I:%.*]] = fadd reassoc nnan ninf nsz arcp afn float [[MUL_AD]], [[MUL_BC]]
-// IMPRVD_FAST-NEXT:    [[UNPROMOTION:%.*]] = fptrunc float [[MUL_R]] to half
-// IMPRVD_FAST-NEXT:    [[UNPROMOTION4:%.*]] = fptrunc float [[MUL_I]] to half
+// IMPRVD_FAST-NEXT:    [[UNPROMOTION:%.*]] = fptrunc reassoc nnan ninf nsz arcp afn float [[MUL_R]] to half
+// IMPRVD_FAST-NEXT:    [[UNPROMOTION4:%.*]] = fptrunc reassoc nnan ninf nsz arcp afn float [[MUL_I]] to half
 // IMPRVD_FAST-NEXT:    [[RETVAL_REALP:%.*]] = getelementptr inbounds nuw { half, half }, ptr [[RETVAL]], i32 0, i32 0
 // IMPRVD_FAST-NEXT:    [[RETVAL_IMAGP:%.*]] = getelementptr inbounds nuw { half, half }, ptr [[RETVAL]], i32 0, i32 1
 // IMPRVD_FAST-NEXT:    store half [[UNPROMOTION]], ptr [[RETVAL_REALP]], align 2
@@ -758,22 +758,22 @@ _Complex _Float16 divf16(_Complex _Float16 a, _Complex _Float16 b) {
 // PRMTD_FAST-NEXT:    [[A_REAL:%.*]] = load half, ptr [[A_REALP]], align 2
 // PRMTD_FAST-NEXT:    [[A_IMAGP:%.*]] = getelementptr inbounds nuw { half, half }, ptr [[A]], i32 0, i32 1
 // PRMTD_FAST-NEXT:    [[A_IMAG:%.*]] = load half, ptr [[A_IMAGP]], align 2
-// PRMTD_FAST-NEXT:    [[EXT:%.*]] = fpext half [[A_REAL]] to float
-// PRMTD_FAST-NEXT:    [[EXT1:%.*]] = fpext half [[A_IMAG]] to float
+// PRMTD_FAST-NEXT:    [[EXT:%.*]] = fpext reassoc nnan ninf nsz arcp afn half [[A_REAL]] to float
+// PRMTD_FAST-NEXT:    [[EXT1:%.*]] = fpext reassoc nnan ninf nsz arcp afn half [[A_IMAG]] to float
 // PRMTD_FAST-NEXT:    [[B_REALP:%.*]] = getelementptr inbounds nuw { half, half }, ptr [[B]], i32 0, i32 0
 // PRMTD_FAST-NEXT:    [[B_REAL:%.*]] = load half, ptr [[B_REALP]], align 2
 /...
[truncated]


github-actions bot commented Nov 12, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@nikic nikic added the floating-point Floating-point math label Nov 12, 2024
@nikic nikic requested a review from andykaylor November 12, 2024 16:27
Quoted context from llvm/docs/LangRef.rst:

floating-point transformations.
:ref:`frem <i_frem>`, :ref:`fcmp <i_fcmp>`, :ref:`fptrunc <i_fptrunc>`,
:ref:`fpext <i_fpext>`), and :ref:`phi <i_phi>`, :ref:`select <i_select>`, or
:ref:`call <i_call>` instructions that return floating-point types may use the

A reviewer commented:

The syntax of fptrunc and fpext forbids fast math flags. You may want to update the two sections.

A Contributor replied:

Agree, fptrunc and fpext need updating to at least show where the fast-math flags may occur.
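Following the pattern LangRef already uses for instructions such as fadd, the two syntax sections would then take a form like this (a sketch of the expected shape, not the patch's exact wording):

<result> = fptrunc [fast-math flags]* <ty> <value> to <ty2>   ; yields ty2
<result> = fpext [fast-math flags]* <ty> <value> to <ty2>     ; yields ty2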

Quoted context from llvm/test/Assembler/fast-math-flags.ll:

@@ -42,6 +42,14 @@ entry:
%f = fneg float %x
; CHECK: %f_vec = fneg <3 x float> %vec
%f_vec = fneg <3 x float> %vec
; CHECK: %g = fpext float %x to double

A Contributor commented:

Needs bitcode compatibility test
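A bitcode compatibility test round-trips textual IR through bitcode and checks that the flags survive, along these lines (a sketch; the value names are illustrative assumptions, not necessarily what compatibility.ll uses):

; CHECK: %trunc.nnan = fptrunc nnan float %op1 to half
%trunc.nnan = fptrunc nnan float %op1 to half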

Further quoted context:

; CHECK: %g = fptrunc nnan ninf float %x to half
%g = fptrunc ninf nnan float %x to half
; CHECK: %g_vec = fptrunc nnan ninf <3 x float> %vec to <3 x half>
%g_vec = fptrunc ninf nnan <3 x float> %vec to <3 x half>
; CHECK: ret float %e
ret float %e

A Contributor commented:

Test scalable vector case
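A scalable-vector check exercising that case might look like this (a hand-written sketch; the names are placeholders, not necessarily what the follow-up commit added):

; CHECK: %g_sca = fptrunc nnan ninf <vscale x 2 x float> %sca to <vscale x 2 x half>
%g_sca = fptrunc ninf nnan <vscale x 2 x float> %sca to <vscale x 2 x half>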

Follow-up commits addressing the review:
  • Add fast math flags to fptrunc and fpext in langref
  • Add scalable vector test
  • Add test in Bitcode/compatibility.ll
Quoted context from llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp:

      return replaceInstUsesWith(FPT, Sel);
    }
    if (match(Op, m_Select(m_Value(Cond), m_Value(Y), m_FPExt(m_Value(X)))) &&
        X->getType() == Ty) {
      // fptrunc (select Cond, Y, (fpext X)) --> select Cond, (fptrunc Y), X
      Value *NarrowY = Builder.CreateFPTrunc(Y, Ty);
      Value *Sel = Builder.CreateSelect(Cond, NarrowY, X, "narrow.sel", Op);
      if (auto *I = dyn_cast<Instruction>(Sel))
        I->setFastMathFlags(Op->getFastMathFlags());

A Member commented:

We already propagate FMF in IRBuilder::CreateSelect. BTW it is incorrect to preserve ninf: #116073 (comment)

@john-brawn-arm (Collaborator, Author) replied:

IRBuilder::CreateSelect uses the default fast math flags of the IRBuilder, which will be the flags of the fptrunc. So we need this to make sure the new select has the same flags as the old select.
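Schematically, in IR terms (a hand-written sketch, not taken from the patch's tests):

; before the fold: the select and the fptrunc carry different flags
%e = fpext half %x to float
%s = select nnan i1 %c, float %y, float %e
%t = fptrunc ninf float %s to half

; after the fold: the builder's default flags were seeded from the fptrunc, so
; the narrowed value gets ninf, while the explicit setFastMathFlags call copies
; the old select's nnan onto the new select
%y.t = fptrunc ninf float %y to half
%t2 = select nnan i1 %c, half %y.t, half %x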

The Member replied:

Sorry I got MDFrom and FMFSource mixed up.

A Contributor commented:

Is it possible to split off this change? The rest of the patch looks good to me.

@john-brawn-arm replied:

Do you mean these specific changes for setting the fast math flag on select instructions, or also the change at the top of visitFPTrunc to set the fast math flags in the builder based on the fptrunc? Either way it causes failures in the llvm/test/Transforms/InstCombine/fpcast.ll and llvm/test/Transforms/InstCombine/fptrunc.ll tests. Without setting the fast math flags in the select the failures are because the select gets the fast math flags from the fptrunc. Without setting the flags in the builder the opposite happens and the fptrunc gets the fast math flags from the select.

The Contributor replied:

I would split the instcombine change into a separate PR, and leave this one purely mechanical to add the flags

@john-brawn-arm replied:

I don't think this is possible. Applying this patch first then later applying a patch to instcombine would cause test failures, as I mention in my comment above. Applying a patch to instcombine first doesn't work, because Instruction::getFastMathFlags fails an assert when the instruction isn't an FPMathOperator, which fptrunc wouldn't be at that point.

The Contributor replied:

The failures are just lost-flag regressions? Just take the regression and let it be fixed in the follow-up.

@john-brawn-arm replied:

Done

@john-brawn-arm (Collaborator, Author) commented:
Ping

@nikic (Contributor) left a comment:

LGTM

@jcranmer-intel (Contributor) left a comment:

I think there's some followup work needed to get this working in SelectionDAG as well, but that can live in a separate patch (especially as we need to do a followup pass in DAGCombine to check for flags on the fpext/fptrunc nodes).

@john-brawn-arm john-brawn-arm merged commit ecbe4d1 into llvm:main Dec 4, 2024
9 checks passed
@john-brawn-arm (Collaborator, Author) commented:

InstCombine follow-up: #118808
