-
Notifications
You must be signed in to change notification settings - Fork 13.5k
SimplifyLibCalls: Don't require ldexp to emit intrinsic in exp2 combine #92707
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SimplifyLibCalls: Don't require ldexp to emit intrinsic in exp2 combine #92707
Conversation
@llvm/pr-subscribers-lto @llvm/pr-subscribers-llvm-transforms Author: Matt Arsenault (arsenm) ChangesWhen folding exp2(itofp(x)) to ldexp(1, x), don't require an ldexp libcall to emit the intrinsic. The intrinsic needs to be handled regardless of whether the system has a libcall, and we have an inline implementation of ldexp already. This fixes the instance in the exp2->ldexp fold. Another instance exists for the pow(2) -> ldexp case The LTO test change isn't ideal, since it's just moving the problem to another instance where we're relying on implied libm behavior for an intrinsic transform. Use exp10 since that's a harder case to solve in the libcall house of cards we have. Full diff: https://github.com/llvm/llvm-project/pull/92707.diff 4 Files Affected:
diff --git a/llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp b/llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp
index eb1224abf00e2..df89e1775a319 100644
--- a/llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp
+++ b/llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp
@@ -2386,12 +2386,11 @@ Value *LibCallSimplifier::optimizeExp2(CallInst *CI, IRBuilderBase &B) {
// exp2(uitofp(x)) -> ldexp(1.0, zext(x)) if sizeof(x) < IntSize
Value *Op = CI->getArgOperand(0);
if ((isa<SIToFPInst>(Op) || isa<UIToFPInst>(Op)) &&
- hasFloatFn(M, TLI, Ty, LibFunc_ldexp, LibFunc_ldexpf, LibFunc_ldexpl)) {
+ (UseIntrinsic ||
+ hasFloatFn(M, TLI, Ty, LibFunc_ldexp, LibFunc_ldexpf, LibFunc_ldexpl))) {
if (Value *Exp = getIntToFPVal(Op, B, TLI->getIntSize())) {
Constant *One = ConstantFP::get(Ty, 1.0);
- // TODO: Emitting the intrinsic should not depend on whether the libcall
- // is available.
if (UseIntrinsic) {
return copyFlags(*CI, B.CreateIntrinsic(Intrinsic::ldexp,
{Ty, Exp->getType()},
diff --git a/llvm/test/LTO/X86/triple-init2.ll b/llvm/test/LTO/X86/triple-init2.ll
index 2638180ef33c6..bc5ecf9785a28 100644
--- a/llvm/test/LTO/X86/triple-init2.ll
+++ b/llvm/test/LTO/X86/triple-init2.ll
@@ -11,21 +11,20 @@
; RUN: llvm-lto2 run -r %t1,main,plx -o %t2 %t1
; RUN: llvm-nm %t2.1 | FileCheck %s
-; We check that LTO will be aware of target triple and prevent exp2 to ldexpf
+; We check that LTO will be aware of target triple and prevent pow to exp10
; transformation on Windows.
-; CHECK: U exp2f
+; CHECK: U powf
target datalayout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-windows-msvc19.11.0"
+declare float @llvm.pow.f32(float, float)
+
define dso_local i32 @main(i32 %argc, ptr nocapture readnone %argv) local_unnamed_addr {
entry:
%conv = sitofp i32 %argc to float
- %exp2 = tail call float @llvm.exp2.f32(float %conv)
+ %exp2 = tail call float @llvm.pow.f32(float 10.0, float %conv)
%conv1 = fptosi float %exp2 to i32
ret i32 %conv1
}
-; Function Attrs: nounwind readnone speculatable
-declare float @llvm.exp2.f32(float)
-
diff --git a/llvm/test/Transforms/InstCombine/exp2-1.ll b/llvm/test/Transforms/InstCombine/exp2-1.ll
index 2dff0b08ecf97..d8bd0a4d8159d 100644
--- a/llvm/test/Transforms/InstCombine/exp2-1.ll
+++ b/llvm/test/Transforms/InstCombine/exp2-1.ll
@@ -242,8 +242,8 @@ define double @test_simplify9(i8 zeroext %x) {
; NOLDEXPF-NEXT: ret double [[RET]]
;
; NOLDEXP-LABEL: @test_simplify9(
-; NOLDEXP-NEXT: [[CONV:%.*]] = uitofp i8 [[X:%.*]] to double
-; NOLDEXP-NEXT: [[RET:%.*]] = call double @llvm.exp2.f64(double [[CONV]])
+; NOLDEXP-NEXT: [[TMP1:%.*]] = zext i8 [[X:%.*]] to i32
+; NOLDEXP-NEXT: [[RET:%.*]] = call double @llvm.ldexp.f64.i32(double 1.000000e+00, i32 [[TMP1]])
; NOLDEXP-NEXT: ret double [[RET]]
;
%conv = uitofp i8 %x to double
@@ -263,13 +263,13 @@ define float @test_simplify10(i8 zeroext %x) {
; LDEXP16-NEXT: ret float [[RET]]
;
; NOLDEXPF-LABEL: @test_simplify10(
-; NOLDEXPF-NEXT: [[CONV:%.*]] = uitofp i8 [[X:%.*]] to float
-; NOLDEXPF-NEXT: [[RET:%.*]] = call float @llvm.exp2.f32(float [[CONV]])
+; NOLDEXPF-NEXT: [[TMP1:%.*]] = zext i8 [[X:%.*]] to i32
+; NOLDEXPF-NEXT: [[RET:%.*]] = call float @llvm.ldexp.f32.i32(float 1.000000e+00, i32 [[TMP1]])
; NOLDEXPF-NEXT: ret float [[RET]]
;
; NOLDEXP-LABEL: @test_simplify10(
-; NOLDEXP-NEXT: [[CONV:%.*]] = uitofp i8 [[X:%.*]] to float
-; NOLDEXP-NEXT: [[RET:%.*]] = call float @llvm.exp2.f32(float [[CONV]])
+; NOLDEXP-NEXT: [[TMP1:%.*]] = zext i8 [[X:%.*]] to i32
+; NOLDEXP-NEXT: [[RET:%.*]] = call float @llvm.ldexp.f32.i32(float 1.000000e+00, i32 [[TMP1]])
; NOLDEXP-NEXT: ret float [[RET]]
;
%conv = uitofp i8 %x to float
@@ -289,13 +289,13 @@ define float @sitofp_scalar_intrinsic_with_FMF(i8 %x) {
; LDEXP16-NEXT: ret float [[R]]
;
; NOLDEXPF-LABEL: @sitofp_scalar_intrinsic_with_FMF(
-; NOLDEXPF-NEXT: [[S:%.*]] = sitofp i8 [[X:%.*]] to float
-; NOLDEXPF-NEXT: [[R:%.*]] = tail call nnan float @llvm.exp2.f32(float [[S]])
+; NOLDEXPF-NEXT: [[TMP1:%.*]] = sext i8 [[X:%.*]] to i32
+; NOLDEXPF-NEXT: [[R:%.*]] = tail call nnan float @llvm.ldexp.f32.i32(float 1.000000e+00, i32 [[TMP1]])
; NOLDEXPF-NEXT: ret float [[R]]
;
; NOLDEXP-LABEL: @sitofp_scalar_intrinsic_with_FMF(
-; NOLDEXP-NEXT: [[S:%.*]] = sitofp i8 [[X:%.*]] to float
-; NOLDEXP-NEXT: [[R:%.*]] = tail call nnan float @llvm.exp2.f32(float [[S]])
+; NOLDEXP-NEXT: [[TMP1:%.*]] = sext i8 [[X:%.*]] to i32
+; NOLDEXP-NEXT: [[R:%.*]] = tail call nnan float @llvm.ldexp.f32.i32(float 1.000000e+00, i32 [[TMP1]])
; NOLDEXP-NEXT: ret float [[R]]
;
%s = sitofp i8 %x to float
@@ -317,9 +317,14 @@ define <2 x float> @sitofp_vector_intrinsic_with_FMF(<2 x i8> %x) {
; LDEXP16-NEXT: [[R:%.*]] = call nnan <2 x float> @llvm.ldexp.v2f32.v2i16(<2 x float> <float 1.000000e+00, float 1.000000e+00>, <2 x i16> [[TMP1]])
; LDEXP16-NEXT: ret <2 x float> [[R]]
;
+; NOLDEXPF-LABEL: @sitofp_vector_intrinsic_with_FMF(
+; NOLDEXPF-NEXT: [[TMP1:%.*]] = sext <2 x i8> [[X:%.*]] to <2 x i32>
+; NOLDEXPF-NEXT: [[R:%.*]] = call nnan <2 x float> @llvm.ldexp.v2f32.v2i32(<2 x float> <float 1.000000e+00, float 1.000000e+00>, <2 x i32> [[TMP1]])
+; NOLDEXPF-NEXT: ret <2 x float> [[R]]
+;
; NOLDEXP-LABEL: @sitofp_vector_intrinsic_with_FMF(
-; NOLDEXP-NEXT: [[S:%.*]] = sitofp <2 x i8> [[X:%.*]] to <2 x float>
-; NOLDEXP-NEXT: [[R:%.*]] = call nnan <2 x float> @llvm.exp2.v2f32(<2 x float> [[S]])
+; NOLDEXP-NEXT: [[TMP1:%.*]] = sext <2 x i8> [[X:%.*]] to <2 x i32>
+; NOLDEXP-NEXT: [[R:%.*]] = call nnan <2 x float> @llvm.ldexp.v2f32.v2i32(<2 x float> <float 1.000000e+00, float 1.000000e+00>, <2 x i32> [[TMP1]])
; NOLDEXP-NEXT: ret <2 x float> [[R]]
;
%s = sitofp <2 x i8> %x to <2 x float>
diff --git a/llvm/test/Transforms/InstCombine/exp2-to-ldexp.ll b/llvm/test/Transforms/InstCombine/exp2-to-ldexp.ll
index 6e5be5a19d6da..969020140cb33 100644
--- a/llvm/test/Transforms/InstCombine/exp2-to-ldexp.ll
+++ b/llvm/test/Transforms/InstCombine/exp2-to-ldexp.ll
@@ -1,19 +1,13 @@
; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 4
-; RUN: opt -S -passes=instcombine %s | FileCheck -check-prefixes=CHECK,LDEXP %s
-; RUN: opt -S -passes=instcombine -disable-builtin=ldexpf -disable-builtin=ldexp -disable-builtin=ldexpl %s | FileCheck -check-prefixes=CHECK,NOLDEXP %s
+; RUN: opt -S -passes=instcombine %s | FileCheck %s
+; RUN: opt -S -passes=instcombine -disable-builtin=ldexpf -disable-builtin=ldexp -disable-builtin=ldexpl %s | FileCheck %s
define float @exp2_f32_sitofp_i8(i8 %x) {
-; LDEXP-LABEL: define float @exp2_f32_sitofp_i8(
-; LDEXP-SAME: i8 [[X:%.*]]) {
-; LDEXP-NEXT: [[TMP1:%.*]] = sext i8 [[X]] to i32
-; LDEXP-NEXT: [[LDEXPF:%.*]] = call float @llvm.ldexp.f32.i32(float 1.000000e+00, i32 [[TMP1]])
-; LDEXP-NEXT: ret float [[LDEXPF]]
-;
-; NOLDEXP-LABEL: define float @exp2_f32_sitofp_i8(
-; NOLDEXP-SAME: i8 [[X:%.*]]) {
-; NOLDEXP-NEXT: [[ITOFP:%.*]] = sitofp i8 [[X]] to float
-; NOLDEXP-NEXT: [[EXP2:%.*]] = call float @llvm.exp2.f32(float [[ITOFP]])
-; NOLDEXP-NEXT: ret float [[EXP2]]
+; CHECK-LABEL: define float @exp2_f32_sitofp_i8(
+; CHECK-SAME: i8 [[X:%.*]]) {
+; CHECK-NEXT: [[TMP1:%.*]] = sext i8 [[X]] to i32
+; CHECK-NEXT: [[EXP2:%.*]] = call float @llvm.ldexp.f32.i32(float 1.000000e+00, i32 [[TMP1]])
+; CHECK-NEXT: ret float [[EXP2]]
;
%itofp = sitofp i8 %x to float
%exp2 = call float @llvm.exp2.f32(float %itofp)
@@ -21,17 +15,11 @@ define float @exp2_f32_sitofp_i8(i8 %x) {
}
define float @exp2_f32_sitofp_i8_flags(i8 %x) {
-; LDEXP-LABEL: define float @exp2_f32_sitofp_i8_flags(
-; LDEXP-SAME: i8 [[X:%.*]]) {
-; LDEXP-NEXT: [[TMP1:%.*]] = sext i8 [[X]] to i32
-; LDEXP-NEXT: [[LDEXPF:%.*]] = call nnan ninf float @llvm.ldexp.f32.i32(float 1.000000e+00, i32 [[TMP1]])
-; LDEXP-NEXT: ret float [[LDEXPF]]
-;
-; NOLDEXP-LABEL: define float @exp2_f32_sitofp_i8_flags(
-; NOLDEXP-SAME: i8 [[X:%.*]]) {
-; NOLDEXP-NEXT: [[ITOFP:%.*]] = sitofp i8 [[X]] to float
-; NOLDEXP-NEXT: [[EXP2:%.*]] = call nnan ninf float @llvm.exp2.f32(float [[ITOFP]])
-; NOLDEXP-NEXT: ret float [[EXP2]]
+; CHECK-LABEL: define float @exp2_f32_sitofp_i8_flags(
+; CHECK-SAME: i8 [[X:%.*]]) {
+; CHECK-NEXT: [[TMP1:%.*]] = sext i8 [[X]] to i32
+; CHECK-NEXT: [[EXP2:%.*]] = call nnan ninf float @llvm.ldexp.f32.i32(float 1.000000e+00, i32 [[TMP1]])
+; CHECK-NEXT: ret float [[EXP2]]
;
%itofp = sitofp i8 %x to float
%exp2 = call nnan ninf float @llvm.exp2.f32(float %itofp)
@@ -39,17 +27,11 @@ define float @exp2_f32_sitofp_i8_flags(i8 %x) {
}
define <2 x float> @exp2_v2f32_sitofp_v2i8(<2 x i8> %x) {
-; LDEXP-LABEL: define <2 x float> @exp2_v2f32_sitofp_v2i8(
-; LDEXP-SAME: <2 x i8> [[X:%.*]]) {
-; LDEXP-NEXT: [[TMP1:%.*]] = sext <2 x i8> [[X]] to <2 x i32>
-; LDEXP-NEXT: [[EXP2:%.*]] = call <2 x float> @llvm.ldexp.v2f32.v2i32(<2 x float> <float 1.000000e+00, float 1.000000e+00>, <2 x i32> [[TMP1]])
-; LDEXP-NEXT: ret <2 x float> [[EXP2]]
-;
-; NOLDEXP-LABEL: define <2 x float> @exp2_v2f32_sitofp_v2i8(
-; NOLDEXP-SAME: <2 x i8> [[X:%.*]]) {
-; NOLDEXP-NEXT: [[ITOFP:%.*]] = sitofp <2 x i8> [[X]] to <2 x float>
-; NOLDEXP-NEXT: [[EXP2:%.*]] = call <2 x float> @llvm.exp2.v2f32(<2 x float> [[ITOFP]])
-; NOLDEXP-NEXT: ret <2 x float> [[EXP2]]
+; CHECK-LABEL: define <2 x float> @exp2_v2f32_sitofp_v2i8(
+; CHECK-SAME: <2 x i8> [[X:%.*]]) {
+; CHECK-NEXT: [[TMP1:%.*]] = sext <2 x i8> [[X]] to <2 x i32>
+; CHECK-NEXT: [[EXP2:%.*]] = call <2 x float> @llvm.ldexp.v2f32.v2i32(<2 x float> <float 1.000000e+00, float 1.000000e+00>, <2 x i32> [[TMP1]])
+; CHECK-NEXT: ret <2 x float> [[EXP2]]
;
%itofp = sitofp <2 x i8> %x to <2 x float>
%exp2 = call <2 x float> @llvm.exp2.v2f32(<2 x float> %itofp)
@@ -57,17 +39,11 @@ define <2 x float> @exp2_v2f32_sitofp_v2i8(<2 x i8> %x) {
}
define float @exp2_f32_uitofp_i8(i8 %x) {
-; LDEXP-LABEL: define float @exp2_f32_uitofp_i8(
-; LDEXP-SAME: i8 [[X:%.*]]) {
-; LDEXP-NEXT: [[TMP1:%.*]] = zext i8 [[X]] to i32
-; LDEXP-NEXT: [[LDEXPF:%.*]] = call float @llvm.ldexp.f32.i32(float 1.000000e+00, i32 [[TMP1]])
-; LDEXP-NEXT: ret float [[LDEXPF]]
-;
-; NOLDEXP-LABEL: define float @exp2_f32_uitofp_i8(
-; NOLDEXP-SAME: i8 [[X:%.*]]) {
-; NOLDEXP-NEXT: [[ITOFP:%.*]] = uitofp i8 [[X]] to float
-; NOLDEXP-NEXT: [[EXP2:%.*]] = call float @llvm.exp2.f32(float [[ITOFP]])
-; NOLDEXP-NEXT: ret float [[EXP2]]
+; CHECK-LABEL: define float @exp2_f32_uitofp_i8(
+; CHECK-SAME: i8 [[X:%.*]]) {
+; CHECK-NEXT: [[TMP1:%.*]] = zext i8 [[X]] to i32
+; CHECK-NEXT: [[EXP2:%.*]] = call float @llvm.ldexp.f32.i32(float 1.000000e+00, i32 [[TMP1]])
+; CHECK-NEXT: ret float [[EXP2]]
;
%itofp = uitofp i8 %x to float
%exp2 = call float @llvm.exp2.f32(float %itofp)
@@ -77,8 +53,8 @@ define float @exp2_f32_uitofp_i8(i8 %x) {
define half @exp2_f16_sitofp_i8(i8 %x) {
; CHECK-LABEL: define half @exp2_f16_sitofp_i8(
; CHECK-SAME: i8 [[X:%.*]]) {
-; CHECK-NEXT: [[ITOFP:%.*]] = sitofp i8 [[X]] to half
-; CHECK-NEXT: [[EXP2:%.*]] = call half @llvm.exp2.f16(half [[ITOFP]])
+; CHECK-NEXT: [[TMP1:%.*]] = sext i8 [[X]] to i32
+; CHECK-NEXT: [[EXP2:%.*]] = call half @llvm.ldexp.f16.i32(half 0xH3C00, i32 [[TMP1]])
; CHECK-NEXT: ret half [[EXP2]]
;
%itofp = sitofp i8 %x to half
@@ -87,17 +63,11 @@ define half @exp2_f16_sitofp_i8(i8 %x) {
}
define double @exp2_f64_sitofp_i8(i8 %x) {
-; LDEXP-LABEL: define double @exp2_f64_sitofp_i8(
-; LDEXP-SAME: i8 [[X:%.*]]) {
-; LDEXP-NEXT: [[TMP1:%.*]] = sext i8 [[X]] to i32
-; LDEXP-NEXT: [[LDEXP:%.*]] = call double @llvm.ldexp.f64.i32(double 1.000000e+00, i32 [[TMP1]])
-; LDEXP-NEXT: ret double [[LDEXP]]
-;
-; NOLDEXP-LABEL: define double @exp2_f64_sitofp_i8(
-; NOLDEXP-SAME: i8 [[X:%.*]]) {
-; NOLDEXP-NEXT: [[ITOFP:%.*]] = sitofp i8 [[X]] to double
-; NOLDEXP-NEXT: [[EXP2:%.*]] = call double @llvm.exp2.f64(double [[ITOFP]])
-; NOLDEXP-NEXT: ret double [[EXP2]]
+; CHECK-LABEL: define double @exp2_f64_sitofp_i8(
+; CHECK-SAME: i8 [[X:%.*]]) {
+; CHECK-NEXT: [[TMP1:%.*]] = sext i8 [[X]] to i32
+; CHECK-NEXT: [[EXP2:%.*]] = call double @llvm.ldexp.f64.i32(double 1.000000e+00, i32 [[TMP1]])
+; CHECK-NEXT: ret double [[EXP2]]
;
%itofp = sitofp i8 %x to double
%exp2 = call double @llvm.exp2.f64(double %itofp)
@@ -105,17 +75,11 @@ define double @exp2_f64_sitofp_i8(i8 %x) {
}
define fp128 @exp2_fp128_sitofp_i8(i8 %x) {
-; LDEXP-LABEL: define fp128 @exp2_fp128_sitofp_i8(
-; LDEXP-SAME: i8 [[X:%.*]]) {
-; LDEXP-NEXT: [[TMP1:%.*]] = sext i8 [[X]] to i32
-; LDEXP-NEXT: [[LDEXPL:%.*]] = call fp128 @llvm.ldexp.f128.i32(fp128 0xL00000000000000003FFF000000000000, i32 [[TMP1]])
-; LDEXP-NEXT: ret fp128 [[LDEXPL]]
-;
-; NOLDEXP-LABEL: define fp128 @exp2_fp128_sitofp_i8(
-; NOLDEXP-SAME: i8 [[X:%.*]]) {
-; NOLDEXP-NEXT: [[ITOFP:%.*]] = sitofp i8 [[X]] to fp128
-; NOLDEXP-NEXT: [[EXP2:%.*]] = call fp128 @llvm.exp2.f128(fp128 [[ITOFP]])
-; NOLDEXP-NEXT: ret fp128 [[EXP2]]
+; CHECK-LABEL: define fp128 @exp2_fp128_sitofp_i8(
+; CHECK-SAME: i8 [[X:%.*]]) {
+; CHECK-NEXT: [[TMP1:%.*]] = sext i8 [[X]] to i32
+; CHECK-NEXT: [[EXP2:%.*]] = call fp128 @llvm.ldexp.f128.i32(fp128 0xL00000000000000003FFF000000000000, i32 [[TMP1]])
+; CHECK-NEXT: ret fp128 [[EXP2]]
;
%itofp = sitofp i8 %x to fp128
%exp2 = call fp128 @llvm.exp2.fp128(fp128 %itofp)
@@ -123,17 +87,11 @@ define fp128 @exp2_fp128_sitofp_i8(i8 %x) {
}
define <vscale x 4 x float> @exp2_nxv4f32_sitofp_i8(<vscale x 4 x i8> %x) {
-; LDEXP-LABEL: define <vscale x 4 x float> @exp2_nxv4f32_sitofp_i8(
-; LDEXP-SAME: <vscale x 4 x i8> [[X:%.*]]) {
-; LDEXP-NEXT: [[TMP1:%.*]] = sext <vscale x 4 x i8> [[X]] to <vscale x 4 x i32>
-; LDEXP-NEXT: [[EXP2:%.*]] = call <vscale x 4 x float> @llvm.ldexp.nxv4f32.nxv4i32(<vscale x 4 x float> shufflevector (<vscale x 4 x float> insertelement (<vscale x 4 x float> poison, float 1.000000e+00, i64 0), <vscale x 4 x float> poison, <vscale x 4 x i32> zeroinitializer), <vscale x 4 x i32> [[TMP1]])
-; LDEXP-NEXT: ret <vscale x 4 x float> [[EXP2]]
-;
-; NOLDEXP-LABEL: define <vscale x 4 x float> @exp2_nxv4f32_sitofp_i8(
-; NOLDEXP-SAME: <vscale x 4 x i8> [[X:%.*]]) {
-; NOLDEXP-NEXT: [[ITOFP:%.*]] = sitofp <vscale x 4 x i8> [[X]] to <vscale x 4 x float>
-; NOLDEXP-NEXT: [[EXP2:%.*]] = call <vscale x 4 x float> @llvm.exp2.nxv4f32(<vscale x 4 x float> [[ITOFP]])
-; NOLDEXP-NEXT: ret <vscale x 4 x float> [[EXP2]]
+; CHECK-LABEL: define <vscale x 4 x float> @exp2_nxv4f32_sitofp_i8(
+; CHECK-SAME: <vscale x 4 x i8> [[X:%.*]]) {
+; CHECK-NEXT: [[TMP1:%.*]] = sext <vscale x 4 x i8> [[X]] to <vscale x 4 x i32>
+; CHECK-NEXT: [[EXP2:%.*]] = call <vscale x 4 x float> @llvm.ldexp.nxv4f32.nxv4i32(<vscale x 4 x float> shufflevector (<vscale x 4 x float> insertelement (<vscale x 4 x float> poison, float 1.000000e+00, i64 0), <vscale x 4 x float> poison, <vscale x 4 x i32> zeroinitializer), <vscale x 4 x i32> [[TMP1]])
+; CHECK-NEXT: ret <vscale x 4 x float> [[EXP2]]
;
%itofp = sitofp <vscale x 4 x i8> %x to <vscale x 4 x float>
%exp2 = call <vscale x 4 x float> @llvm.exp2.nxv4f32(<vscale x 4 x float> %itofp)
|
It looks like the lowering for llvm.ldexp currently doesn't respect "no-builtin-ldexp" etc.; we should probably fix that first. Long-term, we really need a better approach for managing the codesize explosion that comes from expanding everything inline. |
When I tried to consider no-builtin* for expanding memcpy instead of emitting a call, I got complaints (and had to implement c8cac15) because for some reason there was the expectation of this not changing legalization emitted libcalls. We have parallel tables, TargetLibraryInfo and RuntimeLibcalls, and it's not clear if no-builtin is supposed to change the RuntimeLibcall behavior |
If we reserve the right to call ldexp anywhere, under any set of flags, how is anyone supposed to implement a C library? I guess this particular combine is unlikely to cause issues, but if you're planning to rip out all checks for whether a functions exists, it seems like something we need to address. For memcpy/memmove/memset specifically, we currently have a special-case rule: in no-builtins mode: existing calls to the intrinsics lower to the corresponding calls, but optimizations don't generate any new calls to those functions. This is sort of hard to formally justify, but in practice, it works pretty well for people doing stuff with -ffreestanding. The special case doesn't really extend beyond those specific functions. |
This is already not possible, but is a problem to solve in codegen for when the intrinsic implementation is emitted. We need some mechanism to disable RuntimeLibcalls when building a libc. IMO it was wrong to provide any math intrinsics to expose the usage of math functions. These intrinsics should be providing the target implementation of the libm functions, not the other way around. If we can't codegen the intrinsic without the libcall, we probably shouldn't have the intrinsic. Treating the intrinsics as aliases to the library calls isn't really useful. We have to maintain parallel optimization paths for the intrinsic and call anyway, then have a lot of machinery to turn the calls into intrinsics, and then the intrinsics back into libcalls. Alternative to this, I could fix it to preserve libcall->libcall behavior and intrinsic->intrinsic behavior. The main problem I'm trying to solve is gating intrinsic->intrinsic gated on the libcall availability
For today, I just want to fix ldexp. We need to come up with a RuntimeLibcalls solution eventually (e.g. see the inline asm in llvm-libc to use fma instead of calling the fma intrinsic)
We could also undo this optimization in codegen if ldexp isn't available (which in practice just means windows) and call pow |
I see where you're coming from with this... see also discussion on https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294 . If you're willing to push forward something to break apart the current mess, I'd be in favor.
That still technically breaks -fno-builtin-ldexp... but the way it breaks is much less likely to matter, so I think I'm okay with this.
On Windows specifically, we have ldexp; the issue is just that we don't have ldexpf. |
We definitely have a mess with regard to our handling of math library functions. We're nowhere near consistent behavior. I'm not sure if this is what Matt was alluding to, but the gcc documentation for fno-builtin-* says, "Don’t recognize built-in functions that do not begin with ‘_builtin’ as prefix." (emphasis added) The result of this is that if you call __builtin_ldexp even when compiling with -fno-builtin-ldexp, we will happily translate it to a call to the llvm.ldexp intrinsic, and on X86 that gets lowered to a call to ldexp() (which may or may not be the library call). https://godbolt.org/z/95Kb6PjnW You can make a case that this is user error -- if they didn't want the library function to be called, they shouldn't have used the builtin. It would be much better if we lowered the intrinsics to calls to a compiler runtime library function that we could be certain existed and had control over the accuracy of. This would allow us to actual implement things the way the LLVM IR definition says we will (i.e. not setting errno). But that's not the current situation, and the example above is different from the front end replacing one call with another on the user's behalf. If the user has passed |
Maybe the right thing to do here is just use TTI: we ask the target if it can handle the intrinsic, it says yes if either the libcall is legal, or it has some other implementation it can use. |
When folding exp2(itofp(x)) to ldexp(1, x), don't require an ldexp libcall to emit the intrinsic. The intrinsic needs to be handled regardless of whether the system has a libcall, and we have an inline implementation of ldexp already. This fixes the instance in the exp2->ldexp fold. Another instance exists for the pow(2) -> ldexp case The LTO test change isn't ideal, since it's just moving the problem to another instance where we're relying on implied libm behavior for an intrinsic transform. Use exp10 since that's a harder case to solve in the libcall house of cards we have.
Can we handle this by just promoting float to double in codegen? |
Assuming the runtime libcalls are set up accuratly, this should hit the inline expansion we get today and nothing will break. We do not just unconditionally lower the intrinsic to the libcall, although the control-flow free expansion might not be what you want |
43b490e
to
1b4e6be
Compare
Yes, that's fine; the Windows math.h does exactly the same thing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
…ne (llvm#92707) When folding exp2(itofp(x)) to ldexp(1, x), don't require an ldexp libcall to emit the intrinsic. The intrinsic needs to be handled regardless of whether the system has a libcall, and we have an inline implementation of ldexp already. This fixes the instance in the exp2->ldexp fold. Another instance exists for the pow(2) -> ldexp case The LTO test change isn't ideal, since it's just moving the problem to another instance where we're relying on implied libm behavior for an intrinsic transform. Use exp10 since that's a harder case to solve in the libcall house of cards we have.
This reduces codesize. As discussed in llvm#92707.
This reduces codesize. As discussed in #92707.
This reduces codesize. As discussed in llvm#92707.
When folding exp2(itofp(x)) to ldexp(1, x), don't require an ldexp libcall to emit the intrinsic.
The intrinsic needs to be handled regardless of whether the system has a libcall, and we have an inline implementation of ldexp already. This fixes the instance in the exp2->ldexp fold. Another instance exists for the pow(2) -> ldexp case
The LTO test change isn't ideal, since it's just moving the problem to another instance where we're relying on implied libm behavior for an intrinsic transform. Use exp10 since that's a harder case to solve in the libcall house of cards we have.