[arm64] Add tan intrinsic lowering #94545

Conversation
@llvm/pr-subscribers-backend-aarch64 @llvm/pr-subscribers-llvm-analysis @llvm/pr-subscribers-llvm-globalisel

Author: Farzon Lotfi (farzonl)

Changes

This change implements the #87367 investigation into supporting IEEE math operations as intrinsics. This PR is just for Tan. Now that the x86 tan backend has landed (#90503), we are one step closer to supporting durable constrained intrinsics for Tan. Changes:

- llvm/include/llvm/Analysis/VecFuncs.def - vectorization of tan for arm64 backends.
- llvm/lib/Target/AArch64/AArch64FastISel.cpp - add tan to the libcall table.
- llvm/lib/Target/AArch64/AArch64ISelLowering.cpp - add tan expansion for f128, f16, and vector/NEON operations.
- llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp - define G_FTAN as a legal arm64 instruction.
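To make the scalar lowering concrete, here is a minimal sketch distilled from the f16-instructions.ll test in this patch (the function below is illustrative, not copied verbatim): an f16 tan call is promoted to f32 and lowered to the tanf libcall.

```llvm
declare half @llvm.tan.f16(half)

define half @test_tan(half %a) {
  ; On AArch64 this promotes to f32 and becomes a libcall:
  ;   fcvt s0, h0
  ;   bl   tanf
  ;   fcvt h0, s0
  %r = call half @llvm.tan.f16(half %a)
  ret half %r
}
```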
Patch is 39.13 KiB, truncated to 20.00 KiB below; full version: https://github.com/llvm/llvm-project/pull/94545.diff

14 Files Affected:
diff --git a/llvm/include/llvm/Analysis/VecFuncs.def b/llvm/include/llvm/Analysis/VecFuncs.def
index 402189769c20c..ffdf8b8c3bc79 100644
--- a/llvm/include/llvm/Analysis/VecFuncs.def
+++ b/llvm/include/llvm/Analysis/VecFuncs.def
@@ -92,6 +92,11 @@ TLI_DEFINE_VECFUNC("llvm.sin.f64", "_simd_sin_d2", FIXED(2), "_ZGV_LLVM_N2v")
TLI_DEFINE_VECFUNC("sinf", "_simd_sin_f4", FIXED(4), "_ZGV_LLVM_N4v")
TLI_DEFINE_VECFUNC("llvm.sin.f32", "_simd_sin_f4", FIXED(4), "_ZGV_LLVM_N4v")
+TLI_DEFINE_VECFUNC("tan", "_simd_tan_d2", FIXED(2), "_ZGV_LLVM_N2v")
+TLI_DEFINE_VECFUNC("llvm.tan.f64", "_simd_tan_d2", FIXED(2), "_ZGV_LLVM_N2v")
+TLI_DEFINE_VECFUNC("tanf", "_simd_tan_f4", FIXED(4), "_ZGV_LLVM_N4v")
+TLI_DEFINE_VECFUNC("llvm.tan.f32", "_simd_tan_f4", FIXED(4), "_ZGV_LLVM_N4v")
+
// Floating-Point Arithmetic and Auxiliary Functions
TLI_DEFINE_VECFUNC("cbrt", "_simd_cbrt_d2", FIXED(2), "_ZGV_LLVM_N2v")
TLI_DEFINE_VECFUNC("cbrtf", "_simd_cbrt_f4", FIXED(4), "_ZGV_LLVM_N4v")
@@ -584,6 +589,7 @@ TLI_DEFINE_VECFUNC("sinpi", "_ZGVnN2v_sinpi", FIXED(2), "_ZGV_LLVM_N2v")
TLI_DEFINE_VECFUNC("sqrt", "_ZGVnN2v_sqrt", FIXED(2), "_ZGV_LLVM_N2v")
TLI_DEFINE_VECFUNC("tan", "_ZGVnN2v_tan", FIXED(2), "_ZGV_LLVM_N2v")
+TLI_DEFINE_VECFUNC("llvm.tan.f64", "_ZGVnN2v_tan", FIXED(2), "_ZGV_LLVM_N2v")
TLI_DEFINE_VECFUNC("tanh", "_ZGVnN2v_tanh", FIXED(2), "_ZGV_LLVM_N2v")
@@ -681,6 +687,7 @@ TLI_DEFINE_VECFUNC("sinpif", "_ZGVnN4v_sinpif", FIXED(4), "_ZGV_LLVM_N4v")
TLI_DEFINE_VECFUNC("sqrtf", "_ZGVnN4v_sqrtf", FIXED(4), "_ZGV_LLVM_N4v")
TLI_DEFINE_VECFUNC("tanf", "_ZGVnN4v_tanf", FIXED(4), "_ZGV_LLVM_N4v")
+TLI_DEFINE_VECFUNC("llvm.tan.f32", "_ZGVnN4v_tanf", FIXED(4), "_ZGV_LLVM_N4v")
TLI_DEFINE_VECFUNC("tanhf", "_ZGVnN4v_tanhf", FIXED(4), "_ZGV_LLVM_N4v")
@@ -828,6 +835,8 @@ TLI_DEFINE_VECFUNC("sqrtf", "_ZGVsMxv_sqrtf", SCALABLE(4), MASKED, "_ZGVsMxv")
TLI_DEFINE_VECFUNC("tan", "_ZGVsMxv_tan", SCALABLE(2), MASKED, "_ZGVsMxv")
TLI_DEFINE_VECFUNC("tanf", "_ZGVsMxv_tanf", SCALABLE(4), MASKED, "_ZGVsMxv")
+TLI_DEFINE_VECFUNC("llvm.tan.f64", "_ZGVsMxv_tan", SCALABLE(2), MASKED, "_ZGVsMxv")
+TLI_DEFINE_VECFUNC("llvm.tan.f32", "_ZGVsMxv_tanf", SCALABLE(4), MASKED, "_ZGVsMxv")
TLI_DEFINE_VECFUNC("tanh", "_ZGVsMxv_tanh", SCALABLE(2), MASKED, "_ZGVsMxv")
TLI_DEFINE_VECFUNC("tanhf", "_ZGVsMxv_tanhf", SCALABLE(4), MASKED, "_ZGVsMxv")
@@ -1087,6 +1096,11 @@ TLI_DEFINE_VECFUNC("tanf", "armpl_vtanq_f32", FIXED(4), NOMASK, "_ZGV_LLVM_N4v")
TLI_DEFINE_VECFUNC("tan", "armpl_svtan_f64_x", SCALABLE(2), MASKED, "_ZGVsMxv")
TLI_DEFINE_VECFUNC("tanf", "armpl_svtan_f32_x", SCALABLE(4), MASKED, "_ZGVsMxv")
+TLI_DEFINE_VECFUNC("llvm.tan.f64", "armpl_vtanq_f64", FIXED(2), NOMASK, "_ZGV_LLVM_N2v")
+TLI_DEFINE_VECFUNC("llvm.tan.f32", "armpl_vtanq_f32", FIXED(4), NOMASK, "_ZGV_LLVM_N4v")
+TLI_DEFINE_VECFUNC("llvm.tan.f64", "armpl_svtan_f64_x", SCALABLE(2), MASKED, "_ZGVsMxv")
+TLI_DEFINE_VECFUNC("llvm.tan.f32", "armpl_svtan_f32_x", SCALABLE(4), MASKED, "_ZGVsMxv")
+
TLI_DEFINE_VECFUNC("tanh", "armpl_vtanhq_f64", FIXED(2), NOMASK, "_ZGV_LLVM_N2v")
TLI_DEFINE_VECFUNC("tanhf", "armpl_vtanhq_f32", FIXED(4), NOMASK, "_ZGV_LLVM_N4v")
TLI_DEFINE_VECFUNC("tanh", "armpl_svtanh_f64_x", SCALABLE(2), MASKED, "_ZGVsMxv")
diff --git a/llvm/lib/Target/AArch64/AArch64FastISel.cpp b/llvm/lib/Target/AArch64/AArch64FastISel.cpp
index 62cf6a2c47acc..56fd15f23363d 100644
--- a/llvm/lib/Target/AArch64/AArch64FastISel.cpp
+++ b/llvm/lib/Target/AArch64/AArch64FastISel.cpp
@@ -3534,6 +3534,7 @@ bool AArch64FastISel::fastLowerIntrinsicCall(const IntrinsicInst *II) {
}
case Intrinsic::sin:
case Intrinsic::cos:
+ case Intrinsic::tan:
case Intrinsic::pow: {
MVT RetVT;
if (!isTypeLegal(II->getType(), RetVT))
@@ -3542,11 +3543,11 @@ bool AArch64FastISel::fastLowerIntrinsicCall(const IntrinsicInst *II) {
if (RetVT != MVT::f32 && RetVT != MVT::f64)
return false;
- static const RTLIB::Libcall LibCallTable[3][2] = {
- { RTLIB::SIN_F32, RTLIB::SIN_F64 },
- { RTLIB::COS_F32, RTLIB::COS_F64 },
- { RTLIB::POW_F32, RTLIB::POW_F64 }
- };
+ static const RTLIB::Libcall LibCallTable[4][2] = {
+ {RTLIB::SIN_F32, RTLIB::SIN_F64},
+ {RTLIB::COS_F32, RTLIB::COS_F64},
+ {RTLIB::TAN_F32, RTLIB::TAN_F64},
+ {RTLIB::POW_F32, RTLIB::POW_F64}};
RTLIB::Libcall LC;
bool Is64Bit = RetVT == MVT::f64;
switch (II->getIntrinsicID()) {
@@ -3558,9 +3559,12 @@ bool AArch64FastISel::fastLowerIntrinsicCall(const IntrinsicInst *II) {
case Intrinsic::cos:
LC = LibCallTable[1][Is64Bit];
break;
- case Intrinsic::pow:
+ case Intrinsic::tan:
LC = LibCallTable[2][Is64Bit];
break;
+ case Intrinsic::pow:
+ LC = LibCallTable[3][Is64Bit];
+ break;
}
ArgListTy Args;
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 360a841bdade4..a8cf8f65029d3 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -543,6 +543,7 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
setOperationAction(ISD::FSINCOS, MVT::f128, Expand);
setOperationAction(ISD::FSQRT, MVT::f128, Expand);
setOperationAction(ISD::FSUB, MVT::f128, LibCall);
+ setOperationAction(ISD::FTAN, MVT::f128, Expand);
setOperationAction(ISD::FTRUNC, MVT::f128, Expand);
setOperationAction(ISD::SETCC, MVT::f128, Custom);
setOperationAction(ISD::STRICT_FSETCC, MVT::f128, Custom);
@@ -727,14 +728,14 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
setOperationAction(ISD::FCOPYSIGN, MVT::bf16, Promote);
}
- for (auto Op : {ISD::FREM, ISD::FPOW, ISD::FPOWI,
- ISD::FCOS, ISD::FSIN, ISD::FSINCOS,
- ISD::FEXP, ISD::FEXP2, ISD::FEXP10,
- ISD::FLOG, ISD::FLOG2, ISD::FLOG10,
- ISD::STRICT_FREM,
- ISD::STRICT_FPOW, ISD::STRICT_FPOWI, ISD::STRICT_FCOS,
- ISD::STRICT_FSIN, ISD::STRICT_FEXP, ISD::STRICT_FEXP2,
- ISD::STRICT_FLOG, ISD::STRICT_FLOG2, ISD::STRICT_FLOG10}) {
+ for (auto Op : {ISD::FREM, ISD::FPOW, ISD::FPOWI,
+ ISD::FCOS, ISD::FSIN, ISD::FSINCOS,
+ ISD::FTAN, ISD::FEXP, ISD::FEXP2,
+ ISD::FEXP10, ISD::FLOG, ISD::FLOG2,
+ ISD::FLOG10, ISD::STRICT_FREM, ISD::STRICT_FPOW,
+ ISD::STRICT_FPOWI, ISD::STRICT_FCOS, ISD::STRICT_FSIN,
+ ISD::STRICT_FEXP, ISD::STRICT_FEXP2, ISD::STRICT_FLOG,
+ ISD::STRICT_FLOG2, ISD::STRICT_FLOG10}) {
setOperationAction(Op, MVT::f16, Promote);
setOperationAction(Op, MVT::v4f16, Expand);
setOperationAction(Op, MVT::v8f16, Expand);
@@ -1171,13 +1172,15 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
if (Subtarget->isNeonAvailable()) {
// FIXME: v1f64 shouldn't be legal if we can avoid it, because it leads to
// silliness like this:
+ // clang-format off
for (auto Op :
{ISD::SELECT, ISD::SELECT_CC,
ISD::BR_CC, ISD::FADD, ISD::FSUB,
ISD::FMUL, ISD::FDIV, ISD::FMA,
ISD::FNEG, ISD::FABS, ISD::FCEIL,
ISD::FSQRT, ISD::FFLOOR, ISD::FNEARBYINT,
- ISD::FSIN, ISD::FCOS, ISD::FPOW,
+ ISD::FSIN, ISD::FCOS, ISD::FTAN,
+ ISD::FPOW,
ISD::FLOG, ISD::FLOG2, ISD::FLOG10,
ISD::FEXP, ISD::FEXP2, ISD::FEXP10,
ISD::FRINT, ISD::FROUND, ISD::FROUNDEVEN,
@@ -1190,7 +1193,7 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
ISD::STRICT_FMINNUM, ISD::STRICT_FMAXNUM, ISD::STRICT_FMINIMUM,
ISD::STRICT_FMAXIMUM})
setOperationAction(Op, MVT::v1f64, Expand);
-
+ // clang-format on
for (auto Op :
{ISD::FP_TO_SINT, ISD::FP_TO_UINT, ISD::SINT_TO_FP, ISD::UINT_TO_FP,
ISD::FP_ROUND, ISD::FP_TO_SINT_SAT, ISD::FP_TO_UINT_SAT, ISD::MUL,
@@ -1622,6 +1625,7 @@ AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
setOperationAction(ISD::FCOS, VT, Expand);
setOperationAction(ISD::FSIN, VT, Expand);
setOperationAction(ISD::FSINCOS, VT, Expand);
+ setOperationAction(ISD::FTAN, VT, Expand);
setOperationAction(ISD::FEXP, VT, Expand);
setOperationAction(ISD::FEXP2, VT, Expand);
setOperationAction(ISD::FEXP10, VT, Expand);
@@ -1803,6 +1807,7 @@ void AArch64TargetLowering::addTypeForNEON(MVT VT) {
if (VT == MVT::v2f32 || VT == MVT::v4f32 || VT == MVT::v2f64) {
setOperationAction(ISD::FSIN, VT, Expand);
setOperationAction(ISD::FCOS, VT, Expand);
+ setOperationAction(ISD::FTAN, VT, Expand);
setOperationAction(ISD::FPOW, VT, Expand);
setOperationAction(ISD::FLOG, VT, Expand);
setOperationAction(ISD::FLOG2, VT, Expand);
diff --git a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
index 07a0473888ee5..42cd43c3afa37 100644
--- a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
+++ b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
@@ -267,9 +267,8 @@ AArch64LegalizerInfo::AArch64LegalizerInfo(const AArch64Subtarget &ST)
.libcallFor({{s64, s128}})
.minScalarOrElt(1, MinFPScalar);
- getActionDefinitionsBuilder(
- {G_FCOS, G_FSIN, G_FPOW, G_FLOG, G_FLOG2, G_FLOG10,
- G_FEXP, G_FEXP2, G_FEXP10})
+ getActionDefinitionsBuilder({G_FCOS, G_FSIN, G_FPOW, G_FLOG, G_FLOG2,
+ G_FLOG10, G_FTAN, G_FEXP, G_FEXP2, G_FEXP10})
// We need a call for these, so we always need to scalarize.
.scalarize(0)
// Regardless of FP16 support, widen 16-bit elements to 32-bits.
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/arm64-irtranslator.ll b/llvm/test/CodeGen/AArch64/GlobalISel/arm64-irtranslator.ll
index a61931b898aea..eb94cc5d0fb61 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/arm64-irtranslator.ll
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/arm64-irtranslator.ll
@@ -2313,6 +2313,14 @@ define float @test_sin_f32(float %x) {
ret float %y
}
+declare float @llvm.tan.f32(float)
+define float @test_tan_f32(float %x) {
+ ; CHECK-LABEL: name: test_tan_f32
+ ; CHECK: %{{[0-9]+}}:_(s32) = G_FTAN %{{[0-9]+}}
+ %y = call float @llvm.tan.f32(float %x)
+ ret float %y
+}
+
declare float @llvm.sqrt.f32(float)
define float @test_sqrt_f32(float %x) {
; CHECK-LABEL: name: test_sqrt_f32
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/legalize-tan.mir b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-tan.mir
new file mode 100644
index 0000000000000..455aae0128364
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/legalize-tan.mir
@@ -0,0 +1,227 @@
+# RUN: llc -verify-machineinstrs -mtriple aarch64--- \
+# RUN: -run-pass=legalizer -mattr=+fullfp16 -global-isel %s -o - \
+# RUN: | FileCheck %s
+...
+---
+name: test_v4f16.tan
+alignment: 4
+tracksRegLiveness: true
+registers:
+ - { id: 0, class: _ }
+ - { id: 1, class: _ }
+body: |
+ bb.0:
+ liveins: $d0
+ ; CHECK-LABEL: name: test_v4f16.tan
+ ; CHECK: [[V1:%[0-9]+]]:_(s16), [[V2:%[0-9]+]]:_(s16), [[V3:%[0-9]+]]:_(s16), [[V4:%[0-9]+]]:_(s16) = G_UNMERGE_VALUES %{{[0-9]+}}(<4 x s16>)
+
+ ; CHECK-DAG: [[V1_S32:%[0-9]+]]:_(s32) = G_FPEXT [[V1]](s16)
+ ; CHECK-NEXT: ADJCALLSTACKDOWN
+ ; CHECK-NEXT: $s0 = COPY [[V1_S32]](s32)
+ ; CHECK-NEXT: BL &tanf
+ ; CHECK-NEXT: ADJCALLSTACKUP
+ ; CHECK-NEXT: [[ELT1_S32:%[0-9]+]]:_(s32) = COPY $s0
+ ; CHECK-NEXT: [[ELT1:%[0-9]+]]:_(s16) = G_FPTRUNC [[ELT1_S32]](s32)
+
+ ; CHECK-DAG: [[V2_S32:%[0-9]+]]:_(s32) = G_FPEXT [[V2]](s16)
+ ; CHECK-NEXT: ADJCALLSTACKDOWN
+ ; CHECK-NEXT: $s0 = COPY [[V2_S32]](s32)
+ ; CHECK-NEXT: BL &tanf
+ ; CHECK-NEXT: ADJCALLSTACKUP
+ ; CHECK-NEXT: [[ELT2_S32:%[0-9]+]]:_(s32) = COPY $s0
+ ; CHECK-NEXT: [[ELT2:%[0-9]+]]:_(s16) = G_FPTRUNC [[ELT2_S32]](s32)
+
+ ; CHECK-DAG: [[V3_S32:%[0-9]+]]:_(s32) = G_FPEXT [[V3]](s16)
+ ; CHECK-NEXT: ADJCALLSTACKDOWN
+ ; CHECK-NEXT: $s0 = COPY [[V3_S32]](s32)
+ ; CHECK-NEXT: BL &tanf
+ ; CHECK-NEXT: ADJCALLSTACKUP
+ ; CHECK-NEXT: [[ELT3_S32:%[0-9]+]]:_(s32) = COPY $s0
+ ; CHECK-NEXT: [[ELT3:%[0-9]+]]:_(s16) = G_FPTRUNC [[ELT3_S32]](s32)
+
+ ; CHECK-DAG: [[V4_S32:%[0-9]+]]:_(s32) = G_FPEXT [[V4]](s16)
+ ; CHECK-NEXT: ADJCALLSTACKDOWN
+ ; CHECK-NEXT: $s0 = COPY [[V4_S32]](s32)
+ ; CHECK-NEXT: BL &tanf
+ ; CHECK-NEXT: ADJCALLSTACKUP
+ ; CHECK-NEXT: [[ELT4_S32:%[0-9]+]]:_(s32) = COPY $s0
+ ; CHECK-NEXT: [[ELT4:%[0-9]+]]:_(s16) = G_FPTRUNC [[ELT4_S32]](s32)
+
+ ; CHECK-DAG: %{{[0-9]+}}:_(<4 x s16>) = G_BUILD_VECTOR [[ELT1]](s16), [[ELT2]](s16), [[ELT3]](s16), [[ELT4]](s16)
+
+ %0:_(<4 x s16>) = COPY $d0
+ %1:_(<4 x s16>) = G_FTAN %0
+ $d0 = COPY %1(<4 x s16>)
+ RET_ReallyLR implicit $d0
+
+...
+---
+name: test_v8f16.tan
+alignment: 4
+tracksRegLiveness: true
+registers:
+ - { id: 0, class: _ }
+ - { id: 1, class: _ }
+body: |
+ bb.0:
+ liveins: $q0
+
+ ; CHECK-LABEL: name: test_v8f16.tan
+
+ ; This is big, so let's just check for the 8 calls to tanf, the
+ ; G_UNMERGE_VALUES, and the G_BUILD_VECTOR. The other instructions ought
+ ; to be covered by the other tests.
+
+ ; CHECK: G_UNMERGE_VALUES
+ ; CHECK: BL &tanf
+ ; CHECK: BL &tanf
+ ; CHECK: BL &tanf
+ ; CHECK: BL &tanf
+ ; CHECK: BL &tanf
+ ; CHECK: BL &tanf
+ ; CHECK: BL &tanf
+ ; CHECK: BL &tanf
+ ; CHECK: G_BUILD_VECTOR
+
+ %0:_(<8 x s16>) = COPY $q0
+ %1:_(<8 x s16>) = G_FTAN %0
+ $q0 = COPY %1(<8 x s16>)
+ RET_ReallyLR implicit $q0
+
+...
+---
+name: test_v2f32.tan
+alignment: 4
+tracksRegLiveness: true
+registers:
+ - { id: 0, class: _ }
+ - { id: 1, class: _ }
+body: |
+ bb.0:
+ liveins: $d0
+
+ ; CHECK-LABEL: name: test_v2f32.tan
+ ; CHECK: [[V1:%[0-9]+]]:_(s32), [[V2:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES %{{[0-9]+}}(<2 x s32>)
+
+ ; CHECK-NEXT: ADJCALLSTACKDOWN
+ ; CHECK-DAG: $s0 = COPY [[V1]](s32)
+ ; CHECK-DAG: BL &tanf
+ ; CHECK-DAG: [[ELT1:%[0-9]+]]:_(s32) = COPY $s0
+ ; CHECK-DAG: ADJCALLSTACKUP
+
+ ; CHECK-DAG: ADJCALLSTACKDOWN
+ ; CHECK-DAG: $s0 = COPY [[V2]](s32)
+ ; CHECK-DAG: BL &tanf
+ ; CHECK-DAG: [[ELT2:%[0-9]+]]:_(s32) = COPY $s0
+ ; CHECK-DAG: ADJCALLSTACKUP
+
+ ; CHECK-DAG: %1:_(<2 x s32>) = G_BUILD_VECTOR [[ELT1]](s32), [[ELT2]](s32)
+
+ %0:_(<2 x s32>) = COPY $d0
+ %1:_(<2 x s32>) = G_FTAN %0
+ $d0 = COPY %1(<2 x s32>)
+ RET_ReallyLR implicit $d0
+
+...
+---
+name: test_v4f32.tan
+alignment: 4
+tracksRegLiveness: true
+registers:
+ - { id: 0, class: _ }
+ - { id: 1, class: _ }
+body: |
+ bb.0:
+ liveins: $q0
+ ; CHECK-LABEL: name: test_v4f32.tan
+ ; CHECK: [[V1:%[0-9]+]]:_(s32), [[V2:%[0-9]+]]:_(s32), [[V3:%[0-9]+]]:_(s32), [[V4:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES %{{[0-9]+}}(<4 x s32>)
+
+ ; CHECK-NEXT: ADJCALLSTACKDOWN
+ ; CHECK-DAG: $s0 = COPY [[V1]](s32)
+ ; CHECK-DAG: BL &tanf
+ ; CHECK-DAG: [[ELT1:%[0-9]+]]:_(s32) = COPY $s0
+ ; CHECK-DAG: ADJCALLSTACKUP
+
+ ; CHECK-DAG: ADJCALLSTACKDOWN
+ ; CHECK-DAG: $s0 = COPY [[V2]](s32)
+ ; CHECK-DAG: BL &tanf
+ ; CHECK-DAG: [[ELT2:%[0-9]+]]:_(s32) = COPY $s0
+ ; CHECK-DAG: ADJCALLSTACKUP
+
+ ; CHECK-DAG: ADJCALLSTACKDOWN
+ ; CHECK-DAG: $s0 = COPY [[V3]](s32)
+ ; CHECK-DAG: BL &tanf
+ ; CHECK-DAG: [[ELT3:%[0-9]+]]:_(s32) = COPY $s0
+ ; CHECK-DAG: ADJCALLSTACKUP
+
+ ; CHECK-DAG: ADJCALLSTACKDOWN
+ ; CHECK-DAG: $s0 = COPY [[V4]](s32)
+ ; CHECK-DAG: BL &tanf
+ ; CHECK-DAG: [[ELT4:%[0-9]+]]:_(s32) = COPY $s0
+ ; CHECK-DAG: ADJCALLSTACKUP
+
+ ; CHECK-DAG: %1:_(<4 x s32>) = G_BUILD_VECTOR [[ELT1]](s32), [[ELT2]](s32), [[ELT3]](s32), [[ELT4]](s32)
+
+ %0:_(<4 x s32>) = COPY $q0
+ %1:_(<4 x s32>) = G_FTAN %0
+ $q0 = COPY %1(<4 x s32>)
+ RET_ReallyLR implicit $q0
+
+...
+---
+name: test_v2f64.tan
+alignment: 4
+tracksRegLiveness: true
+registers:
+ - { id: 0, class: _ }
+ - { id: 1, class: _ }
+body: |
+ bb.0:
+ liveins: $q0
+
+ ; CHECK-LABEL: name: test_v2f64.tan
+ ; CHECK: [[V1:%[0-9]+]]:_(s64), [[V2:%[0-9]+]]:_(s64) = G_UNMERGE_VALUES %{{[0-9]+}}(<2 x s64>)
+
+ ; CHECK-NEXT: ADJCALLSTACKDOWN
+ ; CHECK-DAG: $d0 = COPY [[V1]](s64)
+ ; CHECK-DAG: BL &tan
+ ; CHECK-DAG: [[ELT1:%[0-9]+]]:_(s64) = COPY $d0
+ ; CHECK-DAG: ADJCALLSTACKUP
+
+ ; CHECK-DAG: ADJCALLSTACKDOWN
+ ; CHECK-DAG: $d0 = COPY [[V2]](s64)
+ ; CHECK-DAG: BL &tan
+ ; CHECK-DAG: [[ELT2:%[0-9]+]]:_(s64) = COPY $d0
+ ; CHECK-DAG: ADJCALLSTACKUP
+
+ ; CHECK-DAG: %1:_(<2 x s64>) = G_BUILD_VECTOR [[ELT1]](s64), [[ELT2]](s64)
+
+ %0:_(<2 x s64>) = COPY $q0
+ %1:_(<2 x s64>) = G_FTAN %0
+ $q0 = COPY %1(<2 x s64>)
+ RET_ReallyLR implicit $q0
+
+...
+---
+name: test_tan_half
+alignment: 4
+tracksRegLiveness: true
+registers:
+ - { id: 0, class: _ }
+ - { id: 1, class: _ }
+body: |
+ bb.0:
+ liveins: $h0
+ ; CHECK-LABEL: name: test_tan_half
+ ; CHECK: [[REG1:%[0-9]+]]:_(s32) = G_FPEXT %0(s16)
+ ; CHECK-NEXT: ADJCALLSTACKDOWN
+ ; CHECK-NEXT: $s0 = COPY [[REG1]](s32)
+ ; CHECK-NEXT: BL &tanf
+ ; CHECK-NEXT: ADJCALLSTACKUP
+ ; CHECK-NEXT: [[REG2:%[0-9]+]]:_(s32) = COPY $s0
+ ; CHECK-NEXT: [[RES:%[0-9]+]]:_(s16) = G_FPTRUNC [[REG2]](s32)
+
+ %0:_(s16) = COPY $h0
+ %1:_(s16) = G_FTAN %0
+ $h0 = COPY %1(s16)
+ RET_ReallyLR implicit $h0
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/legalizer-info-validation.mir b/llvm/test/CodeGen/AArch64/GlobalISel/legalizer-info-validation.mir
index d71111b57efe5..04c9a08c50038 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/legalizer-info-validation.mir
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/legalizer-info-validation.mir
@@ -675,8 +675,9 @@
# DEBUG-NEXT: .. the first uncovered type index: 1, OK
# DEBUG-NEXT: .. the first uncovered imm index: 0, OK
# DEBUG-NEXT: G_FTAN (opcode {{[0-9]+}}): 1 type index, 0 imm indices
-# DEBUG-NEXT: .. type index coverage check SKIPPED: no rules defined
-# DEBUG-NEXT: .. imm index coverage check SKIPPED: no rules defined
+# DEBUG-NEXT: .. opcode {{[0-9]+}} is aliased to {{[0-9]+}}
+# DEBUG-NEXT: .. the first uncovered type index: 1, OK
+# DEBUG-NEXT: .. the first uncovered imm index: 0, OK
# DEBUG-NEXT: G_FSQRT (opcode {{[0-9]+}}): 1 type index, 0 imm indices
# DEBUG-NEXT: .. opcode {{[0-9]+}} is aliased to {{[0-9]+}}
# DEBUG-NEXT: .. type index coverage check SKIPPED: user-defined predicate detected
diff --git a/llvm/test/CodeGen/AArch64/f16-instructions.ll b/llvm/test/CodeGen/AArch64/f16-instructions.ll
index fa362ae798aa8..998f7dd38c3f5 100644
--- a/llvm/test/CodeGen/AArch64/f16-instructions.ll
+++ b/llvm/test/CodeGen/AArch64/f16-instructions.ll
@@ -759,6 +759,7 @@ declare half @llvm.sqrt.f16(half %a) #0
declare half @llvm.powi.f16.i32(half %a, i32 %b) #0
declare half @llvm.sin.f16(half %a) #0
declare half @llvm.cos.f16(half %a) #0
+declare half @llvm.tan.f16(half %a) #0
declare half @llvm.pow.f16(half %a, half %b) #0
declare half @llvm.exp.f16(half %a) #0
declare half @llvm.exp2.f16(half %a) #0
@@ -870,6 +871,31 @@ define half @test_cos(half %a) #0 {
ret half %r
}
+; FALLBACK-NOT: remark:{{.*}}test_tan
+; FALLBACK-FP16-NOT: remark:{{.*}}test_tan
+
+; CHECK-COMMON-LABEL: test_tan:
+; CHECK-COMMON-NEXT: stp x29, x30, [sp, #-16]!
+; CHECK-COMMON-NEXT: mov x29, sp
+; CHECK-COMMON-NEXT: fcvt s0, h0
+; CHECK-COMMON-NEXT: bl {{_?}}tanf
+; CHECK-COMMON-NEXT: fcvt h0, s0
+; CHECK-COMMON-NEXT: ldp x29, x30, [sp], #16
+; CHECK-COMMON-NEXT: ret
+
+; GISEL-LABEL: test_tan:
+; GISEL-NEXT: stp x29, x30, [sp, #-16]!
+; GISEL-NEXT: mov x29, sp
+; GISEL-NEXT: fcvt s0, h0
+; GISEL-NEXT: bl {{_?}}tanf
+; GISEL-NEXT: fcvt h0, s0
+; GISEL-NEXT: ldp x29, ...
[truncated]
I assume these are very similar to cos? It all looks sensible to me.
ISD::FCOS, ISD::FSIN, ISD::FSINCOS,
ISD::FTAN, ISD::FEXP, ISD::FEXP2,
ISD::FEXP10, ISD::FLOG, ISD::FLOG2,
ISD::FLOG10, ISD::STRICT_FREM, ISD::STRICT_FPOW,
Is there a llvm.experimental.constrained.tan.f64 yet?
That's in this PR: #94559
OK I see, great. And will that eventually involve adding testing for AArch64? Or is it best to add a quick STRICT_FTAN in here now? I know we have some tests for constrained.cos, and it doesn't look like much is needed to support them.
Looks like there are two places to add tests:
- llvm/test/CodeGen/AArch64/fp-intrinsics-fp16.ll
- llvm/test/CodeGen/AArch64/fp-intrinsics.ll

I added the fp-intrinsics.ll tests now (ce4e2a0) to PR #94559 and everything works. The fp-intrinsics-fp16.ll tests will need to wait for this PR to merge before I add them, because f16 is handled on a per-target basis.
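For reference, a hedged sketch of what such a strict-FP test exercises, assuming the constrained tan intrinsic from PR #94559 mirrors the shape of the existing llvm.experimental.constrained.cos:

```llvm
; Sketch only: the intrinsic lands in PR #94559; the signature here is an
; assumption based on the other constrained math intrinsics.
declare float @llvm.experimental.constrained.tan.f32(float, metadata, metadata)

define float @strict_tan(float %x) strictfp {
  ; Expected to select as a tanf libcall (STRICT_FTAN) on AArch64.
  %r = call float @llvm.experimental.constrained.tan.f32(float %x, metadata !"round.dynamic", metadata !"fpexcept.strict") strictfp
  ret float %r
}
```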
Can you please add the sleef tests as well, in the files below?
- replace-with-veclib-sleef-scalable.ll
- replace-with-veclib-sleef.ll

This should be the equivalent of what you did for ArmPL (replace-with-veclib-armpl.ll).
Regarding the TLI mappings, the symbols do exist in the SLEEF and ArmPL libraries (libsleefgnuabi.so and libamath.so respectively).
The code changes look reasonable to me as well, but I have no strong opinion on areas I haven't touched (fast/global isels, darwin changes).
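A hedged sketch of the kind of check such a test adds (the RUN line and function name are assumptions, not copied from the actual test files):

```llvm
; RUN: opt -vector-library=sleefgnuabi -passes=replace-with-veclib -S < %s | FileCheck %s
declare <4 x float> @llvm.tan.v4f32(<4 x float>)

; CHECK-LABEL: @tan_v4f32(
; CHECK: call <4 x float> @_ZGVnN4v_tanf(
define <4 x float> @tan_v4f32(<4 x float> %v) {
  %r = call <4 x float> @llvm.tan.v4f32(<4 x float> %v)
  ret <4 x float> %r
}
```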
Thanks. LGTM if there are no other comments from @paschalis-mpeis on the tests.
@farzonl thanks a lot for adding the replace-with-veclib SLEEF tests. LGTM.
@davemgreen @paschalis-mpeis thank you both for the review!
This change implements the #87367 investigation into supporting IEEE math operations as intrinsics, which was discussed in this RFC:
https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294

This PR is just for Tan. Now that the x86 tan backend has landed (#90503), we can add other backends, since the shared pieces are now in tree.

Changes:
- llvm/include/llvm/Analysis/VecFuncs.def - vectorization of tan for arm64 backends.
- llvm/lib/Target/AArch64/AArch64FastISel.cpp - add tan to the libcall table.
- llvm/lib/Target/AArch64/AArch64ISelLowering.cpp - add tan expansion for f128, f16, and vector/NEON operations.
- llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp - define G_FTAN as a legal arm64 instruction.

Resolves #94755.
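As a usage sketch tying the pieces together (an assumed end-to-end check; the function name is illustrative), the GlobalISel path now translates the intrinsic to G_FTAN and legalizes it to a libcall:

```llvm
; RUN: llc -mtriple=aarch64 -global-isel -o - %s | FileCheck %s
declare float @llvm.tan.f32(float)

; CHECK-LABEL: gisel_tan:
; CHECK: bl tanf
define float @gisel_tan(float %x) {
  ; IRTranslator emits G_FTAN; the legalizer turns it into a tanf libcall
  ; (see the new legalize-tan.mir test above).
  %r = call float @llvm.tan.f32(float %x)
  ret float %r
}
```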