[AArch64][GlobalISel] Basic add_sat and sub_sat vector handling. #80650
@llvm/pr-subscribers-backend-aarch64

Author: David Green (davemgreen)

Changes

This tries to fill in the basic vector handling for sadd_sat/uadd_sat and ssub_sat/usub_sat. It just handles the basics, marking legal types and clamping illegally sized vectors to legal ones.

Patch is 59.65 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/80650.diff

8 Files Affected:
diff --git a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
index cbf5655706e69..d61520a0028bc 100644
--- a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
+++ b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
@@ -1103,9 +1103,6 @@ AArch64LegalizerInfo::AArch64LegalizerInfo(const AArch64Subtarget &ST)
.scalarize(1)
.lower();
- getActionDefinitionsBuilder({G_UADDSAT, G_USUBSAT})
- .lowerIf([=](const LegalityQuery &Q) { return Q.Types[0].isScalar(); });
-
getActionDefinitionsBuilder({G_FSHL, G_FSHR})
.customFor({{s32, s32}, {s32, s64}, {s64, s64}})
.lower();
@@ -1153,8 +1150,14 @@ AArch64LegalizerInfo::AArch64LegalizerInfo(const AArch64Subtarget &ST)
.minScalarEltSameAsIf(always, 1, 0)
.maxScalarEltSameAsIf(always, 1, 0);
- // TODO: Vector types.
- getActionDefinitionsBuilder({G_SADDSAT, G_SSUBSAT}).lowerIf(isScalar(0));
+ getActionDefinitionsBuilder({G_UADDSAT, G_SADDSAT, G_USUBSAT, G_SSUBSAT})
+ .legalFor({v2s64, v2s32, v4s32, v4s16, v8s16, v8s8, v16s8})
+ .clampNumElements(0, v8s8, v16s8)
+ .clampNumElements(0, v4s16, v8s16)
+ .clampNumElements(0, v2s32, v4s32)
+ .clampMaxNumElements(0, s64, 2)
+ .moreElementsToNextPow2(0)
+ .lowerIf(isScalar(0));
// TODO: Libcall support for s128.
// TODO: s16 should be legal with full FP16 support.
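Note on the rule chain above: as a rough way to reason about what the new rules do to a vector's element count, here is a minimal standalone sketch. It is not LLVM code and does not use the LegalizerInfo API; `nextPow2` and `legalChunks` are hypothetical names, and the way the remainder is handled after splitting is an assumption inferred from the test output in this patch (for example v12i16 ending up as one `8h` and one `4h` operation, and v2i16 as one `4h` operation).

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not an LLVM API): round n up to the next power of two.
inline unsigned nextPow2(unsigned n) {
  unsigned p = 1;
  while (p < n)
    p <<= 1;
  return p;
}

// Simplified model of the saturating add/sub rule chain for one element width.
// Returns the lane counts of the vector operations left after legalization.
// Assumptions: eltBits is already a legal element size, numElts >= 2, and an
// oversized vector is split into full 128-bit pieces plus one remainder.
inline std::vector<unsigned> legalChunks(unsigned eltBits, unsigned numElts) {
  assert(eltBits == 8 || eltBits == 16 || eltBits == 32 || eltBits == 64);
  assert(numElts >= 2);

  // Upper bounds from the rules: v16s8, v8s16, v4s32, v2s64.
  const unsigned maxElts = 128 / eltBits;
  // Lower bounds from the rules: v8s8, v4s16, v2s32. For s64 the patch only
  // sets a maximum, so using 2 as the minimum here is a modelling assumption.
  const unsigned minElts = eltBits == 64 ? 2 : 64 / eltBits;

  std::vector<unsigned> chunks;
  // Models the "max" half of the clamp: peel off full-width pieces.
  while (numElts > maxElts) {
    chunks.push_back(maxElts);
    numElts -= maxElts;
  }
  // Models the "min" half of the clamp followed by moreElementsToNextPow2:
  // widen the remainder.
  const unsigned atLeastMin = numElts < minElts ? minElts : numElts;
  chunks.push_back(nextPow2(atLeastMin));
  return chunks;
}
```

Under these assumptions `legalChunks(16, 12)` gives `{8, 4}` and `legalChunks(16, 2)` gives `{4}`, which lines up with the CHECK-GI lines for v12i16 and v2i16. The real legalizer applies the rules one step at a time per instruction, so treat this only as an approximation.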
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/legalizer-info-validation.mir b/llvm/test/CodeGen/AArch64/GlobalISel/legalizer-info-validation.mir
index c90c31aa27ef5..920437d4c09c7 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/legalizer-info-validation.mir
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/legalizer-info-validation.mir
@@ -392,6 +392,7 @@
# DEBUG-NEXT: .. type index coverage check SKIPPED: user-defined predicate detected
# DEBUG-NEXT: .. imm index coverage check SKIPPED: user-defined predicate detected
# DEBUG-NEXT: G_SADDSAT (opcode {{[0-9]+}}): 1 type index, 0 imm indices
+# DEBUG-NEXT: .. opcode {{[0-9]+}} is aliased to {{[0-9]+}}
# DEBUG-NEXT: .. type index coverage check SKIPPED: user-defined predicate detected
# DEBUG-NEXT: .. imm index coverage check SKIPPED: user-defined predicate detected
# DEBUG-NEXT: G_USUBSAT (opcode {{[0-9]+}}): 1 type index, 0 imm indices
diff --git a/llvm/test/CodeGen/AArch64/sadd_sat.ll b/llvm/test/CodeGen/AArch64/sadd_sat.ll
index 9e09b7f9a4bd6..789fd7b20a7f9 100644
--- a/llvm/test/CodeGen/AArch64/sadd_sat.ll
+++ b/llvm/test/CodeGen/AArch64/sadd_sat.ll
@@ -2,8 +2,6 @@
; RUN: llc < %s -mtriple=aarch64-- | FileCheck %s --check-prefixes=CHECK,CHECK-SD
; RUN: llc < %s -mtriple=aarch64-- -global-isel -global-isel-abort=2 2>&1 | FileCheck %s --check-prefixes=CHECK,CHECK-GI
-; CHECK-GI: warning: Instruction selection used fallback path for vec
-
declare i4 @llvm.sadd.sat.i4(i4, i4)
declare i8 @llvm.sadd.sat.i8(i8, i8)
declare i16 @llvm.sadd.sat.i16(i16, i16)
diff --git a/llvm/test/CodeGen/AArch64/sadd_sat_vec.ll b/llvm/test/CodeGen/AArch64/sadd_sat_vec.ll
index 5f905d94e3573..0f1f42cacff6f 100644
--- a/llvm/test/CodeGen/AArch64/sadd_sat_vec.ll
+++ b/llvm/test/CodeGen/AArch64/sadd_sat_vec.ll
@@ -2,28 +2,11 @@
; RUN: llc < %s -mtriple=aarch64-- | FileCheck %s --check-prefixes=CHECK,CHECK-SD
; RUN: llc < %s -mtriple=aarch64-- -global-isel -global-isel-abort=2 2>&1 | FileCheck %s --check-prefixes=CHECK,CHECK-GI
-; CHECK-GI: warning: Instruction selection used fallback path for v16i8
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for v32i8
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for v64i8
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for v8i16
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for v16i16
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for v32i16
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for v8i8
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for v4i8
+; CHECK-GI: warning: Instruction selection used fallback path for v4i8
; CHECK-GI-NEXT: warning: Instruction selection used fallback path for v2i8
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for v4i16
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for v2i16
; CHECK-GI-NEXT: warning: Instruction selection used fallback path for v12i8
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for v12i16
; CHECK-GI-NEXT: warning: Instruction selection used fallback path for v16i4
; CHECK-GI-NEXT: warning: Instruction selection used fallback path for v16i1
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for v2i32
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for v4i32
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for v8i32
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for v16i32
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for v2i64
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for v4i64
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for v8i64
; CHECK-GI-NEXT: warning: Instruction selection used fallback path for v2i128
declare <1 x i8> @llvm.sadd.sat.v1i8(<1 x i8>, <1 x i8>)
@@ -67,23 +50,37 @@ define <16 x i8> @v16i8(<16 x i8> %x, <16 x i8> %y) nounwind {
}
define <32 x i8> @v32i8(<32 x i8> %x, <32 x i8> %y) nounwind {
-; CHECK-LABEL: v32i8:
-; CHECK: // %bb.0:
-; CHECK-NEXT: sqadd v1.16b, v1.16b, v3.16b
-; CHECK-NEXT: sqadd v0.16b, v0.16b, v2.16b
-; CHECK-NEXT: ret
+; CHECK-SD-LABEL: v32i8:
+; CHECK-SD: // %bb.0:
+; CHECK-SD-NEXT: sqadd v1.16b, v1.16b, v3.16b
+; CHECK-SD-NEXT: sqadd v0.16b, v0.16b, v2.16b
+; CHECK-SD-NEXT: ret
+;
+; CHECK-GI-LABEL: v32i8:
+; CHECK-GI: // %bb.0:
+; CHECK-GI-NEXT: sqadd v0.16b, v0.16b, v2.16b
+; CHECK-GI-NEXT: sqadd v1.16b, v1.16b, v3.16b
+; CHECK-GI-NEXT: ret
%z = call <32 x i8> @llvm.sadd.sat.v32i8(<32 x i8> %x, <32 x i8> %y)
ret <32 x i8> %z
}
define <64 x i8> @v64i8(<64 x i8> %x, <64 x i8> %y) nounwind {
-; CHECK-LABEL: v64i8:
-; CHECK: // %bb.0:
-; CHECK-NEXT: sqadd v2.16b, v2.16b, v6.16b
-; CHECK-NEXT: sqadd v0.16b, v0.16b, v4.16b
-; CHECK-NEXT: sqadd v1.16b, v1.16b, v5.16b
-; CHECK-NEXT: sqadd v3.16b, v3.16b, v7.16b
-; CHECK-NEXT: ret
+; CHECK-SD-LABEL: v64i8:
+; CHECK-SD: // %bb.0:
+; CHECK-SD-NEXT: sqadd v2.16b, v2.16b, v6.16b
+; CHECK-SD-NEXT: sqadd v0.16b, v0.16b, v4.16b
+; CHECK-SD-NEXT: sqadd v1.16b, v1.16b, v5.16b
+; CHECK-SD-NEXT: sqadd v3.16b, v3.16b, v7.16b
+; CHECK-SD-NEXT: ret
+;
+; CHECK-GI-LABEL: v64i8:
+; CHECK-GI: // %bb.0:
+; CHECK-GI-NEXT: sqadd v0.16b, v0.16b, v4.16b
+; CHECK-GI-NEXT: sqadd v1.16b, v1.16b, v5.16b
+; CHECK-GI-NEXT: sqadd v2.16b, v2.16b, v6.16b
+; CHECK-GI-NEXT: sqadd v3.16b, v3.16b, v7.16b
+; CHECK-GI-NEXT: ret
%z = call <64 x i8> @llvm.sadd.sat.v64i8(<64 x i8> %x, <64 x i8> %y)
ret <64 x i8> %z
}
@@ -98,23 +95,37 @@ define <8 x i16> @v8i16(<8 x i16> %x, <8 x i16> %y) nounwind {
}
define <16 x i16> @v16i16(<16 x i16> %x, <16 x i16> %y) nounwind {
-; CHECK-LABEL: v16i16:
-; CHECK: // %bb.0:
-; CHECK-NEXT: sqadd v1.8h, v1.8h, v3.8h
-; CHECK-NEXT: sqadd v0.8h, v0.8h, v2.8h
-; CHECK-NEXT: ret
+; CHECK-SD-LABEL: v16i16:
+; CHECK-SD: // %bb.0:
+; CHECK-SD-NEXT: sqadd v1.8h, v1.8h, v3.8h
+; CHECK-SD-NEXT: sqadd v0.8h, v0.8h, v2.8h
+; CHECK-SD-NEXT: ret
+;
+; CHECK-GI-LABEL: v16i16:
+; CHECK-GI: // %bb.0:
+; CHECK-GI-NEXT: sqadd v0.8h, v0.8h, v2.8h
+; CHECK-GI-NEXT: sqadd v1.8h, v1.8h, v3.8h
+; CHECK-GI-NEXT: ret
%z = call <16 x i16> @llvm.sadd.sat.v16i16(<16 x i16> %x, <16 x i16> %y)
ret <16 x i16> %z
}
define <32 x i16> @v32i16(<32 x i16> %x, <32 x i16> %y) nounwind {
-; CHECK-LABEL: v32i16:
-; CHECK: // %bb.0:
-; CHECK-NEXT: sqadd v2.8h, v2.8h, v6.8h
-; CHECK-NEXT: sqadd v0.8h, v0.8h, v4.8h
-; CHECK-NEXT: sqadd v1.8h, v1.8h, v5.8h
-; CHECK-NEXT: sqadd v3.8h, v3.8h, v7.8h
-; CHECK-NEXT: ret
+; CHECK-SD-LABEL: v32i16:
+; CHECK-SD: // %bb.0:
+; CHECK-SD-NEXT: sqadd v2.8h, v2.8h, v6.8h
+; CHECK-SD-NEXT: sqadd v0.8h, v0.8h, v4.8h
+; CHECK-SD-NEXT: sqadd v1.8h, v1.8h, v5.8h
+; CHECK-SD-NEXT: sqadd v3.8h, v3.8h, v7.8h
+; CHECK-SD-NEXT: ret
+;
+; CHECK-GI-LABEL: v32i16:
+; CHECK-GI: // %bb.0:
+; CHECK-GI-NEXT: sqadd v0.8h, v0.8h, v4.8h
+; CHECK-GI-NEXT: sqadd v1.8h, v1.8h, v5.8h
+; CHECK-GI-NEXT: sqadd v2.8h, v2.8h, v6.8h
+; CHECK-GI-NEXT: sqadd v3.8h, v3.8h, v7.8h
+; CHECK-GI-NEXT: ret
%z = call <32 x i16> @llvm.sadd.sat.v32i16(<32 x i16> %x, <32 x i16> %y)
ret <32 x i16> %z
}
@@ -196,23 +207,41 @@ define void @v4i16(ptr %px, ptr %py, ptr %pz) nounwind {
}
define void @v2i16(ptr %px, ptr %py, ptr %pz) nounwind {
-; CHECK-LABEL: v2i16:
-; CHECK: // %bb.0:
-; CHECK-NEXT: ld1 { v0.h }[0], [x0]
-; CHECK-NEXT: ld1 { v1.h }[0], [x1]
-; CHECK-NEXT: add x8, x0, #2
-; CHECK-NEXT: add x9, x1, #2
-; CHECK-NEXT: ld1 { v0.h }[2], [x8]
-; CHECK-NEXT: ld1 { v1.h }[2], [x9]
-; CHECK-NEXT: shl v1.2s, v1.2s, #16
-; CHECK-NEXT: shl v0.2s, v0.2s, #16
-; CHECK-NEXT: sqadd v0.2s, v0.2s, v1.2s
-; CHECK-NEXT: ushr v0.2s, v0.2s, #16
-; CHECK-NEXT: mov w8, v0.s[1]
-; CHECK-NEXT: fmov w9, s0
-; CHECK-NEXT: strh w9, [x2]
-; CHECK-NEXT: strh w8, [x2, #2]
-; CHECK-NEXT: ret
+; CHECK-SD-LABEL: v2i16:
+; CHECK-SD: // %bb.0:
+; CHECK-SD-NEXT: ld1 { v0.h }[0], [x0]
+; CHECK-SD-NEXT: ld1 { v1.h }[0], [x1]
+; CHECK-SD-NEXT: add x8, x0, #2
+; CHECK-SD-NEXT: add x9, x1, #2
+; CHECK-SD-NEXT: ld1 { v0.h }[2], [x8]
+; CHECK-SD-NEXT: ld1 { v1.h }[2], [x9]
+; CHECK-SD-NEXT: shl v1.2s, v1.2s, #16
+; CHECK-SD-NEXT: shl v0.2s, v0.2s, #16
+; CHECK-SD-NEXT: sqadd v0.2s, v0.2s, v1.2s
+; CHECK-SD-NEXT: ushr v0.2s, v0.2s, #16
+; CHECK-SD-NEXT: mov w8, v0.s[1]
+; CHECK-SD-NEXT: fmov w9, s0
+; CHECK-SD-NEXT: strh w9, [x2]
+; CHECK-SD-NEXT: strh w8, [x2, #2]
+; CHECK-SD-NEXT: ret
+;
+; CHECK-GI-LABEL: v2i16:
+; CHECK-GI: // %bb.0:
+; CHECK-GI-NEXT: ldr h0, [x0]
+; CHECK-GI-NEXT: ldr h1, [x0, #2]
+; CHECK-GI-NEXT: ldr h2, [x1]
+; CHECK-GI-NEXT: ldr h3, [x1, #2]
+; CHECK-GI-NEXT: mov v0.h[1], v1.h[0]
+; CHECK-GI-NEXT: mov v2.h[1], v3.h[0]
+; CHECK-GI-NEXT: mov v0.h[2], v0.h[0]
+; CHECK-GI-NEXT: mov v2.h[2], v0.h[0]
+; CHECK-GI-NEXT: mov v0.h[3], v0.h[0]
+; CHECK-GI-NEXT: mov v2.h[3], v0.h[0]
+; CHECK-GI-NEXT: sqadd v0.4h, v0.4h, v2.4h
+; CHECK-GI-NEXT: mov h1, v0.h[1]
+; CHECK-GI-NEXT: str h0, [x2]
+; CHECK-GI-NEXT: str h1, [x2, #2]
+; CHECK-GI-NEXT: ret
%x = load <2 x i16>, ptr %px
%y = load <2 x i16>, ptr %py
%z = call <2 x i16> @llvm.sadd.sat.v2i16(<2 x i16> %x, <2 x i16> %y)
@@ -230,15 +259,67 @@ define <12 x i8> @v12i8(<12 x i8> %x, <12 x i8> %y) nounwind {
}
define void @v12i16(ptr %px, ptr %py, ptr %pz) nounwind {
-; CHECK-LABEL: v12i16:
-; CHECK: // %bb.0:
-; CHECK-NEXT: ldp q0, q3, [x1]
-; CHECK-NEXT: ldp q1, q2, [x0]
-; CHECK-NEXT: sqadd v0.8h, v1.8h, v0.8h
-; CHECK-NEXT: sqadd v1.8h, v2.8h, v3.8h
-; CHECK-NEXT: str q0, [x2]
-; CHECK-NEXT: str d1, [x2, #16]
-; CHECK-NEXT: ret
+; CHECK-SD-LABEL: v12i16:
+; CHECK-SD: // %bb.0:
+; CHECK-SD-NEXT: ldp q0, q3, [x1]
+; CHECK-SD-NEXT: ldp q1, q2, [x0]
+; CHECK-SD-NEXT: sqadd v0.8h, v1.8h, v0.8h
+; CHECK-SD-NEXT: sqadd v1.8h, v2.8h, v3.8h
+; CHECK-SD-NEXT: str q0, [x2]
+; CHECK-SD-NEXT: str d1, [x2, #16]
+; CHECK-SD-NEXT: ret
+;
+; CHECK-GI-LABEL: v12i16:
+; CHECK-GI: // %bb.0:
+; CHECK-GI-NEXT: ldr h0, [x0]
+; CHECK-GI-NEXT: ldr h2, [x0, #2]
+; CHECK-GI-NEXT: ldr h1, [x1]
+; CHECK-GI-NEXT: ldr h3, [x1, #2]
+; CHECK-GI-NEXT: ldr h4, [x1, #10]
+; CHECK-GI-NEXT: ldr h5, [x0, #18]
+; CHECK-GI-NEXT: mov v0.h[1], v2.h[0]
+; CHECK-GI-NEXT: ldr h2, [x0, #4]
+; CHECK-GI-NEXT: ldr h6, [x1, #16]
+; CHECK-GI-NEXT: mov v1.h[1], v3.h[0]
+; CHECK-GI-NEXT: ldr h3, [x1, #4]
+; CHECK-GI-NEXT: ldr h7, [x1, #18]
+; CHECK-GI-NEXT: mov v6.h[1], v7.h[0]
+; CHECK-GI-NEXT: ldr h7, [x1, #20]
+; CHECK-GI-NEXT: mov v0.h[2], v2.h[0]
+; CHECK-GI-NEXT: ldr h2, [x0, #6]
+; CHECK-GI-NEXT: mov v1.h[2], v3.h[0]
+; CHECK-GI-NEXT: ldr h3, [x1, #6]
+; CHECK-GI-NEXT: mov v6.h[2], v7.h[0]
+; CHECK-GI-NEXT: ldr h7, [x1, #22]
+; CHECK-GI-NEXT: mov v0.h[3], v2.h[0]
+; CHECK-GI-NEXT: ldr h2, [x0, #8]
+; CHECK-GI-NEXT: mov v1.h[3], v3.h[0]
+; CHECK-GI-NEXT: ldr h3, [x1, #8]
+; CHECK-GI-NEXT: mov v6.h[3], v7.h[0]
+; CHECK-GI-NEXT: mov v0.h[4], v2.h[0]
+; CHECK-GI-NEXT: ldr h2, [x0, #10]
+; CHECK-GI-NEXT: mov v1.h[4], v3.h[0]
+; CHECK-GI-NEXT: ldr h3, [x0, #16]
+; CHECK-GI-NEXT: mov v3.h[1], v5.h[0]
+; CHECK-GI-NEXT: ldr h5, [x0, #20]
+; CHECK-GI-NEXT: mov v0.h[5], v2.h[0]
+; CHECK-GI-NEXT: ldr h2, [x0, #12]
+; CHECK-GI-NEXT: mov v1.h[5], v4.h[0]
+; CHECK-GI-NEXT: ldr h4, [x1, #12]
+; CHECK-GI-NEXT: mov v3.h[2], v5.h[0]
+; CHECK-GI-NEXT: ldr h5, [x0, #22]
+; CHECK-GI-NEXT: mov v0.h[6], v2.h[0]
+; CHECK-GI-NEXT: ldr h2, [x0, #14]
+; CHECK-GI-NEXT: mov v1.h[6], v4.h[0]
+; CHECK-GI-NEXT: ldr h4, [x1, #14]
+; CHECK-GI-NEXT: mov v3.h[3], v5.h[0]
+; CHECK-GI-NEXT: mov v0.h[7], v2.h[0]
+; CHECK-GI-NEXT: mov v1.h[7], v4.h[0]
+; CHECK-GI-NEXT: sqadd v0.8h, v0.8h, v1.8h
+; CHECK-GI-NEXT: sqadd v1.4h, v3.4h, v6.4h
+; CHECK-GI-NEXT: str q0, [x2]
+; CHECK-GI-NEXT: str d1, [x2, #16]
+; CHECK-GI-NEXT: ret
%x = load <12 x i16>, ptr %px
%y = load <12 x i16>, ptr %py
%z = call <12 x i16> @llvm.sadd.sat.v12i16(<12 x i16> %x, <12 x i16> %y)
@@ -346,23 +427,37 @@ define <4 x i32> @v4i32(<4 x i32> %x, <4 x i32> %y) nounwind {
}
define <8 x i32> @v8i32(<8 x i32> %x, <8 x i32> %y) nounwind {
-; CHECK-LABEL: v8i32:
-; CHECK: // %bb.0:
-; CHECK-NEXT: sqadd v1.4s, v1.4s, v3.4s
-; CHECK-NEXT: sqadd v0.4s, v0.4s, v2.4s
-; CHECK-NEXT: ret
+; CHECK-SD-LABEL: v8i32:
+; CHECK-SD: // %bb.0:
+; CHECK-SD-NEXT: sqadd v1.4s, v1.4s, v3.4s
+; CHECK-SD-NEXT: sqadd v0.4s, v0.4s, v2.4s
+; CHECK-SD-NEXT: ret
+;
+; CHECK-GI-LABEL: v8i32:
+; CHECK-GI: // %bb.0:
+; CHECK-GI-NEXT: sqadd v0.4s, v0.4s, v2.4s
+; CHECK-GI-NEXT: sqadd v1.4s, v1.4s, v3.4s
+; CHECK-GI-NEXT: ret
%z = call <8 x i32> @llvm.sadd.sat.v8i32(<8 x i32> %x, <8 x i32> %y)
ret <8 x i32> %z
}
define <16 x i32> @v16i32(<16 x i32> %x, <16 x i32> %y) nounwind {
-; CHECK-LABEL: v16i32:
-; CHECK: // %bb.0:
-; CHECK-NEXT: sqadd v2.4s, v2.4s, v6.4s
-; CHECK-NEXT: sqadd v0.4s, v0.4s, v4.4s
-; CHECK-NEXT: sqadd v1.4s, v1.4s, v5.4s
-; CHECK-NEXT: sqadd v3.4s, v3.4s, v7.4s
-; CHECK-NEXT: ret
+; CHECK-SD-LABEL: v16i32:
+; CHECK-SD: // %bb.0:
+; CHECK-SD-NEXT: sqadd v2.4s, v2.4s, v6.4s
+; CHECK-SD-NEXT: sqadd v0.4s, v0.4s, v4.4s
+; CHECK-SD-NEXT: sqadd v1.4s, v1.4s, v5.4s
+; CHECK-SD-NEXT: sqadd v3.4s, v3.4s, v7.4s
+; CHECK-SD-NEXT: ret
+;
+; CHECK-GI-LABEL: v16i32:
+; CHECK-GI: // %bb.0:
+; CHECK-GI-NEXT: sqadd v0.4s, v0.4s, v4.4s
+; CHECK-GI-NEXT: sqadd v1.4s, v1.4s, v5.4s
+; CHECK-GI-NEXT: sqadd v2.4s, v2.4s, v6.4s
+; CHECK-GI-NEXT: sqadd v3.4s, v3.4s, v7.4s
+; CHECK-GI-NEXT: ret
%z = call <16 x i32> @llvm.sadd.sat.v16i32(<16 x i32> %x, <16 x i32> %y)
ret <16 x i32> %z
}
@@ -377,23 +472,37 @@ define <2 x i64> @v2i64(<2 x i64> %x, <2 x i64> %y) nounwind {
}
define <4 x i64> @v4i64(<4 x i64> %x, <4 x i64> %y) nounwind {
-; CHECK-LABEL: v4i64:
-; CHECK: // %bb.0:
-; CHECK-NEXT: sqadd v1.2d, v1.2d, v3.2d
-; CHECK-NEXT: sqadd v0.2d, v0.2d, v2.2d
-; CHECK-NEXT: ret
+; CHECK-SD-LABEL: v4i64:
+; CHECK-SD: // %bb.0:
+; CHECK-SD-NEXT: sqadd v1.2d, v1.2d, v3.2d
+; CHECK-SD-NEXT: sqadd v0.2d, v0.2d, v2.2d
+; CHECK-SD-NEXT: ret
+;
+; CHECK-GI-LABEL: v4i64:
+; CHECK-GI: // %bb.0:
+; CHECK-GI-NEXT: sqadd v0.2d, v0.2d, v2.2d
+; CHECK-GI-NEXT: sqadd v1.2d, v1.2d, v3.2d
+; CHECK-GI-NEXT: ret
%z = call <4 x i64> @llvm.sadd.sat.v4i64(<4 x i64> %x, <4 x i64> %y)
ret <4 x i64> %z
}
define <8 x i64> @v8i64(<8 x i64> %x, <8 x i64> %y) nounwind {
-; CHECK-LABEL: v8i64:
-; CHECK: // %bb.0:
-; CHECK-NEXT: sqadd v2.2d, v2.2d, v6.2d
-; CHECK-NEXT: sqadd v0.2d, v0.2d, v4.2d
-; CHECK-NEXT: sqadd v1.2d, v1.2d, v5.2d
-; CHECK-NEXT: sqadd v3.2d, v3.2d, v7.2d
-; CHECK-NEXT: ret
+; CHECK-SD-LABEL: v8i64:
+; CHECK-SD: // %bb.0:
+; CHECK-SD-NEXT: sqadd v2.2d, v2.2d, v6.2d
+; CHECK-SD-NEXT: sqadd v0.2d, v0.2d, v4.2d
+; CHECK-SD-NEXT: sqadd v1.2d, v1.2d, v5.2d
+; CHECK-SD-NEXT: sqadd v3.2d, v3.2d, v7.2d
+; CHECK-SD-NEXT: ret
+;
+; CHECK-GI-LABEL: v8i64:
+; CHECK-GI: // %bb.0:
+; CHECK-GI-NEXT: sqadd v0.2d, v0.2d, v4.2d
+; CHECK-GI-NEXT: sqadd v1.2d, v1.2d, v5.2d
+; CHECK-GI-NEXT: sqadd v2.2d, v2.2d, v6.2d
+; CHECK-GI-NEXT: sqadd v3.2d, v3.2d, v7.2d
+; CHECK-GI-NEXT: ret
%z = call <8 x i64> @llvm.sadd.sat.v8i64(<8 x i64> %x, <8 x i64> %y)
ret <8 x i64> %z
}
diff --git a/llvm/test/CodeGen/AArch64/ssub_sat.ll b/llvm/test/CodeGen/AArch64/ssub_sat.ll
index abeb4b357fa9f..4d755f480c3fc 100644
--- a/llvm/test/CodeGen/AArch64/ssub_sat.ll
+++ b/llvm/test/CodeGen/AArch64/ssub_sat.ll
@@ -2,8 +2,6 @@
; RUN: llc < %s -mtriple=aarch64-- | FileCheck %s --check-prefixes=CHECK,CHECK-SD
; RUN: llc < %s -mtriple=aarch64-- -global-isel -global-isel-abort=2 2>&1 | FileCheck %s --check-prefixes=CHECK,CHECK-GI
-; CHECK-GI: warning: Instruction selection used fallback path for vec
-
declare i4 @llvm.ssub.sat.i4(i4, i4)
declare i8 @llvm.ssub.sat.i8(i8, i8)
declare i16 @llvm.ssub.sat.i16(i16, i16)
diff --git a/llvm/test/CodeGen/AArch64/ssub_sat_vec.ll b/llvm/test/CodeGen/AArch64/ssub_sat_vec.ll
index acec3e74d3e93..a768bbbdc6343 100644
--- a/llvm/test/CodeGen/AArch64/ssub_sat_vec.ll
+++ b/llvm/test/CodeGen/AArch64/ssub_sat_vec.ll
@@ -2,28 +2,11 @@
; RUN: llc < %s -mtriple=aarch64-- | FileCheck %s --check-prefixes=CHECK,CHECK-SD
; RUN: llc < %s -mtriple=aarch64-- -global-isel -global-isel-abort=2 2>&1 | FileCheck %s --check-prefixes=CHECK,CHECK-GI
-; CHECK-GI: warning: Instruction selection used fallback path for v16i8
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for v32i8
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for v64i8
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for v8i16
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for v16i16
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for v32i16
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for v8i8
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for v4i8
+; CHECK-GI: warning: Instruction selection used fallback path for v4i8
; CHECK-GI-NEXT: warning: Instruction selection used fallback path for v2i8
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for v4i16
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for v2i16
; CHECK-GI-NEXT: warning: Instruction selection used fallback path for v12i8
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for v12i16
; CHECK-GI-NEXT: warning: Instruction selection used fallback path for v16i4
; CHECK-GI-NEXT: warning: Instruction selection used fallback path for v16i1
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for v2i32
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for v4i32
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for v8i32
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for v16i32
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for v2i64
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for v4i64
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for v8i64
; CHECK-GI-NEXT: warning: Instruction selection used fallback path for v2i128
declare <1 x i8> @llvm.ssub.sat.v1i8(<1 x i8>, <1 x i8>)
@@ -68,23 +51,37 @@ define <16 x i8> @...
[truncated]
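For readers less familiar with the intrinsics under test: each lane of a saturating add or subtract is computed as if in a wider type and then clamped to the range of the element type, which is what the `sqadd` instructions in the checks above do for the signed add case. Below is a small scalar reference for 8-bit lanes, written as a sketch. The function names (`clampToI8`, `saddSat`, `ssubSat`, `uaddSat`, `usubSat`, `saddSatLanes`) are illustrative and do not come from LLVM.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>

// Clamp a widened intermediate result back into the signed 8-bit range.
inline std::int8_t clampToI8(int wide) {
  return static_cast<std::int8_t>(std::clamp(wide, -128, 127));
}

// Signed saturating add/sub: compute in int, then clamp to [-128, 127].
inline std::int8_t saddSat(std::int8_t a, std::int8_t b) {
  return clampToI8(int(a) + int(b));
}
inline std::int8_t ssubSat(std::int8_t a, std::int8_t b) {
  return clampToI8(int(a) - int(b));
}

// Unsigned saturating add/sub: the result is capped at 255 or floored at 0.
inline std::uint8_t uaddSat(std::uint8_t a, std::uint8_t b) {
  return static_cast<std::uint8_t>(std::min(int(a) + int(b), 255));
}
inline std::uint8_t usubSat(std::uint8_t a, std::uint8_t b) {
  return static_cast<std::uint8_t>(std::max(int(a) - int(b), 0));
}

// Lane-wise signed saturating add over a fixed-size vector, one lane at a time.
template <std::size_t N>
std::array<std::int8_t, N> saddSatLanes(const std::array<std::int8_t, N> &x,
                                        const std::array<std::int8_t, N> &y) {
  std::array<std::int8_t, N> z{};
  for (std::size_t i = 0; i < N; ++i)
    z[i] = saddSat(x[i], y[i]);
  return z;
}
```

A reference like this can be handy for spot-checking expected values by hand. It needs C++17 for `std::clamp`.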
-; CHECK-NEXT: sqadd v0.4s, v0.4s, v2.4s
-; CHECK-NEXT: ret
+; CHECK-SD-LABEL: v8i32:
+; CHECK-SD: // %bb.0:
+; CHECK-SD-NEXT: sqadd v1.4s, v1.4s, v3.4s
+; CHECK-SD-NEXT: sqadd v0.4s, v0.4s, v2.4s
+; CHECK-SD-NEXT: ret
+;
+; CHECK-GI-LABEL: v8i32:
+; CHECK-GI: // %bb.0:
+; CHECK-GI-NEXT: sqadd v0.4s, v0.4s, v2.4s
+; CHECK-GI-NEXT: sqadd v1.4s, v1.4s, v3.4s
+; CHECK-GI-NEXT: ret
%z = call <8 x i32> @llvm.sadd.sat.v8i32(<8 x i32> %x, <8 x i32> %y)
ret <8 x i32> %z
}
define <16 x i32> @v16i32(<16 x i32> %x, <16 x i32> %y) nounwind {
-; CHECK-LABEL: v16i32:
-; CHECK: // %bb.0:
-; CHECK-NEXT: sqadd v2.4s, v2.4s, v6.4s
-; CHECK-NEXT: sqadd v0.4s, v0.4s, v4.4s
-; CHECK-NEXT: sqadd v1.4s, v1.4s, v5.4s
-; CHECK-NEXT: sqadd v3.4s, v3.4s, v7.4s
-; CHECK-NEXT: ret
+; CHECK-SD-LABEL: v16i32:
+; CHECK-SD: // %bb.0:
+; CHECK-SD-NEXT: sqadd v2.4s, v2.4s, v6.4s
+; CHECK-SD-NEXT: sqadd v0.4s, v0.4s, v4.4s
+; CHECK-SD-NEXT: sqadd v1.4s, v1.4s, v5.4s
+; CHECK-SD-NEXT: sqadd v3.4s, v3.4s, v7.4s
+; CHECK-SD-NEXT: ret
+;
+; CHECK-GI-LABEL: v16i32:
+; CHECK-GI: // %bb.0:
+; CHECK-GI-NEXT: sqadd v0.4s, v0.4s, v4.4s
+; CHECK-GI-NEXT: sqadd v1.4s, v1.4s, v5.4s
+; CHECK-GI-NEXT: sqadd v2.4s, v2.4s, v6.4s
+; CHECK-GI-NEXT: sqadd v3.4s, v3.4s, v7.4s
+; CHECK-GI-NEXT: ret
%z = call <16 x i32> @llvm.sadd.sat.v16i32(<16 x i32> %x, <16 x i32> %y)
ret <16 x i32> %z
}
@@ -377,23 +472,37 @@ define <2 x i64> @v2i64(<2 x i64> %x, <2 x i64> %y) nounwind {
}
define <4 x i64> @v4i64(<4 x i64> %x, <4 x i64> %y) nounwind {
-; CHECK-LABEL: v4i64:
-; CHECK: // %bb.0:
-; CHECK-NEXT: sqadd v1.2d, v1.2d, v3.2d
-; CHECK-NEXT: sqadd v0.2d, v0.2d, v2.2d
-; CHECK-NEXT: ret
+; CHECK-SD-LABEL: v4i64:
+; CHECK-SD: // %bb.0:
+; CHECK-SD-NEXT: sqadd v1.2d, v1.2d, v3.2d
+; CHECK-SD-NEXT: sqadd v0.2d, v0.2d, v2.2d
+; CHECK-SD-NEXT: ret
+;
+; CHECK-GI-LABEL: v4i64:
+; CHECK-GI: // %bb.0:
+; CHECK-GI-NEXT: sqadd v0.2d, v0.2d, v2.2d
+; CHECK-GI-NEXT: sqadd v1.2d, v1.2d, v3.2d
+; CHECK-GI-NEXT: ret
%z = call <4 x i64> @llvm.sadd.sat.v4i64(<4 x i64> %x, <4 x i64> %y)
ret <4 x i64> %z
}
define <8 x i64> @v8i64(<8 x i64> %x, <8 x i64> %y) nounwind {
-; CHECK-LABEL: v8i64:
-; CHECK: // %bb.0:
-; CHECK-NEXT: sqadd v2.2d, v2.2d, v6.2d
-; CHECK-NEXT: sqadd v0.2d, v0.2d, v4.2d
-; CHECK-NEXT: sqadd v1.2d, v1.2d, v5.2d
-; CHECK-NEXT: sqadd v3.2d, v3.2d, v7.2d
-; CHECK-NEXT: ret
+; CHECK-SD-LABEL: v8i64:
+; CHECK-SD: // %bb.0:
+; CHECK-SD-NEXT: sqadd v2.2d, v2.2d, v6.2d
+; CHECK-SD-NEXT: sqadd v0.2d, v0.2d, v4.2d
+; CHECK-SD-NEXT: sqadd v1.2d, v1.2d, v5.2d
+; CHECK-SD-NEXT: sqadd v3.2d, v3.2d, v7.2d
+; CHECK-SD-NEXT: ret
+;
+; CHECK-GI-LABEL: v8i64:
+; CHECK-GI: // %bb.0:
+; CHECK-GI-NEXT: sqadd v0.2d, v0.2d, v4.2d
+; CHECK-GI-NEXT: sqadd v1.2d, v1.2d, v5.2d
+; CHECK-GI-NEXT: sqadd v2.2d, v2.2d, v6.2d
+; CHECK-GI-NEXT: sqadd v3.2d, v3.2d, v7.2d
+; CHECK-GI-NEXT: ret
%z = call <8 x i64> @llvm.sadd.sat.v8i64(<8 x i64> %x, <8 x i64> %y)
ret <8 x i64> %z
}
diff --git a/llvm/test/CodeGen/AArch64/ssub_sat.ll b/llvm/test/CodeGen/AArch64/ssub_sat.ll
index abeb4b357fa9f..4d755f480c3fc 100644
--- a/llvm/test/CodeGen/AArch64/ssub_sat.ll
+++ b/llvm/test/CodeGen/AArch64/ssub_sat.ll
@@ -2,8 +2,6 @@
; RUN: llc < %s -mtriple=aarch64-- | FileCheck %s --check-prefixes=CHECK,CHECK-SD
; RUN: llc < %s -mtriple=aarch64-- -global-isel -global-isel-abort=2 2>&1 | FileCheck %s --check-prefixes=CHECK,CHECK-GI
-; CHECK-GI: warning: Instruction selection used fallback path for vec
-
declare i4 @llvm.ssub.sat.i4(i4, i4)
declare i8 @llvm.ssub.sat.i8(i8, i8)
declare i16 @llvm.ssub.sat.i16(i16, i16)
diff --git a/llvm/test/CodeGen/AArch64/ssub_sat_vec.ll b/llvm/test/CodeGen/AArch64/ssub_sat_vec.ll
index acec3e74d3e93..a768bbbdc6343 100644
--- a/llvm/test/CodeGen/AArch64/ssub_sat_vec.ll
+++ b/llvm/test/CodeGen/AArch64/ssub_sat_vec.ll
@@ -2,28 +2,11 @@
; RUN: llc < %s -mtriple=aarch64-- | FileCheck %s --check-prefixes=CHECK,CHECK-SD
; RUN: llc < %s -mtriple=aarch64-- -global-isel -global-isel-abort=2 2>&1 | FileCheck %s --check-prefixes=CHECK,CHECK-GI
-; CHECK-GI: warning: Instruction selection used fallback path for v16i8
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for v32i8
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for v64i8
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for v8i16
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for v16i16
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for v32i16
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for v8i8
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for v4i8
+; CHECK-GI: warning: Instruction selection used fallback path for v4i8
; CHECK-GI-NEXT: warning: Instruction selection used fallback path for v2i8
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for v4i16
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for v2i16
; CHECK-GI-NEXT: warning: Instruction selection used fallback path for v12i8
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for v12i16
; CHECK-GI-NEXT: warning: Instruction selection used fallback path for v16i4
; CHECK-GI-NEXT: warning: Instruction selection used fallback path for v16i1
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for v2i32
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for v4i32
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for v8i32
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for v16i32
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for v2i64
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for v4i64
-; CHECK-GI-NEXT: warning: Instruction selection used fallback path for v8i64
; CHECK-GI-NEXT: warning: Instruction selection used fallback path for v2i128
declare <1 x i8> @llvm.ssub.sat.v1i8(<1 x i8>, <1 x i8>)
@@ -68,23 +51,37 @@ define <16 x i8> @...
[truncated]
.clampNumElements(0, v2s32, v4s32)
.clampMaxNumElements(0, s64, 2)
.moreElementsToNextPow2(0)
.lowerIf(isScalar(0));
Just make this an unconditional .lower()?
I was attempting to focus on the more normal vector types in this patch. A lot of other operations like G_SADDO do not handle vectors yet, let alone odd-sized vectors, so it would be difficult to write a test that works.
You can't really go wrong with a default case of lower
I personally don't think we should be adding things if we don't have test coverage for them (or they don't work, like here). I was hoping this patch could handle the more standard types without trying to solve every problem.
It's more like having a trailing else if that can just be an else
Yeah, it is an 'else' instead of an 'unreachable'. I would personally prefer that this fail during legalization (and fall back to SDAG) rather than be untested, or try to handle every different case in a single patch. I can change it to a lower though; we should get test coverage once the other operations are able to handle non-power-of-2 vector sizes.
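The "trailing else if versus else" point in this thread can be sketched with a toy first-match rule list. This is only an illustration under stated assumptions: `Query`, `Rule`, `decide`, `conditionalRules` and `unconditionalRules` are hypothetical names, not LLVM's legalizer API, and the two boolean fields stand in for the real type checks.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <vector>

// Hypothetical stand-in for a legality query; the real one carries LLTs.
struct Query {
  bool IsScalar;
  bool IsLegalVector;
};

enum class Action { Legal, Lower };

// A rule pairs a predicate with the action taken when the predicate matches.
struct Rule {
  std::function<bool(const Query &)> Matches;
  Action Result;
};

// First matching rule wins. An empty result models "no rule applied",
// which is the case where legalization fails and the fallback path is used.
std::optional<Action> decide(const std::vector<Rule> &Rules, const Query &Q) {
  for (const Rule &R : Rules)
    if (R.Matches(Q))
      return R.Result;
  return std::nullopt;
}

// Models a rule list ending in a conditional lower ("else if").
std::vector<Rule> conditionalRules() {
  return {
      {[](const Query &Q) { return Q.IsLegalVector; }, Action::Legal},
      {[](const Query &Q) { return Q.IsScalar; }, Action::Lower},
  };
}

// Models a rule list ending in an unconditional lower ("else").
std::vector<Rule> unconditionalRules() {
  return {
      {[](const Query &Q) { return Q.IsLegalVector; }, Action::Legal},
      {[](const Query &) { return true; }, Action::Lower},
  };
}
```

In this toy model a query that is neither scalar nor a legal vector gets no action from `conditionalRules()` and gets `Action::Lower` from `unconditionalRules()`, which is the trade-off being debated: fail visibly, or attempt a lowering that has no test coverage yet.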
; CHECK-GI-NEXT: warning: Instruction selection used fallback path for v32i16
; CHECK-GI-NEXT: warning: Instruction selection used fallback path for v8i8
; CHECK-GI-NEXT: warning: Instruction selection used fallback path for v4i8
; CHECK-GI: warning: Instruction selection used fallback path for v4i8
I thought this would be handled by the elements clamp?
I believe this is the G_ANYEXT issue we have seen elsewhere.
d415f87 to 35bd976
This tries to fill in the basic vector handling for sadd_sat/uadd_sat and ssub_sat/usub_sat. It just handles the basics, marking legal types and clamping illegally sized vectors to legal ones.
35bd976 to d765623
I've rebased over the recent anyext fixes. Thanks
; CHECK-SD-NEXT: str s0, [x2]
; CHECK-SD-NEXT: ret
;
; CHECK-GI-LABEL: v4i8:
I assume further patches will catch up to the DAG output
Yeah, that's the plan. It should hopefully be better in the end because we use widening, not expand.
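As a rough illustration of why widening can work for saturating adds, here is a scalar model of one lane. It mirrors the `shl #16` / `sqadd` / `ushr #16` sequence in the CHECK-SD output for v2i16 earlier in the diff: shifting the narrow value into the top of a wider lane makes the wide saturation point coincide with the narrow one. The function names are hypothetical, and the thread does not say which widening strategy GlobalISel will end up using, so treat this as a sketch rather than the planned implementation.

```cpp
#include <cassert>
#include <cstdint>

// Reference: signed saturating add performed directly on 16-bit values.
int16_t saddsat16(int16_t A, int16_t B) {
  int32_t Sum = int32_t(A) + int32_t(B);
  if (Sum > INT16_MAX)
    return INT16_MAX;
  if (Sum < INT16_MIN)
    return INT16_MIN;
  return int16_t(Sum);
}

// Signed saturating add on 32-bit values (models one sqadd lane of .2s).
int32_t saddsat32(int32_t A, int32_t B) {
  int64_t Sum = int64_t(A) + int64_t(B);
  if (Sum > INT32_MAX)
    return INT32_MAX;
  if (Sum < INT32_MIN)
    return INT32_MIN;
  return int32_t(Sum);
}

// Widened form: place each operand in the top 16 bits of a 32-bit lane,
// saturate at 32 bits, then take the top 16 bits of the result.
int16_t saddsat16Widened(int16_t A, int16_t B) {
  // Multiplying by 65536 has the same effect as a left shift by 16 and
  // cannot overflow int32_t for any int16_t input.
  int32_t WideA = int32_t(A) * 65536;
  int32_t WideB = int32_t(B) * 65536;
  int32_t Wide = saddsat32(WideA, WideB);
  // Logical shift right by 16 keeps the top half; the narrowing cast
  // reinterprets those 16 bits as a signed value (two's complement).
  uint16_t Top = uint16_t(uint32_t(Wide) >> 16);
  return int16_t(Top);
}
```

The key property is that a 32-bit result saturated to `INT32_MAX` or `INT32_MIN` has `0x7FFF` or `0x8000` in its top half, which are exactly the 16-bit saturation values.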
Thanks