-
Notifications
You must be signed in to change notification settings - Fork 13.6k
[AArch64] Enable vscale_range with +sme #124466
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-backend-risc-v @llvm/pr-subscribers-backend-aarch64 Author: David Green (davemgreen) ChangesIf we have +sme but not +sve, we would not set vscale_range on functions. It should be valid to apply it with the same range with just +sme, which can help improve the performance of generated code. Full diff: https://github.com/llvm/llvm-project/pull/124466.diff 3 Files Affected:
diff --git a/clang/lib/Basic/Targets/AArch64.cpp b/clang/lib/Basic/Targets/AArch64.cpp
index 0b899137bbb5c7..e9afe0fb4ee456 100644
--- a/clang/lib/Basic/Targets/AArch64.cpp
+++ b/clang/lib/Basic/Targets/AArch64.cpp
@@ -708,7 +708,7 @@ AArch64TargetInfo::getVScaleRange(const LangOptions &LangOpts) const {
return std::pair<unsigned, unsigned>(
LangOpts.VScaleMin ? LangOpts.VScaleMin : 1, LangOpts.VScaleMax);
- if (hasFeature("sve"))
+ if (hasFeature("sve") || hasFeature("sme"))
return std::pair<unsigned, unsigned>(1, 16);
return std::nullopt;
diff --git a/clang/test/CodeGen/AArch64/sme-intrinsics/aarch64-sme-attrs.cpp b/clang/test/CodeGen/AArch64/sme-intrinsics/aarch64-sme-attrs.cpp
index 54762c8b414124..cabd9d01e46652 100644
--- a/clang/test/CodeGen/AArch64/sme-intrinsics/aarch64-sme-attrs.cpp
+++ b/clang/test/CodeGen/AArch64/sme-intrinsics/aarch64-sme-attrs.cpp
@@ -300,19 +300,19 @@ int test_variadic_template() __arm_inout("za") {
preserves_za_decl);
}
-// CHECK: attributes #[[SM_ENABLED]] = { mustprogress noinline nounwind "aarch64_pstate_sm_enabled" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
+// CHECK: attributes #[[SM_ENABLED]] = { mustprogress noinline nounwind vscale_range(1,16) "aarch64_pstate_sm_enabled" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
// CHECK: attributes #[[NORMAL_DECL]] = { "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
// CHECK: attributes #[[SM_ENABLED_DECL]] = { "aarch64_pstate_sm_enabled" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
-// CHECK: attributes #[[SM_COMPATIBLE]] = { mustprogress noinline nounwind "aarch64_pstate_sm_compatible" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
+// CHECK: attributes #[[SM_COMPATIBLE]] = { mustprogress noinline nounwind vscale_range(1,16) "aarch64_pstate_sm_compatible" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
// CHECK: attributes #[[SM_COMPATIBLE_DECL]] = { "aarch64_pstate_sm_compatible" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
-// CHECK: attributes #[[SM_BODY]] = { mustprogress noinline nounwind "aarch64_pstate_sm_body" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
-// CHECK: attributes #[[ZA_SHARED]] = { mustprogress noinline nounwind "aarch64_inout_za" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
+// CHECK: attributes #[[SM_BODY]] = { mustprogress noinline nounwind vscale_range(1,16) "aarch64_pstate_sm_body" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
+// CHECK: attributes #[[ZA_SHARED]] = { mustprogress noinline nounwind vscale_range(1,16) "aarch64_inout_za" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
// CHECK: attributes #[[ZA_SHARED_DECL]] = { "aarch64_inout_za" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
-// CHECK: attributes #[[ZA_PRESERVED]] = { mustprogress noinline nounwind "aarch64_preserves_za" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
+// CHECK: attributes #[[ZA_PRESERVED]] = { mustprogress noinline nounwind vscale_range(1,16) "aarch64_preserves_za" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
// CHECK: attributes #[[ZA_PRESERVED_DECL]] = { "aarch64_preserves_za" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
-// CHECK: attributes #[[ZA_NEW]] = { mustprogress noinline nounwind "aarch64_new_za" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
-// CHECK: attributes #[[ZA_AGNOSTIC]] = { mustprogress noinline nounwind "aarch64_za_state_agnostic" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
-// CHECK: attributes #[[NORMAL_DEF]] = { mustprogress noinline nounwind "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
+// CHECK: attributes #[[ZA_NEW]] = { mustprogress noinline nounwind vscale_range(1,16) "aarch64_new_za" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
+// CHECK: attributes #[[ZA_AGNOSTIC]] = { mustprogress noinline nounwind vscale_range(1,16) "aarch64_za_state_agnostic" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
+// CHECK: attributes #[[NORMAL_DEF]] = { mustprogress noinline nounwind vscale_range(1,16) "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
// CHECK: attributes #[[SM_ENABLED_CALL]] = { "aarch64_pstate_sm_enabled" }
// CHECK: attributes #[[SM_COMPATIBLE_CALL]] = { "aarch64_pstate_sm_compatible" }
// CHECK: attributes #[[SM_BODY_CALL]] = { "aarch64_pstate_sm_body" }
diff --git a/clang/test/CodeGen/arm-sve-vector-bits-vscale-range.c b/clang/test/CodeGen/arm-sve-vector-bits-vscale-range.c
index bd424172a18650..66b67b036ef4ca 100644
--- a/clang/test/CodeGen/arm-sve-vector-bits-vscale-range.c
+++ b/clang/test/CodeGen/arm-sve-vector-bits-vscale-range.c
@@ -13,6 +13,7 @@
// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve2 -mvscale-min=1 -mvscale-max=0 -emit-llvm -o - %s | FileCheck %s --check-prefix=CHECK-UNBOUNDED
// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -mvscale-min=1 -mvscale-max=0 -emit-llvm -o - %s | FileCheck %s --check-prefix=CHECK-UNBOUNDED
// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -emit-llvm -o - %s | FileCheck %s --check-prefix=CHECK-NONE
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sme -emit-llvm -o - %s | FileCheck %s --check-prefix=CHECK-NONE
// CHECK-LABEL: @func() #0
// CHECK: attributes #0 = { {{.*}} vscale_range([[#VBITS]],[[#VBITS]]) {{.*}} }
|
@llvm/pr-subscribers-clang Author: David Green (davemgreen) ChangesIf we have +sme but not +sve, we would not set vscale_range on functions. It should be valid to apply it with the same range with just +sme, which can help improve the performance of generated code. Full diff: https://github.com/llvm/llvm-project/pull/124466.diff 3 Files Affected:
diff --git a/clang/lib/Basic/Targets/AArch64.cpp b/clang/lib/Basic/Targets/AArch64.cpp
index 0b899137bbb5c7..e9afe0fb4ee456 100644
--- a/clang/lib/Basic/Targets/AArch64.cpp
+++ b/clang/lib/Basic/Targets/AArch64.cpp
@@ -708,7 +708,7 @@ AArch64TargetInfo::getVScaleRange(const LangOptions &LangOpts) const {
return std::pair<unsigned, unsigned>(
LangOpts.VScaleMin ? LangOpts.VScaleMin : 1, LangOpts.VScaleMax);
- if (hasFeature("sve"))
+ if (hasFeature("sve") || hasFeature("sme"))
return std::pair<unsigned, unsigned>(1, 16);
return std::nullopt;
diff --git a/clang/test/CodeGen/AArch64/sme-intrinsics/aarch64-sme-attrs.cpp b/clang/test/CodeGen/AArch64/sme-intrinsics/aarch64-sme-attrs.cpp
index 54762c8b414124..cabd9d01e46652 100644
--- a/clang/test/CodeGen/AArch64/sme-intrinsics/aarch64-sme-attrs.cpp
+++ b/clang/test/CodeGen/AArch64/sme-intrinsics/aarch64-sme-attrs.cpp
@@ -300,19 +300,19 @@ int test_variadic_template() __arm_inout("za") {
preserves_za_decl);
}
-// CHECK: attributes #[[SM_ENABLED]] = { mustprogress noinline nounwind "aarch64_pstate_sm_enabled" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
+// CHECK: attributes #[[SM_ENABLED]] = { mustprogress noinline nounwind vscale_range(1,16) "aarch64_pstate_sm_enabled" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
// CHECK: attributes #[[NORMAL_DECL]] = { "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
// CHECK: attributes #[[SM_ENABLED_DECL]] = { "aarch64_pstate_sm_enabled" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
-// CHECK: attributes #[[SM_COMPATIBLE]] = { mustprogress noinline nounwind "aarch64_pstate_sm_compatible" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
+// CHECK: attributes #[[SM_COMPATIBLE]] = { mustprogress noinline nounwind vscale_range(1,16) "aarch64_pstate_sm_compatible" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
// CHECK: attributes #[[SM_COMPATIBLE_DECL]] = { "aarch64_pstate_sm_compatible" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
-// CHECK: attributes #[[SM_BODY]] = { mustprogress noinline nounwind "aarch64_pstate_sm_body" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
-// CHECK: attributes #[[ZA_SHARED]] = { mustprogress noinline nounwind "aarch64_inout_za" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
+// CHECK: attributes #[[SM_BODY]] = { mustprogress noinline nounwind vscale_range(1,16) "aarch64_pstate_sm_body" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
+// CHECK: attributes #[[ZA_SHARED]] = { mustprogress noinline nounwind vscale_range(1,16) "aarch64_inout_za" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
// CHECK: attributes #[[ZA_SHARED_DECL]] = { "aarch64_inout_za" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
-// CHECK: attributes #[[ZA_PRESERVED]] = { mustprogress noinline nounwind "aarch64_preserves_za" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
+// CHECK: attributes #[[ZA_PRESERVED]] = { mustprogress noinline nounwind vscale_range(1,16) "aarch64_preserves_za" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
// CHECK: attributes #[[ZA_PRESERVED_DECL]] = { "aarch64_preserves_za" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
-// CHECK: attributes #[[ZA_NEW]] = { mustprogress noinline nounwind "aarch64_new_za" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
-// CHECK: attributes #[[ZA_AGNOSTIC]] = { mustprogress noinline nounwind "aarch64_za_state_agnostic" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
-// CHECK: attributes #[[NORMAL_DEF]] = { mustprogress noinline nounwind "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
+// CHECK: attributes #[[ZA_NEW]] = { mustprogress noinline nounwind vscale_range(1,16) "aarch64_new_za" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
+// CHECK: attributes #[[ZA_AGNOSTIC]] = { mustprogress noinline nounwind vscale_range(1,16) "aarch64_za_state_agnostic" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
+// CHECK: attributes #[[NORMAL_DEF]] = { mustprogress noinline nounwind vscale_range(1,16) "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
// CHECK: attributes #[[SM_ENABLED_CALL]] = { "aarch64_pstate_sm_enabled" }
// CHECK: attributes #[[SM_COMPATIBLE_CALL]] = { "aarch64_pstate_sm_compatible" }
// CHECK: attributes #[[SM_BODY_CALL]] = { "aarch64_pstate_sm_body" }
diff --git a/clang/test/CodeGen/arm-sve-vector-bits-vscale-range.c b/clang/test/CodeGen/arm-sve-vector-bits-vscale-range.c
index bd424172a18650..66b67b036ef4ca 100644
--- a/clang/test/CodeGen/arm-sve-vector-bits-vscale-range.c
+++ b/clang/test/CodeGen/arm-sve-vector-bits-vscale-range.c
@@ -13,6 +13,7 @@
// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve2 -mvscale-min=1 -mvscale-max=0 -emit-llvm -o - %s | FileCheck %s --check-prefix=CHECK-UNBOUNDED
// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -mvscale-min=1 -mvscale-max=0 -emit-llvm -o - %s | FileCheck %s --check-prefix=CHECK-UNBOUNDED
// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -emit-llvm -o - %s | FileCheck %s --check-prefix=CHECK-NONE
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sme -emit-llvm -o - %s | FileCheck %s --check-prefix=CHECK-NONE
// CHECK-LABEL: @func() #0
// CHECK: attributes #0 = { {{.*}} vscale_range([[#VBITS]],[[#VBITS]]) {{.*}} }
|
clang/lib/Basic/Targets/AArch64.cpp
Outdated
@@ -708,7 +708,7 @@ AArch64TargetInfo::getVScaleRange(const LangOptions &LangOpts) const { | |||
return std::pair<unsigned, unsigned>( | |||
LangOpts.VScaleMin ? LangOpts.VScaleMin : 1, LangOpts.VScaleMax); | |||
|
|||
if (hasFeature("sve")) | |||
if (hasFeature("sve") || hasFeature("sme")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this situation isn't it the case that SVE will only be available in streaming mode? I guess your reasoning here is that for normal functions (not streaming or streaming-compatible) the vscale_range attribute should be harmless, i.e. that it does not imply scalable vector support?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. It should be valid to always have the attribute and it will only be used with scalable vectors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree for SME-only platforms it seems fine. But what's the meaning of vscale_range when you have a CPU that has both? At that point do we need to disambiguate the different potential vscale_range's for VL & SVL?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the default range should still be the same - somewhere between 1 and 16. SME doesn't increase that even if they VL and SVL are different.
(Currently if we have +sve+sme we will set the vscale_range to (1,16) already, so hopefully that part is already OK. In either case this adds the simple case, it doesn't alter the more complicated part :) )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the function only has +sme
, then vscale_range
should only apply to (locally) streaming functions, because non-streaming or streaming-compatible functions cannot use scalable vectors.
I guess the the thing to watch out for is that we don't propagate assumptions from using -msve-vector-bits
to streaming[-compatible] functions, because that information may be incorrect.
The SME attributes tell which mode the statements of a function must be executed in, so vscale_range
should apply to all LLVM IR operations in the body of that function. For targets that have both SVE and SME, this may no longer be true during CodeGen because the DAG introduces streaming-mode changes for calls, and in the prologue/epilogue of locally-streaming functions. If it assumes a certain vector width for the body, the DAG might end up with instructions that assume this width for operations executed in a different mode. I've not really thought this through, but it does need further investigation if we ever want to enable that. Anyway, that doesn't really apply to this patch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the function only has +sme, then vscale_range should only apply to (locally) streaming functions, because non-streaming or streaming-compatible functions cannot use scalable vectors.
vscale_range should be valid on any function, and will only be used if there are scalable vectors present. I've had a go at making it only apply to streaming functions, but it couldn't pass FunctionDecl to getVScaleRange as that would create a layering violation. I'm not sure I prefer this to the original that just applied it to all functions, unless you have a better suggestion?
If we have +sme but not +sve, we would not set vscale_range on functions. It should be valid to apply it with the same range with just +sme, which can help improve the performance of generated code.
7919c40
to
f0f1718
Compare
/cherry-pick 9f1c825 |
/pull-request #125386 |
If we have +sme but not +sve, we would not set vscale_range on functions. It should be valid to apply it with the same range with just +sme, which can help mitigate some performance regressions in cases such as scalable vector bitcasts (https://godbolt.org/z/exhe4jd8d). (cherry picked from commit 9f1c825)
If we have +sme but not +sve, we would not set vscale_range on functions. It should be valid to apply it with the same range with just +sme, which can help mitigate some performance regressions in cases such as scalable vector bitcasts (https://godbolt.org/z/exhe4jd8d).