[AArch64] Enable vscale_range with +sme #124466

davemgreen · 2025-01-26T13:50:44Z

If we have +sme but not +sve, we would not set vscale_range on functions. It should be valid to apply it with the same range with just +sme, which can help mitigate some performance regressions in cases such as scalable vector bitcasts (https://godbolt.org/z/exhe4jd8d).

llvmbot · 2025-01-26T13:51:17Z

@llvm/pr-subscribers-backend-risc-v
@llvm/pr-subscribers-clang-codegen

@llvm/pr-subscribers-backend-aarch64

Author: David Green (davemgreen)

Changes

If we have +sme but not +sve, we would not set vscale_range on functions. It should be valid to apply it with the same range with just +sme, which can help improve the performance of generated code.

Full diff: https://github.com/llvm/llvm-project/pull/124466.diff

3 Files Affected:

(modified) clang/lib/Basic/Targets/AArch64.cpp (+1-1)
(modified) clang/test/CodeGen/AArch64/sme-intrinsics/aarch64-sme-attrs.cpp (+8-8)
(modified) clang/test/CodeGen/arm-sve-vector-bits-vscale-range.c (+1)

diff --git a/clang/lib/Basic/Targets/AArch64.cpp b/clang/lib/Basic/Targets/AArch64.cpp
index 0b899137bbb5c7..e9afe0fb4ee456 100644
--- a/clang/lib/Basic/Targets/AArch64.cpp
+++ b/clang/lib/Basic/Targets/AArch64.cpp
@@ -708,7 +708,7 @@ AArch64TargetInfo::getVScaleRange(const LangOptions &LangOpts) const {
     return std::pair<unsigned, unsigned>(
         LangOpts.VScaleMin ? LangOpts.VScaleMin : 1, LangOpts.VScaleMax);
 
-  if (hasFeature("sve"))
+  if (hasFeature("sve") || hasFeature("sme"))
     return std::pair<unsigned, unsigned>(1, 16);
 
   return std::nullopt;
diff --git a/clang/test/CodeGen/AArch64/sme-intrinsics/aarch64-sme-attrs.cpp b/clang/test/CodeGen/AArch64/sme-intrinsics/aarch64-sme-attrs.cpp
index 54762c8b414124..cabd9d01e46652 100644
--- a/clang/test/CodeGen/AArch64/sme-intrinsics/aarch64-sme-attrs.cpp
+++ b/clang/test/CodeGen/AArch64/sme-intrinsics/aarch64-sme-attrs.cpp
@@ -300,19 +300,19 @@ int test_variadic_template() __arm_inout("za") {
               preserves_za_decl);
 }
 
-// CHECK: attributes #[[SM_ENABLED]] = { mustprogress noinline nounwind "aarch64_pstate_sm_enabled" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
+// CHECK: attributes #[[SM_ENABLED]] = { mustprogress noinline nounwind vscale_range(1,16) "aarch64_pstate_sm_enabled" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
 // CHECK: attributes #[[NORMAL_DECL]] = { "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
 // CHECK: attributes #[[SM_ENABLED_DECL]] = { "aarch64_pstate_sm_enabled" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
-// CHECK: attributes #[[SM_COMPATIBLE]] = { mustprogress noinline nounwind "aarch64_pstate_sm_compatible" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
+// CHECK: attributes #[[SM_COMPATIBLE]] = { mustprogress noinline nounwind vscale_range(1,16) "aarch64_pstate_sm_compatible" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
 // CHECK: attributes #[[SM_COMPATIBLE_DECL]] = { "aarch64_pstate_sm_compatible" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
-// CHECK: attributes #[[SM_BODY]] = { mustprogress noinline nounwind "aarch64_pstate_sm_body" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
-// CHECK: attributes #[[ZA_SHARED]] = { mustprogress noinline nounwind "aarch64_inout_za" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
+// CHECK: attributes #[[SM_BODY]] = { mustprogress noinline nounwind vscale_range(1,16) "aarch64_pstate_sm_body" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
+// CHECK: attributes #[[ZA_SHARED]] = { mustprogress noinline nounwind vscale_range(1,16) "aarch64_inout_za" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
 // CHECK: attributes #[[ZA_SHARED_DECL]] = { "aarch64_inout_za" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
-// CHECK: attributes #[[ZA_PRESERVED]] = { mustprogress noinline nounwind "aarch64_preserves_za" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
+// CHECK: attributes #[[ZA_PRESERVED]] = { mustprogress noinline nounwind vscale_range(1,16) "aarch64_preserves_za" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
 // CHECK: attributes #[[ZA_PRESERVED_DECL]] = { "aarch64_preserves_za" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
-// CHECK: attributes #[[ZA_NEW]] = { mustprogress noinline nounwind "aarch64_new_za" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
-// CHECK: attributes #[[ZA_AGNOSTIC]] = { mustprogress noinline nounwind "aarch64_za_state_agnostic" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
-// CHECK: attributes #[[NORMAL_DEF]] = { mustprogress noinline nounwind "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
+// CHECK: attributes #[[ZA_NEW]] = { mustprogress noinline nounwind vscale_range(1,16) "aarch64_new_za" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
+// CHECK: attributes #[[ZA_AGNOSTIC]] = { mustprogress noinline nounwind vscale_range(1,16) "aarch64_za_state_agnostic" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
+// CHECK: attributes #[[NORMAL_DEF]] = { mustprogress noinline nounwind vscale_range(1,16) "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
 // CHECK: attributes #[[SM_ENABLED_CALL]] = { "aarch64_pstate_sm_enabled" }
 // CHECK: attributes #[[SM_COMPATIBLE_CALL]] = { "aarch64_pstate_sm_compatible" }
 // CHECK: attributes #[[SM_BODY_CALL]] = { "aarch64_pstate_sm_body" }
diff --git a/clang/test/CodeGen/arm-sve-vector-bits-vscale-range.c b/clang/test/CodeGen/arm-sve-vector-bits-vscale-range.c
index bd424172a18650..66b67b036ef4ca 100644
--- a/clang/test/CodeGen/arm-sve-vector-bits-vscale-range.c
+++ b/clang/test/CodeGen/arm-sve-vector-bits-vscale-range.c
@@ -13,6 +13,7 @@
 // RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve2 -mvscale-min=1 -mvscale-max=0 -emit-llvm -o - %s | FileCheck %s --check-prefix=CHECK-UNBOUNDED
 // RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -mvscale-min=1 -mvscale-max=0 -emit-llvm -o - %s | FileCheck %s --check-prefix=CHECK-UNBOUNDED
 // RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -emit-llvm -o - %s | FileCheck %s --check-prefix=CHECK-NONE
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sme -emit-llvm -o - %s | FileCheck %s --check-prefix=CHECK-NONE
 
 // CHECK-LABEL: @func() #0
 // CHECK: attributes #0 = { {{.*}} vscale_range([[#VBITS]],[[#VBITS]]) {{.*}} }

llvmbot · 2025-01-26T13:51:18Z

@llvm/pr-subscribers-clang

Author: David Green (davemgreen)

Changes

If we have +sme but not +sve, we would not set vscale_range on functions. It should be valid to apply it with the same range with just +sme, which can help improve the performance of generated code.

Full diff: https://github.com/llvm/llvm-project/pull/124466.diff

3 Files Affected:

(modified) clang/lib/Basic/Targets/AArch64.cpp (+1-1)
(modified) clang/test/CodeGen/AArch64/sme-intrinsics/aarch64-sme-attrs.cpp (+8-8)
(modified) clang/test/CodeGen/arm-sve-vector-bits-vscale-range.c (+1)

diff --git a/clang/lib/Basic/Targets/AArch64.cpp b/clang/lib/Basic/Targets/AArch64.cpp
index 0b899137bbb5c7..e9afe0fb4ee456 100644
--- a/clang/lib/Basic/Targets/AArch64.cpp
+++ b/clang/lib/Basic/Targets/AArch64.cpp
@@ -708,7 +708,7 @@ AArch64TargetInfo::getVScaleRange(const LangOptions &LangOpts) const {
     return std::pair<unsigned, unsigned>(
         LangOpts.VScaleMin ? LangOpts.VScaleMin : 1, LangOpts.VScaleMax);
 
-  if (hasFeature("sve"))
+  if (hasFeature("sve") || hasFeature("sme"))
     return std::pair<unsigned, unsigned>(1, 16);
 
   return std::nullopt;
diff --git a/clang/test/CodeGen/AArch64/sme-intrinsics/aarch64-sme-attrs.cpp b/clang/test/CodeGen/AArch64/sme-intrinsics/aarch64-sme-attrs.cpp
index 54762c8b414124..cabd9d01e46652 100644
--- a/clang/test/CodeGen/AArch64/sme-intrinsics/aarch64-sme-attrs.cpp
+++ b/clang/test/CodeGen/AArch64/sme-intrinsics/aarch64-sme-attrs.cpp
@@ -300,19 +300,19 @@ int test_variadic_template() __arm_inout("za") {
               preserves_za_decl);
 }
 
-// CHECK: attributes #[[SM_ENABLED]] = { mustprogress noinline nounwind "aarch64_pstate_sm_enabled" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
+// CHECK: attributes #[[SM_ENABLED]] = { mustprogress noinline nounwind vscale_range(1,16) "aarch64_pstate_sm_enabled" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
 // CHECK: attributes #[[NORMAL_DECL]] = { "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
 // CHECK: attributes #[[SM_ENABLED_DECL]] = { "aarch64_pstate_sm_enabled" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
-// CHECK: attributes #[[SM_COMPATIBLE]] = { mustprogress noinline nounwind "aarch64_pstate_sm_compatible" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
+// CHECK: attributes #[[SM_COMPATIBLE]] = { mustprogress noinline nounwind vscale_range(1,16) "aarch64_pstate_sm_compatible" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
 // CHECK: attributes #[[SM_COMPATIBLE_DECL]] = { "aarch64_pstate_sm_compatible" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
-// CHECK: attributes #[[SM_BODY]] = { mustprogress noinline nounwind "aarch64_pstate_sm_body" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
-// CHECK: attributes #[[ZA_SHARED]] = { mustprogress noinline nounwind "aarch64_inout_za" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
+// CHECK: attributes #[[SM_BODY]] = { mustprogress noinline nounwind vscale_range(1,16) "aarch64_pstate_sm_body" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
+// CHECK: attributes #[[ZA_SHARED]] = { mustprogress noinline nounwind vscale_range(1,16) "aarch64_inout_za" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
 // CHECK: attributes #[[ZA_SHARED_DECL]] = { "aarch64_inout_za" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
-// CHECK: attributes #[[ZA_PRESERVED]] = { mustprogress noinline nounwind "aarch64_preserves_za" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
+// CHECK: attributes #[[ZA_PRESERVED]] = { mustprogress noinline nounwind vscale_range(1,16) "aarch64_preserves_za" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
 // CHECK: attributes #[[ZA_PRESERVED_DECL]] = { "aarch64_preserves_za" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
-// CHECK: attributes #[[ZA_NEW]] = { mustprogress noinline nounwind "aarch64_new_za" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
-// CHECK: attributes #[[ZA_AGNOSTIC]] = { mustprogress noinline nounwind "aarch64_za_state_agnostic" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
-// CHECK: attributes #[[NORMAL_DEF]] = { mustprogress noinline nounwind "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
+// CHECK: attributes #[[ZA_NEW]] = { mustprogress noinline nounwind vscale_range(1,16) "aarch64_new_za" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
+// CHECK: attributes #[[ZA_AGNOSTIC]] = { mustprogress noinline nounwind vscale_range(1,16) "aarch64_za_state_agnostic" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
+// CHECK: attributes #[[NORMAL_DEF]] = { mustprogress noinline nounwind vscale_range(1,16) "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+bf16,+sme" }
 // CHECK: attributes #[[SM_ENABLED_CALL]] = { "aarch64_pstate_sm_enabled" }
 // CHECK: attributes #[[SM_COMPATIBLE_CALL]] = { "aarch64_pstate_sm_compatible" }
 // CHECK: attributes #[[SM_BODY_CALL]] = { "aarch64_pstate_sm_body" }
diff --git a/clang/test/CodeGen/arm-sve-vector-bits-vscale-range.c b/clang/test/CodeGen/arm-sve-vector-bits-vscale-range.c
index bd424172a18650..66b67b036ef4ca 100644
--- a/clang/test/CodeGen/arm-sve-vector-bits-vscale-range.c
+++ b/clang/test/CodeGen/arm-sve-vector-bits-vscale-range.c
@@ -13,6 +13,7 @@
 // RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve2 -mvscale-min=1 -mvscale-max=0 -emit-llvm -o - %s | FileCheck %s --check-prefix=CHECK-UNBOUNDED
 // RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -mvscale-min=1 -mvscale-max=0 -emit-llvm -o - %s | FileCheck %s --check-prefix=CHECK-UNBOUNDED
 // RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -emit-llvm -o - %s | FileCheck %s --check-prefix=CHECK-NONE
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sme -emit-llvm -o - %s | FileCheck %s --check-prefix=CHECK-NONE
 
 // CHECK-LABEL: @func() #0
 // CHECK: attributes #0 = { {{.*}} vscale_range([[#VBITS]],[[#VBITS]]) {{.*}} }

david-arm · 2025-01-27T14:55:29Z

clang/lib/Basic/Targets/AArch64.cpp

@@ -708,7 +708,7 @@ AArch64TargetInfo::getVScaleRange(const LangOptions &LangOpts) const {
    return std::pair<unsigned, unsigned>(
        LangOpts.VScaleMin ? LangOpts.VScaleMin : 1, LangOpts.VScaleMax);

-  if (hasFeature("sve"))
+  if (hasFeature("sve") || hasFeature("sme"))


In this situation isn't it the case that SVE will only be available in streaming mode? I guess your reasoning here is that for normal functions (not streaming or streaming-compatible) the vscale_range attribute should be harmless, i.e. that it does not imply scalable vector support?

Yes. It should be valid to always have the attribute and it will only be used with scalable vectors.

I agree for SME-only platforms it seems fine. But what's the meaning of vscale_range when you have a CPU that has both? At that point do we need to disambiguate the different potential vscale_range's for VL & SVL?

I think the default range should still be the same - somewhere between 1 and 16. SME doesn't increase that even if they VL and SVL are different.

(Currently if we have +sve+sme we will set the vscale_range to (1,16) already, so hopefully that part is already OK. In either case this adds the simple case, it doesn't alter the more complicated part :) )

If the function only has +sme, then vscale_range should only apply to (locally) streaming functions, because non-streaming or streaming-compatible functions cannot use scalable vectors.

I guess the the thing to watch out for is that we don't propagate assumptions from using -msve-vector-bits to streaming[-compatible] functions, because that information may be incorrect.

The SME attributes tell which mode the statements of a function must be executed in, so vscale_range should apply to all LLVM IR operations in the body of that function. For targets that have both SVE and SME, this may no longer be true during CodeGen because the DAG introduces streaming-mode changes for calls, and in the prologue/epilogue of locally-streaming functions. If it assumes a certain vector width for the body, the DAG might end up with instructions that assume this width for operations executed in a different mode. I've not really thought this through, but it does need further investigation if we ever want to enable that. Anyway, that doesn't really apply to this patch.

If the function only has +sme, then vscale_range should only apply to (locally) streaming functions, because non-streaming or streaming-compatible functions cannot use scalable vectors.

vscale_range should be valid on any function, and will only be used if there are scalable vectors present. I've had a go at making it only apply to streaming functions, but it couldn't pass FunctionDecl to getVScaleRange as that would create a layering violation. I'm not sure I prefer this to the original that just applied it to all functions, unless you have a better suggestion?

If we have +sme but not +sve, we would not set vscale_range on functions. It should be valid to apply it with the same range with just +sme, which can help improve the performance of generated code.

davemgreen · 2025-02-02T07:13:50Z

/cherry-pick 9f1c825

llvmbot · 2025-02-02T07:19:19Z

/pull-request #125386

If we have +sme but not +sve, we would not set vscale_range on functions. It should be valid to apply it with the same range with just +sme, which can help mitigate some performance regressions in cases such as scalable vector bitcasts (https://godbolt.org/z/exhe4jd8d). (cherry picked from commit 9f1c825)

davemgreen requested review from aemerson, c-rhodes, sdesmalen-arm and david-arm January 26, 2025 13:50

llvmbot added clang Clang issues not falling into any other category backend:AArch64 clang:frontend Language frontend issues, e.g. anything involving "Sema" labels Jan 26, 2025

david-arm reviewed Jan 27, 2025

View reviewed changes

davemgreen added 2 commits January 29, 2025 11:29

[AArch64] Enable vscale_range with +sme

c43c262

If we have +sme but not +sve, we would not set vscale_range on functions. It should be valid to apply it with the same range with just +sme, which can help improve the performance of generated code.

Attempt to make it streaming-functions only.

f0f1718

davemgreen force-pushed the gh-sve-vscalerangesme branch from 7919c40 to f0f1718 Compare January 29, 2025 11:44

llvmbot added backend:RISC-V clang:codegen IR generation bugs: mangling, exceptions, etc. labels Jan 29, 2025

aemerson approved these changes Jan 30, 2025

View reviewed changes

davemgreen merged commit 9f1c825 into llvm:main Jan 31, 2025
10 checks passed

davemgreen deleted the gh-sve-vscalerangesme branch January 31, 2025 07:57

davemgreen added this to the LLVM 20.X Release milestone Feb 2, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[AArch64] Enable vscale_range with +sme #124466

[AArch64] Enable vscale_range with +sme #124466

Uh oh!

davemgreen commented Jan 26, 2025 •

edited

Loading

Uh oh!

llvmbot commented Jan 26, 2025 •

edited

Loading

Uh oh!

llvmbot commented Jan 26, 2025

Uh oh!

david-arm Jan 27, 2025

Uh oh!

davemgreen Jan 27, 2025

Uh oh!

aemerson Jan 28, 2025

Uh oh!

davemgreen Jan 28, 2025

Uh oh!

sdesmalen-arm Jan 28, 2025

Uh oh!

davemgreen Jan 29, 2025

Uh oh!

Uh oh!

davemgreen commented Feb 2, 2025

Uh oh!

llvmbot commented Feb 2, 2025

Uh oh!

Uh oh!

[AArch64] Enable vscale_range with +sme #124466

[AArch64] Enable vscale_range with +sme #124466

Uh oh!

Conversation

davemgreen commented Jan 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Jan 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Jan 26, 2025

Uh oh!

david-arm Jan 27, 2025

Choose a reason for hiding this comment

Uh oh!

davemgreen Jan 27, 2025

Choose a reason for hiding this comment

Uh oh!

aemerson Jan 28, 2025

Choose a reason for hiding this comment

Uh oh!

davemgreen Jan 28, 2025

Choose a reason for hiding this comment

Uh oh!

sdesmalen-arm Jan 28, 2025

Choose a reason for hiding this comment

Uh oh!

davemgreen Jan 29, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

davemgreen commented Feb 2, 2025

Uh oh!

llvmbot commented Feb 2, 2025

Uh oh!

Uh oh!

davemgreen commented Jan 26, 2025 •

edited

Loading

llvmbot commented Jan 26, 2025 •

edited

Loading