Skip to content

[AArch64] Enable AvoidLDAPUR for cpu=generic between armv8.4 and armv9.3. #125261

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Feb 7, 2025

Conversation

davemgreen
Copy link
Collaborator

@davemgreen davemgreen commented Jan 31, 2025

As added in #124274, CPUs in this range can suffer from performance issues with ldapur. As the gain from ldar->ldapr is expected to be greater than the minor gain from ldapr->ldapur, this opts to avoid the instruction under the default -mcpu=generic when the -march is less that armv8.8 / armv9.3.

I renamed AArch64Subtarget::Others to AArch64Subtarget::Generic to be clearer what it means.

@llvmbot
Copy link
Member

llvmbot commented Jan 31, 2025

@llvm/pr-subscribers-backend-aarch64

Author: David Green (davemgreen)

Changes

As added in #124274, CPUs in this range can suffer from performance issues with ldapur. As the gain from ldar->ldapr is expected to be greater than the minor gain from ldapr->ldapur, this opts to avoid the instruction under the default -mcpu=generic when the -march is less that armv9.3.

I renamed AArch64Subtarget::Others to AArch64Subtarget::Generic to be clearer what it means.


Full diff: https://github.com/llvm/llvm-project/pull/125261.diff

4 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64Subtarget.cpp (+6-1)
  • (modified) llvm/lib/Target/AArch64/AArch64Subtarget.h (+2-2)
  • (modified) llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp (+1-1)
  • (modified) llvm/test/CodeGen/AArch64/Atomics/aarch64-atomic-load-rcpc_immo.ll (+2)
diff --git a/llvm/lib/Target/AArch64/AArch64Subtarget.cpp b/llvm/lib/Target/AArch64/AArch64Subtarget.cpp
index bc921f07e1dbf8..89e3e64e984421 100644
--- a/llvm/lib/Target/AArch64/AArch64Subtarget.cpp
+++ b/llvm/lib/Target/AArch64/AArch64Subtarget.cpp
@@ -125,7 +125,12 @@ void AArch64Subtarget::initializeProperties(bool HasMinSize) {
   // this in the future so we can specify it together with the subtarget
   // features.
   switch (ARMProcFamily) {
-  case Others:
+  case Generic:
+    // Using TuneCPU=generic we avoid ldapur instructions to line up with the
+    // cpus that use the AvoidLDAPUR feature. We don't want this to be on
+    // forever, so it is enabled between armv9.0 and armv9.2.
+    if (hasV9_0aOps() && !hasV9_3aOps())
+      AvoidLDAPUR = true;
     break;
   case Carmel:
     CacheLineSize = 64;
diff --git a/llvm/lib/Target/AArch64/AArch64Subtarget.h b/llvm/lib/Target/AArch64/AArch64Subtarget.h
index d22991224d496d..dca5f5393fe47b 100644
--- a/llvm/lib/Target/AArch64/AArch64Subtarget.h
+++ b/llvm/lib/Target/AArch64/AArch64Subtarget.h
@@ -38,7 +38,7 @@ class Triple;
 class AArch64Subtarget final : public AArch64GenSubtargetInfo {
 public:
   enum ARMProcFamilyEnum : uint8_t {
-    Others,
+    Generic,
 #define ARM_PROCESSOR_FAMILY(ENUM) ENUM,
 #include "llvm/TargetParser/AArch64TargetParserDef.inc"
 #undef ARM_PROCESSOR_FAMILY
@@ -46,7 +46,7 @@ class AArch64Subtarget final : public AArch64GenSubtargetInfo {
 
 protected:
   /// ARMProcFamily - ARM processor family: Cortex-A53, Cortex-A57, and others.
-  ARMProcFamilyEnum ARMProcFamily = Others;
+  ARMProcFamilyEnum ARMProcFamily = Generic;
 
   // Enable 64-bit vectorization in SLP.
   unsigned MinVectorRegisterBitWidth = 64;
diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
index aae2fdaf5bec37..81f920e2c788b1 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
@@ -4272,7 +4272,7 @@ void AArch64TTIImpl::getUnrollingPreferences(Loop *L, ScalarEvolution &SE,
   // If mcpu is omitted, getProcFamily() returns AArch64Subtarget::Others, so by
   // checking for that case, we can ensure that the default behaviour is
   // unchanged
-  if (ST->getProcFamily() != AArch64Subtarget::Others &&
+  if (ST->getProcFamily() != AArch64Subtarget::Generic &&
       !ST->getSchedModel().isOutOfOrder()) {
     UP.Runtime = true;
     UP.Partial = true;
diff --git a/llvm/test/CodeGen/AArch64/Atomics/aarch64-atomic-load-rcpc_immo.ll b/llvm/test/CodeGen/AArch64/Atomics/aarch64-atomic-load-rcpc_immo.ll
index b475e68db411a4..0cdd08535884e9 100644
--- a/llvm/test/CodeGen/AArch64/Atomics/aarch64-atomic-load-rcpc_immo.ll
+++ b/llvm/test/CodeGen/AArch64/Atomics/aarch64-atomic-load-rcpc_immo.ll
@@ -7,6 +7,8 @@
 ; RUN: llc %s -o - -verify-machineinstrs -mtriple=aarch64 -mcpu=cortex-x3 -global-isel=false -O1 | FileCheck %s --check-prefixes=CHECK,SDAG,SDAG-AVOIDLDAPUR
 ; RUN: llc %s -o - -verify-machineinstrs -mtriple=aarch64 -mcpu=cortex-x4 -global-isel=false -O1 | FileCheck %s --check-prefixes=CHECK,SDAG,SDAG-AVOIDLDAPUR
 ; RUN: llc %s -o - -verify-machineinstrs -mtriple=aarch64 -mcpu=cortex-x925 -global-isel=false -O1 | FileCheck %s --check-prefixes=CHECK,SDAG,SDAG-AVOIDLDAPUR
+; RUN: llc %s -o - -verify-machineinstrs -mtriple=aarch64 -mattr=+v9a -global-isel=false -O1 | FileCheck %s --check-prefixes=CHECK,SDAG,SDAG-AVOIDLDAPUR
+; RUN: llc %s -o - -verify-machineinstrs -mtriple=aarch64 -mattr=+v9.3a -global-isel=false -O1 | FileCheck %s --check-prefixes=CHECK,SDAG,SDAG-NOAVOIDLDAPUR
 
 define i8 @load_atomic_i8_aligned_unordered(ptr %ptr) {
 ; CHECK-LABEL: load_atomic_i8_aligned_unordered:

@rj-jesus
Copy link
Contributor

rj-jesus commented Feb 3, 2025

Hi, thank you for this! I think it looks good overall, though I think we should extend this to Armv8-A too.

Maybe we could replace the condition hasV9_0aOps() && !hasV9_3aOps() with !hasV8_8aOps()? This should enable AvoidLDAPUR prior to v8.8a and (transitively) v9.3a. What do you think?

@davemgreen
Copy link
Collaborator Author

Hi, thank you for this! I think it looks good overall, though I think we should extend this to Armv8-A too.

Do you expect many people to use armv8.4-a when targeting grace? Or higher armv8 variants of the architecture?

@rj-jesus
Copy link
Contributor

rj-jesus commented Feb 4, 2025

Do you expect many people to use armv8.4-a when targeting grace? Or higher armv8 variants of the architecture?

I think that's a reasonable expectation, yes.

@davemgreen
Copy link
Collaborator Author

I talked to some people internally and they mentioned that some of the the graviton machines are armv8.4, and users could use that to compile for multiple cpu generations. I was hoping to limit the blast radius of this in the long run, but whatever we do will be a bit rubbish and adding 8.4 might be the least bad option.

…9.3.

As added in llvm#124274, CPUs in this range can suffer from performance issues with
ldapur. As the gain from ldar->ldapr is expected to be greater than the minor
gain from ldapr->ldapur, this opts to avoid the instruction under the default
-mcpu=generic when the -march is less that armv9.3.
@davemgreen davemgreen force-pushed the gh-a64-genericavoidldapur branch from c82fa16 to a67a8cb Compare February 5, 2025 19:56
Copy link
Contributor

@rj-jesus rj-jesus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for making these changes, it looks good now. As you say, I think there isn't a perfect solution here. This seems to be the best compromise.

@davemgreen davemgreen changed the title [AArch64] Enable AvoidLDAPUR for cpu=generic between armv9.0 and armv9.3. [AArch64] Enable AvoidLDAPUR for cpu=generic between armv8.4 and armv9.3. Feb 7, 2025
@davemgreen davemgreen merged commit 6424abc into llvm:main Feb 7, 2025
8 checks passed
@davemgreen davemgreen deleted the gh-a64-genericavoidldapur branch February 7, 2025 10:17
@davemgreen davemgreen added this to the LLVM 20.X Release milestone Feb 7, 2025
@davemgreen
Copy link
Collaborator Author

/cherry-pick 6424abc

@llvmbot
Copy link
Member

llvmbot commented Feb 7, 2025

/pull-request #126253

swift-ci pushed a commit to swiftlang/llvm-project that referenced this pull request Feb 8, 2025
…9.3. (llvm#125261)

As added in llvm#124274, CPUs in this range can suffer from performance
issues with ldapur. As the gain from ldar->ldapr is expected to be
greater than the minor gain from ldapr->ldapur, this opts to avoid the
instruction under the default -mcpu=generic when the -march is less that
armv8.8 / armv9.3.

I renamed AArch64Subtarget::Others to AArch64Subtarget::Generic to be
clearer what it means.

(cherry picked from commit 6424abc)
Icohedron pushed a commit to Icohedron/llvm-project that referenced this pull request Feb 11, 2025
…9.3. (llvm#125261)

As added in llvm#124274, CPUs in this range can suffer from performance
issues with ldapur. As the gain from ldar->ldapr is expected to be
greater than the minor gain from ldapr->ldapur, this opts to avoid the
instruction under the default -mcpu=generic when the -march is less that
armv8.8 / armv9.3.

I renamed AArch64Subtarget::Others to AArch64Subtarget::Generic to be
clearer what it means.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development

Successfully merging this pull request may close these issues.

3 participants