[AArch64] Enable AvoidLDAPUR for cpu=generic between armv8.4 and armv9.3. #125261

davemgreen · 2025-01-31T17:51:04Z

As added in #124274, CPUs in this range can suffer from performance issues with ldapur. As the gain from ldar->ldapr is expected to be greater than the minor gain from ldapr->ldapur, this opts to avoid the instruction under the default -mcpu=generic when the -march is less that armv8.8 / armv9.3.

I renamed AArch64Subtarget::Others to AArch64Subtarget::Generic to be clearer what it means.

llvmbot · 2025-01-31T17:51:38Z

@llvm/pr-subscribers-backend-aarch64

Author: David Green (davemgreen)

Changes

As added in #124274, CPUs in this range can suffer from performance issues with ldapur. As the gain from ldar->ldapr is expected to be greater than the minor gain from ldapr->ldapur, this opts to avoid the instruction under the default -mcpu=generic when the -march is less that armv9.3.

I renamed AArch64Subtarget::Others to AArch64Subtarget::Generic to be clearer what it means.

Full diff: https://github.com/llvm/llvm-project/pull/125261.diff

4 Files Affected:

(modified) llvm/lib/Target/AArch64/AArch64Subtarget.cpp (+6-1)
(modified) llvm/lib/Target/AArch64/AArch64Subtarget.h (+2-2)
(modified) llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp (+1-1)
(modified) llvm/test/CodeGen/AArch64/Atomics/aarch64-atomic-load-rcpc_immo.ll (+2)

diff --git a/llvm/lib/Target/AArch64/AArch64Subtarget.cpp b/llvm/lib/Target/AArch64/AArch64Subtarget.cpp
index bc921f07e1dbf8..89e3e64e984421 100644
--- a/llvm/lib/Target/AArch64/AArch64Subtarget.cpp
+++ b/llvm/lib/Target/AArch64/AArch64Subtarget.cpp
@@ -125,7 +125,12 @@ void AArch64Subtarget::initializeProperties(bool HasMinSize) {
   // this in the future so we can specify it together with the subtarget
   // features.
   switch (ARMProcFamily) {
-  case Others:
+  case Generic:
+    // Using TuneCPU=generic we avoid ldapur instructions to line up with the
+    // cpus that use the AvoidLDAPUR feature. We don't want this to be on
+    // forever, so it is enabled between armv9.0 and armv9.2.
+    if (hasV9_0aOps() && !hasV9_3aOps())
+      AvoidLDAPUR = true;
     break;
   case Carmel:
     CacheLineSize = 64;
diff --git a/llvm/lib/Target/AArch64/AArch64Subtarget.h b/llvm/lib/Target/AArch64/AArch64Subtarget.h
index d22991224d496d..dca5f5393fe47b 100644
--- a/llvm/lib/Target/AArch64/AArch64Subtarget.h
+++ b/llvm/lib/Target/AArch64/AArch64Subtarget.h
@@ -38,7 +38,7 @@ class Triple;
 class AArch64Subtarget final : public AArch64GenSubtargetInfo {
 public:
   enum ARMProcFamilyEnum : uint8_t {
-    Others,
+    Generic,
 #define ARM_PROCESSOR_FAMILY(ENUM) ENUM,
 #include "llvm/TargetParser/AArch64TargetParserDef.inc"
 #undef ARM_PROCESSOR_FAMILY
@@ -46,7 +46,7 @@ class AArch64Subtarget final : public AArch64GenSubtargetInfo {
 
 protected:
   /// ARMProcFamily - ARM processor family: Cortex-A53, Cortex-A57, and others.
-  ARMProcFamilyEnum ARMProcFamily = Others;
+  ARMProcFamilyEnum ARMProcFamily = Generic;
 
   // Enable 64-bit vectorization in SLP.
   unsigned MinVectorRegisterBitWidth = 64;
diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
index aae2fdaf5bec37..81f920e2c788b1 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
@@ -4272,7 +4272,7 @@ void AArch64TTIImpl::getUnrollingPreferences(Loop *L, ScalarEvolution &SE,
   // If mcpu is omitted, getProcFamily() returns AArch64Subtarget::Others, so by
   // checking for that case, we can ensure that the default behaviour is
   // unchanged
-  if (ST->getProcFamily() != AArch64Subtarget::Others &&
+  if (ST->getProcFamily() != AArch64Subtarget::Generic &&
       !ST->getSchedModel().isOutOfOrder()) {
     UP.Runtime = true;
     UP.Partial = true;
diff --git a/llvm/test/CodeGen/AArch64/Atomics/aarch64-atomic-load-rcpc_immo.ll b/llvm/test/CodeGen/AArch64/Atomics/aarch64-atomic-load-rcpc_immo.ll
index b475e68db411a4..0cdd08535884e9 100644
--- a/llvm/test/CodeGen/AArch64/Atomics/aarch64-atomic-load-rcpc_immo.ll
+++ b/llvm/test/CodeGen/AArch64/Atomics/aarch64-atomic-load-rcpc_immo.ll
@@ -7,6 +7,8 @@
 ; RUN: llc %s -o - -verify-machineinstrs -mtriple=aarch64 -mcpu=cortex-x3 -global-isel=false -O1 | FileCheck %s --check-prefixes=CHECK,SDAG,SDAG-AVOIDLDAPUR
 ; RUN: llc %s -o - -verify-machineinstrs -mtriple=aarch64 -mcpu=cortex-x4 -global-isel=false -O1 | FileCheck %s --check-prefixes=CHECK,SDAG,SDAG-AVOIDLDAPUR
 ; RUN: llc %s -o - -verify-machineinstrs -mtriple=aarch64 -mcpu=cortex-x925 -global-isel=false -O1 | FileCheck %s --check-prefixes=CHECK,SDAG,SDAG-AVOIDLDAPUR
+; RUN: llc %s -o - -verify-machineinstrs -mtriple=aarch64 -mattr=+v9a -global-isel=false -O1 | FileCheck %s --check-prefixes=CHECK,SDAG,SDAG-AVOIDLDAPUR
+; RUN: llc %s -o - -verify-machineinstrs -mtriple=aarch64 -mattr=+v9.3a -global-isel=false -O1 | FileCheck %s --check-prefixes=CHECK,SDAG,SDAG-NOAVOIDLDAPUR
 
 define i8 @load_atomic_i8_aligned_unordered(ptr %ptr) {
 ; CHECK-LABEL: load_atomic_i8_aligned_unordered:

rj-jesus · 2025-02-03T10:16:35Z

Hi, thank you for this! I think it looks good overall, though I think we should extend this to Armv8-A too.

Maybe we could replace the condition hasV9_0aOps() && !hasV9_3aOps() with !hasV8_8aOps()? This should enable AvoidLDAPUR prior to v8.8a and (transitively) v9.3a. What do you think?

davemgreen · 2025-02-04T07:31:08Z

Hi, thank you for this! I think it looks good overall, though I think we should extend this to Armv8-A too.

Do you expect many people to use armv8.4-a when targeting grace? Or higher armv8 variants of the architecture?

rj-jesus · 2025-02-04T08:19:34Z

Do you expect many people to use armv8.4-a when targeting grace? Or higher armv8 variants of the architecture?

I think that's a reasonable expectation, yes.

davemgreen · 2025-02-05T19:55:11Z

I talked to some people internally and they mentioned that some of the the graviton machines are armv8.4, and users could use that to compile for multiple cpu generations. I was hoping to limit the blast radius of this in the long run, but whatever we do will be a bit rubbish and adding 8.4 might be the least bad option.

…9.3. As added in llvm#124274, CPUs in this range can suffer from performance issues with ldapur. As the gain from ldar->ldapr is expected to be greater than the minor gain from ldapr->ldapur, this opts to avoid the instruction under the default -mcpu=generic when the -march is less that armv9.3.

rj-jesus

Thank you for making these changes, it looks good now. As you say, I think there isn't a perfect solution here. This seems to be the best compromise.

davemgreen · 2025-02-07T15:29:06Z

/cherry-pick 6424abc

llvmbot · 2025-02-07T15:34:41Z

/pull-request #126253

…9.3. (llvm#125261) As added in llvm#124274, CPUs in this range can suffer from performance issues with ldapur. As the gain from ldar->ldapr is expected to be greater than the minor gain from ldapr->ldapur, this opts to avoid the instruction under the default -mcpu=generic when the -march is less that armv8.8 / armv9.3. I renamed AArch64Subtarget::Others to AArch64Subtarget::Generic to be clearer what it means. (cherry picked from commit 6424abc)

…9.3. (llvm#125261) As added in llvm#124274, CPUs in this range can suffer from performance issues with ldapur. As the gain from ldar->ldapr is expected to be greater than the minor gain from ldapr->ldapur, this opts to avoid the instruction under the default -mcpu=generic when the -march is less that armv8.8 / armv9.3. I renamed AArch64Subtarget::Others to AArch64Subtarget::Generic to be clearer what it means.

davemgreen requested review from rj-jesus, sjoerdmeijer and david-arm January 31, 2025 17:51

llvmbot added the backend:AArch64 label Jan 31, 2025

davemgreen force-pushed the gh-a64-genericavoidldapur branch from c82fa16 to a67a8cb Compare February 5, 2025 19:56

davemgreen requested a review from efriedma-quic February 5, 2025 19:56

rj-jesus approved these changes Feb 6, 2025

View reviewed changes

davemgreen changed the title ~~[AArch64] Enable AvoidLDAPUR for cpu=generic between armv9.0 and armv9.3.~~ [AArch64] Enable AvoidLDAPUR for cpu=generic between armv8.4 and armv9.3. Feb 7, 2025

davemgreen merged commit 6424abc into llvm:main Feb 7, 2025
8 checks passed

davemgreen deleted the gh-a64-genericavoidldapur branch February 7, 2025 10:17

davemgreen added this to the LLVM 20.X Release milestone Feb 7, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[AArch64] Enable AvoidLDAPUR for cpu=generic between armv8.4 and armv9.3. #125261

[AArch64] Enable AvoidLDAPUR for cpu=generic between armv8.4 and armv9.3. #125261

Uh oh!

davemgreen commented Jan 31, 2025 •

edited

Loading

Uh oh!

llvmbot commented Jan 31, 2025

Uh oh!

rj-jesus commented Feb 3, 2025 •

edited

Loading

Uh oh!

davemgreen commented Feb 4, 2025

Uh oh!

rj-jesus commented Feb 4, 2025

Uh oh!

davemgreen commented Feb 5, 2025

Uh oh!

rj-jesus left a comment

Uh oh!

Uh oh!

davemgreen commented Feb 7, 2025

Uh oh!

llvmbot commented Feb 7, 2025

Uh oh!

Uh oh!

[AArch64] Enable AvoidLDAPUR for cpu=generic between armv8.4 and armv9.3. #125261

[AArch64] Enable AvoidLDAPUR for cpu=generic between armv8.4 and armv9.3. #125261

Uh oh!

Conversation

davemgreen commented Jan 31, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Jan 31, 2025

Uh oh!

rj-jesus commented Feb 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

davemgreen commented Feb 4, 2025

Uh oh!

rj-jesus commented Feb 4, 2025

Uh oh!

davemgreen commented Feb 5, 2025

Uh oh!

rj-jesus left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

davemgreen commented Feb 7, 2025

Uh oh!

llvmbot commented Feb 7, 2025

Uh oh!

Uh oh!

davemgreen commented Jan 31, 2025 •

edited

Loading

rj-jesus commented Feb 3, 2025 •

edited

Loading