[AIX][TLS] Produce a faster local-exec access sequence for the "aix-small-tls" global variable attribute #83053

Merged
merged 7 commits into from
Mar 28, 2024

amy-kwan
Contributor

Similar to 3f46e54, this patch allows the backend to produce a faster access sequence for the local-exec TLS model, where loading from the TOC can be avoided, for local-exec TLS variables that are annotated with the "aix-small-tls" attribute.

The expectation is that this attribute will be set on local-exec TLS variables through PGO. The optimized access sequence is generated only for local-exec TLS variables that are annotated with "aix-small-tls" and are smaller than ~32KB.
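
To make the conditions above concrete, here is a minimal standalone sketch of the eligibility decision. It does not use LLVM's API: `TLSVar`, `Model`, `Access`, `classifyAccess`, and the value of `kAssumedSizeLimit` are all hypothetical stand-ins (the description only says "~32KB", so the real policy limit may differ).

```cpp
#include <cstdint>

// Hypothetical stand-in for the backend's size policy limit. The description
// above only says "~32KB", so this exact value is an assumption.
constexpr std::uint64_t kAssumedSizeLimit = 32 * 1024;

// Only the two models that reach this lowering path are modelled here.
enum class Model { InitialExec, LocalExec };

enum class Access {
  TOCBased,      // load the variable offset from the TOC, then add r13
  FastLocalExec, // offset encoded as an immediate off the thread pointer
  FatalError     // small-local-exec requested in 32-bit mode
};

// Hypothetical summary of a TLS variable; not an LLVM type.
struct TLSVar {
  Model model;
  std::uint64_t sizeInBytes; // 0 stands for an unsized or empty type
  bool hasSmallTLSAttr;      // the "aix-small-tls" global variable attribute
};

constexpr Access classifyAccess(const TLSVar &var, bool is64Bit,
                                bool hasSmallLocalExecFeature) {
  // Either the function-level feature or the per-variable attribute
  // requests the faster sequence.
  const bool requested = hasSmallLocalExecFeature || var.hasSmallTLSAttr;
  // The 32-bit version of the faster sequence is not implemented.
  if (!is64Bit)
    return requested ? Access::FatalError : Access::TOCBased;
  // Unsized or empty types are treated as being over the limit.
  const bool withinLimit =
      var.sizeInBytes != 0 && var.sizeInBytes <= kAssumedSizeLimit;
  if (requested && var.model == Model::LocalExec && withinLimit)
    return Access::FastLocalExec;
  return Access::TOCBased;
}
```

Under these assumptions, a 24000-byte local-exec variable carrying the attribute is classified as `FastLocalExec`, while a 62400-byte one falls back to `TOCBased`.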

@llvmbot
Member

llvmbot commented Feb 26, 2024

@llvm/pr-subscribers-backend-powerpc

Author: Amy Kwan (amy-kwan)

Patch is 33.29 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/83053.diff

5 Files Affected:

  • (modified) llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp (+26-8)
  • (modified) llvm/lib/Target/PowerPC/PPCISelLowering.cpp (+15-6)
  • (added) llvm/test/CodeGen/PowerPC/aix-small-tls-globalvarattr-funcattr.ll (+197)
  • (added) llvm/test/CodeGen/PowerPC/aix-small-tls-globalvarattr-loadaddr.ll (+251)
  • (added) llvm/test/CodeGen/PowerPC/aix-small-tls-globalvarattr-targetattr.ll (+104)
diff --git a/llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp b/llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp
index 9e5f0b36616d1b..05f5d6ba7007a6 100644
--- a/llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp
+++ b/llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp
@@ -7573,6 +7573,22 @@ static void reduceVSXSwap(SDNode *N, SelectionDAG *DAG) {
   DAG->ReplaceAllUsesOfValueWith(SDValue(N, 0), N->getOperand(0));
 }
 
+// Check if an SDValue has the 'aix-small-tls' global variable attribute.
+static bool hasAIXSmallTLSAttr(SDValue Val) {
+  GlobalAddressSDNode *GA = dyn_cast<GlobalAddressSDNode>(Val);
+  if (!GA)
+    return false;
+
+  const GlobalVariable *GV = dyn_cast<GlobalVariable>(GA->getGlobal());
+  if (!GV)
+    return false;
+
+  if (!GV->hasAttribute("aix-small-tls"))
+    return false;
+
+  return true;
+}
+
 // Is an ADDI eligible for folding for non-TOC-based local-exec accesses?
 static bool isEligibleToFoldADDIForLocalExecAccesses(SelectionDAG *DAG,
                                                      SDValue ADDIToFold) {
@@ -7582,20 +7598,25 @@ static bool isEligibleToFoldADDIForLocalExecAccesses(SelectionDAG *DAG,
       (ADDIToFold.getMachineOpcode() != PPC::ADDI8))
     return false;
 
+  // Folding is only allowed for the AIX small-local-exec TLS target attribute
+  // or when the 'aix-small-tls' global variable attribute is present.
+  const PPCSubtarget &Subtarget =
+      DAG->getMachineFunction().getSubtarget<PPCSubtarget>();
+  SDValue TLSVarNode = ADDIToFold.getOperand(1);
+  if (!(Subtarget.hasAIXSmallLocalExecTLS() || hasAIXSmallTLSAttr(TLSVarNode)))
+    return false;
+
   // The first operand of the ADDIToFold should be the thread pointer.
   // This transformation is only performed if the first operand of the
   // addi is the thread pointer.
   SDValue TPRegNode = ADDIToFold.getOperand(0);
   RegisterSDNode *TPReg = dyn_cast<RegisterSDNode>(TPRegNode.getNode());
-  const PPCSubtarget &Subtarget =
-      DAG->getMachineFunction().getSubtarget<PPCSubtarget>();
   if (!TPReg || (TPReg->getReg() != Subtarget.getThreadPointerRegister()))
     return false;
 
   // The second operand of the ADDIToFold should be the global TLS address
   // (the local-exec TLS variable). We only perform the folding if the TLS
   // variable is the second operand.
-  SDValue TLSVarNode = ADDIToFold.getOperand(1);
   GlobalAddressSDNode *GA = dyn_cast<GlobalAddressSDNode>(TLSVarNode);
   if (!GA)
     return false;
@@ -7664,7 +7685,6 @@ static void foldADDIForLocalExecAccesses(SDNode *N, SelectionDAG *DAG) {
 
 void PPCDAGToDAGISel::PeepholePPC64() {
   SelectionDAG::allnodes_iterator Position = CurDAG->allnodes_end();
-  bool HasAIXSmallLocalExecTLS = Subtarget->hasAIXSmallLocalExecTLS();
 
   while (Position != CurDAG->allnodes_begin()) {
     SDNode *N = &*--Position;
@@ -7676,8 +7696,7 @@ void PPCDAGToDAGISel::PeepholePPC64() {
       reduceVSXSwap(N, CurDAG);
 
     // This optimization is performed for non-TOC-based local-exec accesses.
-    if (HasAIXSmallLocalExecTLS)
-      foldADDIForLocalExecAccesses(N, CurDAG);
+    foldADDIForLocalExecAccesses(N, CurDAG);
 
     unsigned FirstOp;
     unsigned StorageOpcode = N->getMachineOpcode();
@@ -7836,8 +7855,7 @@ void PPCDAGToDAGISel::PeepholePPC64() {
                                             ImmOpnd.getValueType());
       } else if (Offset != 0) {
         // This optimization is performed for non-TOC-based local-exec accesses.
-        if (HasAIXSmallLocalExecTLS &&
-            isEligibleToFoldADDIForLocalExecAccesses(CurDAG, Base)) {
+        if (isEligibleToFoldADDIForLocalExecAccesses(CurDAG, Base)) {
           // Add the non-zero offset information into the load or store
           // instruction to be used for non-TOC-based local-exec accesses.
           GlobalAddressSDNode *GA = dyn_cast<GlobalAddressSDNode>(ImmOpnd);
diff --git a/llvm/lib/Target/PowerPC/PPCISelLowering.cpp b/llvm/lib/Target/PowerPC/PPCISelLowering.cpp
index 51becf1d5b8584..128cfa79ff95e4 100644
--- a/llvm/lib/Target/PowerPC/PPCISelLowering.cpp
+++ b/llvm/lib/Target/PowerPC/PPCISelLowering.cpp
@@ -3365,6 +3365,7 @@ SDValue PPCTargetLowering::LowerGlobalTLSAddressAIX(SDValue Op,
   EVT PtrVT = getPointerTy(DAG.getDataLayout());
   bool Is64Bit = Subtarget.isPPC64();
   bool HasAIXSmallLocalExecTLS = Subtarget.hasAIXSmallLocalExecTLS();
+  bool HasAIXSmallTLSGlobalAttr = false;
   TLSModel::Model Model = getTargetMachine().getTLSModel(GV);
   bool IsTLSLocalExecModel = Model == TLSModel::LocalExec;
 
@@ -3373,6 +3374,11 @@ SDValue PPCTargetLowering::LowerGlobalTLSAddressAIX(SDValue Op,
         DAG.getTargetGlobalAddress(GV, dl, PtrVT, 0, PPCII::MO_TPREL_FLAG);
     SDValue VariableOffset = getTOCEntry(DAG, dl, VariableOffsetTGA);
     SDValue TLSReg;
+
+    if (const GlobalVariable *GVar = dyn_cast<GlobalVariable>(GV))
+      if (GVar->hasAttribute("aix-small-tls"))
+        HasAIXSmallTLSGlobalAttr = true;
+
     if (Is64Bit) {
       // For local-exec and initial-exec on AIX (64-bit), the sequence generated
       // involves a load of the variable offset (from the TOC), followed by an
@@ -3382,14 +3388,16 @@ SDValue PPCTargetLowering::LowerGlobalTLSAddressAIX(SDValue Op,
       //    add reg2, reg1, r13     // r13 contains the thread pointer
       TLSReg = DAG.getRegister(PPC::X13, MVT::i64);
 
-      // With the -maix-small-local-exec-tls option, produce a faster access
-      // sequence for local-exec TLS variables where the offset from the TLS
-      // base is encoded as an immediate operand.
+      // With the -maix-small-local-exec-tls option, or with the "aix-small-tls"
+      // global variable attribute, produce a faster access sequence for
+      // local-exec TLS variables where the offset from the TLS base is encoded
+      // as an immediate operand.
       //
       // We only utilize the faster local-exec access sequence when the TLS
       // variable has a size within the policy limit. We treat types that are
       // not sized or are empty as being over the policy size limit.
-      if (HasAIXSmallLocalExecTLS && IsTLSLocalExecModel) {
+      if ((HasAIXSmallLocalExecTLS || HasAIXSmallTLSGlobalAttr) &&
+          IsTLSLocalExecModel) {
         Type *GVType = GV->getValueType();
         if (GVType->isSized() && !GVType->isEmptyTy() &&
             GV->getParent()->getDataLayout().getTypeAllocSize(GVType) <=
@@ -3407,8 +3415,9 @@ SDValue PPCTargetLowering::LowerGlobalTLSAddressAIX(SDValue Op,
       TLSReg = DAG.getNode(PPCISD::GET_TPOINTER, dl, PtrVT);
 
       // We do not implement the 32-bit version of the faster access sequence
-      // for local-exec that is controlled by -maix-small-local-exec-tls.
-      if (HasAIXSmallLocalExecTLS)
+      // for local-exec that is controlled by the -maix-small-local-exec-tls
+      // option, or the "aix-small-tls" global variable attribute.
+      if (HasAIXSmallLocalExecTLS || HasAIXSmallTLSGlobalAttr)
         report_fatal_error("The small-local-exec TLS access sequence is "
                            "currently only supported on AIX (64-bit mode).");
     }
diff --git a/llvm/test/CodeGen/PowerPC/aix-small-tls-globalvarattr-funcattr.ll b/llvm/test/CodeGen/PowerPC/aix-small-tls-globalvarattr-funcattr.ll
new file mode 100644
index 00000000000000..55e486876e3373
--- /dev/null
+++ b/llvm/test/CodeGen/PowerPC/aix-small-tls-globalvarattr-funcattr.ll
@@ -0,0 +1,197 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 3
+; RUN: llc  -verify-machineinstrs -mcpu=pwr7 -ppc-asm-full-reg-names \
+; RUN:      -mtriple powerpc64-ibm-aix-xcoff < %s \
+; RUN:      | FileCheck %s --check-prefix=CHECK-SMALLCM64
+; RUN: llc  -verify-machineinstrs -mcpu=pwr7 -ppc-asm-full-reg-names \
+; RUN:      -mtriple powerpc64-ibm-aix-xcoff --code-model=large \
+; RUN:      < %s | FileCheck %s --check-prefix=CHECK-LARGECM64
+
+@mySmallLocalExecTLS6 = external thread_local(localexec) global [60 x i64], align 8
+@mySmallLocalExecTLS2 = external thread_local(localexec) global [3000 x i64], align 8 #0
+@MyTLSGDVar = thread_local global [800 x i64] zeroinitializer, align 8
+@mySmallLocalExecTLS3 = internal thread_local(localexec) global [3000 x i64] zeroinitializer, align 8
+@mySmallLocalExecTLS4 = internal thread_local(localexec) global [3000 x i64] zeroinitializer, align 8 #0
+@mySmallLocalExecTLS5 = thread_local(localexec) global [3000 x i64] zeroinitializer, align 8 #0
+@mySmallLocalExecTLS = thread_local(localexec) local_unnamed_addr global [7800 x i64] zeroinitializer, align 8 #0
+declare nonnull ptr @llvm.threadlocal.address.p0(ptr nonnull)
+
+; All accesses use a "faster" local-exec sequence directly off the thread pointer.
+define i64 @StoreLargeAccess1() #1 {
+; CHECK-SMALLCM64-LABEL: StoreLargeAccess1:
+; CHECK-SMALLCM64:       # %bb.0: # %entry
+; CHECK-SMALLCM64-NEXT:    mflr r0
+; CHECK-SMALLCM64-NEXT:    stdu r1, -48(r1)
+; CHECK-SMALLCM64-NEXT:    li r3, 212
+; CHECK-SMALLCM64-NEXT:    li r4, 203
+; CHECK-SMALLCM64-NEXT:    std r0, 64(r1)
+; CHECK-SMALLCM64-NEXT:    std r3, mySmallLocalExecTLS6[UL]@le+424(r13)
+; CHECK-SMALLCM64-NEXT:    std r4, mySmallLocalExecTLS2[UL]@le+1200(r13)
+; CHECK-SMALLCM64-NEXT:    ld r3, L..C0(r2) # target-flags(ppc-tlsgdm) @MyTLSGDVar
+; CHECK-SMALLCM64-NEXT:    ld r4, L..C1(r2) # target-flags(ppc-tlsgd) @MyTLSGDVar
+; CHECK-SMALLCM64-NEXT:    bla .__tls_get_addr[PR]
+; CHECK-SMALLCM64-NEXT:    li r4, 44
+; CHECK-SMALLCM64-NEXT:    std r4, 440(r3)
+; CHECK-SMALLCM64-NEXT:    li r3, 6
+; CHECK-SMALLCM64-NEXT:    li r4, 100
+; CHECK-SMALLCM64-NEXT:    std r3, mySmallLocalExecTLS3[UL]@le+2000(r13)
+; CHECK-SMALLCM64-NEXT:    li r3, 882
+; CHECK-SMALLCM64-NEXT:    std r4, (mySmallLocalExecTLS4[UL]@le+6800)-65536(r13)
+; CHECK-SMALLCM64-NEXT:    std r3, (mySmallLocalExecTLS5[TL]@le+8400)-65536(r13)
+; CHECK-SMALLCM64-NEXT:    li r3, 1191
+; CHECK-SMALLCM64-NEXT:    addi r1, r1, 48
+; CHECK-SMALLCM64-NEXT:    ld r0, 16(r1)
+; CHECK-SMALLCM64-NEXT:    mtlr r0
+; CHECK-SMALLCM64-NEXT:    blr
+;
+; CHECK-LARGECM64-LABEL: StoreLargeAccess1:
+; CHECK-LARGECM64:       # %bb.0: # %entry
+; CHECK-LARGECM64-NEXT:    mflr r0
+; CHECK-LARGECM64-NEXT:    stdu r1, -48(r1)
+; CHECK-LARGECM64-NEXT:    li r3, 212
+; CHECK-LARGECM64-NEXT:    std r0, 64(r1)
+; CHECK-LARGECM64-NEXT:    addis r4, L..C0@u(r2)
+; CHECK-LARGECM64-NEXT:    ld r4, L..C0@l(r4)
+; CHECK-LARGECM64-NEXT:    std r3, mySmallLocalExecTLS6[UL]@le+424(r13)
+; CHECK-LARGECM64-NEXT:    li r3, 203
+; CHECK-LARGECM64-NEXT:    std r3, mySmallLocalExecTLS2[UL]@le+1200(r13)
+; CHECK-LARGECM64-NEXT:    addis r3, L..C1@u(r2)
+; CHECK-LARGECM64-NEXT:    ld r3, L..C1@l(r3)
+; CHECK-LARGECM64-NEXT:    bla .__tls_get_addr[PR]
+; CHECK-LARGECM64-NEXT:    li r4, 44
+; CHECK-LARGECM64-NEXT:    std r4, 440(r3)
+; CHECK-LARGECM64-NEXT:    li r3, 6
+; CHECK-LARGECM64-NEXT:    li r4, 100
+; CHECK-LARGECM64-NEXT:    std r3, mySmallLocalExecTLS3[UL]@le+2000(r13)
+; CHECK-LARGECM64-NEXT:    li r3, 882
+; CHECK-LARGECM64-NEXT:    std r4, (mySmallLocalExecTLS4[UL]@le+6800)-65536(r13)
+; CHECK-LARGECM64-NEXT:    std r3, (mySmallLocalExecTLS5[TL]@le+8400)-65536(r13)
+; CHECK-LARGECM64-NEXT:    li r3, 1191
+; CHECK-LARGECM64-NEXT:    addi r1, r1, 48
+; CHECK-LARGECM64-NEXT:    ld r0, 16(r1)
+; CHECK-LARGECM64-NEXT:    mtlr r0
+; CHECK-LARGECM64-NEXT:    blr
+entry:
+  %0 = tail call align 8 ptr @llvm.threadlocal.address.p0(ptr align 8 @mySmallLocalExecTLS6)
+  %arrayidx = getelementptr inbounds [60 x i64], ptr %0, i64 0, i64 53
+  store i64 212, ptr %arrayidx, align 8
+  %1 = tail call align 8 ptr @llvm.threadlocal.address.p0(ptr align 8 @mySmallLocalExecTLS2)
+  %arrayidx1 = getelementptr inbounds [3000 x i64], ptr %1, i64 0, i64 150
+  store i64 203, ptr %arrayidx1, align 8
+  %2 = tail call align 8 ptr @llvm.threadlocal.address.p0(ptr align 8 @MyTLSGDVar)
+  %arrayidx2 = getelementptr inbounds [800 x i64], ptr %2, i64 0, i64 55
+  store i64 44, ptr %arrayidx2, align 8
+  %3 = tail call align 8 ptr @llvm.threadlocal.address.p0(ptr align 8 @mySmallLocalExecTLS3)
+  %arrayidx3 = getelementptr inbounds [3000 x i64], ptr %3, i64 0, i64 250
+  store i64 6, ptr %arrayidx3, align 8
+  %4 = tail call align 8 ptr @llvm.threadlocal.address.p0(ptr align 8 @mySmallLocalExecTLS4)
+  %arrayidx4 = getelementptr inbounds [3000 x i64], ptr %4, i64 0, i64 850
+  store i64 100, ptr %arrayidx4, align 8
+  %5 = tail call align 8 ptr @llvm.threadlocal.address.p0(ptr align 8 @mySmallLocalExecTLS5)
+  %arrayidx5 = getelementptr inbounds [3000 x i64], ptr %5, i64 0, i64 1050
+  store i64 882, ptr %arrayidx5, align 8
+  %6 = load i64, ptr %arrayidx1, align 8
+  %7 = load i64, ptr %arrayidx3, align 8
+  %8 = load i64, ptr %arrayidx4, align 8
+  %add = add i64 %6, 882
+  %add9 = add i64 %add, %7
+  %add11 = add i64 %add9, %8
+  ret i64 %add11
+}
+
+; Since this function does not have the 'aix-small-local-exec-tls` attribute,
+; only some local-exec variables should have the small-local-exec TLS access
+; sequence (as opposed to all of them).
+define i64 @StoreLargeAccess2() {
+; CHECK-SMALLCM64-LABEL: StoreLargeAccess2:
+; CHECK-SMALLCM64:       # %bb.0: # %entry
+; CHECK-SMALLCM64-NEXT:    mflr r0
+; CHECK-SMALLCM64-NEXT:    stdu r1, -48(r1)
+; CHECK-SMALLCM64-NEXT:    ld r3, L..C2(r2) # target-flags(ppc-tprel) @mySmallLocalExecTLS6
+; CHECK-SMALLCM64-NEXT:    li r4, 212
+; CHECK-SMALLCM64-NEXT:    std r0, 64(r1)
+; CHECK-SMALLCM64-NEXT:    add r3, r13, r3
+; CHECK-SMALLCM64-NEXT:    std r4, 424(r3)
+; CHECK-SMALLCM64-NEXT:    ld r4, L..C1(r2) # target-flags(ppc-tlsgd) @MyTLSGDVar
+; CHECK-SMALLCM64-NEXT:    li r3, 203
+; CHECK-SMALLCM64-NEXT:    std r3, mySmallLocalExecTLS2[UL]@le+1200(r13)
+; CHECK-SMALLCM64-NEXT:    ld r3, L..C0(r2) # target-flags(ppc-tlsgdm) @MyTLSGDVar
+; CHECK-SMALLCM64-NEXT:    bla .__tls_get_addr[PR]
+; CHECK-SMALLCM64-NEXT:    li r4, 44
+; CHECK-SMALLCM64-NEXT:    std r4, 440(r3)
+; CHECK-SMALLCM64-NEXT:    ld r3, L..C3(r2) # target-flags(ppc-tprel) @mySmallLocalExecTLS3
+; CHECK-SMALLCM64-NEXT:    li r4, 6
+; CHECK-SMALLCM64-NEXT:    add r3, r13, r3
+; CHECK-SMALLCM64-NEXT:    std r4, 2000(r3)
+; CHECK-SMALLCM64-NEXT:    li r3, 100
+; CHECK-SMALLCM64-NEXT:    li r4, 882
+; CHECK-SMALLCM64-NEXT:    std r3, mySmallLocalExecTLS4[UL]@le+6800(r13)
+; CHECK-SMALLCM64-NEXT:    std r4, mySmallLocalExecTLS5[TL]@le+8400(r13)
+; CHECK-SMALLCM64-NEXT:    li r3, 1191
+; CHECK-SMALLCM64-NEXT:    addi r1, r1, 48
+; CHECK-SMALLCM64-NEXT:    ld r0, 16(r1)
+; CHECK-SMALLCM64-NEXT:    mtlr r0
+; CHECK-SMALLCM64-NEXT:    blr
+;
+; CHECK-LARGECM64-LABEL: StoreLargeAccess2:
+; CHECK-LARGECM64:       # %bb.0: # %entry
+; CHECK-LARGECM64-NEXT:    mflr r0
+; CHECK-LARGECM64-NEXT:    stdu r1, -48(r1)
+; CHECK-LARGECM64-NEXT:    addis r3, L..C2@u(r2)
+; CHECK-LARGECM64-NEXT:    li r4, 212
+; CHECK-LARGECM64-NEXT:    std r0, 64(r1)
+; CHECK-LARGECM64-NEXT:    ld r3, L..C2@l(r3)
+; CHECK-LARGECM64-NEXT:    add r3, r13, r3
+; CHECK-LARGECM64-NEXT:    std r4, 424(r3)
+; CHECK-LARGECM64-NEXT:    li r3, 203
+; CHECK-LARGECM64-NEXT:    addis r4, L..C0@u(r2)
+; CHECK-LARGECM64-NEXT:    ld r4, L..C0@l(r4)
+; CHECK-LARGECM64-NEXT:    std r3, mySmallLocalExecTLS2[UL]@le+1200(r13)
+; CHECK-LARGECM64-NEXT:    addis r3, L..C1@u(r2)
+; CHECK-LARGECM64-NEXT:    ld r3, L..C1@l(r3)
+; CHECK-LARGECM64-NEXT:    bla .__tls_get_addr[PR]
+; CHECK-LARGECM64-NEXT:    li r4, 44
+; CHECK-LARGECM64-NEXT:    std r4, 440(r3)
+; CHECK-LARGECM64-NEXT:    addis r3, L..C3@u(r2)
+; CHECK-LARGECM64-NEXT:    li r4, 6
+; CHECK-LARGECM64-NEXT:    ld r3, L..C3@l(r3)
+; CHECK-LARGECM64-NEXT:    add r3, r13, r3
+; CHECK-LARGECM64-NEXT:    std r4, 2000(r3)
+; CHECK-LARGECM64-NEXT:    li r3, 100
+; CHECK-LARGECM64-NEXT:    li r4, 882
+; CHECK-LARGECM64-NEXT:    std r3, mySmallLocalExecTLS4[UL]@le+6800(r13)
+; CHECK-LARGECM64-NEXT:    std r4, mySmallLocalExecTLS5[TL]@le+8400(r13)
+; CHECK-LARGECM64-NEXT:    li r3, 1191
+; CHECK-LARGECM64-NEXT:    addi r1, r1, 48
+; CHECK-LARGECM64-NEXT:    ld r0, 16(r1)
+; CHECK-LARGECM64-NEXT:    mtlr r0
+; CHECK-LARGECM64-NEXT:    blr
+entry:
+  %0 = tail call align 8 ptr @llvm.threadlocal.address.p0(ptr align 8 @mySmallLocalExecTLS6)
+  %arrayidx = getelementptr inbounds [60 x i64], ptr %0, i64 0, i64 53
+  store i64 212, ptr %arrayidx, align 8
+  %1 = tail call align 8 ptr @llvm.threadlocal.address.p0(ptr align 8 @mySmallLocalExecTLS2)
+  %arrayidx1 = getelementptr inbounds [3000 x i64], ptr %1, i64 0, i64 150
+  store i64 203, ptr %arrayidx1, align 8
+  %2 = tail call align 8 ptr @llvm.threadlocal.address.p0(ptr align 8 @MyTLSGDVar)
+  %arrayidx2 = getelementptr inbounds [800 x i64], ptr %2, i64 0, i64 55
+  store i64 44, ptr %arrayidx2, align 8
+  %3 = tail call align 8 ptr @llvm.threadlocal.address.p0(ptr align 8 @mySmallLocalExecTLS3)
+  %arrayidx3 = getelementptr inbounds [3000 x i64], ptr %3, i64 0, i64 250
+  store i64 6, ptr %arrayidx3, align 8
+  %4 = tail call align 8 ptr @llvm.threadlocal.address.p0(ptr align 8 @mySmallLocalExecTLS4)
+  %arrayidx4 = getelementptr inbounds [3000 x i64], ptr %4, i64 0, i64 850
+  store i64 100, ptr %arrayidx4, align 8
+  %5 = tail call align 8 ptr @llvm.threadlocal.address.p0(ptr align 8 @mySmallLocalExecTLS5)
+  %arrayidx5 = getelementptr inbounds [3000 x i64], ptr %5, i64 0, i64 1050
+  store i64 882, ptr %arrayidx5, align 8
+  %6 = load i64, ptr %arrayidx1, align 8
+  %7 = load i64, ptr %arrayidx3, align 8
+  %8 = load i64, ptr %arrayidx4, align 8
+  %add = add i64 %6, 882
+  %add9 = add i64 %add, %7
+  %add11 = add i64 %add9, %8
+  ret i64 %add11
+}
+
+attributes #0 = { "aix-small-tls" }
+attributes #1 = { "target-features"="+aix-small-local-exec-tls" }
diff --git a/llvm/test/CodeGen/PowerPC/aix-small-tls-globalvarattr-loadaddr.ll b/llvm/test/CodeGen/PowerPC/aix-small-tls-globalvarattr-loadaddr.ll
new file mode 100644
index 00000000000000..db4266958daff1
--- /dev/null
+++ b/llvm/test/CodeGen/PowerPC/aix-small-tls-globalvarattr-loadaddr.ll
@@ -0,0 +1,251 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 4
+; RUN: llc  -verify-machineinstrs -mcpu=pwr7 -ppc-asm-full-reg-names \
+; RUN:      -mtriple powerpc64-ibm-aix-xcoff < %s \
+; RUN:      | FileCheck %s --check-prefix=SMALLCM64
+; RUN: llc  -verify-machineinstrs -mcpu=pwr7 -ppc-asm-full-reg-names \
+; RUN:      -mtriple powerpc64-ibm-aix-xcoff --code-model=large \
+; RUN:      < %s | FileCheck %s --check-prefix=LARGECM64
+
+; Test that the 'aix-small-tls' global variable attribute generates the
+; optimized small-local-exec TLS sequence. Global variables without this
+; attribute should still generate a TOC-based local-exec access sequence.
+
+declare nonnull ptr @llvm.threadlocal.address.p0(ptr nonnull)
+
+@a = thread_local(localexec) global [87 x i8] zeroinitializer, align 1 #0
+@a_noattr = thread_local(localexec) global [87 x i8] zeroinitializer, align 1
+@b = thread_local(localexec) global [87 x i16] zeroinitializer, align 2 #0
+@b_noattr = thread_local(localexec) global [87 x i16] zeroinitializer, align 2
+@c = thread_local(localexec) global [87 x i32] zeroinitializer, align 4 #0
+@c_noattr = thread_local(localexec) global [87 x i32] zeroinitializer, align 4
+@d = thread_local(localexec) global [87 x i64] zeroinitializer, align 8 #0
+@d_noattr = thread_local(localexec) global [87 x i64] zeroinitializer, align 8 #0
+
+@e = thread_local(localexec) global [87 x double] zeroinitializer, align 8 #0
+@e_noattr = thread_local(localexec) global [87 x double] zeroinitializer, align 8
+@f = thread_local(localexec) global [87 x float] zeroinitializer, align 4 #0
+@f_noattr = thread_local(localexec) global [87 x float] zeroinitializer, align 4
+
+define nonnull ptr @AddrTest1() local_unnamed_addr {
+; SMALLCM64-LABEL: AddrTest1:
+; SMALLCM64:       # %bb.0: # %entry
+; SMALLCM64-NEXT:    addi r3, r13, a[TL]@le+1
+; SMALLCM64-NEXT:    blr
+;
+; LARGECM64-LABEL: AddrTest1:
+; LARGECM64:       # %bb.0: # %entry
+; LARGECM64-NEXT:    addi r3, r13, a[TL]@le+1
+; LARGECM64-NEXT:    blr
+entry:
+  %0 = tail call align 1 ptr @llvm.threadlocal.address.p0(ptr align 1 @a)
+  %arrayidx = getelementptr inbounds [87 x i8], ptr %0, i64 0, i64 1
+  ret ptr %arrayidx
+}
+
+define nonnull ptr @AddrTest1_NoAttr() local_unnamed_addr {
+; SMALLCM64-LABEL: AddrTest1_NoAttr:
+; SMALLCM64:       # %bb.0: # %entry
+; SMALLCM64-NEXT:    ld r3, L..C0(r2) # target-flags(ppc-tprel) @a_noattr
+; SMALLCM64-NEXT:    add r3, r13, r3
+; SM...
[truncated]

@amy-kwan
Contributor Author

amy-kwan commented Mar 4, 2024

Ping.

1 similar comment

Comment on lines 7586 to 7589
  if (!GV->hasAttribute("aix-small-tls"))
    return false;

  return true;

Member

IMHO you can collapse the lines without losing readability:

Suggested change
-  if (!GV->hasAttribute("aix-small-tls"))
-    return false;
-  return true;
+  return GV->hasAttribute("aix-small-tls");

Contributor Author

Good point! I'll make that change.

Contributor

@diggerlin diggerlin Mar 19, 2024

I would prefer changing the function to the following (but feel free to keep your code if you want):

static bool hasAIXSmallTLSAttr(SDValue Val) {
  // Each cast must succeed before the attribute is queried.
  if (GlobalAddressSDNode *GA = dyn_cast<GlobalAddressSDNode>(Val))
    if (const GlobalVariable *GV = dyn_cast<GlobalVariable>(GA->getGlobal()))
      if (GV->hasAttribute("aix-small-tls"))
        return true;
  return false;
}

Member

Fully agree!

@amy-kwan amy-kwan requested review from diggerlin and redstar March 20, 2024 13:08
Member

@redstar redstar left a comment

LGTM.

@amy-kwan amy-kwan requested a review from diggerlin March 21, 2024 11:38
@orcguru orcguru left a comment

Does it make sense to turn the function-level attribute HasAIXSmallLocalExecTLS into variable attributes, for example by calling addAttribute inside LowerGlobalTLSAddressAIX, so that the peepholes only need to check the variable attribute?

If a TLS LE variable is accessed by more than one function and one of those functions has HasAIXSmallLocalExecTLS, shouldn't that variable also carry the "aix-small-tls" attribute?

@amy-kwan
Contributor Author

Thanks for taking a look at the patch, @orcguru! I apologize if I misunderstood your suggestion: are you suggesting that HasAIXSmallLocalExecTLS should be a variable attribute, similar to aix-small-tls?

The target/function attribute was implemented to correspond to the Clang front-end option (-maix-small-local-exec-tls), which turns on this optimized codegen for all variables. However, I believe aix-small-tls is more about selectively deciding which TLS LE variables get the optimized codegen. On the other hand, when HasAIXSmallLocalExecTLS is turned on, I remember discussing that the optimized code sequence gets generated in all cases anyway.

@hubert-reinterpretcast Is my above understanding correct, and do you have any thoughts on this/the suggestion?
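
The distinction drawn in this reply can be sketched with the variables from the aix-small-tls-globalvarattr-funcattr.ll test in this patch. This is a standalone model rather than backend code: `Var`, `usesFastSequence`, and `countFast` are hypothetical names, and the size-limit and TLS-model checks are left out because every variable listed is local-exec and small enough.

```cpp
// Hypothetical summary of a local-exec TLS variable; not an LLVM type.
struct Var {
  const char *name;
  bool hasSmallTLSAttr; // true when the variable carries "aix-small-tls" (#0)
};

// The five local-exec variables stored to by StoreLargeAccess1 and
// StoreLargeAccess2 in the test, with their attribute as written in the IR.
constexpr Var kVars[] = {
    {"mySmallLocalExecTLS6", false},
    {"mySmallLocalExecTLS2", true},
    {"mySmallLocalExecTLS3", false},
    {"mySmallLocalExecTLS4", true},
    {"mySmallLocalExecTLS5", true},
};
constexpr int kNumVars = sizeof(kVars) / sizeof(kVars[0]);

// The function-level feature applies to every variable the function touches;
// the attribute opts in a single variable regardless of the function.
constexpr bool usesFastSequence(bool functionHasFeature, const Var &v) {
  return functionHasFeature || v.hasSmallTLSAttr;
}

// Count how many of the variables get the faster sequence in a function.
constexpr int countFast(bool functionHasFeature) {
  int count = 0;
  for (int i = 0; i < kNumVars; ++i)
    if (usesFastSequence(functionHasFeature, kVars[i]))
      ++count;
  return count;
}
```

With the feature on (as in StoreLargeAccess1) all five variables take the faster sequence; without it (as in StoreLargeAccess2) only the three annotated ones do, which matches the CHECK lines where mySmallLocalExecTLS6 and mySmallLocalExecTLS3 still load their offset from the TOC.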

@amy-kwan amy-kwan requested review from diggerlin and orcguru March 22, 2024 21:00
✅ With the latest revision this PR passed the Python code formatter.

✅ With the latest revision this PR passed the C/C++ code formatter.

@orcguru

orcguru commented Mar 24, 2024

Hi Amy, my understanding is that ISel checks those attributes, and the peephole then checks them again. ISel executes only once, whereas the peephole may be invoked multiple times, since further peephole opportunities may surface after an earlier peephole has changed something. In my view, that makes it more important to keep the peephole's checks simple.

Given that the function-level attribute and the variable attribute serve the same purpose, can we simplify the peephole logic into a single kind of flag check? This is my first question.

Regarding my second question: since the function-level attribute and the variable attribute are not orthogonal (e.g. there are different ways for the two attributes to produce the same effect), I'm a little puzzled about some corner cases. Please ignore the question if it is not important.
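
The single-flag idea raised here can be sketched as follows. This is only an illustration of the suggestion, not what the patch does: `Var`, `lowerAccess`, `peepholeMayFold`, and `foldsInPlainFunction` are hypothetical, and the promotion step is an assumed design rather than an existing LLVM mechanism.

```cpp
// Hypothetical summary of a local-exec TLS variable; not an LLVM type.
struct Var {
  bool hasSmallTLSAttr; // models the "aix-small-tls" attribute
};

// Assumed design: while lowering an access, a function that has the
// function-level feature promotes it to a per-variable flag.
constexpr void lowerAccess(Var &v, bool functionHasFeature) {
  if (functionHasFeature)
    v.hasSmallTLSAttr = true;
}

// The peephole would then need a single kind of check.
constexpr bool peepholeMayFold(const Var &v) { return v.hasSmallTLSAttr; }

// One possible corner case under this assumed design (an illustration added
// here, not a claim made in the thread): a variable shared by a function with
// the feature and a plain function. Whether the plain function folds depends
// on which function was lowered first.
constexpr bool foldsInPlainFunction(bool featureFunctionLoweredFirst) {
  Var shared{false};
  if (featureFunctionLoweredFirst)
    lowerAccess(shared, /*functionHasFeature=*/true);
  lowerAccess(shared, /*functionHasFeature=*/false);
  const bool folded = peepholeMayFold(shared);
  if (!featureFunctionLoweredFirst)
    lowerAccess(shared, /*functionHasFeature=*/true);
  return folded;
}
```

In this model the peephole check is simpler, but the outcome for the plain function becomes order-dependent, which is one way the two attributes could interact when they are not orthogonal.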

Contributor

@diggerlin diggerlin left a comment

I have no further comments, but please wait until Ting Wang is happy with the patch.

@@ -0,0 +1,221 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 4
; RUN: llc -verify-machineinstrs -mcpu=pwr7 -ppc-asm-full-reg-names \
; RUN: -mtriple powerpc64-ibm-aix-xcoff < %s \
Contributor

nit: aix-small-local-exec-tls currently defaults to false. In case that default is ever changed to true, please add -mattr=-aix-small-local-exec-tls here so that the test case does not need to be modified.

@orcguru

orcguru commented Mar 27, 2024

On second thought, I think those questions are outside the scope of the current patch. This patch looks good to me!

@amy-kwan
Contributor Author

Thanks for discussing with me offline about this. Appreciate it!

@amy-kwan amy-kwan force-pushed the amy-kwan/aix-small-tls-attr branch from e9609d6 to bff9697 Compare March 27, 2024 20:54
@amy-kwan amy-kwan merged commit a3efc53 into llvm:main Mar 28, 2024