[AIX][TLS] Produce a faster local-exec access sequence for the "aix-small-tls" global variable attribute #83053
@llvm/pr-subscribers-backend-powerpc

Author: Amy Kwan (amy-kwan)

Changes

Similar to 3f46e54, this patch allows the backend to produce a faster access sequence for the local-exec TLS model, one that avoids loading from the TOC, for local-exec TLS variables annotated with the "aix-small-tls" attribute.

The expectation is that local-exec TLS variables will be given this attribute through PGO. Furthermore, the optimized access sequence is generated for a local-exec TLS variable annotated with "aix-small-tls" only if the variable is less than ~32KB in size.

Patch is 33.29 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/83053.diff

5 Files Affected:
diff --git a/llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp b/llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp
index 9e5f0b36616d1b..05f5d6ba7007a6 100644
--- a/llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp
+++ b/llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp
@@ -7573,6 +7573,22 @@ static void reduceVSXSwap(SDNode *N, SelectionDAG *DAG) {
   DAG->ReplaceAllUsesOfValueWith(SDValue(N, 0), N->getOperand(0));
 }

+// Check if an SDValue has the 'aix-small-tls' global variable attribute.
+static bool hasAIXSmallTLSAttr(SDValue Val) {
+  GlobalAddressSDNode *GA = dyn_cast<GlobalAddressSDNode>(Val);
+  if (!GA)
+    return false;
+
+  const GlobalVariable *GV = dyn_cast<GlobalVariable>(GA->getGlobal());
+  if (!GV)
+    return false;
+
+  if (!GV->hasAttribute("aix-small-tls"))
+    return false;
+
+  return true;
+}
+
 // Is an ADDI eligible for folding for non-TOC-based local-exec accesses?
 static bool isEligibleToFoldADDIForLocalExecAccesses(SelectionDAG *DAG,
                                                      SDValue ADDIToFold) {
@@ -7582,20 +7598,25 @@ static bool isEligibleToFoldADDIForLocalExecAccesses(SelectionDAG *DAG,
       (ADDIToFold.getMachineOpcode() != PPC::ADDI8))
     return false;

+  // Folding is only allowed for the AIX small-local-exec TLS target attribute
+  // or when the 'aix-small-tls' global variable attribute is present.
+  const PPCSubtarget &Subtarget =
+      DAG->getMachineFunction().getSubtarget<PPCSubtarget>();
+  SDValue TLSVarNode = ADDIToFold.getOperand(1);
+  if (!(Subtarget.hasAIXSmallLocalExecTLS() || hasAIXSmallTLSAttr(TLSVarNode)))
+    return false;
+
   // The first operand of the ADDIToFold should be the thread pointer.
   // This transformation is only performed if the first operand of the
   // addi is the thread pointer.
   SDValue TPRegNode = ADDIToFold.getOperand(0);
   RegisterSDNode *TPReg = dyn_cast<RegisterSDNode>(TPRegNode.getNode());
-  const PPCSubtarget &Subtarget =
-      DAG->getMachineFunction().getSubtarget<PPCSubtarget>();
   if (!TPReg || (TPReg->getReg() != Subtarget.getThreadPointerRegister()))
     return false;

   // The second operand of the ADDIToFold should be the global TLS address
   // (the local-exec TLS variable). We only perform the folding if the TLS
   // variable is the second operand.
-  SDValue TLSVarNode = ADDIToFold.getOperand(1);
   GlobalAddressSDNode *GA = dyn_cast<GlobalAddressSDNode>(TLSVarNode);
   if (!GA)
     return false;
@@ -7664,7 +7685,6 @@ static void foldADDIForLocalExecAccesses(SDNode *N, SelectionDAG *DAG) {

 void PPCDAGToDAGISel::PeepholePPC64() {
   SelectionDAG::allnodes_iterator Position = CurDAG->allnodes_end();
-  bool HasAIXSmallLocalExecTLS = Subtarget->hasAIXSmallLocalExecTLS();

   while (Position != CurDAG->allnodes_begin()) {
     SDNode *N = &*--Position;
@@ -7676,8 +7696,7 @@ void PPCDAGToDAGISel::PeepholePPC64() {
       reduceVSXSwap(N, CurDAG);

     // This optimization is performed for non-TOC-based local-exec accesses.
-    if (HasAIXSmallLocalExecTLS)
-      foldADDIForLocalExecAccesses(N, CurDAG);
+    foldADDIForLocalExecAccesses(N, CurDAG);

     unsigned FirstOp;
     unsigned StorageOpcode = N->getMachineOpcode();
@@ -7836,8 +7855,7 @@ void PPCDAGToDAGISel::PeepholePPC64() {
                                           ImmOpnd.getValueType());
     } else if (Offset != 0) {
       // This optimization is performed for non-TOC-based local-exec accesses.
-      if (HasAIXSmallLocalExecTLS &&
-          isEligibleToFoldADDIForLocalExecAccesses(CurDAG, Base)) {
+      if (isEligibleToFoldADDIForLocalExecAccesses(CurDAG, Base)) {
         // Add the non-zero offset information into the load or store
         // instruction to be used for non-TOC-based local-exec accesses.
         GlobalAddressSDNode *GA = dyn_cast<GlobalAddressSDNode>(ImmOpnd);
diff --git a/llvm/lib/Target/PowerPC/PPCISelLowering.cpp b/llvm/lib/Target/PowerPC/PPCISelLowering.cpp
index 51becf1d5b8584..128cfa79ff95e4 100644
--- a/llvm/lib/Target/PowerPC/PPCISelLowering.cpp
+++ b/llvm/lib/Target/PowerPC/PPCISelLowering.cpp
@@ -3365,6 +3365,7 @@ SDValue PPCTargetLowering::LowerGlobalTLSAddressAIX(SDValue Op,
   EVT PtrVT = getPointerTy(DAG.getDataLayout());
   bool Is64Bit = Subtarget.isPPC64();
   bool HasAIXSmallLocalExecTLS = Subtarget.hasAIXSmallLocalExecTLS();
+  bool HasAIXSmallTLSGlobalAttr = false;
   TLSModel::Model Model = getTargetMachine().getTLSModel(GV);
   bool IsTLSLocalExecModel = Model == TLSModel::LocalExec;

@@ -3373,6 +3374,11 @@ SDValue PPCTargetLowering::LowerGlobalTLSAddressAIX(SDValue Op,
         DAG.getTargetGlobalAddress(GV, dl, PtrVT, 0, PPCII::MO_TPREL_FLAG);
     SDValue VariableOffset = getTOCEntry(DAG, dl, VariableOffsetTGA);
     SDValue TLSReg;
+
+    if (const GlobalVariable *GVar = dyn_cast<GlobalVariable>(GV))
+      if (GVar->hasAttribute("aix-small-tls"))
+        HasAIXSmallTLSGlobalAttr = true;
+
     if (Is64Bit) {
       // For local-exec and initial-exec on AIX (64-bit), the sequence generated
       // involves a load of the variable offset (from the TOC), followed by an
@@ -3382,14 +3388,16 @@ SDValue PPCTargetLowering::LowerGlobalTLSAddressAIX(SDValue Op,
       // add reg2, reg1, r13 // r13 contains the thread pointer
       TLSReg = DAG.getRegister(PPC::X13, MVT::i64);

-      // With the -maix-small-local-exec-tls option, produce a faster access
-      // sequence for local-exec TLS variables where the offset from the TLS
-      // base is encoded as an immediate operand.
+      // With the -maix-small-local-exec-tls option, or with the "aix-small-tls"
+      // global variable attribute, produce a faster access sequence for
+      // local-exec TLS variables where the offset from the TLS base is encoded
+      // as an immediate operand.
       //
       // We only utilize the faster local-exec access sequence when the TLS
       // variable has a size within the policy limit. We treat types that are
       // not sized or are empty as being over the policy size limit.
-      if (HasAIXSmallLocalExecTLS && IsTLSLocalExecModel) {
+      if ((HasAIXSmallLocalExecTLS || HasAIXSmallTLSGlobalAttr) &&
+          IsTLSLocalExecModel) {
         Type *GVType = GV->getValueType();
         if (GVType->isSized() && !GVType->isEmptyTy() &&
             GV->getParent()->getDataLayout().getTypeAllocSize(GVType) <=
@@ -3407,8 +3415,9 @@ SDValue PPCTargetLowering::LowerGlobalTLSAddressAIX(SDValue Op,
       TLSReg = DAG.getNode(PPCISD::GET_TPOINTER, dl, PtrVT);

       // We do not implement the 32-bit version of the faster access sequence
-      // for local-exec that is controlled by -maix-small-local-exec-tls.
-      if (HasAIXSmallLocalExecTLS)
+      // for local-exec that is controlled by the -maix-small-local-exec-tls
+      // option, or the "aix-small-tls" global variable attribute.
+      if (HasAIXSmallLocalExecTLS || HasAIXSmallTLSGlobalAttr)
         report_fatal_error("The small-local-exec TLS access sequence is "
                            "currently only supported on AIX (64-bit mode).");
     }
diff --git a/llvm/test/CodeGen/PowerPC/aix-small-tls-globalvarattr-funcattr.ll b/llvm/test/CodeGen/PowerPC/aix-small-tls-globalvarattr-funcattr.ll
new file mode 100644
index 00000000000000..55e486876e3373
--- /dev/null
+++ b/llvm/test/CodeGen/PowerPC/aix-small-tls-globalvarattr-funcattr.ll
@@ -0,0 +1,197 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 3
+; RUN: llc -verify-machineinstrs -mcpu=pwr7 -ppc-asm-full-reg-names \
+; RUN: -mtriple powerpc64-ibm-aix-xcoff < %s \
+; RUN: | FileCheck %s --check-prefix=CHECK-SMALLCM64
+; RUN: llc -verify-machineinstrs -mcpu=pwr7 -ppc-asm-full-reg-names \
+; RUN: -mtriple powerpc64-ibm-aix-xcoff --code-model=large \
+; RUN: < %s | FileCheck %s --check-prefix=CHECK-LARGECM64
+
+@mySmallLocalExecTLS6 = external thread_local(localexec) global [60 x i64], align 8
+@mySmallLocalExecTLS2 = external thread_local(localexec) global [3000 x i64], align 8 #0
+@MyTLSGDVar = thread_local global [800 x i64] zeroinitializer, align 8
+@mySmallLocalExecTLS3 = internal thread_local(localexec) global [3000 x i64] zeroinitializer, align 8
+@mySmallLocalExecTLS4 = internal thread_local(localexec) global [3000 x i64] zeroinitializer, align 8 #0
+@mySmallLocalExecTLS5 = thread_local(localexec) global [3000 x i64] zeroinitializer, align 8 #0
+@mySmallLocalExecTLS = thread_local(localexec) local_unnamed_addr global [7800 x i64] zeroinitializer, align 8 #0
+declare nonnull ptr @llvm.threadlocal.address.p0(ptr nonnull)
+
+; All accesses use a "faster" local-exec sequence directly off the thread pointer.
+define i64 @StoreLargeAccess1() #1 {
+; CHECK-SMALLCM64-LABEL: StoreLargeAccess1:
+; CHECK-SMALLCM64: # %bb.0: # %entry
+; CHECK-SMALLCM64-NEXT: mflr r0
+; CHECK-SMALLCM64-NEXT: stdu r1, -48(r1)
+; CHECK-SMALLCM64-NEXT: li r3, 212
+; CHECK-SMALLCM64-NEXT: li r4, 203
+; CHECK-SMALLCM64-NEXT: std r0, 64(r1)
+; CHECK-SMALLCM64-NEXT: std r3, mySmallLocalExecTLS6[UL]@le+424(r13)
+; CHECK-SMALLCM64-NEXT: std r4, mySmallLocalExecTLS2[UL]@le+1200(r13)
+; CHECK-SMALLCM64-NEXT: ld r3, L..C0(r2) # target-flags(ppc-tlsgdm) @MyTLSGDVar
+; CHECK-SMALLCM64-NEXT: ld r4, L..C1(r2) # target-flags(ppc-tlsgd) @MyTLSGDVar
+; CHECK-SMALLCM64-NEXT: bla .__tls_get_addr[PR]
+; CHECK-SMALLCM64-NEXT: li r4, 44
+; CHECK-SMALLCM64-NEXT: std r4, 440(r3)
+; CHECK-SMALLCM64-NEXT: li r3, 6
+; CHECK-SMALLCM64-NEXT: li r4, 100
+; CHECK-SMALLCM64-NEXT: std r3, mySmallLocalExecTLS3[UL]@le+2000(r13)
+; CHECK-SMALLCM64-NEXT: li r3, 882
+; CHECK-SMALLCM64-NEXT: std r4, (mySmallLocalExecTLS4[UL]@le+6800)-65536(r13)
+; CHECK-SMALLCM64-NEXT: std r3, (mySmallLocalExecTLS5[TL]@le+8400)-65536(r13)
+; CHECK-SMALLCM64-NEXT: li r3, 1191
+; CHECK-SMALLCM64-NEXT: addi r1, r1, 48
+; CHECK-SMALLCM64-NEXT: ld r0, 16(r1)
+; CHECK-SMALLCM64-NEXT: mtlr r0
+; CHECK-SMALLCM64-NEXT: blr
+;
+; CHECK-LARGECM64-LABEL: StoreLargeAccess1:
+; CHECK-LARGECM64: # %bb.0: # %entry
+; CHECK-LARGECM64-NEXT: mflr r0
+; CHECK-LARGECM64-NEXT: stdu r1, -48(r1)
+; CHECK-LARGECM64-NEXT: li r3, 212
+; CHECK-LARGECM64-NEXT: std r0, 64(r1)
+; CHECK-LARGECM64-NEXT: addis r4, L..C0@u(r2)
+; CHECK-LARGECM64-NEXT: ld r4, L..C0@l(r4)
+; CHECK-LARGECM64-NEXT: std r3, mySmallLocalExecTLS6[UL]@le+424(r13)
+; CHECK-LARGECM64-NEXT: li r3, 203
+; CHECK-LARGECM64-NEXT: std r3, mySmallLocalExecTLS2[UL]@le+1200(r13)
+; CHECK-LARGECM64-NEXT: addis r3, L..C1@u(r2)
+; CHECK-LARGECM64-NEXT: ld r3, L..C1@l(r3)
+; CHECK-LARGECM64-NEXT: bla .__tls_get_addr[PR]
+; CHECK-LARGECM64-NEXT: li r4, 44
+; CHECK-LARGECM64-NEXT: std r4, 440(r3)
+; CHECK-LARGECM64-NEXT: li r3, 6
+; CHECK-LARGECM64-NEXT: li r4, 100
+; CHECK-LARGECM64-NEXT: std r3, mySmallLocalExecTLS3[UL]@le+2000(r13)
+; CHECK-LARGECM64-NEXT: li r3, 882
+; CHECK-LARGECM64-NEXT: std r4, (mySmallLocalExecTLS4[UL]@le+6800)-65536(r13)
+; CHECK-LARGECM64-NEXT: std r3, (mySmallLocalExecTLS5[TL]@le+8400)-65536(r13)
+; CHECK-LARGECM64-NEXT: li r3, 1191
+; CHECK-LARGECM64-NEXT: addi r1, r1, 48
+; CHECK-LARGECM64-NEXT: ld r0, 16(r1)
+; CHECK-LARGECM64-NEXT: mtlr r0
+; CHECK-LARGECM64-NEXT: blr
+entry:
+ %0 = tail call align 8 ptr @llvm.threadlocal.address.p0(ptr align 8 @mySmallLocalExecTLS6)
+ %arrayidx = getelementptr inbounds [60 x i64], ptr %0, i64 0, i64 53
+ store i64 212, ptr %arrayidx, align 8
+ %1 = tail call align 8 ptr @llvm.threadlocal.address.p0(ptr align 8 @mySmallLocalExecTLS2)
+ %arrayidx1 = getelementptr inbounds [3000 x i64], ptr %1, i64 0, i64 150
+ store i64 203, ptr %arrayidx1, align 8
+ %2 = tail call align 8 ptr @llvm.threadlocal.address.p0(ptr align 8 @MyTLSGDVar)
+ %arrayidx2 = getelementptr inbounds [800 x i64], ptr %2, i64 0, i64 55
+ store i64 44, ptr %arrayidx2, align 8
+ %3 = tail call align 8 ptr @llvm.threadlocal.address.p0(ptr align 8 @mySmallLocalExecTLS3)
+ %arrayidx3 = getelementptr inbounds [3000 x i64], ptr %3, i64 0, i64 250
+ store i64 6, ptr %arrayidx3, align 8
+ %4 = tail call align 8 ptr @llvm.threadlocal.address.p0(ptr align 8 @mySmallLocalExecTLS4)
+ %arrayidx4 = getelementptr inbounds [3000 x i64], ptr %4, i64 0, i64 850
+ store i64 100, ptr %arrayidx4, align 8
+ %5 = tail call align 8 ptr @llvm.threadlocal.address.p0(ptr align 8 @mySmallLocalExecTLS5)
+ %arrayidx5 = getelementptr inbounds [3000 x i64], ptr %5, i64 0, i64 1050
+ store i64 882, ptr %arrayidx5, align 8
+ %6 = load i64, ptr %arrayidx1, align 8
+ %7 = load i64, ptr %arrayidx3, align 8
+ %8 = load i64, ptr %arrayidx4, align 8
+ %add = add i64 %6, 882
+ %add9 = add i64 %add, %7
+ %add11 = add i64 %add9, %8
+ ret i64 %add11
+}
+
+; Since this function does not have the 'aix-small-local-exec-tls` attribute,
+; only some local-exec variables should have the small-local-exec TLS access
+; sequence (as opposed to all of them).
+define i64 @StoreLargeAccess2() {
+; CHECK-SMALLCM64-LABEL: StoreLargeAccess2:
+; CHECK-SMALLCM64: # %bb.0: # %entry
+; CHECK-SMALLCM64-NEXT: mflr r0
+; CHECK-SMALLCM64-NEXT: stdu r1, -48(r1)
+; CHECK-SMALLCM64-NEXT: ld r3, L..C2(r2) # target-flags(ppc-tprel) @mySmallLocalExecTLS6
+; CHECK-SMALLCM64-NEXT: li r4, 212
+; CHECK-SMALLCM64-NEXT: std r0, 64(r1)
+; CHECK-SMALLCM64-NEXT: add r3, r13, r3
+; CHECK-SMALLCM64-NEXT: std r4, 424(r3)
+; CHECK-SMALLCM64-NEXT: ld r4, L..C1(r2) # target-flags(ppc-tlsgd) @MyTLSGDVar
+; CHECK-SMALLCM64-NEXT: li r3, 203
+; CHECK-SMALLCM64-NEXT: std r3, mySmallLocalExecTLS2[UL]@le+1200(r13)
+; CHECK-SMALLCM64-NEXT: ld r3, L..C0(r2) # target-flags(ppc-tlsgdm) @MyTLSGDVar
+; CHECK-SMALLCM64-NEXT: bla .__tls_get_addr[PR]
+; CHECK-SMALLCM64-NEXT: li r4, 44
+; CHECK-SMALLCM64-NEXT: std r4, 440(r3)
+; CHECK-SMALLCM64-NEXT: ld r3, L..C3(r2) # target-flags(ppc-tprel) @mySmallLocalExecTLS3
+; CHECK-SMALLCM64-NEXT: li r4, 6
+; CHECK-SMALLCM64-NEXT: add r3, r13, r3
+; CHECK-SMALLCM64-NEXT: std r4, 2000(r3)
+; CHECK-SMALLCM64-NEXT: li r3, 100
+; CHECK-SMALLCM64-NEXT: li r4, 882
+; CHECK-SMALLCM64-NEXT: std r3, mySmallLocalExecTLS4[UL]@le+6800(r13)
+; CHECK-SMALLCM64-NEXT: std r4, mySmallLocalExecTLS5[TL]@le+8400(r13)
+; CHECK-SMALLCM64-NEXT: li r3, 1191
+; CHECK-SMALLCM64-NEXT: addi r1, r1, 48
+; CHECK-SMALLCM64-NEXT: ld r0, 16(r1)
+; CHECK-SMALLCM64-NEXT: mtlr r0
+; CHECK-SMALLCM64-NEXT: blr
+;
+; CHECK-LARGECM64-LABEL: StoreLargeAccess2:
+; CHECK-LARGECM64: # %bb.0: # %entry
+; CHECK-LARGECM64-NEXT: mflr r0
+; CHECK-LARGECM64-NEXT: stdu r1, -48(r1)
+; CHECK-LARGECM64-NEXT: addis r3, L..C2@u(r2)
+; CHECK-LARGECM64-NEXT: li r4, 212
+; CHECK-LARGECM64-NEXT: std r0, 64(r1)
+; CHECK-LARGECM64-NEXT: ld r3, L..C2@l(r3)
+; CHECK-LARGECM64-NEXT: add r3, r13, r3
+; CHECK-LARGECM64-NEXT: std r4, 424(r3)
+; CHECK-LARGECM64-NEXT: li r3, 203
+; CHECK-LARGECM64-NEXT: addis r4, L..C0@u(r2)
+; CHECK-LARGECM64-NEXT: ld r4, L..C0@l(r4)
+; CHECK-LARGECM64-NEXT: std r3, mySmallLocalExecTLS2[UL]@le+1200(r13)
+; CHECK-LARGECM64-NEXT: addis r3, L..C1@u(r2)
+; CHECK-LARGECM64-NEXT: ld r3, L..C1@l(r3)
+; CHECK-LARGECM64-NEXT: bla .__tls_get_addr[PR]
+; CHECK-LARGECM64-NEXT: li r4, 44
+; CHECK-LARGECM64-NEXT: std r4, 440(r3)
+; CHECK-LARGECM64-NEXT: addis r3, L..C3@u(r2)
+; CHECK-LARGECM64-NEXT: li r4, 6
+; CHECK-LARGECM64-NEXT: ld r3, L..C3@l(r3)
+; CHECK-LARGECM64-NEXT: add r3, r13, r3
+; CHECK-LARGECM64-NEXT: std r4, 2000(r3)
+; CHECK-LARGECM64-NEXT: li r3, 100
+; CHECK-LARGECM64-NEXT: li r4, 882
+; CHECK-LARGECM64-NEXT: std r3, mySmallLocalExecTLS4[UL]@le+6800(r13)
+; CHECK-LARGECM64-NEXT: std r4, mySmallLocalExecTLS5[TL]@le+8400(r13)
+; CHECK-LARGECM64-NEXT: li r3, 1191
+; CHECK-LARGECM64-NEXT: addi r1, r1, 48
+; CHECK-LARGECM64-NEXT: ld r0, 16(r1)
+; CHECK-LARGECM64-NEXT: mtlr r0
+; CHECK-LARGECM64-NEXT: blr
+entry:
+ %0 = tail call align 8 ptr @llvm.threadlocal.address.p0(ptr align 8 @mySmallLocalExecTLS6)
+ %arrayidx = getelementptr inbounds [60 x i64], ptr %0, i64 0, i64 53
+ store i64 212, ptr %arrayidx, align 8
+ %1 = tail call align 8 ptr @llvm.threadlocal.address.p0(ptr align 8 @mySmallLocalExecTLS2)
+ %arrayidx1 = getelementptr inbounds [3000 x i64], ptr %1, i64 0, i64 150
+ store i64 203, ptr %arrayidx1, align 8
+ %2 = tail call align 8 ptr @llvm.threadlocal.address.p0(ptr align 8 @MyTLSGDVar)
+ %arrayidx2 = getelementptr inbounds [800 x i64], ptr %2, i64 0, i64 55
+ store i64 44, ptr %arrayidx2, align 8
+ %3 = tail call align 8 ptr @llvm.threadlocal.address.p0(ptr align 8 @mySmallLocalExecTLS3)
+ %arrayidx3 = getelementptr inbounds [3000 x i64], ptr %3, i64 0, i64 250
+ store i64 6, ptr %arrayidx3, align 8
+ %4 = tail call align 8 ptr @llvm.threadlocal.address.p0(ptr align 8 @mySmallLocalExecTLS4)
+ %arrayidx4 = getelementptr inbounds [3000 x i64], ptr %4, i64 0, i64 850
+ store i64 100, ptr %arrayidx4, align 8
+ %5 = tail call align 8 ptr @llvm.threadlocal.address.p0(ptr align 8 @mySmallLocalExecTLS5)
+ %arrayidx5 = getelementptr inbounds [3000 x i64], ptr %5, i64 0, i64 1050
+ store i64 882, ptr %arrayidx5, align 8
+ %6 = load i64, ptr %arrayidx1, align 8
+ %7 = load i64, ptr %arrayidx3, align 8
+ %8 = load i64, ptr %arrayidx4, align 8
+ %add = add i64 %6, 882
+ %add9 = add i64 %add, %7
+ %add11 = add i64 %add9, %8
+ ret i64 %add11
+}
+
+attributes #0 = { "aix-small-tls" }
+attributes #1 = { "target-features"="+aix-small-local-exec-tls" }
diff --git a/llvm/test/CodeGen/PowerPC/aix-small-tls-globalvarattr-loadaddr.ll b/llvm/test/CodeGen/PowerPC/aix-small-tls-globalvarattr-loadaddr.ll
new file mode 100644
index 00000000000000..db4266958daff1
--- /dev/null
+++ b/llvm/test/CodeGen/PowerPC/aix-small-tls-globalvarattr-loadaddr.ll
@@ -0,0 +1,251 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 4
+; RUN: llc -verify-machineinstrs -mcpu=pwr7 -ppc-asm-full-reg-names \
+; RUN: -mtriple powerpc64-ibm-aix-xcoff < %s \
+; RUN: | FileCheck %s --check-prefix=SMALLCM64
+; RUN: llc -verify-machineinstrs -mcpu=pwr7 -ppc-asm-full-reg-names \
+; RUN: -mtriple powerpc64-ibm-aix-xcoff --code-model=large \
+; RUN: < %s | FileCheck %s --check-prefix=LARGECM64
+
+; Test that the 'aix-small-tls' global variable attribute generates the
+; optimized small-local-exec TLS sequence. Global variables without this
+; attribute should still generate a TOC-based local-exec access sequence.
+
+declare nonnull ptr @llvm.threadlocal.address.p0(ptr nonnull)
+
+@a = thread_local(localexec) global [87 x i8] zeroinitializer, align 1 #0
+@a_noattr = thread_local(localexec) global [87 x i8] zeroinitializer, align 1
+@b = thread_local(localexec) global [87 x i16] zeroinitializer, align 2 #0
+@b_noattr = thread_local(localexec) global [87 x i16] zeroinitializer, align 2
+@c = thread_local(localexec) global [87 x i32] zeroinitializer, align 4 #0
+@c_noattr = thread_local(localexec) global [87 x i32] zeroinitializer, align 4
+@d = thread_local(localexec) global [87 x i64] zeroinitializer, align 8 #0
+@d_noattr = thread_local(localexec) global [87 x i64] zeroinitializer, align 8 #0
+
+@e = thread_local(localexec) global [87 x double] zeroinitializer, align 8 #0
+@e_noattr = thread_local(localexec) global [87 x double] zeroinitializer, align 8
+@f = thread_local(localexec) global [87 x float] zeroinitializer, align 4 #0
+@f_noattr = thread_local(localexec) global [87 x float] zeroinitializer, align 4
+
+define nonnull ptr @AddrTest1() local_unnamed_addr {
+; SMALLCM64-LABEL: AddrTest1:
+; SMALLCM64: # %bb.0: # %entry
+; SMALLCM64-NEXT: addi r3, r13, a[TL]@le+1
+; SMALLCM64-NEXT: blr
+;
+; LARGECM64-LABEL: AddrTest1:
+; LARGECM64: # %bb.0: # %entry
+; LARGECM64-NEXT: addi r3, r13, a[TL]@le+1
+; LARGECM64-NEXT: blr
+entry:
+ %0 = tail call align 1 ptr @llvm.threadlocal.address.p0(ptr align 1 @a)
+ %arrayidx = getelementptr inbounds [87 x i8], ptr %0, i64 0, i64 1
+ ret ptr %arrayidx
+}
+
+define nonnull ptr @AddrTest1_NoAttr() local_unnamed_addr {
+; SMALLCM64-LABEL: AddrTest1_NoAttr:
+; SMALLCM64: # %bb.0: # %entry
+; SMALLCM64-NEXT: ld r3, L..C0(r2) # target-flags(ppc-tprel) @a_noattr
+; SMALLCM64-NEXT: add r3, r13, r3
+; SM...
[truncated]
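The lowering condition in the PPCISelLowering.cpp hunk can be read as a small decision function. The following is a minimal standalone sketch of that decision, not LLVM code: `TLSVarInfo`, `useSmallLocalExecSequence`, and `kAssumedPolicySizeLimit` are hypothetical names, and the 32 KiB value is only a stand-in for the "~32KB" limit in the description, since the real constant is not visible in the truncated diff.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-in for the policy limit: the description says "~32KB",
// and the exact constant is not visible in the truncated diff.
constexpr std::uint64_t kAssumedPolicySizeLimit = 32 * 1024;

// Simplified model of one TLS variable on the local-exec/initial-exec path.
struct TLSVarInfo {
  bool IsLocalExec;        // false models initial-exec
  bool HasSmallTLSAttr;    // "aix-small-tls" present on the variable
  bool IsSized;            // false models an unsized type
  std::uint64_t AllocSize; // allocation size in bytes; 0 models an empty type
};

// Returns true when the TOC-free sequence would be selected. Throws where the
// patch calls report_fatal_error for 32-bit mode.
inline bool useSmallLocalExecSequence(const TLSVarInfo &Var, bool Is64Bit,
                                      bool HasSmallLocalExecFeature) {
  const bool Requested = HasSmallLocalExecFeature || Var.HasSmallTLSAttr;
  if (!Is64Bit) {
    if (Requested)
      throw std::runtime_error("small-local-exec TLS is 64-bit only");
    return false;
  }
  if (!Requested || !Var.IsLocalExec)
    return false;
  // Unsized or empty types are treated as being over the limit.
  return Var.IsSized && Var.AllocSize != 0 &&
         Var.AllocSize <= kAssumedPolicySizeLimit;
}

// Usage mirroring array sizes from the first test file (8 bytes per i64).
inline void demoLoweringDecision() {
  const TLSVarInfo Attr3000{true, true, true, 3000 * 8}; // 24000 bytes
  const TLSVarInfo Attr7800{true, true, true, 7800 * 8}; // 62400 bytes
  const TLSVarInfo NoAttr60{true, false, true, 60 * 8};  // 480 bytes
  assert(useSmallLocalExecSequence(Attr3000, true, false));
  assert(!useSmallLocalExecSequence(Attr7800, true, false));
  assert(!useSmallLocalExecSequence(NoAttr60, true, false));
  assert(useSmallLocalExecSequence(NoAttr60, true, true));
}
```

Under these assumptions, a `[3000 x i64]` array carrying the attribute (24000 bytes) qualifies, while a `[7800 x i64]` array (62400 bytes) does not, whichever exact limit near 32KB applies. A variable without the attribute qualifies only when the function-level feature is on, which matches the difference between `StoreLargeAccess1` and `StoreLargeAccess2` in the test.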
Ping.
  if (!GV->hasAttribute("aix-small-tls"))
    return false;

  return true;
IMHO you can collapse the lines without losing readability:
-  if (!GV->hasAttribute("aix-small-tls"))
-    return false;
-  return true;
+  return GV->hasAttribute("aix-small-tls");
Good point! I'll make that change.
I would prefer to change the function to the following (but feel free to keep your code if you want):
static bool hasAIXSmallTLSAttr(SDValue Val) {
  if (GlobalAddressSDNode *GA = dyn_cast<GlobalAddressSDNode>(Val))
    if (const GlobalVariable *GV = dyn_cast<GlobalVariable>(GA->getGlobal()))
      if (GV->hasAttribute("aix-small-tls"))
        return true;

  return false;
}
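The shapes of `hasAIXSmallTLSAttr` discussed in this thread (early returns versus nested `if` statements) should behave identically. The sketch below checks that on a minimal mock hierarchy. None of these classes are LLVM's: `Node`, `GlobalAddressNode`, `RegisterNode`, `GlobalFunction`, and the mock `GlobalVariable` are hypothetical, and `dynamic_cast` stands in for `dyn_cast`.

```cpp
#include <cassert>
#include <set>
#include <string>

// Mock value hierarchy (hypothetical, not LLVM's classes).
struct GlobalValue {
  virtual ~GlobalValue() = default;
};
struct GlobalVariable : GlobalValue {
  std::set<std::string> Attrs;
  bool hasAttribute(const std::string &Kind) const {
    return Attrs.count(Kind) != 0;
  }
};
struct GlobalFunction : GlobalValue {};

// Mock DAG nodes (hypothetical).
struct Node {
  virtual ~Node() = default;
};
struct GlobalAddressNode : Node {
  const GlobalValue *GV = nullptr;
  const GlobalValue *getGlobal() const { return GV; }
};
struct RegisterNode : Node {};

// Early-return shape, with the final check collapsed into one return.
inline bool hasAttrEarlyReturn(const Node *N) {
  const auto *GA = dynamic_cast<const GlobalAddressNode *>(N);
  if (!GA)
    return false;
  const auto *GV = dynamic_cast<const GlobalVariable *>(GA->getGlobal());
  if (!GV)
    return false;
  return GV->hasAttribute("aix-small-tls");
}

// Nested-if shape, as proposed in the review.
inline bool hasAttrNested(const Node *N) {
  if (const auto *GA = dynamic_cast<const GlobalAddressNode *>(N))
    if (const auto *GV = dynamic_cast<const GlobalVariable *>(GA->getGlobal()))
      if (GV->hasAttribute("aix-small-tls"))
        return true;
  return false;
}

// Both shapes agree on an annotated variable and on a non-address node.
inline void demoPredicateShapes() {
  GlobalVariable Annotated;
  Annotated.Attrs.insert("aix-small-tls");
  GlobalAddressNode Addr;
  Addr.GV = &Annotated;
  RegisterNode Reg;
  assert(hasAttrEarlyReturn(&Addr) && hasAttrNested(&Addr));
  assert(!hasAttrEarlyReturn(&Reg) && !hasAttrNested(&Reg));
}
```

Since the two shapes agree on every case in this model, the choice between them is a matter of style.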
Fully agree!
LGTM.
llvm/test/CodeGen/PowerPC/aix-small-tls-globalvarattr-funcattr.ll
Does it make sense to turn the function-level attribute HasAIXSmallLocalExecTLS into variable attributes, for example by calling addAttribute inside LowerGlobalTLSAddressAIX, so that the peepholes only need to check the variable attribute?
For any TLS LE variable accessed by more than one function, if one of those functions has HasAIXSmallLocalExecTLS, shouldn't that variable have the "aix-small-tls" attribute?
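To illustrate the idea raised in this comment (it is not what the patch implements), here is a minimal standalone sketch in which lowering stamps the variable attribute whenever the function-level feature is on, so that a later peephole consults only the variable. `AttrTable`, `lowerTLSAccess`, and `peepholeMayFold` are hypothetical names; the sketch ignores the size limit and does not address whether lowering may mutate a global in this way.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <unordered_set>

// Hypothetical model: variable name -> set of string attributes.
using AttrTable =
    std::unordered_map<std::string, std::unordered_set<std::string>>;

// Lowering step: when the enclosing function has the small-local-exec
// feature, record that fact on the variable itself.
inline void lowerTLSAccess(AttrTable &Vars, const std::string &Var,
                           bool FunctionHasFeature) {
  if (FunctionHasFeature)
    Vars[Var].insert("aix-small-tls");
}

// Peephole check: only the variable attribute is consulted.
inline bool peepholeMayFold(const AttrTable &Vars, const std::string &Var) {
  const auto It = Vars.find(Var);
  return It != Vars.end() && It->second.count("aix-small-tls") != 0;
}

// One variable accessed from two functions, only one of which has the feature.
inline void demoAttributePropagation() {
  AttrTable Vars;
  lowerTLSAccess(Vars, "tlsVar", /*FunctionHasFeature=*/false);
  assert(!peepholeMayFold(Vars, "tlsVar"));
  lowerTLSAccess(Vars, "tlsVar", /*FunctionHasFeature=*/true);
  assert(peepholeMayFold(Vars, "tlsVar"));
  // The attribute sticks for any function lowered afterwards.
  lowerTLSAccess(Vars, "tlsVar", /*FunctionHasFeature=*/false);
  assert(peepholeMayFold(Vars, "tlsVar"));
}
```

In this model the result for a function without the feature depends on whether it is lowered before or after a function with it, which is one way to read the corner case about variables shared by several functions.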
Thanks for taking a look at the patch, @orcguru! I apologize if I misunderstood your suggestion: are you suggesting that The target/function attribute was implemented to correspond to the front end clang option to turn on this optimized code gen ( @hubert-reinterpretcast Is my above understanding correct, and do you have any thoughts on this/the suggestion?
✅ With the latest revision this PR passed the Python code formatter.
✅ With the latest revision this PR passed the C/C++ code formatter.
Hi Amy, my understanding is that ISEL checks those attributes, and then the peephole checks them as well. ISEL executes just once, whereas the peephole may be invoked multiple times, since more peephole opportunities may surface after a previous peephole has made a change. From my point of view, this makes it more important to simplify the peephole's operations. Given that the function-level attribute and the variable attribute serve the same purpose, can we simplify that into one kind of flag check in the peephole logic? This is my first question.

Regarding my second question: since the function-level attribute and the variable attribute are not orthogonal (e.g. there are different ways for the two attributes to produce the same effect), I'm a little puzzled about some corner cases. Please ignore the question if it is not important.
I have no further comments, but please wait until Ting Wang is happy with the patch.
@@ -0,0 +1,221 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 4
; RUN: llc -verify-machineinstrs -mcpu=pwr7 -ppc-asm-full-reg-names \
; RUN: -mtriple powerpc64-ibm-aix-xcoff < %s \
nit: the default value of aix-small-local-exec-tls is false. In case we later change that default to true, please add -mattr=-aix-small-local-exec-tls here so that the test case will not need to be modified.
On second thought, I think those questions are outside the scope of the current patch. This patch looks good to me!
Thanks for discussing this with me offline. Appreciate it!
…mall-tls" global variable attribute Similar to 3f46e54, this patch allows the backend to produce a faster access sequence for the local-exec TLS model, where loading from the TOC can be avoided, for local-exec TLS variables that are annotated with the "aix-small-tls" attribute. The expectation is for local-exec TLS variables to be set with this attribute through PGO. Furthermore, the optimized access sequence is only generated for local-exec TLS variables annotated with "aix-small-tls", only if they are less than ~32KB in size.
…to check for attribute
…tls-globalvarattr-loadaddr.ll
e9609d6 to bff9697