[AArch64] Implement -fno-plt for SelectionDAG/GlobalISel #78890
Conversation
Created using spr 1.3.4
@llvm/pr-subscribers-backend-aarch64 @llvm/pr-subscribers-llvm-globalisel

Author: Fangrui Song (MaskRay)

Changes

Clang sets the nonlazybind attribute for certain ObjC features. The AArch64 SelectionDAG implementation for non-intrinsic calls (46e36f0) is behind a cl option.

GCC implements -fno-plt for a few ELF targets. In Clang, -fno-plt also sets the nonlazybind attribute. For SelectionDAG, make the cl option not affect ELF so that non-intrinsic calls to a dso_preemptable function use GOT. Adjust AArch64TargetLowering::LowerCall to handle intrinsic calls.

For FastISel, change fastLowerCall to bail out when a call is due to -fno-plt.

For GlobalISel, handle non-intrinsic calls in CallLowering::lowerCall and intrinsic calls in AArch64CallLowering::lowerCall (where the target-independent CallLowering::lowerCall is not called). The GlobalISel test in call-rv-marker.ll is therefore updated.

Note: the current -fno-plt -fpic implementation does not use GOT for a preemptable function.

Link: #78275

Full diff: https://github.com/llvm/llvm-project/pull/78890.diff

8 Files Affected:
diff --git a/llvm/lib/CodeGen/GlobalISel/CallLowering.cpp b/llvm/lib/CodeGen/GlobalISel/CallLowering.cpp
index ccd9b13d730b60..d3484e5229e704 100644
--- a/llvm/lib/CodeGen/GlobalISel/CallLowering.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/CallLowering.cpp
@@ -144,9 +144,16 @@ bool CallLowering::lowerCall(MachineIRBuilder &MIRBuilder, const CallBase &CB,
// Try looking through a bitcast from one function type to another.
// Commonly happens with calls to objc_msgSend().
const Value *CalleeV = CB.getCalledOperand()->stripPointerCasts();
- if (const Function *F = dyn_cast<Function>(CalleeV))
- Info.Callee = MachineOperand::CreateGA(F, 0);
- else if (isa<GlobalIFunc>(CalleeV) || isa<GlobalAlias>(CalleeV)) {
+ if (const Function *F = dyn_cast<Function>(CalleeV)) {
+ if (F->hasFnAttribute(Attribute::NonLazyBind)) {
+ auto Reg =
+ MRI.createGenericVirtualRegister(getLLTForType(*F->getType(), DL));
+ MIRBuilder.buildGlobalValue(Reg, F);
+ Info.Callee = MachineOperand::CreateReg(Reg, false);
+ } else {
+ Info.Callee = MachineOperand::CreateGA(F, 0);
+ }
+ } else if (isa<GlobalIFunc>(CalleeV) || isa<GlobalAlias>(CalleeV)) {
// IR IFuncs and Aliases can't be forward declared (only defined), so the
// callee must be in the same TU and therefore we can direct-call it without
// worrying about it being out of range.
diff --git a/llvm/lib/Target/AArch64/AArch64FastISel.cpp b/llvm/lib/Target/AArch64/AArch64FastISel.cpp
index e98f6c4984a752..93d6024f34c09c 100644
--- a/llvm/lib/Target/AArch64/AArch64FastISel.cpp
+++ b/llvm/lib/Target/AArch64/AArch64FastISel.cpp
@@ -3202,6 +3202,13 @@ bool AArch64FastISel::fastLowerCall(CallLoweringInfo &CLI) {
if (Callee && !computeCallAddress(Callee, Addr))
return false;
+ // MO_GOT is not handled. -fno-plt compiled intrinsic calls do not have the
+ // nonlazybind attribute. Check "RtLibUseGOT" instead.
+ if ((Subtarget->classifyGlobalFunctionReference(Addr.getGlobalValue(), TM) !=
+ AArch64II::MO_NO_FLAG) ||
+ MF->getFunction().getParent()->getRtLibUseGOT())
+ return false;
+
// The weak function target may be zero; in that case we must use indirect
// addressing via a stub on windows as it may be out of range for a
// PC-relative jump.
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 96ea692d03f563..56de890c78deca 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -7969,13 +7969,14 @@ AArch64TargetLowering::LowerCall(CallLoweringInfo &CLI,
Callee = DAG.getTargetGlobalAddress(GV, DL, PtrVT, 0, 0);
}
} else if (auto *S = dyn_cast<ExternalSymbolSDNode>(Callee)) {
- if (getTargetMachine().getCodeModel() == CodeModel::Large &&
- Subtarget->isTargetMachO()) {
- const char *Sym = S->getSymbol();
+ bool UseGot = (getTargetMachine().getCodeModel() == CodeModel::Large &&
+ Subtarget->isTargetMachO()) ||
+ MF.getFunction().getParent()->getRtLibUseGOT();
+ const char *Sym = S->getSymbol();
+ if (UseGot) {
Callee = DAG.getTargetExternalSymbol(Sym, PtrVT, AArch64II::MO_GOT);
Callee = DAG.getNode(AArch64ISD::LOADgot, DL, PtrVT, Callee);
} else {
- const char *Sym = S->getSymbol();
Callee = DAG.getTargetExternalSymbol(Sym, PtrVT, 0);
}
}
diff --git a/llvm/lib/Target/AArch64/AArch64Subtarget.cpp b/llvm/lib/Target/AArch64/AArch64Subtarget.cpp
index cf57d950ae8d7f..c4c6827313b5e1 100644
--- a/llvm/lib/Target/AArch64/AArch64Subtarget.cpp
+++ b/llvm/lib/Target/AArch64/AArch64Subtarget.cpp
@@ -43,10 +43,10 @@ static cl::opt<bool>
UseAddressTopByteIgnored("aarch64-use-tbi", cl::desc("Assume that top byte of "
"an address is ignored"), cl::init(false), cl::Hidden);
-static cl::opt<bool>
- UseNonLazyBind("aarch64-enable-nonlazybind",
- cl::desc("Call nonlazybind functions via direct GOT load"),
- cl::init(false), cl::Hidden);
+static cl::opt<bool> MachOUseNonLazyBind(
+ "aarch64-macho-enable-nonlazybind",
+ cl::desc("Call nonlazybind functions via direct GOT load for Mach-O"),
+ cl::Hidden);
static cl::opt<bool> UseAA("aarch64-use-aa", cl::init(true),
cl::desc("Enable the use of AA during codegen."));
@@ -434,7 +434,8 @@ unsigned AArch64Subtarget::classifyGlobalFunctionReference(
// NonLazyBind goes via GOT unless we know it's available locally.
auto *F = dyn_cast<Function>(GV);
- if (UseNonLazyBind && F && F->hasFnAttribute(Attribute::NonLazyBind) &&
+ if ((!isTargetMachO() || MachOUseNonLazyBind) && F &&
+ F->hasFnAttribute(Attribute::NonLazyBind) &&
!TM.shouldAssumeDSOLocal(*GV->getParent(), GV))
return AArch64II::MO_GOT;
diff --git a/llvm/lib/Target/AArch64/GISel/AArch64CallLowering.cpp b/llvm/lib/Target/AArch64/GISel/AArch64CallLowering.cpp
index 84057ea8d2214a..773eadbf34de37 100644
--- a/llvm/lib/Target/AArch64/GISel/AArch64CallLowering.cpp
+++ b/llvm/lib/Target/AArch64/GISel/AArch64CallLowering.cpp
@@ -1273,8 +1273,19 @@ bool AArch64CallLowering::lowerCall(MachineIRBuilder &MIRBuilder,
!Subtarget.noBTIAtReturnTwice() &&
MF.getInfo<AArch64FunctionInfo>()->branchTargetEnforcement())
Opc = AArch64::BLR_BTI;
- else
+ else {
+ // For an intrinsic call (e.g. memset), use GOT if "RtLibUseGOT" (-fno-plt)
+ // is set.
+ if (Info.Callee.isSymbol() && F.getParent()->getRtLibUseGOT()) {
+ auto Reg =
+ MRI.createGenericVirtualRegister(getLLTForType(*F.getType(), DL));
+ auto MIB = MIRBuilder.buildInstr(TargetOpcode::G_GLOBAL_VALUE);
+ DstOp(Reg).addDefToMIB(MRI, MIB);
+ MIB.addExternalSymbol(Info.Callee.getSymbolName(), AArch64II::MO_GOT);
+ Info.Callee = MachineOperand::CreateReg(Reg, false);
+ }
Opc = getCallOpcode(MF, Info.Callee.isReg(), false);
+ }
auto MIB = MIRBuilder.buildInstrNoInsert(Opc);
unsigned CalleeOpNo = 0;
diff --git a/llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp b/llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp
index 8344e79f78e1eb..e60db260e3ef10 100644
--- a/llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp
+++ b/llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp
@@ -2841,11 +2841,19 @@ bool AArch64InstructionSelector::select(MachineInstr &I) {
}
case TargetOpcode::G_GLOBAL_VALUE: {
- auto GV = I.getOperand(1).getGlobal();
- if (GV->isThreadLocal())
- return selectTLSGlobalValue(I, MRI);
+ const GlobalValue *GV = nullptr;
+ unsigned OpFlags;
+ if (I.getOperand(1).isSymbol()) {
+ OpFlags = I.getOperand(1).getTargetFlags();
+ // Currently only used by "RtLibUseGOT".
+ assert(OpFlags == AArch64II::MO_GOT);
+ } else {
+ GV = I.getOperand(1).getGlobal();
+ if (GV->isThreadLocal())
+ return selectTLSGlobalValue(I, MRI);
+ OpFlags = STI.ClassifyGlobalReference(GV, TM);
+ }
- unsigned OpFlags = STI.ClassifyGlobalReference(GV, TM);
if (OpFlags & AArch64II::MO_GOT) {
I.setDesc(TII.get(AArch64::LOADgot));
I.getOperand(1).setTargetFlags(OpFlags);
diff --git a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
index b561cb12c93a1c..83137949d0f244 100644
--- a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
+++ b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
@@ -1314,6 +1314,9 @@ bool AArch64LegalizerInfo::legalizeSmallCMGlobalValue(
// By splitting this here, we can optimize accesses in the small code model by
// folding in the G_ADD_LOW into the load/store offset.
auto &GlobalOp = MI.getOperand(1);
+ // Don't modify an intrinsic call.
+ if (GlobalOp.isSymbol())
+ return true;
const auto* GV = GlobalOp.getGlobal();
if (GV->isThreadLocal())
return true; // Don't want to modify TLS vars.
diff --git a/llvm/test/CodeGen/AArch64/nonlazybind.ll b/llvm/test/CodeGen/AArch64/nonlazybind.ll
index 669a8ee04b2492..f5bb3a4ecbc9a0 100644
--- a/llvm/test/CodeGen/AArch64/nonlazybind.ll
+++ b/llvm/test/CodeGen/AArch64/nonlazybind.ll
@@ -1,5 +1,5 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 4
-; RUN: llc -mtriple=aarch64-apple-ios %s -o - -aarch64-enable-nonlazybind | FileCheck %s --check-prefix=MACHO
+; RUN: llc -mtriple=aarch64-apple-ios %s -o - -aarch64-macho-enable-nonlazybind | FileCheck %s --check-prefix=MACHO
; RUN: llc -mtriple=aarch64-apple-ios %s -o - | FileCheck %s --check-prefix=MACHO-NORMAL
; RUN: llc -mtriple=aarch64 -fast-isel %s -o - | FileCheck %s --check-prefixes=ELF,ELF-FI
; RUN: llc -mtriple=aarch64 -global-isel %s -o - | FileCheck %s --check-prefixes=ELF,ELF-GI
@@ -19,13 +19,18 @@ define void @test_laziness(ptr %a) nounwind {
; MACHO-NEXT: Lloh1:
; MACHO-NEXT: ldr x8, [x8, _external@GOTPAGEOFF]
; MACHO-NEXT: blr x8
+; MACHO-NEXT: Lloh2:
+; MACHO-NEXT: adrp x8, _memset@GOTPAGE
; MACHO-NEXT: mov x0, x19
; MACHO-NEXT: mov w1, #1 ; =0x1
+; MACHO-NEXT: Lloh3:
+; MACHO-NEXT: ldr x8, [x8, _memset@GOTPAGEOFF]
; MACHO-NEXT: mov w2, #1000 ; =0x3e8
-; MACHO-NEXT: bl _memset
+; MACHO-NEXT: blr x8
; MACHO-NEXT: ldp x29, x30, [sp, #16] ; 16-byte Folded Reload
; MACHO-NEXT: ldp x20, x19, [sp], #32 ; 16-byte Folded Reload
; MACHO-NEXT: ret
+; MACHO-NEXT: .loh AdrpLdrGot Lloh2, Lloh3
; MACHO-NEXT: .loh AdrpLdrGot Lloh0, Lloh1
;
; MACHO-NORMAL-LABEL: test_laziness:
@@ -34,50 +39,34 @@ define void @test_laziness(ptr %a) nounwind {
; MACHO-NORMAL-NEXT: stp x29, x30, [sp, #16] ; 16-byte Folded Spill
; MACHO-NORMAL-NEXT: mov x19, x0
; MACHO-NORMAL-NEXT: bl _external
+; MACHO-NORMAL-NEXT: Lloh0:
+; MACHO-NORMAL-NEXT: adrp x8, _memset@GOTPAGE
; MACHO-NORMAL-NEXT: mov x0, x19
; MACHO-NORMAL-NEXT: mov w1, #1 ; =0x1
+; MACHO-NORMAL-NEXT: Lloh1:
+; MACHO-NORMAL-NEXT: ldr x8, [x8, _memset@GOTPAGEOFF]
; MACHO-NORMAL-NEXT: mov w2, #1000 ; =0x3e8
-; MACHO-NORMAL-NEXT: bl _memset
+; MACHO-NORMAL-NEXT: blr x8
; MACHO-NORMAL-NEXT: ldp x29, x30, [sp, #16] ; 16-byte Folded Reload
; MACHO-NORMAL-NEXT: ldp x20, x19, [sp], #32 ; 16-byte Folded Reload
; MACHO-NORMAL-NEXT: ret
+; MACHO-NORMAL-NEXT: .loh AdrpLdrGot Lloh0, Lloh1
;
-; ELF-FI-LABEL: test_laziness:
-; ELF-FI: // %bb.0:
-; ELF-FI-NEXT: stp x30, x19, [sp, #-16]! // 16-byte Folded Spill
-; ELF-FI-NEXT: mov x19, x0
-; ELF-FI-NEXT: bl external
-; ELF-FI-NEXT: mov w8, #1 // =0x1
-; ELF-FI-NEXT: mov x0, x19
-; ELF-FI-NEXT: mov x2, #1000 // =0x3e8
-; ELF-FI-NEXT: uxtb w1, w8
-; ELF-FI-NEXT: bl memset
-; ELF-FI-NEXT: ldp x30, x19, [sp], #16 // 16-byte Folded Reload
-; ELF-FI-NEXT: ret
-;
-; ELF-GI-LABEL: test_laziness:
-; ELF-GI: // %bb.0:
-; ELF-GI-NEXT: stp x30, x19, [sp, #-16]! // 16-byte Folded Spill
-; ELF-GI-NEXT: mov x19, x0
-; ELF-GI-NEXT: bl external
-; ELF-GI-NEXT: mov x0, x19
-; ELF-GI-NEXT: mov w1, #1 // =0x1
-; ELF-GI-NEXT: mov w2, #1000 // =0x3e8
-; ELF-GI-NEXT: bl memset
-; ELF-GI-NEXT: ldp x30, x19, [sp], #16 // 16-byte Folded Reload
-; ELF-GI-NEXT: ret
-;
-; ELF-SDAG-LABEL: test_laziness:
-; ELF-SDAG: // %bb.0:
-; ELF-SDAG-NEXT: stp x30, x19, [sp, #-16]! // 16-byte Folded Spill
-; ELF-SDAG-NEXT: mov x19, x0
-; ELF-SDAG-NEXT: bl external
-; ELF-SDAG-NEXT: mov x0, x19
-; ELF-SDAG-NEXT: mov w1, #1 // =0x1
-; ELF-SDAG-NEXT: mov w2, #1000 // =0x3e8
-; ELF-SDAG-NEXT: bl memset
-; ELF-SDAG-NEXT: ldp x30, x19, [sp], #16 // 16-byte Folded Reload
-; ELF-SDAG-NEXT: ret
+; ELF-LABEL: test_laziness:
+; ELF: // %bb.0:
+; ELF-NEXT: stp x30, x19, [sp, #-16]! // 16-byte Folded Spill
+; ELF-NEXT: adrp x8, :got:external
+; ELF-NEXT: mov x19, x0
+; ELF-NEXT: ldr x8, [x8, :got_lo12:external]
+; ELF-NEXT: blr x8
+; ELF-NEXT: adrp x8, :got:memset
+; ELF-NEXT: mov x0, x19
+; ELF-NEXT: mov w1, #1 // =0x1
+; ELF-NEXT: ldr x8, [x8, :got_lo12:memset]
+; ELF-NEXT: mov w2, #1000 // =0x3e8
+; ELF-NEXT: blr x8
+; ELF-NEXT: ldp x30, x19, [sp], #16 // 16-byte Folded Reload
+; ELF-NEXT: ret
call void @external()
call void @llvm.memset.p0.i64(ptr align 1 %a, i8 1, i64 1000, i1 false)
ret void
@@ -86,12 +75,12 @@ define void @test_laziness(ptr %a) nounwind {
define void @test_laziness_tail() nounwind {
; MACHO-LABEL: test_laziness_tail:
; MACHO: ; %bb.0:
-; MACHO-NEXT: Lloh2:
+; MACHO-NEXT: Lloh4:
; MACHO-NEXT: adrp x0, _external@GOTPAGE
-; MACHO-NEXT: Lloh3:
+; MACHO-NEXT: Lloh5:
; MACHO-NEXT: ldr x0, [x0, _external@GOTPAGEOFF]
; MACHO-NEXT: br x0
-; MACHO-NEXT: .loh AdrpLdrGot Lloh2, Lloh3
+; MACHO-NEXT: .loh AdrpLdrGot Lloh4, Lloh5
;
; MACHO-NORMAL-LABEL: test_laziness_tail:
; MACHO-NORMAL: ; %bb.0:
@@ -99,7 +88,9 @@ define void @test_laziness_tail() nounwind {
;
; ELF-LABEL: test_laziness_tail:
; ELF: // %bb.0:
-; ELF-NEXT: b external
+; ELF-NEXT: adrp x0, :got:external
+; ELF-NEXT: ldr x0, [x0, :got_lo12:external]
+; ELF-NEXT: br x0
tail call void @external()
ret void
}
@@ -108,3 +99,7 @@ declare void @llvm.memset.p0.i64(ptr nocapture writeonly, i8, i64, i1 immarg)
!llvm.module.flags = !{!0}
!0 = !{i32 7, !"RtLibUseGOT", i32 1}
+;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
+; ELF-FI: {{.*}}
+; ELF-GI: {{.*}}
+; ELF-SDAG: {{.*}}
Ping:)
It looks like some of the tests might be failing? Or does it need a rebase?
… can be null Created using spr 1.3.4
Sorry about it. The last minute
auto MIB = MIRBuilder.buildInstr(TargetOpcode::G_GLOBAL_VALUE);
DstOp(Reg).addDefToMIB(MRI, MIB);
use buildGlobalValue
This part is to handle library calls. buildGlobalValue builds a G_GLOBAL_VALUE with MIB.addGlobalAddress(GV). Here we use an ExternalSymbol, and cannot use buildGlobalValue.
Missing overload then, should try to avoid raw buildInstr calls when possible
Did you propose a new MachineIRBuilder API? This is a special use of buildInstr(TargetOpcode::G_GLOBAL_VALUE) and there is only one. I think avoiding another API is good for now.
Yes. You lose CSE of these values by not going through the complete buildInstr
buildInstr(TargetOpcode::G_GLOBAL_VALUE) cannot be CSEed today, and I think this patch should not change that. It seems that only the buildInstr overload with DstOps/SrcOps can perform CSE; MachineIRBuilder::buildGlobalValue does not call that overload.
Using raw buildInstr is a worse API. Please just add the new overload
I disagree. I think adding a function similar to buildGlobalValue that either omits const GlobalValue *GV or changes the argument to const char* would add confusion, as this is a very special use case.
I slightly simplified the code in the just-pushed commit.
; GISEL-NEXT: ldp x20, x19, [sp], #32 ; 16-byte Folded Reload
; GISEL-NEXT: b _objc_release
; GISEL-NEXT: br x1
@fhahn, @TNorthover do these sound OK to you?
Ping:) @fhahn @TNorthover
Created using spr 1.3.4
Ping:)
auto Reg =
    MRI.createGenericVirtualRegister(getLLTForType(*F->getType(), DL));
MIRBuilder.buildGlobalValue(Reg, F);
Suggested change:
-  auto Reg =
-      MRI.createGenericVirtualRegister(getLLTForType(*F->getType(), DL));
-  MIRBuilder.buildGlobalValue(Reg, F);
+  Register Reg = MIRBuilder.buildGlobalValue(getLLTForType(*F->getType(), DL), F).getReg(0);
After clang-format, this becomes
- auto Reg =
- MRI.createGenericVirtualRegister(getLLTForType(*F->getType(), DL));
- MIRBuilder.buildGlobalValue(Reg, F);
+ Register Reg =
+ MIRBuilder.buildGlobalValue(getLLTForType(*F->getType(), DL), F)
+ .getReg(0);
which is not shorter... In addition, the new code in AArch64CallLowering.cpp uses createGenericVirtualRegister, so sticking with createGenericVirtualRegister here adds consistency.
Move the getLLTForType to a variable to make the line shorter
Raw MRI.createGenericVirtualRegister calls should be purged whenever possible
Done in de93d86!
Created using spr 1.3.6-beta.1
auto Reg =
    MRI.createGenericVirtualRegister(getLLTForType(*F.getType(), DL));
MIRBuilder.buildInstr(TargetOpcode::G_GLOBAL_VALUE)
    .addDef(Reg)
    .addExternalSymbol(Info.Callee.getSymbolName(), AArch64II::MO_GOT);
Info.Callee = MachineOperand::CreateReg(Reg, false);
}
Still should fix this API
From what I understand from an AArch64 perspective I think this looks OK too. Thanks
Created using spr 1.3.6-beta.1
This change had a surprising effect on how dllimport attributes work for Windows targets:
void __declspec(dllimport) foo(void);
void myFunc(void) { foo(); }
Before:
After:
(Obviously, this change in the output code breaks it entirely.) I'm not entirely sure if this should be considered a bug or not - I don't think
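For reference, a minimal IR sketch of the reported situation (my construction, not taken from the report: the triple and names are illustrative, and it assumes a TU built with -fno-plt carries both the nonlazybind attribute and the "RtLibUseGOT" module flag, as described in the PR summary):

target triple = "aarch64-unknown-windows-msvc"

; Hypothetical reproducer: a dllimport callee in a module compiled with -fno-plt.
declare dllimport void @foo() nonlazybind

define void @myFunc() {
  call void @foo()
  ret void
}

!llvm.module.flags = !{!0}
!0 = !{i32 7, !"RtLibUseGOT", i32 1}

If this reading is right, classifyGlobalFunctionReference now returns MO_GOT for any non-Mach-O target, including COFF, where :got: relocations do not exist, which would explain the broken output.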
This is an ELF-only flag and specifying it on PE targets has no effect - except that it actually has quite a negative effect when used with LLVM/Clang as of LLVM 19 (see llvm/llvm-project#78890 (comment)). So it is probably better to avoid this flag (regardless of the toolchain used). LLVM will also warn about it in the future (see llvm/llvm-project#124081).
-fno-plt is an ELF-specific option that is only implemented for x86 (for a long time) and AArch64 (#78890). GCC doesn't bother to give a diagnostic on Windows. -fno-plt is somewhat popular and we've been ignoring it for unsupported targets for a while, so just report a warning for unsupported targets. Pull Request: #124081
Clang sets the nonlazybind attribute for certain ObjC features. The
AArch64 SelectionDAG implementation for non-intrinsic calls
(46e36f0) is behind a cl option.
GCC implements -fno-plt for a few ELF targets. In Clang, -fno-plt also
sets the nonlazybind attribute. For SelectionDAG, make the cl option not
affect ELF so that non-intrinsic calls to a dso_preemptable function use
GOT. Adjust AArch64TargetLowering::LowerCall to handle intrinsic calls.
For FastISel, change fastLowerCall to bail out when a call is due to -fno-plt.
For GlobalISel, handle non-intrinsic calls in CallLowering::lowerCall
and intrinsic calls in AArch64CallLowering::lowerCall (where the
target-independent CallLowering::lowerCall is not called).
The GlobalISel test in call-rv-marker.ll is therefore updated.

Note: the current -fno-plt -fpic implementation does not use GOT for a preemptable function.
Link: #78275
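To make the two code paths concrete, here is a sketch distilled from the nonlazybind.ll test in this PR (the body mirrors test_laziness and the annotated assembly is copied from the ELF check lines; treat it as an illustration rather than the exact test file):

target triple = "aarch64"

declare void @external() nonlazybind
declare void @llvm.memset.p0.i64(ptr nocapture writeonly, i8, i64, i1 immarg)

define void @caller(ptr %a) nounwind {
  ; Non-intrinsic path: nonlazybind sends the call through the GOT:
  ;   adrp x8, :got:external
  ;   ldr  x8, [x8, :got_lo12:external]
  ;   blr  x8
  call void @external()
  ; Intrinsic path: the memset libcall honors the "RtLibUseGOT" module flag:
  ;   adrp x8, :got:memset
  ;   ldr  x8, [x8, :got_lo12:memset]
  ;   blr  x8
  call void @llvm.memset.p0.i64(ptr align 1 %a, i8 1, i64 1000, i1 false)
  ret void
}

!llvm.module.flags = !{!0}
!0 = !{i32 7, !"RtLibUseGOT", i32 1}

Running llc -mtriple=aarch64 on such a module, with or without -fast-isel or -global-isel, should now produce the GOT-indirect sequences shown in the ELF check lines above.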