Skip to content

[LLD][COFF] Add support for delay-load imports on ARM64X #124600

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 28, 2025

Conversation

cjacek
Copy link
Contributor

@cjacek cjacek commented Jan 27, 2025

For each imported module, emit null-terminated native import entries, followed by null-terminated EC entries. If a view lacks imports for a given module, only terminators are emitted. Use ARM64X relocations to skip native entries in the EC view.

Move delayLoadHelper and tailMergeUnwindInfoChunk to SymbolTable since they are different for each symbol table.

@llvmbot
Copy link
Member

llvmbot commented Jan 27, 2025

@llvm/pr-subscribers-platform-windows
@llvm/pr-subscribers-lld

@llvm/pr-subscribers-lld-coff

Author: Jacek Caban (cjacek)

Changes

For each imported module, emit null-terminated native import entries, followed by null-terminated EC entries. If a view lacks imports for a given module, only terminators are emitted. Use ARM64X relocations to skip native entries in the EC view.

Move delayLoadHelper and tailMergeUnwindInfoChunk to SymbolTable since they are different for each symbol table.


Patch is 29.82 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/124600.diff

7 Files Affected:

  • (modified) lld/COFF/Config.h (-1)
  • (modified) lld/COFF/DLL.cpp (+93-65)
  • (modified) lld/COFF/DLL.h (+3-5)
  • (modified) lld/COFF/Driver.cpp (+7-6)
  • (modified) lld/COFF/SymbolTable.h (+3)
  • (modified) lld/COFF/Writer.cpp (+1-2)
  • (added) lld/test/COFF/arm64x-delayimport.test (+363)
diff --git a/lld/COFF/Config.h b/lld/COFF/Config.h
index cd280aa09964d7..0c7c4e91402f18 100644
--- a/lld/COFF/Config.h
+++ b/lld/COFF/Config.h
@@ -164,7 +164,6 @@ struct Configuration {
   bool noimplib = false;
   std::set<std::string> delayLoads;
   std::map<std::string, int> dllOrder;
-  Symbol *delayLoadHelper = nullptr;
   Symbol *arm64ECIcallHelper = nullptr;
 
   llvm::DenseSet<llvm::StringRef> saveTempsArgs;
diff --git a/lld/COFF/DLL.cpp b/lld/COFF/DLL.cpp
index b6fbd5a484b5e7..198b6e1cddd1ee 100644
--- a/lld/COFF/DLL.cpp
+++ b/lld/COFF/DLL.cpp
@@ -911,12 +911,9 @@ uint64_t DelayLoadContents::getDirSize() {
   return dirs.size() * sizeof(delay_import_directory_table_entry);
 }
 
-void DelayLoadContents::create(Defined *h) {
-  helper = h;
+void DelayLoadContents::create() {
   std::vector<std::vector<DefinedImportData *>> v = binImports(ctx, imports);
 
-  Chunk *unwind = newTailMergeUnwindInfoChunk();
-
   // Create .didat contents for each DLL.
   for (std::vector<DefinedImportData *> &syms : v) {
     // Create the delay import table header.
@@ -924,54 +921,89 @@ void DelayLoadContents::create(Defined *h) {
     auto *dir = make<DelayDirectoryChunk>(dllNames.back());
 
     size_t base = addresses.size();
-    Chunk *tm = newTailMergeChunk(dir);
-    Chunk *pdataChunk = unwind ? newTailMergePDataChunk(tm, unwind) : nullptr;
-    for (DefinedImportData *s : syms) {
-      Chunk *t = newThunkChunk(s, tm);
-      auto *a = make<DelayAddressChunk>(ctx, t);
-      addresses.push_back(a);
-      thunks.push_back(t);
-      StringRef extName = s->getExternalName();
-      if (extName.empty()) {
-        names.push_back(make<OrdinalOnlyChunk>(ctx, s->getOrdinal()));
-      } else {
-        auto *c = make<HintNameChunk>(extName, 0);
-        names.push_back(make<LookupChunk>(ctx, c));
-        hintNames.push_back(c);
-        // Add a synthetic symbol for this load thunk, using the "__imp___load"
-        // prefix, in case this thunk needs to be added to the list of valid
-        // call targets for Control Flow Guard.
-        StringRef symName = saver().save("__imp___load_" + extName);
-        s->loadThunkSym =
-            cast<DefinedSynthetic>(ctx.symtab.addSynthetic(symName, t));
+    ctx.forEachSymtab([&](SymbolTable &symtab) {
+      if (ctx.hybridSymtab && symtab.isEC()) {
+        // For hybrid images, emit null-terminated native import entries
+        // followed by null-terminated EC entries. If a view is missing imports
+        // for a given module, only terminators are emitted. Emit ARM64X
+        // relocations to skip native entries in the EC view.
+        ctx.dynamicRelocs->add(
+            IMAGE_DVRT_ARM64X_FIXUP_TYPE_DELTA, 0,
+            Arm64XRelocVal(dir, offsetof(delay_import_directory_table_entry,
+                                         DelayImportAddressTable)),
+            (addresses.size() - base) * sizeof(uint64_t));
+        ctx.dynamicRelocs->add(
+            IMAGE_DVRT_ARM64X_FIXUP_TYPE_DELTA, 0,
+            Arm64XRelocVal(dir, offsetof(delay_import_directory_table_entry,
+                                         DelayImportNameTable)),
+            (addresses.size() - base) * sizeof(uint64_t));
       }
 
-      if (s->file->impECSym) {
-        auto chunk = make<AuxImportChunk>(s->file);
-        auxIat.push_back(chunk);
-        s->file->impECSym->setLocation(chunk);
+      Chunk *tm = nullptr;
 
-        chunk = make<AuxImportChunk>(s->file);
-        auxIatCopy.push_back(chunk);
-        s->file->auxImpCopySym->setLocation(chunk);
+      for (DefinedImportData *s : syms) {
+        // Process only the symbols belonging to the current symtab.
+        if (symtab.isEC() != s->file->isEC())
+          continue;
+
+        if (!tm) {
+          tm = newTailMergeChunk(symtab, dir);
+          Chunk *pdataChunk = newTailMergePDataChunk(symtab, tm);
+          if (pdataChunk)
+            pdata.push_back(pdataChunk);
+        }
+
+        Chunk *t = newThunkChunk(s, tm);
+        auto *a = make<DelayAddressChunk>(ctx, t);
+        addresses.push_back(a);
+        s->setLocation(a);
+        thunks.push_back(t);
+        StringRef extName = s->getExternalName();
+        if (extName.empty()) {
+          names.push_back(make<OrdinalOnlyChunk>(ctx, s->getOrdinal()));
+        } else {
+          auto *c = make<HintNameChunk>(extName, 0);
+          names.push_back(make<LookupChunk>(ctx, c));
+          hintNames.push_back(c);
+          // Add a synthetic symbol for this load thunk, using the
+          // "__imp___load" prefix, in case this thunk needs to be added to the
+          // list of valid call targets for Control Flow Guard.
+          StringRef symName = saver().save("__imp___load_" + extName);
+          s->loadThunkSym =
+              cast<DefinedSynthetic>(symtab.addSynthetic(symName, t));
+        }
+
+        if (symtab.isEC()) {
+          auto chunk = make<AuxImportChunk>(s->file);
+          auxIat.push_back(chunk);
+          s->file->impECSym->setLocation(chunk);
+
+          chunk = make<AuxImportChunk>(s->file);
+          auxIatCopy.push_back(chunk);
+          s->file->auxImpCopySym->setLocation(chunk);
+        } else if (ctx.hybridSymtab) {
+          // Fill the auxiliary IAT with null chunks for native imports.
+          auxIat.push_back(make<NullChunk>(ctx));
+          auxIatCopy.push_back(make<NullChunk>(ctx));
+        }
       }
-    }
-    thunks.push_back(tm);
-    if (pdataChunk)
-      pdata.push_back(pdataChunk);
-    StringRef tmName =
-        saver().save("__tailMerge_" + syms[0]->getDLLName().lower());
-    ctx.symtab.addSynthetic(tmName, tm);
-    // Terminate with null values.
-    addresses.push_back(make<NullChunk>(ctx, 8));
-    names.push_back(make<NullChunk>(ctx, 8));
-    if (ctx.config.machine == ARM64EC) {
-      auxIat.push_back(make<NullChunk>(ctx, 8));
-      auxIatCopy.push_back(make<NullChunk>(ctx, 8));
-    }
 
-    for (int i = 0, e = syms.size(); i < e; ++i)
-      syms[i]->setLocation(addresses[base + i]);
+      if (tm) {
+        thunks.push_back(tm);
+        StringRef tmName =
+            saver().save("__tailMerge_" + syms[0]->getDLLName().lower());
+        symtab.addSynthetic(tmName, tm);
+      }
+
+      // Terminate with null values.
+      addresses.push_back(make<NullChunk>(ctx, 8));
+      names.push_back(make<NullChunk>(ctx, 8));
+      if (ctx.symtabEC) {
+        auxIat.push_back(make<NullChunk>(ctx, 8));
+        auxIatCopy.push_back(make<NullChunk>(ctx, 8));
+      }
+    });
+
     auto *mh = make<NullChunk>(8, 8);
     moduleHandles.push_back(mh);
 
@@ -982,15 +1014,18 @@ void DelayLoadContents::create(Defined *h) {
     dirs.push_back(dir);
   }
 
-  if (unwind)
-    unwindinfo.push_back(unwind);
+  ctx.forEachSymtab([&](SymbolTable &symtab) {
+    if (symtab.tailMergeUnwindInfoChunk)
+      unwindinfo.push_back(symtab.tailMergeUnwindInfoChunk);
+  });
   // Add null terminator.
   dirs.push_back(
       make<NullChunk>(sizeof(delay_import_directory_table_entry), 4));
 }
 
-Chunk *DelayLoadContents::newTailMergeChunk(Chunk *dir) {
-  switch (ctx.config.machine) {
+Chunk *DelayLoadContents::newTailMergeChunk(SymbolTable &symtab, Chunk *dir) {
+  auto helper = cast<Defined>(symtab.delayLoadHelper);
+  switch (symtab.machine) {
   case AMD64:
   case ARM64EC:
     return make<TailMergeChunkX64>(dir, helper);
@@ -1005,21 +1040,14 @@ Chunk *DelayLoadContents::newTailMergeChunk(Chunk *dir) {
   }
 }
 
-Chunk *DelayLoadContents::newTailMergeUnwindInfoChunk() {
-  switch (ctx.config.machine) {
-  case AMD64:
-  case ARM64EC:
-    return make<TailMergeUnwindInfoX64>();
-    // FIXME: Add support for other architectures.
-  default:
-    return nullptr; // Just don't generate unwind info.
-  }
-}
-Chunk *DelayLoadContents::newTailMergePDataChunk(Chunk *tm, Chunk *unwind) {
-  switch (ctx.config.machine) {
+Chunk *DelayLoadContents::newTailMergePDataChunk(SymbolTable &symtab,
+                                                 Chunk *tm) {
+  switch (symtab.machine) {
   case AMD64:
   case ARM64EC:
-    return make<TailMergePDataChunkX64>(tm, unwind);
+    if (!symtab.tailMergeUnwindInfoChunk)
+      symtab.tailMergeUnwindInfoChunk = make<TailMergeUnwindInfoX64>();
+    return make<TailMergePDataChunkX64>(tm, symtab.tailMergeUnwindInfoChunk);
     // FIXME: Add support for other architectures.
   default:
     return nullptr; // Just don't generate unwind info.
@@ -1028,7 +1056,7 @@ Chunk *DelayLoadContents::newTailMergePDataChunk(Chunk *tm, Chunk *unwind) {
 
 Chunk *DelayLoadContents::newThunkChunk(DefinedImportData *s,
                                         Chunk *tailMerge) {
-  switch (ctx.config.machine) {
+  switch (s->file->getMachineType()) {
   case AMD64:
   case ARM64EC:
     return make<ThunkChunkX64>(s, tailMerge);
diff --git a/lld/COFF/DLL.h b/lld/COFF/DLL.h
index 724a323d62d205..5105b79f15d31a 100644
--- a/lld/COFF/DLL.h
+++ b/lld/COFF/DLL.h
@@ -42,7 +42,7 @@ class DelayLoadContents {
   DelayLoadContents(COFFLinkerContext &ctx) : ctx(ctx) {}
   void add(DefinedImportData *sym) { imports.push_back(sym); }
   bool empty() { return imports.empty(); }
-  void create(Defined *helper);
+  void create();
   std::vector<Chunk *> getChunks();
   std::vector<Chunk *> getDataChunks();
   ArrayRef<Chunk *> getCodeChunks() { return thunks; }
@@ -56,11 +56,9 @@ class DelayLoadContents {
 
 private:
   Chunk *newThunkChunk(DefinedImportData *s, Chunk *tailMerge);
-  Chunk *newTailMergeChunk(Chunk *dir);
-  Chunk *newTailMergePDataChunk(Chunk *tm, Chunk *unwind);
-  Chunk *newTailMergeUnwindInfoChunk();
+  Chunk *newTailMergeChunk(SymbolTable &symtab, Chunk *dir);
+  Chunk *newTailMergePDataChunk(SymbolTable &symtab, Chunk *tm);
 
-  Defined *helper;
   std::vector<DefinedImportData *> imports;
   std::vector<Chunk *> dirs;
   std::vector<Chunk *> moduleHandles;
diff --git a/lld/COFF/Driver.cpp b/lld/COFF/Driver.cpp
index 6eea11f5f451fd..ac3ac57bd17f44 100644
--- a/lld/COFF/Driver.cpp
+++ b/lld/COFF/Driver.cpp
@@ -2353,12 +2353,13 @@ void LinkerDriver::linkerMain(ArrayRef<const char *> argsArr) {
     llvm::TimeTraceScope timeScope("Delay load");
     for (auto *arg : args.filtered(OPT_delayload)) {
       config->delayLoads.insert(StringRef(arg->getValue()).lower());
-      if (config->machine == I386) {
-        config->delayLoadHelper = ctx.symtab.addGCRoot("___delayLoadHelper2@8");
-      } else {
-        config->delayLoadHelper =
-            ctx.symtab.addGCRoot("__delayLoadHelper2", true);
-      }
+      ctx.forEachSymtab([&](SymbolTable &symtab) {
+        if (symtab.machine == I386) {
+          symtab.delayLoadHelper = symtab.addGCRoot("___delayLoadHelper2@8");
+        } else {
+          symtab.delayLoadHelper = symtab.addGCRoot("__delayLoadHelper2", true);
+        }
+      });
     }
   }
 
diff --git a/lld/COFF/SymbolTable.h b/lld/COFF/SymbolTable.h
index c8d7251838842f..ff6e8487f07344 100644
--- a/lld/COFF/SymbolTable.h
+++ b/lld/COFF/SymbolTable.h
@@ -158,6 +158,9 @@ class SymbolTable {
   Chunk *edataStart = nullptr;
   Chunk *edataEnd = nullptr;
 
+  Symbol *delayLoadHelper = nullptr;
+  Chunk *tailMergeUnwindInfoChunk = nullptr;
+
   void fixupExports();
   void assignExportOrdinals();
   void parseModuleDefs(StringRef path);
diff --git a/lld/COFF/Writer.cpp b/lld/COFF/Writer.cpp
index bef2ced9f2957d..2bdaeb58ab432d 100644
--- a/lld/COFF/Writer.cpp
+++ b/lld/COFF/Writer.cpp
@@ -1307,8 +1307,7 @@ void Writer::appendImportThunks() {
   }
 
   if (!delayIdata.empty()) {
-    Defined *helper = cast<Defined>(ctx.config.delayLoadHelper);
-    delayIdata.create(helper);
+    delayIdata.create();
     for (Chunk *c : delayIdata.getChunks())
       didatSec->addChunk(c);
     for (Chunk *c : delayIdata.getDataChunks())
diff --git a/lld/test/COFF/arm64x-delayimport.test b/lld/test/COFF/arm64x-delayimport.test
new file mode 100644
index 00000000000000..56923ef748d09d
--- /dev/null
+++ b/lld/test/COFF/arm64x-delayimport.test
@@ -0,0 +1,363 @@
+REQUIRES: aarch64, x86
+RUN: split-file %s %t.dir && cd %t.dir
+
+RUN: llvm-mc -filetype=obj -triple=arm64ec-windows test-arm64ec.s -o test-arm64ec.obj
+RUN: llvm-mc -filetype=obj -triple=aarch64-windows test-arm64.s -o test-arm64.obj
+RUN: llvm-mc -filetype=obj -triple=arm64ec-windows arm64ec-helper.s -o arm64ec-helper.obj
+RUN: llvm-mc -filetype=obj -triple=aarch64-windows arm64-helper.s -o arm64-helper.obj
+RUN: llvm-mc -filetype=obj -triple=arm64ec-windows %S/Inputs/loadconfig-arm64ec.s -o loadconfig-arm64ec.obj
+RUN: llvm-mc -filetype=obj -triple=aarch64-windows %S/Inputs/loadconfig-arm64.s -o loadconfig-arm64.obj
+RUN: llvm-lib -machine:arm64ec -def:test.def -out:test-arm64ec.lib
+RUN: llvm-lib -machine:arm64 -def:test.def -out:test-arm64.lib
+
+# Test delayed-load import from both native and EC code.
+
+RUN: lld-link -machine:arm64x -dll -noentry -out:out.dll loadconfig-arm64.obj loadconfig-arm64ec.obj \
+RUN:          arm64-helper.obj arm64ec-helper.obj test-arm64.obj test-arm64ec.obj test-arm64.lib test-arm64ec.lib -delayload:test.dll
+
+RUN: llvm-readobj --coff-imports out.dll | FileCheck --check-prefix=IMPORTS %s
+IMPORTS:      DelayImport {
+IMPORTS-NEXT:   Name: test.dll
+IMPORTS-NEXT:   Attributes: 0x1
+IMPORTS-NEXT:   ModuleHandle: 0x6080
+IMPORTS-NEXT:   ImportAddressTable: 0x6088
+IMPORTS-NEXT:   ImportNameTable: 0x4390
+IMPORTS-NEXT:   BoundDelayImportTable: 0x0
+IMPORTS-NEXT:   UnloadDelayImportTable: 0x0
+IMPORTS-NEXT:   Import {
+IMPORTS-NEXT:     Symbol: func (0)
+IMPORTS-NEXT:     Address: 0x180001014
+IMPORTS-NEXT:   }
+IMPORTS-NEXT: }
+IMPORTS-NEXT: HybridObject {
+IMPORTS:        DelayImport {
+IMPORTS-NEXT:     Name: test.dll
+IMPORTS-NEXT:     Attributes: 0x1
+IMPORTS-NEXT:     ModuleHandle: 0x6080
+IMPORTS-NEXT:     ImportAddressTable: 0x6098
+IMPORTS-NEXT:     ImportNameTable: 0x43A0
+IMPORTS-NEXT:     BoundDelayImportTable: 0x0
+IMPORTS-NEXT:     UnloadDelayImportTable: 0x0
+IMPORTS-NEXT:     Import {
+IMPORTS-NEXT:       Symbol: func (0)
+IMPORTS-NEXT:       Address: 0x180003006
+IMPORTS-NEXT:     }
+IMPORTS-NEXT:   }
+IMPORTS-NEXT: }
+
+RUN: llvm-readobj --hex-dump=.test out.dll | FileCheck --check-prefix=TESTSEC %s
+TESTSEC: 0x180009000 10500000 98600000 00300000 10200000
+
+RUN: llvm-readobj --hex-dump=.testa out.dll | FileCheck --check-prefix=TESTSECA %s
+TESTSECA: 0x18000a000 88600000 08100000
+
+RUN: llvm-objdump -d out.dll | FileCheck --check-prefix=DISASM %s
+DISASM:      0000000180001000 <.text>:
+DISASM-NEXT: 180001000: 52800060     mov     w0, #0x3                // =3
+DISASM-NEXT: 180001004: d65f03c0     ret
+DISASM-NEXT: 180001008: b0000030     adrp    x16, 0x180006000
+DISASM-NEXT: 18000100c: f9404610     ldr     x16, [x16, #0x88]
+DISASM-NEXT: 180001010: d61f0200     br      x16
+DISASM-NEXT: 180001014: b0000031     adrp    x17, 0x180006000
+DISASM-NEXT: 180001018: 91022231     add     x17, x17, #0x88
+DISASM-NEXT: 18000101c: 14000001     b       0x180001020 <.text+0x20>
+DISASM-NEXT: 180001020: a9b37bfd     stp     x29, x30, [sp, #-0xd0]!
+DISASM-NEXT: 180001024: 910003fd     mov     x29, sp
+DISASM-NEXT: 180001028: a90107e0     stp     x0, x1, [sp, #0x10]
+DISASM-NEXT: 18000102c: a9020fe2     stp     x2, x3, [sp, #0x20]
+DISASM-NEXT: 180001030: a90317e4     stp     x4, x5, [sp, #0x30]
+DISASM-NEXT: 180001034: a9041fe6     stp     x6, x7, [sp, #0x40]
+DISASM-NEXT: 180001038: ad0287e0     stp     q0, q1, [sp, #0x50]
+DISASM-NEXT: 18000103c: ad038fe2     stp     q2, q3, [sp, #0x70]
+DISASM-NEXT: 180001040: ad0497e4     stp     q4, q5, [sp, #0x90]
+DISASM-NEXT: 180001044: ad059fe6     stp     q6, q7, [sp, #0xb0]
+DISASM-NEXT: 180001048: aa1103e1     mov     x1, x17
+DISASM-NEXT: 18000104c: f0000000     adrp    x0, 0x180004000
+DISASM-NEXT: 180001050: 910d4000     add     x0, x0, #0x350
+DISASM-NEXT: 180001054: 97ffffeb     bl      0x180001000 <.text>
+DISASM-NEXT: 180001058: aa0003f0     mov     x16, x0
+DISASM-NEXT: 18000105c: ad459fe6     ldp     q6, q7, [sp, #0xb0]
+DISASM-NEXT: 180001060: ad4497e4     ldp     q4, q5, [sp, #0x90]
+DISASM-NEXT: 180001064: ad438fe2     ldp     q2, q3, [sp, #0x70]
+DISASM-NEXT: 180001068: ad4287e0     ldp     q0, q1, [sp, #0x50]
+DISASM-NEXT: 18000106c: a9441fe6     ldp     x6, x7, [sp, #0x40]
+DISASM-NEXT: 180001070: a94317e4     ldp     x4, x5, [sp, #0x30]
+DISASM-NEXT: 180001074: a9420fe2     ldp     x2, x3, [sp, #0x20]
+DISASM-NEXT: 180001078: a94107e0     ldp     x0, x1, [sp, #0x10]
+DISASM-NEXT: 18000107c: a8cd7bfd     ldp     x29, x30, [sp], #0xd0
+DISASM-NEXT: 180001080: d61f0200     br      x16
+DISASM-NEXT:                 ...
+DISASM-NEXT: 180002000: 52800040     mov     w0, #0x2                // =2
+DISASM-NEXT: 180002004: d65f03c0     ret
+DISASM-NEXT: 180002008: 52800060     mov     w0, #0x3                // =3
+DISASM-NEXT: 18000200c: d65f03c0     ret
+DISASM-NEXT: 180002010: f0000010     adrp    x16, 0x180005000
+DISASM-NEXT: 180002014: f9400a10     ldr     x16, [x16, #0x10]
+DISASM-NEXT: 180002018: d61f0200     br      x16
+DISASM-NEXT: 18000201c: 9000002b     adrp    x11, 0x180006000
+DISASM-NEXT: 180002020: f9404d6b     ldr     x11, [x11, #0x98]
+DISASM-NEXT: 180002024: 9000000a     adrp    x10, 0x180002000 <.text+0x1000>
+DISASM-NEXT: 180002028: 9100c14a     add     x10, x10, #0x30
+DISASM-NEXT: 18000202c: 17fffff5     b       0x180002000 <.text+0x1000>
+DISASM-NEXT: 180002030: 52800080     mov     w0, #0x4                // =4
+DISASM-NEXT: 180002034: d65f03c0     ret
+DISASM-NEXT:                 ...
+DISASM-NEXT: 180003000: ff 25 92 30 00 00            jmpq    *0x3092(%rip)           # 0x180006098
+DISASM-NEXT: 180003006: 48 8d 05 8b 30 00 00         leaq    0x308b(%rip), %rax      # 0x180006098
+DISASM-NEXT: 18000300d: e9 00 00 00 00               jmp     0x180003012 <.text+0x2012>
+DISASM-NEXT: 180003012: 51                           pushq   %rcx
+DISASM-NEXT: 180003013: 52                           pushq   %rdx
+DISASM-NEXT: 180003014: 41 50                        pushq   %r8
+DISASM-NEXT: 180003016: 41 51                        pushq   %r9
+DISASM-NEXT: 180003018: 48 83 ec 48                  subq    $0x48, %rsp
+DISASM-NEXT: 18000301c: 66 0f 7f 04 24               movdqa  %xmm0, (%rsp)
+DISASM-NEXT: 180003021: 66 0f 7f 4c 24 10            movdqa  %xmm1, 0x10(%rsp)
+DISASM-NEXT: 180003027: 66 0f 7f 54 24 20            movdqa  %xmm2, 0x20(%rsp)
+DISASM-NEXT: 18000302d: 66 0f 7f 5c 24 30            movdqa  %xmm3, 0x30(%rsp)
+DISASM-NEXT: 180003033: 48 8b d0                     movq    %rax, %rdx
+DISASM-NEXT: 180003036: 48 8d 0d 13 13 00 00         leaq    0x1313(%rip), %rcx # 0x180004350
+DISASM-NEXT: 18000303d: e8 c6 ef ff ff               callq   0x180002008 <.text+0x1008>
+DISASM-NEXT: 180003042: 66 0f 6f 04 24               movdqa  (%rsp), %xmm0
+DISASM-NEXT: 180003047: 66 0f 6f 4c 24 10            movdqa  0x10(%rsp), %xmm1
+DISASM-NEXT: 18000304d: 66 0f 6f 54 24 20            movdqa  0x20(%rsp), %xmm2
+DISASM-NEXT: 180003053: 66 0f 6f 5c 24 30            movdqa  0x30(%rsp), %xmm3
+DISASM-NEXT: 180003059: 48 83 c4 48                  addq    $0x48, %rsp
+DISASM-NEXT: 18000305d: 41 59                        popq    %r9
+DISASM-NEXT: 18000305f: 41 58                        popq    %r8
+DISASM-NEXT: 180003061: 5a                           popq    %rdx
+DISASM-NEXT: 180003062: 59                           popq    %rcx
+DISASM-NEXT: 180003063: ff e0                        jmpq    *%rax
+
+RUN: llvm-readobj --coff-load-config out.dll | FileCheck --check-prefix=LOADCFG %s
+LOADCFG:      AuxiliaryDelayloadIAT: 0x5000
+LOADCFG-NEXT: AuxiliaryDelayloadIATCopy: 0x4140
+
+RUN: llvm-readobj --hex-dump=.rdata out.dll | FileCheck --check-prefix=AUXIAT %s
+AUXIAT:      0x180005000 00000000 00000000 00000000 00000000
+AUXIAT-NEXT: 0x180005010 1c200080 01000000 00000000 00000000
+
+
+# Test delayed-load import from native code only.
+
+RUN: lld-link -machine:arm64x -dll -noentry -out:out-native.dll loadconfig-arm64.obj loadconfig-arm64ec.obj \
+RUN:          arm64-helper.obj arm64ec-helper.obj test-arm64.obj test-arm64.lib test-arm64ec.lib -delayload:test.dll
+
+RUN: llvm-readobj --coff-imports out-native.dll | FileCheck --check-prefix=NATIVE-IMPORTS %s
+NATIVE-IMPORTS:      DelayImport {
+NATIVE-IMPORTS-NEXT:   Name: test.dll
+NATIVE-IMPORTS-NEXT:   Attributes: 0x1
+NATIVE-IMPORTS-NEXT:   ModuleHandle: 0x5080
+NATIVE-IMPORTS-NEXT:   ImportAddressTable: 0x5088
+NATIVE-IMPORTS-NEXT:   ImportNameTable: 0x3370
+NATIVE-IMPORTS...
[truncated]

@cjacek
Copy link
Contributor Author

cjacek commented Jan 27, 2025

Changes to DelayLoadContents::create may be difficult to read, as the patch alters indentation due to the forEachSymbol call. Using git diff -b can help make the changes easier to follow.

Besides the usual ctx.symtab to symtab change and the addition of ARM64X relocations, the patch also moves the creation of the tail merge chunk into the loop, ensuring it is only created when actually needed by the processed view. Since the order of symbols and addresses no longer matches, setLocation call has been moved to the main loop too, where the order is no longer relevant.

Copy link
Member

@mstorsjo mstorsjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall.

As this change contains quite a lot of refactorings of the existing code, it would maybe be nice to have this split up into two steps, one for doing most of the refactorings, and one for adding the EC specific behaviours, but I'm not sure if it is worth it.

cjacek added a commit to cjacek/llvm-project that referenced this pull request Jan 28, 2025
…olTable (NFC)

In preparation for ARM64X delay-load import support (llvm#124600).
cjacek added a commit that referenced this pull request Jan 28, 2025
…olTable (NFC) (#124729)

In preparation for ARM64X delay-load import support (#124600).
@cjacek
Copy link
Contributor Author

cjacek commented Jan 28, 2025

Thanks! I split refactoring into #124729 and pushed a rebase on top of that.

Copy link
Member

@mstorsjo mstorsjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still ok - looking much smaller now! One minor suggestion still for potential pre-refactoring.

Chunk *t = newThunkChunk(s, tm);
auto *a = make<DelayAddressChunk>(ctx, t);
addresses.push_back(a);
s->setLocation(a);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the moving of where the tail merge chunk is created, and this call to s->setLocation, also could be considered pre-refactorings?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I split it and merged both. Thanks!

cjacek added a commit to cjacek/llvm-project that referenced this pull request Jan 28, 2025
… the addresses vector

This change prepares for ARM64X delay-load imports support (llvm#124600). Delaying the setLocation
call is problematic on ARM64X because the order of addresses may not align with the order of symbols.
cjacek added a commit that referenced this pull request Jan 28, 2025
… the addresses vector (NFC) (#124736)

This change prepares for ARM64X delay-load imports support (#124600).
Delaying the `setLocation` call is problematic on ARM64X because the
order of addresses may not align with the order of symbols.
For each imported module, emit null-terminated native import entries, followed by
null-terminated EC entries. If a view lacks imports for a given module,
only terminators are emitted. Use ARM64X relocations to skip native entries in
the EC view.
@cjacek cjacek merged commit 3a51466 into llvm:main Jan 28, 2025
5 of 7 checks passed
@cjacek cjacek deleted the arm64x-delayload branch January 28, 2025 12:06
@llvm-ci
Copy link
Collaborator

llvm-ci commented Jan 28, 2025

LLVM Buildbot has detected a new failure on builder openmp-offload-amdgpu-runtime running on omp-vega20-0 while building lld at step 7 "Add check check-offload".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/30/builds/14756

Here is the relevant piece of the build log for the reference
Step 7 (Add check check-offload) failure: test (failure)
******************** TEST 'libomptarget :: amdgcn-amd-amdhsa :: offloading/schedule.c' FAILED ********************
Exit Code: 1

Command Output (stdout):
--
# RUN: at line 2
/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/clang -fopenmp    -I /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test -I /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src  -nogpulib -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib  -fopenmp-targets=amdgcn-amd-amdhsa /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/schedule.c -o /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/Output/schedule.c.tmp /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib/libomptarget.devicertl.a && /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/Output/schedule.c.tmp 2>&1 | /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/schedule.c
# executed command: /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/clang -fopenmp -I /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test -I /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -nogpulib -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib -fopenmp-targets=amdgcn-amd-amdhsa /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/schedule.c -o /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/Output/schedule.c.tmp /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib/libomptarget.devicertl.a
# note: command had no output on stdout or stderr
# executed command: /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/Output/schedule.c.tmp
# note: command had no output on stdout or stderr
# executed command: /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/schedule.c
# .---command stderr------------
# | /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/schedule.c:80:12: error: CHECK: expected string not found in input
# |  // CHECK: test no order OK
# |            ^
# | <stdin>:1:1: note: scanning from here
# | Fail to schedule in order.
# | ^
# | <stdin>:2:1: note: possible intended match here
# | test ordered OK
# | ^
# | 
# | Input file: <stdin>
# | Check file: /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/schedule.c
# | 
# | -dump-input=help explains the following input dump.
# | 
# | Input was:
# | <<<<<<
# |             1: Fail to schedule in order. 
# | check:80'0     X~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match found
# |             2: test ordered OK 
# | check:80'0     ~~~~~~~~~~~~~~~~
# | check:80'1     ?                possible intended match
# | >>>>>>
# `-----------------------------
# error: command failed with exit status: 1

--

********************


Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants