-
Notifications
You must be signed in to change notification settings - Fork 13.7k
[LLD][COFF] Implement support for hybrid IAT on ARM64X #124189
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -716,52 +716,176 @@ class ExportOrdinalChunk : public NonSectionChunk { | |
void IdataContents::create(COFFLinkerContext &ctx) { | ||
std::vector<std::vector<DefinedImportData *>> v = binImports(ctx, imports); | ||
|
||
// In hybrid images, EC and native code are usually very similar, | ||
// resulting in a highly similar set of imported symbols. Consequently, | ||
// their import tables can be shared, with ARM64X relocations handling any | ||
// differences. Identify matching import files used by EC and native code, and | ||
// merge them into a single hybrid import entry. | ||
if (ctx.hybridSymtab) { | ||
for (std::vector<DefinedImportData *> &syms : v) { | ||
std::vector<DefinedImportData *> hybridSyms; | ||
ImportFile *prev = nullptr; | ||
for (DefinedImportData *sym : syms) { | ||
ImportFile *file = sym->file; | ||
// At this stage, symbols are sorted by base name, ensuring that | ||
// compatible import files, if present, are adjacent. Check if the | ||
// current symbol's file imports the same symbol as the previously added | ||
// one (if any and if it was not already merged). Additionally, verify | ||
// that one of them is native while the other is EC. In rare cases, | ||
// separate matching import entries may exist within the same namespace, | ||
// which cannot be merged. | ||
if (!prev || file->isEC() == prev->isEC() || | ||
!file->isSameImport(prev)) { | ||
// We can't merge the import file, just add it to hybridSyms | ||
// and set prev to its file so that we can try to match the next | ||
// symbol. | ||
hybridSyms.push_back(sym); | ||
prev = file; | ||
continue; | ||
} | ||
|
||
// A matching symbol may appear in syms in any order. The native variant | ||
// exposes a subset of EC symbols and chunks, so always use the EC | ||
// variant as the hybrid import file. If the native file was already | ||
// added, replace it with the EC symbol in hybridSyms. Otherwise, the EC | ||
// variant is already pushed, so we can simply merge it. | ||
if (file->isEC()) { | ||
hybridSyms.pop_back(); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hmm, I feel unsure about this logic here, or about the assumptions it makes. Can you add more comments about how the logic of this loop works, and the assumptions? The comments currently do explain the highlevel goals of it, but I don't feel confident in the dedup logic. If there are both native and EC imports, are they ordered so that the native one comes first - is this assumed and guaranteed? I think what I may have a hard time wrapping my head around, is when the loop is written with the early-continue style (which may be policy and anyway generally is good structure) - the codepath we have here is essentially the special case, for Is the check for There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I'll try to improve these comments.
I don't think
Yes, I can change the code to avoid using an early continue if you prefer.
I think it's possible for
Since the export name is the same, There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Ok, now I see - if we're in the dedup codepath, we don't end up pushing the second symbol at all, if it was native. Perhaps we can make it even clearer with more comments? Like this:
I'm not entirely sure if it is needed, if we have more comments for understanding the code flow. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I've pushed a new version that adds a test for this. While experimenting, I noticed that MSVC is buggy when using EXPORTAS in this way. It handles the test I attached here correctly, but I was able to make it import incorrect symbols due to improper merging. It seems that MSVC compares only symbol names rather than import names, which might also explain the previously mentioned issues with NONAME handling. I didn't follow MSVC's approach here and instead kept what I believe to be the correct behavior. I also added more comments. |
||
hybridSyms.push_back(sym); | ||
} | ||
|
||
// Merge import files by storing their hybrid form in the corresponding | ||
// file class. | ||
prev->hybridFile = file; | ||
file->hybridFile = prev; | ||
prev = nullptr; // A hybrid import file cannot be merged again. | ||
} | ||
|
||
// Sort symbols by type: native-only files first, followed by merged | ||
// hybrid files, and then EC-only files. | ||
llvm::stable_sort(hybridSyms, | ||
[](DefinedImportData *a, DefinedImportData *b) { | ||
if (a->file->hybridFile) | ||
return !b->file->hybridFile && b->file->isEC(); | ||
return !a->file->isEC() && b->file->isEC(); | ||
}); | ||
syms = std::move(hybridSyms); | ||
} | ||
} | ||
|
||
// Create .idata contents for each DLL. | ||
for (std::vector<DefinedImportData *> &syms : v) { | ||
// Create lookup and address tables. If they have external names, | ||
// we need to create hintName chunks to store the names. | ||
// If they don't (if they are import-by-ordinals), we store only | ||
// ordinal values to the table. | ||
size_t base = lookups.size(); | ||
Chunk *lookupsTerminator = nullptr, *addressesTerminator = nullptr; | ||
for (DefinedImportData *s : syms) { | ||
uint16_t ord = s->getOrdinal(); | ||
HintNameChunk *hintChunk = nullptr; | ||
Chunk *lookupsChunk, *addressesChunk; | ||
|
||
if (s->getExternalName().empty()) { | ||
lookups.push_back(make<OrdinalOnlyChunk>(ctx, ord)); | ||
addresses.push_back(make<OrdinalOnlyChunk>(ctx, ord)); | ||
lookupsChunk = make<OrdinalOnlyChunk>(ctx, ord); | ||
addressesChunk = make<OrdinalOnlyChunk>(ctx, ord); | ||
} else { | ||
auto *c = make<HintNameChunk>(s->getExternalName(), ord); | ||
lookups.push_back(make<LookupChunk>(ctx, c)); | ||
addresses.push_back(make<LookupChunk>(ctx, c)); | ||
hints.push_back(c); | ||
hintChunk = make<HintNameChunk>(s->getExternalName(), ord); | ||
lookupsChunk = make<LookupChunk>(ctx, hintChunk); | ||
addressesChunk = make<LookupChunk>(ctx, hintChunk); | ||
hints.push_back(hintChunk); | ||
} | ||
|
||
if (s->file->impECSym) { | ||
// Detect the first EC-only import in the hybrid IAT. Emit null chunk | ||
// as a terminator for the native view, and add an ARM64X relocation to | ||
// replace it with the correct import for the EC view. | ||
// | ||
// Additionally, for MSVC compatibility, store the lookup and address | ||
// chunks and append them at the end of EC-only imports, where a null | ||
// terminator chunk would typically be placed. Since they appear after | ||
// the native terminator, they will be ignored in the native view. | ||
// In the EC view, they should act as terminators, so emit ZEROFILL | ||
// relocations overriding them. | ||
if (ctx.hybridSymtab && !lookupsTerminator && s->file->isEC() && | ||
!s->file->hybridFile) { | ||
lookupsTerminator = lookupsChunk; | ||
addressesTerminator = addressesChunk; | ||
lookupsChunk = make<NullChunk>(ctx); | ||
addressesChunk = make<NullChunk>(ctx); | ||
|
||
Arm64XRelocVal relocVal = hintChunk; | ||
if (!hintChunk) | ||
relocVal = (1ULL << 63) | ord; | ||
ctx.dynamicRelocs->add(IMAGE_DVRT_ARM64X_FIXUP_TYPE_VALUE, | ||
sizeof(uint64_t), lookupsChunk, relocVal); | ||
ctx.dynamicRelocs->add(IMAGE_DVRT_ARM64X_FIXUP_TYPE_VALUE, | ||
sizeof(uint64_t), addressesChunk, relocVal); | ||
ctx.dynamicRelocs->add(IMAGE_DVRT_ARM64X_FIXUP_TYPE_ZEROFILL, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm trying to form an understanding of the expected end state of things after this loop. At the end, we're going to have the regular IAT (
I presume the purpose of the null chunk in the middle of the regular IAT is to terminate it, so in native mode you only see the native-only and hybrid entries? After applying dynamic relocations, the null chunk is turned into the entry it was initially meant to be, so you see all of it - and we adjust the pointer to the start, so that it points at the first non-native entry. What's the idea and purpose of the lookupTerminator? In the native view, it's not visible, and after relocations in EC mode, it's turned into a null terminator. Why not just do a plain null terminator to begin with? Secondly, if I follow it correctly, the aux IAT is supposed to look like this:
I've forgotten the purpose/role of the AUX IAT. Doesn't it use null pointers as terminators - how does this work? Or does the delta for the start of the regular IAT also get applied to this one? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Yes, it's all correct.
I think you're correct, using a null chunk should work here. However, that's not what MSVC does. I'm not sure why; I can only speculate. My best guess is that a null chunk might confuse some applications that manually read the IAT (such as anti-cheat systems or DRM software). Following MSVC's approach ensures that the number of null chunks in Since this is just speculation, I'm open to changing it to null chunks if you think strictly following MSVC isn't necessary.
On ARM64EC, the regular IAT contains pointers to imported functions as exposed by PE exports. These may be x86 export thunks that perform FFS jumps to EC code, or they may simply be x86 functions. The purpose of the auxiliary IAT is to provide
The delta from the start of the IAT needs to match. The auxiliary IAT is referenced in CHPE metadata as an RVA to the entire table. The loader may need to associate its entries with those in the regular IAT, potentially using the same offset from the table's start as in the regular IAT. To ensure this works, we need to populate the auxiliary IAT with native-only imports, even though they are not used at runtime. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Ok, that's probably a good enough reason. But it's quite confusing when trying to make out how it works from reading the code (ok, figuring it out from MS link.exe probably isn't any more straightforward...), so it would be good to explain this with some comments. |
||
sizeof(uint64_t), lookupsTerminator); | ||
ctx.dynamicRelocs->add(IMAGE_DVRT_ARM64X_FIXUP_TYPE_ZEROFILL, | ||
sizeof(uint64_t), addressesTerminator); | ||
} | ||
|
||
lookups.push_back(lookupsChunk); | ||
addresses.push_back(addressesChunk); | ||
|
||
if (s->file->isEC()) { | ||
auto chunk = make<AuxImportChunk>(s->file); | ||
auxIat.push_back(chunk); | ||
s->file->impECSym->setLocation(chunk); | ||
|
||
chunk = make<AuxImportChunk>(s->file); | ||
auxIatCopy.push_back(chunk); | ||
s->file->auxImpCopySym->setLocation(chunk); | ||
} else if (ctx.hybridSymtab) { | ||
// Fill the auxiliary IAT with null chunks for native-only imports. | ||
auxIat.push_back(make<NullChunk>(ctx)); | ||
auxIatCopy.push_back(make<NullChunk>(ctx)); | ||
} | ||
} | ||
// Terminate with null values. | ||
lookups.push_back(make<NullChunk>(ctx)); | ||
addresses.push_back(make<NullChunk>(ctx)); | ||
if (ctx.config.machine == ARM64EC) { | ||
lookups.push_back(lookupsTerminator ? lookupsTerminator | ||
: make<NullChunk>(ctx)); | ||
addresses.push_back(addressesTerminator ? addressesTerminator | ||
: make<NullChunk>(ctx)); | ||
if (ctx.symtabEC) { | ||
auxIat.push_back(make<NullChunk>(ctx)); | ||
auxIatCopy.push_back(make<NullChunk>(ctx)); | ||
} | ||
|
||
for (int i = 0, e = syms.size(); i < e; ++i) | ||
for (int i = 0, e = syms.size(); i < e; ++i) { | ||
syms[i]->setLocation(addresses[base + i]); | ||
if (syms[i]->file->hybridFile) | ||
syms[i]->file->hybridFile->impSym->setLocation(addresses[base + i]); | ||
} | ||
|
||
// Create the import table header. | ||
dllNames.push_back(make<StringChunk>(syms[0]->getDLLName())); | ||
auto *dir = make<ImportDirectoryChunk>(dllNames.back()); | ||
dir->lookupTab = lookups[base]; | ||
dir->addressTab = addresses[base]; | ||
dirs.push_back(dir); | ||
|
||
if (ctx.hybridSymtab) { | ||
// If native-only imports exist, they will appear as a prefix to all | ||
// imports. Emit ARM64X relocations to skip them in the EC view. | ||
uint32_t nativeOnly = | ||
llvm::find_if(syms, | ||
[](DefinedImportData *s) { return s->file->isEC(); }) - | ||
syms.begin(); | ||
if (nativeOnly) { | ||
ctx.dynamicRelocs->add( | ||
IMAGE_DVRT_ARM64X_FIXUP_TYPE_DELTA, 0, | ||
Arm64XRelocVal( | ||
dir, offsetof(ImportDirectoryTableEntry, ImportLookupTableRVA)), | ||
nativeOnly * sizeof(uint64_t)); | ||
ctx.dynamicRelocs->add( | ||
IMAGE_DVRT_ARM64X_FIXUP_TYPE_DELTA, 0, | ||
Arm64XRelocVal(dir, offsetof(ImportDirectoryTableEntry, | ||
ImportAddressTableRVA)), | ||
nativeOnly * sizeof(uint64_t)); | ||
} | ||
} | ||
} | ||
// Add null terminator. | ||
dirs.push_back(make<NullChunk>(sizeof(ImportDirectoryTableEntry), 4)); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you trigger this assert with quirky user input, or is it a true linker bug if we hit it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we encounter this, it's a linker bug; the delta must always be a multiple of 4. If other deltas are ever required, we will need to use
IMAGE_DVRT_ARM64X_FIXUP_TYPE_VALUE
instead.