Skip to content

[LLD][COFF] Implement support for hybrid IAT on ARM64X #124189

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 26, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
27 changes: 23 additions & 4 deletions lld/COFF/Chunks.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1172,11 +1172,12 @@ uint64_t Arm64XRelocVal::get() const {

size_t Arm64XDynamicRelocEntry::getSize() const {
switch (type) {
case IMAGE_DVRT_ARM64X_FIXUP_TYPE_ZEROFILL:
return sizeof(uint16_t); // Just a header.
case IMAGE_DVRT_ARM64X_FIXUP_TYPE_VALUE:
return sizeof(uint16_t) + size; // A header and a payload.
case IMAGE_DVRT_ARM64X_FIXUP_TYPE_DELTA:
case IMAGE_DVRT_ARM64X_FIXUP_TYPE_ZEROFILL:
llvm_unreachable("unsupported type");
return 2 * sizeof(uint16_t); // A header and a delta.
}
llvm_unreachable("invalid type");
}
Expand All @@ -1186,6 +1187,9 @@ void Arm64XDynamicRelocEntry::writeTo(uint8_t *buf) const {
*out = (offset.get() & 0xfff) | (type << 12);

switch (type) {
case IMAGE_DVRT_ARM64X_FIXUP_TYPE_ZEROFILL:
*out |= ((bit_width(size) - 1) << 14); // Encode the size.
break;
case IMAGE_DVRT_ARM64X_FIXUP_TYPE_VALUE:
*out |= ((bit_width(size) - 1) << 14); // Encode the size.
switch (size) {
Expand All @@ -1203,8 +1207,23 @@ void Arm64XDynamicRelocEntry::writeTo(uint8_t *buf) const {
}
break;
case IMAGE_DVRT_ARM64X_FIXUP_TYPE_DELTA:
case IMAGE_DVRT_ARM64X_FIXUP_TYPE_ZEROFILL:
llvm_unreachable("unsupported type");
int delta = value.get();
// Negative offsets use a sign bit in the header.
if (delta < 0) {
*out |= 1 << 14;
delta = -delta;
}
// Depending on the value, the delta is encoded with a shift of 2 or 3 bits.
if (delta & 7) {
assert(!(delta & 3));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you trigger this assert with quirky user input, or is it a true linker bug if we hit it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we encounter this, it's a linker bug; the delta must always be a multiple of 4. If other deltas are ever required, we will need to use IMAGE_DVRT_ARM64X_FIXUP_TYPE_VALUE instead.

delta >>= 2;
} else {
*out |= (1 << 15);
delta >>= 3;
}
out[1] = delta;
assert(!(delta & ~0xffff));
break;
}
}

Expand Down
146 changes: 135 additions & 11 deletions lld/COFF/DLL.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -716,52 +716,176 @@ class ExportOrdinalChunk : public NonSectionChunk {
void IdataContents::create(COFFLinkerContext &ctx) {
std::vector<std::vector<DefinedImportData *>> v = binImports(ctx, imports);

// In hybrid images, EC and native code are usually very similar,
// resulting in a highly similar set of imported symbols. Consequently,
// their import tables can be shared, with ARM64X relocations handling any
// differences. Identify matching import files used by EC and native code, and
// merge them into a single hybrid import entry.
if (ctx.hybridSymtab) {
for (std::vector<DefinedImportData *> &syms : v) {
std::vector<DefinedImportData *> hybridSyms;
ImportFile *prev = nullptr;
for (DefinedImportData *sym : syms) {
ImportFile *file = sym->file;
// At this stage, symbols are sorted by base name, ensuring that
// compatible import files, if present, are adjacent. Check if the
// current symbol's file imports the same symbol as the previously added
// one (if any and if it was not already merged). Additionally, verify
// that one of them is native while the other is EC. In rare cases,
// separate matching import entries may exist within the same namespace,
// which cannot be merged.
if (!prev || file->isEC() == prev->isEC() ||
!file->isSameImport(prev)) {
// We can't merge the import file, just add it to hybridSyms
// and set prev to its file so that we can try to match the next
// symbol.
hybridSyms.push_back(sym);
prev = file;
continue;
}

// A matching symbol may appear in syms in any order. The native variant
// exposes a subset of EC symbols and chunks, so always use the EC
// variant as the hybrid import file. If the native file was already
// added, replace it with the EC symbol in hybridSyms. Otherwise, the EC
// variant is already pushed, so we can simply merge it.
if (file->isEC()) {
hybridSyms.pop_back();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I feel unsure about this logic here, or about the assumptions it makes. Can you add more comments about how the logic of this loop works, and the assumptions? The comments currently do explain the highlevel goals of it, but I don't feel confident in the dedup logic.

If there are both native and EC imports, are they ordered so that the native one comes first - is this assumed and guaranteed?

I think what I may have a hard time wrapping my head around, is when the loop is written with the early-continue style (which may be policy and anyway generally is good structure) - the codepath we have here is essentially the special case, for prev && file->isEC() != prev->isEC() && file->isSameImport(prev), i.e. deduplication. With that bit clarified for me, I have an easier time to understand the rest of the loop.

Is the check for isEC() required for that aspect of the deduplication? Having isSameImport() be true while file->isEC() == prev->isEC() seems unexpected. Or is it only included as a quicker way to decide this case without needing to do string comparisons?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I feel unsure about this logic here, or about the assumptions it makes. Can you add more comments about how the logic of this loop works, and the assumptions? The comments currently do explain the highlevel
goals of it, but I don't feel confident in the dedup logic.

I'll try to improve these comments.

If there are both native and EC imports, are they ordered so that the native one comes first - is this assumed and guaranteed?

I don't think binImports guarantees this; the order may vary. That's why I check isEC on the new symbol and swap it with the one stored in hybridSyms if the native symbol appears first.

I think what I may have a hard time wrapping my head around, is when the loop is written with the early-continue style (which may be policy and anyway generally is good structure) - the codepath we have here is essentially the special case, for prev && file->isEC() != prev->isEC() && file->isSameImport(prev), i.e. deduplication. With that bit clarified for me, I have an easier time to understand the rest of the loop.

Yes, I can change the code to avoid using an early continue if you prefer.

Is the check for isEC() required for that aspect of the deduplication? Having isSameImport() be true while file->isEC() == prev->isEC() seems unexpected. Or is it only included as a quicker way to decide this case without needing to do string comparisons?

I think it's possible for isSameImport to return true for two EC or two native symbols in cases like this:

func1
func2 EXPORTAS func1

Since the export name is the same, isSameImport() will return true, and they might appear next to each other. I'll add a test for this scenario.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think binImports guarantees this; the order may vary. That's why I check isEC on the new symbol and swap it with the one stored in hybridSyms if the native symbol appears first.

Ok, now I see - if we're in the dedup codepath, we don't end up pushing the second symbol at all, if it was native.

Perhaps we can make it even clearer with more comments? Like this:

if (file->isEC()) {
  // Native symbol was pushed to hybridSyms before, replace it with the EC one
  pop_back()
  push_back()
} else {
  // EC symbol was pushed to hybridSyms before, keep it as is and don't push this one
}

Yes, I can change the code to avoid using an early continue if you prefer.

I'm not entirely sure if it is needed, if we have more comments for understanding the code flow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the export name is the same, isSameImport() will return true, and they might appear next to each other. I'll add a test for this scenario.

I've pushed a new version that adds a test for this. While experimenting, I noticed that MSVC is buggy when using EXPORTAS in this way. It handles the test I attached here correctly, but I was able to make it import incorrect symbols due to improper merging. It seems that MSVC compares only symbol names rather than import names, which might also explain the previously mentioned issues with NONAME handling. I didn't follow MSVC's approach here and instead kept what I believe to be the correct behavior.

I also added more comments.

hybridSyms.push_back(sym);
}

// Merge import files by storing their hybrid form in the corresponding
// file class.
prev->hybridFile = file;
file->hybridFile = prev;
prev = nullptr; // A hybrid import file cannot be merged again.
}

// Sort symbols by type: native-only files first, followed by merged
// hybrid files, and then EC-only files.
llvm::stable_sort(hybridSyms,
[](DefinedImportData *a, DefinedImportData *b) {
if (a->file->hybridFile)
return !b->file->hybridFile && b->file->isEC();
return !a->file->isEC() && b->file->isEC();
});
syms = std::move(hybridSyms);
}
}

// Create .idata contents for each DLL.
for (std::vector<DefinedImportData *> &syms : v) {
// Create lookup and address tables. If they have external names,
// we need to create hintName chunks to store the names.
// If they don't (if they are import-by-ordinals), we store only
// ordinal values to the table.
size_t base = lookups.size();
Chunk *lookupsTerminator = nullptr, *addressesTerminator = nullptr;
for (DefinedImportData *s : syms) {
uint16_t ord = s->getOrdinal();
HintNameChunk *hintChunk = nullptr;
Chunk *lookupsChunk, *addressesChunk;

if (s->getExternalName().empty()) {
lookups.push_back(make<OrdinalOnlyChunk>(ctx, ord));
addresses.push_back(make<OrdinalOnlyChunk>(ctx, ord));
lookupsChunk = make<OrdinalOnlyChunk>(ctx, ord);
addressesChunk = make<OrdinalOnlyChunk>(ctx, ord);
} else {
auto *c = make<HintNameChunk>(s->getExternalName(), ord);
lookups.push_back(make<LookupChunk>(ctx, c));
addresses.push_back(make<LookupChunk>(ctx, c));
hints.push_back(c);
hintChunk = make<HintNameChunk>(s->getExternalName(), ord);
lookupsChunk = make<LookupChunk>(ctx, hintChunk);
addressesChunk = make<LookupChunk>(ctx, hintChunk);
hints.push_back(hintChunk);
}

if (s->file->impECSym) {
// Detect the first EC-only import in the hybrid IAT. Emit null chunk
// as a terminator for the native view, and add an ARM64X relocation to
// replace it with the correct import for the EC view.
//
// Additionally, for MSVC compatibility, store the lookup and address
// chunks and append them at the end of EC-only imports, where a null
// terminator chunk would typically be placed. Since they appear after
// the native terminator, they will be ignored in the native view.
// In the EC view, they should act as terminators, so emit ZEROFILL
// relocations overriding them.
if (ctx.hybridSymtab && !lookupsTerminator && s->file->isEC() &&
!s->file->hybridFile) {
lookupsTerminator = lookupsChunk;
addressesTerminator = addressesChunk;
lookupsChunk = make<NullChunk>(ctx);
addressesChunk = make<NullChunk>(ctx);

Arm64XRelocVal relocVal = hintChunk;
if (!hintChunk)
relocVal = (1ULL << 63) | ord;
ctx.dynamicRelocs->add(IMAGE_DVRT_ARM64X_FIXUP_TYPE_VALUE,
sizeof(uint64_t), lookupsChunk, relocVal);
ctx.dynamicRelocs->add(IMAGE_DVRT_ARM64X_FIXUP_TYPE_VALUE,
sizeof(uint64_t), addressesChunk, relocVal);
ctx.dynamicRelocs->add(IMAGE_DVRT_ARM64X_FIXUP_TYPE_ZEROFILL,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to form an understanding of the expected end state of things after this loop.

At the end, we're going to have the regular IAT (lookups here) look like this:

[native]...[native][hybrid]...[hybrid][null chunk][second EC-only]...[more EC-only][lookupTerminator]

I presume the purpose of the null chunk in the middle of the regular IAT is to terminate it, so in native mode you only see the native-only and hybrid entries? After applying dynamic relocations, the null chunk is turned into the entry it was initially meant to be, so you see all of it - and we adjust the pointer to the start, so that it points at the first non-native entry.

What's the idea and purpose of the lookupTerminator? In the native view, it's not visible, and after relocations in EC mode, it's turned into a null terminator. Why not just do a plain null terminator to begin with?

Secondly, if I follow it correctly, the aux IAT is supposed to look like this:

[null chunks for native-only][aux entries for hybrid imports][aux entries for ec-only imports][null chunk]

I've forgotten the purpose/role of the AUX IAT. Doesn't it use null pointers as terminators - how does this work? Or does the delta for the start of the regular IAT also get applied to this one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to form an understanding of the expected end state of things after this loop.

At the end, we're going to have the regular IAT (lookups here) look like this:

[native]...[native][hybrid]...[hybrid][null chunk][second EC-only]...[more EC-only][lookupTerminator]

I presume the purpose of the null chunk in the middle of the regular IAT is to terminate it, so in native mode you only see the native-only and hybrid entries? After applying dynamic relocations, the null chunk is turned into the entry it was initially meant to be, so you see all of it - and we adjust the pointer to the start, so that it points at the first non-native entry.

Yes, it's all correct.

What's the idea and purpose of the lookupTerminator? In the native view, it's not visible, and after relocations in EC mode, it's turned into a null terminator. Why not just do a plain null terminator to begin with?

Secondly, if I follow it correctly, the aux IAT is supposed to look like this:

[null chunks for native-only][aux entries for hybrid imports][aux entries for ec-only imports][null chunk]

I think you're correct, using a null chunk should work here. However, that's not what MSVC does. I'm not sure why; I can only speculate. My best guess is that a null chunk might confuse some applications that manually read the IAT (such as anti-cheat systems or DRM software). Following MSVC's approach ensures that the number of null chunks in .idata$4 and .idata$5 matches the number of different imported DLLs, and that all hint chunks are referenced from the IAT in both the native and EC views, even if they are referenced from an entry that is inactive in the native view.

Since this is just speculation, I'm open to changing it to null chunks if you think strictly following MSVC isn't necessary.

I've forgotten the purpose/role of the AUX IAT.

On ARM64EC, the regular IAT contains pointers to imported functions as exposed by PE exports. These may be x86 export thunks that perform FFS jumps to EC code, or they may simply be x86 functions. The purpose of the auxiliary IAT is to provide __imp symbols in a form that can be directly called from EC code (possibly by pointing to a thunk that performs a runtime check on the regular IAT pointer).

Doesn't it use null pointers as terminators - how does this work? Or does the delta for the start of the regular IAT also get applied to this one?

The delta from the start of the IAT needs to match. The auxiliary IAT is referenced in CHPE metadata as an RVA to the entire table. The loader may need to associate its entries with those in the regular IAT, potentially using the same offset from the table's start as in the regular IAT. To ensure this works, we need to populate the auxiliary IAT with native-only imports, even though they are not used at runtime.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're correct, using a null chunk should work here. However, that's not what MSVC does. I'm not sure why; I can only speculate. My best guess is that a null chunk might confuse some applications that manually read the IAT (such as anti-cheat systems or DRM software). Following MSVC's approach ensures that the number of null chunks in .idata$4 and .idata$5 matches the number of different imported DLLs, and that all hint chunks are referenced from the IAT in both the native and EC views, even if they are referenced from an entry that is inactive in the native view.

Since this is just speculation, I'm open to changing it to null chunks if you think strictly following MSVC isn't necessary.

Ok, that's probably a good enough reason. But it's quite confusing when trying to make out how it works from reading the code (ok, figuring it out from MS link.exe probably isn't any more straightforward...), so it would be good to explain this with some comments.

sizeof(uint64_t), lookupsTerminator);
ctx.dynamicRelocs->add(IMAGE_DVRT_ARM64X_FIXUP_TYPE_ZEROFILL,
sizeof(uint64_t), addressesTerminator);
}

lookups.push_back(lookupsChunk);
addresses.push_back(addressesChunk);

if (s->file->isEC()) {
auto chunk = make<AuxImportChunk>(s->file);
auxIat.push_back(chunk);
s->file->impECSym->setLocation(chunk);

chunk = make<AuxImportChunk>(s->file);
auxIatCopy.push_back(chunk);
s->file->auxImpCopySym->setLocation(chunk);
} else if (ctx.hybridSymtab) {
// Fill the auxiliary IAT with null chunks for native-only imports.
auxIat.push_back(make<NullChunk>(ctx));
auxIatCopy.push_back(make<NullChunk>(ctx));
}
}
// Terminate with null values.
lookups.push_back(make<NullChunk>(ctx));
addresses.push_back(make<NullChunk>(ctx));
if (ctx.config.machine == ARM64EC) {
lookups.push_back(lookupsTerminator ? lookupsTerminator
: make<NullChunk>(ctx));
addresses.push_back(addressesTerminator ? addressesTerminator
: make<NullChunk>(ctx));
if (ctx.symtabEC) {
auxIat.push_back(make<NullChunk>(ctx));
auxIatCopy.push_back(make<NullChunk>(ctx));
}

for (int i = 0, e = syms.size(); i < e; ++i)
for (int i = 0, e = syms.size(); i < e; ++i) {
syms[i]->setLocation(addresses[base + i]);
if (syms[i]->file->hybridFile)
syms[i]->file->hybridFile->impSym->setLocation(addresses[base + i]);
}

// Create the import table header.
dllNames.push_back(make<StringChunk>(syms[0]->getDLLName()));
auto *dir = make<ImportDirectoryChunk>(dllNames.back());
dir->lookupTab = lookups[base];
dir->addressTab = addresses[base];
dirs.push_back(dir);

if (ctx.hybridSymtab) {
// If native-only imports exist, they will appear as a prefix to all
// imports. Emit ARM64X relocations to skip them in the EC view.
uint32_t nativeOnly =
llvm::find_if(syms,
[](DefinedImportData *s) { return s->file->isEC(); }) -
syms.begin();
if (nativeOnly) {
ctx.dynamicRelocs->add(
IMAGE_DVRT_ARM64X_FIXUP_TYPE_DELTA, 0,
Arm64XRelocVal(
dir, offsetof(ImportDirectoryTableEntry, ImportLookupTableRVA)),
nativeOnly * sizeof(uint64_t));
ctx.dynamicRelocs->add(
IMAGE_DVRT_ARM64X_FIXUP_TYPE_DELTA, 0,
Arm64XRelocVal(dir, offsetof(ImportDirectoryTableEntry,
ImportAddressTableRVA)),
nativeOnly * sizeof(uint64_t));
}
}
}
// Add null terminator.
dirs.push_back(make<NullChunk>(sizeof(ImportDirectoryTableEntry), 4));
Expand Down
14 changes: 10 additions & 4 deletions lld/COFF/InputFiles.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1129,15 +1129,21 @@ void ObjFile::enqueuePdbFile(StringRef path, ObjFile *fromFile) {
}

ImportFile::ImportFile(COFFLinkerContext &ctx, MemoryBufferRef m)
: InputFile(ctx.symtab, ImportKind, m), live(!ctx.config.doGC) {}
: InputFile(ctx.getSymtab(getMachineType(m)), ImportKind, m),
live(!ctx.config.doGC) {}

MachineTypes ImportFile::getMachineType() const {
MachineTypes ImportFile::getMachineType(MemoryBufferRef m) {
uint16_t machine =
reinterpret_cast<const coff_import_header *>(mb.getBufferStart())
->Machine;
reinterpret_cast<const coff_import_header *>(m.getBufferStart())->Machine;
return MachineTypes(machine);
}

bool ImportFile::isSameImport(const ImportFile *other) const {
if (!externalName.empty())
return other->externalName == externalName;
return hdr->OrdinalHint == other->hdr->OrdinalHint;
}

ImportThunkChunk *ImportFile::makeImportThunk() {
switch (hdr->Machine) {
case AMD64:
Expand Down
6 changes: 5 additions & 1 deletion lld/COFF/InputFiles.h
Original file line number Diff line number Diff line change
Expand Up @@ -351,11 +351,15 @@ class ImportFile : public InputFile {
explicit ImportFile(COFFLinkerContext &ctx, MemoryBufferRef m);

static bool classof(const InputFile *f) { return f->kind() == ImportKind; }
MachineTypes getMachineType() const override;
MachineTypes getMachineType() const override { return getMachineType(mb); }
static MachineTypes getMachineType(MemoryBufferRef m);
bool isSameImport(const ImportFile *other) const;
bool isEC() const { return impECSym != nullptr; }

DefinedImportData *impSym = nullptr;
Defined *thunkSym = nullptr;
ImportThunkChunkARM64EC *impchkThunk = nullptr;
ImportFile *hybridFile = nullptr;
std::string dllName;

private:
Expand Down
Loading
Loading