[lld] check cache before real_path in loadDylib #140791

Merged
merged 1 commit into from
May 29, 2025
26 changes: 20 additions & 6 deletions lld/MachO/DriverUtils.cpp
@@ -229,19 +229,31 @@ static DenseMap<CachedHashStringRef, DylibFile *> loadedDylibs;

DylibFile *macho::loadDylib(MemoryBufferRef mbref, DylibFile *umbrella,
bool isBundleLoader, bool explicitlyLinked) {
// Frameworks can be found from different symlink paths, so resolve
// symlinks before looking up in the dylib cache.
SmallString<128> realPath;
std::error_code err = fs::real_path(mbref.getBufferIdentifier(), realPath);
CachedHashStringRef path(!err ? uniqueSaver().save(StringRef(realPath))
: mbref.getBufferIdentifier());
CachedHashStringRef path(mbref.getBufferIdentifier());
DylibFile *&file = loadedDylibs[path];
if (file) {
if (explicitlyLinked)
file->setExplicitlyLinked();
return file;
}

// Frameworks can be found from different symlink paths, so resolve
// symlinks and look up in the dylib cache.
DylibFile *&realfile = file;
SmallString<128> realPath;
std::error_code err = fs::real_path(mbref.getBufferIdentifier(), realPath);
if (!err) {
CachedHashStringRef resolvedPath(uniqueSaver().save(StringRef(realPath)));
Contributor

Suggested change
CachedHashStringRef resolvedPath(uniqueSaver().save(StringRef(realPath)));
CachedHashStringRef resolvedPath(uniqueSaver().save(realPath.str()));

realfile = loadedDylibs[resolvedPath];
if (realfile) {
if (explicitlyLinked)
realfile->setExplicitlyLinked();

file = realfile;
return realfile;
}
}

DylibFile *newFile;
file_magic magic = identify_magic(mbref.getBuffer());
if (magic == file_magic::tapi_file) {
@@ -253,6 +265,7 @@ DylibFile *macho::loadDylib(MemoryBufferRef mbref, DylibFile *umbrella,
}
file =
make<DylibFile>(**result, umbrella, isBundleLoader, explicitlyLinked);
realfile = file;
Contributor

In the common case we have

DylibFile *&file = loadedDylibs[path];
DylibFile *&realfile = file;

Then what happens on this line? It does a load and a store to the same address? I'm sure it does the right thing it just looks funny to me.

Also, if we set file later in the future, how can we make sure to not forget to also set realfile? Is there a better way of doing this?
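
The aliasing described in this comment can be reproduced outside lld. The sketch below is only an illustration under stated assumptions: `std::unordered_map` stands in for `DenseMap`, `DylibFile` is a hypothetical empty struct, and the paths are made up. It shows that two references bound to one map slot share an address, that assigning through a reference writes into that slot instead of rebinding the reference, and that `realfile = file` is then a self-assignment.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

struct DylibFile {}; // hypothetical stand-in for lld's DylibFile

// Returns true if every aliasing property described in the comments holds.
bool referenceAliasingDemo() {
  // Assumption: std::unordered_map replaces DenseMap here because it keeps
  // references to elements valid across insertions.
  std::unordered_map<std::string, DylibFile *> cache;
  DylibFile resolved;
  cache["/real/Foo"] = &resolved;

  DylibFile *&file = cache["/link/Foo"]; // new slot, value-initialized to nullptr
  DylibFile *&realfile = file;           // second name for the same slot
  bool sameSlot = (&realfile == &file);

  // A C++ reference cannot be rebound. This copies the pointer stored under
  // "/real/Foo" into the "/link/Foo" slot; realfile still names that slot.
  realfile = cache["/real/Foo"];
  bool wroteThrough = (file == &resolved) && (&realfile == &file);

  // With both names bound to one slot, this is a self-assignment.
  realfile = file;
  bool unchanged = (cache["/link/Foo"] == &resolved);

  return sameSlot && wroteThrough && unchanged;
}
```

Calling `referenceAliasingDemo()` returns `true` when the three checks pass.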

Contributor Author (@rmaz, May 22, 2025)

> Then what happens on this line? It does a load and a store to the same address?

That's what I thought would happen, yes.

> Also, if we set file later in the future, how can we make sure to not forget to also set realfile? Is there a better way of doing this?

What would you suggest? Ultimately we have 2 cache pointers, that may or may not point to the same thing, and we need to update them.
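
The "two cache entries" idea can be sketched independently of lld. The code below is a hedged illustration, not the PR's implementation: `DylibCache`, `Dylib`, the injected `Resolver` callback (standing in for `fs::real_path`), and the example paths are all hypothetical. It looks up the path as given first, resolves only on a miss, and then records the result under both keys.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

struct Dylib { std::string realPath; }; // hypothetical stand-in

class DylibCache {
public:
  using Resolver = std::function<std::string(const std::string &)>;
  explicit DylibCache(Resolver r) : resolve(std::move(r)) {}

  Dylib *load(const std::string &path) {
    // Fast path: the path exactly as given (possibly a symlink) was seen before.
    auto it = byPath.find(path);
    if (it != byPath.end())
      return it->second;

    // Slow path: resolve the path, then look up or create the resolved entry.
    ++resolveCalls;
    std::string real = resolve(path);
    Dylib *&realSlot = byPath[real];
    if (!realSlot) {
      storage.push_back(std::make_unique<Dylib>(Dylib{real}));
      realSlot = storage.back().get();
    }
    Dylib *result = realSlot;
    byPath[path] = result; // also cache under the unresolved key
    return result;
  }

  int resolveCalls = 0; // how often the slow path ran

private:
  Resolver resolve;
  std::vector<std::unique_ptr<Dylib>> storage;
  std::unordered_map<std::string, Dylib *> byPath;
};

// Loads one library through a symlink twice and once via its resolved path;
// returns the number of resolver calls that were needed.
int twoLevelCacheDemo() {
  DylibCache cache([](const std::string &p) {
    return p == "Foo.framework/Foo" ? std::string("Foo.framework/Versions/A/Foo")
                                    : p;
  });
  Dylib *a = cache.load("Foo.framework/Foo");
  Dylib *b = cache.load("Foo.framework/Foo");
  Dylib *c = cache.load("Foo.framework/Versions/A/Foo");
  assert(a == b && b == c);
  return cache.resolveCalls;
}
```

In this sketch only the first `load` reaches the resolver; the later two are served by the first lookup.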

Contributor

What if we just use realfile everywhere? In the case when the pointers are the same, everything works as expected. But when they are different, the first cache lookup is basically always a miss and will rely on the second resolved path lookup to find the result. In theory, it shouldn't be a huge hit, but it's still a minor regression from the current implementation. But, it makes things easier to reason about without needing to set both file and realfile?

Contributor Author

That would mean that we would never cache symlinks, only their resolved paths, which is the unhappy path. That would be a significant regression, as a large number of load commands are symlinks (e.g., Foundation.framework/Foundation -> Foundation.framework/Versions/A/Foundation).

I also disagree with the idea of regressing the performance to reduce the amount of code by 2 lines.

The only alternative I can see is to have realfile default to nullptr and have the two setters change to:

if (realfile)
  realfile = file;

Can't say I prefer it though.
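
One possible reading of this alternative is sketched below. It assumes `realfile` becomes a pointer to the second cache slot (rather than a reference) so that it can be null; `storeBoth`, the `Cache` alias, and the use of `std::unordered_map` in place of `DenseMap` are hypothetical choices for illustration and are not taken from the PR.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

struct DylibFile {}; // hypothetical stand-in for lld's DylibFile

using Cache = std::unordered_map<std::string, DylibFile *>;

// Hypothetical helper: store newFile under `path`, and also under
// `resolvedPath` when resolution produced a different, non-empty path.
DylibFile *storeBoth(Cache &cache, const std::string &path,
                     const std::string &resolvedPath, DylibFile *newFile) {
  // std::unordered_map keeps this reference valid across the insertion
  // below; that guarantee is an assumption that may not hold for DenseMap.
  DylibFile *&file = cache[path];
  DylibFile **realfile = nullptr; // null means "no second slot to update"
  if (!resolvedPath.empty() && resolvedPath != path)
    realfile = &cache[resolvedPath];

  file = newFile;
  if (realfile)
    *realfile = file; // guarded setter, written through the pointer
  return file;
}

// Returns true if both the two-slot and one-slot cases behave as described.
bool storeBothDemo() {
  DylibFile f;
  Cache two;
  bool stored = storeBoth(two, "link", "real", &f) == &f;
  bool bothSet = two["link"] == &f && two["real"] == &f;

  Cache one;
  storeBoth(one, "link", "", &f); // resolution failed: only one slot written
  return stored && bothSet && one.size() == 1;
}
```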

Contributor (@drodriguez, May 23, 2025)

Would something like llvm::make_scope_exit([&]{ realfile = file; }) be OK to keep things organized? Would that work?

Edit: or add a single line after this if {} else {} block. I am scared that it will not be obvious there if something changes in the if {} else {} block, though.
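
The scope-exit idea can be sketched as follows. To stay self-contained, the code uses a tiny hypothetical `ScopeExit` class in place of `llvm::make_scope_exit` (declared in `llvm/ADT/ScopeExit.h`); `loadInto`, its parameters, and `DylibFile` are likewise made up for illustration. The point is that the sync `realfile = file` runs once when the function returns, whichever branch assigned `file`.

```cpp
#include <cassert>
#include <utility>

// Hypothetical minimal stand-in for llvm::make_scope_exit: runs the stored
// callable when the object goes out of scope.
template <typename F> class ScopeExit {
public:
  explicit ScopeExit(F f) : fn(std::move(f)) {}
  ScopeExit(const ScopeExit &) = delete;
  ScopeExit &operator=(const ScopeExit &) = delete;
  ~ScopeExit() { fn(); }

private:
  F fn;
};

struct DylibFile {}; // hypothetical stand-in for lld's DylibFile

// Hypothetical loader body: each branch sets only `file`; the guard copies
// the final value into `realfile` on every exit from the function.
DylibFile *loadInto(DylibFile *&file, DylibFile *&realfile,
                    DylibFile *tapiFile, DylibFile *machoFile, bool isTapi) {
  auto syncFn = [&] { realfile = file; };
  ScopeExit<decltype(syncFn)> sync(syncFn);

  if (isTapi)
    file = tapiFile;
  else
    file = machoFile;
  return file; // the guard's destructor runs after the return value is read
}

// Returns true if both branches leave file and realfile in sync.
bool scopeExitDemo() {
  DylibFile tapi, macho;
  DylibFile *file = nullptr, *realfile = nullptr;
  bool first = loadInto(file, realfile, &tapi, &macho, true) == &tapi &&
               file == &tapi && realfile == &tapi;
  bool second = loadInto(file, realfile, &tapi, &macho, false) == &macho &&
                file == &macho && realfile == &macho;
  return first && second;
}
```

A trade-off of this approach is that the assignment is no longer visible next to each `file = ...` statement, which is the readability concern raised in the edit above.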


// parseReexports() can recursively call loadDylib(). That's fine since
// we wrote the DylibFile we just loaded to the loadDylib cache via the
@@ -268,6 +281,7 @@ DylibFile *macho::loadDylib(MemoryBufferRef mbref, DylibFile *umbrella,
magic == file_magic::macho_executable ||
magic == file_magic::macho_bundle);
file = make<DylibFile>(mbref, umbrella, isBundleLoader, explicitlyLinked);
realfile = file;

// parseLoadCommands() can also recursively call loadDylib(). See comment
// in previous block for why this means we must copy `file` here.