Skip to content

[clang][FMV] Direct-call FMV callees from FMV callers #80093

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 6 commits into from

Conversation

jroelofs
Copy link
Contributor

@jroelofs jroelofs commented Jan 31, 2024

… when there is a callee with a matching feature set, and no other higher priority callee. This optimization helps the inliner see past the ifunc+resolver to the callee that we know it will always land on.

This is a conservative implementation of: #71714

… callers

... when there is a callee with a matching feature set, and no other higher
priority callee.  This optimization helps the inliner see past the
ifunc+resolver to the callee that we know it will always land on.

This is a conservative implementation of: llvm#71714
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:codegen IR generation bugs: mangling, exceptions, etc. labels Jan 31, 2024
@llvmbot
Copy link
Member

llvmbot commented Jan 31, 2024

@llvm/pr-subscribers-clang-codegen

@llvm/pr-subscribers-clang

Author: Jon Roelofs (jroelofs)

Changes

… callers when there is a callee with a matching feature set, and no other higher priority callee. This optimization helps the inliner see past the ifunc+resolver to the callee that we know it will always land on.

This is a conservative implementation of: #71714


Full diff: https://github.com/llvm/llvm-project/pull/80093.diff

3 Files Affected:

  • (modified) clang/lib/CodeGen/CGCall.cpp (+72)
  • (modified) clang/lib/CodeGen/CodeGenModule.cpp (+1-1)
  • (added) clang/test/CodeGen/attr-target-mv-direct-call.c (+245)
diff --git a/clang/lib/CodeGen/CGCall.cpp b/clang/lib/CodeGen/CGCall.cpp
index 28c211aa631e4..84a04e3ccddd8 100644
--- a/clang/lib/CodeGen/CGCall.cpp
+++ b/clang/lib/CodeGen/CGCall.cpp
@@ -4966,6 +4966,11 @@ static unsigned getMaxVectorWidth(const llvm::Type *Ty) {
   return MaxVectorWidth;
 }
 
+// FIXME: put this somewhere nicer to share
+unsigned
+TargetMVPriority(const TargetInfo &TI,
+                 const CodeGenFunction::MultiVersionResolverOption &RO);
+
 RValue CodeGenFunction::EmitCall(const CGFunctionInfo &CallInfo,
                                  const CGCallee &Callee,
                                  ReturnValueSlot ReturnValue,
@@ -5437,6 +5442,73 @@ RValue CodeGenFunction::EmitCall(const CGFunctionInfo &CallInfo,
   const CGCallee &ConcreteCallee = Callee.prepareConcreteCallee(*this);
   llvm::Value *CalleePtr = ConcreteCallee.getFunctionPointer();
 
+  // If a multi-versioned caller calls a multi-versioned callee, skip the
+  // resolver when there is a precise match on the feature sets, and no
+  // possibility of a better match at runtime.
+  if (const auto *CallerFD = dyn_cast_or_null<FunctionDecl>(CurGD.getDecl()))
+    if (const auto *CallerTVA = CallerFD->getAttr<TargetVersionAttr>())
+      if (const FunctionDecl *FD = dyn_cast_or_null<FunctionDecl>(TargetDecl))
+        // FIXME: do the same where either the caller or callee are
+        // target_clones.
+        if (FD->isTargetMultiVersion()) {
+          llvm::SmallVector<StringRef, 8> CallerFeats;
+          CallerTVA->getFeatures(CallerFeats);
+          MultiVersionResolverOption CallerMVRO(nullptr, "", CallerFeats);
+
+          bool HasHigherPriorityCallee = false;
+          llvm::Constant *FoundMatchingCallee = nullptr;
+          getContext().forEachMultiversionedFunctionVersion(
+              FD, [this, FD, &CallerMVRO, &HasHigherPriorityCallee,
+                   &FoundMatchingCallee](const FunctionDecl *CurFD) {
+                const auto *CalleeTVA = CurFD->getAttr<TargetVersionAttr>();
+
+                GlobalDecl CurGD{
+                    (CurFD->isDefined() ? CurFD->getDefinition() : CurFD)};
+                StringRef MangledName = CGM.getMangledName(CurFD);
+
+                llvm::SmallVector<StringRef, 8> CalleeFeats;
+                CalleeTVA->getFeatures(CalleeFeats);
+                MultiVersionResolverOption CalleeMVRO(nullptr, "", CalleeFeats);
+
+                const TargetInfo &TI = getTarget();
+
+                // If there is a higher priority callee, we can't do the
+                // optimization at all, as it would be a valid choice at
+                // runtime.
+                if (TargetMVPriority(TI, CalleeMVRO) >
+                    TargetMVPriority(TI, CallerMVRO)) {
+                  HasHigherPriorityCallee = true;
+                  return;
+                }
+
+                // FIXME: we could allow a lower-priority match when the
+                // features are a proper subset. But for now, to keep things
+                // simpler, we only care about a precise match.
+                if (TargetMVPriority(TI, CalleeMVRO) <
+                    TargetMVPriority(TI, CallerMVRO))
+                  return;
+
+                if (llvm::Constant *Func = CGM.GetGlobalValue(MangledName)) {
+                  FoundMatchingCallee = Func;
+                  return;
+                }
+
+                if (CurFD->isDefined()) {
+                  // FIXME: not sure how to get the address
+                } else {
+                  const CGFunctionInfo &FI =
+                      getTypes().arrangeGlobalDeclaration(FD);
+                  llvm::FunctionType *Ty = getTypes().GetFunctionType(FI);
+                  FoundMatchingCallee =
+                      CGM.GetAddrOfFunction(CurGD, Ty, /*ForVTable=*/false,
+                                            /*DontDefer=*/false, ForDefinition);
+                }
+              });
+
+          if (FoundMatchingCallee && !HasHigherPriorityCallee)
+            CalleePtr = FoundMatchingCallee;
+        }
+
   // If we're using inalloca, set up that argument.
   if (ArgMemory.isValid()) {
     llvm::Value *Arg = ArgMemory.getPointer();
diff --git a/clang/lib/CodeGen/CodeGenModule.cpp b/clang/lib/CodeGen/CodeGenModule.cpp
index 6ec54cc01c923..c334e4a3a40f3 100644
--- a/clang/lib/CodeGen/CodeGenModule.cpp
+++ b/clang/lib/CodeGen/CodeGenModule.cpp
@@ -4092,7 +4092,7 @@ void CodeGenModule::EmitGlobalDefinition(GlobalDecl GD, llvm::GlobalValue *GV) {
 static void ReplaceUsesOfNonProtoTypeWithRealFunction(llvm::GlobalValue *Old,
                                                       llvm::Function *NewFn);
 
-static unsigned
+unsigned
 TargetMVPriority(const TargetInfo &TI,
                  const CodeGenFunction::MultiVersionResolverOption &RO) {
   unsigned Priority = 0;
diff --git a/clang/test/CodeGen/attr-target-mv-direct-call.c b/clang/test/CodeGen/attr-target-mv-direct-call.c
new file mode 100644
index 0000000000000..687fdd1ca3c24
--- /dev/null
+++ b/clang/test/CodeGen/attr-target-mv-direct-call.c
@@ -0,0 +1,245 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --function-signature --check-attributes --check-globals --include-generated-funcs
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -S -emit-llvm -o - %s | FileCheck %s
+
+// Check that we make a direct call from direct_caller._Msimd to
+// direct_callee._Msimd when there is no better option.
+__attribute__((target_version("simd"))) void direct_callee(void) {}
+__attribute__((target_version("default"))) void direct_callee(void) {}
+__attribute__((target_version("simd"))) void direct_caller(void) { direct_callee(); }
+__attribute__((target_version("default"))) void direct_caller(void) { direct_callee(); }
+
+// ... and that we go through the ifunc+resolver when there is a better option
+// that might be chosen at runtime.
+__attribute__((target_version("simd"))) void resolved_callee1(void) {}
+__attribute__((target_version("fcma"))) void resolved_callee1(void) {}
+__attribute__((target_version("default"))) void resolved_callee1(void) {}
+__attribute__((target_version("simd"))) void resolved_caller1(void) { resolved_callee1(); }
+__attribute__((target_version("default"))) void resolved_caller1(void) { resolved_callee1(); }
+
+// FIXME: we could direct call in cases like this:
+__attribute__((target_version("fp"))) void resolved_callee2(void) {}
+__attribute__((target_version("default"))) void resolved_callee2(void) {}
+__attribute__((target_version("simd+fp"))) void resolved_caller2(void) { resolved_callee2(); }
+__attribute__((target_version("default"))) void resolved_caller2(void) { resolved_callee2(); }
+
+void source() {
+    direct_caller();
+    resolved_caller1();
+    resolved_caller2();
+}
+
+//.
+// CHECK: @__aarch64_cpu_features = external dso_local global { i64 }
+// CHECK: @direct_callee.ifunc = weak_odr ifunc void (), ptr @direct_callee.resolver
+// CHECK: @direct_caller.ifunc = weak_odr ifunc void (), ptr @direct_caller.resolver
+// CHECK: @resolved_callee1.ifunc = weak_odr ifunc void (), ptr @resolved_callee1.resolver
+// CHECK: @resolved_caller1.ifunc = weak_odr ifunc void (), ptr @resolved_caller1.resolver
+// CHECK: @resolved_callee2.ifunc = weak_odr ifunc void (), ptr @resolved_callee2.resolver
+// CHECK: @resolved_caller2.ifunc = weak_odr ifunc void (), ptr @resolved_caller2.resolver
+//.
+// CHECK: Function Attrs: noinline nounwind optnone
+// CHECK-LABEL: define {{[^@]+}}@direct_callee._Msimd
+// CHECK-SAME: () #[[ATTR0:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    ret void
+//
+//
+// CHECK-LABEL: define {{[^@]+}}@direct_callee.resolver() comdat {
+// CHECK-NEXT:  resolver_entry:
+// CHECK-NEXT:    call void @__init_cpu_features_resolver()
+// CHECK-NEXT:    [[TMP0:%.*]] = load i64, ptr @__aarch64_cpu_features, align 8
+// CHECK-NEXT:    [[TMP1:%.*]] = and i64 [[TMP0]], 512
+// CHECK-NEXT:    [[TMP2:%.*]] = icmp eq i64 [[TMP1]], 512
+// CHECK-NEXT:    [[TMP3:%.*]] = and i1 true, [[TMP2]]
+// CHECK-NEXT:    br i1 [[TMP3]], label [[RESOLVER_RETURN:%.*]], label [[RESOLVER_ELSE:%.*]]
+// CHECK:       resolver_return:
+// CHECK-NEXT:    ret ptr @direct_callee._Msimd
+// CHECK:       resolver_else:
+// CHECK-NEXT:    ret ptr @direct_callee.default
+//
+//
+// CHECK: Function Attrs: noinline nounwind optnone
+// CHECK-LABEL: define {{[^@]+}}@direct_caller._Msimd
+// CHECK-SAME: () #[[ATTR0]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    call void @direct_callee._Msimd()
+// CHECK-NEXT:    ret void
+//
+//
+// CHECK-LABEL: define {{[^@]+}}@direct_caller.resolver() comdat {
+// CHECK-NEXT:  resolver_entry:
+// CHECK-NEXT:    call void @__init_cpu_features_resolver()
+// CHECK-NEXT:    [[TMP0:%.*]] = load i64, ptr @__aarch64_cpu_features, align 8
+// CHECK-NEXT:    [[TMP1:%.*]] = and i64 [[TMP0]], 512
+// CHECK-NEXT:    [[TMP2:%.*]] = icmp eq i64 [[TMP1]], 512
+// CHECK-NEXT:    [[TMP3:%.*]] = and i1 true, [[TMP2]]
+// CHECK-NEXT:    br i1 [[TMP3]], label [[RESOLVER_RETURN:%.*]], label [[RESOLVER_ELSE:%.*]]
+// CHECK:       resolver_return:
+// CHECK-NEXT:    ret ptr @direct_caller._Msimd
+// CHECK:       resolver_else:
+// CHECK-NEXT:    ret ptr @direct_caller.default
+//
+//
+// CHECK: Function Attrs: noinline nounwind optnone
+// CHECK-LABEL: define {{[^@]+}}@resolved_callee1._Msimd
+// CHECK-SAME: () #[[ATTR0]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    ret void
+//
+//
+// CHECK-LABEL: define {{[^@]+}}@resolved_callee1.resolver() comdat {
+// CHECK-NEXT:  resolver_entry:
+// CHECK-NEXT:    call void @__init_cpu_features_resolver()
+// CHECK-NEXT:    [[TMP0:%.*]] = load i64, ptr @__aarch64_cpu_features, align 8
+// CHECK-NEXT:    [[TMP1:%.*]] = and i64 [[TMP0]], 2097152
+// CHECK-NEXT:    [[TMP2:%.*]] = icmp eq i64 [[TMP1]], 2097152
+// CHECK-NEXT:    [[TMP3:%.*]] = and i1 true, [[TMP2]]
+// CHECK-NEXT:    br i1 [[TMP3]], label [[RESOLVER_RETURN:%.*]], label [[RESOLVER_ELSE:%.*]]
+// CHECK:       resolver_return:
+// CHECK-NEXT:    ret ptr @resolved_callee1._Mfcma
+// CHECK:       resolver_else:
+// CHECK-NEXT:    [[TMP4:%.*]] = load i64, ptr @__aarch64_cpu_features, align 8
+// CHECK-NEXT:    [[TMP5:%.*]] = and i64 [[TMP4]], 512
+// CHECK-NEXT:    [[TMP6:%.*]] = icmp eq i64 [[TMP5]], 512
+// CHECK-NEXT:    [[TMP7:%.*]] = and i1 true, [[TMP6]]
+// CHECK-NEXT:    br i1 [[TMP7]], label [[RESOLVER_RETURN1:%.*]], label [[RESOLVER_ELSE2:%.*]]
+// CHECK:       resolver_return1:
+// CHECK-NEXT:    ret ptr @resolved_callee1._Msimd
+// CHECK:       resolver_else2:
+// CHECK-NEXT:    ret ptr @resolved_callee1.default
+//
+//
+// CHECK: Function Attrs: noinline nounwind optnone
+// CHECK-LABEL: define {{[^@]+}}@resolved_caller1._Msimd
+// CHECK-SAME: () #[[ATTR0]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    call void @resolved_callee1.ifunc()
+// CHECK-NEXT:    ret void
+//
+//
+// CHECK-LABEL: define {{[^@]+}}@resolved_caller1.resolver() comdat {
+// CHECK-NEXT:  resolver_entry:
+// CHECK-NEXT:    call void @__init_cpu_features_resolver()
+// CHECK-NEXT:    [[TMP0:%.*]] = load i64, ptr @__aarch64_cpu_features, align 8
+// CHECK-NEXT:    [[TMP1:%.*]] = and i64 [[TMP0]], 512
+// CHECK-NEXT:    [[TMP2:%.*]] = icmp eq i64 [[TMP1]], 512
+// CHECK-NEXT:    [[TMP3:%.*]] = and i1 true, [[TMP2]]
+// CHECK-NEXT:    br i1 [[TMP3]], label [[RESOLVER_RETURN:%.*]], label [[RESOLVER_ELSE:%.*]]
+// CHECK:       resolver_return:
+// CHECK-NEXT:    ret ptr @resolved_caller1._Msimd
+// CHECK:       resolver_else:
+// CHECK-NEXT:    ret ptr @resolved_caller1.default
+//
+//
+// CHECK: Function Attrs: noinline nounwind optnone
+// CHECK-LABEL: define {{[^@]+}}@resolved_callee2._Mfp
+// CHECK-SAME: () #[[ATTR0]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    ret void
+//
+//
+// CHECK-LABEL: define {{[^@]+}}@resolved_callee2.resolver() comdat {
+// CHECK-NEXT:  resolver_entry:
+// CHECK-NEXT:    call void @__init_cpu_features_resolver()
+// CHECK-NEXT:    [[TMP0:%.*]] = load i64, ptr @__aarch64_cpu_features, align 8
+// CHECK-NEXT:    [[TMP1:%.*]] = and i64 [[TMP0]], 256
+// CHECK-NEXT:    [[TMP2:%.*]] = icmp eq i64 [[TMP1]], 256
+// CHECK-NEXT:    [[TMP3:%.*]] = and i1 true, [[TMP2]]
+// CHECK-NEXT:    br i1 [[TMP3]], label [[RESOLVER_RETURN:%.*]], label [[RESOLVER_ELSE:%.*]]
+// CHECK:       resolver_return:
+// CHECK-NEXT:    ret ptr @resolved_callee2._Mfp
+// CHECK:       resolver_else:
+// CHECK-NEXT:    ret ptr @resolved_callee2.default
+//
+//
+// CHECK: Function Attrs: noinline nounwind optnone
+// CHECK-LABEL: define {{[^@]+}}@resolved_caller2._MfpMsimd
+// CHECK-SAME: () #[[ATTR0]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    call void @resolved_callee2.ifunc()
+// CHECK-NEXT:    ret void
+//
+//
+// CHECK-LABEL: define {{[^@]+}}@resolved_caller2.resolver() comdat {
+// CHECK-NEXT:  resolver_entry:
+// CHECK-NEXT:    call void @__init_cpu_features_resolver()
+// CHECK-NEXT:    [[TMP0:%.*]] = load i64, ptr @__aarch64_cpu_features, align 8
+// CHECK-NEXT:    [[TMP1:%.*]] = and i64 [[TMP0]], 768
+// CHECK-NEXT:    [[TMP2:%.*]] = icmp eq i64 [[TMP1]], 768
+// CHECK-NEXT:    [[TMP3:%.*]] = and i1 true, [[TMP2]]
+// CHECK-NEXT:    br i1 [[TMP3]], label [[RESOLVER_RETURN:%.*]], label [[RESOLVER_ELSE:%.*]]
+// CHECK:       resolver_return:
+// CHECK-NEXT:    ret ptr @resolved_caller2._MfpMsimd
+// CHECK:       resolver_else:
+// CHECK-NEXT:    ret ptr @resolved_caller2.default
+//
+//
+// CHECK: Function Attrs: noinline nounwind optnone
+// CHECK-LABEL: define {{[^@]+}}@source
+// CHECK-SAME: () #[[ATTR1:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    call void @direct_caller.ifunc()
+// CHECK-NEXT:    call void @resolved_caller1.ifunc()
+// CHECK-NEXT:    call void @resolved_caller2.ifunc()
+// CHECK-NEXT:    ret void
+//
+//
+// CHECK: Function Attrs: noinline nounwind optnone
+// CHECK-LABEL: define {{[^@]+}}@direct_callee.default
+// CHECK-SAME: () #[[ATTR1]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    ret void
+//
+//
+// CHECK: Function Attrs: noinline nounwind optnone
+// CHECK-LABEL: define {{[^@]+}}@direct_caller.default
+// CHECK-SAME: () #[[ATTR1]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    call void @direct_callee.ifunc()
+// CHECK-NEXT:    ret void
+//
+//
+// CHECK: Function Attrs: noinline nounwind optnone
+// CHECK-LABEL: define {{[^@]+}}@resolved_callee1._Mfcma
+// CHECK-SAME: () #[[ATTR2:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    ret void
+//
+//
+// CHECK: Function Attrs: noinline nounwind optnone
+// CHECK-LABEL: define {{[^@]+}}@resolved_callee1.default
+// CHECK-SAME: () #[[ATTR1]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    ret void
+//
+//
+// CHECK: Function Attrs: noinline nounwind optnone
+// CHECK-LABEL: define {{[^@]+}}@resolved_caller1.default
+// CHECK-SAME: () #[[ATTR1]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    call void @resolved_callee1.ifunc()
+// CHECK-NEXT:    ret void
+//
+//
+// CHECK: Function Attrs: noinline nounwind optnone
+// CHECK-LABEL: define {{[^@]+}}@resolved_callee2.default
+// CHECK-SAME: () #[[ATTR1]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    ret void
+//
+//
+// CHECK: Function Attrs: noinline nounwind optnone
+// CHECK-LABEL: define {{[^@]+}}@resolved_caller2.default
+// CHECK-SAME: () #[[ATTR1]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    call void @resolved_callee2.ifunc()
+// CHECK-NEXT:    ret void
+//
+//.
+// CHECK: attributes #[[ATTR0]] = { noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+fp-armv8,+neon" }
+// CHECK: attributes #[[ATTR1]] = { noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" }
+// CHECK: attributes #[[ATTR2]] = { noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+complxnum,+fp-armv8,+neon" }
+//.
+// CHECK: [[META0:![0-9]+]] = !{i32 1, !"wchar_size", i32 4}
+// CHECK: [[META1:![0-9]+]] = !{!"{{.*}}clang version {{.*}}"}
+//.

@jroelofs jroelofs changed the title [clang][FMV] Direct-call multi-versioned callees from multi-versioned [clang][FMV] Direct-call FMV callees from FMV callers Jan 31, 2024
@ilinpv ilinpv requested a review from labrinea January 31, 2024 12:05
Copy link
Collaborator

@erichkeane erichkeane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My immediate response is that this sounds like something that OPT should be doing here, not us. We typically do NOT do this sort of thing the CFE, and do not want to do opt-type stuff in the CFE.

Is there good reason that this isn't a part of the inliner?

@jroelofs
Copy link
Contributor Author

My gut feel was that recovering this information from the callee's resolver's body would take heroics if we tried to do it in the backend.

@erichkeane
Copy link
Collaborator

My gut feel was that recovering this information from the callee's resolver's body would take heroics if we tried to do it in the backend.

Opt can already see the feature strings in the llvm-attributes, and can introspect into it for the resolver. I could PERHAPS see value in an llvm-attribute on the resolver to tell OPT to try to look through that (that is, something that says "this is a generated Function MultiVersion resolver, you can trust these conditions match the functions"), but the rest I don't think needs to be in the CFE.

@jroelofs
Copy link
Contributor Author

Fair. I'll give that a shot.

Doing it in opt has another big advantage I only just realized: it allows LTO to do the transformation.

@jroelofs jroelofs closed this Jan 31, 2024
@labrinea
Copy link
Collaborator

@erichkeane while I agree that Clang might not be the best place for such an optimization, I have some concerns about implementing it in LLVM:

  • We cannot distinguish a FMV resolver from any other ifunc resolver.
  • There is no information at the LLVM IR level about function versions or which resolver they are associated with.
  • We cannot use target-features to determine version priority since this information is encoded via front-end features in the TargetParser. We can only rely on the resolver's basic block layout under the assumption that predecessor basic blocks correspond to versions of higher priority than successor basic blocks. This is fragile and unreliable:
void discoverResolvedIFuncsInPriorityOrder(GlobalIFunc *IFunc) {
  DenseMap<GlobalIFunc *, SmallVector<Function *>> ResolvedIFuncs;

  std::function<void(Value *)> visitValue = [&](Value *V) {
    if (auto *Func = dyn_cast<Function>(V)) {
      ResolvedIFuncs[IFunc].push_back(Func);
    } else if (auto *Sel = dyn_cast<SelectInst>(V)) {
      visitValue(Sel->getTrueValue());
      visitValue(Sel->getFalseValue());
    } else if (auto *Phi = dyn_cast<PHINode>(V)) {
      for (unsigned I = 0, E = Phi->getNumIncomingValues(); I != E; ++I)
        visitValue(Phi->getIncomingValue(I));
    }
  };

  for (BasicBlock &BB : *IFunc->getResolverFunction())
    if (auto *Ret = dyn_cast_or_null<ReturnInst>(BB.getTerminator()))
      visitValue(Ret->getReturnValue());
  // discard default
  if (!ResolvedIFuncs[IFunc].empty())
    ResolvedIFuncs[IFunc].pop_back();
}

@erichkeane
Copy link
Collaborator

I'd be OK with Clang providing some level of metadata to clarify which is an FMV, and what our target features are. But this sort of analysis still needs to happen in LLVM.

@jroelofs
Copy link
Contributor Author

jroelofs commented Apr 1, 2024

I'm not sure llvm needs to know the priorities. I haven't had time to work on this, but my plan was to have something that attempts to step through the resolver instruction by instruction with known bits for the value loaded from __aarch64_cpu_features.features according to the caller's target features. If the return value is known, then we can fold away the resolver for that call site. If we encounter a loop, a call, or some other pattern we don't understand, then bail & leave that call site alone.

@erichkeane
Copy link
Collaborator

I'm not sure llvm needs to know the priorities. I haven't had time to work on this, but my plan was to have something that attempts to step through the resolver instruction by instruction with known bits for the value loaded from __aarch64_cpu_features.features according to the caller's target features. If the return value is known, then we can fold away the resolver for that call site. If we encounter a loop, a call, or some other pattern we don't understand, then bail & leave that call site alone.

That seems sensible to me. It would be nice to be able to recognize this /get this optimization for 'hand rolled' resolvers as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:codegen IR generation bugs: mangling, exceptions, etc. clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants