[clang][FMV] Direct-call FMV callees from FMV callers #80093

jroelofs · 2024-01-31T03:20:33Z

… when there is a callee with a matching feature set, and no other higher priority callee. This optimization helps the inliner see past the ifunc+resolver to the callee that we know it will always land on.

This is a conservative implementation of: #71714

… callers ... when there is a callee with a matching feature set, and no other higher priority callee. This optimization helps the inliner see past the ifunc+resolver to the callee that we know it will always land on. This is a conservative implementation of: llvm#71714

llvmbot · 2024-01-31T03:21:05Z

@llvm/pr-subscribers-clang-codegen

@llvm/pr-subscribers-clang

Author: Jon Roelofs (jroelofs)

Changes

… callers when there is a callee with a matching feature set, and no other higher priority callee. This optimization helps the inliner see past the ifunc+resolver to the callee that we know it will always land on.

This is a conservative implementation of: #71714

Full diff: https://github.com/llvm/llvm-project/pull/80093.diff

3 Files Affected:

(modified) clang/lib/CodeGen/CGCall.cpp (+72)
(modified) clang/lib/CodeGen/CodeGenModule.cpp (+1-1)
(added) clang/test/CodeGen/attr-target-mv-direct-call.c (+245)

diff --git a/clang/lib/CodeGen/CGCall.cpp b/clang/lib/CodeGen/CGCall.cpp
index 28c211aa631e4..84a04e3ccddd8 100644
--- a/clang/lib/CodeGen/CGCall.cpp
+++ b/clang/lib/CodeGen/CGCall.cpp
@@ -4966,6 +4966,11 @@ static unsigned getMaxVectorWidth(const llvm::Type *Ty) {
   return MaxVectorWidth;
 }
 
+// FIXME: put this somewhere nicer to share
+unsigned
+TargetMVPriority(const TargetInfo &TI,
+                 const CodeGenFunction::MultiVersionResolverOption &RO);
+
 RValue CodeGenFunction::EmitCall(const CGFunctionInfo &CallInfo,
                                  const CGCallee &Callee,
                                  ReturnValueSlot ReturnValue,
@@ -5437,6 +5442,73 @@ RValue CodeGenFunction::EmitCall(const CGFunctionInfo &CallInfo,
   const CGCallee &ConcreteCallee = Callee.prepareConcreteCallee(*this);
   llvm::Value *CalleePtr = ConcreteCallee.getFunctionPointer();
 
+  // If a multi-versioned caller calls a multi-versioned callee, skip the
+  // resolver when there is a precise match on the feature sets, and no
+  // possibility of a better match at runtime.
+  if (const auto *CallerFD = dyn_cast_or_null<FunctionDecl>(CurGD.getDecl()))
+    if (const auto *CallerTVA = CallerFD->getAttr<TargetVersionAttr>())
+      if (const FunctionDecl *FD = dyn_cast_or_null<FunctionDecl>(TargetDecl))
+        // FIXME: do the same where either the caller or callee are
+        // target_clones.
+        if (FD->isTargetMultiVersion()) {
+          llvm::SmallVector<StringRef, 8> CallerFeats;
+          CallerTVA->getFeatures(CallerFeats);
+          MultiVersionResolverOption CallerMVRO(nullptr, "", CallerFeats);
+
+          bool HasHigherPriorityCallee = false;
+          llvm::Constant *FoundMatchingCallee = nullptr;
+          getContext().forEachMultiversionedFunctionVersion(
+              FD, [this, FD, &CallerMVRO, &HasHigherPriorityCallee,
+                   &FoundMatchingCallee](const FunctionDecl *CurFD) {
+                const auto *CalleeTVA = CurFD->getAttr<TargetVersionAttr>();
+
+                GlobalDecl CurGD{
+                    (CurFD->isDefined() ? CurFD->getDefinition() : CurFD)};
+                StringRef MangledName = CGM.getMangledName(CurFD);
+
+                llvm::SmallVector<StringRef, 8> CalleeFeats;
+                CalleeTVA->getFeatures(CalleeFeats);
+                MultiVersionResolverOption CalleeMVRO(nullptr, "", CalleeFeats);
+
+                const TargetInfo &TI = getTarget();
+
+                // If there is a higher priority callee, we can't do the
+                // optimization at all, as it would be a valid choice at
+                // runtime.
+                if (TargetMVPriority(TI, CalleeMVRO) >
+                    TargetMVPriority(TI, CallerMVRO)) {
+                  HasHigherPriorityCallee = true;
+                  return;
+                }
+
+                // FIXME: we could allow a lower-priority match when the
+                // features are a proper subset. But for now, to keep things
+                // simpler, we only care about a precise match.
+                if (TargetMVPriority(TI, CalleeMVRO) <
+                    TargetMVPriority(TI, CallerMVRO))
+                  return;
+
+                if (llvm::Constant *Func = CGM.GetGlobalValue(MangledName)) {
+                  FoundMatchingCallee = Func;
+                  return;
+                }
+
+                if (CurFD->isDefined()) {
+                  // FIXME: not sure how to get the address
+                } else {
+                  const CGFunctionInfo &FI =
+                      getTypes().arrangeGlobalDeclaration(FD);
+                  llvm::FunctionType *Ty = getTypes().GetFunctionType(FI);
+                  FoundMatchingCallee =
+                      CGM.GetAddrOfFunction(CurGD, Ty, /*ForVTable=*/false,
+                                            /*DontDefer=*/false, ForDefinition);
+                }
+              });
+
+          if (FoundMatchingCallee && !HasHigherPriorityCallee)
+            CalleePtr = FoundMatchingCallee;
+        }
+
   // If we're using inalloca, set up that argument.
   if (ArgMemory.isValid()) {
     llvm::Value *Arg = ArgMemory.getPointer();
diff --git a/clang/lib/CodeGen/CodeGenModule.cpp b/clang/lib/CodeGen/CodeGenModule.cpp
index 6ec54cc01c923..c334e4a3a40f3 100644
--- a/clang/lib/CodeGen/CodeGenModule.cpp
+++ b/clang/lib/CodeGen/CodeGenModule.cpp
@@ -4092,7 +4092,7 @@ void CodeGenModule::EmitGlobalDefinition(GlobalDecl GD, llvm::GlobalValue *GV) {
 static void ReplaceUsesOfNonProtoTypeWithRealFunction(llvm::GlobalValue *Old,
                                                       llvm::Function *NewFn);
 
-static unsigned
+unsigned
 TargetMVPriority(const TargetInfo &TI,
                  const CodeGenFunction::MultiVersionResolverOption &RO) {
   unsigned Priority = 0;
diff --git a/clang/test/CodeGen/attr-target-mv-direct-call.c b/clang/test/CodeGen/attr-target-mv-direct-call.c
new file mode 100644
index 0000000000000..687fdd1ca3c24
--- /dev/null
+++ b/clang/test/CodeGen/attr-target-mv-direct-call.c
@@ -0,0 +1,245 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --function-signature --check-attributes --check-globals --include-generated-funcs
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -S -emit-llvm -o - %s | FileCheck %s
+
+// Check that we make a direct call from direct_caller._Msimd to
+// direct_callee._Msimd when there is no better option.
+__attribute__((target_version("simd"))) void direct_callee(void) {}
+__attribute__((target_version("default"))) void direct_callee(void) {}
+__attribute__((target_version("simd"))) void direct_caller(void) { direct_callee(); }
+__attribute__((target_version("default"))) void direct_caller(void) { direct_callee(); }
+
+// ... and that we go through the ifunc+resolver when there is a better option
+// that might be chosen at runtime.
+__attribute__((target_version("simd"))) void resolved_callee1(void) {}
+__attribute__((target_version("fcma"))) void resolved_callee1(void) {}
+__attribute__((target_version("default"))) void resolved_callee1(void) {}
+__attribute__((target_version("simd"))) void resolved_caller1(void) { resolved_callee1(); }
+__attribute__((target_version("default"))) void resolved_caller1(void) { resolved_callee1(); }
+
+// FIXME: we could direct call in cases like this:
+__attribute__((target_version("fp"))) void resolved_callee2(void) {}
+__attribute__((target_version("default"))) void resolved_callee2(void) {}
+__attribute__((target_version("simd+fp"))) void resolved_caller2(void) { resolved_callee2(); }
+__attribute__((target_version("default"))) void resolved_caller2(void) { resolved_callee2(); }
+
+void source() {
+    direct_caller();
+    resolved_caller1();
+    resolved_caller2();
+}
+
+//.
+// CHECK: @__aarch64_cpu_features = external dso_local global { i64 }
+// CHECK: @direct_callee.ifunc = weak_odr ifunc void (), ptr @direct_callee.resolver
+// CHECK: @direct_caller.ifunc = weak_odr ifunc void (), ptr @direct_caller.resolver
+// CHECK: @resolved_callee1.ifunc = weak_odr ifunc void (), ptr @resolved_callee1.resolver
+// CHECK: @resolved_caller1.ifunc = weak_odr ifunc void (), ptr @resolved_caller1.resolver
+// CHECK: @resolved_callee2.ifunc = weak_odr ifunc void (), ptr @resolved_callee2.resolver
+// CHECK: @resolved_caller2.ifunc = weak_odr ifunc void (), ptr @resolved_caller2.resolver
+//.
+// CHECK: Function Attrs: noinline nounwind optnone
+// CHECK-LABEL: define {{[^@]+}}@direct_callee._Msimd
+// CHECK-SAME: () #[[ATTR0:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    ret void
+//
+//
+// CHECK-LABEL: define {{[^@]+}}@direct_callee.resolver() comdat {
+// CHECK-NEXT:  resolver_entry:
+// CHECK-NEXT:    call void @__init_cpu_features_resolver()
+// CHECK-NEXT:    [[TMP0:%.*]] = load i64, ptr @__aarch64_cpu_features, align 8
+// CHECK-NEXT:    [[TMP1:%.*]] = and i64 [[TMP0]], 512
+// CHECK-NEXT:    [[TMP2:%.*]] = icmp eq i64 [[TMP1]], 512
+// CHECK-NEXT:    [[TMP3:%.*]] = and i1 true, [[TMP2]]
+// CHECK-NEXT:    br i1 [[TMP3]], label [[RESOLVER_RETURN:%.*]], label [[RESOLVER_ELSE:%.*]]
+// CHECK:       resolver_return:
+// CHECK-NEXT:    ret ptr @direct_callee._Msimd
+// CHECK:       resolver_else:
+// CHECK-NEXT:    ret ptr @direct_callee.default
+//
+//
+// CHECK: Function Attrs: noinline nounwind optnone
+// CHECK-LABEL: define {{[^@]+}}@direct_caller._Msimd
+// CHECK-SAME: () #[[ATTR0]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    call void @direct_callee._Msimd()
+// CHECK-NEXT:    ret void
+//
+//
+// CHECK-LABEL: define {{[^@]+}}@direct_caller.resolver() comdat {
+// CHECK-NEXT:  resolver_entry:
+// CHECK-NEXT:    call void @__init_cpu_features_resolver()
+// CHECK-NEXT:    [[TMP0:%.*]] = load i64, ptr @__aarch64_cpu_features, align 8
+// CHECK-NEXT:    [[TMP1:%.*]] = and i64 [[TMP0]], 512
+// CHECK-NEXT:    [[TMP2:%.*]] = icmp eq i64 [[TMP1]], 512
+// CHECK-NEXT:    [[TMP3:%.*]] = and i1 true, [[TMP2]]
+// CHECK-NEXT:    br i1 [[TMP3]], label [[RESOLVER_RETURN:%.*]], label [[RESOLVER_ELSE:%.*]]
+// CHECK:       resolver_return:
+// CHECK-NEXT:    ret ptr @direct_caller._Msimd
+// CHECK:       resolver_else:
+// CHECK-NEXT:    ret ptr @direct_caller.default
+//
+//
+// CHECK: Function Attrs: noinline nounwind optnone
+// CHECK-LABEL: define {{[^@]+}}@resolved_callee1._Msimd
+// CHECK-SAME: () #[[ATTR0]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    ret void
+//
+//
+// CHECK-LABEL: define {{[^@]+}}@resolved_callee1.resolver() comdat {
+// CHECK-NEXT:  resolver_entry:
+// CHECK-NEXT:    call void @__init_cpu_features_resolver()
+// CHECK-NEXT:    [[TMP0:%.*]] = load i64, ptr @__aarch64_cpu_features, align 8
+// CHECK-NEXT:    [[TMP1:%.*]] = and i64 [[TMP0]], 2097152
+// CHECK-NEXT:    [[TMP2:%.*]] = icmp eq i64 [[TMP1]], 2097152
+// CHECK-NEXT:    [[TMP3:%.*]] = and i1 true, [[TMP2]]
+// CHECK-NEXT:    br i1 [[TMP3]], label [[RESOLVER_RETURN:%.*]], label [[RESOLVER_ELSE:%.*]]
+// CHECK:       resolver_return:
+// CHECK-NEXT:    ret ptr @resolved_callee1._Mfcma
+// CHECK:       resolver_else:
+// CHECK-NEXT:    [[TMP4:%.*]] = load i64, ptr @__aarch64_cpu_features, align 8
+// CHECK-NEXT:    [[TMP5:%.*]] = and i64 [[TMP4]], 512
+// CHECK-NEXT:    [[TMP6:%.*]] = icmp eq i64 [[TMP5]], 512
+// CHECK-NEXT:    [[TMP7:%.*]] = and i1 true, [[TMP6]]
+// CHECK-NEXT:    br i1 [[TMP7]], label [[RESOLVER_RETURN1:%.*]], label [[RESOLVER_ELSE2:%.*]]
+// CHECK:       resolver_return1:
+// CHECK-NEXT:    ret ptr @resolved_callee1._Msimd
+// CHECK:       resolver_else2:
+// CHECK-NEXT:    ret ptr @resolved_callee1.default
+//
+//
+// CHECK: Function Attrs: noinline nounwind optnone
+// CHECK-LABEL: define {{[^@]+}}@resolved_caller1._Msimd
+// CHECK-SAME: () #[[ATTR0]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    call void @resolved_callee1.ifunc()
+// CHECK-NEXT:    ret void
+//
+//
+// CHECK-LABEL: define {{[^@]+}}@resolved_caller1.resolver() comdat {
+// CHECK-NEXT:  resolver_entry:
+// CHECK-NEXT:    call void @__init_cpu_features_resolver()
+// CHECK-NEXT:    [[TMP0:%.*]] = load i64, ptr @__aarch64_cpu_features, align 8
+// CHECK-NEXT:    [[TMP1:%.*]] = and i64 [[TMP0]], 512
+// CHECK-NEXT:    [[TMP2:%.*]] = icmp eq i64 [[TMP1]], 512
+// CHECK-NEXT:    [[TMP3:%.*]] = and i1 true, [[TMP2]]
+// CHECK-NEXT:    br i1 [[TMP3]], label [[RESOLVER_RETURN:%.*]], label [[RESOLVER_ELSE:%.*]]
+// CHECK:       resolver_return:
+// CHECK-NEXT:    ret ptr @resolved_caller1._Msimd
+// CHECK:       resolver_else:
+// CHECK-NEXT:    ret ptr @resolved_caller1.default
+//
+//
+// CHECK: Function Attrs: noinline nounwind optnone
+// CHECK-LABEL: define {{[^@]+}}@resolved_callee2._Mfp
+// CHECK-SAME: () #[[ATTR0]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    ret void
+//
+//
+// CHECK-LABEL: define {{[^@]+}}@resolved_callee2.resolver() comdat {
+// CHECK-NEXT:  resolver_entry:
+// CHECK-NEXT:    call void @__init_cpu_features_resolver()
+// CHECK-NEXT:    [[TMP0:%.*]] = load i64, ptr @__aarch64_cpu_features, align 8
+// CHECK-NEXT:    [[TMP1:%.*]] = and i64 [[TMP0]], 256
+// CHECK-NEXT:    [[TMP2:%.*]] = icmp eq i64 [[TMP1]], 256
+// CHECK-NEXT:    [[TMP3:%.*]] = and i1 true, [[TMP2]]
+// CHECK-NEXT:    br i1 [[TMP3]], label [[RESOLVER_RETURN:%.*]], label [[RESOLVER_ELSE:%.*]]
+// CHECK:       resolver_return:
+// CHECK-NEXT:    ret ptr @resolved_callee2._Mfp
+// CHECK:       resolver_else:
+// CHECK-NEXT:    ret ptr @resolved_callee2.default
+//
+//
+// CHECK: Function Attrs: noinline nounwind optnone
+// CHECK-LABEL: define {{[^@]+}}@resolved_caller2._MfpMsimd
+// CHECK-SAME: () #[[ATTR0]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    call void @resolved_callee2.ifunc()
+// CHECK-NEXT:    ret void
+//
+//
+// CHECK-LABEL: define {{[^@]+}}@resolved_caller2.resolver() comdat {
+// CHECK-NEXT:  resolver_entry:
+// CHECK-NEXT:    call void @__init_cpu_features_resolver()
+// CHECK-NEXT:    [[TMP0:%.*]] = load i64, ptr @__aarch64_cpu_features, align 8
+// CHECK-NEXT:    [[TMP1:%.*]] = and i64 [[TMP0]], 768
+// CHECK-NEXT:    [[TMP2:%.*]] = icmp eq i64 [[TMP1]], 768
+// CHECK-NEXT:    [[TMP3:%.*]] = and i1 true, [[TMP2]]
+// CHECK-NEXT:    br i1 [[TMP3]], label [[RESOLVER_RETURN:%.*]], label [[RESOLVER_ELSE:%.*]]
+// CHECK:       resolver_return:
+// CHECK-NEXT:    ret ptr @resolved_caller2._MfpMsimd
+// CHECK:       resolver_else:
+// CHECK-NEXT:    ret ptr @resolved_caller2.default
+//
+//
+// CHECK: Function Attrs: noinline nounwind optnone
+// CHECK-LABEL: define {{[^@]+}}@source
+// CHECK-SAME: () #[[ATTR1:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    call void @direct_caller.ifunc()
+// CHECK-NEXT:    call void @resolved_caller1.ifunc()
+// CHECK-NEXT:    call void @resolved_caller2.ifunc()
+// CHECK-NEXT:    ret void
+//
+//
+// CHECK: Function Attrs: noinline nounwind optnone
+// CHECK-LABEL: define {{[^@]+}}@direct_callee.default
+// CHECK-SAME: () #[[ATTR1]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    ret void
+//
+//
+// CHECK: Function Attrs: noinline nounwind optnone
+// CHECK-LABEL: define {{[^@]+}}@direct_caller.default
+// CHECK-SAME: () #[[ATTR1]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    call void @direct_callee.ifunc()
+// CHECK-NEXT:    ret void
+//
+//
+// CHECK: Function Attrs: noinline nounwind optnone
+// CHECK-LABEL: define {{[^@]+}}@resolved_callee1._Mfcma
+// CHECK-SAME: () #[[ATTR2:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    ret void
+//
+//
+// CHECK: Function Attrs: noinline nounwind optnone
+// CHECK-LABEL: define {{[^@]+}}@resolved_callee1.default
+// CHECK-SAME: () #[[ATTR1]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    ret void
+//
+//
+// CHECK: Function Attrs: noinline nounwind optnone
+// CHECK-LABEL: define {{[^@]+}}@resolved_caller1.default
+// CHECK-SAME: () #[[ATTR1]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    call void @resolved_callee1.ifunc()
+// CHECK-NEXT:    ret void
+//
+//
+// CHECK: Function Attrs: noinline nounwind optnone
+// CHECK-LABEL: define {{[^@]+}}@resolved_callee2.default
+// CHECK-SAME: () #[[ATTR1]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    ret void
+//
+//
+// CHECK: Function Attrs: noinline nounwind optnone
+// CHECK-LABEL: define {{[^@]+}}@resolved_caller2.default
+// CHECK-SAME: () #[[ATTR1]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    call void @resolved_callee2.ifunc()
+// CHECK-NEXT:    ret void
+//
+//.
+// CHECK: attributes #[[ATTR0]] = { noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+fp-armv8,+neon" }
+// CHECK: attributes #[[ATTR1]] = { noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" }
+// CHECK: attributes #[[ATTR2]] = { noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-features"="+complxnum,+fp-armv8,+neon" }
+//.
+// CHECK: [[META0:![0-9]+]] = !{i32 1, !"wchar_size", i32 4}
+// CHECK: [[META1:![0-9]+]] = !{!"{{.*}}clang version {{.*}}"}
+//.

erichkeane

My immediate response is that this sounds like something that OPT should be doing here, not us. We typically do NOT do this sort of thing the CFE, and do not want to do opt-type stuff in the CFE.

Is there good reason that this isn't a part of the inliner?

jroelofs · 2024-01-31T18:16:44Z

My gut feel was that recovering this information from the callee's resolver's body would take heroics if we tried to do it in the backend.

erichkeane · 2024-01-31T18:23:24Z

My gut feel was that recovering this information from the callee's resolver's body would take heroics if we tried to do it in the backend.

Opt can already see the feature strings in the llvm-attributes, and can introspect into it for the resolver. I could PERHAPS see value in an llvm-attribute on the resolver to tell OPT to try to look through that (that is, something that says "this is a generated Function MultiVersion resolver, you can trust these conditions match the functions"), but the rest I don't think needs to be in the CFE.

jroelofs · 2024-01-31T18:36:24Z

Fair. I'll give that a shot.

Doing it in opt has another big advantage I only just realized: it allows LTO to do the transformation.

labrinea · 2024-03-31T13:06:25Z

@erichkeane while I agree that Clang might not be the best place for such an optimization, I have some concerns about implementing it in LLVM:

We cannot distinguish a FMV resolver from any other ifunc resolver.
There is no information at the LLVM IR level about function versions or which resolver they are associated with.
We cannot use target-features to determine version priority since this information is encoded via front-end features in the TargetParser. We can only rely on the resolver's basic block layout under the assumption that predecessor basic blocks correspond to versions of higher priority than successor basic blocks. This is fragile and unreliable:

void discoverResolvedIFuncsInPriorityOrder(GlobalIFunc *IFunc) {
  DenseMap<GlobalIFunc *, SmallVector<Function *>> ResolvedIFuncs;

  std::function<void(Value *)> visitValue = [&](Value *V) {
    if (auto *Func = dyn_cast<Function>(V)) {
      ResolvedIFuncs[IFunc].push_back(Func);
    } else if (auto *Sel = dyn_cast<SelectInst>(V)) {
      visitValue(Sel->getTrueValue());
      visitValue(Sel->getFalseValue());
    } else if (auto *Phi = dyn_cast<PHINode>(V)) {
      for (unsigned I = 0, E = Phi->getNumIncomingValues(); I != E; ++I)
        visitValue(Phi->getIncomingValue(I));
    }
  };

  for (BasicBlock &BB : *IFunc->getResolverFunction())
    if (auto *Ret = dyn_cast_or_null<ReturnInst>(BB.getTerminator()))
      visitValue(Ret->getReturnValue());
  // discard default
  if (!ResolvedIFuncs[IFunc].empty())
    ResolvedIFuncs[IFunc].pop_back();
}

erichkeane · 2024-04-01T16:28:22Z

I'd be OK with Clang providing some level of metadata to clarify which is an FMV, and what our target features are. But this sort of analysis still needs to happen in LLVM.

jroelofs · 2024-04-01T16:36:14Z

I'm not sure llvm needs to know the priorities. I haven't had time to work on this, but my plan was to have something that attempts to step through the resolver instruction by instruction with known bits for the value loaded from __aarch64_cpu_features.features according to the caller's target features. If the return value is known, then we can fold away the resolver for that call site. If we encounter a loop, a call, or some other pattern we don't understand, then bail & leave that call site alone.

erichkeane · 2024-04-01T16:38:41Z

I'm not sure llvm needs to know the priorities. I haven't had time to work on this, but my plan was to have something that attempts to step through the resolver instruction by instruction with known bits for the value loaded from __aarch64_cpu_features.features according to the caller's target features. If the return value is known, then we can fold away the resolver for that call site. If we encounter a loop, a call, or some other pattern we don't understand, then bail & leave that call site alone.

That seems sensible to me. It would be nice to be able to recognize this /get this optimization for 'hand rolled' resolvers as well.

jroelofs requested review from DanielKristofKiss and ilinpv January 31, 2024 03:20

llvmbot added clang Clang issues not falling into any other category clang:codegen IR generation bugs: mangling, exceptions, etc. labels Jan 31, 2024

jroelofs changed the title ~~[clang][FMV] Direct-call multi-versioned callees from multi-versioned~~ [clang][FMV] Direct-call FMV callees from FMV callers Jan 31, 2024

ilinpv requested a review from labrinea January 31, 2024 12:05

jroelofs added 3 commits January 31, 2024 08:53

don't do it at -O0

7126438

only care about caller => callee calls in the test

cadeb37

address a fixme: move priority calculation to a member function

4129daf

jroelofs requested review from tahonermann and erichkeane January 31, 2024 17:25

don't use UTC, simplify the test to its core purpose

3e2d7cc

erichkeane reviewed Jan 31, 2024

View reviewed changes

simplify the test even more. nfc

28fe40d

jroelofs closed this Jan 31, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[clang][FMV] Direct-call FMV callees from FMV callers #80093

[clang][FMV] Direct-call FMV callees from FMV callers #80093

Uh oh!

jroelofs commented Jan 31, 2024 •

edited

Loading

Uh oh!

llvmbot commented Jan 31, 2024 •

edited

Loading

Uh oh!

erichkeane left a comment

Uh oh!

jroelofs commented Jan 31, 2024

Uh oh!

erichkeane commented Jan 31, 2024

Uh oh!

jroelofs commented Jan 31, 2024

Uh oh!

labrinea commented Mar 31, 2024

Uh oh!

erichkeane commented Apr 1, 2024

Uh oh!

jroelofs commented Apr 1, 2024

Uh oh!

erichkeane commented Apr 1, 2024

Uh oh!

Uh oh!

[clang][FMV] Direct-call FMV callees from FMV callers #80093

[clang][FMV] Direct-call FMV callees from FMV callers #80093

Uh oh!

Conversation

jroelofs commented Jan 31, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Jan 31, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

erichkeane left a comment

Choose a reason for hiding this comment

Uh oh!

jroelofs commented Jan 31, 2024

Uh oh!

erichkeane commented Jan 31, 2024

Uh oh!

jroelofs commented Jan 31, 2024

Uh oh!

labrinea commented Mar 31, 2024

Uh oh!

erichkeane commented Apr 1, 2024

Uh oh!

jroelofs commented Apr 1, 2024

Uh oh!

erichkeane commented Apr 1, 2024

Uh oh!

Uh oh!

jroelofs commented Jan 31, 2024 •

edited

Loading

llvmbot commented Jan 31, 2024 •

edited

Loading