[TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. #81442

Merged
merged 24 commits into from
Jun 30, 2024

Conversation

mingmingl-llvm
Contributor

@mingmingl-llvm mingmingl-llvm commented Feb 12, 2024

Clang's -fwhole-program-vtables is required for this optimization to take place. If -fwhole-program-vtables is not enabled, this change is a no-op.

  • Function-comparison (before):
%vtable = load ptr, ptr %obj
%vfn = getelementptr inbounds ptr, ptr %vtable, i64 1
%func = load ptr, ptr %vfn
%cond = icmp eq ptr %func, @callee
br i1 %cond, label %bb1, label %bb2

bb1:
   call @callee

bb2:
   call %func
  • VTable-comparison (after):
%vtable = load ptr, ptr %obj
%cond = icmp eq ptr %vtable, @vtable-address-point
br i1 %cond, label %bb1, label %bb2

bb1:
   call @callee

bb2:
  %vfn = getelementptr inbounds ptr, ptr %vtable, i64 1
  %func = load ptr, ptr %vfn
  call %func

Key changes:

  1. Find out virtual calls and the vtables they come from.
    • The ICP relies on type intrinsic llvm.type.test to find out virtual calls and the
      compatible vtables, and relies on type metadata to find the address point for comparison.
  2. ICP pass does cost-benefit analysis and compares vtable only when the number of vtables for a function candidate is within (option specified) threshold.
  3. Sink the function addressing and vtable load instruction to indirect fallback.
    • The sink helper functions are simplified versions of InstCombinerImpl::tryToSinkInstruction. Currently debug intrinsics are not handled. Ideally InstCombinerImpl::tryToSinkInstructionDbgValues and InstCombinerImpl::tryToSinkInstructionDbgVariableRecords could be moved into Transforms/Utils/Local.cpp (or another util cpp file) to handle debug intrinsics when moving instructions across basic blocks.
  4. Keep value profiles updated
    1. Update vtable value profiles after inline
    2. For either function-based comparison or vtable-based comparison,
      update both vtable and indirect call value profiles.
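
For readers less familiar with the IR, the two comparison sequences can be mimicked at the source level. This is only a sketch with a hand-rolled vtable: `Object`, `VTable`, `HotVTable`, `ColdVTable` and the function names are hypothetical, and the model compares against the vtable object itself, whereas the real transformation compares against an address point inside the vtable global.

```cpp
#include <cassert>

// Hand-rolled model of an object with a vtable pointer (not the real ABI layout).
struct Object;
using VFn = int (*)(const Object *, int, int);
struct VTable {
  VFn slots[2];
};
struct Object {
  const VTable *vptr;
};

static int calleeMul(const Object *, int a, int b) { return a * b; }
static int otherSub(const Object *, int a, int b) { return a - b; }
static int unusedSlot(const Object *, int, int) { return 0; }

// Slot 1 plays the role of "getelementptr ..., i64 1" in the IR example.
static const VTable HotVTable = {{unusedSlot, calleeMul}};
static const VTable ColdVTable = {{unusedSlot, otherSub}};

// Function-comparison: two dependent loads (vptr, then slot) precede the branch.
int callFunctionCmp(const Object *obj, int a, int b) {
  assert(obj && obj->vptr);
  const VTable *vtable = obj->vptr;
  VFn func = vtable->slots[1];
  if (func == calleeMul)
    return calleeMul(obj, a, b); // promoted direct call
  return func(obj, a, b);        // indirect fallback
}

// VTable-comparison: one load precedes the branch; the slot load is sunk
// into the indirect fallback.
int callVTableCmp(const Object *obj, int a, int b) {
  assert(obj && obj->vptr);
  const VTable *vtable = obj->vptr;
  if (vtable == &HotVTable)
    return calleeMul(obj, a, b); // promoted direct call
  VFn func = vtable->slots[1];
  return func(obj, a, b);        // indirect fallback
}
```

Both functions return the same result for any object; the difference is only how much work sits on the hot path before the branch.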

@mingmingl-llvm mingmingl-llvm changed the base branch from main to users/minglotus-6/spr/vcsv February 12, 2024 06:30

github-actions bot commented Feb 12, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

Base automatically changed from users/minglotus-6/spr/vcsv to main May 19, 2024 23:33
@mingmingl-llvm mingmingl-llvm changed the base branch from main to users/minglotus-6/spr/vcsv May 19, 2024 23:34
@mingmingl-llvm mingmingl-llvm changed the base branch from users/minglotus-6/spr/vcsv to main May 20, 2024 04:29
@mingmingl-llvm mingmingl-llvm changed the title [TypeProf][IndirectCallPromotion]Implement vtable-based transformation [TypeProf][InstrFDO]Implement more efficient comparison sequence for indirect-call-promotion with vtable profiles. May 28, 2024
indirect-call-promotion with vtable profiles.

Clang's `-fwhole-program-vtables` is required for this optimization to
take place. If `-fwhole-program-vtables` is not enabled, this change is
a no-op.

Function-comparison (before):

VTable-comparison (after):

Key changes:
1. Find out virtual calls and the vtables they come from.
   - The ICP relies on type intrinsic `llvm.type.test` and
     `llvm.public.type.test` to find out virtual calls and the
     compatible vtables, and relies on type metadata to find the address
     point (offset) for comparison.
2. The ICP pass does cost-benefit analysis and compares vtables only when
   both conditions are met:
   1) The function addressing and vtable load can sink to the indirect
      fallback, and the indirect fallback is a cold block.
   2) The number of vtables for a function candidate is within the
      (option-specified) threshold.
3. Sink the function addressing and vtable load instruction to indirect
   fallback.
   - The sink helper functions are simplified versions of
     `InstCombinerImpl::tryToSinkInstruction`.
   - The helper functions to handle debug intrinsics are copied from
     `InstCombinerImpl::tryToSinkInstructionDbgValues` and
     `InstCombinerImpl::tryToSinkInstructionDbgVariableRecords` into
     Transforms/Utils/Local.cpp. Ideally only one copy should exist
     for inst-combine, icp and other passes.
4. Keep value profiles updated
   1) Update vtable value profiles after inline
   2) For either function-based comparison or vtable-based comparison,
      update both vtable and indirect call value profiles.
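
The decision in item 2 could be sketched roughly as below. This is an illustration under stated assumptions, not the code in the patch: `CandidateSummary`, its fields, and `shouldCompareVTables` are hypothetical names, and "within threshold" is read here as less than or equal.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical summary of one promotion candidate; field names are illustrative.
struct CandidateSummary {
  std::size_t NumVTables;  // distinct vtables that resolve to this callee
  bool CanSinkToFallback;  // function addressing can move into the fallback
  bool FallbackIsCold;     // profile says the indirect fallback is cold
};

// Returns true when vtable comparison would be chosen; otherwise the pass
// would keep comparing function addresses.
bool shouldCompareVTables(const CandidateSummary &C,
                          std::size_t MaxNumVTables) {
  // Condition 1: the sunk instructions must land in a cold fallback block.
  if (!C.CanSinkToFallback || !C.FallbackIsCold)
    return false;
  // Condition 2: comparing too many vtables would cost more than it saves.
  return C.NumVTables <= MaxNumVTables;
}
```

With a threshold of 1, a candidate reached through a single vtable qualifies, while one reached through two vtables falls back to function comparison.
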
@mingmingl-llvm mingmingl-llvm force-pushed the users/minglotus-6/spr/icpass branch from 54adf41 to ff3c219 Compare May 28, 2024 15:52
@mingmingl-llvm mingmingl-llvm marked this pull request as ready for review May 28, 2024 15:53
@llvmbot
Member

llvmbot commented May 28, 2024

@llvm/pr-subscribers-clang
@llvm/pr-subscribers-clang-codegen
@llvm/pr-subscribers-llvm-transforms
@llvm/pr-subscribers-pgo

@llvm/pr-subscribers-llvm-analysis

Author: Mingming Liu (minglotus-6)

Changes

Clang's -fwhole-program-vtables is required for this optimization to take place. If -fwhole-program-vtables is not enabled, this change is a no-op.

  • Function-comparison (before):
%vtable = load ptr, ptr %obj
%vfn = getelementptr inbounds ptr, ptr %vtable, i64 1
%func = load ptr, ptr %vfn
%cond = icmp eq ptr %func, @callee
br i1 %cond, label %bb1, label %bb2

bb1:
   call @callee

bb2:
   call %func
  • VTable-comparison (after):
%vtable = load ptr, ptr %obj
%cond = icmp eq ptr %vtable, @vtable-address-point
br i1 %cond, label %bb1, label %bb2

bb1:
   call @callee

bb2:
  %vfn = getelementptr inbounds ptr, ptr %vtable, i64 1
  %func = load ptr, ptr %vfn
  call %func

Key changes:

  1. Find out virtual calls and the vtables they come from.
    • The ICP relies on type intrinsic llvm.type.test and llvm.public.type.test to find out virtual calls and the
      compatible vtables, and relies on type metadata to find the address point for comparison.
  2. ICP pass does cost-benefit analysis and compares vtable only when the number of vtables for a function candidate is within (option specified) threshold.
  3. Sink the function addressing and vtable load instruction to indirect fallback.
    • The sink helper functions are simplified versions of
      InstCombinerImpl::tryToSinkInstruction.
    • The helper functions to handle debug intrinsics are copied from
      InstCombinerImpl::tryToSinkInstructionDbgValues and
      InstCombinerImpl::tryToSinkInstructionDbgVariableRecords into
      Transforms/Utils/Local.cpp. Ideally only one copy should exist
      for inst-combine, icp and other passes.
  4. Keep value profiles updated
    1. Update vtable value profiles after inline
    2. For either function-based comparison or vtable-based comparison,
      update both vtable and indirect call value profiles.

Patch is 83.36 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/81442.diff

13 Files Affected:

  • (modified) compiler-rt/test/profile/Linux/instrprof-vtable-value-prof.cpp (+60-44)
  • (modified) llvm/include/llvm/Analysis/IndirectCallPromotionAnalysis.h (+1-1)
  • (modified) llvm/include/llvm/Analysis/IndirectCallVisitor.h (+3)
  • (modified) llvm/include/llvm/Transforms/Utils/Local.h (+9)
  • (modified) llvm/lib/Analysis/IndirectCallPromotionAnalysis.cpp (+3-3)
  • (modified) llvm/lib/Transforms/Instrumentation/IndirectCallPromotion.cpp (+595-32)
  • (modified) llvm/lib/Transforms/Utils/InlineFunction.cpp (+31-5)
  • (modified) llvm/lib/Transforms/Utils/Local.cpp (+184)
  • (modified) llvm/test/Transforms/Inline/update_invoke_prof.ll (+46-28)
  • (modified) llvm/test/Transforms/Inline/update_value_profile.ll (+29-25)
  • (added) llvm/test/Transforms/PGOProfile/icp_vtable_cmp.ll (+139)
  • (added) llvm/test/Transforms/PGOProfile/icp_vtable_invoke.ll (+127)
  • (added) llvm/test/Transforms/PGOProfile/icp_vtable_tail_call.ll (+67)
diff --git a/compiler-rt/test/profile/Linux/instrprof-vtable-value-prof.cpp b/compiler-rt/test/profile/Linux/instrprof-vtable-value-prof.cpp
index e51805bdf923c..73921adcc0c15 100644
--- a/compiler-rt/test/profile/Linux/instrprof-vtable-value-prof.cpp
+++ b/compiler-rt/test/profile/Linux/instrprof-vtable-value-prof.cpp
@@ -5,59 +5,61 @@
 // ld.lld: error: /lib/../lib64/Scrt1.o: ABI version 1 is not supported
 // UNSUPPORTED: ppc && host-byteorder-big-endian
 
-// RUN: %clangxx_pgogen -fuse-ld=lld -O2 -g -fprofile-generate=. -mllvm -enable-vtable-value-profiling %s -o %t-test
-// RUN: env LLVM_PROFILE_FILE=%t-test.profraw %t-test
+// RUN: rm -rf %t && mkdir %t && cd %t
+
+// RUN: %clangxx_pgogen -fuse-ld=lld -O2 -fprofile-generate=. -mllvm -enable-vtable-value-profiling %s -o test
+// RUN: env LLVM_PROFILE_FILE=test.profraw ./test
 
 // Show vtable profiles from raw profile.
-// RUN: llvm-profdata show --function=main --ic-targets --show-vtables %t-test.profraw | FileCheck %s --check-prefixes=COMMON,RAW
+// RUN: llvm-profdata show --function=main --ic-targets --show-vtables test.profraw | FileCheck %s --check-prefixes=COMMON,RAW
 
 // Generate indexed profile from raw profile and show the data.
-// RUN: llvm-profdata merge %t-test.profraw -o %t-test.profdata
-// RUN: llvm-profdata show --function=main --ic-targets --show-vtables %t-test.profdata | FileCheck %s --check-prefixes=COMMON,INDEXED
+// RUN: llvm-profdata merge test.profraw -o test.profdata
+// RUN: llvm-profdata show --function=main --ic-targets --show-vtables test.profdata | FileCheck %s --check-prefixes=COMMON,INDEXED
 
 // Generate text profile from raw and indexed profiles respectively and show the data.
-// RUN: llvm-profdata merge --text %t-test.profraw -o %t-raw.proftext
-// RUN: llvm-profdata show --function=main --ic-targets --show-vtables --text %t-raw.proftext | FileCheck %s --check-prefix=ICTEXT
-// RUN: llvm-profdata merge --text %t-test.profdata -o %t-indexed.proftext
-// RUN: llvm-profdata show --function=main --ic-targets --show-vtables --text %t-indexed.proftext | FileCheck %s --check-prefix=ICTEXT
+// RUN: llvm-profdata merge --text test.profraw -o raw.proftext
+// RUN: llvm-profdata show --function=main --ic-targets --show-vtables --text raw.proftext | FileCheck %s --check-prefix=ICTEXT
+// RUN: llvm-profdata merge --text test.profdata -o indexed.proftext
+// RUN: llvm-profdata show --function=main --ic-targets --show-vtables --text indexed.proftext | FileCheck %s --check-prefix=ICTEXT
 
 // Generate indexed profile from text profiles and show the data
-// RUN: llvm-profdata merge --binary %t-raw.proftext -o %t-text.profraw
-// RUN: llvm-profdata show --function=main --ic-targets --show-vtables %t-text.profraw | FileCheck %s --check-prefixes=COMMON,INDEXED
-// RUN: llvm-profdata merge --binary %t-indexed.proftext -o %t-text.profdata
-// RUN: llvm-profdata show --function=main --ic-targets --show-vtables %t-text.profdata | FileCheck %s --check-prefixes=COMMON,INDEXED
+// RUN: llvm-profdata merge --binary raw.proftext -o text.profraw
+// RUN: llvm-profdata show --function=main --ic-targets --show-vtables text.profraw | FileCheck %s --check-prefixes=COMMON,INDEXED
+// RUN: llvm-profdata merge --binary indexed.proftext -o text.profdata
+// RUN: llvm-profdata show --function=main --ic-targets --show-vtables text.profdata | FileCheck %s --check-prefixes=COMMON,INDEXED
 
 // COMMON: Counters:
 // COMMON-NEXT:  main:
-// COMMON-NEXT:  Hash: 0x0f9a16fe6d398548
-// COMMON-NEXT:  Counters: 2
+// COMMON-NEXT:  Hash: 0x068617320ec408a0
+// COMMON-NEXT:  Counters: 4
 // COMMON-NEXT:  Indirect Call Site Count: 2
 // COMMON-NEXT:  Number of instrumented vtables: 2
 // RAW:  Indirect Target Results:
-// RAW-NEXT:       [  0, _ZN8Derived15func1Eii,        250 ] (25.00%)
-// RAW-NEXT:       [  0, {{.*}}instrprof-vtable-value-prof.cpp;_ZN12_GLOBAL__N_18Derived25func1Eii,        750 ] (75.00%)
-// RAW-NEXT:       [  1, _ZN8Derived15func2Eii,        250 ] (25.00%)
-// RAW-NEXT:       [  1, {{.*}}instrprof-vtable-value-prof.cpp;_ZN12_GLOBAL__N_18Derived25func2Eii,        750 ] (75.00%)
+// RAW-NEXT:       [  0, _ZN8Derived14funcEii,        50 ] (25.00%)
+// RAW-NEXT:       [  0, {{.*}}instrprof-vtable-value-prof.cpp;_ZN12_GLOBAL__N_18Derived24funcEii,        150 ] (75.00%)
+// RAW-NEXT:       [  1, _ZN8Derived1D0Ev,        250 ] (25.00%)
+// RAW-NEXT:       [  1, {{.*}}instrprof-vtable-value-prof.cpp;_ZN12_GLOBAL__N_18Derived2D0Ev,        750 ] (75.00%)
 // RAW-NEXT:  VTable Results:
-// RAW-NEXT:       [  0, _ZTV8Derived1,        250 ] (25.00%)
-// RAW-NEXT:       [  0, {{.*}}instrprof-vtable-value-prof.cpp;_ZTVN12_GLOBAL__N_18Derived2E,        750 ] (75.00%)
+// RAW-NEXT:       [  0, _ZTV8Derived1,        50 ] (25.00%)
+// RAW-NEXT:       [  0, {{.*}}instrprof-vtable-value-prof.cpp;_ZTVN12_GLOBAL__N_18Derived2E,        150 ] (75.00%)
 // RAW-NEXT:       [  1, _ZTV8Derived1,        250 ] (25.00%)
 // RAW-NEXT:       [  1, {{.*}}instrprof-vtable-value-prof.cpp;_ZTVN12_GLOBAL__N_18Derived2E,        750 ] (75.00%)
 // INDEXED:     Indirect Target Results:
-// INDEXED-NEXT:         [  0, {{.*}}instrprof-vtable-value-prof.cpp;_ZN12_GLOBAL__N_18Derived25func1Eii,        750 ] (75.00%)
-// INDEXED-NEXT:         [  0, _ZN8Derived15func1Eii,        250 ] (25.00%)
-// INDEXED-NEXT:         [  1, {{.*}}instrprof-vtable-value-prof.cpp;_ZN12_GLOBAL__N_18Derived25func2Eii,        750 ] (75.00%)
-// INDEXED-NEXT:         [  1, _ZN8Derived15func2Eii,        250 ] (25.00%)
+// INDEXED-NEXT:         [  0, {{.*}}instrprof-vtable-value-prof.cpp;_ZN12_GLOBAL__N_18Derived24funcEii,        150 ] (75.00%)
+// INDEXED-NEXT:         [  0, _ZN8Derived14funcEii,        50 ] (25.00%)
+// INDEXED-NEXT:         [  1, {{.*}}instrprof-vtable-value-prof.cpp;_ZN12_GLOBAL__N_18Derived2D0Ev,        750 ] (75.00%)
+// INDEXED-NEXT:         [  1, _ZN8Derived1D0Ev,        250 ] (25.00%)
 // INDEXED-NEXT:     VTable Results:
-// INDEXED-NEXT:         [  0, {{.*}}instrprof-vtable-value-prof.cpp;_ZTVN12_GLOBAL__N_18Derived2E,        750 ] (75.00%)
-// INDEXED-NEXT:         [  0, _ZTV8Derived1,        250 ] (25.00%)
+// INDEXED-NEXT:         [  0, {{.*}}instrprof-vtable-value-prof.cpp;_ZTVN12_GLOBAL__N_18Derived2E,        150 ] (75.00%)
+// INDEXED-NEXT:         [  0, _ZTV8Derived1,        50 ] (25.00%)
 // INDEXED-NEXT:         [  1, {{.*}}instrprof-vtable-value-prof.cpp;_ZTVN12_GLOBAL__N_18Derived2E,        750 ] (75.00%)
 // INDEXED-NEXT:         [  1, _ZTV8Derived1,        250 ] (25.00%)
 // COMMON: Instrumentation level: IR  entry_first = 0
 // COMMON-NEXT: Functions shown: 1
-// COMMON-NEXT: Total functions: 6
+// COMMON-NEXT: Total functions: 7
 // COMMON-NEXT: Maximum function count: 1000
-// COMMON-NEXT: Maximum internal block count: 250
+// COMMON-NEXT: Maximum internal block count: 1000
 // COMMON-NEXT: Statistics for indirect call sites profile:
 // COMMON-NEXT:   Total number of sites: 2
 // COMMON-NEXT:   Total number of sites with values: 2
@@ -76,11 +78,13 @@
 // ICTEXT: :ir
 // ICTEXT: main
 // ICTEXT: # Func Hash:
-// ICTEXT: 1124236338992350536
+// ICTEXT: 470088714870327456
 // ICTEXT: # Num Counters:
-// ICTEXT: 2
+// ICTEXT: 4
 // ICTEXT: # Counter Values:
 // ICTEXT: 1000
+// ICTEXT: 1000
+// ICTEXT: 200
 // ICTEXT: 1
 // ICTEXT: # Num Value Kinds:
 // ICTEXT: 2
@@ -89,41 +93,50 @@
 // ICTEXT: # NumValueSites:
 // ICTEXT: 2
 // ICTEXT: 2
-// ICTEXT: {{.*}}instrprof-vtable-value-prof.cpp;_ZN12_GLOBAL__N_18Derived25func1Eii:750
-// ICTEXT: _ZN8Derived15func1Eii:250
+// ICTEXT: {{.*}}instrprof-vtable-value-prof.cpp;_ZN12_GLOBAL__N_18Derived24funcEii:150
+// ICTEXT: _ZN8Derived14funcEii:50
 // ICTEXT: 2
-// ICTEXT: {{.*}}instrprof-vtable-value-prof.cpp;_ZN12_GLOBAL__N_18Derived25func2Eii:750
-// ICTEXT: _ZN8Derived15func2Eii:250
+// ICTEXT: {{.*}}instrprof-vtable-value-prof.cpp;_ZN12_GLOBAL__N_18Derived2D0Ev:750
+// ICTEXT: _ZN8Derived1D0Ev:250
 // ICTEXT: # ValueKind = IPVK_VTableTarget:
 // ICTEXT: 2
 // ICTEXT: # NumValueSites:
 // ICTEXT: 2
 // ICTEXT: 2
-// ICTEXT: {{.*}}instrprof-vtable-value-prof.cpp;_ZTVN12_GLOBAL__N_18Derived2E:750
-// ICTEXT: _ZTV8Derived1:250
+// ICTEXT: {{.*}}instrprof-vtable-value-prof.cpp;_ZTVN12_GLOBAL__N_18Derived2E:150
+// ICTEXT: _ZTV8Derived1:50
 // ICTEXT: 2
 // ICTEXT: {{.*}}instrprof-vtable-value-prof.cpp;_ZTVN12_GLOBAL__N_18Derived2E:750
 // ICTEXT: _ZTV8Derived1:250
 
+// Test indirect call promotion transformation using vtable profiles.
+// RUN: %clangxx -fprofile-use=test.profdata -fuse-ld=lld -flto=thin -fwhole-program-vtables -O2 -mllvm -enable-vtable-value-profiling -mllvm -icp-enable-vtable-cmp -Rpass=pgo-icall-prom %s 2>&1 | FileCheck %s --check-prefix=REMARK --implicit-check-not="!VP"
+
+// REMARK: Promote indirect call to _ZN12_GLOBAL__N_18Derived24funcEii with count 150 out of 200, compare 1 vtables and sink 1 instructions
+// REMARK: Promote indirect call to _ZN8Derived14funcEii with count 50 out of 50, compare 1 vtables and sink 1 instructions
+// REMARK: Promote indirect call to _ZN12_GLOBAL__N_18Derived2D0Ev with count 750 out of 1000, compare 1 vtables and sink 2 instructions
+// REMARK: Promote indirect call to _ZN8Derived1D0Ev with count 250 out of 250, compare 1 vtables and sink 2 instructions
+
 #include <cstdio>
 #include <cstdlib>
 class Base {
 public:
-  virtual int func1(int a, int b) = 0;
-  virtual int func2(int a, int b) = 0;
+  virtual int func(int a, int b) = 0;
+
+  virtual ~Base() {};
 };
 class Derived1 : public Base {
 public:
-  int func1(int a, int b) override { return a + b; }
+  int func(int a, int b) override { return a * b; }
 
-  int func2(int a, int b) override { return a * b; }
+  ~Derived1() {}
 };
 namespace {
 class Derived2 : public Base {
 public:
-  int func1(int a, int b) override { return a - b; }
+  int func(int a, int b) override { return a * (a - b); }
 
-  int func2(int a, int b) override { return a * (a - b); }
+  ~Derived2() {}
 };
 } // namespace
 __attribute__((noinline)) Base *createType(int a) {
@@ -140,7 +153,10 @@ int main(int argc, char **argv) {
     int a = rand();
     int b = rand();
     Base *ptr = createType(i);
-    sum += ptr->func1(a, b) + ptr->func2(b, a);
+    if (i % 5 == 0)
+      sum += ptr->func(b, a);
+
+    delete ptr;
   }
   printf("sum is %d\n", sum);
   return 0;
diff --git a/llvm/include/llvm/Analysis/IndirectCallPromotionAnalysis.h b/llvm/include/llvm/Analysis/IndirectCallPromotionAnalysis.h
index 8a05e913a9106..eda672d7d50ee 100644
--- a/llvm/include/llvm/Analysis/IndirectCallPromotionAnalysis.h
+++ b/llvm/include/llvm/Analysis/IndirectCallPromotionAnalysis.h
@@ -57,7 +57,7 @@ class ICallPromotionAnalysis {
   ///
   /// The returned array space is owned by this class, and overwritten on
   /// subsequent calls.
-  ArrayRef<InstrProfValueData>
+  MutableArrayRef<InstrProfValueData>
   getPromotionCandidatesForInstruction(const Instruction *I, uint32_t &NumVals,
                                        uint64_t &TotalCount,
                                        uint32_t &NumCandidates);
diff --git a/llvm/include/llvm/Analysis/IndirectCallVisitor.h b/llvm/include/llvm/Analysis/IndirectCallVisitor.h
index 66c972572b06c..f070e83c41689 100644
--- a/llvm/include/llvm/Analysis/IndirectCallVisitor.h
+++ b/llvm/include/llvm/Analysis/IndirectCallVisitor.h
@@ -37,6 +37,9 @@ struct PGOIndirectCallVisitor : public InstVisitor<PGOIndirectCallVisitor> {
   // A heuristic is used to find the address feeding instructions.
   static Instruction *tryGetVTableInstruction(CallBase *CB) {
     assert(CB != nullptr && "Caller guaranteed");
+    if (!CB->isIndirectCall())
+      return nullptr;
+
     LoadInst *LI = dyn_cast<LoadInst>(CB->getCalledOperand());
 
     if (LI != nullptr) {
diff --git a/llvm/include/llvm/Transforms/Utils/Local.h b/llvm/include/llvm/Transforms/Utils/Local.h
index 6937ec8dfd21c..5535a722a40fe 100644
--- a/llvm/include/llvm/Transforms/Utils/Local.h
+++ b/llvm/include/llvm/Transforms/Utils/Local.h
@@ -316,6 +316,15 @@ void salvageDebugInfoForDbgValues(Instruction &I,
                                   ArrayRef<DbgVariableIntrinsic *> Insns,
                                   ArrayRef<DbgVariableRecord *> DPInsns);
 
+void tryToSinkInstructionDbgValues(
+    Instruction *I, BasicBlock::iterator InsertPos, BasicBlock *SrcBlock,
+    BasicBlock *DestBlock, SmallVectorImpl<DbgVariableIntrinsic *> &DbgUsers);
+
+void tryToSinkInstructionDPValues(
+    Instruction *I, BasicBlock::iterator InsertPos, BasicBlock *SrcBlock,
+    BasicBlock *DestBlock,
+    SmallVectorImpl<DbgVariableRecord *> &DbgVariableRecords);
+
 /// Given an instruction \p I and DIExpression \p DIExpr operating on
 /// it, append the effects of \p I to the DIExpression operand list
 /// \p Ops, or return \p nullptr if it cannot be salvaged.
diff --git a/llvm/lib/Analysis/IndirectCallPromotionAnalysis.cpp b/llvm/lib/Analysis/IndirectCallPromotionAnalysis.cpp
index ab53717eb889a..643c155ba6d7e 100644
--- a/llvm/lib/Analysis/IndirectCallPromotionAnalysis.cpp
+++ b/llvm/lib/Analysis/IndirectCallPromotionAnalysis.cpp
@@ -87,7 +87,7 @@ uint32_t ICallPromotionAnalysis::getProfitablePromotionCandidates(
   return I;
 }
 
-ArrayRef<InstrProfValueData>
+MutableArrayRef<InstrProfValueData>
 ICallPromotionAnalysis::getPromotionCandidatesForInstruction(
     const Instruction *I, uint32_t &NumVals, uint64_t &TotalCount,
     uint32_t &NumCandidates) {
@@ -96,8 +96,8 @@ ICallPromotionAnalysis::getPromotionCandidatesForInstruction(
                                ValueDataArray.get(), NumVals, TotalCount);
   if (!Res) {
     NumCandidates = 0;
-    return ArrayRef<InstrProfValueData>();
+    return MutableArrayRef<InstrProfValueData>();
   }
   NumCandidates = getProfitablePromotionCandidates(I, NumVals, TotalCount);
-  return ArrayRef<InstrProfValueData>(ValueDataArray.get(), NumVals);
+  return MutableArrayRef<InstrProfValueData>(ValueDataArray.get(), NumVals);
 }
diff --git a/llvm/lib/Transforms/Instrumentation/IndirectCallPromotion.cpp b/llvm/lib/Transforms/Instrumentation/IndirectCallPromotion.cpp
index 23a7c6a20aecb..4de0aaef8d7ca 100644
--- a/llvm/lib/Transforms/Instrumentation/IndirectCallPromotion.cpp
+++ b/llvm/lib/Transforms/Instrumentation/IndirectCallPromotion.cpp
@@ -13,13 +13,17 @@
 //===----------------------------------------------------------------------===//
 
 #include "llvm/ADT/ArrayRef.h"
+#include "llvm/ADT/DenseMap.h"
 #include "llvm/ADT/Statistic.h"
 #include "llvm/ADT/StringRef.h"
 #include "llvm/Analysis/IndirectCallPromotionAnalysis.h"
 #include "llvm/Analysis/IndirectCallVisitor.h"
 #include "llvm/Analysis/OptimizationRemarkEmitter.h"
 #include "llvm/Analysis/ProfileSummaryInfo.h"
+#include "llvm/Analysis/TypeMetadataUtils.h"
+#include "llvm/IR/DebugInfo.h"
 #include "llvm/IR/DiagnosticInfo.h"
+#include "llvm/IR/Dominators.h"
 #include "llvm/IR/Function.h"
 #include "llvm/IR/InstrTypes.h"
 #include "llvm/IR/Instructions.h"
@@ -37,6 +41,7 @@
 #include "llvm/Transforms/Instrumentation.h"
 #include "llvm/Transforms/Instrumentation/PGOInstrumentation.h"
 #include "llvm/Transforms/Utils/CallPromotionUtils.h"
+#include "llvm/Transforms/Utils/Local.h"
 #include <cassert>
 #include <cstdint>
 #include <memory>
@@ -51,6 +56,8 @@ using namespace llvm;
 STATISTIC(NumOfPGOICallPromotion, "Number of indirect call promotions.");
 STATISTIC(NumOfPGOICallsites, "Number of indirect call candidate sites.");
 
+extern cl::opt<unsigned> MaxNumVTableAnnotations;
+
 // Command line option to disable indirect-call promotion with the default as
 // false. This is for debug purpose.
 static cl::opt<bool> DisableICP("disable-icp", cl::init(false), cl::Hidden,
@@ -103,13 +110,202 @@ static cl::opt<bool>
     ICPDUMPAFTER("icp-dumpafter", cl::init(false), cl::Hidden,
                  cl::desc("Dump IR after transformation happens"));
 
+// This option is meant to be used by LLVM regression test and test the
+// transformation that compares vtables.
+static cl::opt<bool> ICPEnableVTableCmp(
+    "icp-enable-vtable-cmp", cl::init(false), cl::Hidden,
+    cl::desc("If ThinLTO and WPD is enabled and this option is true, "
+             "indirect-call promotion pass will compare vtables rather than "
+             "functions for speculative devirtualization of virtual calls."
+             " If set to false, indirect-call promotion pass will always "
+             "compare functions."));
+
+static cl::opt<float>
+    ICPVTableCountPercentage("icp-vtable-count-percentage", cl::init(0.99),
+                             cl::Hidden,
+                             cl::desc("Percentage of vtable count to compare"));
+
+static cl::opt<int> ICPNumAdditionalVTableLast(
+    "icp-num-additional-vtable-last", cl::init(0), cl::Hidden,
+    cl::desc("The number of additional instruction for the last candidate"));
+
 namespace {
 
+using VTableAddressPointOffsetValMap =
+    SmallDenseMap<const GlobalVariable *, SmallDenseMap<int, Constant *, 4>, 8>;
+
+// A struct to collect type information for a virtual call site.
+struct VirtualCallSiteInfo {
+  // The offset from the address point to virtual function in the vtable.
+  uint64_t FunctionOffset;
+  // The instruction that computes the address point of vtable.
+  Instruction *VPtr;
+  // The compatible type used in LLVM type intrinsics.
+  StringRef CompatibleTypeStr;
+};
+
+// The key is a virtual call, and value is its type information.
+using VirtualCallSiteTypeInfoMap =
+    SmallDenseMap<const CallBase *, VirtualCallSiteInfo, 8>;
+
+// Find the offset where type string is `CompatibleType`.
+static std::optional<uint64_t>
+getCompatibleTypeOffset(const GlobalVariable &VTableVar,
+                        StringRef CompatibleType) {
+  SmallVector<MDNode *, 2> Types; // type metadata associated with a vtable.
+  VTableVar.getMetadata(LLVMContext::MD_type, Types);
+
+  for (MDNode *Type : Types)
+    if (auto *TypeId = dyn_cast<MDString>(Type->getOperand(1).get());
+        TypeId && TypeId->getString() == CompatibleType)
+
+      return cast<ConstantInt>(
+                 cast<ConstantAsMetadata>(Type->getOperand(0))->getValue())
+          ->getZExtValue();
+
+  return std::nullopt;
+}
+
+// Returns a constant representing the vtable's address point specified by the
+// offset.
+static Constant *getVTableAddressPointOffset(GlobalVariable *VTable,
+                                             uint32_t AddressPointOffset) {
+  Module &M = *VTable->getParent();
+  LLVMContext &Context = M.getContext();
+  assert(AddressPointOffset <
+             M.getDataLayout().getTypeAllocSize(VTable->getValueType()) &&
+         "Out-of-bound access");
+
+  return ConstantExpr::getInBoundsGetElementPtr(
+      Type::getInt8Ty(Context), VTable,
+      llvm::ConstantInt::get(Type::getInt32Ty(Context), AddressPointOffset));
+}
+
+// Returns the basic block in which `Inst` by `Use`.
+static BasicBlock *getUserBasicBlock(Instruction *Inst, unsigned int OperandNo,
+                                     Instruction *UserInst) {
+  if (PHINode *PN = dyn_cast<PHINode>(UserInst))
+    return PN->getIncomingBlock(
+        PHINode::getIncomingValueNumForOperand(OperandNo));
+
+  return UserInst->getParent();
+}
+
+// `DestBB` is a suitable basic block to sink `Inst` into when the following
+// conditions are true:
+// 1) `Inst->getParent()` is the sole predecessor of `DestBB`. This way `DestBB`
+//    is dominated by `Inst->getParent()` and we don't need to sink across a
+//    critical edge.
+// 2) `Inst` have users and all users are in `DestBB`.
+static bool isDestBBSuitableForSink(Instruction *Inst, BasicBlock *DestBB) {
+  BasicBlock *BB = Inst->getParent();
+  assert(Inst->getParent() != DestBB &&
+         BB->getTerminator()->getNumSuccessors() == 2 &&
+         "Caller should guarantee");
+  // Do not sink across a critical edge for simplicity.
+  if (DestBB->getUniquePredecessor() != BB)
+    return false;
+
+  // Now we know BB dominates DestBB.
+  BasicBlock *UserBB = nullptr;
+  for (Use &Use : Inst->uses()) {
+    User *User = Use.getUser();
+    // Do checked cast since IR verifier guarantees that the user of an
+    // instruction must be an instruction. See `Verifier::visitInstruction`.
+    Instruction *UserInst = cast<Instruction>(User);
+    // We can sink debug or pseudo instructions together with Inst.
+    if (UserInst->isDebugOrPseudoInst())
+      continue;
+    UserBB = getUserBasicBlock(Inst, Use.getOperandNo(), UserInst);
+    // Do not...
[truncated]

@david-xl
Contributor

Can we rely on instcombine to do the sinking?

(the patch summary has a small error -- the comparison instruction in the example should be %cond = icmp eq ptr %vtable, @vtable-address-point)

@mingmingl-llvm
Contributor Author

> Can we rely on instcombine to do the sinking?

Good question. https://gcc.godbolt.org/z/G7Paj7h37 has two examples: instcombine can sink the first but cannot see through the second. I filed #88960 to discuss this, and I think it is more reliable to interleave the instruction sinking with the vtable transformation. I do think it is strongly preferable to reuse helper functions like tryToSinkInstructionDbgValues and tryToSinkInstructionDbgVariableRecords.
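
For context, the suitability check that the patch documents for `isDestBBSuitableForSink` (the destination's sole predecessor is the instruction's block, and the instruction has users that all live in the destination) can be modeled on a toy CFG. `ToyBlock`, `ToyInst` and `isSuitableSinkDest` are hypothetical stand-ins rather than LLVM types, and debug or pseudo users are ignored in this sketch.

```cpp
#include <cassert>
#include <vector>

// Toy stand-ins for a basic block and an instruction; not LLVM classes.
struct ToyBlock {
  std::vector<const ToyBlock *> Preds; // predecessor blocks
};
struct ToyInst {
  const ToyBlock *Parent;                   // block that currently holds it
  std::vector<const ToyBlock *> UserBlocks; // block of each (non-debug) user
};

// Mirrors the two documented conditions for sinking `I` into `Dest`.
bool isSuitableSinkDest(const ToyInst &I, const ToyBlock &Dest) {
  // 1) Dest must have I's block as its only predecessor, so Dest is
  //    dominated by it and no critical edge is crossed.
  if (Dest.Preds.size() != 1 || Dest.Preds[0] != I.Parent)
    return false;
  // 2) I must have users, and every user must already be in Dest.
  if (I.UserBlocks.empty())
    return false;
  for (const ToyBlock *UserBB : I.UserBlocks)
    if (UserBB != &Dest)
      return false;
  return true;
}
```

In this model, a function-pointer load whose only user is the fallback call can sink, while one that is also used in a merge block cannot.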

@mingmingl-llvm
Contributor Author

> (the patch summary has a small error -- the comparison instruction in the example should be %cond = icmp eq ptr %vtable, @vtable-address-point)

thanks. I corrected it.

for (const auto &[GUID, VTableCount] : C.VTableGUIDAndCounts) {
APInt APFuncCount((unsigned)128, FuncCount, false /*signed*/);
APFuncCount *= VTableCount;
VTableGUIDCounts[GUID] -= APFuncCount.udiv(SumVTableCount).getZExtValue();
Contributor

should the denominator be the total funcCount?

Contributor Author

The code updates the sum count and the individual <target, count> pairs within one !prof metadata, so I used the sum count from that !prof metadata.

Contributor

ok
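
The proportional update discussed in this thread can be sketched without `APInt`. This is an illustration only: `scaledShare` is a hypothetical helper, and `unsigned __int128` is a GCC/Clang extension used here as a stand-in for the 128-bit `APInt` in the snippet above.

```cpp
#include <cassert>
#include <cstdint>

// GCC/Clang extension; stands in for the 128-bit APInt used in the patch.
__extension__ typedef unsigned __int128 U128;

// Share of FuncCount attributed to one vtable:
//   FuncCount * VTableCount / SumVTableCount
// The product is formed in 128 bits so it cannot overflow 64 bits.
std::uint64_t scaledShare(std::uint64_t FuncCount, std::uint64_t VTableCount,
                          std::uint64_t SumVTableCount) {
  if (SumVTableCount == 0)
    return 0; // nothing to attribute; also avoids division by zero
  U128 Product = static_cast<U128>(FuncCount) * VTableCount;
  // The cast truncates if the quotient exceeds 64 bits, which can only
  // happen when VTableCount > SumVTableCount.
  return static_cast<std::uint64_t>(Product / SumVTableCount);
}
```

For example, with a function count of 200 and a vtable that accounts for 150 of 200 vtable samples, the share subtracted from that vtable's count is 150.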


PromotedFuncCount.push_back(Candidate.Count);

TotalFuncCount -= Candidate.Count;
Contributor

Add a check to make sure TotalFuncCount does not become negative (differences in precision between the icall and vtable profiles may lead to it).

Contributor Author

I added assert(TotalFuncCount >= Candidate.Count && "message") to stay consistent with the function comparison in (

pgo::promoteIndirectCall(CB, C.TargetFunction, Count, TotalCount, SamplePGO,
&ORE);
assert(TotalCount >= Count);
TotalCount -= Count;
NumOfPGOICallPromotion++;
)

Meanwhile, I used std::min, since 'TotalFuncCount' is the saturating-add result per

std::unique_ptr<InstrProfValueData[]> VD =
InstrProfR.getValueForSite(ValueKind, SiteIdx);
ArrayRef<InstrProfValueData> VDs(VD.get(), NV);
uint64_t Sum = 0;
for (const InstrProfValueData &V : VDs)
Sum = SaturatingAdd(Sum, V.Count);
annotateValueSite(M, Inst, VDs, Sum, ValueKind, MaxMDCount);
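
To illustrate why a clamp helps here: when the total is built with a saturating add, the sum of the individual counts can exceed the stored total, so a plain subtraction could wrap around. A minimal sketch, where `saturatingAdd` and `subtractClamped` are hypothetical helpers (LLVM has its own `SaturatingAdd`):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>

// Adds two counts, pinning the result at UINT64_MAX instead of wrapping.
std::uint64_t saturatingAdd(std::uint64_t A, std::uint64_t B) {
  const std::uint64_t Max = std::numeric_limits<std::uint64_t>::max();
  return (A > Max - B) ? Max : A + B;
}

// Subtracts Count from Total without wrapping below zero: at most Total
// is removed, so the result is 0 when Count exceeds Total.
std::uint64_t subtractClamped(std::uint64_t Total, std::uint64_t Count) {
  return Total - std::min(Total, Count);
}
```

With these helpers, subtracting a candidate count larger than the remaining total yields 0 rather than a huge wrapped value.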

Contributor Author

@mingmingl-llvm mingmingl-llvm left a comment

PTAL, thanks!

RemainingVTableCount -= Candidate.Count;

int MaxNumVTable = 1;
if (I == Candidates.size() - 1)

Contributor Author

done.

@mingmingl-llvm mingmingl-llvm requested a review from david-xl June 12, 2024 19:34
return false;

// Do not sink convergent call instructions.
if (const auto *C = dyn_cast<CallBase>(I))

Contributor

ok

for (const auto &[GUID, VTableCount] : C.VTableGUIDAndCounts) {
APInt APFuncCount((unsigned)128, FuncCount, false /*signed*/);
APFuncCount *= VTableCount;
VTableGUIDCounts[GUID] -= APFuncCount.udiv(SumVTableCount).getZExtValue();

Contributor

ok
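
The quoted lines scale a promoted function count by each vtable's share of the total, using a 128-bit intermediate so the multiplication cannot overflow. A minimal standalone sketch of that arithmetic follows. It is an illustration only: `scaleCount` is a hypothetical helper, and it uses the `unsigned __int128` extension available in GCC and Clang where the PR uses `llvm::APInt`. It assumes `VTableCount <= SumVTableCount`, so the quotient fits in 64 bits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: compute FuncCount * VTableCount / SumVTableCount
// with a 128-bit intermediate product, so the multiplication cannot
// overflow even when both factors are close to 2^64.
// Assumption: VTableCount <= SumVTableCount, so the result fits in 64 bits.
inline uint64_t scaleCount(uint64_t FuncCount, uint64_t VTableCount,
                           uint64_t SumVTableCount) {
  assert(SumVTableCount != 0 && "sum of vtable counts must be non-zero");
  unsigned __int128 Product =
      static_cast<unsigned __int128>(FuncCount) * VTableCount;
  return static_cast<uint64_t>(Product / SumVTableCount);
}
```

For a candidate with function count 400 whose two vtables each carry 200 of a 400 total, each vtable's share is `scaleCount(400, 200, 400)`, which is 200; that share is what gets subtracted from the per-vtable count.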

1. Resolve review comments
2. Use unordered_map rather than SmallDenseMap for a couple of maps.
   - unordered_map calls the allocator for each element, whereas
     (Small)DenseMap allocates elements in batches. However, DenseMap
     grows aggressively below size 64 [1], so it is not memory-efficient.
3. Use stable_sort when sorting <target, count> pairs by count.
4. Only update VPtr value profiles if 'EnableVTableValueProfile' is true
   and 'VPtr' has profiles.

[1] DenseMap https://github.com/llvm/llvm-project/blob/092dbfaad257885692fa64559e9eb43a5c466798/llvm/include/llvm/ADT/DenseMap.h#L849
    SmallDenseMap https://github.com/llvm/llvm-project/blob/092dbfaad257885692fa64559e9eb43a5c466798/llvm/include/llvm/ADT/DenseMap.h#L1088
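
As a small illustration of item 3, a stable sort keeps <target, count> pairs with equal counts in their original relative order, which keeps the result deterministic. The sketch below is standalone and hypothetical (`TargetCount` and `sortByCountDescending` are names chosen here, not taken from the PR).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical <target, count> pair type for illustration.
using TargetCount = std::pair<std::string, uint64_t>;

// Sort by count, largest first. std::stable_sort preserves the input
// order of entries whose counts compare equal.
inline void sortByCountDescending(std::vector<TargetCount> &Pairs) {
  std::stable_sort(Pairs.begin(), Pairs.end(),
                   [](const TargetCount &L, const TargetCount &R) {
                     return L.second > R.second;
                   });
}
```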
@mingmingl-llvm mingmingl-llvm requested a review from david-xl June 13, 2024 06:30
1. remove unused headers
2. use SmallDenseMap for outer map and unordered_map for inner map (the
   latter is more memory-efficient)
Contributor

@teresajohnson teresajohnson left a comment

lgtm with some mostly comment related fixes/suggestions

MDNode *N = MDNode::get(C, MDString::get(C, PGOFuncName));
F.setMetadata(getPGOFuncNameMetadataName(), N);

// Don't created duplicated metadata.

Contributor

s/created/create/

Contributor Author

done.

if (getPGOFuncNameMetadata(F))
static void createPGONameMetadata(GlobalObject &GO, StringRef MetadataName,
StringRef PGOName) {
// For internal linkage objects, its name is not the same as its PGO name.

Contributor

I think it would be useful to keep the old comment too about this only being for internal linkage functions.

Contributor Author

done.

// vtable pointer with multiple vtable address points due to class inheritance.
// Comparing with multiple vtables inserts additional instructions on hot code
// path; and doing so for earlier candidate of one icall can affect later
// function candidate in an undesired way. We allow multiple vtable comparison

Contributor

The wording "earlier candidate of one icall can affect later function candidate in an undesired way" is a little confusing to me. I think what you mean is that doing so for an earlier candidate delays the comparisons for later candidates, but that for the last candidate, only the fallback path is affected? Do we expect to set this parameter above 1?

Contributor Author

> I think what you mean is that doing so for an earlier candidate delays the comparisons for later candidates, but that for the last candidate, only the fallback path is affected?

Yes. I updated the comment.

> Do we expect to set this parameter above 1?

Yes. Setting it to 1 keeps the default conservative. Based on my tests on -pie or pie binaries, setting it to 2 gives a measurable performance win compared with 1, while setting it to 3 doesn't give stable performance wins across different binaries or across runs.

One interesting thing is that the actual cost of materializing one vtable address point depends on the compile options (fpic/fpie), and the cost of materializing a vtable address point is comparable to that of materializing a function address when the fpie/fpic option is the same.

  • For non-pie binaries, @vtable + address-point-offset is lowered to an immediate representing the vtable address point. It could be folded into the icmp after lowering, something like icmp #imm, <reg>. For pie (but non-pic) binaries, @vtable + address-point-offset is lowered to a pc-relative address, so it takes one instruction to materialize the pc-relative address itself (something like leaq 2890849(%rip), %rdx # 0x30fe50 <_ZTV8Derived1> on x86).


// Returns the address point offset of the given compatible type.
//
// Type metadata of a vtable specifies the types that can container a pointer to

Contributor

s/container/contain/

Contributor Author

done.

// Returns the address point offset of the given compatible type.
//
// Type metadata of a vtable specifies the types that can container a pointer to
// this vtable, for example, `Base*` can be a pointer to an instantiated type

Contributor

I think "instantiated type" should be "derived type"

Contributor Author

done.

}
return Changed;
}

// TODO: Returns false if the function addressing and vtable load instructions

Contributor

s/Returns/Return/

Contributor Author

done.

// 'MaxNumVTable' limits the number of vtables to make vtable comparison
// profitable. Comparing multiple vtables for one function candidate will
// insert additional instructions on the hot path, and allowing more than
// one vtable for non last candidates may or may not elongates dependency

Contributor

s/elongates/elongate/

Contributor

"the dependency chain"

Contributor Author

done.

if (I == CandidateSize - 1)
MaxNumVTable = ICPMaxNumVTableLastCandidate;

if ((int)Candidate.AddressPoints.size() > MaxNumVTable) {

Contributor

Might be useful to have a debug or missed optimization message for this case?

Contributor Author

Done, using LLVM_DEBUG.

  • I didn't add a missed-optimization remark, mainly because function comparison might still be applied when vtable comparison is not profitable, and readers would need to correlate the missed-vtable message with the applied-function message to make sense of the remarks.

Now opt llvm/test/Transforms/PGOProfile/icp_vtable_cmp.ll -passes='pgo-icall-prom' -pass-remarks=pgo-icall-prom -enable-vtable-profile-use -icp-max-num-vtable-last-candidate=1 -debug -S gives the following log:

Work on callsite   call void %1(ptr %d), !prof !10 Num_targets: 3
 Candidate 0 Count=600  Target_func: 3827408714133779784
 Candidate 1 Count=500  Target_func: 5837445539218476403
 Candidate 2 Count=400  Target_func: 9381788221313981078
 
Work on callsite #0  call void %1(ptr %d), !prof !10 Num_targets: 3 Num_candidates: 3
 Candidate 0 Count=600  Target_func: 3827408714133779784
 Candidate 1 Count=500  Target_func: 5837445539218476403
 Candidate 2 Count=400  Target_func: 9381788221313981078

Computing vtable infos for callsite #1
  Cannot find vtable definition for 12345678; maybe the vtable isn't imported

Evaluating vtable profitability for callsite #1  call void %1(ptr %d), !prof !10
  Candidate 0 FunctionCount: 600, VTableCounts: {Derived1, 600}
  Candidate 1 FunctionCount: 500, VTableCounts: {Derived2, 500}
  Candidate 2 FunctionCount: 400, VTableCounts: {Base1, 200} {Derived3, 200}
    allow at most 1 and got 2 vtables. Bail out for vtable comparison.
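
The per-candidate limit that produces the bail-out in this log can be sketched as follows. This is a simplified standalone model, not the pass itself: `CandidateInfo` and `vtableCountsWithinLimit` are hypothetical names, and the sketch covers only the vtable-count check (not the cold-fallback check or any other profitability condition). As in the quoted code, every candidate is allowed one vtable except the last, which is allowed up to the value of the last-candidate option.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified view of one promotion candidate.
struct CandidateInfo {
  uint64_t FunctionCount; // profiled count of the target function
  size_t NumVTables;      // number of vtable address points to compare
};

// Returns true if every candidate stays within its vtable limit:
// 1 for all but the last candidate, MaxNumVTableLastCandidate for the last.
inline bool vtableCountsWithinLimit(const std::vector<CandidateInfo> &Candidates,
                                    int MaxNumVTableLastCandidate) {
  for (size_t I = 0; I < Candidates.size(); ++I) {
    int MaxNumVTable = 1;
    if (I == Candidates.size() - 1)
      MaxNumVTable = MaxNumVTableLastCandidate;
    if (static_cast<int>(Candidates[I].NumVTables) > MaxNumVTable)
      return false; // bail out of vtable comparison
  }
  return true;
}
```

Modelling the log above, candidates with 1, 1, and 2 vtables fail the check when the last-candidate limit is 1 and pass when it is 2.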

// If the indirect fallback is not cold, don't compare vtables.
if (PSI && PSI->hasProfileSummary() &&
!PSI->isColdCount(RemainingVTableCount))
return false;

Contributor

Ditto

Contributor Author

done.

<< ore::NV("DirectCallee", Candidate.TargetFunction)
<< " with count " << ore::NV("Count", Candidate.Count)
<< " out of " << ore::NV("TotalCount", TotalFuncCount)
<< ", compare "

Contributor

It might be useful to list what vtable(s) were compared

Contributor Author

done, and updated IR tests.

A remark message now looks something like: Promote indirect call to Base1_bar with count 400 out of 500, sink 2 instruction(s) and compare 2 vtable(s): {Base1, Derived3}
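
A remark of that shape with a deterministic vtable order could be assembled roughly as below. This is a standalone sketch rather than the PR's implementation: `formatVTableRemark` is a hypothetical helper, and sorting the vtable names alphabetically is an assumption made here to get a stable order (the thread only states that the order is made deterministic).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: build the remark text, listing vtable names in
// sorted order so the output does not depend on container iteration order.
inline std::string formatVTableRemark(const std::string &Callee, uint64_t Count,
                                      uint64_t TotalCount, unsigned NumSunk,
                                      std::vector<std::string> VTableNames) {
  std::sort(VTableNames.begin(), VTableNames.end());
  std::ostringstream OS;
  OS << "Promote indirect call to " << Callee << " with count " << Count
     << " out of " << TotalCount << ", sink " << NumSunk
     << " instruction(s) and compare " << VTableNames.size()
     << " vtable(s): {";
  for (size_t I = 0; I < VTableNames.size(); ++I) {
    if (I != 0)
      OS << ", ";
    OS << VTableNames[I];
  }
  OS << "}";
  return OS.str();
}
```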

1. Make vtable order deterministic in the vtable remark message.
2. s/Returns/Return for function-level comment.
3. Minor fixes around comment and asserts.
@mingmingl-llvm
Contributor Author

Going to merge as the failed tests are not related to this PR.

cat github-pull-requests_build_76910_linux-linux-x64.log | grep -B 2 -A 10 "Failed Tests "
--------------------------------------------------------------------------
********************
Failed Tests (1):
  BOLT :: X86/reader-stale-yaml-std.test


Testing Time: 385.93s

Total Discovered Tests: 120952
  Skipped          :     47 (0.04%)
  Unsupported      :   3335 (2.76%)
  Passed           : 117251 (96.94%)
  Expectedly Failed:    318 (0.26%)
--
--------------------------------------------------------------------------
********************
Failed Tests (1):
  BOLT :: X86/reader-stale-yaml-std.test


Testing Time: 1.98s

Total Discovered Tests: 454
  Skipped          :   7 (1.54%)
  Unsupported      :  13 (2.86%)
  Passed           : 431 (94.93%)
  Expectedly Failed:   2 (0.44%)

@mingmingl-llvm mingmingl-llvm merged commit 1518b26 into main Jun 30, 2024
4 of 7 checks passed
@mingmingl-llvm mingmingl-llvm deleted the users/minglotus-6/spr/icpass branch June 30, 2024 06:21
MaskRay added a commit that referenced this pull request Jun 30, 2024
Needed by VTableAddressPointOffsetValMap.
lravenclaw pushed a commit to lravenclaw/llvm-project that referenced this pull request Jul 3, 2024
…indirect-call-promotion with vtable profiles. (llvm#81442)

Clang's `-fwhole-program-vtables` is required for this optimization to
take place. If `-fwhole-program-vtables` is not enabled, this change is
no-op.
    
* Function-comparison (before):

```
%vtable = load ptr, ptr %obj
%vfn = getelementptr inbounds ptr, ptr %vtable, i64 1
%func = load ptr, ptr %vfn
%cond = icmp eq ptr %func, @callee
br i1 %cond, label bb1, label bb2:

bb1:
   call @callee

bb2:
   call %func
```

* VTable-comparison (after):

```
%vtable = load ptr, ptr %obj
%cond = icmp eq ptr %vtable, @vtable-address-point
br i1 %cond, label bb1, label bb2:

bb1:
   call @callee

bb2:
  %vfn = getelementptr inbounds ptr, ptr %vtable, i64 1
  %func = load ptr, ptr %vfn
  call %func
```
    
Key changes:
1. Find out virtual calls and the vtables they come from.
- The ICP relies on type intrinsic `llvm.type.test` to find out virtual
calls and the
compatible vtables, and relies on type metadata to find the address
point for comparison.
2. ICP pass does cost-benefit analysis and compares vtable only when the
number of vtables for a function candidate is within (option specified)
threshold.
3. Sink the function addressing and vtable load instruction to indirect
fallback.
- The sink helper functions are simplified versions of
`InstCombinerImpl::tryToSinkInstruction`. Currently debug intrinsics are
not handled. Ideally `InstCombinerImpl::tryToSinkInstructionDbgValues`
and `InstCombinerImpl::tryToSinkInstructionDbgVariableRecords` could be
moved into Transforms/Utils/Local.cpp (or another util cpp file) to
handle debug intrinsics when moving instructions across basic blocks.
4. Keep value profiles updated
     1) Update vtable value profiles after inline
     2) For either function-based comparison or vtable-based comparison,
          update both vtable and indirect call value profiles.
lravenclaw pushed a commit to lravenclaw/llvm-project that referenced this pull request Jul 3, 2024
Needed by VTableAddressPointOffsetValMap.