Skip to content

[SampleFDO] Support enabling sample loader pass in O0 mode #113985

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Nov 8, 2024

Conversation

wlei-llvm
Copy link
Contributor

@wlei-llvm wlei-llvm commented Oct 29, 2024

Add support for enabling sample loader pass in O0 mode(under -fsample-profile-use). This can help verify PGO raw profile count quality or provide a more accurate performance proxy(predictor), as O0 mode has minimal or no compiler optimizations that might otherwise impact profile count accuracy.

  • Explicitly disable the sample loader inlining to ensure it only emits sampling annotation.
  • Use flattened profile for O0 mode.
  • Add the pass after AddDiscriminatorsPass pass to work with -fdebug-info-for-profiling.

@llvmbot llvmbot added PGO Profile Guided Optimizations llvm:transforms labels Oct 29, 2024
@llvmbot
Copy link
Member

llvmbot commented Oct 29, 2024

@llvm/pr-subscribers-pgo

Author: Lei Wang (wlei-llvm)

Changes

Add support for enabling sample loader pass in O0 mode(under -fsample-profile-use). This can help verify PGO raw profile count quality or provide a more accurate performance proxy(predictor), as O0 mode has minimal or no compiler optimizations that might otherwise impact profile count accuracy.

  • Explicitly disable the sample loader inlining to ensure it only emits sampling annotation.
  • Add the pass after AddDiscriminatorsPass pass to work with -fdebug-info-for-profiling.

Full diff: https://github.com/llvm/llvm-project/pull/113985.diff

3 Files Affected:

  • (modified) llvm/lib/Passes/PassBuilderPipelines.cpp (+13)
  • (modified) llvm/lib/Transforms/IPO/SampleProfile.cpp (+1-1)
  • (modified) llvm/test/Other/new-pm-pgo-O0.ll (+5-1)
diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp b/llvm/lib/Passes/PassBuilderPipelines.cpp
index 488554c84c1c43..abd4afb0f2c30f 100644
--- a/llvm/lib/Passes/PassBuilderPipelines.cpp
+++ b/llvm/lib/Passes/PassBuilderPipelines.cpp
@@ -302,6 +302,7 @@ static cl::opt<std::string> InstrumentColdFuncOnlyPath(
 
 extern cl::opt<std::string> UseCtxProfile;
 extern cl::opt<bool> PGOInstrumentColdFunctionOnly;
+extern cl::opt<bool> DisableSampleLoaderInlining;
 
 namespace llvm {
 extern cl::opt<bool> EnableMemProfContextDisambiguation;
@@ -2138,6 +2139,18 @@ ModulePassManager PassBuilder::buildO0DefaultPipeline(OptimizationLevel Level,
   if (PGOOpt && PGOOpt->DebugInfoForProfiling)
     MPM.addPass(createModuleToFunctionPassAdaptor(AddDiscriminatorsPass()));
 
+  if (PGOOpt && PGOOpt->Action == PGOOptions::SampleUse) {
+    // Explicitly disable sample loader inlining in O0 pipeline.
+    if (!DisableSampleLoaderInlining.getNumOccurrences())
+      DisableSampleLoaderInlining = true;
+    MPM.addPass(SampleProfileLoaderPass(PGOOpt->ProfileFile,
+                                        PGOOpt->ProfileRemappingFile,
+                                        ThinOrFullLTOPhase::None));
+    // Cache ProfileSummaryAnalysis once to avoid the potential need to insert
+    // RequireAnalysisPass for PSI before subsequent non-module passes.
+    MPM.addPass(RequireAnalysisPass<ProfileSummaryAnalysis, Module>());
+  }
+
   invokePipelineEarlySimplificationEPCallbacks(MPM, Level);
 
   // Build a minimal pipeline based on the semantics required by LLVM,
diff --git a/llvm/lib/Transforms/IPO/SampleProfile.cpp b/llvm/lib/Transforms/IPO/SampleProfile.cpp
index 5d1ec7248ae679..3a2966320d54d5 100644
--- a/llvm/lib/Transforms/IPO/SampleProfile.cpp
+++ b/llvm/lib/Transforms/IPO/SampleProfile.cpp
@@ -192,7 +192,7 @@ static cl::opt<bool> ProfileSizeInline(
 // Since profiles are consumed by many passes, turning on this option has
 // side effects. For instance, pre-link SCC inliner would see merged profiles
 // and inline the hot functions (that are skipped in this pass).
-static cl::opt<bool> DisableSampleLoaderInlining(
+cl::opt<bool> DisableSampleLoaderInlining(
     "disable-sample-loader-inlining", cl::Hidden, cl::init(false),
     cl::desc("If true, artifically skip inline transformation in sample-loader "
              "pass, and merge (or scale) profiles (as configured by "
diff --git a/llvm/test/Other/new-pm-pgo-O0.ll b/llvm/test/Other/new-pm-pgo-O0.ll
index d7a6a03b8e44e3..d4f662fb25ace7 100644
--- a/llvm/test/Other/new-pm-pgo-O0.ll
+++ b/llvm/test/Other/new-pm-pgo-O0.ll
@@ -9,8 +9,9 @@
 ; RUN:     |FileCheck %s --check-prefixes=USE_POST_LINK,USE
 ; RUN: opt -debug-pass-manager -passes='lto<O0>' -pgo-kind=pgo-instr-use-pipeline -profile-file='%t.profdata' %s 2>&1 \
 ; RUN:     |FileCheck %s --check-prefixes=USE_POST_LINK,USE
+; RUN: opt -debug-pass-manager -passes='default<O0>' -pgo-kind=pgo-sample-use-pipeline -profile-file='%S/Inputs/new-pm-pgo.prof' %s 2>&1 \
+; RUN:     |FileCheck %s --check-prefixes=SAMPLE_USE
 
-;
 ; GEN: Running pass: PGOInstrumentationGen
 ; USE_DEFAULT: Running pass: PGOInstrumentationUse
 ; USE_PRE_LINK: Running pass: PGOInstrumentationUse
@@ -18,6 +19,9 @@
 ; USE-NOT: Running pass: PGOIndirectCallPromotion
 ; USE-NOT: Running pass: PGOMemOPSizeOpt
 
+; SAMPLE_USE: Running pass: AddDiscriminatorsPass
+; SAMPLE_USE: Running pass: SampleProfileLoaderPass
+
 define void @foo() {
   ret void
 }

@llvmbot
Copy link
Member

llvmbot commented Oct 29, 2024

@llvm/pr-subscribers-llvm-transforms

Author: Lei Wang (wlei-llvm)

Changes

Add support for enabling sample loader pass in O0 mode(under -fsample-profile-use). This can help verify PGO raw profile count quality or provide a more accurate performance proxy(predictor), as O0 mode has minimal or no compiler optimizations that might otherwise impact profile count accuracy.

  • Explicitly disable the sample loader inlining to ensure it only emits sampling annotation.
  • Add the pass after AddDiscriminatorsPass pass to work with -fdebug-info-for-profiling.

Full diff: https://github.com/llvm/llvm-project/pull/113985.diff

3 Files Affected:

  • (modified) llvm/lib/Passes/PassBuilderPipelines.cpp (+13)
  • (modified) llvm/lib/Transforms/IPO/SampleProfile.cpp (+1-1)
  • (modified) llvm/test/Other/new-pm-pgo-O0.ll (+5-1)
diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp b/llvm/lib/Passes/PassBuilderPipelines.cpp
index 488554c84c1c43..abd4afb0f2c30f 100644
--- a/llvm/lib/Passes/PassBuilderPipelines.cpp
+++ b/llvm/lib/Passes/PassBuilderPipelines.cpp
@@ -302,6 +302,7 @@ static cl::opt<std::string> InstrumentColdFuncOnlyPath(
 
 extern cl::opt<std::string> UseCtxProfile;
 extern cl::opt<bool> PGOInstrumentColdFunctionOnly;
+extern cl::opt<bool> DisableSampleLoaderInlining;
 
 namespace llvm {
 extern cl::opt<bool> EnableMemProfContextDisambiguation;
@@ -2138,6 +2139,18 @@ ModulePassManager PassBuilder::buildO0DefaultPipeline(OptimizationLevel Level,
   if (PGOOpt && PGOOpt->DebugInfoForProfiling)
     MPM.addPass(createModuleToFunctionPassAdaptor(AddDiscriminatorsPass()));
 
+  if (PGOOpt && PGOOpt->Action == PGOOptions::SampleUse) {
+    // Explicitly disable sample loader inlining in O0 pipeline.
+    if (!DisableSampleLoaderInlining.getNumOccurrences())
+      DisableSampleLoaderInlining = true;
+    MPM.addPass(SampleProfileLoaderPass(PGOOpt->ProfileFile,
+                                        PGOOpt->ProfileRemappingFile,
+                                        ThinOrFullLTOPhase::None));
+    // Cache ProfileSummaryAnalysis once to avoid the potential need to insert
+    // RequireAnalysisPass for PSI before subsequent non-module passes.
+    MPM.addPass(RequireAnalysisPass<ProfileSummaryAnalysis, Module>());
+  }
+
   invokePipelineEarlySimplificationEPCallbacks(MPM, Level);
 
   // Build a minimal pipeline based on the semantics required by LLVM,
diff --git a/llvm/lib/Transforms/IPO/SampleProfile.cpp b/llvm/lib/Transforms/IPO/SampleProfile.cpp
index 5d1ec7248ae679..3a2966320d54d5 100644
--- a/llvm/lib/Transforms/IPO/SampleProfile.cpp
+++ b/llvm/lib/Transforms/IPO/SampleProfile.cpp
@@ -192,7 +192,7 @@ static cl::opt<bool> ProfileSizeInline(
 // Since profiles are consumed by many passes, turning on this option has
 // side effects. For instance, pre-link SCC inliner would see merged profiles
 // and inline the hot functions (that are skipped in this pass).
-static cl::opt<bool> DisableSampleLoaderInlining(
+cl::opt<bool> DisableSampleLoaderInlining(
     "disable-sample-loader-inlining", cl::Hidden, cl::init(false),
     cl::desc("If true, artifically skip inline transformation in sample-loader "
              "pass, and merge (or scale) profiles (as configured by "
diff --git a/llvm/test/Other/new-pm-pgo-O0.ll b/llvm/test/Other/new-pm-pgo-O0.ll
index d7a6a03b8e44e3..d4f662fb25ace7 100644
--- a/llvm/test/Other/new-pm-pgo-O0.ll
+++ b/llvm/test/Other/new-pm-pgo-O0.ll
@@ -9,8 +9,9 @@
 ; RUN:     |FileCheck %s --check-prefixes=USE_POST_LINK,USE
 ; RUN: opt -debug-pass-manager -passes='lto<O0>' -pgo-kind=pgo-instr-use-pipeline -profile-file='%t.profdata' %s 2>&1 \
 ; RUN:     |FileCheck %s --check-prefixes=USE_POST_LINK,USE
+; RUN: opt -debug-pass-manager -passes='default<O0>' -pgo-kind=pgo-sample-use-pipeline -profile-file='%S/Inputs/new-pm-pgo.prof' %s 2>&1 \
+; RUN:     |FileCheck %s --check-prefixes=SAMPLE_USE
 
-;
 ; GEN: Running pass: PGOInstrumentationGen
 ; USE_DEFAULT: Running pass: PGOInstrumentationUse
 ; USE_PRE_LINK: Running pass: PGOInstrumentationUse
@@ -18,6 +19,9 @@
 ; USE-NOT: Running pass: PGOIndirectCallPromotion
 ; USE-NOT: Running pass: PGOMemOPSizeOpt
 
+; SAMPLE_USE: Running pass: AddDiscriminatorsPass
+; SAMPLE_USE: Running pass: SampleProfileLoaderPass
+
 define void @foo() {
   ret void
 }

Copy link
Contributor

@MatzeB MatzeB left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with nitpicks addressed.

Comment on lines 2330 to 2332
if (!DisableSampleLoaderInlining.getNumOccurrences() &&
DisableSampleProfileInlining)
DisableSampleLoaderInlining = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this mixes up global variable and class field and may be good to allow setting true and false from command line?

Suggested change
if (!DisableSampleLoaderInlining.getNumOccurrences() &&
DisableSampleProfileInlining)
DisableSampleLoaderInlining = true;
if (!::DisableSampleLoaderInlining.getNumOccurrences())
this->DisableSampleLoaderInlining = ::DisableSampleLoaderInlining;

Copy link
Contributor

@MatzeB MatzeB Nov 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh I think I misread the variables names. This actually changes the cl::opt setting based on the class parameter... I think that is dangerous then. In a ThinLTO or similar setting we could have multiple threads / LLVMContexts using different settings so we shouldn't just change the global variables...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good point, I was to avoid setting the global variable in PassBuilderPipelines.cpp, so just learned this can also happen in the pass class. OK, I will try setting a class field.

Copy link
Contributor

@david-xl david-xl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Copy link
Member

@WenleiHe WenleiHe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we flatten the profile first for O0 profile loading?

@david-xl
Copy link
Contributor

david-xl commented Nov 1, 2024

Should we flatten the profile first for O0 profile loading?

Is it done later for 'nonlined callee' ?

@wlei-llvm
Copy link
Contributor Author

Should we flatten the profile first for O0 profile loading?

DisableSampleLoaderInlining does some flattening(merge the un-inline profile to the outline function) things, was thinking to improve it later.

OK, that seems not a big change, will update in this patch.

@wlei-llvm
Copy link
Contributor Author

Should we flatten the profile first for O0 profile loading?

Is it done later for 'nonlined callee' ?

Yeah, if we disable the inlining, the nonlined callee profiles are promoted to the outline profile, would be reused. But I once tried, this appears not equivalent to using a flattened profile directly(call ProfileConverter::flattenProfile)

One my guess is, it could be different for a recursive function case. The former depends on the function order, if a function has been processed, then later when it showed in an inlinee profile, even if it's promoted, but we won't run that function again.

It's not the case for call ProfileConverter::flattenProfile, as it flattens the profile at the beginning.

; FLATTEN: name: "sum"
; FLATTEN-NEXT: {!"function_entry_count", i64 35}
; FLATTEN: !{!"branch_weights", i32 13, i32 23}
; FLATTEN: !{!"branch_weights", i32 12}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, after switch using flattened profile, it triggered a test failure, I took a deep look at this, the reason might explain the difference using flattened profile vs merging not inlined profile on-the-fly(merged profile):
Original profile (two sum are in different context)


...
 sum:46
  2:10 sub:10
...
 sum:11:22
  2: sub:2
   1: 2

The merged profile:

sum:46
 2:10 sub:10
 ...
 2: sub:2
  1: 2

For a flattened profile

sum:46
 2:12 sub:12

Then for the line 2, it gives different block count 12 vs 10, because the inlinee profile isn't counted for the block count, which I think it should be, so this seems using flattened profile makes more sense(though the difference is not big)


// Use flattened profile if inlining is disabled.
if (DisableSampleProfileInlining && !Reader->profileIsCS())
ProfileConverter::flattenProfile(Reader->getProfiles());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes sense to flatten profile if we are loading profile for O0. However, if we just disable sample loader inlining, but still runs O3/LTO, we may not want to flatten profile, because CGSCC inliner may have done some inlining, and we wanted to leverage context profile if possible?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Good point, sounds good to use flattened profile for only O0 mode.

Copy link
Member

@WenleiHe WenleiHe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, thanks.

@wlei-llvm wlei-llvm merged commit bc1aa28 into llvm:main Nov 8, 2024
8 checks passed
Groverkss pushed a commit to iree-org/llvm-project that referenced this pull request Nov 15, 2024
Add support for enabling sample loader pass in O0 mode(under
`-fsample-profile-use`). This can help verify PGO raw profile count
quality or provide a more accurate performance proxy(predictor), as O0
mode has minimal or no compiler optimizations that might otherwise
impact profile count accuracy.
- Explicitly disable the sample loader inlining to ensure it only emits
sampling annotation.
- Use flattened profile for O0 mode.
- Add the pass after `AddDiscriminatorsPass` pass to work with
`-fdebug-info-for-profiling`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
llvm:transforms PGO Profile Guided Optimizations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants