-
Notifications
You must be signed in to change notification settings - Fork 13.6k
[SampleFDO] Support enabling sample loader pass in O0 mode #113985
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-pgo Author: Lei Wang (wlei-llvm) ChangesAdd support for enabling sample loader pass in O0 mode(under
Full diff: https://github.com/llvm/llvm-project/pull/113985.diff 3 Files Affected:
diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp b/llvm/lib/Passes/PassBuilderPipelines.cpp
index 488554c84c1c43..abd4afb0f2c30f 100644
--- a/llvm/lib/Passes/PassBuilderPipelines.cpp
+++ b/llvm/lib/Passes/PassBuilderPipelines.cpp
@@ -302,6 +302,7 @@ static cl::opt<std::string> InstrumentColdFuncOnlyPath(
extern cl::opt<std::string> UseCtxProfile;
extern cl::opt<bool> PGOInstrumentColdFunctionOnly;
+extern cl::opt<bool> DisableSampleLoaderInlining;
namespace llvm {
extern cl::opt<bool> EnableMemProfContextDisambiguation;
@@ -2138,6 +2139,18 @@ ModulePassManager PassBuilder::buildO0DefaultPipeline(OptimizationLevel Level,
if (PGOOpt && PGOOpt->DebugInfoForProfiling)
MPM.addPass(createModuleToFunctionPassAdaptor(AddDiscriminatorsPass()));
+ if (PGOOpt && PGOOpt->Action == PGOOptions::SampleUse) {
+ // Explicitly disable sample loader inlining in O0 pipeline.
+ if (!DisableSampleLoaderInlining.getNumOccurrences())
+ DisableSampleLoaderInlining = true;
+ MPM.addPass(SampleProfileLoaderPass(PGOOpt->ProfileFile,
+ PGOOpt->ProfileRemappingFile,
+ ThinOrFullLTOPhase::None));
+ // Cache ProfileSummaryAnalysis once to avoid the potential need to insert
+ // RequireAnalysisPass for PSI before subsequent non-module passes.
+ MPM.addPass(RequireAnalysisPass<ProfileSummaryAnalysis, Module>());
+ }
+
invokePipelineEarlySimplificationEPCallbacks(MPM, Level);
// Build a minimal pipeline based on the semantics required by LLVM,
diff --git a/llvm/lib/Transforms/IPO/SampleProfile.cpp b/llvm/lib/Transforms/IPO/SampleProfile.cpp
index 5d1ec7248ae679..3a2966320d54d5 100644
--- a/llvm/lib/Transforms/IPO/SampleProfile.cpp
+++ b/llvm/lib/Transforms/IPO/SampleProfile.cpp
@@ -192,7 +192,7 @@ static cl::opt<bool> ProfileSizeInline(
// Since profiles are consumed by many passes, turning on this option has
// side effects. For instance, pre-link SCC inliner would see merged profiles
// and inline the hot functions (that are skipped in this pass).
-static cl::opt<bool> DisableSampleLoaderInlining(
+cl::opt<bool> DisableSampleLoaderInlining(
"disable-sample-loader-inlining", cl::Hidden, cl::init(false),
cl::desc("If true, artifically skip inline transformation in sample-loader "
"pass, and merge (or scale) profiles (as configured by "
diff --git a/llvm/test/Other/new-pm-pgo-O0.ll b/llvm/test/Other/new-pm-pgo-O0.ll
index d7a6a03b8e44e3..d4f662fb25ace7 100644
--- a/llvm/test/Other/new-pm-pgo-O0.ll
+++ b/llvm/test/Other/new-pm-pgo-O0.ll
@@ -9,8 +9,9 @@
; RUN: |FileCheck %s --check-prefixes=USE_POST_LINK,USE
; RUN: opt -debug-pass-manager -passes='lto<O0>' -pgo-kind=pgo-instr-use-pipeline -profile-file='%t.profdata' %s 2>&1 \
; RUN: |FileCheck %s --check-prefixes=USE_POST_LINK,USE
+; RUN: opt -debug-pass-manager -passes='default<O0>' -pgo-kind=pgo-sample-use-pipeline -profile-file='%S/Inputs/new-pm-pgo.prof' %s 2>&1 \
+; RUN: |FileCheck %s --check-prefixes=SAMPLE_USE
-;
; GEN: Running pass: PGOInstrumentationGen
; USE_DEFAULT: Running pass: PGOInstrumentationUse
; USE_PRE_LINK: Running pass: PGOInstrumentationUse
@@ -18,6 +19,9 @@
; USE-NOT: Running pass: PGOIndirectCallPromotion
; USE-NOT: Running pass: PGOMemOPSizeOpt
+; SAMPLE_USE: Running pass: AddDiscriminatorsPass
+; SAMPLE_USE: Running pass: SampleProfileLoaderPass
+
define void @foo() {
ret void
}
|
@llvm/pr-subscribers-llvm-transforms Author: Lei Wang (wlei-llvm) ChangesAdd support for enabling sample loader pass in O0 mode(under
Full diff: https://github.com/llvm/llvm-project/pull/113985.diff 3 Files Affected:
diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp b/llvm/lib/Passes/PassBuilderPipelines.cpp
index 488554c84c1c43..abd4afb0f2c30f 100644
--- a/llvm/lib/Passes/PassBuilderPipelines.cpp
+++ b/llvm/lib/Passes/PassBuilderPipelines.cpp
@@ -302,6 +302,7 @@ static cl::opt<std::string> InstrumentColdFuncOnlyPath(
extern cl::opt<std::string> UseCtxProfile;
extern cl::opt<bool> PGOInstrumentColdFunctionOnly;
+extern cl::opt<bool> DisableSampleLoaderInlining;
namespace llvm {
extern cl::opt<bool> EnableMemProfContextDisambiguation;
@@ -2138,6 +2139,18 @@ ModulePassManager PassBuilder::buildO0DefaultPipeline(OptimizationLevel Level,
if (PGOOpt && PGOOpt->DebugInfoForProfiling)
MPM.addPass(createModuleToFunctionPassAdaptor(AddDiscriminatorsPass()));
+ if (PGOOpt && PGOOpt->Action == PGOOptions::SampleUse) {
+ // Explicitly disable sample loader inlining in O0 pipeline.
+ if (!DisableSampleLoaderInlining.getNumOccurrences())
+ DisableSampleLoaderInlining = true;
+ MPM.addPass(SampleProfileLoaderPass(PGOOpt->ProfileFile,
+ PGOOpt->ProfileRemappingFile,
+ ThinOrFullLTOPhase::None));
+ // Cache ProfileSummaryAnalysis once to avoid the potential need to insert
+ // RequireAnalysisPass for PSI before subsequent non-module passes.
+ MPM.addPass(RequireAnalysisPass<ProfileSummaryAnalysis, Module>());
+ }
+
invokePipelineEarlySimplificationEPCallbacks(MPM, Level);
// Build a minimal pipeline based on the semantics required by LLVM,
diff --git a/llvm/lib/Transforms/IPO/SampleProfile.cpp b/llvm/lib/Transforms/IPO/SampleProfile.cpp
index 5d1ec7248ae679..3a2966320d54d5 100644
--- a/llvm/lib/Transforms/IPO/SampleProfile.cpp
+++ b/llvm/lib/Transforms/IPO/SampleProfile.cpp
@@ -192,7 +192,7 @@ static cl::opt<bool> ProfileSizeInline(
// Since profiles are consumed by many passes, turning on this option has
// side effects. For instance, pre-link SCC inliner would see merged profiles
// and inline the hot functions (that are skipped in this pass).
-static cl::opt<bool> DisableSampleLoaderInlining(
+cl::opt<bool> DisableSampleLoaderInlining(
"disable-sample-loader-inlining", cl::Hidden, cl::init(false),
cl::desc("If true, artifically skip inline transformation in sample-loader "
"pass, and merge (or scale) profiles (as configured by "
diff --git a/llvm/test/Other/new-pm-pgo-O0.ll b/llvm/test/Other/new-pm-pgo-O0.ll
index d7a6a03b8e44e3..d4f662fb25ace7 100644
--- a/llvm/test/Other/new-pm-pgo-O0.ll
+++ b/llvm/test/Other/new-pm-pgo-O0.ll
@@ -9,8 +9,9 @@
; RUN: |FileCheck %s --check-prefixes=USE_POST_LINK,USE
; RUN: opt -debug-pass-manager -passes='lto<O0>' -pgo-kind=pgo-instr-use-pipeline -profile-file='%t.profdata' %s 2>&1 \
; RUN: |FileCheck %s --check-prefixes=USE_POST_LINK,USE
+; RUN: opt -debug-pass-manager -passes='default<O0>' -pgo-kind=pgo-sample-use-pipeline -profile-file='%S/Inputs/new-pm-pgo.prof' %s 2>&1 \
+; RUN: |FileCheck %s --check-prefixes=SAMPLE_USE
-;
; GEN: Running pass: PGOInstrumentationGen
; USE_DEFAULT: Running pass: PGOInstrumentationUse
; USE_PRE_LINK: Running pass: PGOInstrumentationUse
@@ -18,6 +19,9 @@
; USE-NOT: Running pass: PGOIndirectCallPromotion
; USE-NOT: Running pass: PGOMemOPSizeOpt
+; SAMPLE_USE: Running pass: AddDiscriminatorsPass
+; SAMPLE_USE: Running pass: SampleProfileLoaderPass
+
define void @foo() {
ret void
}
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with nitpicks addressed.
if (!DisableSampleLoaderInlining.getNumOccurrences() && | ||
DisableSampleProfileInlining) | ||
DisableSampleLoaderInlining = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this mixes up global variable and class field and may be good to allow setting true and false from command line?
if (!DisableSampleLoaderInlining.getNumOccurrences() && | |
DisableSampleProfileInlining) | |
DisableSampleLoaderInlining = true; | |
if (!::DisableSampleLoaderInlining.getNumOccurrences()) | |
this->DisableSampleLoaderInlining = ::DisableSampleLoaderInlining; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh I think I misread the variables names. This actually changes the cl::opt
setting based on the class parameter... I think that is dangerous then. In a ThinLTO or similar setting we could have multiple threads / LLVMContexts using different settings so we shouldn't just change the global variables...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, good point, I was to avoid setting the global variable in PassBuilderPipelines.cpp
, so just learned this can also happen in the pass class. OK, I will try setting a class field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we flatten the profile first for O0 profile loading?
Is it done later for 'nonlined callee' ? |
OK, that seems not a big change, will update in this patch. |
Yeah, if we disable the inlining, the One my guess is, it could be different for a recursive function case. The former depends on the function order, if a function has been processed, then later when it showed in an inlinee profile, even if it's promoted, but we won't run that function again. It's not the case for call |
; FLATTEN: name: "sum" | ||
; FLATTEN-NEXT: {!"function_entry_count", i64 35} | ||
; FLATTEN: !{!"branch_weights", i32 13, i32 23} | ||
; FLATTEN: !{!"branch_weights", i32 12} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, after switch using flattened profile, it triggered a test failure, I took a deep look at this, the reason might explain the difference using flattened profile vs merging not inlined profile on-the-fly(merged profile):
Original profile (two sum are in different context)
...
sum:46
2:10 sub:10
...
sum:11:22
2: sub:2
1: 2
The merged profile:
sum:46
2:10 sub:10
...
2: sub:2
1: 2
For a flattened profile
sum:46
2:12 sub:12
Then for the line 2, it gives different block count 12 vs 10, because the inlinee profile isn't counted for the block count, which I think it should be, so this seems using flattened profile makes more sense(though the difference is not big)
|
||
// Use flattened profile if inlining is disabled. | ||
if (DisableSampleProfileInlining && !Reader->profileIsCS()) | ||
ProfileConverter::flattenProfile(Reader->getProfiles()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes sense to flatten profile if we are loading profile for O0. However, if we just disable sample loader inlining, but still runs O3/LTO, we may not want to flatten profile, because CGSCC inliner may have done some inlining, and we wanted to leverage context profile if possible?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. Good point, sounds good to use flattened profile for only O0 mode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, thanks.
Add support for enabling sample loader pass in O0 mode(under `-fsample-profile-use`). This can help verify PGO raw profile count quality or provide a more accurate performance proxy(predictor), as O0 mode has minimal or no compiler optimizations that might otherwise impact profile count accuracy. - Explicitly disable the sample loader inlining to ensure it only emits sampling annotation. - Use flattened profile for O0 mode. - Add the pass after `AddDiscriminatorsPass` pass to work with `-fdebug-info-for-profiling`.
Add support for enabling sample loader pass in O0 mode(under
-fsample-profile-use
). This can help verify PGO raw profile count quality or provide a more accurate performance proxy(predictor), as O0 mode has minimal or no compiler optimizations that might otherwise impact profile count accuracy.AddDiscriminatorsPass
pass to work with-fdebug-info-for-profiling
.