[FuncSpec] Improve accounting of specialization codesize growth #113448
Conversation
Only accumulate the codesize increase of functions that are actually specialized, rather than for every candidate specialization that we analyse. This fixes a subtle bug where prior analysis of candidate specializations that were deemed unprofitable could prevent subsequent profitable candidates from being recognised.
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-function-specialization

Author: Hari Limaye (hazzlim)

Changes

Only accumulate the codesize increase of functions that are actually specialized, rather than for every candidate specialization that we analyse. This fixes a subtle bug where prior analysis of candidate specializations that were deemed unprofitable could prevent subsequent profitable candidates from being recognised.

Full diff: https://github.com/llvm/llvm-project/pull/113448.diff

3 Files Affected:
diff --git a/llvm/include/llvm/Transforms/IPO/FunctionSpecialization.h b/llvm/include/llvm/Transforms/IPO/FunctionSpecialization.h
index 5920dde9d77dfd..4d0be553aa6ede 100644
--- a/llvm/include/llvm/Transforms/IPO/FunctionSpecialization.h
+++ b/llvm/include/llvm/Transforms/IPO/FunctionSpecialization.h
@@ -131,13 +131,16 @@ struct Spec {
// Profitability of the specialization.
unsigned Score;
+ // Cost of the specialization, in terms of codesize.
+ unsigned CodeSizeCost;
+
// List of call sites, matching this specialization.
SmallVector<CallBase *> CallSites;
- Spec(Function *F, const SpecSig &S, unsigned Score)
- : F(F), Sig(S), Score(Score) {}
- Spec(Function *F, const SpecSig &&S, unsigned Score)
- : F(F), Sig(S), Score(Score) {}
+ Spec(Function *F, const SpecSig &S, unsigned Score, unsigned CodeSizeCost)
+ : F(F), Sig(S), Score(Score), CodeSizeCost(CodeSizeCost) {}
+ Spec(Function *F, const SpecSig &&S, unsigned Score, unsigned CodeSizeCost)
+ : F(F), Sig(S), Score(Score), CodeSizeCost(CodeSizeCost) {}
};
class InstCostVisitor : public InstVisitor<InstCostVisitor, Constant *> {
diff --git a/llvm/lib/Transforms/IPO/FunctionSpecialization.cpp b/llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
index 20249a20a37e41..35865d7213acf4 100644
--- a/llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
+++ b/llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
@@ -646,6 +646,18 @@ FunctionSpecializer::~FunctionSpecializer() {
cleanUpSSA();
}
+/// Get the unsigned Value of given Cost object. Assumes the Cost is always
+/// non-negative, which is true for both TCK_CodeSize and TCK_Latency, and
+/// always Valid.
+static unsigned getCostValue(const Cost &C) {
+ int64_t Value = *C.getValue();
+
+ assert(Value >= 0 && "CodeSize and Latency cannot be negative");
+ // It is safe to down cast since we know the arguments cannot be negative and
+ // Cost is of type int64_t.
+ return static_cast<unsigned>(Value);
+}
+
/// Attempt to specialize functions in the module to enable constant
/// propagation across function boundaries.
///
@@ -759,6 +771,14 @@ bool FunctionSpecializer::run() {
SmallVector<Function *> Clones;
for (unsigned I = 0; I < NSpecs; ++I) {
Spec &S = AllSpecs[BestSpecs[I]];
+
+ // Check that creating this specialization doesn't exceed the maximum
+ // codesize growth.
+ unsigned FuncSize = getCostValue(FunctionMetrics[S.F].NumInsts);
+ if ((FunctionGrowth[S.F] + S.CodeSizeCost) / FuncSize > MaxCodeSizeGrowth)
+ continue;
+ FunctionGrowth[S.F] += S.CodeSizeCost;
+
S.Clone = createSpecialization(S.F, S.Sig);
// Update the known call sites to call the clone.
@@ -837,18 +857,6 @@ static Function *cloneCandidateFunction(Function *F, unsigned NSpecs) {
return Clone;
}
-/// Get the unsigned Value of given Cost object. Assumes the Cost is always
-/// non-negative, which is true for both TCK_CodeSize and TCK_Latency, and
-/// always Valid.
-static unsigned getCostValue(const Cost &C) {
- int64_t Value = *C.getValue();
-
- assert(Value >= 0 && "CodeSize and Latency cannot be negative");
- // It is safe to down cast since we know the arguments cannot be negative and
- // Cost is of type int64_t.
- return static_cast<unsigned>(Value);
-}
-
bool FunctionSpecializer::findSpecializations(Function *F, unsigned FuncSize,
SmallVectorImpl<Spec> &AllSpecs,
SpecMap &SM) {
@@ -924,16 +932,14 @@ bool FunctionSpecializer::findSpecializations(Function *F, unsigned FuncSize,
}
CodeSize += Visitor.getCodeSizeSavingsFromPendingPHIs();
+ unsigned CodeSizeSavings = getCostValue(CodeSize);
+ unsigned CodeSizeCost = FuncSize - CodeSizeSavings;
+
auto IsProfitable = [&]() -> bool {
// No check required.
if (ForceSpecialization)
return true;
- unsigned CodeSizeSavings = getCostValue(CodeSize);
- // TODO: We should only accumulate codesize increase of specializations
- // that are actually created.
- FunctionGrowth[F] += FuncSize - CodeSizeSavings;
-
LLVM_DEBUG(
dbgs() << "FnSpecialization: Specialization bonus {Inlining = "
<< Score << " (" << (Score * 100 / FuncSize) << "%)}\n");
@@ -964,7 +970,7 @@ bool FunctionSpecializer::findSpecializations(Function *F, unsigned FuncSize,
if (LatencySavings < MinLatencySavings * FuncSize / 100)
return false;
// Maximum codesize growth.
- if (FunctionGrowth[F] / FuncSize > MaxCodeSizeGrowth)
+ if ((FunctionGrowth[F] + CodeSizeCost) / FuncSize > MaxCodeSizeGrowth)
return false;
Score += std::max(CodeSizeSavings, LatencySavings);
@@ -976,7 +982,7 @@ bool FunctionSpecializer::findSpecializations(Function *F, unsigned FuncSize,
continue;
// Create a new specialisation entry.
- auto &Spec = AllSpecs.emplace_back(F, S, Score);
+ auto &Spec = AllSpecs.emplace_back(F, S, Score, CodeSizeCost);
if (CS.getFunction() != F)
Spec.CallSites.push_back(&CS);
const unsigned Index = AllSpecs.size() - 1;
diff --git a/llvm/test/Transforms/FunctionSpecialization/function-specialization-maxgrowth.ll b/llvm/test/Transforms/FunctionSpecialization/function-specialization-maxgrowth.ll
new file mode 100644
index 00000000000000..b4aea30f6b34d3
--- /dev/null
+++ b/llvm/test/Transforms/FunctionSpecialization/function-specialization-maxgrowth.ll
@@ -0,0 +1,97 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --include-generated-funcs --version 5
+; RUN: opt -passes="ipsccp<func-spec>" -funcspec-min-function-size=1 \
+; RUN: -funcspec-for-literal-constant=true \
+; RUN: -funcspec-min-codesize-savings=50 \
+; RUN: -funcspec-min-latency-savings=50 \
+; RUN: -funcspec-max-codesize-growth=1 \
+; RUN: -S < %s | FileCheck %s
+
+; Verify that we are able to specialize a function successfully after analysis
+; of other specializations that are found to not be profitable.
+define void @test_specialize_after_failed_analysis(i32 %n) {
+entry:
+ %notspec0 = call i32 @add0(i32 0, i32 %n)
+ %notspec1 = call i32 @add0(i32 1, i32 %n)
+ %spec = call i32 @add0(i32 1, i32 1)
+ ret void
+}
+
+define i32 @add0(i32 %x, i32 %y) {
+entry:
+ %res = add i32 %x, %y
+ ret i32 %res
+}
+
+; Verify that we do not specialize once maximum codesize growth has been
+; exceeded.
+define void @test_max_codesize_growth_exceeded(i32 %n) {
+entry:
+ %spec0 = call i32 @add1(i32 0, i32 0)
+ %spec1 = call i32 @add1(i32 1, i32 1)
+ %spec2 = call i32 @add1(i32 2, i32 2)
+ %notspec = call i32 @add1(i32 3, i32 3)
+ ret void
+}
+
+define i32 @add1(i32 %x, i32 %y) {
+entry:
+ %res = add i32 %x, %y
+ ret i32 %res
+}
+
+; CHECK-LABEL: define void @test_specialize_after_failed_analysis(
+; CHECK-SAME: i32 [[N:%.*]]) {
+; CHECK-NEXT: [[ENTRY:.*:]]
+; CHECK-NEXT: [[NOTSPEC0:%.*]] = call i32 @add0(i32 0, i32 [[N]])
+; CHECK-NEXT: [[NOTSPEC1:%.*]] = call i32 @add0(i32 1, i32 [[N]])
+; CHECK-NEXT: [[SPEC:%.*]] = call i32 @add0.specialized.1(i32 1, i32 1)
+; CHECK-NEXT: ret void
+;
+;
+; CHECK-LABEL: define i32 @add0(
+; CHECK-SAME: i32 [[X:%.*]], i32 [[Y:%.*]]) {
+; CHECK-NEXT: [[ENTRY:.*:]]
+; CHECK-NEXT: [[RES:%.*]] = add i32 [[X]], [[Y]]
+; CHECK-NEXT: ret i32 [[RES]]
+;
+;
+; CHECK-LABEL: define void @test_max_codesize_growth_exceeded(
+; CHECK-SAME: i32 [[N:%.*]]) {
+; CHECK-NEXT: [[ENTRY:.*:]]
+; CHECK-NEXT: [[SPEC0:%.*]] = call i32 @add1.specialized.2(i32 0, i32 0)
+; CHECK-NEXT: [[SPEC1:%.*]] = call i32 @add1.specialized.3(i32 1, i32 1)
+; CHECK-NEXT: [[SPEC2:%.*]] = call i32 @add1.specialized.4(i32 2, i32 2)
+; CHECK-NEXT: [[NOTSPEC:%.*]] = call i32 @add1(i32 3, i32 3)
+; CHECK-NEXT: ret void
+;
+;
+; CHECK-LABEL: define i32 @add1(
+; CHECK-SAME: i32 [[X:%.*]], i32 [[Y:%.*]]) {
+; CHECK-NEXT: [[ENTRY:.*:]]
+; CHECK-NEXT: [[RES:%.*]] = add i32 [[X]], [[Y]]
+; CHECK-NEXT: ret i32 [[RES]]
+;
+;
+; CHECK-LABEL: define internal i32 @add0.specialized.1(
+; CHECK-SAME: i32 [[X:%.*]], i32 [[Y:%.*]]) {
+; CHECK-NEXT: [[ENTRY:.*:]]
+; CHECK-NEXT: ret i32 poison
+;
+;
+; CHECK-LABEL: define internal i32 @add1.specialized.2(
+; CHECK-SAME: i32 [[X:%.*]], i32 [[Y:%.*]]) {
+; CHECK-NEXT: [[ENTRY:.*:]]
+; CHECK-NEXT: ret i32 poison
+;
+;
+; CHECK-LABEL: define internal i32 @add1.specialized.3(
+; CHECK-SAME: i32 [[X:%.*]], i32 [[Y:%.*]]) {
+; CHECK-NEXT: [[ENTRY:.*:]]
+; CHECK-NEXT: ret i32 poison
+;
+;
+; CHECK-LABEL: define internal i32 @add1.specialized.4(
+; CHECK-SAME: i32 [[X:%.*]], i32 [[Y:%.*]]) {
+; CHECK-NEXT: [[ENTRY:.*:]]
+; CHECK-NEXT: ret i32 poison
+;
// codesize growth.
unsigned FuncSize = getCostValue(FunctionMetrics[S.F].NumInsts);
if ((FunctionGrowth[S.F] + S.CodeSizeCost) / FuncSize > MaxCodeSizeGrowth)
  continue;
I think I was wrong before when I suggested you move this here. At this point we have committed to creating NSpecs specializations, but now we are skipping some. The filtering needs to happen earlier, inside findSpecializations but after the lambda (actually before returning true). At that point we know it is profitable.
The thing is, we know we are creating NSpecs specializations but we don't know which ones; they may correspond to different functions. However, FunctionGrowth is kept per function, not per module. Inside findSpecializations we are only examining specializations for the same function.
It would make sense to apply the filtering here if we kept codesize growth across functions (per module), which isn't the case. Correct me if I am wrong, but that's my understanding. Apologies for the confusion.
Yes, I think that is definitely the trade-off with moving the logic here: we may skip some specializations, and because we do so after we have built the priority queue of specializations to perform, we may fail to specialize a "next-best" candidate that would have fit within the FunctionGrowth budget for its function, because its place has been taken among the NSpecs best candidates.
However, if we move it to where you suggest, we are making a slightly different trade-off:
#113448 (comment)
Here we successfully got past the FuncGrowth check. We know for sure the specialization will be created. We can safely do FunctionGrowth[F] += SpecSize; at this point, before returning true.
If I understand correctly, we do not know for sure the specialization will be created, because we do not know that it will have a high enough score to end up among the NSpecs we decide to specialize for the module. If, for a given function F, we analyze N specialization candidates that are all deemed profitable and these N candidates use up the maximum FunctionGrowth for F, then we would deem the (N+1)-th candidate unprofitable based on FunctionGrowth even if it had a higher score (Inlining/CodeSize/Latency) than the candidates analyzed beforehand.
Maybe you still think this is the better trade-off?
I suppose to some extent it might make more sense to instead have a codesize growth budget across functions (per module), and to build the priority queue of BestSpecs based on that, rather than a budget based on the number of specializations.
I think it is worth mentioning the background story. We initially came up with the FuncGrowth threshold to limit the overall codesize growth from being linear in the number of times the specializer runs. At first funcspec-max-iters was set to 1, and we increased it to 10 to enable specialization of recursive functions. Imagine running 10 times yielding 10 times the original function size.
We have MaxClones to limit the number of specializations per iteration. Within a single iteration of run, the growth check basically has no effect unless you have many candidates (not the case for functions relying on promoteConstantStackValues). It is used to prevent the next iteration's candidates. How can there be an (N+1)-th candidate if at the N-th step we decided to stop? (I am thinking of recursive functions.)
Maybe we can keep FunctionGrowth[S.F] += S.CodeSizeCost; here but remove the early exit from the line above this comment. It should only remain in the IsProfitable lambda, in my opinion.
Ah ok, that makes a lot of sense, thank you for providing the additional context.
I have updated the code so that we only check the codesize increase inside the IsProfitable lambda, but do the accumulation into FunctionGrowth[S.F] at the point where we are actually creating the specialization. This has the downside of potentially violating MaxCodeSizeGrowth within a single iteration, in the case where we have specialization candidates that individually do not cause us to exceed MaxCodeSizeGrowth (and so pass the check in the IsProfitable lambda), but in combination do exceed it when they are all performed. This is why I have removed the second test case from the regression test, as it no longer makes sense.
However, I think this is an acceptable trade-off, as it still preserves the intention of the MaxCodeSizeGrowth threshold by preventing linear codesize increase across multiple iterations: even if the threshold is exceeded within one iteration, we would not consider further specializations in successive iterations once the growth has been accounted for, and we have MaxClones to limit the number of specializations per iteration, as you say.
We could fix the potential violation of MaxCodeSizeGrowth within a single iteration by moving the accumulation into FunctionGrowth[S.F] to the end of the IsProfitable lambda, as you suggested. However, I don't really like this, because we may end up discounting the most profitable specialization candidate for a function. To clarify my previous comment: by the N-th and (N+1)-th candidates I meant candidates within a single iteration. If we accumulate the codesize growth while we are analyzing the candidates, we may use up MaxCodeSizeGrowth before encountering the best specialization and therefore fail to consider it:
// (Within a single iteration, MaxCodeSizeGrowth=100%)
spec1 [Score 500, Growth 50%]
spec2 [Score 500, Growth 50%]
// After examining the above, FunctionGrowth[S.F] = 100% so no more candidates will be considered profitable!
// Candidate with highest Score:
spec3 [Score 1000, Growth 10%]
- Rename local variable (s/CodeSizeCost/SpecSize)
- Rename struct member (s/CodeSizeCost/CodeSize) and update comment
- Remove early exit from specialization creation loop
- Remove redundant prefix from regression test name
- Remove regression test for exceeding FuncGrowth