[FuncSpec] Improve accounting of specialization codesize growth #113448
Conversation
Only accumulate the codesize increase of functions that are actually specialized, rather than for every candidate specialization that we analyse. This fixes a subtle bug where prior analysis of candidate specializations that were deemed unprofitable could prevent subsequent profitable candidates from being recognised.
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-function-specialization

Author: Hari Limaye (hazzlim)

Changes

Only accumulate the codesize increase of functions that are actually specialized, rather than for every candidate specialization that we analyse. This fixes a subtle bug where prior analysis of candidate specializations that were deemed unprofitable could prevent subsequent profitable candidates from being recognised.

Full diff: https://github.com/llvm/llvm-project/pull/113448.diff

3 Files Affected:
diff --git a/llvm/include/llvm/Transforms/IPO/FunctionSpecialization.h b/llvm/include/llvm/Transforms/IPO/FunctionSpecialization.h
index 5920dde9d77dfd..4d0be553aa6ede 100644
--- a/llvm/include/llvm/Transforms/IPO/FunctionSpecialization.h
+++ b/llvm/include/llvm/Transforms/IPO/FunctionSpecialization.h
@@ -131,13 +131,16 @@ struct Spec {
// Profitability of the specialization.
unsigned Score;
+ // Cost of the specialization, in terms of codesize.
+ unsigned CodeSizeCost;
+
// List of call sites, matching this specialization.
SmallVector<CallBase *> CallSites;
- Spec(Function *F, const SpecSig &S, unsigned Score)
- : F(F), Sig(S), Score(Score) {}
- Spec(Function *F, const SpecSig &&S, unsigned Score)
- : F(F), Sig(S), Score(Score) {}
+ Spec(Function *F, const SpecSig &S, unsigned Score, unsigned CodeSizeCost)
+ : F(F), Sig(S), Score(Score), CodeSizeCost(CodeSizeCost) {}
+ Spec(Function *F, const SpecSig &&S, unsigned Score, unsigned CodeSizeCost)
+ : F(F), Sig(S), Score(Score), CodeSizeCost(CodeSizeCost) {}
};
class InstCostVisitor : public InstVisitor<InstCostVisitor, Constant *> {
diff --git a/llvm/lib/Transforms/IPO/FunctionSpecialization.cpp b/llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
index 20249a20a37e41..35865d7213acf4 100644
--- a/llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
+++ b/llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
@@ -646,6 +646,18 @@ FunctionSpecializer::~FunctionSpecializer() {
cleanUpSSA();
}
+/// Get the unsigned Value of given Cost object. Assumes the Cost is always
+/// non-negative, which is true for both TCK_CodeSize and TCK_Latency, and
+/// always Valid.
+static unsigned getCostValue(const Cost &C) {
+ int64_t Value = *C.getValue();
+
+ assert(Value >= 0 && "CodeSize and Latency cannot be negative");
+ // It is safe to down cast since we know the arguments cannot be negative and
+ // Cost is of type int64_t.
+ return static_cast<unsigned>(Value);
+}
+
/// Attempt to specialize functions in the module to enable constant
/// propagation across function boundaries.
///
@@ -759,6 +771,14 @@ bool FunctionSpecializer::run() {
SmallVector<Function *> Clones;
for (unsigned I = 0; I < NSpecs; ++I) {
Spec &S = AllSpecs[BestSpecs[I]];
+
+ // Check that creating this specialization doesn't exceed the maximum
+ // codesize growth.
+ unsigned FuncSize = getCostValue(FunctionMetrics[S.F].NumInsts);
+ if ((FunctionGrowth[S.F] + S.CodeSizeCost) / FuncSize > MaxCodeSizeGrowth)
+ continue;
+ FunctionGrowth[S.F] += S.CodeSizeCost;
+
S.Clone = createSpecialization(S.F, S.Sig);
// Update the known call sites to call the clone.
@@ -837,18 +857,6 @@ static Function *cloneCandidateFunction(Function *F, unsigned NSpecs) {
return Clone;
}
-/// Get the unsigned Value of given Cost object. Assumes the Cost is always
-/// non-negative, which is true for both TCK_CodeSize and TCK_Latency, and
-/// always Valid.
-static unsigned getCostValue(const Cost &C) {
- int64_t Value = *C.getValue();
-
- assert(Value >= 0 && "CodeSize and Latency cannot be negative");
- // It is safe to down cast since we know the arguments cannot be negative and
- // Cost is of type int64_t.
- return static_cast<unsigned>(Value);
-}
-
bool FunctionSpecializer::findSpecializations(Function *F, unsigned FuncSize,
SmallVectorImpl<Spec> &AllSpecs,
SpecMap &SM) {
@@ -924,16 +932,14 @@ bool FunctionSpecializer::findSpecializations(Function *F, unsigned FuncSize,
}
CodeSize += Visitor.getCodeSizeSavingsFromPendingPHIs();
+ unsigned CodeSizeSavings = getCostValue(CodeSize);
+ unsigned CodeSizeCost = FuncSize - CodeSizeSavings;
+
auto IsProfitable = [&]() -> bool {
// No check required.
if (ForceSpecialization)
return true;
- unsigned CodeSizeSavings = getCostValue(CodeSize);
- // TODO: We should only accumulate codesize increase of specializations
- // that are actually created.
- FunctionGrowth[F] += FuncSize - CodeSizeSavings;
-
LLVM_DEBUG(
dbgs() << "FnSpecialization: Specialization bonus {Inlining = "
<< Score << " (" << (Score * 100 / FuncSize) << "%)}\n");
@@ -964,7 +970,7 @@ bool FunctionSpecializer::findSpecializations(Function *F, unsigned FuncSize,
if (LatencySavings < MinLatencySavings * FuncSize / 100)
return false;
// Maximum codesize growth.
- if (FunctionGrowth[F] / FuncSize > MaxCodeSizeGrowth)
+ if ((FunctionGrowth[F] + CodeSizeCost) / FuncSize > MaxCodeSizeGrowth)
return false;
Score += std::max(CodeSizeSavings, LatencySavings);
@@ -976,7 +982,7 @@ bool FunctionSpecializer::findSpecializations(Function *F, unsigned FuncSize,
continue;
// Create a new specialisation entry.
- auto &Spec = AllSpecs.emplace_back(F, S, Score);
+ auto &Spec = AllSpecs.emplace_back(F, S, Score, CodeSizeCost);
if (CS.getFunction() != F)
Spec.CallSites.push_back(&CS);
const unsigned Index = AllSpecs.size() - 1;
diff --git a/llvm/test/Transforms/FunctionSpecialization/function-specialization-maxgrowth.ll b/llvm/test/Transforms/FunctionSpecialization/function-specialization-maxgrowth.ll
new file mode 100644
index 00000000000000..b4aea30f6b34d3
--- /dev/null
+++ b/llvm/test/Transforms/FunctionSpecialization/function-specialization-maxgrowth.ll
@@ -0,0 +1,97 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --include-generated-funcs --version 5
+; RUN: opt -passes="ipsccp<func-spec>" -funcspec-min-function-size=1 \
+; RUN: -funcspec-for-literal-constant=true \
+; RUN: -funcspec-min-codesize-savings=50 \
+; RUN: -funcspec-min-latency-savings=50 \
+; RUN: -funcspec-max-codesize-growth=1 \
+; RUN: -S < %s | FileCheck %s
+
+; Verify that we are able to specialize a function successfully after analysis
+; of other specializations that are found to not be profitable.
+define void @test_specialize_after_failed_analysis(i32 %n) {
+entry:
+ %notspec0 = call i32 @add0(i32 0, i32 %n)
+ %notspec1 = call i32 @add0(i32 1, i32 %n)
+ %spec = call i32 @add0(i32 1, i32 1)
+ ret void
+}
+
+define i32 @add0(i32 %x, i32 %y) {
+entry:
+ %res = add i32 %x, %y
+ ret i32 %res
+}
+
+; Verify that we do not specialize once maximum codesize growth has been
+; exceeded.
+define void @test_max_codesize_growth_exceeded(i32 %n) {
+entry:
+ %spec0 = call i32 @add1(i32 0, i32 0)
+ %spec1 = call i32 @add1(i32 1, i32 1)
+ %spec2 = call i32 @add1(i32 2, i32 2)
+ %notspec = call i32 @add1(i32 3, i32 3)
+ ret void
+}
+
+define i32 @add1(i32 %x, i32 %y) {
+entry:
+ %res = add i32 %x, %y
+ ret i32 %res
+}
+
+; CHECK-LABEL: define void @test_specialize_after_failed_analysis(
+; CHECK-SAME: i32 [[N:%.*]]) {
+; CHECK-NEXT: [[ENTRY:.*:]]
+; CHECK-NEXT: [[NOTSPEC0:%.*]] = call i32 @add0(i32 0, i32 [[N]])
+; CHECK-NEXT: [[NOTSPEC1:%.*]] = call i32 @add0(i32 1, i32 [[N]])
+; CHECK-NEXT: [[SPEC:%.*]] = call i32 @add0.specialized.1(i32 1, i32 1)
+; CHECK-NEXT: ret void
+;
+;
+; CHECK-LABEL: define i32 @add0(
+; CHECK-SAME: i32 [[X:%.*]], i32 [[Y:%.*]]) {
+; CHECK-NEXT: [[ENTRY:.*:]]
+; CHECK-NEXT: [[RES:%.*]] = add i32 [[X]], [[Y]]
+; CHECK-NEXT: ret i32 [[RES]]
+;
+;
+; CHECK-LABEL: define void @test_max_codesize_growth_exceeded(
+; CHECK-SAME: i32 [[N:%.*]]) {
+; CHECK-NEXT: [[ENTRY:.*:]]
+; CHECK-NEXT: [[SPEC0:%.*]] = call i32 @add1.specialized.2(i32 0, i32 0)
+; CHECK-NEXT: [[SPEC1:%.*]] = call i32 @add1.specialized.3(i32 1, i32 1)
+; CHECK-NEXT: [[SPEC2:%.*]] = call i32 @add1.specialized.4(i32 2, i32 2)
+; CHECK-NEXT: [[NOTSPEC:%.*]] = call i32 @add1(i32 3, i32 3)
+; CHECK-NEXT: ret void
+;
+;
+; CHECK-LABEL: define i32 @add1(
+; CHECK-SAME: i32 [[X:%.*]], i32 [[Y:%.*]]) {
+; CHECK-NEXT: [[ENTRY:.*:]]
+; CHECK-NEXT: [[RES:%.*]] = add i32 [[X]], [[Y]]
+; CHECK-NEXT: ret i32 [[RES]]
+;
+;
+; CHECK-LABEL: define internal i32 @add0.specialized.1(
+; CHECK-SAME: i32 [[X:%.*]], i32 [[Y:%.*]]) {
+; CHECK-NEXT: [[ENTRY:.*:]]
+; CHECK-NEXT: ret i32 poison
+;
+;
+; CHECK-LABEL: define internal i32 @add1.specialized.2(
+; CHECK-SAME: i32 [[X:%.*]], i32 [[Y:%.*]]) {
+; CHECK-NEXT: [[ENTRY:.*:]]
+; CHECK-NEXT: ret i32 poison
+;
+;
+; CHECK-LABEL: define internal i32 @add1.specialized.3(
+; CHECK-SAME: i32 [[X:%.*]], i32 [[Y:%.*]]) {
+; CHECK-NEXT: [[ENTRY:.*:]]
+; CHECK-NEXT: ret i32 poison
+;
+;
+; CHECK-LABEL: define internal i32 @add1.specialized.4(
+; CHECK-SAME: i32 [[X:%.*]], i32 [[Y:%.*]]) {
+; CHECK-NEXT: [[ENTRY:.*:]]
+; CHECK-NEXT: ret i32 poison
+;
// codesize growth.
unsigned FuncSize = getCostValue(FunctionMetrics[S.F].NumInsts);
if ((FunctionGrowth[S.F] + S.CodeSizeCost) / FuncSize > MaxCodeSizeGrowth)
  continue;
I think I was wrong before when I suggested you move this here. At this point we have committed to creating NSpecs specializations, but now we are skipping some. The filtering needs to happen earlier, inside findSpecializations but after the lambda (actually before returning true). At that point we know it is profitable.
The thing is, we know we are creating NSpecs specializations but we don't know which ones; they may correspond to different functions. However, FunctionGrowth is kept per function, not per module. Inside findSpecializations we are only examining specializations for the same function.
It would make sense to apply the filtering here if we kept codesize growth across functions (per module), which isn't the case. Correct me if I am wrong, but that's my understanding. Apologies for the confusion.
Yes, I think that is definitely the trade-off with moving the logic here: we may skip some specializations, and because we do so after we have built the priority queue of specializations to perform, we may fail to specialize a "next-best" candidate that would have fit within the FunctionGrowth budget for its function, because its place has been taken among the NSpecs best candidates.
However, if we move it to where you suggest, we are making a slightly different trade-off:
#113448 (comment)
Here we successfully got past the FuncGrowth check. We know for sure the specialization will be created. We can safely do FunctionGrowth[F] += SpecSize; at this point, before returning true.
If I understand correctly, we do not know for sure the specialization will be created, because we do not know that it will have a high enough score to end up among the NSpecs we decide to specialize for the module. If, for a given function F, we analyze N specialization candidates that are all deemed profitable and these N candidates use up the maximum FunctionGrowth for F, then we would deem the (N+1)-th candidate unprofitable based on FunctionGrowth even if it had a higher score (Inlining/CodeSize/Latency) than the candidates analyzed beforehand.
Maybe you still think this is the better trade-off?
I suppose to some extent it might make more sense to instead have a codesize growth budget across functions (per module), and to build the priority queue of BestSpecs based on that, rather than a budget based on the number of specializations.
I think it is worth mentioning the background story. We initially came up with the FuncGrowth threshold to limit the overall codesize growth from being linear in the number of times the specializer runs. At first funcspec-max-iters was set to 1, and we increased it to 10 to enable specialization of recursive functions. Imagine running 10 times yielding 10 times the original function size.
We have MaxClones to limit the number of specializations per iteration. Within a single iteration of run, the growth check basically has no effect unless you have many candidates (not the case for functions relying on promoteConstantStackValues). It is used to prevent the next iteration's candidates. How can there be an (N+1)-th candidate if at the N-th step we decided to stop? (I am thinking of recursive functions.)
Maybe we can keep FunctionGrowth[S.F] += S.CodeSizeCost; here but remove the early exit from the line above this comment. It should only remain in the IsProfitable lambda, in my opinion.
Ah ok, that makes a lot of sense, thank you for providing the additional context.
I have updated the code so that we only check the codesize increase inside the IsProfitable lambda, but do the accumulation into FunctionGrowth[S.F] at the point where we are actually creating the specialization. This has the downside of potentially violating MaxCodeSizeGrowth within a single iteration, in the case where we have specialization candidates that individually do not cause us to exceed MaxCodeSizeGrowth (and so pass the check in the IsProfitable lambda), but in combination do exceed it when they are all performed. This is why I have removed the second test case from the regression test, as it no longer makes sense.
However, I think this is an acceptable trade-off, as it still preserves the intention of the MaxCodeSizeGrowth threshold by preventing linear codesize increase across multiple iterations: even if the threshold is exceeded within one iteration, we would not consider further specializations in successive iterations once the growth has been accounted for, and we have MaxClones to limit the number of specializations per iteration, as you say.
We could fix the potential violation of MaxCodeSizeGrowth within a single iteration by moving the accumulation into FunctionGrowth[S.F] to the end of the IsProfitable lambda, as you suggested. However, I don't really like this, because we may end up discounting the most profitable specialization candidate for a function. To clarify my previous comment: by the N-th and (N+1)-th candidates I meant candidates within a single iteration. If we accumulate the codesize growth while we are analyzing the candidates, we may use up MaxCodeSizeGrowth before encountering the best specialization and therefore fail to consider it:
// (Within a single iteration, MaxCodeSizeGrowth=100%)
spec1 [Score 500, Growth 50%]
spec2 [Score 500, Growth 50%]
// After examining the above, FunctionGrowth[S.F] = 100% so no more candidates will be considered profitable!
// Candidate with highest Score:
spec3 [Score 1000, Growth 10%]
- Rename local variable (s/CodeSizeCost/SpecSize)
- Rename struct member (s/CodeSizeCost/CodeSize) and update comment
- Remove early exit from specialization creation loop
- Remove redundant prefix from regression test name
- Remove regression test for exceeding FuncGrowth