Skip to content

cmd/compile: PGO fails to do multiple levels of inlining into a single function #69046

Open
@prattmic

Description

@prattmic

Reproducer:

-- dep/dep.go --
package dep

var sink int

//go:noinline
func spin() {
        for i := 0; i < 1000000; i++ {
                sink = i
        }
}

func b() {
        // Two calls, too hot to inline, darn...
        spin()
        spin()
        for i := 0; i < 1000000; i++ {
                sink = i
        }
}

func a() {
        // Two calls, too hot to inline, darn...
        b()
        b()
        for i := 0; i < 1000000; i++ {
                sink = i
        }
}

//go:noinline
func Foo() {
        a()
}
-- dep/dep_test.go --
package dep

import "testing"

func BenchmarkMain(b *testing.B) {
        for i := 0; i < b.N; i++ {
                Foo()
        }
}
-- main.go --
package main

import "example.com/pgo/dep"

func main() {
        dep.Foo()
}
-- go.mod --
module example.com/pgo

go 1.23

https://go.dev/play/p/QxBt13wV3Uh

Run the benchmark to collect a profile, and then use that as a PGO profile:

$ go test -bench . -cpuprofile=cpu.pprof ./dep
$ go build "-gcflags=example.com/pgo/dep=-m=2 -d=pgodebug=1" -pgo=cpu.pprof                         
# example.com/pgo/dep
hot-callsite-thres-from-CDF=0.6493506493506493
dep/dep.go:6:6: cannot inline spin: marked go:noinline
hot-node enabled increased budget=2000 for func=example.com/pgo/dep.b
dep/dep.go:12:6: can inline b with cost 133 as: func() { spin(); spin(); for loop }
hot-node enabled increased budget=2000 for func=example.com/pgo/dep.a
hot-budget check allows inlining for call example.com/pgo/dep.b (cost 133) at dep/dep.go:23:3 in function example.com/pgo/dep.a
hot-budget check allows inlining for call example.com/pgo/dep.b (cost 133) at dep/dep.go:24:3 in function example.com/pgo/dep.a
dep/dep.go:21:6: can inline a with cost 285 as: func() { b(); b(); for loop }
dep/dep.go:31:6: cannot inline Foo: marked go:noinline
hot-budget check allows inlining for call example.com/pgo/dep.b (cost 133) at dep/dep.go:23:3 in function example.com/pgo/dep.a
dep/dep.go:23:3: inlining call to b
hot-budget check allows inlining for call example.com/pgo/dep.b (cost 133) at dep/dep.go:24:3 in function example.com/pgo/dep.a
dep/dep.go:24:3: inlining call to b
hot-budget check allows inlining for call example.com/pgo/dep.a (cost 285) at dep/dep.go:32:3 in function example.com/pgo/dep.Foo
dep/dep.go:32:3: inlining call to a

The hot-budget lines note that PGO allows inlining b into a and a into Foo. dep/dep.go:32:3: inlining call to a indicates that a was inlined into Foo, but there should then be a subsequent inline b into Foo. The output makes this a bit confusing, but objdump makes it clear that b was not inlined:

0000000000467a80 <example.com/pgo/dep.Foo>:
...
  467a8b:       |  |         e8 b0 ff ff ff             call   467a40 <example.com/pgo/dep.b>
  467a90:       |  |         e8 ab ff ff ff             call   467a40 <example.com/pgo/dep.b>
...

A bit of custom logging in the compiler makes it more clear what is happening:

dep/dep.go:31:6: function Foo: DevirtualizeAndInlineFunc
dep/dep.go:31:6: function Foo: TryInlineCall to a
hot-budget check allows inlining for call example.com/pgo/dep.a (cost 285) at dep/dep.go:32:3 in function example.com/pgo/dep.Foo
dep/dep.go:32:3: inlining call to a
dep/dep.go:31:6: function Foo: TryInlineCall to b
dep/dep.go:31:6: canInlineCallExpr from Foo to b: cost too high: maxCost 80, callSiteScore 133, hot false
dep/dep.go:31:6: mkinlcall b: !canInlineCallExpr

It looks like what is happening here is:

  1. a in inlined into Foo.
  2. We then attempt to inline b into Foo (from the inlined body of a).
  3. canInlineCallExpr checks inlineCostOK of Foo -> b. This consults the PGO edge map, which does not contain a hot edge from Foo -> b because PGO includes inline calls in edges. i.e., it has hot edges Foo -> a and a -> b.
  4. Inline does not occur because we didn't increase the budget and base cost is too high.

I believe what should happen is if inlining a call inside of an ir.InlinedCallExpr, we should use the original name of the InlinedCallExpr to consult the PGO edge map.

cc @cherrymui

Metadata

Metadata

Assignees

No one assigned

    Labels

    NeedsFixThe path to resolution is known, but the work has not been done.compiler/runtimeIssues related to the Go compiler and/or runtime.

    Type

    No type

    Projects

    Status

    Todo

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions