Open
Description
Reproducer:
-- dep/dep.go --
package dep
var sink int
//go:noinline
func spin() {
for i := 0; i < 1000000; i++ {
sink = i
}
}
func b() {
// Two calls, too hot to inline, darn...
spin()
spin()
for i := 0; i < 1000000; i++ {
sink = i
}
}
func a() {
// Two calls, too hot to inline, darn...
b()
b()
for i := 0; i < 1000000; i++ {
sink = i
}
}
//go:noinline
func Foo() {
a()
}
-- dep/dep_test.go --
package dep
import "testing"
func BenchmarkMain(b *testing.B) {
for i := 0; i < b.N; i++ {
Foo()
}
}
-- main.go --
package main
import "example.com/pgo/dep"
func main() {
dep.Foo()
}
-- go.mod --
module example.com/pgo
go 1.23
https://go.dev/play/p/QxBt13wV3Uh
Run the benchmark to collect a profile, and then use that as a PGO profile:
$ go test -bench . -cpuprofile=cpu.pprof ./dep
$ go build "-gcflags=example.com/pgo/dep=-m=2 -d=pgodebug=1" -pgo=cpu.pprof
# example.com/pgo/dep
hot-callsite-thres-from-CDF=0.6493506493506493
dep/dep.go:6:6: cannot inline spin: marked go:noinline
hot-node enabled increased budget=2000 for func=example.com/pgo/dep.b
dep/dep.go:12:6: can inline b with cost 133 as: func() { spin(); spin(); for loop }
hot-node enabled increased budget=2000 for func=example.com/pgo/dep.a
hot-budget check allows inlining for call example.com/pgo/dep.b (cost 133) at dep/dep.go:23:3 in function example.com/pgo/dep.a
hot-budget check allows inlining for call example.com/pgo/dep.b (cost 133) at dep/dep.go:24:3 in function example.com/pgo/dep.a
dep/dep.go:21:6: can inline a with cost 285 as: func() { b(); b(); for loop }
dep/dep.go:31:6: cannot inline Foo: marked go:noinline
hot-budget check allows inlining for call example.com/pgo/dep.b (cost 133) at dep/dep.go:23:3 in function example.com/pgo/dep.a
dep/dep.go:23:3: inlining call to b
hot-budget check allows inlining for call example.com/pgo/dep.b (cost 133) at dep/dep.go:24:3 in function example.com/pgo/dep.a
dep/dep.go:24:3: inlining call to b
hot-budget check allows inlining for call example.com/pgo/dep.a (cost 285) at dep/dep.go:32:3 in function example.com/pgo/dep.Foo
dep/dep.go:32:3: inlining call to a
The hot-budget
lines note that PGO allows inlining b
into a
and a
into Foo
. dep/dep.go:32:3: inlining call to a
indicates that a
was inlined into Foo
, but there should then be a subsequent inline b
into Foo
. The output makes this a bit confusing, but objdump makes it clear that b
was not inlined:
0000000000467a80 <example.com/pgo/dep.Foo>:
...
467a8b: | | e8 b0 ff ff ff call 467a40 <example.com/pgo/dep.b>
467a90: | | e8 ab ff ff ff call 467a40 <example.com/pgo/dep.b>
...
A bit of custom logging in the compiler makes it more clear what is happening:
dep/dep.go:31:6: function Foo: DevirtualizeAndInlineFunc
dep/dep.go:31:6: function Foo: TryInlineCall to a
hot-budget check allows inlining for call example.com/pgo/dep.a (cost 285) at dep/dep.go:32:3 in function example.com/pgo/dep.Foo
dep/dep.go:32:3: inlining call to a
dep/dep.go:31:6: function Foo: TryInlineCall to b
dep/dep.go:31:6: canInlineCallExpr from Foo to b: cost too high: maxCost 80, callSiteScore 133, hot false
dep/dep.go:31:6: mkinlcall b: !canInlineCallExpr
It looks like what is happening here is:
a
in inlined intoFoo
.- We then attempt to inline
b
intoFoo
(from the inlined body ofa
). canInlineCallExpr
checksinlineCostOK
ofFoo -> b
. This consults the PGO edge map, which does not contain a hot edge fromFoo -> b
because PGO includes inline calls in edges. i.e., it has hot edgesFoo -> a
anda -> b
.- Inline does not occur because we didn't increase the budget and base cost is too high.
I believe what should happen is if inlining a call inside of an ir.InlinedCallExpr
, we should use the original name of the InlinedCallExpr
to consult the PGO edge map.
cc @cherrymui
Metadata
Metadata
Assignees
Labels
Type
Projects
Status
Todo