-
Notifications
You must be signed in to change notification settings - Fork 18k
cmd/compile: compiling large functions is slow #8225
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
I created a gigantic type switch with around 3000 cases and 40k LOC, and when compiling it got $ time go build real 2m12.840s user 2m9.947s sys 0m1.776s To find this code, check out https://github.com/zond/godec/tree/97525c53ef42889c5f9f96033030dd4675047b21, run "go generator/generator.go" Then I broke up the switch by moving the body of the cases to other functions, so that each case only called another function instead of allocate, for loop etc, and got $ time go build real 0m8.115s user 0m7.812s sys 0m0.282s To find THAT code, check out https://github.com/zond/godec/tree/a0188c898f1197709a1238fba5b5864f63eac552, run "go run generator/generator.go" |
My original issue ("large functions compile slowly") is too vague and generic to be helpful. However, the code in Comment 1 is a useful test case. As of 1.8rc1, we spend ~20% of CPU time in resolveFwdRefs and ~10% of CPU time in liveness bit twiddling. I haven't looked at resolveFwdRefs, but the liveness bit twiddling should be readily reduced. The problem is that we are doing O(number of variables) work for every instruction in the program in livenessprologue. But in almost all cases, progeffects only flips a bit or two. Switching to a sparse return value from progeffects (with appropriate caching and allocation avoidance) should help considerably. I may mail a CL that does this for 1.9. |
And looks like improving the liveness bit twiddling helps regular code a little bit as well. My crude prototype shows:
|
CL https://golang.org/cl/36112 mentions this issue. |
CL https://golang.org/cl/36211 mentions this issue. |
This speeds up compilation of the code in #8225 by 25%-30%. The complexity of the algorithm is unchanged, but this shrinks the constant factor so much that it doesn't matter, even the size of the giant type switch gets scaled up dramatically. name old time/op new time/op delta Template 218ms ± 5% 217ms ±10% ~ (p=0.163 n=27+30) Unicode 98.2ms ± 6% 97.7ms ±10% ~ (p=0.150 n=27+29) GoTypes 654ms ± 5% 650ms ± 5% ~ (p=0.350 n=30+30) Compiler 2.70s ± 4% 2.68s ± 3% ~ (p=0.128 n=30+29) name old user-ns/op new user-ns/op delta Template 276user-ms ± 6% 271user-ms ± 7% -1.83% (p=0.003 n=29+28) Unicode 138user-ms ± 5% 137user-ms ± 4% ~ (p=0.071 n=27+27) GoTypes 881user-ms ± 4% 877user-ms ± 4% ~ (p=0.423 n=30+30) Compiler 3.76user-s ± 4% 3.72user-s ± 2% -0.84% (p=0.028 n=30+29) name old alloc/op new alloc/op delta Template 40.7MB ± 0% 40.7MB ± 0% ~ (p=0.936 n=30+30) Unicode 30.8MB ± 0% 30.8MB ± 0% ~ (p=0.859 n=28+30) GoTypes 123MB ± 0% 123MB ± 0% ~ (p=0.273 n=30+30) Compiler 472MB ± 0% 472MB ± 0% ~ (p=0.432 n=30+30) name old allocs/op new allocs/op delta Template 401k ± 1% 401k ± 1% ~ (p=0.859 n=30+30) Unicode 331k ± 0% 331k ± 1% ~ (p=0.823 n=28+30) GoTypes 1.24M ± 0% 1.24M ± 0% ~ (p=0.286 n=30+30) Compiler 4.26M ± 0% 4.26M ± 0% ~ (p=0.359 n=30+30) Change-Id: Ia850065a9a84c07a5b0b4e23c1758b5679498da7 Reviewed-on: https://go-review.googlesource.com/36112 Run-TryBot: Josh Bleecher Snyder <[email protected]> TryBot-Result: Gobot Gobot <[email protected]> Reviewed-by: Keith Randall <[email protected]>
When the number of variables in a function is very large, liveness analysis gets less efficient, since every bit vector is O(number of variables). Improve the situation by returning a sparse representation from progeffects. In all scenarios, progeffects either returns a slice that is shared function-wide, and which is usually small, or a slice that is guaranteed to have at most three values. Reduces compilation time for the code in #8225 Comment 1 by ~10%. Minor effects on regular packages (below). Passes toolstash -cmp. Updates #8225 name old time/op new time/op delta Template 215ms ± 2% 212ms ± 4% -1.31% (p=0.001 n=30+30) Unicode 98.3ms ± 3% 98.4ms ± 5% ~ (p=0.971 n=30+30) GoTypes 657ms ± 3% 651ms ± 2% -0.98% (p=0.001 n=30+27) Compiler 2.78s ± 2% 2.77s ± 2% -0.60% (p=0.006 n=30+30) Flate 130ms ± 4% 130ms ± 4% ~ (p=0.712 n=29+30) GoParser 159ms ± 5% 158ms ± 3% ~ (p=0.331 n=29+30) Reflect 406ms ± 3% 404ms ± 3% -0.69% (p=0.041 n=29+30) Tar 117ms ± 4% 117ms ± 3% ~ (p=0.886 n=30+29) XML 219ms ± 2% 217ms ± 2% ~ (p=0.091 n=29+24) name old user-ns/op new user-ns/op delta Template 272user-ms ± 3% 270user-ms ± 3% -1.03% (p=0.004 n=30+30) Unicode 138user-ms ± 2% 138user-ms ± 3% ~ (p=0.902 n=29+29) GoTypes 891user-ms ± 2% 883user-ms ± 2% -0.95% (p=0.000 n=29+29) Compiler 3.85user-s ± 2% 3.84user-s ± 2% ~ (p=0.236 n=30+30) Flate 167user-ms ± 2% 166user-ms ± 4% ~ (p=0.511 n=28+30) GoParser 211user-ms ± 4% 210user-ms ± 3% ~ (p=0.287 n=29+30) Reflect 539user-ms ± 3% 536user-ms ± 2% -0.59% (p=0.034 n=29+30) Tar 154user-ms ± 3% 155user-ms ± 4% ~ (p=0.786 n=30+30) XML 289user-ms ± 3% 288user-ms ± 4% ~ (p=0.249 n=30+26) name old alloc/op new alloc/op delta Template 40.7MB ± 0% 40.8MB ± 0% +0.09% (p=0.001 n=30+30) Unicode 30.8MB ± 0% 30.8MB ± 0% ~ (p=0.112 n=30+30) GoTypes 123MB ± 0% 124MB ± 0% +0.09% (p=0.000 n=30+30) Compiler 473MB ± 0% 473MB ± 0% +0.05% (p=0.000 n=30+30) Flate 26.5MB ± 0% 26.5MB ± 0% ~ (p=0.186 n=29+30) GoParser 32.3MB ± 0% 32.4MB ± 0% +0.07% (p=0.021 n=28+30) Reflect 84.4MB ± 0% 84.6MB ± 0% +0.21% (p=0.000 n=30+30) Tar 27.3MB ± 0% 27.3MB ± 0% +0.09% (p=0.010 n=30+28) XML 44.7MB ± 0% 44.7MB ± 0% +0.07% (p=0.002 n=30+30) name old allocs/op new allocs/op delta Template 401k ± 1% 400k ± 1% ~ (p=0.321 n=30+30) Unicode 331k ± 1% 331k ± 1% ~ (p=0.357 n=30+28) GoTypes 1.24M ± 0% 1.24M ± 1% -0.19% (p=0.001 n=30+30) Compiler 4.27M ± 0% 4.27M ± 0% -0.13% (p=0.000 n=30+30) Flate 252k ± 1% 251k ± 1% -0.30% (p=0.005 n=30+30) GoParser 325k ± 1% 325k ± 1% ~ (p=0.224 n=28+30) Reflect 1.06M ± 0% 1.05M ± 0% -0.34% (p=0.000 n=30+30) Tar 266k ± 1% 266k ± 1% ~ (p=0.333 n=30+30) XML 416k ± 1% 415k ± 1% ~ (p=0.144 n=30+29) Change-Id: I6ba67a9203516373062a2618122306da73333d98 Reviewed-on: https://go-review.googlesource.com/36211 Run-TryBot: Josh Bleecher Snyder <[email protected]> TryBot-Result: Gobot Gobot <[email protected]> Reviewed-by: Matthew Dempsky <[email protected]>
The last two commits helped compile time for this code considerably. I don't see anything further to be done peculiar to this code--the rest is now general and known compiler slowness. |
When the number of variables in a function is very large, liveness analysis gets less efficient, since every operation is O(number of variables). Improve the situation by returning a sparse representation from progeffects, caching common return values, and avoiding allocation in all other cases. TODO: fix up mutilated comments restore debugging code measure exact impact on golang#8225 measure with compilebench Reduces compilation time for the code in golang#8225 Comment 1 by ~10%. Updates golang#8225 name old time/op new time/op delta Template 209ms ± 3% 206ms ± 4% -1.73% (p=0.000 n=29+28) Unicode 97.0ms ± 5% 96.8ms ± 5% ~ (p=0.456 n=29+30) GoTypes 628ms ± 3% 623ms ± 4% ~ (p=0.186 n=29+30) Compiler 2.67s ± 4% 2.63s ± 3% -1.45% (p=0.007 n=30+30) name old user-ns/op new user-ns/op delta Template 268user-ms ± 3% 261user-ms ± 3% -2.48% (p=0.000 n=29+28) Unicode 137user-ms ± 5% 136user-ms ± 3% ~ (p=0.568 n=29+29) GoTypes 858user-ms ± 3% 853user-ms ± 3% ~ (p=0.202 n=29+30) Compiler 3.72user-s ± 4% 3.68user-s ± 3% -1.02% (p=0.039 n=30+30) name old alloc/op new alloc/op delta Template 40.7MB ± 0% 40.7MB ± 0% +0.05% (p=0.038 n=29+30) Unicode 30.8MB ± 0% 30.8MB ± 0% ~ (p=0.625 n=30+29) GoTypes 123MB ± 0% 124MB ± 0% +0.10% (p=0.000 n=30+30) Compiler 473MB ± 0% 473MB ± 0% +0.05% (p=0.000 n=30+30) name old allocs/op new allocs/op delta Template 401k ± 1% 400k ± 1% -0.26% (p=0.001 n=29+30) Unicode 331k ± 1% 331k ± 0% ~ (p=0.679 n=30+29) GoTypes 1.24M ± 0% 1.24M ± 0% -0.14% (p=0.007 n=30+30) Compiler 4.27M ± 0% 4.26M ± 0% -0.10% (p=0.000 n=30+30) name old maxrss/op new maxrss/op delta Template 30.9M ± 2% 30.9M ± 2% ~ (p=0.958 n=30+29) Unicode 33.1M ± 1% 33.0M ± 1% ~ (p=0.462 n=27+30) GoTypes 61.6M ± 3% 61.6M ± 2% ~ (p=0.917 n=29+28) Compiler 202M ± 2% 203M ± 2% ~ (p=0.631 n=30+30) Change-Id: I6ba67a9203516373062a2618122306da73333d98
The text was updated successfully, but these errors were encountered: