Skip to content

Commit 39ce590

Browse files
committed
cmd/compile: rotate loops so conditional branch is at the end
Old loops look like this: loop: CMPQ ... JGE exit ... JMP loop exit: New loops look like this: JMP entry loop: ... entry: CMPQ ... JLT loop This removes one instruction (the unconditional jump) from the inner loop. Kinda surprisingly, it matters. This is a bit different than the peeling that the old obj library did in that we don't duplicate the loop exit test. We just jump to the test. I'm not sure if it is better or worse to do that (peeling gets rid of the JMP but means more code duplication), but this CL is certainly a much simpler compiler change, so I'll try this way first. The obj library used to do peeling before CL https://go-review.googlesource.com/c/36205 turned it off. Fixes #15837 (remove obj instruction reordering) The reordering is already removed, this CL implements the only part of that reordering that we'd like to keep. Fixes #14758 (append loop) name old time/op new time/op delta Foo-12 817ns ± 4% 538ns ± 0% -34.08% (p=0.000 n=10+9) Bar-12 850ns ±11% 570ns ±13% -32.88% (p=0.000 n=10+10) Update #19595 (BLAS slowdown) name old time/op new time/op delta DgemvMedMedNoTransIncN-12 13.2µs ± 9% 10.2µs ± 1% -22.26% (p=0.000 n=9+9) Fixes #19633 (append loop) name old time/op new time/op delta Foo-12 810ns ± 1% 540ns ± 0% -33.30% (p=0.000 n=8+9) Update #18977 (Fannkuch11 regression) name old time/op new time/op delta Fannkuch11-8 2.80s ± 0% 3.01s ± 0% +7.47% (p=0.000 n=9+10) This one makes no sense. There's strictly 1 less instruction in the inner loop (17 instead of 18). They are exactly the same instructions except for the JMP that has been elided. go1 benchmarks generally don't look very impressive. But the gains for the specific issues above make this CL still probably worth it. name old time/op new time/op delta BinaryTree17-8 2.32s ± 0% 2.34s ± 0% +1.14% (p=0.000 n=9+7) Fannkuch11-8 2.80s ± 0% 3.01s ± 0% +7.47% (p=0.000 n=9+10) FmtFprintfEmpty-8 44.1ns ± 1% 46.1ns ± 1% +4.53% (p=0.000 n=10+10) FmtFprintfString-8 67.8ns ± 0% 74.4ns ± 1% +9.80% (p=0.000 n=10+9) FmtFprintfInt-8 74.9ns ± 0% 78.4ns ± 0% +4.67% (p=0.000 n=8+10) FmtFprintfIntInt-8 117ns ± 1% 123ns ± 1% +4.69% (p=0.000 n=9+10) FmtFprintfPrefixedInt-8 160ns ± 1% 146ns ± 0% -8.22% (p=0.000 n=8+10) FmtFprintfFloat-8 214ns ± 0% 206ns ± 0% -3.91% (p=0.000 n=8+8) FmtManyArgs-8 468ns ± 0% 497ns ± 1% +6.09% (p=0.000 n=8+10) GobDecode-8 6.16ms ± 0% 6.21ms ± 1% +0.76% (p=0.000 n=9+10) GobEncode-8 4.90ms ± 0% 4.92ms ± 1% +0.37% (p=0.028 n=9+10) Gzip-8 209ms ± 0% 212ms ± 0% +1.33% (p=0.000 n=10+10) Gunzip-8 36.6ms ± 0% 38.0ms ± 1% +4.03% (p=0.000 n=9+9) HTTPClientServer-8 84.2µs ± 0% 86.0µs ± 1% +2.14% (p=0.000 n=9+9) JSONEncode-8 13.6ms ± 3% 13.8ms ± 1% +1.55% (p=0.003 n=9+10) JSONDecode-8 53.2ms ± 5% 52.9ms ± 0% ~ (p=0.280 n=10+10) Mandelbrot200-8 3.78ms ± 0% 3.78ms ± 1% ~ (p=0.661 n=10+9) GoParse-8 2.89ms ± 0% 2.94ms ± 2% +1.50% (p=0.000 n=10+10) RegexpMatchEasy0_32-8 68.5ns ± 2% 68.9ns ± 1% ~ (p=0.136 n=10+10) RegexpMatchEasy0_1K-8 220ns ± 1% 225ns ± 1% +2.41% (p=0.000 n=10+10) RegexpMatchEasy1_32-8 64.7ns ± 0% 64.5ns ± 0% -0.28% (p=0.042 n=10+10) RegexpMatchEasy1_1K-8 348ns ± 1% 355ns ± 0% +1.90% (p=0.000 n=10+10) RegexpMatchMedium_32-8 102ns ± 1% 105ns ± 1% +2.95% (p=0.000 n=10+10) RegexpMatchMedium_1K-8 33.1µs ± 3% 32.5µs ± 0% -1.75% (p=0.000 n=10+10) RegexpMatchHard_32-8 1.71µs ± 1% 1.70µs ± 1% -0.84% (p=0.002 n=10+9) RegexpMatchHard_1K-8 51.1µs ± 0% 50.8µs ± 1% -0.48% (p=0.004 n=10+10) Revcomp-8 411ms ± 1% 402ms ± 0% -2.22% (p=0.000 n=10+9) Template-8 61.8ms ± 1% 59.7ms ± 0% -3.44% (p=0.000 n=9+9) TimeParse-8 306ns ± 0% 318ns ± 0% +3.83% (p=0.000 n=10+10) TimeFormat-8 320ns ± 0% 318ns ± 1% -0.53% (p=0.012 n=7+10) Change-Id: Ifaf29abbe5874e437048e411ba8f7cfbc9e1c94b Reviewed-on: https://go-review.googlesource.com/38431 Run-TryBot: Keith Randall <[email protected]> TryBot-Result: Gobot Gobot <[email protected]> Reviewed-by: David Chase <[email protected]>
1 parent 6a48019 commit 39ce590

File tree

3 files changed

+84
-5
lines changed

3 files changed

+84
-5
lines changed

src/cmd/compile/internal/ssa/compile.go

+3
Original file line numberDiff line numberDiff line change
@@ -369,6 +369,7 @@ var passes = [...]pass{
369369
{name: "late nilcheck", fn: nilcheckelim2},
370370
{name: "flagalloc", fn: flagalloc, required: true}, // allocate flags register
371371
{name: "regalloc", fn: regalloc, required: true}, // allocate int & float registers + stack slots
372+
{name: "loop rotate", fn: loopRotate},
372373
{name: "stackframe", fn: stackframe, required: true},
373374
{name: "trim", fn: trim}, // remove empty blocks
374375
}
@@ -427,6 +428,8 @@ var passOrder = [...]constraint{
427428
{"schedule", "flagalloc"},
428429
// regalloc needs flags to be allocated first.
429430
{"flagalloc", "regalloc"},
431+
// loopRotate will confuse regalloc.
432+
{"regalloc", "loop rotate"},
430433
// stackframe needs to know about spilled registers.
431434
{"regalloc", "stackframe"},
432435
// trim needs regalloc to be done first.

src/cmd/compile/internal/ssa/likelyadjust.go

-5
Original file line numberDiff line numberDiff line change
@@ -16,11 +16,6 @@ type loop struct {
1616
children []*loop // loops nested directly within this loop. Initialized by assembleChildren().
1717
exits []*Block // exits records blocks reached by exits from this loop. Initialized by findExits().
1818

19-
// Loops aren't that common, so rather than force regalloc to keep
20-
// a map or slice for its data, just put it here.
21-
spills []*Value
22-
scratch int32
23-
2419
// Next three fields used by regalloc and/or
2520
// aid in computation of inner-ness and list of blocks.
2621
nBlocks int32 // Number of blocks in this loop but not within inner loops
+81
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,81 @@
1+
// Copyright 2017 The Go Authors. All rights reserved.
2+
// Use of this source code is governed by a BSD-style
3+
// license that can be found in the LICENSE file.
4+
5+
package ssa
6+
7+
// loopRotate converts loops with a check-loop-condition-at-beginning
8+
// to loops with a check-loop-condition-at-end.
9+
// This helps loops avoid extra unnecessary jumps.
10+
//
11+
// loop:
12+
// CMPQ ...
13+
// JGE exit
14+
// ...
15+
// JMP loop
16+
// exit:
17+
//
18+
// JMP entry
19+
// loop:
20+
// ...
21+
// entry:
22+
// CMPQ ...
23+
// JLT loop
24+
func loopRotate(f *Func) {
25+
loopnest := f.loopnest()
26+
if len(loopnest.loops) == 0 {
27+
return
28+
}
29+
30+
// Set of blocks we're moving, by ID.
31+
move := map[ID]struct{}{}
32+
33+
// Map from block ID to the moving block that should
34+
// come right after it.
35+
after := map[ID]*Block{}
36+
37+
// Check each loop header and decide if we want to move it.
38+
for _, loop := range loopnest.loops {
39+
b := loop.header
40+
var p *Block // b's in-loop predecessor
41+
for _, e := range b.Preds {
42+
if e.b.Kind != BlockPlain {
43+
continue
44+
}
45+
if loopnest.b2l[e.b.ID] != loop {
46+
continue
47+
}
48+
p = e.b
49+
}
50+
if p == nil || p == b {
51+
continue
52+
}
53+
54+
// Place b after p.
55+
move[b.ID] = struct{}{}
56+
after[p.ID] = b
57+
}
58+
59+
// Move blocks to their destinations in a single pass.
60+
// We rely here on the fact that loop headers must come
61+
// before the rest of the loop. And that relies on the
62+
// fact that we only identify reducible loops.
63+
j := 0
64+
for i, b := range f.Blocks {
65+
if _, ok := move[b.ID]; ok {
66+
continue
67+
}
68+
f.Blocks[j] = b
69+
j++
70+
if a := after[b.ID]; a != nil {
71+
if j > i {
72+
f.Fatalf("head before tail in loop %s", b)
73+
}
74+
f.Blocks[j] = a
75+
j++
76+
}
77+
}
78+
if j != len(f.Blocks) {
79+
f.Fatalf("bad reordering in looprotate")
80+
}
81+
}

0 commit comments

Comments
 (0)