Commit f31a18d

cmd/compile: add some generic composite type optimizations
Propagate values through some wide Zero/Move operations. Among
other things this allows us to optimize some kinds of array
initialization. For example, the following code no longer
requires a temporary be allocated on the stack. Instead it
writes the values directly into the return value.

func f(i uint32) [4]uint32 {
	return [4]uint32{i, i+1, i+2, i+3}
}

The return value is unnecessarily cleared but removing that is
probably a task for dead store analysis (I think it needs to
be able to match multiple Store ops to wide Zero ops).

In order to reliably remove stack variables that are rendered
unnecessary by these new rules I've added a new generic version
of the unread autos elimination pass.

These rules are triggered more than 5000 times when building and
testing the standard library.

Updates #15925 (fixes for arrays of up to 4 elements).
Updates #24386 (fixes for up to 4 kept elements).
Updates #24416.

compilebench results:

name       old time/op       new time/op       delta
Template         353ms ± 5%        359ms ± 3%    ~     (p=0.143 n=10+10)
Unicode          219ms ± 1%        217ms ± 4%    ~     (p=0.740 n=7+10)
GoTypes          1.26s ± 1%        1.26s ± 2%    ~     (p=0.549 n=9+10)
Compiler         6.00s ± 1%        6.08s ± 1%  +1.42%  (p=0.000 n=9+8)
SSA              15.3s ± 2%        15.6s ± 1%  +2.43%  (p=0.000 n=10+10)
Flate            237ms ± 2%        240ms ± 2%  +1.31%  (p=0.015 n=10+10)
GoParser         285ms ± 1%        285ms ± 1%    ~     (p=0.878 n=8+8)
Reflect          797ms ± 3%        807ms ± 2%    ~     (p=0.065 n=9+10)
Tar              334ms ± 0%        335ms ± 4%    ~     (p=0.460 n=8+10)
XML              419ms ± 0%        423ms ± 1%  +0.91%  (p=0.001 n=7+9)
StdCmd           46.0s ± 0%        46.4s ± 0%  +0.85%  (p=0.000 n=9+9)

name       old user-time/op  new user-time/op  delta
Template         337ms ± 3%        346ms ± 5%    ~     (p=0.053 n=9+10)
Unicode          205ms ±10%        205ms ± 8%    ~     (p=1.000 n=10+10)
GoTypes          1.22s ± 2%        1.21s ± 3%    ~     (p=0.436 n=10+10)
Compiler         5.85s ± 1%        5.93s ± 0%  +1.46%  (p=0.000 n=10+8)
SSA              14.9s ± 1%        15.3s ± 1%  +2.62%  (p=0.000 n=10+10)
Flate            229ms ± 4%        228ms ± 6%    ~     (p=0.796 n=10+10)
GoParser         271ms ± 3%        275ms ± 4%    ~     (p=0.165 n=10+10)
Reflect          779ms ± 5%        775ms ± 2%    ~     (p=0.971 n=10+10)
Tar              317ms ± 4%        319ms ± 5%    ~     (p=0.853 n=10+10)
XML              404ms ± 4%        409ms ± 5%    ~     (p=0.436 n=10+10)

name       old alloc/op      new alloc/op      delta
Template        34.9MB ± 0%       35.0MB ± 0%  +0.26%  (p=0.000 n=10+10)
Unicode         29.3MB ± 0%       29.3MB ± 0%  +0.02%  (p=0.000 n=10+10)
GoTypes          115MB ± 0%        115MB ± 0%  +0.30%  (p=0.000 n=10+10)
Compiler         519MB ± 0%        521MB ± 0%  +0.30%  (p=0.000 n=10+10)
SSA             1.55GB ± 0%       1.57GB ± 0%  +1.34%  (p=0.000 n=10+9)
Flate           24.1MB ± 0%       24.2MB ± 0%  +0.10%  (p=0.000 n=10+10)
GoParser        28.1MB ± 0%       28.1MB ± 0%  +0.07%  (p=0.000 n=10+10)
Reflect         78.7MB ± 0%       78.7MB ± 0%  +0.03%  (p=0.000 n=8+10)
Tar             34.4MB ± 0%       34.5MB ± 0%  +0.12%  (p=0.000 n=10+10)
XML             43.2MB ± 0%       43.2MB ± 0%  +0.13%  (p=0.000 n=10+10)

name       old allocs/op     new allocs/op     delta
Template          330k ± 0%         330k ± 0%  -0.01%  (p=0.017 n=10+10)
Unicode           337k ± 0%         337k ± 0%  +0.01%  (p=0.000 n=9+10)
GoTypes          1.15M ± 0%        1.15M ± 0%  +0.03%  (p=0.000 n=10+10)
Compiler         4.77M ± 0%        4.77M ± 0%  +0.03%  (p=0.000 n=9+10)
SSA              12.5M ± 0%        12.6M ± 0%  +1.16%  (p=0.000 n=10+10)
Flate             221k ± 0%         221k ± 0%  +0.05%  (p=0.000 n=9+10)
GoParser          275k ± 0%         275k ± 0%  +0.01%  (p=0.014 n=10+9)
Reflect           944k ± 0%         944k ± 0%  -0.02%  (p=0.000 n=10+10)
Tar               324k ± 0%         323k ± 0%  -0.12%  (p=0.000 n=10+10)
XML               384k ± 0%         384k ± 0%  -0.01%  (p=0.001 n=10+10)

name       old object-bytes  new object-bytes  delta
Template         476kB ± 0%        476kB ± 0%  -0.04%  (p=0.000 n=10+10)
Unicode          218kB ± 0%        218kB ± 0%    ~     (all equal)
GoTypes         1.58MB ± 0%       1.58MB ± 0%  -0.04%  (p=0.000 n=10+10)
Compiler        6.25MB ± 0%       6.24MB ± 0%  -0.09%  (p=0.000 n=10+10)
SSA             15.9MB ± 0%       16.1MB ± 0%  +1.22%  (p=0.000 n=10+10)
Flate            304kB ± 0%        304kB ± 0%  -0.13%  (p=0.000 n=10+10)
GoParser         370kB ± 0%        370kB ± 0%  -0.00%  (p=0.000 n=10+10)
Reflect         1.27MB ± 0%       1.27MB ± 0%  -0.12%  (p=0.000 n=10+10)
Tar              421kB ± 0%        419kB ± 0%  -0.64%  (p=0.000 n=10+10)
XML              518kB ± 0%        517kB ± 0%  -0.12%  (p=0.000 n=10+10)

name       old export-bytes  new export-bytes  delta
Template        16.7kB ± 0%       16.7kB ± 0%    ~     (all equal)
Unicode         6.52kB ± 0%       6.52kB ± 0%    ~     (all equal)
GoTypes         29.2kB ± 0%       29.2kB ± 0%    ~     (all equal)
Compiler        88.0kB ± 0%       88.0kB ± 0%    ~     (all equal)
SSA              109kB ± 0%        109kB ± 0%    ~     (all equal)
Flate           4.49kB ± 0%       4.49kB ± 0%    ~     (all equal)
GoParser        8.10kB ± 0%       8.10kB ± 0%    ~     (all equal)
Reflect         7.71kB ± 0%       7.71kB ± 0%    ~     (all equal)
Tar             9.15kB ± 0%       9.15kB ± 0%    ~     (all equal)
XML             12.3kB ± 0%       12.3kB ± 0%    ~     (all equal)

name       old text-bytes    new text-bytes    delta
HelloSize        676kB ± 0%        672kB ± 0%  -0.59%  (p=0.000 n=10+10)
CmdGoSize       7.26MB ± 0%       7.24MB ± 0%  -0.18%  (p=0.000 n=10+10)

name       old data-bytes    new data-bytes    delta
HelloSize       10.2kB ± 0%       10.2kB ± 0%    ~     (all equal)
CmdGoSize        248kB ± 0%        248kB ± 0%    ~     (all equal)

name       old bss-bytes     new bss-bytes     delta
HelloSize        125kB ± 0%        125kB ± 0%    ~     (all equal)
CmdGoSize        145kB ± 0%        145kB ± 0%    ~     (all equal)

name       old exe-bytes     new exe-bytes     delta
HelloSize       1.46MB ± 0%       1.45MB ± 0%  -0.31%  (p=0.000 n=10+10)
CmdGoSize       14.7MB ± 0%       14.7MB ± 0%  -0.17%  (p=0.000 n=10+10)

Change-Id: Ic72b0c189dd542f391e1c9ab88a76e9148dc4285
Reviewed-on: https://go-review.googlesource.com/106495
Run-TryBot: Michael Munday <[email protected]>
TryBot-Result: Gobot Gobot <[email protected]>
Reviewed-by: Keith Randall <[email protected]>
1 parent 098ca84 commit f31a18d
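
The array-initialization example from the commit message can be tried as a standalone program. The sketch below only wraps the function `f` from the message in a `main` package; the `//go:noinline` directive and the test value `10` are additions for illustration. Building it with `GOSSAFUNC=f go build` (which writes an `ssa.html` dump) or `go build -gcflags=-S` (which prints assembly) is one way to check whether a stack temporary is still allocated; the exact output depends on the toolchain version and target architecture.

```go
package main

import "fmt"

// f is the example from the commit message: it returns a small array
// built from a composite literal.
//
// The noinline directive is an addition for illustration, so that f
// shows up as its own function in compiler dumps.
//
//go:noinline
func f(i uint32) [4]uint32 {
	return [4]uint32{i, i + 1, i + 2, i + 3}
}

func main() {
	fmt.Println(f(10)) // prints [10 11 12 13]
}
```

The program's behavior is the same with or without the optimization; only the generated code for `f` is expected to differ.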

8 files changed, +4330 -1014 lines changed


src/cmd/compile/internal/ssa/compile.go

+1
@@ -371,6 +371,7 @@ var passes = [...]pass{
 	{name: "decompose builtin", fn: decomposeBuiltIn, required: true},
 	{name: "softfloat", fn: softfloat, required: true},
 	{name: "late opt", fn: opt, required: true}, // TODO: split required rules and optimizing rules
+	{name: "dead auto elim", fn: elimDeadAutosGeneric},
 	{name: "generic deadcode", fn: deadcode},
 	{name: "check bce", fn: checkbce},
 	{name: "branchelim", fn: branchelim},
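
The hunk above registers the new pass without `required: true` and places it just before "generic deadcode". A minimal sketch of what that ordering implies follows. The `fn` and `pass` types, the `note` helper, and the `compile` driver are all hypothetical simplifications, and the rule that non-required passes are skipped when optimization is off is an assumption about the driver, not something shown in this diff.

```go
package main

import "fmt"

// fn is a hypothetical stand-in for the compiler's *Func; it only
// records the names of the passes that ran.
type fn struct{ ran []string }

// pass is a simplified, hypothetical version of the table entries.
type pass struct {
	name     string
	run      func(*fn)
	required bool
}

// note returns a pass body that just records its own name.
func note(name string) func(*fn) {
	return func(f *fn) { f.ran = append(f.ran, name) }
}

var passes = []pass{
	{name: "late opt", run: note("late opt"), required: true},
	{name: "dead auto elim", run: note("dead auto elim")},
	{name: "generic deadcode", run: note("generic deadcode")},
}

// compile runs the passes in table order. Skipping optional passes when
// optimize is false is an assumption made for this sketch.
func compile(optimize bool) []string {
	f := &fn{}
	for _, p := range passes {
		if !optimize && !p.required {
			continue
		}
		p.run(f)
	}
	return f.ran
}

func main() {
	fmt.Println(compile(true))  // prints [late opt dead auto elim generic deadcode]
	fmt.Println(compile(false)) // prints [late opt]
}
```

Running the elimination pass before dead code elimination matters because, per the doc comment in deadstore.go, the pass only deletes the stores and leaves the remaining operations for dead code elimination to clean up.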

src/cmd/compile/internal/ssa/deadstore.go

+147
@@ -133,6 +133,153 @@ func dse(f *Func) {
 	}
 }

+// elimDeadAutosGeneric deletes autos that are never accessed. To achieve this
+// we track the operations that the address of each auto reaches and if it only
+// reaches stores then we delete all the stores. The other operations will then
+// be eliminated by the dead code elimination pass.
+func elimDeadAutosGeneric(f *Func) {
+	addr := make(map[*Value]GCNode) // values that the address of the auto reaches
+	elim := make(map[*Value]GCNode) // values that could be eliminated if the auto is
+	used := make(map[GCNode]bool)   // used autos that must be kept
+
+	// visit the value and report whether any of the maps are updated
+	visit := func(v *Value) (changed bool) {
+		args := v.Args
+		switch v.Op {
+		case OpAddr:
+			// Propagate the address if it points to an auto.
+			n, ok := v.Aux.(GCNode)
+			if !ok || n.StorageClass() != ClassAuto {
+				return
+			}
+			if addr[v] == nil {
+				addr[v] = n
+				changed = true
+			}
+			return
+		case OpVarDef, OpVarKill:
+			// v should be eliminated if we eliminate the auto.
+			n, ok := v.Aux.(GCNode)
+			if !ok || n.StorageClass() != ClassAuto {
+				return
+			}
+			if elim[v] == nil {
+				elim[v] = n
+				changed = true
+			}
+			return
+		case OpVarLive:
+			// Don't delete the auto if it needs to be kept alive.
+			n, ok := v.Aux.(GCNode)
+			if !ok || n.StorageClass() != ClassAuto {
+				return
+			}
+			if !used[n] {
+				used[n] = true
+				changed = true
+			}
+			return
+		case OpStore, OpMove, OpZero:
+			// v should be eliminated if we eliminate the auto.
+			n, ok := addr[args[0]]
+			if ok && elim[v] == nil {
+				elim[v] = n
+				changed = true
+			}
+			// Other args might hold pointers to autos.
+			args = args[1:]
+		}
+
+		// The code below assumes that we have handled all the ops
+		// with sym effects already. Sanity check that here.
+		// Ignore Args since they can't be autos.
+		if v.Op.SymEffect() != SymNone && v.Op != OpArg {
+			panic("unhandled op with sym effect")
+		}
+
+		if v.Uses == 0 || len(args) == 0 {
+			return
+		}
+
+		// If the address of the auto reaches a memory or control
+		// operation not covered above then we probably need to keep it.
+		if v.Type.IsMemory() || v.Type.IsFlags() || (v.Op != OpPhi && v.MemoryArg() != nil) {
+			for _, a := range args {
+				if n, ok := addr[a]; ok {
+					if !used[n] {
+						used[n] = true
+						changed = true
+					}
+				}
+			}
+			return
+		}
+
+		// Propagate any auto addresses through v.
+		node := GCNode(nil)
+		for _, a := range args {
+			if n, ok := addr[a]; ok && !used[n] {
+				if node == nil {
+					node = n
+				} else if node != n {
+					// Most of the time we only see one pointer
+					// reaching an op, but some ops can take
+					// multiple pointers (e.g. NeqPtr, Phi etc.).
+					// This is rare, so just propagate the first
+					// value to keep things simple.
+					used[n] = true
+					changed = true
+				}
+			}
+		}
+		if node == nil {
+			return
+		}
+		if addr[v] == nil {
+			// The address of an auto reaches this op.
+			addr[v] = node
+			changed = true
+			return
+		}
+		if addr[v] != node {
+			// This doesn't happen in practice, but catch it just in case.
+			used[node] = true
+			changed = true
+		}
+		return
+	}
+
+	iterations := 0
+	for {
+		if iterations == 4 {
+			// give up
+			return
+		}
+		iterations++
+		changed := false
+		for _, b := range f.Blocks {
+			for _, v := range b.Values {
+				changed = visit(v) || changed
+			}
+		}
+		if !changed {
+			break
+		}
+	}
+
+	// Eliminate stores to unread autos.
+	for v, n := range elim {
+		if used[n] {
+			continue
+		}
+		// replace with OpCopy
+		v.SetArgs1(v.MemoryArg())
+		v.Aux = nil
+		v.AuxInt = 0
+		v.Op = OpCopy
+	}
+}
+
 // elimUnreadAutos deletes stores (and associated bookkeeping ops VarDef and VarKill)
 // to autos that are never read from.
 func elimUnreadAutos(f *Func) {
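
The analysis in `elimDeadAutosGeneric` can be hard to follow inside the diff, so here is a much-reduced model of the same idea, written as a standalone program. Everything in it is hypothetical: the `value` type, the five toy ops, and `deadAutoStores` are not compiler APIs, autos are identified by plain strings, and ops such as VarDef, VarKill, VarLive, Move, Zero and Phi are left out. It keeps the three maps (`addr`, `elim`, `used`), the fixed-point loop capped at four sweeps, and the rule that a store is removable only if its auto is never marked used.

```go
package main

import "fmt"

type op int

const (
	opConst  op = iota // some non-pointer value
	opAddr             // address of the auto named by aux
	opOffPtr           // pointer arithmetic on args[0]
	opStore            // store args[1] through pointer args[0]
	opLoad             // load through pointer args[0]
)

// value is a hypothetical, heavily simplified SSA value.
type value struct {
	op   op
	aux  string
	args []*value
}

// deadAutoStores returns the stores whose target auto is never read and
// whose address never escapes, or nil if no fixed point is reached.
func deadAutoStores(vals []*value) []*value {
	addr := map[*value]string{} // values that carry the address of an auto
	elim := map[*value]string{} // stores removable if their auto is unused
	used := map[string]bool{}   // autos that must be kept

	markUsed := func(a *value) bool {
		if n, ok := addr[a]; ok && !used[n] {
			used[n] = true
			return true
		}
		return false
	}
	visit := func(v *value) (changed bool) {
		switch v.op {
		case opAddr:
			if _, ok := addr[v]; !ok {
				addr[v], changed = v.aux, true
			}
		case opOffPtr:
			if n, ok := addr[v.args[0]]; ok {
				if _, seen := addr[v]; !seen {
					addr[v], changed = n, true
				}
			}
		case opStore:
			if n, ok := addr[v.args[0]]; ok {
				if _, seen := elim[v]; !seen {
					elim[v], changed = n, true
				}
			}
			// Storing a pointer to an auto lets it escape.
			changed = markUsed(v.args[1]) || changed
		case opLoad:
			changed = markUsed(v.args[0])
		}
		return
	}

	converged := false
	for i := 0; i < 4 && !converged; i++ { // give up after four sweeps
		changed := false
		for _, v := range vals {
			changed = visit(v) || changed
		}
		converged = !changed
	}
	if !converged {
		return nil
	}
	var dead []*value
	for _, v := range vals {
		if n, ok := elim[v]; ok && !used[n] {
			dead = append(dead, v)
		}
	}
	return dead
}

func main() {
	c := &value{op: opConst}
	ax := &value{op: opAddr, aux: "x"}
	px := &value{op: opOffPtr, args: []*value{ax}}
	s1 := &value{op: opStore, args: []*value{px, c}} // x is only written
	ay := &value{op: opAddr, aux: "y"}
	s2 := &value{op: opStore, args: []*value{ay, c}}
	ld := &value{op: opLoad, args: []*value{ay}} // y is read, so s2 stays
	dead := deadAutoStores([]*value{c, ax, px, s1, ay, s2, ld})
	fmt.Println(len(dead), dead[0] == s1) // prints 1 true
}
```

The real pass rewrites each removable value into an `OpCopy` of its memory argument instead of returning a list; the list is used here only to keep the sketch easy to check.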
