cmd/6g: range loops are slower than handmade loops #3909

remyoudompheng · 2012-08-05T07:39:53Z

1. What is a short input program that triggers the error?

package p

import (
    "testing"
)

var slice = make([]byte, 256)

func loop() (t byte) {
    for i := 0; i < len(slice); i++ {
        t ^= slice[i]
    }
    return
}

func rangeloop() (t byte) {
    for _, b := range slice {
        t ^= b
    }
    return
}

func BenchmarkLoop(b *testing.B) {
    for i := 0; i < b.N; i++ {
        loop()
    }
}

func BenchmarkRange(b *testing.B) {
    for i := 0; i < b.N; i++ {
        rangeloop()
    }
}


2. What is the full compiler output?

GOARCH=386:
BenchmarkLoop    1000000          1239 ns/op
BenchmarkRange   5000000           695 ns/op

The generated range code is nearly optimal:

--- prog list "rangeloop" ---
[...]
0032 (loop_test.go:17) LEAL    autotmp_0005+-12(SP),BX
0033 (loop_test.go:17) MOVL    (BX),CX
0034 (loop_test.go:17) JMP     ,36
0035 (loop_test.go:17) INCL    ,AX
0036 (loop_test.go:17) CMPL    AX,SI
0037 (loop_test.go:17) JGE     $0,42
0038 (loop_test.go:17) MOVBLZX (CX),BP
0039 (loop_test.go:17) INCL    ,CX
0040 (loop_test.go:18) XORB    BP,t+0(FP)
0041 (loop_test.go:17) JMP     ,35
0042 (loop_test.go:20) RET     ,

GOARCH=amd64:
BenchmarkLoop    5000000           341 ns/op
BenchmarkRange   5000000           533 ns/op

The generate code for the loop body is quite shocking:

--- prog list "rangeloop" ---
[...]
0031 (loop_test.go:17) LEAQ    autotmp_0000+-16(SP),BX
0032 (loop_test.go:17) MOVQ    (BX),CX
0033 (loop_test.go:17) JMP     ,35
0034 (loop_test.go:17) INCL    ,AX
0035 (loop_test.go:17) MOVL    DI,BP
0036 (loop_test.go:17) CMPL    AX,DI
0037 (loop_test.go:17) JGE     ,49
0038 (loop_test.go:17) MOVQ    CX,BX
0039 (loop_test.go:17) MOVB    (CX),BP
0040 (loop_test.go:17) MOVB    BP,SI
0041 (loop_test.go:17) MOVB    SI,BX
0042 (loop_test.go:17) MOVB    BX,DX // 4 mov to do MOVB (CX), DX ?
0043 (loop_test.go:17) INCQ    ,CX
0044 (loop_test.go:17) MOVQ    CX,BX // BX is clobbered below
0045 (loop_test.go:18) MOVB    DX,BX
0046 (loop_test.go:18) XORB    BX,R8
0047 (loop_test.go:18) MOVB    R8,t+0(FP)
0048 (loop_test.go:17) JMP     ,34
0049 (loop_test.go:20) RET     ,


3. What version of the compiler are you using?  (Run it with the -V flag.)

$ /usr/bin/go tool 6g -V
6g version go1.0.2

rsc · 2012-08-05T16:56:52Z

Comment 1:

Don't spend too much time on this. It's just a giant waste. No one
understands what will be fast or not.

robpike · 2012-08-05T18:02:43Z

Comment 2:

The title of this issue contradicts the content. What is the problem exactly?

remyoudompheng · 2012-08-05T18:19:22Z

Comment 3:

The issue is that BenchmarkRange is abnormally slower than BenchmarkLoop on amd64. The
generated code shows unoptimized instructions that do not appear on 386.

robpike · 2012-08-07T01:01:29Z

Comment 4:

Oh I see, there are two sets of benchmark numbers. Missed that the first time.
Those regular loops look great! :)

remyoudompheng · 2012-09-01T14:41:00Z

Comment 5:

This issue was closed by revision 8f3c205.

Status changed to Fixed.

remyoudompheng added fixed Performance labels Sep 1, 2012

golang locked and limited conversation to collaborators Jun 24, 2016

gopherbot added the FrozenDueToAge label Jun 24, 2016

This issue was closed.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

cmd/6g: range loops are slower than handmade loops #3909

cmd/6g: range loops are slower than handmade loops #3909

remyoudompheng commented Aug 5, 2012

rsc commented Aug 5, 2012

robpike commented Aug 5, 2012

remyoudompheng commented Aug 5, 2012

robpike commented Aug 7, 2012

remyoudompheng commented Sep 1, 2012

cmd/6g: range loops are slower than handmade loops #3909

cmd/6g: range loops are slower than handmade loops #3909

Comments

remyoudompheng commented Aug 5, 2012

rsc commented Aug 5, 2012

robpike commented Aug 5, 2012

remyoudompheng commented Aug 5, 2012

robpike commented Aug 7, 2012

remyoudompheng commented Sep 1, 2012