Skip to content

cmd/6g: range loops are slower than handmade loops #3909

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
remyoudompheng opened this issue Aug 5, 2012 · 5 comments
Closed

cmd/6g: range loops are slower than handmade loops #3909

remyoudompheng opened this issue Aug 5, 2012 · 5 comments

Comments

@remyoudompheng
Copy link
Contributor

1. What is a short input program that triggers the error?

package p

import (
    "testing"
)

var slice = make([]byte, 256)

func loop() (t byte) {
    for i := 0; i < len(slice); i++ {
        t ^= slice[i]
    }
    return
}

func rangeloop() (t byte) {
    for _, b := range slice {
        t ^= b
    }
    return
}

func BenchmarkLoop(b *testing.B) {
    for i := 0; i < b.N; i++ {
        loop()
    }
}

func BenchmarkRange(b *testing.B) {
    for i := 0; i < b.N; i++ {
        rangeloop()
    }
}


2. What is the full compiler output?

GOARCH=386:
BenchmarkLoop    1000000          1239 ns/op
BenchmarkRange   5000000           695 ns/op

The generated range code is nearly optimal:

--- prog list "rangeloop" ---
[...]
0032 (loop_test.go:17) LEAL    autotmp_0005+-12(SP),BX
0033 (loop_test.go:17) MOVL    (BX),CX
0034 (loop_test.go:17) JMP     ,36
0035 (loop_test.go:17) INCL    ,AX
0036 (loop_test.go:17) CMPL    AX,SI
0037 (loop_test.go:17) JGE     $0,42
0038 (loop_test.go:17) MOVBLZX (CX),BP
0039 (loop_test.go:17) INCL    ,CX
0040 (loop_test.go:18) XORB    BP,t+0(FP)
0041 (loop_test.go:17) JMP     ,35
0042 (loop_test.go:20) RET     ,

GOARCH=amd64:
BenchmarkLoop    5000000           341 ns/op
BenchmarkRange   5000000           533 ns/op

The generate code for the loop body is quite shocking:

--- prog list "rangeloop" ---
[...]
0031 (loop_test.go:17) LEAQ    autotmp_0000+-16(SP),BX
0032 (loop_test.go:17) MOVQ    (BX),CX
0033 (loop_test.go:17) JMP     ,35
0034 (loop_test.go:17) INCL    ,AX
0035 (loop_test.go:17) MOVL    DI,BP
0036 (loop_test.go:17) CMPL    AX,DI
0037 (loop_test.go:17) JGE     ,49
0038 (loop_test.go:17) MOVQ    CX,BX
0039 (loop_test.go:17) MOVB    (CX),BP
0040 (loop_test.go:17) MOVB    BP,SI
0041 (loop_test.go:17) MOVB    SI,BX
0042 (loop_test.go:17) MOVB    BX,DX // 4 mov to do MOVB (CX), DX ?
0043 (loop_test.go:17) INCQ    ,CX
0044 (loop_test.go:17) MOVQ    CX,BX // BX is clobbered below
0045 (loop_test.go:18) MOVB    DX,BX
0046 (loop_test.go:18) XORB    BX,R8
0047 (loop_test.go:18) MOVB    R8,t+0(FP)
0048 (loop_test.go:17) JMP     ,34
0049 (loop_test.go:20) RET     ,


3. What version of the compiler are you using?  (Run it with the -V flag.)

$ /usr/bin/go tool 6g -V
6g version go1.0.2
@rsc
Copy link
Contributor

rsc commented Aug 5, 2012

Comment 1:

Don't spend too much time on this. It's just a giant waste. No one
understands what will be fast or not.

@robpike
Copy link
Contributor

robpike commented Aug 5, 2012

Comment 2:

The title of this issue contradicts the content. What is the problem exactly?

@remyoudompheng
Copy link
Contributor Author

Comment 3:

The issue is that BenchmarkRange is abnormally slower than BenchmarkLoop on amd64. The
generated code shows unoptimized instructions that do not appear on 386.

@robpike
Copy link
Contributor

robpike commented Aug 7, 2012

Comment 4:

Oh I see, there are two sets of benchmark numbers. Missed that the first time.
Those regular loops look great! :)

@remyoudompheng
Copy link
Contributor Author

Comment 5:

This issue was closed by revision 8f3c205.

Status changed to Fixed.

@golang golang locked and limited conversation to collaborators Jun 24, 2016
This issue was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

4 participants