Description
What version of Go are you using (go version
)?
$ go version go version go1.14.3 linux/arm64
Does this issue reproduce with the latest release?
Yes
What operating system and processor architecture are you using (go env
)?
go env
Output
GO111MODULE="" GOARCH="arm64" GOBIN="" GOCACHE="/home/barak/.cache/go-build" GOENV="/home/barak/.config/go/env" GOEXE="" GOFLAGS="" GOHOSTARCH="arm64" GOHOSTOS="linux" GOINSECURE="" GONOPROXY="" GONOSUMDB="" GOOS="linux" GOPATH="/home/barak/src/popcount" GOPRIVATE="" GOPROXY="https://proxy.golang.org,direct" GOROOT="/usr/lib/go" GOSUMDB="sum.golang.org" GOTMPDIR="" GOTOOLDIR="/usr/lib/go/pkg/tool/linux_arm64" GCCGO="gccgo" AR="ar" CC="gcc" CXX="g++" CGO_ENABLED="1" GOMOD="" CGO_CFLAGS="-g -O2" CGO_CPPFLAGS="" CGO_CXXFLAGS="-g -O2" CGO_FFLAGS="-g -O2" CGO_LDFLAGS="-g -O2" PKG_CONFIG="pkg-config" GOGCCFLAGS="-fPIC -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build456776336=/tmp/go-build -gno-record-gcc-switches"
What did you do?
I was working with direct Arm64 assembly instructions and discovered the following minimal reproduction:
main.go:
// +build arm64,!gccgo
package main
import "fmt"
func CountBytes(s []byte) uint64 {
if len(s) != 16 {
return 0
}
return countSixteenBytes(&s[0], uint64(len(s)))
}
// This function is implemented in sixteen_bytes_arm64.s
//go:noescape
func countSixteenBytes(src *byte, len uint64) (ret uint64)
func main() {
bytes := []byte{
0x03, 0x03, 0x03, 0x03,
0x03, 0x03, 0x03, 0x03,
0x03, 0x03, 0x03, 0x03,
0x03, 0x03, 0x03, 0x03,
}
x := CountBytes(bytes)
fmt.Println(x) // SHOULD PRINT 32, PRINTS 16
}
sixteen_bytes_arm64.s
// +build arm64,!gccgo,!appengine
#include "textflag.h"
TEXT ·countSixteenBytes(SB), NOSPLIT, $0
MOVD src+0(FP), R1
MOVD len+8(FP), R2
VLD1.P 16(R1), [V16.B16]
VCNT V16.B16, V7.B16
WORD $0x6E3038E8 // UADDLV V7.B16, V8.H
VMOV V8.H[0], R0
MOVD R0, ret+16(FP)
RET
The line VCNT V16.B16, V7.B16
is the troubling one. The assembler accepts it, as it is a valid instruction.
What did you expect to see?
I expected the CountBytes function to return 32 (as indeed, the population count of the array is 32)
Instead, it returns 16.
What did you see instead?
Stepping through the debugger, I saw something very odd in the disassembly.
objdump -d repro
000000000009a970 <main.countSixteenBytes>:
9a970: f94007e1 ldr x1, [sp, #8]
9a974: f9400be2 ldr x2, [sp, #16]
9a978: 4cdf7030 ld1 {v16.16b}, [x1], #16
9a97c: 0e205a07 cnt v7.8b, v16.8b
9a980: 6e3038e8 uaddlv h8, v7.16b
9a984: 0e023d00 umov w0, v8.h[0]
9a988: f9000fe0 str x0, [sp, #24]
9a98c: d65f03c0 ret
This is the silent error. I asked for the equivalent of
4e205a07 cnt v7.16b, v16.16b
Which is the extended, arm64, 16-byte vector version of the instruction, and instead got the 8-byte version when assembled.
(On page 167 of this ARM64 guide, the Q flag bit is not set in the first instruction byte, when it should be: https://static.docs.arm.com/ddi0596/a/DDI_0596_ARM_a64_instruction_set_architecture.pdf)