Skip to content

arm64: Assembler silently changes 16B to 8B vectors in the VCNT instruction #39445

Closed
@barakmich

Description

@barakmich

What version of Go are you using (go version)?

$ go version
go version go1.14.3 linux/arm64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
GO111MODULE=""
GOARCH="arm64"
GOBIN=""
GOCACHE="/home/barak/.cache/go-build"
GOENV="/home/barak/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="arm64"
GOHOSTOS="linux"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/barak/src/popcount"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/lib/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/lib/go/pkg/tool/linux_arm64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build456776336=/tmp/go-build -gno-record-gcc-switches"

What did you do?

I was working with direct Arm64 assembly instructions and discovered the following minimal reproduction:

main.go:

// +build arm64,!gccgo

package main

import "fmt"

func CountBytes(s []byte) uint64 {
	if len(s) != 16 {
		return 0
	}
	return countSixteenBytes(&s[0], uint64(len(s)))
}

// This function is implemented in sixteen_bytes_arm64.s
//go:noescape
func countSixteenBytes(src *byte, len uint64) (ret uint64)

func main() {
	bytes := []byte{
		0x03, 0x03, 0x03, 0x03,
		0x03, 0x03, 0x03, 0x03,
		0x03, 0x03, 0x03, 0x03,
		0x03, 0x03, 0x03, 0x03,
	}
	x := CountBytes(bytes)
	fmt.Println(x) // SHOULD PRINT 32, PRINTS 16
}

sixteen_bytes_arm64.s

// +build arm64,!gccgo,!appengine

#include "textflag.h"

TEXT ·countSixteenBytes(SB), NOSPLIT, $0
	MOVD src+0(FP), R1
	MOVD len+8(FP), R2

	VLD1.P 16(R1), [V16.B16]
	VCNT   V16.B16, V7.B16
	WORD $0x6E3038E8            // UADDLV V7.B16, V8.H
	VMOV V8.H[0], R0
	MOVD R0, ret+16(FP)
	RET

The line VCNT V16.B16, V7.B16 is the troubling one. The assembler accepts it, as it is a valid instruction.

What did you expect to see?

I expected the CountBytes function to return 32 (as indeed, the population count of the array is 32)

Instead, it returns 16.

What did you see instead?

Stepping through the debugger, I saw something very odd in the disassembly.

objdump -d repro

000000000009a970 <main.countSixteenBytes>:
   9a970:       f94007e1        ldr     x1, [sp, #8]
   9a974:       f9400be2        ldr     x2, [sp, #16]
   9a978:       4cdf7030        ld1     {v16.16b}, [x1], #16
   9a97c:       0e205a07        cnt     v7.8b, v16.8b
   9a980:       6e3038e8        uaddlv  h8, v7.16b
   9a984:       0e023d00        umov    w0, v8.h[0]
   9a988:       f9000fe0        str     x0, [sp, #24]
   9a98c:       d65f03c0        ret

This is the silent error. I asked for the equivalent of

     4e205a07        cnt     v7.16b, v16.16b

Which is the extended, arm64, 16-byte vector version of the instruction, and instead got the 8-byte version when assembled.
(On page 167 of this ARM64 guide, the Q flag bit is not set in the first instruction byte, when it should be: https://static.docs.arm.com/ddi0596/a/DDI_0596_ARM_a64_instruction_set_architecture.pdf)

Metadata

Metadata

Assignees

No one assigned

    Labels

    FrozenDueToAgeNeedsFixThe path to resolution is known, but the work has not been done.arch-arm64

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions