Commit 1850874

Wei Xiao authored and cherrymui committed
reflect: optimize CALLFN wrapper for arm64

Optimize arm64 CALLFN wrapper with LDP/STP instructions.
This provides a significant speedup for big argument copy.

Benchmark results for reflect:

name                       old time/op    new time/op    delta
Call-8                     79.0ns ± 4%    73.6ns ± 4%     -6.78%  (p=0.000 n=10+10)
CallArgCopy/size=128-8     80.5ns ± 0%    60.3ns ± 0%    -25.06%  (p=0.000 n=10+9)
CallArgCopy/size=256-8      119ns ± 2%      67ns ± 1%    -43.59%  (p=0.000 n=8+10)
CallArgCopy/size=1024-8     524ns ± 1%      99ns ± 1%    -81.03%  (p=0.000 n=10+10)
CallArgCopy/size=4096-8     837ns ± 0%     231ns ± 1%    -72.42%  (p=0.000 n=9+9)
CallArgCopy/size=65536-8   13.6µs ± 6%     3.1µs ± 1%    -77.38%  (p=0.000 n=10+10)
PtrTo-8                    12.9ns ± 0%    13.1ns ± 3%     +1.86%  (p=0.000 n=10+10)
FieldByName1-8             28.7ns ± 2%    28.6ns ± 2%       ~     (p=0.408 n=9+10)
FieldByName2-8              928ns ± 4%     946ns ± 8%       ~     (p=0.326 n=9+10)
FieldByName3-8             5.35µs ± 5%    5.32µs ± 5%       ~     (p=0.755 n=10+10)
InterfaceBig-8             2.57ns ± 0%    2.57ns ± 0%       ~     (all equal)
InterfaceSmall-8           2.57ns ± 0%    2.57ns ± 0%       ~     (all equal)
New-8                      9.09ns ± 1%    8.83ns ± 1%     -2.81%  (p=0.000 n=10+9)

name                       old alloc/op   new alloc/op   delta
Call-8                      0.00B          0.00B            ~     (all equal)

name                       old allocs/op  new allocs/op  delta
Call-8                       0.00           0.00            ~     (all equal)

name                       old speed      new speed      delta
CallArgCopy/size=128-8     1.59GB/s ± 0%   2.12GB/s ± 1%   +33.46%  (p=0.000 n=10+9)
CallArgCopy/size=256-8     2.14GB/s ± 2%   3.81GB/s ± 1%   +78.02%  (p=0.000 n=8+10)
CallArgCopy/size=1024-8    1.95GB/s ± 1%  10.30GB/s ± 0%  +427.99%  (p=0.000 n=10+9)
CallArgCopy/size=4096-8    4.89GB/s ± 0%  17.69GB/s ± 1%  +261.87%  (p=0.000 n=9+9)
CallArgCopy/size=65536-8   4.84GB/s ± 6%  21.36GB/s ± 1%  +341.67%  (p=0.000 n=10+10)

Change-Id: I775d88b30c43cb2eda1d0612ac15e6d283e70beb
Reviewed-on: https://go-review.googlesource.com/70570
Reviewed-by: Cherry Zhang <[email protected]>
Run-TryBot: Cherry Zhang <[email protected]>
TryBot-Result: Gobot Gobot <[email protected]>
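
For context on what the CallArgCopy rows exercise: a reflective call that passes a large value by argument has to copy that value into the callee's argument area, which is the copy the CALLFN wrapper performs. The following is a minimal hypothetical sketch of such a call, not the benchmark from the reflect package; the names `sum` and `callSum` and the 1024-byte array size are assumptions chosen for illustration.

```go
package main

import (
	"fmt"
	"reflect"
)

// sum takes its array by value, so every call has to copy all 1024 bytes
// of the argument. (Hypothetical target function for illustration.)
func sum(a [1024]byte) int {
	s := 0
	for _, b := range a {
		s += int(b)
	}
	return s
}

// callSum invokes sum through reflect.Value.Call, the path whose
// argument copy this commit speeds up on arm64.
func callSum(a [1024]byte) int {
	f := reflect.ValueOf(sum)
	out := f.Call([]reflect.Value{reflect.ValueOf(a)})
	return int(out[0].Int())
}

func main() {
	var a [1024]byte
	for i := range a {
		a[i] = 1
	}
	fmt.Println(callSum(a)) // prints 1024
}
```

Timing `callSum` in a loop for several array sizes would give numbers of the same shape as the table, though not necessarily the same values.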
1 parent 378de1a commit 1850874

1 file changed: +20 -10 lines changed

src/runtime/asm_arm64.s

Lines changed: 20 additions & 10 deletions
@@ -368,16 +368,26 @@ TEXT NAME(SB), WRAPPER, $MAXSIZE-24; \
 	NO_LOCAL_POINTERS; \
 	/* copy arguments to stack */ \
 	MOVD arg+16(FP), R3; \
-	MOVWU argsize+24(FP), R4; \
-	MOVD RSP, R5; \
-	ADD $(8-1), R5; \
-	SUB $1, R3; \
-	ADD R5, R4; \
-	CMP R5, R4; \
-	BEQ 4(PC); \
-	MOVBU.W 1(R3), R6; \
-	MOVBU.W R6, 1(R5); \
-	B -4(PC); \
+	MOVWU argsize+24(FP), R4; \
+	ADD $8, RSP, R5; \
+	BIC $0xf, R4, R6; \
+	CBZ R6, 6(PC); \
+	/* if R6=(argsize&~15) != 0 */ \
+	ADD R6, R5, R6; \
+	/* copy 16 bytes a time */ \
+	LDP.P 16(R3), (R7, R8); \
+	STP.P (R7, R8), 16(R5); \
+	CMP R5, R6; \
+	BNE -3(PC); \
+	AND $0xf, R4, R6; \
+	CBZ R6, 6(PC); \
+	/* if R6=(argsize&15) != 0 */ \
+	ADD R6, R5, R6; \
+	/* copy 1 byte a time for the rest */ \
+	MOVBU.P 1(R3), R7; \
+	MOVBU.P R7, 1(R5); \
+	CMP R5, R6; \
+	BNE -3(PC); \
 	/* call function */ \
 	MOVD f+8(FP), R26; \
 	MOVD (R26), R0; \
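
The control flow of the new copy can be modeled in plain Go. This is only an illustrative sketch of the two phases in the added lines (a 16-byte bulk loop followed by a byte loop for the remainder), not code from the runtime. The function name `copyArgs`, the slice parameters standing in for the R3/R5 pointers, and the returned iteration counts are assumptions introduced here.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// copyArgs models the new CALLFN argument copy. src stands in for the
// argument buffer (R3) and dst for the stack area at RSP+8 (R5); both
// must be at least argsize bytes long. It returns how many 16-byte
// pairs and how many single bytes were copied.
func copyArgs(dst, src []byte, argsize uint32) (pairs, tail int) {
	n := int(argsize)
	bulk := n &^ 15 // BIC $0xf, R4, R6: argsize rounded down to a multiple of 16
	i := 0
	// Bulk phase: two 64-bit words per iteration, like LDP.P/STP.P.
	// Skipped entirely when bulk == 0 (the first CBZ).
	for i < bulk {
		lo := binary.LittleEndian.Uint64(src[i:])
		hi := binary.LittleEndian.Uint64(src[i+8:])
		binary.LittleEndian.PutUint64(dst[i:], lo)
		binary.LittleEndian.PutUint64(dst[i+8:], hi)
		i += 16
		pairs++
	}
	// Tail phase: the remaining argsize&15 bytes, one at a time
	// (MOVBU.P). Skipped when the remainder is zero (the second CBZ).
	for i < n {
		dst[i] = src[i]
		i++
		tail++
	}
	return pairs, tail
}

func main() {
	src := make([]byte, 37)
	for i := range src {
		src[i] = byte(i)
	}
	dst := make([]byte, len(src))
	pairs, tail := copyArgs(dst, src, uint32(len(src)))
	fmt.Println(pairs, tail, string(dst) == string(src)) // prints: 2 5 true
}
```

In this model a 1024-byte argument takes 64 bulk iterations and no tail iterations, where the removed byte-at-a-time loop would take 1024.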