Skip to content

Commit ab94ede

Browse files
committed
cmd/compile: fix long RMW bit operations on AMD64
Under certain circumstances, the existing rules for bit operations can produce code that writes beyond its intended bounds. For example, consider the following code: func repro(b []byte, addr, bit int32) { _ = b[3] v := uint32(b[0]) | uint32(b[1])<<8 | uint32(b[2])<<16 | uint32(b[3])<<24 | 1<<(bit&31) b[0] = byte(v) b[1] = byte(v >> 8) b[2] = byte(v >> 16) b[3] = byte(v >> 24) } Roughly speaking: 1. The expression `1 << (bit & 31)` is rewritten into `(SHLL 1 bit)` 2. The expression `uint32(b[0]) | uint32(b[1])<<8 | uint32(b[2])<<16 | uint32(b[3])<<24` is rewritten into `(MOVLload &b[0])` 3. The statements `b[0] = byte(v) ... b[3] = byte(v >> 24)` are rewritten into `(MOVLstore &b[0], v)` 4. `(ORL (SHLL 1, bit) (MOVLload &b[0]))` is rewritten into `(BTSL (MOVLload &b[0]) bit)`. This is a valid transformation because the destination is a register: in this case, the bit offset is masked by the number of bits in the destination register. This is identical to the masking performed by `SHL`. 5. `(MOVLstore &b[0] (BTSL (MOVLload &b[0]) bit))` is rewritten into `(BTSLmodify &b[0] bit)`. This is an invalid transformation because the destination is memory: in this case, the bit offset is not masked, and the chosen instruction may write outside its intended 32-bit location. These changes fix the invalid rewrite performed in step (5) by explicitly maksing the bit offset operand to `BT(S|R|C)(L|Q)modify`. In the example above, the adjusted rules produce `(BTSLmodify &b[0] (ANDLconst [31] bit))` in step (5). These changes also add several new rules to rewrite bit sets, toggles, and clears that are rooted at `(OR|XOR|AND)(L|Q)modify` operators into appropriate `BT(S|R|C)(L|Q)modify` operators. These rules catch cases where `MOV(L|Q)store ((OR|XOR|AND)(L|Q) ...)` is rewritten to `(OR|XOR|AND)(L|Q)modify` before the `(OR|XOR|AND)(L|Q) ...` can be rewritten to `BT(S|R|C)(L|Q) ...`. Overall, compilecmp reports small improvements in code size on darwin/amd64 when the changes to the compiler itself are exlcuded: file before after Δ % runtime.s 536464 536412 -52 -0.010% bytes.s 32629 32593 -36 -0.110% strings.s 44565 44529 -36 -0.081% os/signal.s 7967 7959 -8 -0.100% cmd/vendor/golang.org/x/sys/unix.s 81686 81678 -8 -0.010% math/big.s 188235 188253 +18 +0.010% cmd/link/internal/loader.s 89295 89056 -239 -0.268% cmd/link/internal/ld.s 633551 633232 -319 -0.050% cmd/link/internal/arm.s 18934 18928 -6 -0.032% cmd/link/internal/arm64.s 31814 31801 -13 -0.041% cmd/link/internal/riscv64.s 7347 7345 -2 -0.027% cmd/compile/internal/ssa.s 4029173 4033066 +3893 +0.097% total 21298280 21301472 +3192 +0.015%
1 parent 98a9023 commit ab94ede

File tree

4 files changed

+244
-24
lines changed

4 files changed

+244
-24
lines changed

src/cmd/compile/internal/ssa/gen/AMD64.rules

Lines changed: 20 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -624,6 +624,14 @@
624624
// Recognize bit setting (a |= 1<<b) and toggling (a ^= 1<<b)
625625
(OR(Q|L) (SHL(Q|L) (MOV(Q|L)const [1]) y) x) => (BTS(Q|L) x y)
626626
(XOR(Q|L) (SHL(Q|L) (MOV(Q|L)const [1]) y) x) => (BTC(Q|L) x y)
627+
(ORLmodify [off] {sym} ptr s:(SHLL (MOVLconst [1]) <t> x) mem) =>
628+
(BTSLmodify [off] {sym} ptr (ANDLconst <t> [31] x) mem)
629+
(ORQmodify [off] {sym} ptr s:(SHLQ (MOVQconst [1]) <t> x) mem) =>
630+
(BTSQmodify [off] {sym} ptr (ANDQconst <t> [63] x) mem)
631+
(XORLmodify [off] {sym} ptr s:(SHLL (MOVLconst [1]) <t> x) mem) =>
632+
(BTCLmodify [off] {sym} ptr (ANDLconst <t> [31] x) mem)
633+
(XORQmodify [off] {sym} ptr s:(SHLQ (MOVQconst [1]) <t> x) mem) =>
634+
(BTCQmodify [off] {sym} ptr (ANDQconst <t> [63] x) mem)
627635

628636
// Convert ORconst into BTS, if the code gets smaller, with boundary being
629637
// (ORL $40,AX is 3 bytes, ORL $80,AX is 6 bytes).
@@ -646,6 +654,10 @@
646654
=> (BTRQconst [int8(log64(^c))] x)
647655
(ANDL (MOVLconst [c]) x) && isUint32PowerOfTwo(int64(^c)) && uint64(^c) >= 128
648656
=> (BTRLconst [int8(log32(^c))] x)
657+
(ANDLmodify [off] {sym} ptr (NOTL s:(SHLL (MOVLconst [1]) <t> x)) mem) =>
658+
(BTRLmodify [off] {sym} ptr (ANDLconst <t> [31] x) mem)
659+
(ANDQmodify [off] {sym} ptr (NOTQ s:(SHLQ (MOVQconst [1]) <t> x)) mem) =>
660+
(BTRQmodify [off] {sym} ptr (ANDQconst <t> [63] x) mem)
649661

650662
// Special-case bit patterns on first/last bit.
651663
// generic.rules changes ANDs of high-part/low-part masks into a couple of shifts,
@@ -2064,11 +2076,15 @@
20642076
((ADD|SUB|MUL|DIV)SD x l:(MOVSDload [off] {sym} ptr mem)) && canMergeLoadClobber(v, l, x) && clobber(l) => ((ADD|SUB|MUL|DIV)SDload x [off] {sym} ptr mem)
20652077
((ADD|SUB|MUL|DIV)SS x l:(MOVSSload [off] {sym} ptr mem)) && canMergeLoadClobber(v, l, x) && clobber(l) => ((ADD|SUB|MUL|DIV)SSload x [off] {sym} ptr mem)
20662078
(MOVLstore {sym} [off] ptr y:((ADD|AND|OR|XOR)Lload x [off] {sym} ptr mem) mem) && y.Uses==1 && clobber(y) => ((ADD|AND|OR|XOR)Lmodify [off] {sym} ptr x mem)
2067-
(MOVLstore {sym} [off] ptr y:((ADD|SUB|AND|OR|XOR|BTC|BTR|BTS)L l:(MOVLload [off] {sym} ptr mem) x) mem) && y.Uses==1 && l.Uses==1 && clobber(y, l) =>
2068-
((ADD|SUB|AND|OR|XOR|BTC|BTR|BTS)Lmodify [off] {sym} ptr x mem)
2079+
(MOVLstore {sym} [off] ptr y:((ADD|SUB|AND|OR|XOR)L l:(MOVLload [off] {sym} ptr mem) x) mem) && y.Uses==1 && l.Uses==1 && clobber(y, l) =>
2080+
((ADD|SUB|AND|OR|XOR)Lmodify [off] {sym} ptr x mem)
2081+
(MOVLstore {sym} [off] ptr y:((BTC|BTR|BTS)L l:(MOVLload [off] {sym} ptr mem) <t> x) mem) && y.Uses==1 && l.Uses==1 && clobber(y, l) =>
2082+
((BTC|BTR|BTS)Lmodify [off] {sym} ptr (ANDLconst <t> [31] x) mem)
20692083
(MOVQstore {sym} [off] ptr y:((ADD|AND|OR|XOR)Qload x [off] {sym} ptr mem) mem) && y.Uses==1 && clobber(y) => ((ADD|AND|OR|XOR)Qmodify [off] {sym} ptr x mem)
2070-
(MOVQstore {sym} [off] ptr y:((ADD|SUB|AND|OR|XOR|BTC|BTR|BTS)Q l:(MOVQload [off] {sym} ptr mem) x) mem) && y.Uses==1 && l.Uses==1 && clobber(y, l) =>
2071-
((ADD|SUB|AND|OR|XOR|BTC|BTR|BTS)Qmodify [off] {sym} ptr x mem)
2084+
(MOVQstore {sym} [off] ptr y:((ADD|SUB|AND|OR|XOR)Q l:(MOVQload [off] {sym} ptr mem) x) mem) && y.Uses==1 && l.Uses==1 && clobber(y, l) =>
2085+
((ADD|SUB|AND|OR|XOR)Qmodify [off] {sym} ptr x mem)
2086+
(MOVQstore {sym} [off] ptr y:((BTC|BTR|BTS)Q l:(MOVQload [off] {sym} ptr mem) <t> x) mem) && y.Uses==1 && l.Uses==1 && clobber(y, l) =>
2087+
((BTC|BTR|BTS)Qmodify [off] {sym} ptr (ANDQconst <t> [63] x) mem)
20722088

20732089
// Merge ADDQconst and LEAQ into atomic loads.
20742090
(MOV(Q|L|B)atomicload [off1] {sym} (ADDQconst [off2] ptr) mem) && is32Bit(int64(off1)+int64(off2)) =>

src/cmd/compile/internal/ssa/gen/AMD64Ops.go

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -363,6 +363,11 @@ func init() {
363363
{name: "BTSQconst", argLength: 1, reg: gp11, asm: "BTSQ", resultInArg0: true, clobberFlags: true, aux: "Int8"}, // set bit auxint in arg0, 0 <= auxint < 64
364364

365365
// direct bit operation on memory operand
366+
//
367+
// Note that these operations do not mask the bit offset (arg1), and will write beyond their expected
368+
// bounds if that argument is larger than 64/32 (for BT*Q and BT*L, respectively). If the compiler
369+
// cannot prove that arg1 is in range, it must be explicitly masked (see e.g. the patterns that produce
370+
// BT*modify from (MOVstore (BT* (MOVLload ptr mem) x) mem)).
366371
{name: "BTCQmodify", argLength: 3, reg: gpstore, asm: "BTCQ", aux: "SymOff", typ: "Mem", clobberFlags: true, faultOnNilArg0: true, symEffect: "Read,Write"}, // complement bit arg1 in 64-bit arg0+auxint+aux, arg2=mem
367372
{name: "BTCLmodify", argLength: 3, reg: gpstore, asm: "BTCL", aux: "SymOff", typ: "Mem", clobberFlags: true, faultOnNilArg0: true, symEffect: "Read,Write"}, // complement bit arg1 in 32-bit arg0+auxint+aux, arg2=mem
368373
{name: "BTSQmodify", argLength: 3, reg: gpstore, asm: "BTSQ", aux: "SymOff", typ: "Mem", clobberFlags: true, faultOnNilArg0: true, symEffect: "Read,Write"}, // set bit arg1 in 64-bit arg0+auxint+aux, arg2=mem

0 commit comments

Comments
 (0)