-
Notifications
You must be signed in to change notification settings - Fork 18.4k
cmd/compile: fix long RMW bit operations on AMD64 #45242
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for all the commit author(s) or Co-authors. If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google. ℹ️ Googlers: Go here for more info. |
|
@googlebot I fixed it. |
|
This PR (HEAD: 0198052) has been imported to Gerrit for code review. Please visit https://go-review.googlesource.com/c/go/+/304869 to see it. Tip: You can toggle comments from me using the |
|
Message from Go Bot: Patch Set 1: Congratulations on opening your first change. Thank you for your contribution! Next steps: Most changes in the Go project go through a few rounds of revision. This can be During May-July and Nov-Jan the Go project is in a code freeze, during which Please don’t reply on this GitHub thread. Visit golang.org/cl/304869. |
|
Message from Keith Randall: Patch Set 1: (1 comment) Please don’t reply on this GitHub thread. Visit golang.org/cl/304869. |
|
This PR (HEAD: 42e795b) has been imported to Gerrit for code review. Please visit https://go-review.googlesource.com/c/go/+/304869 to see it. Tip: You can toggle comments from me using the |
|
Message from Pat Gavlin: Patch Set 1: (2 comments) Please don’t reply on this GitHub thread. Visit golang.org/cl/304869. |
|
Message from Keith Randall: Patch Set 2: Run-TryBot+1 Code-Review+2 (1 comment) Please don’t reply on this GitHub thread. Visit golang.org/cl/304869. |
|
Message from Go Bot: Patch Set 2: TryBots beginning. Status page: https://farmer.golang.org/try?commit=b6ed9729 Please don’t reply on this GitHub thread. Visit golang.org/cl/304869. |
|
Message from Go Bot: Patch Set 2: TryBot-Result+1 TryBots are happy. Please don’t reply on this GitHub thread. Visit golang.org/cl/304869. |
|
Message from Pat Gavlin: Patch Set 2: (2 comments) Please don’t reply on this GitHub thread. Visit golang.org/cl/304869. |
|
This PR (HEAD: f12a621) has been imported to Gerrit for code review. Please visit https://go-review.googlesource.com/c/go/+/304869 to see it. Tip: You can toggle comments from me using the |
|
Message from Keith Randall: Patch Set 3: Code-Review+2 (1 comment) Please don’t reply on this GitHub thread. Visit golang.org/cl/304869. |
|
This PR (HEAD: 7e92885) has been imported to Gerrit for code review. Please visit https://go-review.googlesource.com/c/go/+/304869 to see it. Tip: You can toggle comments from me using the |
|
Message from Pat Gavlin: Patch Set 5: (1 comment) Please don’t reply on this GitHub thread. Visit golang.org/cl/304869. |
|
Message from Josh Bleecher Snyder: Patch Set 5: Trust+1 (1 comment) Please don’t reply on this GitHub thread. Visit golang.org/cl/304869. |
|
Message from Keith Randall: Patch Set 5: Code-Review+2 (1 comment) Please don’t reply on this GitHub thread. Visit golang.org/cl/304869. |
|
Message from Pat Gavlin: Patch Set 5: (1 comment) Please don’t reply on this GitHub thread. Visit golang.org/cl/304869. |
|
This PR (HEAD: ab94ede) has been imported to Gerrit for code review. Please visit https://go-review.googlesource.com/c/go/+/304869 to see it. Tip: You can toggle comments from me using the |
|
@gopherbot please open backport issues. |
|
Message from Keith Randall: Patch Set 6: Code-Review+2 (1 comment) Please don’t reply on this GitHub thread. Visit golang.org/cl/304869. |
|
Message from Pat Gavlin: Patch Set 6: (1 comment) Please don’t reply on this GitHub thread. Visit golang.org/cl/304869. |
Under certain circumstances, the existing rules for bit operations can
produce code that writes beyond its intended bounds. For example,
consider the following code:
func repro(b []byte, addr, bit int32) {
_ = b[3]
v := uint32(b[0]) | uint32(b[1])<<8 | uint32(b[2])<<16 | uint32(b[3])<<24 | 1<<(bit&31)
b[0] = byte(v)
b[1] = byte(v >> 8)
b[2] = byte(v >> 16)
b[3] = byte(v >> 24)
}
Roughly speaking:
1. The expression `1 << (bit & 31)` is rewritten into `(SHLL 1 bit)`
2. The expression `uint32(b[0]) | uint32(b[1])<<8 | uint32(b[2])<<16 |
uint32(b[3])<<24` is rewritten into `(MOVLload &b[0])`
3. The statements `b[0] = byte(v) ... b[3] = byte(v >> 24)` are
rewritten into `(MOVLstore &b[0], v)`
4. `(ORL (SHLL 1, bit) (MOVLload &b[0]))` is rewritten into
`(BTSL (MOVLload &b[0]) bit)`. This is a valid transformation because
the destination is a register: in this case, the bit offset is masked
by the number of bits in the destination register. This is identical
to the masking performed by `SHL`.
5. `(MOVLstore &b[0] (BTSL (MOVLload &b[0]) bit))` is rewritten into
`(BTSLmodify &b[0] bit)`. This is an invalid transformation because
the destination is memory: in this case, the bit offset is not
masked, and the chosen instruction may write outside its intended
32-bit location.
These changes fix the invalid rewrite performed in step (5) by
explicitly maksing the bit offset operand to `BT(S|R|C)(L|Q)modify`. In
the example above, the adjusted rules produce
`(BTSLmodify &b[0] (ANDLconst [31] bit))` in step (5).
These changes also add several new rules to rewrite bit sets, toggles,
and clears that are rooted at `(OR|XOR|AND)(L|Q)modify` operators into
appropriate `BT(S|R|C)(L|Q)modify` operators. These rules catch cases
where `MOV(L|Q)store ((OR|XOR|AND)(L|Q) ...)` is rewritten to
`(OR|XOR|AND)(L|Q)modify` before the `(OR|XOR|AND)(L|Q) ...` can be
rewritten to `BT(S|R|C)(L|Q) ...`.
Overall, compilecmp reports small improvements in code size on
darwin/amd64 when the changes to the compiler itself are exlcuded:
file before after Δ %
runtime.s 536464 536412 -52 -0.010%
bytes.s 32629 32593 -36 -0.110%
strings.s 44565 44529 -36 -0.081%
os/signal.s 7967 7959 -8 -0.100%
cmd/vendor/golang.org/x/sys/unix.s 81686 81678 -8 -0.010%
math/big.s 188235 188253 +18 +0.010%
cmd/link/internal/loader.s 89295 89056 -239 -0.268%
cmd/link/internal/ld.s 633551 633232 -319 -0.050%
cmd/link/internal/arm.s 18934 18928 -6 -0.032%
cmd/link/internal/arm64.s 31814 31801 -13 -0.041%
cmd/link/internal/riscv64.s 7347 7345 -2 -0.027%
cmd/compile/internal/ssa.s 4029173 4033066 +3893 +0.097%
total 21298280 21301472 +3192 +0.015%
|
This PR (HEAD: 9a42bd2) has been imported to Gerrit for code review. Please visit https://go-review.googlesource.com/c/go/+/304869 to see it. Tip: You can toggle comments from me using the |
|
Message from Pat Gavlin: Patch Set 7: (1 comment) Please don’t reply on this GitHub thread. Visit golang.org/cl/304869. |
|
Message from Keith Randall: Patch Set 8: Code-Review+2 (1 comment) Please don’t reply on this GitHub thread. Visit golang.org/cl/304869. |
|
Message from Pat Gavlin: Patch Set 8: (1 comment) Please don’t reply on this GitHub thread. Visit golang.org/cl/304869. |
Under certain circumstances, the existing rules for bit operations can
produce code that writes beyond its intended bounds. For example,
consider the following code:
func repro(b []byte, addr, bit int32) {
_ = b[3]
v := uint32(b[0]) | uint32(b[1])<<8 | uint32(b[2])<<16 | uint32(b[3])<<24 | 1<<(bit&31)
b[0] = byte(v)
b[1] = byte(v >> 8)
b[2] = byte(v >> 16)
b[3] = byte(v >> 24)
}
Roughly speaking:
1. The expression `1 << (bit & 31)` is rewritten into `(SHLL 1 bit)`
2. The expression `uint32(b[0]) | uint32(b[1])<<8 | uint32(b[2])<<16 |
uint32(b[3])<<24` is rewritten into `(MOVLload &b[0])`
3. The statements `b[0] = byte(v) ... b[3] = byte(v >> 24)` are
rewritten into `(MOVLstore &b[0], v)`
4. `(ORL (SHLL 1, bit) (MOVLload &b[0]))` is rewritten into
`(BTSL (MOVLload &b[0]) bit)`. This is a valid transformation because
the destination is a register: in this case, the bit offset is masked
by the number of bits in the destination register. This is identical
to the masking performed by `SHL`.
5. `(MOVLstore &b[0] (BTSL (MOVLload &b[0]) bit))` is rewritten into
`(BTSLmodify &b[0] bit)`. This is an invalid transformation because
the destination is memory: in this case, the bit offset is not
masked, and the chosen instruction may write outside its intended
32-bit location.
These changes fix the invalid rewrite performed in step (5) by
explicitly maksing the bit offset operand to `BT(S|R|C)(L|Q)modify`. In
the example above, the adjusted rules produce
`(BTSLmodify &b[0] (ANDLconst [31] bit))` in step (5).
These changes also add several new rules to rewrite bit sets, toggles,
and clears that are rooted at `(OR|XOR|AND)(L|Q)modify` operators into
appropriate `BT(S|R|C)(L|Q)modify` operators. These rules catch cases
where `MOV(L|Q)store ((OR|XOR|AND)(L|Q) ...)` is rewritten to
`(OR|XOR|AND)(L|Q)modify` before the `(OR|XOR|AND)(L|Q) ...` can be
rewritten to `BT(S|R|C)(L|Q) ...`.
Overall, compilecmp reports small improvements in code size on
darwin/amd64 when the changes to the compiler itself are exlcuded:
file before after Δ %
runtime.s 536464 536412 -52 -0.010%
bytes.s 32629 32593 -36 -0.110%
strings.s 44565 44529 -36 -0.081%
os/signal.s 7967 7959 -8 -0.100%
cmd/vendor/golang.org/x/sys/unix.s 81686 81678 -8 -0.010%
math/big.s 188235 188253 +18 +0.010%
cmd/link/internal/loader.s 89295 89056 -239 -0.268%
cmd/link/internal/ld.s 633551 633232 -319 -0.050%
cmd/link/internal/arm.s 18934 18928 -6 -0.032%
cmd/link/internal/arm64.s 31814 31801 -13 -0.041%
cmd/link/internal/riscv64.s 7347 7345 -2 -0.027%
cmd/compile/internal/ssa.s 4029173 4033066 +3893 +0.097%
total 21298280 21301472 +3192 +0.015%
Change-Id: I2e560548b515865129e1724e150e30540e9d29ce
GitHub-Last-Rev: 9a42bd2
GitHub-Pull-Request: #45242
Reviewed-on: https://go-review.googlesource.com/c/go/+/304869
Reviewed-by: Keith Randall <[email protected]>
Trust: Josh Bleecher Snyder <[email protected]>
|
This PR is being closed because golang.org/cl/304869 has been merged. |
…MD64
Under certain circumstances, the existing rules for bit operations can
produce code that writes beyond its intended bounds. For example,
consider the following code:
func repro(b []byte, addr, bit int32) {
_ = b[3]
v := uint32(b[0]) | uint32(b[1])<<8 | uint32(b[2])<<16 | uint32(b[3])<<24 | 1<<(bit&31)
b[0] = byte(v)
b[1] = byte(v >> 8)
b[2] = byte(v >> 16)
b[3] = byte(v >> 24)
}
Roughly speaking:
1. The expression `1 << (bit & 31)` is rewritten into `(SHLL 1 bit)`
2. The expression `uint32(b[0]) | uint32(b[1])<<8 | uint32(b[2])<<16 |
uint32(b[3])<<24` is rewritten into `(MOVLload &b[0])`
3. The statements `b[0] = byte(v) ... b[3] = byte(v >> 24)` are
rewritten into `(MOVLstore &b[0], v)`
4. `(ORL (SHLL 1, bit) (MOVLload &b[0]))` is rewritten into
`(BTSL (MOVLload &b[0]) bit)`. This is a valid transformation because
the destination is a register: in this case, the bit offset is masked
by the number of bits in the destination register. This is identical
to the masking performed by `SHL`.
5. `(MOVLstore &b[0] (BTSL (MOVLload &b[0]) bit))` is rewritten into
`(BTSLmodify &b[0] bit)`. This is an invalid transformation because
the destination is memory: in this case, the bit offset is not
masked, and the chosen instruction may write outside its intended
32-bit location.
These changes fix the invalid rewrite performed in step (5) by
explicitly maksing the bit offset operand to `BT(S|R|C)(L|Q)modify`. In
the example above, the adjusted rules produce
`(BTSLmodify &b[0] (ANDLconst [31] bit))` in step (5).
These changes also add several new rules to rewrite bit sets, toggles,
and clears that are rooted at `(OR|XOR|AND)(L|Q)modify` operators into
appropriate `BT(S|R|C)(L|Q)modify` operators. These rules catch cases
where `MOV(L|Q)store ((OR|XOR|AND)(L|Q) ...)` is rewritten to
`(OR|XOR|AND)(L|Q)modify` before the `(OR|XOR|AND)(L|Q) ...` can be
rewritten to `BT(S|R|C)(L|Q) ...`.
Overall, compilecmp reports small improvements in code size on
darwin/amd64 when the changes to the compiler itself are exlcuded:
file before after Δ %
runtime.s 536464 536412 -52 -0.010%
bytes.s 32629 32593 -36 -0.110%
strings.s 44565 44529 -36 -0.081%
os/signal.s 7967 7959 -8 -0.100%
cmd/vendor/golang.org/x/sys/unix.s 81686 81678 -8 -0.010%
math/big.s 188235 188253 +18 +0.010%
cmd/link/internal/loader.s 89295 89056 -239 -0.268%
cmd/link/internal/ld.s 633551 633232 -319 -0.050%
cmd/link/internal/arm.s 18934 18928 -6 -0.032%
cmd/link/internal/arm64.s 31814 31801 -13 -0.041%
cmd/link/internal/riscv64.s 7347 7345 -2 -0.027%
cmd/compile/internal/ssa.s 4029173 4033066 +3893 +0.097%
total 21298280 21301472 +3192 +0.015%
Fixes #45253
Change-Id: I2e560548b515865129e1724e150e30540e9d29ce
GitHub-Last-Rev: ab94ede
GitHub-Pull-Request: #45242
Reviewed-on: https://go-review.googlesource.com/c/go/+/305069
Trust: Emmanuel Odeke <[email protected]>
Run-TryBot: Emmanuel Odeke <[email protected]>
TryBot-Result: Go Bot <[email protected]>
Reviewed-by: Keith Randall <[email protected]>
These operations (BT{S,R,C}{Q,L}modify) are quite a bit slower than
other ways of doing the same thing.
Without the BTxmodify operations, there are two fallback ways the compiler
performs these operations: AND/OR/XOR operations directly on memory, or
load-BTx-write sequences. The compiler kinda chooses one arbitrarily
depending on rewrite rule application order. Currently, it uses
load-BTx-write for the Const benchmarks and AND/OR/XOR directly to memory
for the non-Const benchmarks. TBD, someone might investigate which of
the two fallback strategies is really better. For now, they are both
better than BTx ops.
name old time/op new time/op delta
BitSet-8 1.09µs ± 2% 0.64µs ± 5% -41.60% (p=0.000 n=9+10)
BitClear-8 1.15µs ± 3% 0.68µs ± 6% -41.00% (p=0.000 n=10+10)
BitToggle-8 1.18µs ± 4% 0.73µs ± 2% -38.36% (p=0.000 n=10+8)
BitSetConst-8 37.0ns ± 7% 25.8ns ± 2% -30.24% (p=0.000 n=10+10)
BitClearConst-8 30.7ns ± 2% 25.0ns ±12% -18.46% (p=0.000 n=10+10)
BitToggleConst-8 36.9ns ± 1% 23.8ns ± 3% -35.46% (p=0.000 n=9+10)
Fixes #45790
Update #45242
Change-Id: Ie33a72dc139f261af82db15d446cd0855afb4e59
Reviewed-on: https://go-review.googlesource.com/c/go/+/318149
Trust: Keith Randall <[email protected]>
Run-TryBot: Keith Randall <[email protected]>
TryBot-Result: Go Bot <[email protected]>
Reviewed-by: Ben Shi <[email protected]>
Under certain circumstances, the existing rules for bit operations can
produce code that writes beyond its intended bounds. For example,
consider the following code:
Roughly speaking:
1 << (bit & 31)is rewritten into(SHLL 1 bit)uint32(b[0]) | uint32(b[1])<<8 | uint32(b[2])<<16 | uint32(b[3])<<24is rewritten into(MOVLload &b[0])b[0] = byte(v) ... b[3] = byte(v >> 24)arerewritten into
(MOVLstore &b[0], v)(ORL (SHLL 1, bit) (MOVLload &b[0]))is rewritten into(BTSL (MOVLload &b[0]) bit). This is a valid transformation becausethe destination is a register: in this case, the bit offset is masked
by the number of bits in the destination register. This is identical
to the masking performed by
SHL.(MOVLstore &b[0] (BTSL (MOVLload &b[0]) bit))is rewritten into(BTSLmodify &b[0] bit). This is an invalid transformation becausethe destination is memory: in this case, the bit offset is not
masked, and the chosen instruction may write outside its intended
32-bit location.
These changes fix the invalid rewrite performed in step (5) by
explicitly maksing the bit offset operand to
BT(S|R|C)(L|Q)modify. Inthe example above, the adjusted rules produce
(BTSLmodify &b[0] (ANDLconst [31] bit))in step (5).These changes also add several new rules to rewrite bit sets, toggles,
and clears that are rooted at
(OR|XOR|AND)(L|Q)modifyoperators intoappropriate
BT(S|R|C)(L|Q)modifyoperators. These rules catch caseswhere
MOV(L|Q)store ((OR|XOR|AND)(L|Q) ...)is rewritten to(OR|XOR|AND)(L|Q)modifybefore the(OR|XOR|AND)(L|Q) ...can berewritten to
BT(S|R|C)(L|Q) ....Overall, compilecmp reports small improvements in code size on
darwin/amd64 when the changes to the compiler itself are exlcuded:
file before after Δ %
runtime.s 536464 536412 -52 -0.010%
bytes.s 32629 32593 -36 -0.110%
strings.s 44565 44529 -36 -0.081%
os/signal.s 7967 7959 -8 -0.100%
cmd/vendor/golang.org/x/sys/unix.s 81686 81678 -8 -0.010%
math/big.s 188235 188253 +18 +0.010%
cmd/link/internal/loader.s 89295 89056 -239 -0.268%
cmd/link/internal/ld.s 633551 633232 -319 -0.050%
cmd/link/internal/arm.s 18934 18928 -6 -0.032%
cmd/link/internal/arm64.s 31814 31801 -13 -0.041%
cmd/link/internal/riscv64.s 7347 7345 -2 -0.027%
cmd/compile/internal/ssa.s 4029173 4033066 +3893 +0.097%
total 21298280 21301472 +3192 +0.015%