-
Notifications
You must be signed in to change notification settings - Fork 18k
cmd/compile: BCE/Prove do not take arithmetic into account #25197
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
FWIW, this is going to be very hard and surely impossible to generalize within the context of the current prove pass. |
I do not think its safe to remove the bounds checks in the example given if e.g. len(src) is Int max. If i remember correctly wrap around of arithmetic operations is defined in go and the compiler can not assume they do not overflow. This could lead to dst being accessed with a negative index in the example. |
Yes, part of the complexity is making sure there are no overflows. It's questionable that len(src)*2 can overflow for instance, it would mean that the slice has at least 2**63 elements, which is theoretically feasible only for one-byte elements (and I'm already assuming it's a mmap-ed virtual area, not something really returned by an allocation). I've had a short discussion with @dr2chase and @aclements about what could be assumed as a maximum length for a slice, but their suggestion was to avoid creating dependency on this in optimizers. Anyway, an alternative would be to change the check in |
On 32bit archs it only needs 1<<30+1 byte elements to overflow and the following program can be run: GOARCH=386 go run main.go
Only wanted to note that proving with arithmetic involved in go is more complex in any prove implementation than when it can be assumed that overflow is undefined behavior and that for the given example it is not safe (without further constraints). |
@martisch, @rasky, indeed the pre-check as presented in my original example is not 100% correct because of potential package hex
func Encode(dst, src []byte) {
if len(dst) / 2 < len(src) {
panic("dst overflow")
}
for i, _ := range src {
dst[i*2] = 0 // XXX BC not eliminated
dst[i*2+1] = 0 // XXX BC not eliminated
}
} my bad, thanks for spotting this. About how to make deductions in such case, if I recall correctly, rational expressions has to be introduced, then we have rules:
=>
and the question is whether those expressions are within [0, Because we have connectivity between
then we can omit the flooring in the upper limit because flooring makes expression only potentially a bit less, and by omiting it we can make the upper limit only a bit more which would still hold true, and then 2*/2 cancels:
which shows that both expressions are withing correct range for dst indexing. |
@navytux, that's true, but bounds-check elimination needs to strike a balance between coverage and performance. We could put a full SMT engine in there and it could figure out amazing things, but I assert [1] that it wouldn't get substantially better coverage of most code. It would also substantially slow down the compiler while mostly spending that build time on code that's not performance critical (simply because most code is not) [2]. [1] It would be interesting to actually verify this. You could imagine wiring together https://godoc.org/golang.org/x/tools/go/ssa with, say, https://godoc.org/github.com/aclements/go-z3 to get a sense of what's possible. [2] A possible far-off but intriguing idea is to use profile-guided optimization not just to direct the compiler's optimization decisions, but to direct where the compiler focuses more expensive optimizations. |
@aclements thanks for feedback and for Z3 links. I agree there should be balance between compilation time and optimizations applied in order not to loose the fast edit-try property of Go. However currently whenever there is a need to process data slices with indices non-trivially, bounds checking kills the performance. An example of this is Fannkuch11 from Go1 benchmark set, where it is known that by making BCE more clever whole execution time could be reduced by ~ 30% (please see #24660 for details). The idea to apply optimizations based on runtime feedback on where it is mostly needed is interesting.
So maybe not a full SMT but something more simple would be still good to have. |
This is follow-up to #19714 where prove/BCE were taught to take transitivity into account, but originally-motivating example where indices are constructed as arithmetic expressions remains with bounds checks not eliminated:
/cc @rasky, @josharian
The text was updated successfully, but these errors were encountered: