cmd/compile: performance of slice append is optimizable #36405
Comments
BenchmarkAppend and BenchmarkAppend2 have essentially the same inner loop.
I think what is happening here is that the source and destination arrays are aliased in version 2. When copying those 7 bytes, we use two 4-byte load/stores. That would cause havoc with the write buffer on the chip, as it has to resolve the writes before doing the reads. Are you seeing this problem in a real application? Generally one doesn't append portions of a buffer to itself.
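The benchmark bodies aren't shown above, so the following is only a minimal sketch of the two patterns being discussed, assuming the shape described in this thread (the *Sketch names are mine, not the issue's): a plain append from a separate buffer, versus an append of a window of the buffer back onto itself, where the memmove's source and destination alias.

```go
package appendbench

import "testing"

// Non-aliased append: the destination's backing array is only written inside
// the loop, which is the pattern Go's append fast path is tuned for.
func BenchmarkAppendSketch(b *testing.B) {
	src := make([]byte, 7)
	dst := make([]byte, 0, 1024)
	for i := 0; i < b.N; i++ {
		dst = append(dst[:0], src...)
	}
}

// Aliased append: the 7 source bytes sit at exactly the offsets the append
// writes to, so the copy's loads hit addresses that the previous iteration's
// stores have just written, stalling on the store buffer.
func BenchmarkAppend2Sketch(b *testing.B) {
	buf := make([]byte, 16, 1024)
	for i := 0; i < b.N; i++ {
		buf = append(buf[:9], buf[9:16]...)
	}
}
```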
I doubt that it is necessary to malloc new memory here. I found this issue because serialize/unserialize in my code is slow. However, we can't append data to a []byte with position index assignment, only with append.
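A minimal sketch of that constraint (the variable names are illustrative, not from the issue): indexing a []byte past its length panics even when spare capacity exists, so the bytes have to be made addressable with append or a reslice first.

```go
// Illustrative only: why index assignment alone can't grow a []byte.
func fillByIndex() []byte {
	b := make([]byte, 0, 8) // len 0, cap 8
	// b[0] = 'x'           // would panic: index out of range [0] with length 0
	b = b[:3]                        // reslicing within capacity makes the bytes addressable
	b[0], b[1], b[2] = 'f', 'o', 'o' // now index assignment works
	return append(b, '!')            // append is the other way to extend the length
}
```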
The benchmarks you provided don't call any of these things, at least in the inner loop, so I don't think this issue is directly applicable to your original problem (the serialization slowness). Perhaps you could provide a benchmark for your replacement code?
You don't need to use append for this; you can extend the slice with a reslice instead.
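The example that followed isn't shown above; as a minimal sketch of the reslice approach, assuming (as the issue describes) that buf2 occupies the bytes immediately after buf1 in the same backing array, and with an illustrative helper name:

```go
// Hypothetical helper, not the original example: when the bytes to "append"
// already sit right after len(buf1) in the same backing array, widening the
// slice header is enough, and no bytes are moved at all.
func extendOverExisting(buf1, buf2 []byte) []byte {
	// Assumes buf2 begins exactly where buf1 ends and that
	// cap(buf1) >= len(buf1)+len(buf2).
	return buf1[:len(buf1)+len(buf2)]
}
```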
This issue is not the reason for the poor performance I was seeing.
Thanks, the reslice does solve my problem in this situation.
I don't think there's any optimization we can actually do here. Go is optimized to do appends fast when the backing store of the slice you're appending to is write-only. Switching between reads and writes on every small append is not a common use case, and is thus not something we optimize for. I'm going to close this issue as not actionable. Please reopen if you disagree.
What version of Go are you using (go version)?
What did you do?
What did you expect to see?
I expected BenchmarkAppend2 to be obviously much faster than BenchmarkAppend, since the memory of buf1 and buf2 is contiguous and the append in BenchmarkAppend2 should not actually need a memmove. But the benchmark shows BenchmarkAppend2 costing much more time than BenchmarkAppend, almost as much as BenchmarkAppend3, which does allocate new memory on every loop.
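The benchmark source and its output aren't reproduced here, so the following is only an assumed sketch of the setup described: buf1 and buf2 as adjacent windows of one backing array (so the append in BenchmarkAppend2 writes each byte to the very address it reads it from), and BenchmarkAppend3 allocating a fresh destination on every iteration.

```go
package appendbench

import "testing"

// Assumed layout, for orientation only; not the original code. buf2 begins
// exactly where buf1 ends, so appending buf2 to buf1 copies bytes onto
// their own addresses.
func contiguousBuffers() (buf1, buf2 []byte) {
	backing := make([]byte, 16, 1024)
	return backing[:9], backing[9:16]
}

// The per-iteration allocation that BenchmarkAppend3 is described as doing.
func BenchmarkAppend3Shape(b *testing.B) {
	src := make([]byte, 7)
	for i := 0; i < b.N; i++ {
		dst := make([]byte, 0, 16) // fresh backing array every iteration
		_ = append(dst, src...)
	}
}
```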
I wonder whether there is something wrong, or a bug, in append's memory copy, and whether it will be fixed or optimized in the future?