runtime: non-atomic pointer writes in memmove/memclr on arm64 #12552
I've had this happen on an ubuntu builder (with the 1.5 release) and with tip as of 5cbca8d. Seems to happen roughly every other time.

Full output here: http://paste.ubuntu.com/12319496/
I ran it in a much longer loop: 17 failures out of 100 runs. And each run takes a minute...
CC @RLH @aclements
@mwhudson, I just pushed some improvements to this error message to master. Could you try reproducing it at tip?
https://gist.github.com/mwhudson/db0c9def25e417c03cb1 Seems marginally more intermittent now (~10%, but maybe that's just due to load or something).
Excellent. Thanks. Since it sounds like you're able to reproduce the failure regularly, could …
Yep sure, I'd noticed that the three I've made so far this morning do look slightly different. I'll let it run for a few hours and post what I get.
http://people.canonical.com/~mwh/oracle-fails.tgz is 41 failures from 314 runs. Have fun :-)
Interesting. This didn't at all show the sorts of patterns I was expecting to see, but it did show a different pattern: in every case, the bad pointer has changed between when we first read it from the object (and checked it) and when we printed the object it was found in. Furthermore, excepting the cases where it's 0 the second time, the first read is always a zero-padded prefix of the second read:

```
panic.txt.103: runtime: pointer 0x4824000000 to unallocated span idx=0x2000 span.start=0x4829326000 span.limit=0x482932a000 span.state=4
panic.txt.11:  runtime: pointer 0x4825000000 to unused region of span idx=0x2800 span.start=0x4820a8a000 span.limit=0x4820a8c000 span.state=0
panic.txt.129: runtime: pointer 0x4821f80000 to unallocated span idx=0xfc0 span.start=0x4821f7a000 span.limit=0x4821f7e000 span.state=2
```

Whether there are two or three bytes of prefix varies. I suspect there can also be one or four bytes of prefix, but that we don't see one because it lands outside the heap and we don't see four because it lands inside of a valid span (just not the right object in the span).

Almost certainly something is doing byte-at-a-time copying and the garbage collector is catching it in the middle of copying a pointer. The runtime's arm64 memmove does byte copying, which could cause exactly this failure. @4ad, @minux, it looks like we need to implement doubleword memmove at least for doubleword-aligned pointers on arm64. It looks like the compiler's blockcopy does 8-byte moves, so it should be good. I'm not aware of anything else in the compiler or runtime that does bulk memory copying.
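To make the failure mode concrete, here is a minimal Go sketch of how a byte-at-a-time copy into zeroed memory passes through intermediate values that are zero-padded prefixes of the final pointer, matching the panic lines above. It is illustrative only, not the runtime's code; the pointer value is borrowed from the first log line, and the high-to-low copy order is just one way to produce the observed prefixes.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

func main() {
	// A pointer value from the logs above; the destination starts zeroed,
	// as freshly allocated heap memory would be.
	src := make([]byte, 8)
	dst := make([]byte, 8)
	binary.LittleEndian.PutUint64(src, 0x4829326000)

	// Copy one byte at a time, as a byte-wise memmove does. If the GC
	// scans dst between iterations, it sees torn values such as
	// 0x4829000000: a zero-padded prefix of the real pointer.
	for i := 7; i >= 0; i-- {
		dst[i] = src[i]
		fmt.Printf("intermediate value: %#x\n", binary.LittleEndian.Uint64(dst))
	}
}
```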
This also affects ppc64, which always uses byte copying.

It also affects amd64p32, though more subtly: it uses 8 byte moves and then falls back to 1 byte moves, so if a pointer is the last field in an object and it's 4-byte aligned, but not 8-byte aligned, it will be copied one byte at a time. I believe the other architectures are okay.
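In sketch form, the copy strategy being described looks roughly like the following (illustrative Go; the real code is assembly, and the function name is invented). On amd64p32, pointers are 4 bytes, so a pointer sitting in the sub-8-byte tail of an object falls into the byte loop and can be torn.

```go
package main

import "fmt"

// copyWordsThenBytes mimics the 8-byte-then-1-byte strategy: word-sized
// chunks first (each copy below stands in for a single 8-byte machine
// store), then whatever is left over one byte at a time.
func copyWordsThenBytes(dst, src []byte) {
	i, n := 0, len(src)
	for ; n-i >= 8; i += 8 {
		copy(dst[i:i+8], src[i:i+8]) // one 8-byte store: safe for 8-byte pointers
	}
	for ; i < n; i++ {
		dst[i] = src[i] // byte store: a 4-byte pointer landing here is torn
	}
}

func main() {
	src := []byte("twelve bytes") // 12 bytes: one 8-byte word plus a 4-byte tail
	dst := make([]byte, len(src))
	copyWordsThenBytes(dst, src)
	fmt.Printf("%s\n", dst)
}
```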
Oh hah, that's one of those things that had never occurred to me but is totally obvious in retrospect. At least the fix is fairly simple, even if it's a bunch of typing.
Yeah, I had the same reaction. :) It's a standard optimization, of course, but it hadn't occurred to me that it was a required optimization in Go.
Awesome detective work, Michael and Austin. Adding a better memcpy to arm64. Thanks, Dave
Yes, since this is a runtime bug that affects correct programs. I don't think it can actually lead to memory corruption, but I haven't been able to completely convince myself of that. It can certainly lead to over-retention.
CL https://golang.org/cl/14813 mentions this issue.
Well, I implemented word-by-word copying and the bug still presents itself. It's possible I've messed it up, or is there something else that does byte-by-byte copies?
Look for duffcopy paths, I think there is a path that copies n bytes by …
They look OK to me, but maybe I'm mistaken. I found one more thing that does byte-by-byte memory setting (memclr) and fixed that, but I'm still seeing failures. I uploaded a few as http://people.canonical.com/~mwh/oracle-fails-2.tgz
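For reference, the memclr analogue of the change has the same shape as the memmove sketch earlier; this is a hedged Go sketch of word-wise zeroing under the same assumptions (names invented, the real code is assembly):

```go
package memsketch

import "encoding/binary"

// clearWordsThenBytes sketches a word-wise memclr: zero 8-byte words
// while possible (each PutUint64 stands in for a single 8-byte store,
// so a pointer slot is never half-cleared), then zero the sub-word
// tail, which cannot hold an 8-byte-aligned pointer, byte by byte.
func clearWordsThenBytes(b []byte) {
	i, n := 0, len(b)
	for ; n-i >= 8; i += 8 {
		binary.LittleEndian.PutUint64(b[i:i+8], 0)
	}
	for ; i < n; i++ {
		b[i] = 0
	}
}
```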
Argh, I was mistaken and was testing the wrong binary. It does actually seem to work when I test the right one (63 runs with no failure so far).
Over 200 runs without failure. Pretty sure it's fixed now.
CL https://golang.org/cl/14840 mentions this issue.
runtime: adjust the ppc64x memmove and memclr to copy by word as much as it can

DO NOT SUBMIT (needs some extended testing and benchmark numbers)

Issue golang#12552 can happen on ppc64 too, although much less frequently in my testing. This should fix it (not proven yet).

Update golang#12552
Change-Id: I7192e9deb9684a843aed37f58a16a4e29970e893
runtime: adjust the ppc64x memmove and memclr to copy by word as much as it can

Issue #12552 can happen on ppc64 too, although much less frequently in my testing. I'm fairly sure this fixes it (2 out of 200 runs of oracle.test failed without this change and 0 of 200 failed with it). It's also a lot faster for large moves/clears:

name old speed new speed delta
Memmove1-6 157MB/s ± 9% 144MB/s ± 0% -8.20% (p=0.004 n=10+9)
Memmove2-6 281MB/s ± 1% 249MB/s ± 1% -11.53% (p=0.000 n=10+10)
Memmove3-6 376MB/s ± 1% 328MB/s ± 1% -12.64% (p=0.000 n=10+10)
Memmove4-6 475MB/s ± 4% 345MB/s ± 1% -27.28% (p=0.000 n=10+8)
Memmove5-6 540MB/s ± 1% 393MB/s ± 0% -27.21% (p=0.000 n=10+10)
Memmove6-6 609MB/s ± 0% 423MB/s ± 0% -30.56% (p=0.000 n=9+10)
Memmove7-6 659MB/s ± 0% 468MB/s ± 0% -28.99% (p=0.000 n=8+10)
Memmove8-6 705MB/s ± 0% 1295MB/s ± 1% +83.73% (p=0.000 n=9+9)
Memmove9-6 740MB/s ± 1% 1241MB/s ± 1% +67.61% (p=0.000 n=10+8)
Memmove10-6 780MB/s ± 0% 1162MB/s ± 1% +48.95% (p=0.000 n=10+9)
Memmove11-6 811MB/s ± 0% 1180MB/s ± 0% +45.58% (p=0.000 n=8+9)
Memmove12-6 820MB/s ± 1% 1073MB/s ± 1% +30.83% (p=0.000 n=10+9)
Memmove13-6 849MB/s ± 0% 1068MB/s ± 1% +25.87% (p=0.000 n=10+10)
Memmove14-6 877MB/s ± 0% 911MB/s ± 0% +3.83% (p=0.000 n=10+10)
Memmove15-6 893MB/s ± 0% 922MB/s ± 0% +3.25% (p=0.000 n=10+9)
Memmove16-6 897MB/s ± 1% 2418MB/s ± 1% +169.67% (p=0.000 n=10+9)
Memmove32-6 908MB/s ± 0% 3927MB/s ± 2% +332.64% (p=0.000 n=10+8)
Memmove64-6 1.11GB/s ± 0% 5.59GB/s ± 0% +404.64% (p=0.000 n=9+9)
Memmove128-6 1.25GB/s ± 0% 6.71GB/s ± 2% +437.49% (p=0.000 n=9+10)
Memmove256-6 1.33GB/s ± 0% 7.25GB/s ± 1% +445.06% (p=0.000 n=10+10)
Memmove512-6 1.38GB/s ± 0% 8.87GB/s ± 0% +544.43% (p=0.000 n=10+10)
Memmove1024-6 1.40GB/s ± 0% 10.00GB/s ± 0% +613.80% (p=0.000 n=10+10)
Memmove2048-6 1.41GB/s ± 0% 10.65GB/s ± 0% +652.95% (p=0.000 n=9+10)
Memmove4096-6 1.42GB/s ± 0% 11.01GB/s ± 0% +675.37% (p=0.000 n=8+10)
Memclr5-6 269MB/s ± 1% 264MB/s ± 0% -1.80% (p=0.000 n=10+10)
Memclr16-6 600MB/s ± 0% 887MB/s ± 1% +47.83% (p=0.000 n=10+10)
Memclr64-6 1.06GB/s ± 0% 2.91GB/s ± 1% +174.58% (p=0.000 n=8+10)
Memclr256-6 1.32GB/s ± 0% 6.58GB/s ± 0% +399.86% (p=0.000 n=9+10)
Memclr4096-6 1.42GB/s ± 0% 10.90GB/s ± 0% +668.03% (p=0.000 n=8+10)
Memclr65536-6 1.43GB/s ± 0% 11.37GB/s ± 0% +697.83% (p=0.000 n=9+8)
GoMemclr5-6 359MB/s ± 0% 360MB/s ± 0% +0.46% (p=0.000 n=10+10)
GoMemclr16-6 750MB/s ± 0% 1264MB/s ± 1% +68.45% (p=0.000 n=10+10)
GoMemclr64-6 1.17GB/s ± 0% 3.78GB/s ± 1% +223.58% (p=0.000 n=10+9)
GoMemclr256-6 1.35GB/s ± 0% 7.47GB/s ± 0% +452.44% (p=0.000 n=10+10)

Update #12552
Change-Id: I7192e9deb9684a843aed37f58a16a4e29970e893
Reviewed-on: https://go-review.googlesource.com/14840
Reviewed-by: Minux Ma <[email protected]>
Currently, amd64p32's memmove and memclr use 8 byte writes as much as possible and 1 byte writes for the tail of the object. However, if an object ends with a 4 byte pointer at an 8 byte aligned offset, this may copy/zero the pointer field one byte at a time, allowing the garbage collector to observe a partially copied pointer.

Fix this by using 4 byte writes instead of 8 byte writes.

Updates #12552.

Change-Id: I13324fd05756fb25ae57e812e836f0a975b5595c
Reviewed-on: https://go-review.googlesource.com/15370
Run-TryBot: Austin Clements <[email protected]>
TryBot-Result: Gobot Gobot <[email protected]>
Reviewed-by: Brad Fitzpatrick <[email protected]>
Reviewed-by: Keith Randall <[email protected]>
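The fix described in that commit message can be sketched as follows (illustrative Go under the same caveats as the sketches above; the actual change is in the runtime's amd64p32 assembly, and this helper is hypothetical):

```go
package memsketch

// copyHalfWordsThenBytes sketches the fixed amd64p32 strategy: 4-byte
// moves as far as possible. Pointers on amd64p32 are 4 bytes and 4-byte
// aligned, so every pointer slot is covered by exactly one 4-byte move
// and can never be observed half-written.
func copyHalfWordsThenBytes(dst, src []byte) {
	i, n := 0, len(src)
	for ; n-i >= 4; i += 4 {
		copy(dst[i:i+4], src[i:i+4]) // stands in for one 4-byte store
	}
	for ; i < n; i++ {
		dst[i] = src[i] // at most 3 bytes left: cannot hold an aligned pointer
	}
}
```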
runtime: adjust the arm64 memmove and memclr to operate by word as much as they can

Not only is this an obvious optimization:

benchmark old MB/s new MB/s speedup
BenchmarkMemmove1-4 35.35 29.65 0.84x
BenchmarkMemmove2-4 63.78 52.53 0.82x
BenchmarkMemmove3-4 89.72 73.96 0.82x
BenchmarkMemmove4-4 109.94 95.73 0.87x
BenchmarkMemmove5-4 127.60 112.80 0.88x
BenchmarkMemmove6-4 143.59 126.67 0.88x
BenchmarkMemmove7-4 157.90 138.92 0.88x
BenchmarkMemmove8-4 167.18 231.81 1.39x
BenchmarkMemmove9-4 175.23 252.07 1.44x
BenchmarkMemmove10-4 165.68 261.10 1.58x
BenchmarkMemmove11-4 174.43 263.31 1.51x
BenchmarkMemmove12-4 180.76 267.56 1.48x
BenchmarkMemmove13-4 189.06 284.93 1.51x
BenchmarkMemmove14-4 186.31 284.72 1.53x
BenchmarkMemmove15-4 195.75 281.62 1.44x
BenchmarkMemmove16-4 202.96 439.23 2.16x
BenchmarkMemmove32-4 264.77 775.77 2.93x
BenchmarkMemmove64-4 306.81 1209.64 3.94x
BenchmarkMemmove128-4 357.03 1515.41 4.24x
BenchmarkMemmove256-4 380.77 2066.01 5.43x
BenchmarkMemmove512-4 385.05 2556.45 6.64x
BenchmarkMemmove1024-4 381.23 2804.10 7.36x
BenchmarkMemmove2048-4 379.06 2814.83 7.43x
BenchmarkMemmove4096-4 387.43 3064.96 7.91x
BenchmarkMemmoveUnaligned1-4 28.91 25.40 0.88x
BenchmarkMemmoveUnaligned2-4 56.13 47.56 0.85x
BenchmarkMemmoveUnaligned3-4 74.32 69.31 0.93x
BenchmarkMemmoveUnaligned4-4 97.02 83.58 0.86x
BenchmarkMemmoveUnaligned5-4 110.17 103.62 0.94x
BenchmarkMemmoveUnaligned6-4 124.95 113.26 0.91x
BenchmarkMemmoveUnaligned7-4 142.37 130.82 0.92x
BenchmarkMemmoveUnaligned8-4 151.20 205.64 1.36x
BenchmarkMemmoveUnaligned9-4 166.97 215.42 1.29x
BenchmarkMemmoveUnaligned10-4 148.49 221.22 1.49x
BenchmarkMemmoveUnaligned11-4 159.47 239.57 1.50x
BenchmarkMemmoveUnaligned12-4 163.52 247.32 1.51x
BenchmarkMemmoveUnaligned13-4 167.55 256.54 1.53x
BenchmarkMemmoveUnaligned14-4 175.12 251.03 1.43x
BenchmarkMemmoveUnaligned15-4 192.10 267.13 1.39x
BenchmarkMemmoveUnaligned16-4 190.76 378.87 1.99x
BenchmarkMemmoveUnaligned32-4 259.02 562.98 2.17x
BenchmarkMemmoveUnaligned64-4 317.72 842.44 2.65x
BenchmarkMemmoveUnaligned128-4 355.43 1274.49 3.59x
BenchmarkMemmoveUnaligned256-4 378.17 1815.74 4.80x
BenchmarkMemmoveUnaligned512-4 362.15 2180.81 6.02x
BenchmarkMemmoveUnaligned1024-4 376.07 2453.58 6.52x
BenchmarkMemmoveUnaligned2048-4 381.66 2568.32 6.73x
BenchmarkMemmoveUnaligned4096-4 398.51 2669.36 6.70x
BenchmarkMemclr5-4 113.83 107.93 0.95x
BenchmarkMemclr16-4 223.84 389.63 1.74x
BenchmarkMemclr64-4 421.99 1209.58 2.87x
BenchmarkMemclr256-4 525.94 2411.58 4.59x
BenchmarkMemclr4096-4 581.66 4372.20 7.52x
BenchmarkMemclr65536-4 565.84 4747.48 8.39x
BenchmarkGoMemclr5-4 194.63 160.31 0.82x
BenchmarkGoMemclr16-4 295.30 630.07 2.13x
BenchmarkGoMemclr64-4 480.24 1884.03 3.92x
BenchmarkGoMemclr256-4 540.23 2926.49 5.42x

but it turns out that it's necessary to avoid the GC seeing partially written pointers.

It's of course possible to be more sophisticated (using ldp/stp to move 16 bytes at a time in the core loop and unrolling the tail copying loops being the obvious ideas) but I wanted something simple and (reasonably) obviously correct.

Update golang#12552
Change-Id: Iaeaf8a812cd06f4747ba2f792de1ded738890735
CL https://golang.org/cl/16668 mentions this issue.
Make sure that we're moving or zeroing pointers atomically. Anything that is a multiple of pointer size and at least pointer aligned might have pointers in it. All the code looks ok except for the 1-pointer-sized moves.

Fixes #13160
Update #12552

Change-Id: Ib97d9b918fa9f4cc5c56c67ed90255b7fdfb7b45
Reviewed-on: https://go-review.googlesource.com/16668
Reviewed-by: Dmitry Vyukov <[email protected]>
Run-TryBot: Keith Randall <[email protected]>
TryBot-Result: Gobot Gobot <[email protected]>
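The rule in that commit message can be written down directly. A minimal sketch of the predicate follows, where the function name and constant are assumptions of the example rather than runtime API:

```go
package memsketch

import "unsafe"

const ptrSize = unsafe.Sizeof(uintptr(0)) // 8 on arm64/ppc64, 4 on amd64p32

// mayContainPointers restates the rule quoted above: anything that is a
// multiple of pointer size and at least pointer aligned might have
// pointers in it, so memmove/memclr must touch such a region only with
// pointer-sized, pointer-aligned accesses.
func mayContainPointers(addr, size uintptr) bool {
	return addr%ptrSize == 0 && size%ptrSize == 0
}
```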
Since it's not at all obvious from the issue log, here's the set of commits that actually fix this and should be cherry-picked for 1.5.2, in cherry-picking order:

2c91114 runtime: adjust the ppc64x memmove and memclr to copy by word as much as it can (https://golang.org/cl/14840)
…

I've confirmed that they cherry-pick cleanly (in this order) to the 1.5 release branch.