math/big: optimize amd64 asm shlVU and shrVU for shift==0 case #31171
This PR (HEAD: 4fb4ee4) has been imported to Gerrit for code review. Please visit https://go-review.googlesource.com/c/go/+/170257 to see it.
Message from Josh Bleecher Snyder: Patch Set 1: Run-TryBot+1 (2 comments) I’m on my phone and won’t be at a laptop for a while, but some quick initial reactions... Please don’t reply on this GitHub thread. Visit golang.org/cl/170257.
Message from Gobot Gobot: Patch Set 1: TryBots beginning. Status page: https://farmer.golang.org/try?commit=112d35da
Message from Gobot Gobot: Patch Set 1: TryBot-Result+1 TryBots are happy.
Message from Neven Sajko: Patch Set 1: (2 comments)
Message from Josh Bleecher Snyder: Patch Set 1: (2 comments)
This adds branches for s == 0 and s == 0 && z.base == x.base to shlVU
and shrVU. When s == 0 and z and x do not share a base address,
runtime.memmove is called; when they do (z.base == x.base), the
function simply returns.
Tests and benchmarks are also added for the new branches.
Benchmarked on AMD64 Linux on i5-8300H:
name             old time/op  new time/op  delta
ShlVUCopy1e7-8   16.0ms ± 0%  11.1ms ± 1%   -30.79%  (p=0.000 n=10+19)
ShlVUNop1e7-8    10.5ms ± 1%   0.0ms ± 0%  -100.00%  (p=0.000 n=9+20)
ShrVUCopy1e7-8   15.5ms ± 0%  11.1ms ± 1%   -28.55%  (p=0.000 n=8+18)
ShrVUNop1e7-8    10.3ms ± 2%   0.0ms ± 0%  -100.00%  (p=0.000 n=9+20)
Fixes #31097
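The two new branches can be read as the following pure-Go logic. This is a sketch under stated assumptions, not the CL's assembly: Word and _W are redefined locally, the bodies are loosely modeled on the generic Go fallbacks (shlVU_g and shrVU_g in math/big's arith.go), and &z[0] == &x[0] is used as a Go-level stand-in for the z.base == x.base comparison. It assumes len(x) >= len(z) and 0 <= s < _W.

```go
package main

import "fmt"

// Word stands in for math/big's Word (an unsigned machine word).
type Word uint

// _W is the number of bits in a Word (32 or 64).
const _W = 32 << (^uint(0) >> 63)

// shlVU sets z = x << s and returns the bits shifted out of the top word.
// Sketch only; assumes len(x) >= len(z) and 0 <= s < _W.
func shlVU(z, x []Word, s uint) (c Word) {
	n := len(z)
	if n == 0 {
		return 0
	}
	if s == 0 {
		// Zero shift: a plain copy, or nothing at all when z and x start
		// at the same address (the z.base == x.base case).
		if &z[0] != &x[0] {
			copy(z, x[:n])
		}
		return 0
	}
	r := _W - s
	c = x[n-1] >> r
	// Walk from the top word down so the loop is also safe when z == x.
	for i := n - 1; i > 0; i-- {
		z[i] = x[i]<<s | x[i-1]>>r
	}
	z[0] = x[0] << s
	return c
}

// shrVU sets z = x >> s and returns the bits shifted out of the bottom
// word, placed in the high bits of c. Same assumptions as shlVU.
func shrVU(z, x []Word, s uint) (c Word) {
	n := len(z)
	if n == 0 {
		return 0
	}
	if s == 0 {
		// Same two fast paths as in shlVU.
		if &z[0] != &x[0] {
			copy(z, x[:n])
		}
		return 0
	}
	r := _W - s
	c = x[0] << r
	// Walk from the bottom word up so the loop is also safe when z == x.
	for i := 0; i < n-1; i++ {
		z[i] = x[i]>>s | x[i+1]<<r
	}
	z[n-1] = x[n-1] >> s
	return c
}

func main() {
	x := []Word{1, 2, 3}
	z := make([]Word, len(x))

	shlVU(z, x, 0) // distinct buffers: copy path
	fmt.Println(z) // [1 2 3]

	shlVU(x, x, 0) // same base: returns without touching memory

	fmt.Println(shlVU(z, x, 1), z) // 0 [2 4 6]
	fmt.Println(shrVU(z, z, 1), z) // 0 [1 2 3]
}
```

Checking the aliasing case before copying corresponds to the Nop benchmarks above: when z and x are the same slice, a zero shift leaves the data unchanged, so there is nothing to do.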
This PR (HEAD: 3e61065) has been imported to Gerrit for code review. Please visit https://go-review.googlesource.com/c/go/+/170257 to see it.
Message from Josh Bleecher Snyder: Patch Set 3: Run-TryBot+1 Thanks, this is looking better. So we have confirmation that the fast paths are in fact faster. So I guess the question is why they aren't moving the needle on macro benchmarks, particularly since the equivalent Go optimization did. Do you have any insight on that? (Obvious questions: What percentage of the time does each optimization trigger? Do you perhaps just need to run the benchmarks many more times to get a better estimate of the distribution? Note that when benchmarks run very quickly, I sometimes use -benchtime=100ms or even -benchtime=10ms with a much higher count.)
Message from Gobot Gobot: Patch Set 3: TryBots beginning. Status page: https://farmer.golang.org/try?commit=02cb6bbc
Message from Gobot Gobot: Patch Set 3: TryBot-Result+1 TryBots are happy.
Message from Neven Sajko: Patch Set 3: I figured out why the benchmarks for Float functions did not show improvement. Profiling (with go test -bench Float -cpuprofile) shows that shlVU runs quite often and shrVU only rarely:
(pprof) top
But in 97% of the calls to shlVU, s (the shift uint argument) is nonzero, so this optimization cannot make a difference in the macro benchmark. shrVU is called with len(z) != 0 && s == 0 && z.base == x.base 35% of the time, but since shrVU itself runs rarely, that also fails to speed things up.
Data for shlVU: 3.893254741497618e-09, 0.969874753994966, 0.015196813760319559, 0.014928428351459678
Data for shrVU: 2.1323989445478184e-07, 0.6460625040249031, 6.610436728098237e-06, 0.3539306722984744
The columns in the two rows above are ratios and refer, respectively, to the case len(z) == len(x) == 0, the case len(z) != 0 && s != 0, the case len(z) != 0 && s == 0 && z.base != x.base, and the case len(z) != 0 && s == 0 && z.base == x.base.
The counts that correspond to the ratios are:
Data for shlVU: 1 249116695 3903370 3834434
I did not investigate the usage patterns of shlVU and shrVU more deeply, but presumably in some cases shlVU and shrVU would be called with a zero shift more often ... ?
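The four buckets and the ratio arithmetic can be sketched as follows. This is a hypothetical reconstruction: the review does not show how the counts were collected, so shiftCase and ratios are illustrative names, Word is redefined locally, and &z[0] == &x[0] stands in for z.base == x.base. Feeding in the shlVU counts quoted above (256,854,500 calls in total) reproduces the shlVU ratios.

```go
package main

import "fmt"

// Word stands in for math/big's Word.
type Word uint

// shiftCase assigns one call's arguments to the four buckets in the data:
//   0: len(z) == 0 (callers pass len(x) == len(z))
//   1: len(z) != 0 && s != 0
//   2: len(z) != 0 && s == 0 && z.base != x.base
//   3: len(z) != 0 && s == 0 && z.base == x.base
// Hypothetical instrumentation, not the code used in the review.
func shiftCase(z, x []Word, s uint) int {
	switch {
	case len(z) == 0:
		return 0
	case s != 0:
		return 1
	case &z[0] != &x[0]:
		return 2
	default:
		return 3
	}
}

// ratios turns per-bucket call counts into fractions of all calls.
func ratios(counts [4]uint64) [4]float64 {
	var total uint64
	for _, c := range counts {
		total += c
	}
	var r [4]float64
	if total == 0 {
		return r
	}
	for i, c := range counts {
		r[i] = float64(c) / float64(total)
	}
	return r
}

func main() {
	// Tally one example call per bucket.
	var counts [4]uint64
	a := []Word{1, 2, 3}
	b := make([]Word, 3)
	counts[shiftCase(nil, nil, 0)]++
	counts[shiftCase(b, a, 5)]++
	counts[shiftCase(b, a, 0)]++
	counts[shiftCase(a, a, 0)]++
	fmt.Println(counts) // [1 1 1 1]

	// The shlVU counts from the review, converted to ratios.
	shl := [4]uint64{1, 249116695, 3903370, 3834434}
	for i, r := range ratios(shl) {
		fmt.Printf("case %d: %.4g\n", i, r)
	}
}
```

Only buckets 2 and 3 can benefit from the new branches, which is why roughly 3% of shlVU calls leaves little room for a macro-benchmark gain.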
Message from Josh Bleecher Snyder: Patch Set 3:
Thanks, makes sense. I'm inclined to continue with this optimization, but I'd like Robert to weigh in.
Message from Josh Bleecher Snyder: Patch Set 3: Ping, Robert.
Message from Robert Griesemer: Patch Set 3:
It's in my inbox but this will have to wait a bit. Sorry. Higher-priority items on my plate at the moment.
Branch force-pushed from 53bd915 to 6139019.
Branch force-pushed from 4a7ed1f to 0f992b9.
Message from Robert Griesemer: Patch Set 3:
Just noticed this is still sitting here. Is this still current/relevant?
Message from Robert Griesemer: Patch Set 3:
For Go 1.15.
Message from Ian Lance Taylor: Patch Set 3: This still seems valid but would have to be updated to the current sources.