Skip to content

math/big: optimize amd64 asm shlVU and shrVU for shift==0 case #31171

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed

math/big: optimize amd64 asm shlVU and shrVU for shift==0 case #31171

wants to merge 1 commit into from

Conversation

nsajko
Copy link
Contributor

@nsajko nsajko commented Mar 31, 2019

This adds branches for s == 0 and s == 0 && z.base == x.base to shlVU
and shrVU. In the first case runtime.memmove is called, while in the
second case we just return.

Tests and benchmarks are also added for the new branches.

Benchmarked on AMD64 Linux on i5-8300H:

name old time/op new time/op delta
ShlVUCopy1e7-8 16.0ms ± 0% 11.1ms ± 1% -30.79% (p=0.000 n=10+19)
ShlVUNop1e7-8 10.5ms ± 1% 0.0ms ± 0% -100.00% (p=0.000 n=9+20)
ShrVUCopy1e7-8 15.5ms ± 0% 11.1ms ± 1% -28.55% (p=0.000 n=8+18)
ShrVUNop1e7-8 10.3ms ± 2% 0.0ms ± 0% -100.00% (p=0.000 n=9+20)

Fixes #31097

@googlebot googlebot added the cla: yes Used by googlebot to label PRs as having a valid CLA. The text of this label should not change. label Mar 31, 2019
@gopherbot
Copy link
Contributor

This PR (HEAD: 4fb4ee4) has been imported to Gerrit for code review.

Please visit https://go-review.googlesource.com/c/go/+/170257 to see it.

Tip: You can toggle comments from me using the comments slash command (e.g. /comments off)
See the Wiki page for more info

@gopherbot
Copy link
Contributor

Message from Josh Bleecher Snyder:

Patch Set 1: Run-TryBot+1

(2 comments)

I’m on my phone and won’t be at a laptop for a while, but some quick initial reactions...


Please don’t reply on this GitHub thread. Visit golang.org/cl/170257.
After addressing review feedback, remember to publish your drafts!

@gopherbot
Copy link
Contributor

Message from Gobot Gobot:

Patch Set 1:

TryBots beginning. Status page: https://farmer.golang.org/try?commit=112d35da


Please don’t reply on this GitHub thread. Visit golang.org/cl/170257.
After addressing review feedback, remember to publish your drafts!

@gopherbot
Copy link
Contributor

Message from Gobot Gobot:

Patch Set 1: TryBot-Result+1

TryBots are happy.


Please don’t reply on this GitHub thread. Visit golang.org/cl/170257.
After addressing review feedback, remember to publish your drafts!

@gopherbot
Copy link
Contributor

Message from Neven Sajko:

Patch Set 1:

(2 comments)


Please don’t reply on this GitHub thread. Visit golang.org/cl/170257.
After addressing review feedback, remember to publish your drafts!

@gopherbot
Copy link
Contributor

Message from Josh Bleecher Snyder:

Patch Set 1:

(2 comments)


Please don’t reply on this GitHub thread. Visit golang.org/cl/170257.
After addressing review feedback, remember to publish your drafts!

This adds branches for s == 0 and s == 0 && z.base == x.base to shlVU
and shrVU. In the first case runtime.memmove is called, while in the
second case we just return.

Tests and benchmarks are also added for the new branches.

Benchmarked on AMD64 Linux on i5-8300H:

name            old time/op  new time/op  delta
ShlVUCopy1e7-8  16.0ms ± 0%  11.1ms ± 1%   -30.79%  (p=0.000 n=10+19)
ShlVUNop1e7-8   10.5ms ± 1%   0.0ms ± 0%  -100.00%  (p=0.000 n=9+20)
ShrVUCopy1e7-8  15.5ms ± 0%  11.1ms ± 1%   -28.55%  (p=0.000 n=8+18)
ShrVUNop1e7-8   10.3ms ± 2%   0.0ms ± 0%  -100.00%  (p=0.000 n=9+20)

Fixes #31097
@gopherbot
Copy link
Contributor

This PR (HEAD: 3e61065) has been imported to Gerrit for code review.

Please visit https://go-review.googlesource.com/c/go/+/170257 to see it.

Tip: You can toggle comments from me using the comments slash command (e.g. /comments off)
See the Wiki page for more info

@gopherbot
Copy link
Contributor

Message from Josh Bleecher Snyder:

Patch Set 3: Run-TryBot+1

Thanks, this is looking better. So we have confirmation that the fast passes are in fact faster. So I guess the question is why they aren't moving the needle on macro benchmarks, particularly since the equivalent Go optimization did. Do you have any insight on that? (Obvious questions: What percentage of times does each optimization trigger? Do you perhaps just need to run the benchmarks many more times to get a better estimation of the distribution? Note that when benchmarks run very quickly, I sometimes use -benchtime=100ms or even -benchtime=10ms with a much higher count.)


Please don’t reply on this GitHub thread. Visit golang.org/cl/170257.
After addressing review feedback, remember to publish your drafts!

@gopherbot
Copy link
Contributor

Message from Gobot Gobot:

Patch Set 3:

TryBots beginning. Status page: https://farmer.golang.org/try?commit=02cb6bbc


Please don’t reply on this GitHub thread. Visit golang.org/cl/170257.
After addressing review feedback, remember to publish your drafts!

@gopherbot
Copy link
Contributor

Message from Gobot Gobot:

Patch Set 3: TryBot-Result+1

TryBots are happy.


Please don’t reply on this GitHub thread. Visit golang.org/cl/170257.
After addressing review feedback, remember to publish your drafts!

@gopherbot
Copy link
Contributor

Message from Neven Sajko:

Patch Set 3:

I figured out why the benchmarks for Float functions did not show improvement.

Profiling (with go test -bench Float -cpuprofile) shows that shlVU runs quite often and shrVU only rarely:

(pprof) top
Showing nodes accounting for 18920ms, 56.95% of 33220ms total
Dropped 243 nodes (cum <= 166.10ms)
Showing top 10 nodes out of 114
flat flat% sum% cum cum%
4510ms 13.58% 13.58% 4740ms 14.27% math/big.shr
4340ms 13.06% 26.64% 4340ms 13.06% math/big.shlVU
2820ms 8.49% 35.13% 2820ms 8.49% math/big.addMulVVW
1470ms 4.43% 39.55% 1470ms 4.43% math/big.subVV
1260ms 3.79% 43.35% 2240ms 6.74% math/big.(*Float).round
1000ms 3.01% 46.36% 1000ms 3.01% math/big.addVV
950ms 2.86% 49.22% 2980ms 8.97% runtime.mallocgc
930ms 2.80% 52.02% 4450ms 13.40% math/big.nat.shl
870ms 2.62% 54.64% 870ms 2.62% math/big.divWVW
770ms 2.32% 56.95% 6480ms 19.51% math/big.(*Float).usub

But 97% of times that shlVU is called, s (the shift uint argument) does not equal zero, thus preventing this optimization to make a difference in the macro benchmark. shrVU is called 35% of times with len(z) != 0 && s == 0 && z.base == x.base, but since shrVU itself runs more rarely that also fails to speed thing up.

Data for shlVU: 3.893254741497618e-09, 0.969874753994966, 0.015196813760319559, 0.014928428351459678

Data for shrVU: 2.1323989445478184e-07, 0.6460625040249031, 6.610436728098237e-06, 0.3539306722984744

The columns in the two rows above are ratios and refer respectively to the case when len(z) == len(x) == 0, the case when len(z) != 0 && s != 0, the case when len(z) != 0 && s == 0 && z.base != x.base, the case when len(z) != 0 && s == 0 && z.base == x.base.

The counts that correspond to the ratios are:

Data for shlVU: 1 249116695 3903370 3834434
Data for shrVU: 1 3029745 31 1659777

I did not investigate more deeply the usage patterns of shlVU and shrVU, but presumably in some cases shlVU and shrVU would be called with zero shift more often ... ?


Please don’t reply on this GitHub thread. Visit golang.org/cl/170257.
After addressing review feedback, remember to publish your drafts!

@gopherbot
Copy link
Contributor

Message from Josh Bleecher Snyder:

Patch Set 3:

I figured out why the benchmarks for Float functions did not show improvement.

Thanks, makes sense. I'm inclined to continue with this optimization, but I'd like Robert to weigh in.


Please don’t reply on this GitHub thread. Visit golang.org/cl/170257.
After addressing review feedback, remember to publish your drafts!

@gopherbot
Copy link
Contributor

Message from Josh Bleecher Snyder:

Patch Set 3:

Ping, Robert.


Please don’t reply on this GitHub thread. Visit golang.org/cl/170257.
After addressing review feedback, remember to publish your drafts!

@gopherbot
Copy link
Contributor

Message from Robert Griesemer:

Patch Set 3:

Patch Set 3:

Ping, Robert.

It's in my inbox but this will have to wait a bit. Sorry. Higher-priority items on my plate at the moment.


Please don’t reply on this GitHub thread. Visit golang.org/cl/170257.
After addressing review feedback, remember to publish your drafts!

@gopherbot gopherbot force-pushed the master branch 3 times, most recently from 53bd915 to 6139019 Compare October 3, 2019 21:09
@gopherbot gopherbot force-pushed the master branch 6 times, most recently from 4a7ed1f to 0f992b9 Compare November 5, 2019 02:56
@gopherbot
Copy link
Contributor

Message from Robert Griesemer:

Patch Set 3:

Patch Set 3:

Patch Set 3:

Ping, Robert.

It's in my inbox but this will have to wait a bit. Sorry. Higher-priority items on my plate at the moment.

Just noticed this is still sitting here. Is this still current/relevant?


Please don’t reply on this GitHub thread. Visit golang.org/cl/170257.
After addressing review feedback, remember to publish your drafts!

@gopherbot
Copy link
Contributor

Message from Robert Griesemer:

Patch Set 3:

Patch Set 3:

Patch Set 3:

Patch Set 3:

Ping, Robert.

It's in my inbox but this will have to wait a bit. Sorry. Higher-priority items on my plate at the moment.

Just noticed this is still sitting here. Is this still current/relevant?

For go 1.15.


Please don’t reply on this GitHub thread. Visit golang.org/cl/170257.
After addressing review feedback, remember to publish your drafts!

@gopherbot
Copy link
Contributor

Message from Ian Lance Taylor:

Patch Set 3:

This still seems valid but would have to be updated to the current sources.


Please don’t reply on this GitHub thread. Visit golang.org/cl/170257.
After addressing review feedback, remember to publish your drafts!

@gopherbot
Copy link
Contributor

Message from Go Bot:

Patch Set 1:

TryBots beginning. Status page: https://farmer.golang.org/try?commit=112d35da


Please don’t reply on this GitHub thread. Visit golang.org/cl/170257.
After addressing review feedback, remember to publish your drafts!

@gopherbot
Copy link
Contributor

Message from Go Bot:

Patch Set 1: TryBot-Result+1

TryBots are happy.


Please don’t reply on this GitHub thread. Visit golang.org/cl/170257.
After addressing review feedback, remember to publish your drafts!

@gopherbot
Copy link
Contributor

Message from Go Bot:

Patch Set 3:

TryBots beginning. Status page: https://farmer.golang.org/try?commit=02cb6bbc


Please don’t reply on this GitHub thread. Visit golang.org/cl/170257.
After addressing review feedback, remember to publish your drafts!

@gopherbot
Copy link
Contributor

Message from Go Bot:

Patch Set 3: TryBot-Result+1

TryBots are happy.


Please don’t reply on this GitHub thread. Visit golang.org/cl/170257.
After addressing review feedback, remember to publish your drafts!

@gopherbot
Copy link
Contributor

Message from Ian Lance Taylor:

Patch Set 3:

This still seems valid but would have to be updated to the current sources.


Please don’t reply on this GitHub thread. Visit golang.org/cl/170257.
After addressing review feedback, remember to publish your drafts!

@heschi heschi closed this Dec 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes Used by googlebot to label PRs as having a valid CLA. The text of this label should not change.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

math/big: optimize amd64 asm shlVU and shrVU for shift==0 case
4 participants