Skip to content

doc: figure out, document amd64 minimum requirements #19593

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
bradfitz opened this issue Mar 17, 2017 · 11 comments
Closed

doc: figure out, document amd64 minimum requirements #19593

bradfitz opened this issue Mar 17, 2017 · 11 comments
Labels
Documentation Issues describing a change to documentation. FrozenDueToAge NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made.
Milestone

Comments

@bradfitz
Copy link
Contributor

bradfitz commented Mar 17, 2017

In https://golang.org/cl/38320, @khr adds POPCOUNT intrinsics for amd64.

I realize that our https://golang.org/wiki/MinimumRequirements doesn't say anything about GOARCH=amd64. (only 386)

Which version of amd64 do we assume? Decide and document.

/cc @randall77 @ianlancetaylor @griesemer @rsc @mdempsky @josharian @minux @cherrymui @aclements

@bradfitz bradfitz added Documentation Issues describing a change to documentation. NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. labels Mar 17, 2017
@bradfitz bradfitz added this to the Go1.9 milestone Mar 17, 2017
@bradfitz
Copy link
Contributor Author

(Related to #18616)

@randall77
Copy link
Contributor

I think I jumped the gun with the popcount CL, I didn't realize it was not in the same class as BSF/BSR. I think the right requirement is we assume SSE2 but nothing beyond that.

@aclements
Copy link
Member

You're right. Looks like it was introduced along with SSE 4.2 in the Core i7 965.

@randall77, any thoughts on using the CPUID flag to decide whether or not to use the instruction? Do you think it would be worthwhile to stash this in a global during runtime init and switch on it? Or dynamically generate a code page at runtime init?

@randall77
Copy link
Contributor

We do stash CPUID results in globals at runtime init, for e.g. deciding whether we have AES or AVX2 instructions. These are all used at a fairly high-level to decide between assembly algorithms with 100+ instruction bodies.
The POPCNT case would be using a global to guard a 1-instruction body. The overhead of the check is a lot more relevant in that scenario. In addition POPCNT would be the first compiler-generated case, so we need to figure out how to generate the fallback code (just Go code? In which case we would need lots of spill/restore. Some bespoke convention code page thing?). I'm not sure the complexity is worth it. That said, 99.99% of processors probably have POPCNT, so the performance of the fallback is ~irrelevant. But then testing the fallback is a pain...

@aclements
Copy link
Member

I did a simple but, I think, representative micro-benchmark of different ways to have a fallback and it seems like the global check is quite cheap:

goos: linux
goarch: amd64
BenchmarkUnguarded-4   	2000000000	         0.90 ns/op
BenchmarkGlobal-4      	2000000000	         0.60 ns/op
BenchmarkIndirect-4    	500000000	         3.71 ns/op
BenchmarkCodegen-4     	1000000000	         2.05 ns/op

I skimmed over the generated code and it seems reasonable. It's possible we could speed up indirect and codegen by making assumptions about what registers they clobber rather than treating them as typical Go functions that follow the Go calling convention.

Presumably the fact that the global check benchmarks as faster than no check at all is some microarchitectural quirk, but, nevertheless.

@TocarIP
Copy link
Contributor

TocarIP commented Mar 17, 2017

I'm not sure that benchmarks tell the whole story, in particular I'm afraid that we are running in:
http://stackoverflow.com/questions/25078285/replacing-a-32-bit-loop-count-variable-with-64-bit-introduces-crazy-performance (false dest dependency for popcnt)
Also there already a usage of guard + 1 instruction(popcnt) in https://github.com/golang/tools/blob/f1a397bba50dee815e8c73f3ec94ffc0e8df1a09/container/intsets/popcnt_amd64.go
What about something like GO386, but for AMD64?
We could have for example GOAMD64=AVX, which will enable nondestructive FP math + popcnt + ..

@randall77
Copy link
Contributor

What is "nondestructive FP math"? Is that the same as 3-operand instructions?

I think we'll get to a GOAMD64 at some point, but at this point the case for it isn't compelling. POPCNT and 3-operand floating point just isn't important enough to justify the split IMO.

@aclements
Copy link
Member

I'm not sure that benchmarks tell the whole story, in particular I'm afraid that we are running in:
http://stackoverflow.com/questions/25078285/replacing-a-32-bit-loop-count-variable-with-64-bit-introduces-crazy-performance (false dest dependency for popcnt)

Interesting. That might explain why BenchmarkGlobal is faster than BenchmarkUnguarded. BenchmarkGlobal happens to generate a MOV to the target register of the POPCNT, which should break the dependency chain. Amusingly, if you lift the load of havePopcount out of the loop so the compiler keeps it in a register, then BenchmarkGlobal loses this clobber and drops to the same 0.9 ns/op.

Still, the benchmarks are showing 2 to 3 cycles per loop iteration. That's not far from optimal whether or not we do the global load and I suspect a little more cleverness from the compiler could drop the global check even lower.

@TocarIP
Copy link
Contributor

TocarIP commented Mar 17, 2017

Yes, nondestructive means 3 op (source is not destroyed)

@josharian
Copy link
Contributor

Another instruction set of interest is FMA.

@randall77
Copy link
Contributor

Decided stock amd64 is the minimum requirement. I've updated the wiki.

@golang golang locked and limited conversation to collaborators Mar 21, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Documentation Issues describing a change to documentation. FrozenDueToAge NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made.
Projects
None yet
Development

No branches or pull requests

6 participants