cmd/compile: no automatic use of fused multiply-add on amd64 even with GOAMD64=v3 #71204
cc @golang/compiler
Change https://go.dev/cl/646335 mentions this issue:
Hi @dominikh, I did the initial implementation in CL 646335 and it shows some nice gains with GOAMD64=v3:
It fails on this particular test case. The results from FMA are slightly different than MULS(S|D)/ADDS(S|D). From my reading, FMA is supposed to be more precise. Do we make another set of tests for when GOAMD64 >= v3? Could you chime in here @randall77 ? Thanks!
Weirdly, it didn't fail on Gerrit (besides on the recently buggy arm64 runners).
That test should not fail. It uses the [...] On arm64, we don't throw it away. On amd64, we do. We do need to run those tests in GOAMD64=v1 and GOAMD64=v3 modes now. I think we already do: we have a goamd64v3 builder on build.golang.org. Maybe that config is not in the default trybot set.
Thanks. So, if I understand correctly, the fix is to implement zero-width LoweredRound(32|64)F ops on AMD64 and use them when lowering from Round(32|64)F?
Yes, something like that. I'm not sure where the LoweredRound ops end up, but following arm64 and doing the same thing should work.
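For reference, the Go spec allows an implementation to fuse x*y + z into a single operation, but it also says that an explicit floating-point type conversion rounds to the precision of the target type and so prevents that fusion. The sketch below illustrates the rule. The function names and input values are illustrative assumptions and are not taken from the failing test.

```go
package main

import "fmt"

// unfused rounds the product to float64 before the addition.
// Per the Go spec, the explicit conversion must not be fused away.
//
//go:noinline
func unfused(x, y, z float64) float64 {
	return float64(x*y) + z
}

// maybeFused leaves the compiler free to emit either a separate
// multiply and add, or a single fused multiply-add.
//
//go:noinline
func maybeFused(x, y, z float64) float64 {
	return x*y + z
}

func main() {
	// The exact product is 1 - 2^-54, which is not representable as a
	// float64 and rounds to 1.0 when the multiply is rounded on its own.
	x := 1 + 1.0/(1<<27)
	y := 1 - 1.0/(1<<27)
	z := -1.0

	fmt.Println("with explicit conversion:", unfused(x, y, z))
	fmt.Println("without conversion:      ", maybeFused(x, y, z))
}
```

With these inputs the explicitly rounded version should yield 0 on every architecture, while the unconverted version yields either 0 or -2^-54 depending on whether the compiler fused the operations. A test that relies on the conversion therefore breaks if a lowering pass drops the rounding op.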
Go version
go version go1.23.4 linux/amd64
Output of go env in your module/workspace:
What did you do?
Compile the following program with GOARCH=amd64 GOAMD64=v3 go build -gcflags=-S
What did you see happen?
What did you expect to see?
I expected fooImplicit and fooExplicit to generate identical code when setting GOAMD64=v3.
On arm64, the compiler detects the x*y + z pattern and automatically uses FMA. On amd64, math.FMA uses runtime feature detection unless the GOAMD64 environment variable is set to v3 or higher, in which case calls to math.FMA compile directly to VFMADD231SD. However, x*y + z isn't detected, regardless of the value of GOAMD64.