Go 1.7 implemented AVX2 support; should we do ARM64 support upstream? #7

Closed
harshavardhana opened this issue Oct 5, 2016 · 14 comments
Comments

@harshavardhana
Member

https://golang.org/src/crypto/sha256/sha256block_amd64.s , we should also update the benchmarks.

@ncw

ncw commented Jan 12, 2017

I think you should upstream the ARMv8 code. I wrote the assembly support for MD5 and SHA1 for ARM 32 bit and I found the experience interacting with the go team very positive. I think for a 100x speedup they would be very interested!

@harshavardhana
Member Author

> I think you should upstream the ARMv8 code. I wrote the assembly support for MD5 and SHA1 for ARM 32 bit and I found the experience interacting with the go team very positive. I think for a 100x speedup they would be very interested!

That is the plan @ncw

@ncw

ncw commented Jan 12, 2017

Have you looked at the ARMv8 AES instruction too?

@fwessels
Contributor

@ncw We need to do some code cleanup, esp. on the Intel side as this was still developed under Go 1.6 before a lot of new AVX instructions were introduced in 1.7.

Regarding ARM we are first going to look at PMULL (polynomial arithmetic) to speed up Erasure Coding. AES would be interesting too -- do you have a specific need for this?

@ncw

ncw commented Jan 12, 2017

@fwessels I've had a complaint about the speed of SSL with rclone on ARMv8: rclone/rclone#1013 - using the AES instruction would help greatly! That probably affects minio too.

@fwessels
Contributor

@ncw We haven't had the complaint thus far, but it would most likely affect minio too. Are you aware of any ARM sample code out there that could be ported?

Although it would be fun to do, to be honest, it is not high on our priority list, so don't expect anything soon...

@vielmetti

I'm happy to help wrangle sample code and look for testbed resources to do and benchmark this work.

@fwessels
Contributor

@vielmetti That would be great, I did a quick browse around and here is a pointer to some sample code to maybe get you started:

(note that this is actually a perl script which, when executed, generates the actual assembly file to go through the assembler)

Let us know if you have any further questions, we'd be happy to help.

@yonderblue

Would also love the arm64 AES support here or upstream :)

@vielmetti

I cross-linked the upstream issue golang/go#18498 to identify folks who are working on and interested in this.

@fwessels
Contributor

@vielmetti Thank you for the cross-post, we were not aware of this one.

@vielmetti

Thanks @fwessels. I also note @williamweixiao's comments in golang/go#19715 (comment) as relevant.

@williamweixiao

Yes, we (the Arm enterprise language team) plan to optimize hash, crypto and runtime code with ARMv8 SIMD instructions this year. Most of the optimizations will be implemented in assembly, and upstream would like us to add the missing disassembler first, to strengthen the assembler tests before new instructions are added (golang/go#18070).
We are now developing the arm64 decoder, which is expected to be ready next month. After that we will start optimizing CRC, AES and runtime utility functions.
You are welcome to sync with us about optimizing Go for arm64.

@kannappanr

We will revisit the work to upstream our implementation at a later date; closing this issue for now.

hunjixin pushed a commit to ipfs-force-community/sha256-simd that referenced this issue May 23, 2022
```
$ go test -run=NONE -bench .
PASS
BenchmarkHash64-4  	 1000000	      1036 ns/op	  61.77 MB/s
BenchmarkHash128-4 	 2000000	       801 ns/op	 159.67 MB/s
BenchmarkHash1K-4  	  500000	      2464 ns/op	 415.53 MB/s
BenchmarkHash8K-4  	  200000	     11212 ns/op	 730.60 MB/s
BenchmarkHash32K-4 	   30000	     40766 ns/op	 803.80 MB/s
BenchmarkHash128K-4	   10000	    163170 ns/op	 803.28 MB/s
ok  	github.com/minio/blake2b-simd	10.298s
```