Assembly for Arm v8.5-A ISA

I'm sure everyone has noticed that Apple's M-series chips are now roughly as fast as state-of-the-art x86 processors (see [GMP's benchmark results](https://gmplib.org/gmpbench)). I therefore think we should implement assembly routines for them as well.

These are the routines that should be implemented, with their current status:
- [x] Hard(ish)coded multiplication (treated in #1808, works as a full replacement for `mpn_mul_basecase`)
- [x] Hardcoded squaring (treated in #1912)
- [x] Hardcoded high multiplication (treated in #1912)
- [x] Hardcoded high squaring (treated in #1912)
- [x] High multiplication, basecase (treated in #1912)
- [ ] High squaring, basecase
- [ ] Hardcoded low multiplication
- [ ] Hardcoded low squaring
- [ ] Low multiplication, basecase
- [ ] Low squaring, basecase
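
As a point of reference for the items above, here is a minimal portable C sketch of what a full basecase product and a "low" product compute. It assumes 64-bit limbs and assumes that "low multiplication" means returning only the low `n` limbs of the `2n`-limb product; this issue does not spell out the exact semantics. The names `limb_t`, `ref_mul_n` and `ref_mullow_n` are hypothetical and not part of any existing API, and `unsigned __int128` is a GCC/Clang extension. An assembly routine could be checked against something like this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t; /* hypothetical stand-in for a 64-bit limb type */

/* Schoolbook product: rp[0..2n) = ap[0..n) * bp[0..n).
   rp must not overlap ap or bp. */
void ref_mul_n(limb_t *rp, const limb_t *ap, const limb_t *bp, size_t n)
{
    assert(n > 0);
    for (size_t i = 0; i < 2 * n; i++)
        rp[i] = 0;

    for (size_t i = 0; i < n; i++) {
        limb_t carry = 0;
        for (size_t j = 0; j < n; j++) {
            /* 64x64 -> 128-bit multiply plus two 64-bit addends
               cannot overflow 128 bits */
            unsigned __int128 t =
                (unsigned __int128)ap[j] * bp[i] + rp[i + j] + carry;
            rp[i + j] = (limb_t)t;
            carry = (limb_t)(t >> 64);
        }
        rp[i + n] = carry; /* this limb has not been written by earlier rows */
    }
}

/* Assumed "low" product: rp[0..n) = low n limbs of ap[0..n) * bp[0..n).
   Partial products that only affect limbs >= n are skipped. */
void ref_mullow_n(limb_t *rp, const limb_t *ap, const limb_t *bp, size_t n)
{
    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        rp[i] = 0;

    for (size_t i = 0; i < n; i++) {
        limb_t carry = 0;
        for (size_t j = 0; i + j < n; j++) {
            unsigned __int128 t =
                (unsigned __int128)ap[j] * bp[i] + rp[i + j] + carry;
            rp[i + j] = (limb_t)t;
            carry = (limb_t)(t >> 64);
        }
        /* the carry out of limb n-1 is discarded */
    }
}
```

The low variant does roughly half the limb multiplications of the full product, which is the motivation for giving it a dedicated routine.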

Useful links:
1. https://dougallj.github.io/applecpu/firestorm.html
2. https://dougallj.github.io/applecpu/firestorm-int.html
3. https://dougallj.github.io/applecpu/firestorm-simd.html
4. https://developer.arm.com/architectures/instruction-sets/intrinsics/
5. https://developer.arm.com/documentation/ddi0602/2023-12?lang=en
6. https://github.com/corsix/amx
7. https://stackoverflow.com/questions/70717360/how-to-load-vector-registers-from-integer-registers-in-arm64-m1
