Add mpn_sqr and mpn_mulhigh routines for Arm #1912

albinahlback · 2024-04-09T16:03:55Z

Also cleans up some code for the related tests.

Added hardcoded squaring routines for $n < 10$, and added high multiplication routines: hardcoded ones for $n \le 8$ and a basecase routine for $n > 8$.

Hopefully this Arm routine also works for Apple.

Also fixed the mulhigh tests to check against a generic version, so that all architectures behave the same.
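To illustrate what checking against a generic version can look like, here is a minimal sketch in portable C. All names (ref_mul_from, ref_mulhigh_exact, ref_mulhigh_approx, REF_MAX_LIMBS) are hypothetical and not taken from FLINT, and it uses 32-bit limbs so that plain uint64_t arithmetic suffices. The approximate variant skips the partial products x[i]*y[j] with i + j < n - 1, which is one common way to make a high product cheaper; whether the routines in this PR drop exactly those terms is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define REF_MAX_LIMBS 64 /* arbitrary cap for this sketch */

/* Schoolbook product restricted to partial products x[i]*y[j] with
   i + j >= min_idx. Limbs are 32-bit, least significant first.
   res must hold 2*n limbs; min_idx = 0 gives the exact product. */
static void ref_mul_from(uint32_t *res, const uint32_t *x, const uint32_t *y,
                         size_t n, size_t min_idx)
{
    memset(res, 0, 2 * n * sizeof(uint32_t));
    for (size_t i = 0; i < n; i++) {
        size_t j0 = (min_idx > i) ? min_idx - i : 0;
        uint64_t carry = 0;
        for (size_t j = j0; j < n; j++) {
            /* Cannot overflow: (B-1)^2 + 2*(B-1) = B^2 - 1 with B = 2^32. */
            uint64_t t = (uint64_t) x[i] * y[j] + res[i + j] + carry;
            res[i + j] = (uint32_t) t;
            carry = t >> 32;
        }
        res[i + n] = (uint32_t) carry; /* this limb is still zero before row i */
    }
}

/* Exact high n limbs of x*y (hypothetical reference). */
static void ref_mulhigh_exact(uint32_t *hi, const uint32_t *x,
                              const uint32_t *y, size_t n)
{
    uint32_t full[2 * REF_MAX_LIMBS];
    assert(n >= 1 && n <= REF_MAX_LIMBS);
    ref_mul_from(full, x, y, n, 0);
    memcpy(hi, full + n, n * sizeof(uint32_t));
}

/* Approximate high n limbs: terms with i + j < n - 1 are skipped.
   The result is <= the exact value and smaller by at most n - 1. */
static void ref_mulhigh_approx(uint32_t *hi, const uint32_t *x,
                               const uint32_t *y, size_t n)
{
    uint32_t full[2 * REF_MAX_LIMBS];
    assert(n >= 1 && n <= REF_MAX_LIMBS);
    ref_mul_from(full, x, y, n, n - 1);
    memcpy(hi, full + n, n * sizeof(uint32_t));
}
```

A test in this style would run an architecture-specific routine and a reference like the above on the same random inputs and compare the high limbs, allowing whatever error bound the routine documents.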

albinahlback · 2024-04-09T16:04:34Z

I hope I didn't screw up preprocessor stuff in mpn_extras.h.

albinahlback · 2024-04-09T16:05:25Z

Btw, @fredrik-johansson, I only want us to define FLINT_HAVE_NATIVE_xxx if there is a native function for xxx (i.e. it is written in assembly).
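A minimal sketch of the convention being proposed, using made-up names (DEMO_HAVE_NATIVE_mulhigh_1, demo_mulhigh_1) rather than the actual macros in mpn_extras.h: the macro is defined only when an assembly implementation exists, and C code falls back to a generic version otherwise.

```c
#include <stdint.h>

/* A build that ships an assembly routine would define this, for example in
   a generated config header. Hypothetical name; left undefined here. */
/* #define DEMO_HAVE_NATIVE_mulhigh_1 1 */

#if defined(DEMO_HAVE_NATIVE_mulhigh_1)
/* Native case: only a declaration; the body lives in an assembly file. */
uint32_t demo_mulhigh_1(uint32_t x, uint32_t y);
#else
/* Generic C fallback: high limb of a 32x32-bit product. */
static uint32_t demo_mulhigh_1(uint32_t x, uint32_t y)
{
    return (uint32_t) (((uint64_t) x * y) >> 32);
}
#endif

/* Callers can test the macro to learn which variant they got. */
static int demo_mulhigh_1_is_native(void)
{
#if defined(DEMO_HAVE_NATIVE_mulhigh_1)
    return 1;
#else
    return 0;
#endif
}
```

With this split, a defined macro always means "there is assembly for this", never "a C wrapper happens to exist".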

albinahlback · 2024-04-09T16:09:33Z

Btw, here are some timings from an M1 running Debian, profiling mulhigh:

        mul_n / mulhigh_n || mpfr / flint
n =  1:     0.885x        ||    5.292
n =  2:     1.256x        ||    5.759
n =  3:     1.451x        ||    4.383
n =  4:     1.512x        ||    3.481
n =  5:     1.556x        ||    3.351
n =  6:     1.594x        ||    3.250
n =  7:     1.639x        ||    3.019
n =  8:     1.558x        ||    2.254
n =  9:     1.437x        ||    1.758
n = 10:     1.437x        ||    1.622
n = 11:     1.390x        ||    1.545
n = 12:     1.406x        ||    1.531
n = 13:     1.376x        ||    1.512
n = 14:     1.355x        ||    1.428
n = 15:     1.333x        ||    1.386
n = 16:     1.397x        ||    1.368
n = 17:     1.372x        ||    1.783
n = 18:     1.363x        ||    1.859
n = 19:     1.388x        ||    1.784
n = 20:     1.310x        ||    1.590
n = 21:     1.306x        ||    1.565
n = 22:     1.248x        ||    1.530
n = 23:     1.294x        ||    1.333
n = 24:     1.274x        ||    1.667
n = 25:     1.293x        ||    1.639
n = 26:     1.241x        ||    1.352
n = 27:     1.262x        ||    1.349
n = 28:     1.264x        ||    1.599
n = 29:     1.256x        ||    1.342
n = 30:     1.237x        ||    1.327

Added routines are:

* flint_mpn_sqr_N for N <= 9,
* flint_mpn_mulhigh_N for N <= 8,
* flint_mpn_sqrhigh_N for N <= 8,
* _flint_mpn_mulhigh_basecase, which works for n > 8.

Also optimized flint_mpn_mul_8n for Arm.

albinahlback · 2024-04-10T12:52:53Z

Hardcoded sqrhigh routines:

         sqr / sqrhigh || mpfr / flint
n = 1:     1.006x      ||    5.956
n = 2:     1.061x      ||    3.139
n = 3:     1.243x      ||    2.463
n = 4:     1.248x      ||    2.381
n = 5:     1.409x      ||    2.591
n = 6:     1.524x      ||    2.910
n = 7:     1.342x      ||    2.728
n = 8:     1.752x      ||    3.268

albinahlback force-pushed the arm_sqr_mulhigh branch from 493cc84 to cf6b4a4 on April 10, 2024 12:46

albinahlback mentioned this pull request Apr 10, 2024, in "Assembly for Arm v8.5-A ISA #1806"

albinahlback merged commit 25146c4 into flintlib:main Apr 10, 2024

albinahlback deleted the arm_sqr_mulhigh branch April 10, 2024 13:08
