albinahlback
Collaborator

@albinahlback albinahlback commented Apr 9, 2024

And clean up some code for related tests.

Added hardcoded squaring routines for $n < 10$, and added high multiplication routines: hardcoded ones for $n \le 8$ and a basecase routine for larger $n$.

Hopefully this Arm routine also works for Apple.

Also fixed the mulhigh tests to check against a generic version, so that all architectures behave the same.
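To make concrete what a generic reference for a high product can look like, here is a minimal sketch in portable C. It is not the FLINT implementation: the names `limb_t`, `ref_mul_n` and `ref_mulhigh_n` are hypothetical, limbs are 32 bits wide so that a `uint64_t` can hold a double-limb product, and the truncation rule (keep only partial products $x_i y_j$ with $i + j \ge n - 1$) is one common convention assumed here for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustration only: 32-bit limbs, so a 64-bit integer holds a
   double-limb product in portable C. B = 2^32 below. */
typedef uint32_t limb_t;

/* Exact schoolbook product: rp[0..2n-1] = xp[0..n-1] * yp[0..n-1]. */
static void ref_mul_n(limb_t *rp, const limb_t *xp, const limb_t *yp, size_t n)
{
    for (size_t k = 0; k < 2 * n; k++)
        rp[k] = 0;

    for (size_t i = 0; i < n; i++)
    {
        uint64_t carry = 0;
        for (size_t j = 0; j < n; j++)
        {
            /* Cannot overflow: (B-1)^2 + 2(B-1) = B^2 - 1. */
            uint64_t t = (uint64_t) xp[i] * yp[j] + rp[i + j] + carry;
            rp[i + j] = (limb_t) t;
            carry = t >> 32;
        }
        rp[i + n] = (limb_t) carry;
    }
}

/* Truncated high product (assumed convention): accumulate only the
   partial products x[i]*y[j] with i + j >= n - 1.
   hp must hold n + 1 limbs: hp[0] is a guard limb of weight B^(n-1),
   hp[1..n] approximate the top n limbs of the exact product from below. */
static void ref_mulhigh_n(limb_t *hp, const limb_t *xp, const limb_t *yp, size_t n)
{
    for (size_t k = 0; k <= n; k++)
        hp[k] = 0;

    for (size_t i = 0; i < n; i++)
    {
        uint64_t carry = 0;
        for (size_t j = n - 1 - i; j < n; j++)
        {
            size_t k = i + j - (n - 1);   /* position relative to B^(n-1) */
            uint64_t t = (uint64_t) xp[i] * yp[j] + hp[k] + carry;
            hp[k] = (limb_t) t;
            carry = t >> 32;
        }
        hp[i + 1] = (limb_t) carry;
    }
}
```

Under this convention the omitted partial products sum to less than $(n-1) B^n$, so `hp[1..n]` should undershoot the exact high half by at most $n - 1$ units in its lowest limb. A test can therefore compare an optimized routine against `ref_mulhigh_n` limb for limb, or against the top half of `ref_mul_n` up to that bound.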

@albinahlback
Collaborator Author

I hope I didn't screw up preprocessor stuff in mpn_extras.h.

@albinahlback
Collaborator Author

Btw, @fredrik-johansson, I only want us to define FLINT_HAVE_NATIVE_xxx if there is a native function for xxx (i.e. it is written in assembly).
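As a sketch of the convention described here, a feature macro can guard the choice between an assembly routine and a generic C fallback. Everything below is hypothetical: `DEMO_HAVE_NATIVE_umul_hi`, `demo_umul_hi` and `demo_umul_hi_is_native` are made-up names for illustration and are not taken from `mpn_extras.h`.

```c
#include <stdint.h>

/* Hypothetical feature macro mirroring the FLINT_HAVE_NATIVE_xxx idea:
   a build would define it only when an assembly routine is linked in.
   It is left undefined here, so the C fallback below is compiled. */
/* #define DEMO_HAVE_NATIVE_umul_hi 1 */

#if defined(DEMO_HAVE_NATIVE_umul_hi)
/* Implemented in assembly elsewhere; only the prototype is visible in C. */
uint32_t demo_umul_hi(uint32_t a, uint32_t b);
#else
/* Generic C fallback: high 32 bits of the 64-bit product a * b. */
static inline uint32_t demo_umul_hi(uint32_t a, uint32_t b)
{
    return (uint32_t) (((uint64_t) a * b) >> 32);
}
#endif

/* Lets callers or tests ask which variant was compiled in. */
static inline int demo_umul_hi_is_native(void)
{
#if defined(DEMO_HAVE_NATIVE_umul_hi)
    return 1;
#else
    return 0;
#endif
}
```

Keeping the macro tied strictly to "an assembly version exists" means code such as profiling or test harnesses can use it to tell which path it is actually measuring.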

@albinahlback
Collaborator Author

Btw, here are some timings from an M1 running Debian, profiling mulhigh:

        mul_n / mulhigh_n || mpfr / flint
n =  1:     0.885x        ||    5.292
n =  2:     1.256x        ||    5.759
n =  3:     1.451x        ||    4.383
n =  4:     1.512x        ||    3.481
n =  5:     1.556x        ||    3.351
n =  6:     1.594x        ||    3.250
n =  7:     1.639x        ||    3.019
n =  8:     1.558x        ||    2.254
n =  9:     1.437x        ||    1.758
n = 10:     1.437x        ||    1.622
n = 11:     1.390x        ||    1.545
n = 12:     1.406x        ||    1.531
n = 13:     1.376x        ||    1.512
n = 14:     1.355x        ||    1.428
n = 15:     1.333x        ||    1.386
n = 16:     1.397x        ||    1.368
n = 17:     1.372x        ||    1.783
n = 18:     1.363x        ||    1.859
n = 19:     1.388x        ||    1.784
n = 20:     1.310x        ||    1.590
n = 21:     1.306x        ||    1.565
n = 22:     1.248x        ||    1.530
n = 23:     1.294x        ||    1.333
n = 24:     1.274x        ||    1.667
n = 25:     1.293x        ||    1.639
n = 26:     1.241x        ||    1.352
n = 27:     1.262x        ||    1.349
n = 28:     1.264x        ||    1.599
n = 29:     1.256x        ||    1.342
n = 30:     1.237x        ||    1.327


Added routines are:

* flint_mpn_sqr_N for N <= 9,

* flint_mpn_mulhigh_N for N <= 8,

* flint_mpn_sqrhigh_N for N <= 8,

* _flint_mpn_mulhigh_basecase which works for n > 8.
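The split between fixed-size routines and a basecase listed above can be sketched as a size-based dispatch. This is an illustration under stated assumptions, not FLINT's code: all `demo_` names are hypothetical, limbs are 32 bits wide, the operation is a full schoolbook multiplication rather than a high product (to keep the sketch short), and the "hardcoded" entries are just the generic loop instantiated with a compile-time constant length instead of hand-written assembly.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;

/* Same cutoff as the N <= 8 routines listed above. */
#define DEMO_HARDCODED_MAX 8

/* Basecase loop: rp[0..2n-1] = xp[0..n-1] * yp[0..n-1]. */
static inline void demo_mul_loop(limb_t *rp, const limb_t *xp,
                                 const limb_t *yp, size_t n)
{
    for (size_t k = 0; k < 2 * n; k++)
        rp[k] = 0;

    for (size_t i = 0; i < n; i++)
    {
        uint64_t carry = 0;
        for (size_t j = 0; j < n; j++)
        {
            uint64_t t = (uint64_t) xp[i] * yp[j] + rp[i + j] + carry;
            rp[i + j] = (limb_t) t;
            carry = t >> 32;
        }
        rp[i + n] = (limb_t) carry;
    }
}

typedef void (*demo_mul_fn)(limb_t *, const limb_t *, const limb_t *);

/* Stand-ins for hardcoded routines: the length is a compile-time constant,
   so a compiler is free to unroll the loops completely. */
#define DEMO_DEFINE_FIXED(N) \
    static void demo_mul_##N(limb_t *rp, const limb_t *xp, const limb_t *yp) \
    { demo_mul_loop(rp, xp, yp, N); }

DEMO_DEFINE_FIXED(1) DEMO_DEFINE_FIXED(2) DEMO_DEFINE_FIXED(3) DEMO_DEFINE_FIXED(4)
DEMO_DEFINE_FIXED(5) DEMO_DEFINE_FIXED(6) DEMO_DEFINE_FIXED(7) DEMO_DEFINE_FIXED(8)

static const demo_mul_fn demo_mul_tab[DEMO_HARDCODED_MAX + 1] = {
    NULL, demo_mul_1, demo_mul_2, demo_mul_3, demo_mul_4,
    demo_mul_5, demo_mul_6, demo_mul_7, demo_mul_8
};

/* Dispatch on n. Returns 1 if a fixed-size entry was used, 0 if the
   basecase loop was used (the return value exists only for testing). */
static int demo_mul(limb_t *rp, const limb_t *xp, const limb_t *yp, size_t n)
{
    if (n >= 1 && n <= DEMO_HARDCODED_MAX)
    {
        demo_mul_tab[n](rp, xp, yp);
        return 1;
    }
    demo_mul_loop(rp, xp, yp, n);
    return 0;
}
```

A table lookup keeps the small-size path to one bounds check and one indirect call, which matters most exactly where the timings in this thread show the largest gains.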

Also optimized flint_mpn_mul_8n for Arm.

@albinahlback
Collaborator Author

Hardcoded sqrhigh routines:

         sqr / sqrhigh || mpfr / flint
n = 1:     1.006x      ||    5.956
n = 2:     1.061x      ||    3.139
n = 3:     1.243x      ||    2.463
n = 4:     1.248x      ||    2.381
n = 5:     1.409x      ||    2.591
n = 6:     1.524x      ||    2.910
n = 7:     1.342x      ||    2.728
n = 8:     1.752x      ||    3.268

@albinahlback albinahlback mentioned this pull request Apr 10, 2024
10 tasks
@albinahlback albinahlback merged commit 25146c4 into flintlib:main Apr 10, 2024
@albinahlback albinahlback deleted the arm_sqr_mulhigh branch April 10, 2024 13:08