This repository was archived by the owner on Apr 28, 2025. It is now read-only.

Add fmodf128 #470

Merged
merged 2 commits into from
Jan 24, 2025

@tgross35 tgross35 commented Jan 24, 2025

This function is significantly slower than all others, so it includes an override in EXTREMELY_SLOW_TESTS. Without it, PR CI takes ~1 hour and the extensive tests in CI take ~1 day.

@tgross35 tgross35 mentioned this pull request Jan 24, 2025

tgross35 commented Jan 24, 2025

Based on the icount, it looks like fmodf128 should be about 28x slower than fmod, which seems reasonable given non-native integer sizes and a few paths with possible f128 multiplication. I might just need to turn down the test iterations here so it completes in a reasonable amount of time.

icount::icount_bench_fmod_group::icount_bench_fmod logspace:setup_fmod()
  Baselines:                      softfloat|softfloat
  Instructions:                     1103106|1109057              (-0.53658%) [-1.00539x]
  L1 Hits:                          1105306|1111254              (-0.53525%) [-1.00538x]
  L2 Hits:                                0|1                    (-100.000%) [---inf---]
  RAM Hits:                               8|10                   (-20.0000%) [-1.25000x]
  Total read+write:                 1105314|1111265              (-0.53552%) [-1.00538x]
  Estimated Cycles:                 1105586|1111609              (-0.54183%) [-1.00545x]
icount::icount_bench_fmodf128_group::icount_bench_fmodf128 logspace:setup_fmodf128()
  Baselines:                      softfloat|softfloat
  Instructions:                    31327744|N/A                  (*********)
  L1 Hits:                         31361122|N/A                  (*********)
  L2 Hits:                                3|N/A                  (*********)
  RAM Hits:                              37|N/A                  (*********)
  Total read+write:                31361162|N/A                  (*********)
  Estimated Cycles:                31362432|N/A                  (*********)
icount::icount_bench_fmodf16_group::icount_bench_fmodf16 logspace:setup_fmodf16()
  Baselines:                      softfloat|softfloat
  Instructions:                       82388|N/A                  (*********)
  L1 Hits:                            97228|N/A                  (*********)
  L2 Hits:                                1|N/A                  (*********)
  RAM Hits:                              15|N/A                  (*********)
  Total read+write:                   97244|N/A                  (*********)
  Estimated Cycles:                   97758|N/A                  (*********)
icount::icount_bench_fmodf_group::icount_bench_fmodf logspace:setup_fmodf()
  Baselines:                      softfloat|softfloat
  Instructions:                      186830|188298               (-0.77962%) [-1.00786x]
  L1 Hits:                           189029|190498               (-0.77114%) [-1.00777x]
  L2 Hits:                                1|0                    (+++inf+++) [+++inf+++]
  RAM Hits:                               8|8                    (No change)
  Total read+write:                  189038|190506               (-0.77058%) [-1.00777x]
  Estimated Cycles:                  189314|190778               (-0.76738%) [-1.00773x]
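
As a quick check of the "about 28x" figure, the instruction counts from the output above can be divided directly. This is only a sketch: the constants are copied from the new-run column of the icount output, and the helper name `slowdown` is introduced here for illustration and is not part of the crate.

```rust
// Instruction counts copied from the icount output above (new-run column).
const FMOD: u64 = 1_103_106;
const FMODF: u64 = 186_830;
const FMODF16: u64 = 82_388;
const FMODF128: u64 = 31_327_744;

/// Hypothetical helper: how many times more instructions `slow` needs than `fast`.
fn slowdown(slow: u64, fast: u64) -> f64 {
    slow as f64 / fast as f64
}

/// Ratios of every variant relative to the f64 `fmod` benchmark.
fn ratios_vs_fmod() -> [(&'static str, f64); 3] {
    [
        ("fmodf16", slowdown(FMODF16, FMOD)),
        ("fmodf", slowdown(FMODF, FMOD)),
        ("fmodf128", slowdown(FMODF128, FMOD)),
    ]
}
```

`slowdown(FMODF128, FMOD)` comes out at roughly 28.4, which matches the estimate in the comment.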

tgross35 commented Jan 24, 2025

I am collecting a baseline for extensive test runtime at #471. fmod is doing slightly under 1M iterations/second. The extensive CI on this PR has fmodf128 at about 50k/s, which loosely correlates with the icount ratio.

Think I might just need to turn down the test iterations here unless there are some obvious perf wins.

Edit: the f64 baseline completed in about 30 minutes.
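
The throughput comparison in this comment can be worked through the same way. The sketch below takes the rounded figures quoted above (about 1M iterations/second for fmod, about 50k/s for fmodf128) as assumptions; the function name `throughput_ratio` is hypothetical.

```rust
// Approximate throughputs quoted in the comment above (iterations per second).
const FMOD_PER_SEC: u64 = 1_000_000; // "slightly under 1M", rounded up
const FMODF128_PER_SEC: u64 = 50_000;

/// Hypothetical helper: how many times slower the slow function runs per iteration.
fn throughput_ratio(fast_per_sec: u64, slow_per_sec: u64) -> f64 {
    fast_per_sec as f64 / slow_per_sec as f64
}

/// Iteration budget that keeps the slow function's wall time roughly equal to
/// what the fast function needs for `default_iterations`.
fn matched_iterations(default_iterations: u64, ratio: f64) -> u64 {
    (default_iterations as f64 / ratio) as u64
}
```

With these rounded inputs the ratio is about 20x, the same order of magnitude as the ~28x icount ratio, which is what "loosely correlates" refers to.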

@tgross35 tgross35 force-pushed the f128-fmod branch 3 times, most recently from 7bcc6d7 to 513aa00 on January 24, 2025 08:21
Certain functions (`fmodf128`) are significantly slower than others,
to the point that running the default number of tests adds tens of
minutes to PR CI and extensive test time increases to ~1 day. It does not
make sense to do this by default, so introduce `EXTREMELY_SLOW_TESTS`
in the test configuration, which allows listing specific tests that need
a reduced iteration count.
This function is significantly slower than all others, so it includes an
override in `EXTREMELY_SLOW_TESTS`. Without it, PR CI takes ~1 hour and
the extensive tests in CI take ~1 day.
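
To make the idea in these commit messages concrete, here is a minimal sketch of a per-test iteration override. Only the name `EXTREMELY_SLOW_TESTS` comes from the PR. The `SlowTest` struct, its fields, the `iteration_count` function, and the reduction factor of 20 are all assumptions made for illustration and may not match the actual implementation.

```rust
/// Hypothetical entry: a test name and the factor by which to cut its iterations.
struct SlowTest {
    name: &'static str,
    reduction_factor: u64,
}

/// The name is taken from the PR; the layout and the factor are assumed.
const EXTREMELY_SLOW_TESTS: &[SlowTest] = &[SlowTest {
    name: "fmodf128",
    reduction_factor: 20, // illustrative value only
}];

/// Return the iteration count for `name`, reduced if the test has an override.
fn iteration_count(name: &str, default_iterations: u64) -> u64 {
    let factor = EXTREMELY_SLOW_TESTS
        .iter()
        .find(|t| t.name == name)
        .map_or(1, |t| t.reduction_factor);
    // Keep at least one iteration so the test still runs.
    (default_iterations / factor).max(1)
}
```

Tests without an entry keep their default count, so the override affects only the functions listed explicitly.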

tgross35 commented

Seems like this is more or less expected. I added a way to override the default number of iterations to make sure tests don't just run forever.

tgross35 commented

With the override, extensive tests complete in 12 minutes and normal PR CI looks about the same. Seems reasonable 🎉

@tgross35 tgross35 merged commit 614c00d into rust-lang:master Jan 24, 2025
35 checks passed
@tgross35 tgross35 deleted the f128-fmod branch January 24, 2025 08:51
tgross35 added a commit that referenced this pull request Apr 18, 2025