Skip to content

Commit 67c998d

Browse files
authored
bpo-41513: Expand comments and add references for a better understanding (GH-22123)
1 parent 63f102f commit 67c998d

File tree

1 file changed

+21
-5
lines changed

1 file changed

+21
-5
lines changed

Modules/mathmodule.c

+21-5
Original file line numberDiff line numberDiff line change
@@ -2419,9 +2419,9 @@ To avoid overflow/underflow and to achieve high accuracy giving results
24192419
that are almost always correctly rounded, four techniques are used:
24202420
24212421
* lossless scaling using a power-of-two scaling factor
2422-
* accurate squaring using Veltkamp-Dekker splitting
2423-
* compensated summation using a variant of the Neumaier algorithm
2424-
* differential correction of the square root
2422+
* accurate squaring using Veltkamp-Dekker splitting [1]
2423+
* compensated summation using a variant of the Neumaier algorithm [2]
2424+
* differential correction of the square root [3]
24252425
24262426
The usual presentation of the Neumaier summation algorithm has an
24272427
expensive branch depending on which operand has the larger
@@ -2456,7 +2456,11 @@ Given that csum >= 1.0, we have:
24562456
Since lo**2 is less than 1/2 ulp(csum), we have csum+lo*lo == csum.
24572457
24582458
To minimize loss of information during the accumulation of fractional
2459-
values, each term has a separate accumulator.
2459+
values, each term has a separate accumulator. This also breaks up
2460+
sequential dependencies in the inner loop so the CPU can maximize
2461+
floating point throughput. [5] On a 2.6 GHz Haswell, adding one
2462+
dimension has an incremental cost of only 5ns -- for example when
2463+
moving from hypot(x,y) to hypot(x,y,z).
24602464
24612465
The square root differential correction is needed because a
24622466
correctly rounded square root of a correctly rounded sum of
@@ -2466,20 +2470,32 @@ The differential correction starts with a value *x* that is
24662470
the difference between the square of *h*, the possibly inaccurately
24672471
rounded square root, and the accurately computed sum of squares.
24682472
The correction is the first order term of the Maclaurin series
2469-
expansion of sqrt(h**2 + x) == h + x/(2*h) + O(x**2).
2473+
expansion of sqrt(h**2 + x) == h + x/(2*h) + O(x**2). [4]
24702474
24712475
Essentially, this differential correction is equivalent to one
24722476
refinement step in Newton's divide-and-average square root
24732477
algorithm, effectively doubling the number of accurate bits.
24742478
This technique is used in Dekker's SQRT2 algorithm and again in
24752479
Borges' ALGORITHM 4 and 5.
24762480
2481+
Without proof for all cases, hypot() cannot claim to be always
2482+
correctly rounded. However for n <= 1000, prior to the final addition
2483+
that rounds the overall result, the internal accuracy of "h" together
2484+
with its correction of "x / (2.0 * h)" is at least 100 bits. [6]
2485+
Also, hypot() was tested against a Decimal implementation with
2486+
prec=300. After 100 million trials, no incorrectly rounded examples
2487+
were found. In addition, perfect commutativity (all permutations are
2488+
exactly equal) was verified for 1 billion random inputs with n=5. [7]
2489+
24772490
References:
24782491
24792492
1. Veltkamp-Dekker splitting: http://csclub.uwaterloo.ca/~pbarfuss/dekker1971.pdf
24802493
2. Compensated summation: http://www.ti3.tu-harburg.de/paper/rump/Ru08b.pdf
24812494
3. Square root differential correction: https://arxiv.org/pdf/1904.09481.pdf
24822495
4. https://www.wolframalpha.com/input/?i=Maclaurin+series+sqrt%28h**2+%2B+x%29+at+x%3D0
2496+
5. https://bugs.python.org/file49439/hypot.png
2497+
6. https://bugs.python.org/file49435/best_frac.py
2498+
7. https://bugs.python.org/file49448/test_hypot_commutativity.py
24832499
24842500
*/
24852501

0 commit comments

Comments
 (0)