common/ompio: fix a floating point division problem #6286

edgargabriel · 2019-01-17T17:13:06Z

This commit fixes a problem reported on the mailing list with
individual writes larger than 512 MB.

The culprit is a floating point division of two large, close values.
Changing the datatypes from float to double (which is what is being
used in the fcoll components) fixes the problem.

See issue #6285 and

https://forum.hdfgroup.org/t/cannot-write-more-than-512-mb-in-1d/5118

Signed-off-by: Edgar Gabriel [email protected]

ax3l · 2019-01-17T17:20:32Z

ompi/mca/common/ompio/common_ompio_file_read.c

 	bytes_per_cycle = OMPIO_MCA_GET(fh, cycle_buffer_size);
    }
-    cycles = ceil((float)max_data/bytes_per_cycle);
+    cycles = ceil((double)max_data/bytes_per_cycle);


please don't change it from

ceil((float)max_data/bytes_per_cycle; // to ceil((double)max_data/bytes_per_cycle;

which only defers the issue.
Instead, just cast both max_data and bytes_per_cycle to uint64_t to be platform agnostic and convert it to an "integer-ceil" via:

(max_data + bytes_per_cycle - 1) / bytes_per_cycle

jsquyres · 2019-01-17T18:08:20Z

@edgargabriel Can you put a "Thanks to ..." note in your commit message? It helps us when creating NEWS items for the release.

Thanks!

edgargabriel · 2019-01-17T19:41:56Z

done both requests, thanks!

ompi/mca/common/ompio/common_ompio_file_write.c

psychocoderHPC · 2019-01-18T11:57:52Z

ompi/mca/common/ompio/common_ompio_file_read.c

 	bytes_per_cycle = OMPIO_MCA_GET(fh, cycle_buffer_size);
    }
-    cycles = ceil((float)max_data/bytes_per_cycle);
+    cycles = (max_data + bytes_per_cycle - 1) / bytes_per_cycle;


As @ax3l wrote here I would suggest to use uint64_t at this place else this equation will reduce the possible range of max_data on a 32Bit system to [0;max_data - bytes_per_cycle). All values larger for max_data will produce an overflow.
With the current PR it would not be possible to write ~4GiB data on a 32Bit system.

A possible other solution if the type should stay size_t is: cycles = max_data / bytes_per_cycle + (max_data % bytes_per_cycle ? 1u : 0u);

This integral ceil is also descripted already in the OpenMMPi code base:

ompi/ompi/mca/topo/treematch/treematch/uthash.h

Line 627 in 387b2ff

* ceil(n/b) = (n/b) + ((n%b)?1:0)

Well, honestly I think going back to the ceil solution might the `easiest way out, and simple avoid the type cast to a float. The problem is that changing the 'cycles' variable to uint64_t forces you to change that also for the 'index; variable (otherwise you get a warning in the for loop), and the interface of the build_io_array function which is used all over the code base. Not sure I want to make this sweeping change at this point.

Another solution could be to calculate the no of cycles into a temporary a uint64_t, and then assign that value using an int typcast to 'cycles'. This would have the advantage that we do not have to change the interfaces, and till get rid of the ceil function. An int as the datatype for the no_of_cycles is perfectly fine, you can read/write 512 MB per cycle, so (if my math is correct) that would still allow for 1024 PB of data overall being written, that should be good enough for the next decade.

Rather the latter, if you ask me. Be aware that there is no int-ceil in C, which would mean even without the cast it is an implicit cast to a float/double type...

What's the exact warning you see now? We actually refer to the variable max_data (and bytes_per_cycle to avoid type mixing) where we suggest to use uint64_t, not cycles.

@edgargabriel have you seen this comment? :) We just mentioned that max_data on 32-bit systems will limit writes to 4 Gbyte as it is only size_t. That's our concern.

Regarding the cycles var: it should probably also be at least of size_t, better also a fixes large integer type as you just added. An int is most of the time 2 to 4 bytes. With an upper bound of 32767 cycles on some 32-bit systems and 512 MByte per cycle this is as well "only" 16 PByte. (4 Byte ints are indeed opening this to 1 EByte). Not as critical here, but max_data is a concern and should be of larger type.

Similar to open-mpi#6286 rounding number of bytes into a single precision floating point value to round up the result of a division is a potential risk due to rounding errors. - remove floating point operations for `round up` - removes floating point conversion for round down (native behavior of integer division)

Similar to open-mpi#6286 rounding number of bytes into a single precision floating point value to round up the result of a division is a potential risk due to rounding errors. - remove floating point operations for `round up` - removes floating point conversion for round down (native behavior of integer division) Signed-off-by: René Widera <[email protected]>

edgargabriel · 2019-01-21T17:06:16Z

The failure in the pull request build checker is unrelated to this commit:
java.nio.file.FileSystemException: /global/homes/h/hpp/jenkins_cori/workspace/open-mpi.build.platforms/Platform/cray_cle: Disk quota exceeded

This commit fixes a problem reported on the mailing list with individual writes larger than 512 MB. The culprit is a floating point division of two large, close values. Changing the datatypes from float to double (which is what is being used in the fcoll components) fixes the problem. See issue open-mpi#6285 and https://forum.hdfgroup.org/t/cannot-write-more-than-512-mb-in-1d/5118 Thanks for Axel Huebl and René Widera for reporting the issue. Signed-off-by: Edgar Gabriel <[email protected]>

edgargabriel · 2019-01-21T23:59:39Z

You are right. I spent now more time on this (what was supposed to be ) one line fix than I can justify for my work. I am going back to my original solution, which was technically perfectly fine, and introduces the least amount of change from one release of the 3.1.x series to the next one (and dito for 4.0.x), and therefore the least surprise. Applications and tests that worked before, will also continue to work, and the bug originally reported in this issue is fixed. Thanks.

ax3l · 2019-01-22T08:06:56Z

Let us please just do a minimal fix as in #6287, the double-cast is not a clean solution, independent of the other aspect discussed regarding ranges.

These two lines are all that need to be changed, no better or worse with regards to ranges, but fixing the reported crash from the rounding issue in a future-proof manner:

diff --git a/ompi/mca/common/ompio/common_ompio_file_read.c b/ompi/mca/common/ompio/common_ompio_file_read.c
index 3203e2a697..49f0c89532 100644
--- a/ompi/mca/common/ompio/common_ompio_file_read.c
+++ b/ompi/mca/common/ompio/common_ompio_file_read.c
@@ -132,7 +132,7 @@ int mca_common_ompio_file_read (ompio_file_t *fh,
     else {
        bytes_per_cycle = OMPIO_MCA_GET(fh, cycle_buffer_size);
     }
-    cycles = ceil((float)max_data/bytes_per_cycle);
+    cycles = (max_data + bytes_per_cycle - 1) / bytes_per_cycle;
 
 #if 0
        printf ("Bytes per Cycle: %d   Cycles: %d max_data:%d \n",bytes_per_cycle, cycles, max_data);
diff --git a/ompi/mca/common/ompio/common_ompio_file_write.c b/ompi/mca/common/ompio/common_ompio_file_write.c
index e53a1d080b..6ea8bae064 100644
--- a/ompi/mca/common/ompio/common_ompio_file_write.c
+++ b/ompi/mca/common/ompio/common_ompio_file_write.c
@@ -116,7 +116,7 @@ int mca_common_ompio_file_write (ompio_file_t *fh,
     else {
        bytes_per_cycle = OMPIO_MCA_GET(fh, cycle_buffer_size);
     }
-    cycles = ceil((float)max_data/bytes_per_cycle);
+    cycles = (max_data + bytes_per_cycle - 1) / bytes_per_cycle;
 
 #if 0
     printf ("Bytes per Cycle: %d   Cycles: %d\n", bytes_per_cycle, cycles);

Everything else one can address and refactor later.

psychocoderHPC · 2019-01-22T10:59:43Z

Let us please just do a minimal fix as in #6287, the double-cast is not a clean solution, independent of the other aspect discussed regarding ranges.

These two lines are all that need to be changed, no better or worse with regards to ranges, but fixing the reported crash from the rounding issue in a future-proof manner:

diff --git a/ompi/mca/common/ompio/common_ompio_file_read.c b/ompi/mca/common/ompio/common_ompio_file_read.c
index 3203e2a697..49f0c89532 100644
--- a/ompi/mca/common/ompio/common_ompio_file_read.c
+++ b/ompi/mca/common/ompio/common_ompio_file_read.c
@@ -132,7 +132,7 @@ int mca_common_ompio_file_read (ompio_file_t *fh,
     else {
        bytes_per_cycle = OMPIO_MCA_GET(fh, cycle_buffer_size);
     }
-    cycles = ceil((float)max_data/bytes_per_cycle);
+    cycles = (max_data + bytes_per_cycle - 1) / bytes_per_cycle;
 
 #if 0
        printf ("Bytes per Cycle: %d   Cycles: %d max_data:%d \n",bytes_per_cycle, cycles, max_data);
diff --git a/ompi/mca/common/ompio/common_ompio_file_write.c b/ompi/mca/common/ompio/common_ompio_file_write.c
index e53a1d080b..6ea8bae064 100644
--- a/ompi/mca/common/ompio/common_ompio_file_write.c
+++ b/ompi/mca/common/ompio/common_ompio_file_write.c
@@ -116,7 +116,7 @@ int mca_common_ompio_file_write (ompio_file_t *fh,
     else {
        bytes_per_cycle = OMPIO_MCA_GET(fh, cycle_buffer_size);
     }
-    cycles = ceil((float)max_data/bytes_per_cycle);
+    cycles = (max_data + bytes_per_cycle - 1) / bytes_per_cycle;
 
 #if 0
     printf ("Bytes per Cycle: %d   Cycles: %d\n", bytes_per_cycle, cycles);

Everything else one can address and refactor later.

cycles = max_data / bytes_per_cycle + (max_data % bytes_per_cycle ? 1u : 0u); (without overflow risk) must be used insead of cycles = (max_data + bytes_per_cycle - 1) / bytes_per_cycle; to guarantee that you can write 4GiB on 32bit a systems.

ax3l · 2019-01-22T11:53:05Z

@psychocoderHPC true, otherwise we would cap the upper range a little.
So the diff we propose is:

diff --git a/ompi/mca/common/ompio/common_ompio_file_read.c b/ompi/mca/common/ompio/common_ompio_file_read.c
index 3203e2a697..fb863a3821 100644
--- a/ompi/mca/common/ompio/common_ompio_file_read.c
+++ b/ompi/mca/common/ompio/common_ompio_file_read.c
@@ -132,7 +132,7 @@ int mca_common_ompio_file_read (ompio_file_t *fh,
     else {
        bytes_per_cycle = OMPIO_MCA_GET(fh, cycle_buffer_size);
     }
-    cycles = ceil((float)max_data/bytes_per_cycle);
+    cycles = max_data / bytes_per_cycle + (max_data % bytes_per_cycle > 0u ? 1u : 0u);
 
 #if 0
        printf ("Bytes per Cycle: %d   Cycles: %d max_data:%d \n",bytes_per_cycle, cycles, max_data);
diff --git a/ompi/mca/common/ompio/common_ompio_file_write.c b/ompi/mca/common/ompio/common_ompio_file_write.c
index e53a1d080b..d60c68828f 100644
--- a/ompi/mca/common/ompio/common_ompio_file_write.c
+++ b/ompi/mca/common/ompio/common_ompio_file_write.c
@@ -116,7 +116,7 @@ int mca_common_ompio_file_write (ompio_file_t *fh,
     else {
        bytes_per_cycle = OMPIO_MCA_GET(fh, cycle_buffer_size);
     }
-    cycles = ceil((float)max_data/bytes_per_cycle);
+    cycles = max_data / bytes_per_cycle + (max_data % bytes_per_cycle > 0u ? 1u : 0u);
 
 #if 0
     printf ("Bytes per Cycle: %d   Cycles: %d\n", bytes_per_cycle, cycles);

(Slightly edited since I would avoid implicit integer to bool conversions as they both potentially thrown warnings and outsmart the reader)

edgargabriel · 2019-01-22T14:35:01Z

There is no advantage to your solution compared to mine. So I do not see the point of this. This is the end of the discussion from my point of view.

jsquyres · 2019-01-22T14:39:42Z

bot:ompi:retest

psychocoderHPC · 2019-01-22T14:51:09Z

@edgargabriel The point is that double floating point numbers has 53 significant digits. This means we can represent all byte numbers up to 8PiB without precision loss. After 8PiB we get the same problem again. So will run into the same issue in the next 5 to 10 years, again.

edgargabriel · 2019-01-22T14:57:04Z

I doubt that in my lifetime one 'internal iteration' will be increased from 512MB to 8PiB. This is not the overall amount of data that we are talking about.

jsquyres · 2019-01-22T15:02:06Z

Let me ask what might be a naieve question (I'm not an IO expert and I'm not very familiar with this code). It looks like this code is looping over reading bytes_per_cycle bytes each iteration.

Is it feasible to read 8PiB in a single iteration? Or will we be using some other algorithm for that?

(...and now I see Edgar just made a similar observation -- that was not intentional! 😄)

ax3l · 2019-01-22T15:24:16Z

All fine with us, we are not trying to nag.
We just searched for a solution that avoids a cast, since this is pure byte-counter math, and is type-agnostic in case the underlying types get refactored some day.

edgargabriel · 2019-01-24T15:58:45Z

bot:ompi:retest

edgargabriel · 2019-01-25T15:15:51Z

bot:ompi:retest

ax3l · 2019-01-25T17:40:04Z

One of the CI servers has run out of disk quota.

hppritcha · 2019-01-25T22:06:55Z

bot:lanl:retest

hppritcha · 2019-01-26T20:44:17Z

bot:ompi:retest

edgargabriel · 2019-01-28T14:28:39Z

@hppritcha we are still seeing the same quota problem on the cray, which leads to the failed builds:

final close failed: Disk quota exceeded

hppritcha · 2019-01-28T16:24:42Z

bot:ompi:retest

Similar to open-mpi#6286 rounding number of bytes into a single precision floating point value to round up the result of a division is a potential risk due to rounding errors. - remove floating point operations for `round up` - removes floating point conversion for round down (native behavior of integer division) Signed-off-by: René Widera <[email protected]>

Similar to open-mpi#6286 rounding number of bytes into a single precision floating point value to round up the result of a division is a potential risk due to rounding errors. - remove floating point operations for `round up` - removes floating point conversion for round down (native behavior of integer division) Signed-off-by: René Widera <[email protected]> (cherry picked from commit a91fab8)

Similar to open-mpi#6286 rounding number of bytes into a single precision floating point value to round up the result of a division is a potential risk due to rounding errors. - remove floating point operations for `round up` - removes floating point conversion for round down (native behavior of integer division) Signed-off-by: René Widera <[email protected]> Note: a direct cherry pick of commit a91fab8 is not possible, due to structural differences between the the 3.1.x and the master/v4.0.x branch. This commit is the equivalent of commit a91fab8. Signed-off-by: Edgar Gabriel <[email protected]>

ax3l suggested changes Jan 17, 2019

View reviewed changes

jsquyres added bug NEWS Target: main Target: v5.0.x labels Jan 17, 2019

edgargabriel force-pushed the pr/floating-point-division-problem branch from d7c394e to 431b19a Compare January 17, 2019 19:40

ax3l suggested changes Jan 17, 2019

View reviewed changes

ompi/mca/common/ompio/common_ompio_file_write.c Outdated Show resolved Hide resolved

ax3l mentioned this pull request Jan 17, 2019

MPI I/O Crashes and Corruption in HDF5 #6285

Closed

2 tasks

psychocoderHPC reviewed Jan 18, 2019

View reviewed changes

psychocoderHPC mentioned this pull request Jan 18, 2019

common/ompio: possible rounding issue #6287

Merged

edgargabriel force-pushed the pr/floating-point-division-problem branch from 431b19a to 5cae28b Compare January 21, 2019 15:56

edgargabriel force-pushed the pr/floating-point-division-problem branch from 5cae28b to c0f8ce0 Compare January 21, 2019 23:59

edgargabriel merged commit 7023357 into open-mpi:master Jan 29, 2019

ax3l mentioned this pull request Feb 23, 2019

Prepare for v3.1.4 release #6414

Merged

ax3l mentioned this pull request Feb 23, 2019

Pr/v3.0.x OMPIO Crash on Large Data #6428

Merged

edgargabriel deleted the pr/floating-point-division-problem branch November 26, 2019 16:38

common/ompio: fix a floating point division problem #6286

common/ompio: fix a floating point division problem #6286

Uh oh!

Conversation

edgargabriel commented Jan 17, 2019

Uh oh!

ax3l Jan 17, 2019

Choose a reason for hiding this comment

Uh oh!

jsquyres commented Jan 17, 2019

Uh oh!

edgargabriel commented Jan 17, 2019

Uh oh!

Uh oh!

psychocoderHPC Jan 18, 2019

Choose a reason for hiding this comment

Uh oh!

psychocoderHPC Jan 18, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

edgargabriel Jan 20, 2019

Choose a reason for hiding this comment

Uh oh!

edgargabriel Jan 20, 2019

Choose a reason for hiding this comment

Uh oh!

ax3l Jan 21, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ax3l Jan 21, 2019

Choose a reason for hiding this comment

Uh oh!

edgargabriel commented Jan 21, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

edgargabriel commented Jan 21, 2019

Uh oh!

ax3l commented Jan 22, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

psychocoderHPC commented Jan 22, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ax3l commented Jan 22, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

edgargabriel commented Jan 22, 2019

Uh oh!

jsquyres commented Jan 22, 2019

Uh oh!

psychocoderHPC commented Jan 22, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

edgargabriel commented Jan 22, 2019

Uh oh!

jsquyres commented Jan 22, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ax3l commented Jan 22, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

edgargabriel commented Jan 24, 2019

Uh oh!

edgargabriel commented Jan 25, 2019

Uh oh!

ax3l commented Jan 25, 2019 via email

Uh oh!

hppritcha commented Jan 25, 2019

Uh oh!

hppritcha commented Jan 26, 2019

Uh oh!

edgargabriel commented Jan 28, 2019

Uh oh!

hppritcha commented Jan 28, 2019

Uh oh!

Reviewers

Assignees

Labels

psychocoderHPC Jan 18, 2019 •

edited

Loading

ax3l Jan 21, 2019 •

edited

Loading

edgargabriel commented Jan 21, 2019 •

edited

Loading

ax3l commented Jan 22, 2019 •

edited

Loading

psychocoderHPC commented Jan 22, 2019 •

edited

Loading

ax3l commented Jan 22, 2019 •

edited

Loading

psychocoderHPC commented Jan 22, 2019 •

edited

Loading

jsquyres commented Jan 22, 2019 •

edited

Loading

ax3l commented Jan 22, 2019 •

edited

Loading