Skip to content

Conversation

@edgargabriel
Copy link
Member

This commit fixes a problem reported on the mailing list with
individual writes larger than 512 MB.

The culprit is a floating point division of two large, close values.
Changing the datatypes from float to double (which is what is being
used in the fcoll components) fixes the problem.

See issue #6285 and

https://forum.hdfgroup.org/t/cannot-write-more-than-512-mb-in-1d/5118

Signed-off-by: Edgar Gabriel [email protected]

bytes_per_cycle = OMPIO_MCA_GET(fh, cycle_buffer_size);
}
cycles = ceil((float)max_data/bytes_per_cycle);
cycles = ceil((double)max_data/bytes_per_cycle);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please don't change it from

ceil((float)max_data/bytes_per_cycle;
// to
ceil((double)max_data/bytes_per_cycle;

which only defers the issue.
Instead, just cast both max_data and bytes_per_cycle to uint64_t to be platform agnostic and convert it to an "integer-ceil" via:

(max_data + bytes_per_cycle - 1) / bytes_per_cycle

@jsquyres
Copy link
Member

@edgargabriel Can you put a "Thanks to ..." note in your commit message? It helps us when creating NEWS items for the release.

Thanks!

@edgargabriel
Copy link
Member Author

done both requests, thanks!

@ax3l ax3l mentioned this pull request Jan 17, 2019
2 tasks
bytes_per_cycle = OMPIO_MCA_GET(fh, cycle_buffer_size);
}
cycles = ceil((float)max_data/bytes_per_cycle);
cycles = (max_data + bytes_per_cycle - 1) / bytes_per_cycle;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As @ax3l wrote here I would suggest to use uint64_t at this place else this equation will reduce the possible range of max_data on a 32Bit system to [0;max_data - bytes_per_cycle). All values larger for max_data will produce an overflow.
With the current PR it would not be possible to write ~4GiB data on a 32Bit system.

Copy link
Contributor

@psychocoderHPC psychocoderHPC Jan 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A possible other solution if the type should stay size_t is: cycles = max_data / bytes_per_cycle + (max_data % bytes_per_cycle ? 1u : 0u);

This integral ceil is also descripted already in the OpenMMPi code base:

* ceil(n/b) = (n/b) + ((n%b)?1:0)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, honestly I think going back to the ceil solution might the `easiest way out, and simple avoid the type cast to a float. The problem is that changing the 'cycles' variable to uint64_t forces you to change that also for the 'index; variable (otherwise you get a warning in the for loop), and the interface of the build_io_array function which is used all over the code base. Not sure I want to make this sweeping change at this point.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another solution could be to calculate the no of cycles into a temporary a uint64_t, and then assign that value using an int typcast to 'cycles'. This would have the advantage that we do not have to change the interfaces, and till get rid of the ceil function. An int as the datatype for the no_of_cycles is perfectly fine, you can read/write 512 MB per cycle, so (if my math is correct) that would still allow for 1024 PB of data overall being written, that should be good enough for the next decade.

Copy link

@ax3l ax3l Jan 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather the latter, if you ask me. Be aware that there is no int-ceil in C, which would mean even without the cast it is an implicit cast to a float/double type...

What's the exact warning you see now? We actually refer to the variable max_data (and bytes_per_cycle to avoid type mixing) where we suggest to use uint64_t, not cycles.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@edgargabriel have you seen this comment? :) We just mentioned that max_data on 32-bit systems will limit writes to 4 Gbyte as it is only size_t. That's our concern.

Regarding the cycles var: it should probably also be at least of size_t, better also a fixes large integer type as you just added. An int is most of the time 2 to 4 bytes. With an upper bound of 32767 cycles on some 32-bit systems and 512 MByte per cycle this is as well "only" 16 PByte. (4 Byte ints are indeed opening this to 1 EByte). Not as critical here, but max_data is a concern and should be of larger type.

psychocoderHPC added a commit to psychocoderHPC/ompi that referenced this pull request Jan 18, 2019
Similar to open-mpi#6286 rounding number of bytes into a single precision floating point value to round up the result of a division is a potential risk due to rounding errors.

- remove floating point operations for `round up`
- removes floating point conversion for round down (native behavior of integer division)
psychocoderHPC added a commit to psychocoderHPC/ompi that referenced this pull request Jan 18, 2019
Similar to open-mpi#6286 rounding number of bytes into a single precision floating point value to round up the result of a division is a potential risk due to rounding errors.

- remove floating point operations for `round up`
- removes floating point conversion for round down (native behavior of integer division)

Signed-off-by: René Widera <[email protected]>
@edgargabriel edgargabriel force-pushed the pr/floating-point-division-problem branch from 431b19a to 5cae28b Compare January 21, 2019 15:56
edgargabriel pushed a commit to edgargabriel/ompi that referenced this pull request Jan 21, 2019
Similar to open-mpi#6286 rounding number of bytes into a single precision floating point value to round up the result of a division is a potential risk due to rounding errors.

- remove floating point operations for `round up`
- removes floating point conversion for round down (native behavior of integer division)

Signed-off-by: René Widera <[email protected]>
edgargabriel pushed a commit to edgargabriel/ompi that referenced this pull request Jan 21, 2019
Similar to open-mpi#6286 rounding number of bytes into a single precision floating point value to round up the result of a division is a potential risk due to rounding errors.

- remove floating point operations for `round up`
- removes floating point conversion for round down (native behavior of integer division)

Signed-off-by: René Widera <[email protected]>
@edgargabriel
Copy link
Member Author

edgargabriel commented Jan 21, 2019

The failure in the pull request build checker is unrelated to this commit:
java.nio.file.FileSystemException: /global/homes/h/hpp/jenkins_cori/workspace/open-mpi.build.platforms/Platform/cray_cle: Disk quota exceeded

This commit fixes  a problem reported on the mailing list with
individual writes larger than 512 MB.

The culprit is a floating point division of two large, close values.
Changing the datatypes from float to double (which is what is being
used in the fcoll components) fixes the problem.

See issue open-mpi#6285 and

 https://forum.hdfgroup.org/t/cannot-write-more-than-512-mb-in-1d/5118

Thanks for Axel Huebl and René Widera for reporting the issue.

Signed-off-by: Edgar Gabriel <[email protected]>
@edgargabriel edgargabriel force-pushed the pr/floating-point-division-problem branch from 5cae28b to c0f8ce0 Compare January 21, 2019 23:59
@edgargabriel
Copy link
Member Author

You are right. I spent now more time on this (what was supposed to be ) one line fix than I can justify for my work. I am going back to my original solution, which was technically perfectly fine, and introduces the least amount of change from one release of the 3.1.x series to the next one (and dito for 4.0.x), and therefore the least surprise. Applications and tests that worked before, will also continue to work, and the bug originally reported in this issue is fixed. Thanks.

@ax3l
Copy link

ax3l commented Jan 22, 2019

Let us please just do a minimal fix as in #6287, the double-cast is not a clean solution, independent of the other aspect discussed regarding ranges.

These two lines are all that need to be changed, no better or worse with regards to ranges, but fixing the reported crash from the rounding issue in a future-proof manner:

diff --git a/ompi/mca/common/ompio/common_ompio_file_read.c b/ompi/mca/common/ompio/common_ompio_file_read.c
index 3203e2a697..49f0c89532 100644
--- a/ompi/mca/common/ompio/common_ompio_file_read.c
+++ b/ompi/mca/common/ompio/common_ompio_file_read.c
@@ -132,7 +132,7 @@ int mca_common_ompio_file_read (ompio_file_t *fh,
     else {
        bytes_per_cycle = OMPIO_MCA_GET(fh, cycle_buffer_size);
     }
-    cycles = ceil((float)max_data/bytes_per_cycle);
+    cycles = (max_data + bytes_per_cycle - 1) / bytes_per_cycle;
 
 #if 0
        printf ("Bytes per Cycle: %d   Cycles: %d max_data:%d \n",bytes_per_cycle, cycles, max_data);
diff --git a/ompi/mca/common/ompio/common_ompio_file_write.c b/ompi/mca/common/ompio/common_ompio_file_write.c
index e53a1d080b..6ea8bae064 100644
--- a/ompi/mca/common/ompio/common_ompio_file_write.c
+++ b/ompi/mca/common/ompio/common_ompio_file_write.c
@@ -116,7 +116,7 @@ int mca_common_ompio_file_write (ompio_file_t *fh,
     else {
        bytes_per_cycle = OMPIO_MCA_GET(fh, cycle_buffer_size);
     }
-    cycles = ceil((float)max_data/bytes_per_cycle);
+    cycles = (max_data + bytes_per_cycle - 1) / bytes_per_cycle;
 
 #if 0
     printf ("Bytes per Cycle: %d   Cycles: %d\n", bytes_per_cycle, cycles);

Everything else one can address and refactor later.

@psychocoderHPC
Copy link
Contributor

psychocoderHPC commented Jan 22, 2019

Let us please just do a minimal fix as in #6287, the double-cast is not a clean solution, independent of the other aspect discussed regarding ranges.

These two lines are all that need to be changed, no better or worse with regards to ranges, but fixing the reported crash from the rounding issue in a future-proof manner:

diff --git a/ompi/mca/common/ompio/common_ompio_file_read.c b/ompi/mca/common/ompio/common_ompio_file_read.c
index 3203e2a697..49f0c89532 100644
--- a/ompi/mca/common/ompio/common_ompio_file_read.c
+++ b/ompi/mca/common/ompio/common_ompio_file_read.c
@@ -132,7 +132,7 @@ int mca_common_ompio_file_read (ompio_file_t *fh,
     else {
        bytes_per_cycle = OMPIO_MCA_GET(fh, cycle_buffer_size);
     }
-    cycles = ceil((float)max_data/bytes_per_cycle);
+    cycles = (max_data + bytes_per_cycle - 1) / bytes_per_cycle;
 
 #if 0
        printf ("Bytes per Cycle: %d   Cycles: %d max_data:%d \n",bytes_per_cycle, cycles, max_data);
diff --git a/ompi/mca/common/ompio/common_ompio_file_write.c b/ompi/mca/common/ompio/common_ompio_file_write.c
index e53a1d080b..6ea8bae064 100644
--- a/ompi/mca/common/ompio/common_ompio_file_write.c
+++ b/ompi/mca/common/ompio/common_ompio_file_write.c
@@ -116,7 +116,7 @@ int mca_common_ompio_file_write (ompio_file_t *fh,
     else {
        bytes_per_cycle = OMPIO_MCA_GET(fh, cycle_buffer_size);
     }
-    cycles = ceil((float)max_data/bytes_per_cycle);
+    cycles = (max_data + bytes_per_cycle - 1) / bytes_per_cycle;
 
 #if 0
     printf ("Bytes per Cycle: %d   Cycles: %d\n", bytes_per_cycle, cycles);

Everything else one can address and refactor later.

cycles = max_data / bytes_per_cycle + (max_data % bytes_per_cycle ? 1u : 0u); (without overflow risk) must be used insead of cycles = (max_data + bytes_per_cycle - 1) / bytes_per_cycle; to guarantee that you can write 4GiB on 32bit a systems.

@ax3l
Copy link

ax3l commented Jan 22, 2019

@psychocoderHPC true, otherwise we would cap the upper range a little.
So the diff we propose is:

diff --git a/ompi/mca/common/ompio/common_ompio_file_read.c b/ompi/mca/common/ompio/common_ompio_file_read.c
index 3203e2a697..fb863a3821 100644
--- a/ompi/mca/common/ompio/common_ompio_file_read.c
+++ b/ompi/mca/common/ompio/common_ompio_file_read.c
@@ -132,7 +132,7 @@ int mca_common_ompio_file_read (ompio_file_t *fh,
     else {
        bytes_per_cycle = OMPIO_MCA_GET(fh, cycle_buffer_size);
     }
-    cycles = ceil((float)max_data/bytes_per_cycle);
+    cycles = max_data / bytes_per_cycle + (max_data % bytes_per_cycle > 0u ? 1u : 0u);
 
 #if 0
        printf ("Bytes per Cycle: %d   Cycles: %d max_data:%d \n",bytes_per_cycle, cycles, max_data);
diff --git a/ompi/mca/common/ompio/common_ompio_file_write.c b/ompi/mca/common/ompio/common_ompio_file_write.c
index e53a1d080b..d60c68828f 100644
--- a/ompi/mca/common/ompio/common_ompio_file_write.c
+++ b/ompi/mca/common/ompio/common_ompio_file_write.c
@@ -116,7 +116,7 @@ int mca_common_ompio_file_write (ompio_file_t *fh,
     else {
        bytes_per_cycle = OMPIO_MCA_GET(fh, cycle_buffer_size);
     }
-    cycles = ceil((float)max_data/bytes_per_cycle);
+    cycles = max_data / bytes_per_cycle + (max_data % bytes_per_cycle > 0u ? 1u : 0u);
 
 #if 0
     printf ("Bytes per Cycle: %d   Cycles: %d\n", bytes_per_cycle, cycles);

(Slightly edited since I would avoid implicit integer to bool conversions as they both potentially thrown warnings and outsmart the reader)

@edgargabriel
Copy link
Member Author

There is no advantage to your solution compared to mine. So I do not see the point of this. This is the end of the discussion from my point of view.

@jsquyres
Copy link
Member

bot:ompi:retest

@psychocoderHPC
Copy link
Contributor

psychocoderHPC commented Jan 22, 2019

@edgargabriel The point is that double floating point numbers has 53 significant digits. This means we can represent all byte numbers up to 8PiB without precision loss. After 8PiB we get the same problem again. So will run into the same issue in the next 5 to 10 years, again.

@edgargabriel
Copy link
Member Author

I doubt that in my lifetime one 'internal iteration' will be increased from 512MB to 8PiB. This is not the overall amount of data that we are talking about.

@jsquyres
Copy link
Member

jsquyres commented Jan 22, 2019

Let me ask what might be a naieve question (I'm not an IO expert and I'm not very familiar with this code). It looks like this code is looping over reading bytes_per_cycle bytes each iteration.

Is it feasible to read 8PiB in a single iteration? Or will we be using some other algorithm for that?

(...and now I see Edgar just made a similar observation -- that was not intentional! 😄)

@ax3l
Copy link

ax3l commented Jan 22, 2019

All fine with us, we are not trying to nag.
We just searched for a solution that avoids a cast, since this is pure byte-counter math, and is type-agnostic in case the underlying types get refactored some day.

@edgargabriel
Copy link
Member Author

bot:ompi:retest

1 similar comment
@edgargabriel
Copy link
Member Author

bot:ompi:retest

@ax3l
Copy link

ax3l commented Jan 25, 2019 via email

@hppritcha
Copy link
Member

bot:lanl:retest

@hppritcha
Copy link
Member

bot:ompi:retest

@edgargabriel
Copy link
Member Author

@hppritcha we are still seeing the same quota problem on the cray, which leads to the failed builds:

final close failed: Disk quota exceeded

@hppritcha
Copy link
Member

bot:ompi:retest

@edgargabriel edgargabriel merged commit 7023357 into open-mpi:master Jan 29, 2019
edgargabriel pushed a commit to edgargabriel/ompi that referenced this pull request Jan 29, 2019
Similar to open-mpi#6286 rounding number of bytes into a single precision floating point value to round up the result of a division is a potential risk due to rounding errors.

- remove floating point operations for `round up`
- removes floating point conversion for round down (native behavior of integer division)

Signed-off-by: René Widera <[email protected]>
edgargabriel pushed a commit to edgargabriel/ompi that referenced this pull request Jan 30, 2019
Similar to open-mpi#6286 rounding number of bytes into a single precision floating point value to round up the result of a division is a potential risk due to rounding errors.

- remove floating point operations for `round up`
- removes floating point conversion for round down (native behavior of integer division)

Signed-off-by: René Widera <[email protected]>
edgargabriel pushed a commit to edgargabriel/ompi that referenced this pull request Jan 30, 2019
Similar to open-mpi#6286 rounding number of bytes into a single precision floating point value to round up the result of a division is a potential risk due to rounding errors.

- remove floating point operations for `round up`
- removes floating point conversion for round down (native behavior of integer division)

Signed-off-by: René Widera <[email protected]>
(cherry picked from commit a91fab8)
edgargabriel added a commit to edgargabriel/ompi that referenced this pull request Jan 31, 2019
Similar to open-mpi#6286 rounding number of bytes into a single precision floating point value to round up the result of a division is a potential risk due to rounding errors.

- remove floating point operations for `round up`
- removes floating point conversion for round down (native behavior of integer division)

Signed-off-by: René Widera <[email protected]>

Note: a direct cherry pick of commit a91fab8 is not possible, due to structural differences between the the 3.1.x and the master/v4.0.x branch. This commit is the equivalent of commit  a91fab8.

Signed-off-by: Edgar Gabriel <[email protected]>
edgargabriel added a commit to edgargabriel/ompi that referenced this pull request Jan 31, 2019
Similar to open-mpi#6286 rounding number of bytes into a single precision floating point value to round up the result of a division is a potential risk due to rounding errors.

- remove floating point operations for `round up`
- removes floating point conversion for round down (native behavior of integer division)

Signed-off-by: René Widera <[email protected]>

Note: a direct cherry pick of commit a91fab8 is not possible, due to structural differences between the the 3.1.x and the master/v4.0.x branch. This commit is the equivalent of commit  a91fab8.

Signed-off-by: Edgar Gabriel <[email protected]>
@ax3l ax3l mentioned this pull request Feb 23, 2019
ax3l pushed a commit to ax3l/ompi that referenced this pull request Feb 23, 2019
Similar to open-mpi#6286 rounding number of bytes into a single precision floating point value to round up the result of a division is a potential risk due to rounding errors.

- remove floating point operations for `round up`
- removes floating point conversion for round down (native behavior of integer division)

Signed-off-by: René Widera <[email protected]>

Note: a direct cherry pick of commit a91fab8 is not possible, due to structural differences between the the 3.1.x and the master/v4.0.x branch. This commit is the equivalent of commit  a91fab8.

Signed-off-by: Edgar Gabriel <[email protected]>
@edgargabriel edgargabriel deleted the pr/floating-point-division-problem branch November 26, 2019 16:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants