Question about correct file format in external32 #5643
Comments
The reproducer builds with mpicc. Run it as a single process (it will fail with more than one).
|
The derived data type is in there because that causes problems with a different MPI. For Open MPI, my question is just about the file format used for the 5 MPI_UINT64_T values. |
To be precise, I would expect to get the following with external32:
00000000 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 02 |................|
00000010 00 00 00 00 00 00 00 03 00 00 00 00 00 00 00 04 |................|
rather than:
00000000 01 00 00 00 00 00 00 00 02 00 00 00 00 00 00 00 |................|
00000010 03 00 00 00 00 00 00 00 04 00 00 00 00 00 00 00 |................| |
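For reference, the expected byte layout above can be reproduced without any MPI library. The following is only a minimal sketch, assuming that external32 stores each 64-bit integer most-significant byte first; the helper names `encode_u64_be` and `encode_u64_array_be` are hypothetical and are not part of Open MPI or the reproducer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: store `value` most-significant byte first
 * (big-endian), regardless of the byte order of the host machine. */
void encode_u64_be(uint64_t value, unsigned char out[8])
{
    assert(out != NULL);
    for (int i = 0; i < 8; i++) {
        /* i = 0 takes bits 63..56, i = 7 takes bits 7..0 */
        out[i] = (unsigned char)((value >> (56 - 8 * i)) & 0xff);
    }
}

/* Hypothetical helper: encode `count` values back to back.
 * `out` must have room for 8 * count bytes. */
void encode_u64_array_be(const uint64_t *values, size_t count,
                         unsigned char *out)
{
    assert(values != NULL && out != NULL);
    for (size_t i = 0; i < count; i++) {
        encode_u64_be(values[i], out + 8 * i);
    }
}
```

Encoding {1, 2, 3, 4} into a 32-byte buffer this way and writing it with `fwrite` should give a file whose hexdump matches the first (expected) listing, which makes it a convenient baseline to compare against what the MPI library writes.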
@adammoody sorry for the late reply. The ompio implementation of MPI I/O does not currently support any data representation other than native; this is still a work in progress. To use the external32 representation, please force Open MPI to use romio, e.g. mpirun --mca io romio314 -np x ../my_exec |
@edgargabriel Should we have MPI_FILE_SET_VIEW fail when |
@jsquyres that would be an option, although the logic should probably be reversed (i.e. if it is not native or internal, fail), since a user could also define their own data representation. |
FWIW, I noted romio 3.2.1 (from the |
the issue in |
@edgargabriel Sorry; I think you divined what I really meant: when the
@ggouaillardet You mean that that fixes the external32 issue in ROMIO -- not OMPIO -- right? Does it need to be cherry picked to other releases (e.g., 3.0.x and 3.1.x)? |
Yep, I only fixed ROMIO. This patch got lost when refreshing ROMIO from 3.1.4 (v3 branches) to 3.2.1 (master and v4 branches), so there is no point in backporting it to the v3 branches. Note that the default Open MPI behavior is to return with an error when an MPI-IO subroutine fails (i.e. no abort), so we might want to issue an error message to make it crystal clear that this is an OMPIO limitation (and hint to use ROMIO). |
Sounds like some fixes are in the pipeline. Thanks guys! |
@ggouaillardet @edgargabriel @bosilca Could this be related? I realize this is in collectives, not IO, but it's related to external32, and it looks like we're not initializing the
|
fwiw, the bug in |
check for providing a data representation that is actually supported by ompio. Add also one check for a non-NULL pointer in mpi/c/file_set_view for the data representation. Also fixes parts of issue open-mpi#5643 Signed-off-by: Edgar Gabriel <[email protected]>
Hi guys, I was retesting this reproducer lately. I realize external32 is not yet supported in OMPIO. However, now I'm getting seemingly bad results from the ROMIO module in Open MPI 4.0.3.
That output above should be duplicated in the top and bottom portions. The test code here writes and reads a file in "external32" and then in "native". However, the integers and strings in the top part appear to be corrupted. Do you see the same? |
@adammoody ompio now supports external32 on master, and it is scheduled to be part of the 5.0 release (not the upcoming 4.1 release, however). Based on my own testsuite, the romio321 implementation supports external32 for blocking I/O operations with Open MPI, but not for non-blocking ones. --snip-- I have not tried this with mpich, so I am not entirely sure whether this is a romio issue or an issue with its integration into Open MPI. |
This seems to also involve a regression in the Though the trimmed test program I wrote in that ticket passes in @edgargabriel you might want to wait for #7851 to be fixed, or investigate this issue on the |
Thanks @edgargabriel , it's great to hear that external32 is coming in a future release! |
And thanks for investigating the problem in the reproducer in the other issue, @ggouaillardet and @edgargabriel . |
@edgargabriel checking: this ext32 support via ompi i/o is not going to be fixed in the v4.1.x and older releases, right? |
@hppritcha ext32 support is in v4.1, but not in v4.0. |
Closing as won't fix for 4.0.x and older releases. If this issue isn't resolved using 4.1.x or newer, please reopen. |
Background information
My understanding is that external32 should write integer values in big-endian format, but it seems to be writing in little-endian format instead. I will attach a test case that writes a series of MPI_UINT64_T values to two files: one in "native" data representation and one in "external32". Running hexdump on the resulting files shows that both appear to store those values in little-endian representation.
What version of Open MPI are you using? (e.g., v1.10.3, v2.1.0, git branch name and hash, etc.)
Open MPI 3.0.1
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
Installed from v3.0.1 distribution tarball.
Please describe the system on which you are running
Details of the problem
I am trying to debug some data format issues with a different MPI library, which got me testing the difference between “native” and “external32” data representations. I wrote a program that writes 4 consecutive MPI_UINT64_T values to a file in both formats. According to the standard, external32 should be using big-endian in the file, but it looks to me like it’s using little-endian instead. This is with Open MPI 3.0.1 on a machine with Intel processors. The sequence of MPI_UINT64_T values the test case writes is {1, 2, 3, 4}.
This looks to be little-endian format.
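One way to make that observation mechanical, rather than reading the hexdump by eye, is to decode the first 8 bytes of the file in both byte orders and see which one yields the value that was written. The sketch below is an illustration only; `decode_u64_be`, `decode_u64_le`, and `guess_byte_order` are hypothetical names and not part of the attached test case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Interpret 8 bytes as a big-endian (most-significant byte first) value. */
uint64_t decode_u64_be(const unsigned char in[8])
{
    assert(in != NULL);
    uint64_t v = 0;
    for (int i = 0; i < 8; i++) {
        v = (v << 8) | in[i];
    }
    return v;
}

/* Interpret 8 bytes as a little-endian (least-significant byte first) value. */
uint64_t decode_u64_le(const unsigned char in[8])
{
    assert(in != NULL);
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--) {
        v = (v << 8) | in[i];
    }
    return v;
}

/* Returns 'B' if only the big-endian reading equals `expected`,
 * 'L' if only the little-endian reading does, '=' if both do
 * (the bytes read the same in either direction, e.g. all zeros),
 * and '?' if neither does. */
char guess_byte_order(const unsigned char in[8], uint64_t expected)
{
    int be = decode_u64_be(in) == expected;
    int le = decode_u64_le(in) == expected;
    if (be && le) return '=';
    if (be) return 'B';
    if (le) return 'L';
    return '?';
}
```

For a file whose first 8 bytes are 01 00 00 00 00 00 00 00 and whose first written value is 1, `guess_byte_order` reports 'L'; a big-endian file holding the same value would report 'B'.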