Question about correct file format in external32 #5643

Closed · adammoody opened this issue Sep 4, 2018 · 22 comments

@adammoody commented Sep 4, 2018

Background information

My understanding is that external32 should write integer values in big-endian byte order, but Open MPI seems to write them in little-endian order instead. I will attach a test case that writes a series of MPI_UINT64_T values to two files: one in the "native" data representation and one in "external32". Running hexdump on the resulting files shows that both appear to store the values in little-endian order.

What version of Open MPI are you using? (e.g., v1.10.3, v2.1.0, git branch name and hash, etc.)

Open MPI 3.0.1

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

Installed from v3.0.1 distribution tarball.

Please describe the system on which you are running

  • Operating system/version: RHEL 7.5
  • Computer hardware: Intel(R) Xeon(R) CPU E5-2695 v4
  • Network type: Intel Omni-path

Details of the problem

I am trying to debug some data format issues with a different MPI library, which got me testing the difference between the "native" and "external32" data representations. I wrote a program that writes 4 consecutive MPI_UINT64_T values to a file in both formats. According to the standard, external32 should use big-endian byte order in the file, but it looks to me like Open MPI is using little-endian instead. This is with Open MPI 3.0.1 on a machine with Intel processors. The sequence of MPI_UINT64_T values the test case writes is {1, 2, 3, 4}.

>>: hexdump -C openmpi/testfile_ext32
00000000  01 00 00 00 00 00 00 00  02 00 00 00 00 00 00 00  |................|
00000010  03 00 00 00 00 00 00 00  04 00 00 00 00 00 00 00  |................|

This looks to be little-endian format.
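
For reference, here is a minimal MPI-free C sketch of what the big-endian (external32) encoding of these values should produce; nothing here is from Open MPI, it just prints the mandated byte order:

#include <stdio.h>
#include <stdint.h>

/* Print a uint64_t most-significant byte first, the byte order
 * external32 mandates for integer types. */
static void print_be64(uint64_t v)
{
    int shift;
    for (shift = 56; shift >= 0; shift -= 8) {
        printf("%02x ", (unsigned) ((v >> shift) & 0xff));
    }
    printf("\n");
}

int main(void)
{
    uint64_t v;
    for (v = 1; v <= 4; v++) {
        print_be64(v); /* prints 00 00 00 00 00 00 00 01 for v = 1, etc. */
    }
    return 0;
}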

@adammoody (Author)

The reproducer builds with mpicc; run it as a single process (it will fail with more than one).

#include <stdlib.h>
#include <stdio.h>
#include <inttypes.h>
#include <string.h>

#include "mpi.h"

static void create_stattype(int chars, MPI_Datatype* dt_stat)
{
    int size = chars + sizeof(uint64_t);

    MPI_Datatype dt_filepath;
    MPI_Type_contiguous(chars, MPI_CHAR, &dt_filepath);

    MPI_Datatype types[2];
    types[0] = dt_filepath;  /* file name */
    types[1] = MPI_UINT64_T; /* file type */

    int blocklens[2] = {1, 1};
    MPI_Aint displs[2];
    displs[0] = 0;
    displs[1] = chars;

    MPI_Datatype dttemp;
    MPI_Type_create_struct(2, blocklens, displs, types, &dttemp);
    MPI_Type_create_resized(dttemp, 0, size, dt_stat);
    MPI_Type_commit(dt_stat);
    MPI_Type_free(&dttemp);

    MPI_Aint extent, lb;
    MPI_Type_get_extent(dt_filepath, &lb, &extent);
    printf("extent=%d lb=%d\n", (int) extent, (int) lb);
    MPI_Type_get_extent(MPI_UINT64_T, &lb, &extent);
    printf("extent=%d lb=%d\n", (int) extent, (int) lb);
    MPI_Type_get_extent(*dt_stat, &lb, &extent);
    printf("extent=%d lb=%d\n", (int) extent, (int) lb);

    MPI_Type_free(&dt_filepath);
    return;
}

void write_read_file(const char* name, const char* datarep)
{
  int rank, ranks;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &ranks);

  /************
   * write values to file
   ************/

  uint64_t values[5] = {1, 2, 3, 4, 5};
  char* strings[5] = {"hello", "there", "world", "catss", "dogss"};

  /* to make things interesting, create a derived type that is a (string, UINT64_T) pair */

  MPI_Datatype dt;
  create_stattype(6, &dt);

  int bufsize = 5 * (6 + 8);
  char* buf = (char*) malloc(bufsize);
  char* ptr = buf;

  int i;

  for (i = 0; i < 5; i++) {
    strcpy(ptr, strings[i]);
    ptr += 6;
    *((uint64_t*)ptr) = values[i];
    ptr += sizeof(uint64_t);
  }

  int rc;
  MPI_Status status;
  MPI_File fh;

  int amode = MPI_MODE_WRONLY | MPI_MODE_CREATE;

  /* use mpi io hints to stripe across OSTs */
  MPI_Info info;
  MPI_Info_create(&info);

  /* change number of ranks to string to pass to MPI_Info */
  char str_buf[12];
  sprintf(str_buf, "%d", ranks);

  /* no. of I/O devices for lustre striping is number of ranks */
  MPI_Info_set(info, "striping_factor", str_buf);

  MPI_File_open(MPI_COMM_WORLD, (char*)name, amode, info, &fh);

  /* truncate file to 0 bytes */
  MPI_File_set_size(fh, 0);

  MPI_Offset disp = 0;

  /* write 5 UINT64_T values */

  MPI_File_set_view(fh, disp, MPI_UINT64_T, MPI_UINT64_T, datarep, MPI_INFO_NULL);
  if (rank == 0) {
      MPI_File_write_at(fh, 0, values, 5, MPI_UINT64_T, &status);
  }
  disp += 5 * 8;

  /* write 5 derived data types */

  MPI_File_set_view(fh, disp, dt, dt, datarep, MPI_INFO_NULL);
  if (rank == 0) {
      MPI_File_write_at(fh, 0, buf, 5, dt, &status);
  }

  MPI_File_close(&fh);
  MPI_Info_free(&info);

  /************
   * read the values back in and print to screen
   ************/

  amode = MPI_MODE_RDONLY;
  rc = MPI_File_open(MPI_COMM_WORLD, (char*)name, amode, MPI_INFO_NULL, &fh);
  if (rc != MPI_SUCCESS) {
      if (rank == 0) {
          printf("Failed to open file %s", name);
      }
      return;
  }

  /* blank out our buffers */

  for (i = 0; i < 5; i++) {
     values[i] = 0;
  }
  for (i = 0; i < bufsize; i++) {
     buf[i] = (char)0;
  }

  disp = 0;

  /* read the 5 UINT64_T values, one-by-one */

  for (i = 0; i < 5; i++) {
      MPI_File_set_view(fh, disp, MPI_UINT64_T, MPI_UINT64_T, datarep, MPI_INFO_NULL);
      if (rank == 0) {
          MPI_File_read_at(fh, 0, &values[i], 1, MPI_UINT64_T, &status);
      }
      disp += 8;
  }

  /* read the 5 derived datatypes */

  MPI_File_set_view(fh, disp, dt, dt, datarep, MPI_INFO_NULL);
  if (rank == 0) {
      MPI_File_read_at(fh, 0, buf, 5, dt, &status);
  }

  /* print values to stdout */

  for (i = 0; i < 5; i++) {
     printf("%llu\n", (unsigned long long) values[i]);
  }

  ptr = buf;
  for (i = 0; i < 5; i++) {
    printf("%s\n", ptr);
    ptr += 6;
    printf("%llu\n", (unsigned long long) *((uint64_t*)ptr));
    ptr += sizeof(uint64_t);
  }

  MPI_File_close(&fh);

  /* release the derived datatype and the pack buffer */
  MPI_Type_free(&dt);
  free(buf);
}

int main(int argc, char* argv[])
{
  MPI_Init(&argc, &argv);

  write_read_file("testfile_ext32",  "external32");
  write_read_file("testfile_native", "native");

  MPI_Finalize();

  return 0;
}
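
To build and run (the same commands used later in this thread):

mpicc -o mpiio_ext32 mpiio_ext32.c
mpirun -np 1 ./mpiio_ext32
hexdump -C testfile_ext32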

@adammoody (Author)

The derived datatype is in there because it triggers problems with a different MPI implementation. For Open MPI, my question is just about the file format used for the 5 MPI_UINT64_T values.

@adammoody (Author)

To be precise, I would expect to get the following with external32:

00000000  00 00 00 00 00 00 00 01  00 00 00 00 00 00 00 02  |................|
00000010  00 00 00 00 00 00 00 03  00 00 00 00 00 00 00 04  |................|

rather than:

00000000  01 00 00 00 00 00 00 00  02 00 00 00 00 00 00 00  |................|
00000010  03 00 00 00 00 00 00 00  04 00 00 00 00 00 00 00  |................|

@edgargabriel (Member)

@adammoody sorry for the late reply. The ompio implementation of MPI I/O does not currently support any data representation other than native; this is still a work in progress. To use the external32 representation, please force Open MPI to use romio, e.g.

mpirun --mca io romio314 -np x ../my_exec
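
The same selection can also be made through the usual MCA environment-variable convention (assuming a standard Open MPI installation):

export OMPI_MCA_io=romio314
mpirun -np x ../my_exec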

@jsquyres (Member) commented Sep 5, 2018

@edgargabriel Should we have MPI_FILE_SET_VIEW fail when external32 is specified and OMPIO is selected?

@edgargabriel (Member)

@jsquyres that would be an option, although the logic should probably be reversed (i.e., fail if it is not native or internal), since a user could also define their own data representation.
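
A minimal sketch of the reversed check being described (this is not the actual ompio code, and check_datarep is a hypothetical helper; MPI_ERR_UNSUPPORTED_DATAREP is the standard error class for this case):

#include <string.h>
#include <mpi.h>

/* Sketch only: reject any representation ompio does not implement,
 * rather than special-casing external32. A full implementation would
 * also need to consult representations the user registered via
 * MPI_Register_datarep. */
static int check_datarep(const char *datarep)
{
    if (strcmp(datarep, "native") != 0 && strcmp(datarep, "internal") != 0) {
        return MPI_ERR_UNSUPPORTED_DATAREP;
    }
    return MPI_SUCCESS;
}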

@ggouaillardet (Contributor)

FWIW, I noted that romio 3.2.1 (from the master branch) works fine with MPI_UINT64_T in external32 mode, but fails to write MPI_INT correctly in external32 mode. I am now investigating this.

@ggouaillardet (Contributor)

The issue in master has been fixed by merging #5494 (I thought that had been done a while ago, especially since the commit has already been cherry-picked into the v4.0.x branch).

@jsquyres (Member) commented Sep 5, 2018

@edgargabriel Sorry; I think you divined what I really meant: when the ompio module gets the call from the front-end MPI_FILE_SET_VIEW, it should just return an error for any data representation that it does not support (e.g., external32). Right?

@ggouaillardet You mean that fixes the external32 issue in ROMIO -- not OMPIO -- right? Does it need to be cherry-picked to other releases (e.g., 3.0.x and 3.1.x)?

@ggouaillardet (Contributor)

Yep, I only fixed ROMIO.

This patch got lost when refreshing ROMIO from 3.1.4 (v3 branches) to 3.2.1 (master and v4 branches), so there is no point in backporting it to the v3 branches.

Note that the default Open MPI behavior is to return an error when an MPI-IO subroutine fails (i.e., not abort), so we might want to issue an error message to make it crystal clear that this is an OMPIO limitation (and hint to use ROMIO).
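
For example (a minimal sketch; fh is assumed to be an already-opened file handle): because file handles default to the MPI_ERRORS_RETURN error handler, the failure is visible only if the caller checks the return code:

char errstr[MPI_MAX_ERROR_STRING];
int rc, len;
rc = MPI_File_set_view(fh, 0, MPI_UINT64_T, MPI_UINT64_T,
                       "external32", MPI_INFO_NULL);
if (rc != MPI_SUCCESS) {
    /* No abort happens by default for file errors; report explicitly. */
    MPI_Error_string(rc, errstr, &len);
    fprintf(stderr, "MPI_File_set_view failed: %s\n", errstr);
}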

@adammoody (Author)

Sounds like some fixes are in the pipeline. Thanks guys!

@jsquyres (Member)

@ggouaillardet @edgargabriel @bosilca Could this be related?

I realize this is in collectives, not IO, but it's related to external32, and it looks like we're not initializing the size variable when calling ompi_datatype_unpack_external().

In file included from nbc_ialltoallw.c:21:
./nbc_internal.h:538:60: warning: variable 'size' is uninitialized when used here [-Wuninitialized]
    res = ompi_datatype_unpack_external("external32", src, size, &pos, tgt, srccount, srctype);
                                                           ^~~~
./nbc_internal.h:515:16: note: initialize the variable 'size' to silence this warning
  MPI_Aint size, pos;
               ^
                = 0

(The same warning is emitted for nbc_ibcast.c, nbc_ibarrier.c, nbc_iallgather.c, nbc.c, nbc_ialltoall.c, coll_libnbc_component.c, nbc_iallgatherv.c, nbc_ialltoallv.c, and nbc_iallreduce.c.)
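
The general fix pattern for such a call site (a sketch using the public-API analogues, not the actual change in #5680) is to compute the packed size before unpacking:

MPI_Aint size, pos = 0;
/* Compute how many bytes 'srccount' elements of 'srctype' occupy in
 * the external32 representation, rather than reading 'size'
 * uninitialized. */
MPI_Pack_external_size("external32", srccount, srctype, &size);
MPI_Unpack_external("external32", src, size, &pos, tgt, srccount, srctype);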

@ggouaillardet (Contributor)

@jsquyres this is an unrelated, genuine bug that will be fixed in #5680

@ggouaillardet (Contributor)

FWIW, the bug in coll/libnbc only occurs in an algorithm that is implemented but never used.

edgargabriel added a commit to edgargabriel/ompi that referenced this issue Oct 16, 2018 (and again Oct 17, 2018):

check for providing a data representation that is actually supported
by ompio.

Add also one check for a non-NULL pointer in mpi/c/file_set_view
for the data representation.

Also fixes parts of issue open-mpi#5643

Signed-off-by: Edgar Gabriel <[email protected]>
@adammoody (Author)

Hi guys, I retested this reproducer recently. I realize external32 is not yet supported in OMPIO. However, I'm now getting seemingly bad results from the ROMIO module in Open MPI 4.0.3.

>>: mpicc -o mpiio_ext32 mpiio_ext32.c
>>: mpirun --mca io romio321 -np 1 ./mpiio_ext32
extent=6 lb=0
extent=8 lb=0
extent=14 lb=0
1
2
3
4
5
h
12249896539564036352
t
27021597769531392
w
6917529029000036352
c
347892350976
d
89060441849856
extent=6 lb=0
extent=8 lb=0
extent=14 lb=0
1
2
3
4
5
hello
1
there
2
world
3
catss
4
dogss
5

The top and bottom portions of the output above should be identical. The test code here writes and reads a file in "external32" and then in "native". However, the integers and strings in the top ("external32") part appear to be corrupted.

Do you see the same?

@edgargabriel (Member)

@adammoody ompio meanwhile supports external32 on master, and it is scheduled to be part of the 5.0 release (though not the upcoming 4.1 release). Based on my own test suite, the romio321 implementation supports external32 for blocking I/O operations with Open MPI, but not for non-blocking ones.

--snip--
Checking for external32 data representation:
using MPI_File_write....................working
using MPI_File_write_at.................working
using MPI_File_iwrite...................false
using MPI_File_iwrite_at................false
using MPI_File_write_shared.............working
using MPI_File_iwrite_shared............false
using MPI_File_read.....................working
using MPI_File_read_at..................working
using MPI_File_iread....................false
using MPI_File_iread_at.................false
using MPI_File_read_shared..............working
using MPI_File_iread_shared.............false
using MPI_File_write_all................working
using MPI_File_write_at_all.............working
using MPI_File_read_all.................working
using MPI_File_read_at_all..............working
--snip--

I have not tried this with mpich, so I am not entirely sure whether it is a ROMIO issue or an Open MPI integration issue.
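
For illustration, the failing non-blocking path looks like this (a minimal sketch; fh is assumed to be an already-opened file handle):

MPI_Request req;
MPI_Status status;
uint64_t values[4] = {1, 2, 3, 4};

MPI_File_set_view(fh, 0, MPI_UINT64_T, MPI_UINT64_T, "external32", MPI_INFO_NULL);
/* Same data path as the working MPI_File_write_at, but completion is
 * deferred to MPI_Wait; this is the case reported as not working. */
MPI_File_iwrite_at(fh, 0, values, 4, MPI_UINT64_T, &req);
MPI_Wait(&req, &status);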

@ggouaillardet (Contributor)

This also seems to involve a regression in the v4 and master branches; I opened #7851 to track it.

Though the trimmed test program I wrote in that ticket passes on 4.0.0, the reproducer in this ticket fails even on that Open MPI release.

@edgargabriel you might want to wait for #7851 to be fixed, or investigate this issue on the 4.0.0 version, or try to revert the commit identified by git bisect.

@adammoody (Author)

Thanks @edgargabriel, it's great to hear that external32 is coming in a future release!

@adammoody (Author)

And thanks for investigating the problem in the reproducer in the other issue, @ggouaillardet and @edgargabriel.

@hppritcha (Member)

@edgargabriel just checking: ext32 support via OMPI I/O is not going to be fixed in the v4.1.x and older releases, right?

@edgargabriel (Member)

@hppritcha ext32 support is in v4.1, but not in v4.0.

@hppritcha (Member)

Closing as won't-fix for 4.0.x and older releases. If this issue isn't resolved using 4.1.x or newer, please reopen.
