Skip to content

error with MPI_Unpack_external ("external32",...) and derived datatype #7851

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
ggouaillardet opened this issue Jun 21, 2020 · 4 comments · Fixed by #8735
Closed

error with MPI_Unpack_external ("external32",...) and derived datatype #7851

ggouaillardet opened this issue Jun 21, 2020 · 4 comments · Fixed by #8735
Assignees
Labels

Comments

@ggouaillardet
Copy link
Contributor

This is a followup of #5643

The reproducer below can be used to evidence this issue.

The issue occurs on master and the v4 release branches.
It does not occur on the v3 branches nor the 4.0.0 release.

fwiw, git bisect (on the v4.0.x branch )points to 4f754d0

@bosilca can you please have a look at it?

#include <stdlib.h>
#include <stdio.h>
#include <inttypes.h>
#include <string.h>

#include "mpi.h"

static void create_stattype(int chars, MPI_Datatype* dt_stat)
{
    int size = chars + sizeof(uint64_t);

    MPI_Datatype dt_filepath;
    MPI_Type_contiguous(chars, MPI_CHAR, &dt_filepath);

    MPI_Datatype types[2];
    types[0] = dt_filepath;  /* file name */
    types[1] = MPI_UINT64_T; /* file type */

    int blocklens[2] = {1, 1};
    MPI_Aint displs[2];
    displs[0] = 0;
    displs[1] = chars;

    MPI_Datatype dttemp;
    MPI_Type_create_struct(2, blocklens, displs, types, &dttemp);
    MPI_Type_create_resized(dttemp, 0, size, dt_stat);
    MPI_Type_commit(dt_stat);
    MPI_Type_free(&dttemp);

    MPI_Type_free(&dt_filepath);
    return;
}

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);
    char inbuf[120];
    char outbuf[120];

    uint64_t values[5] = {1, 2, 3, 4, 5};
    char* strings[5] = {"hello", "there", "world", "catss", "dogss"};

    /* to make things interesting, create a derived type that is a (string, UINT64_T) pair */

    MPI_Datatype dt;
    create_stattype(6, &dt);

    char* ptr = inbuf;
  
    int i;

    for (i=0; i<120; i++) {
       outbuf[i] = 'A';
    }

    for (i = 0; i < 5; i++) {
      strcpy(ptr, strings[i]);
      ptr += 6;
      *((uint64_t*)ptr) = values[i];
      ptr += sizeof(uint64_t);
    }

    MPI_Aint position=0;
    MPI_Unpack_external ("external32", inbuf, 70, &position, outbuf, 5, dt);
    for (i=70; i<120; i++) {
        if ('A' != outbuf[i]) {
            fprintf(stderr, "outbuf was overwritten at index %d\n", i);
            MPI_Abort(MPI_COMM_WORLD, 1);
        }
    }

    MPI_Finalize();
    return 0;
}
@markalle
Copy link
Contributor

markalle commented Jul 9, 2020

This probably isn't a complete fix, but I started looking at a bug closely related to the above and found quite a few issues in pack/unpack that I've addressed here:
#7919

I'll add that the bug I was looking at was using MPIO with external32, but so far I've only looked closely into the bad output from MPI_Pack/Unpack, and hope to eventually get back to the failing MPIO testcase which I believe has some additional bugs beyond the Pack/Unpack issues.

Anyway I have an MPI standard question for the pasted testcase. Is it legal to put the MPI_UINT64_T at byte offset 6 like that? The standard's definition of "extent" talks about how it rounds up based on alignment requirements, and the extent from that first MPI_Type_create_struct is 16 because of that. Then the Type_create_resized is explicitly bumping it down to 14. In my MPIO testcase that's related to the above, the ROMIO and OMPIO output diverges with OMPIO respecting the 14 byte extent but ROMIO instead writing consecutive instances of that datatype 16 bytes apart. The cause is ROMIO's set_view call that does MPI_Type_contiguous(1, filetype, &copy_filetype) which takes alignment into account and puts the extent back to 16.

I'd like to just declare that the misaligned MPI datatype constructed there is illegal and thus the ROMIO code isn't really wrong. Thoughts?

Anyway the Pack/Unpack issues remain even if you change 6 to 8 in the above testcase, so some fixes like I have in 7919 are still needed.

@markalle
Copy link
Contributor

Fwiw I finally tested the above testcase with
#7919
and it works. So my fixes do fix this, if they pass review.

@markalle
Copy link
Contributor

My #7919 is still awaiting review which I guess is understandable because it's pretty big. But I did finish up the other part of this as well, where if you use the above dt_stat with romio it sends dt_stat into a MPI_Type_contiguous(1, dt_stat, &newdt) call and that call was erroneously throwing away the specified 14 extent resetting it back to 16.

That fix is merged in now, so between that and 7919 I think this datatype is working now, both in the above testcase and if you use it in MPI/IO calls too (romio or ompio).

@bosilca
Copy link
Member

bosilca commented Mar 29, 2021

Based on the replicator here I made a simpler one that only looks at the conversion itself. I run this test with the master today (without the #7919), as far as I can see the data representation for uint64_t is correctly converted to the external32 format. I'm not saying this works with files, that's another topic, but the pack/unpack to and from external32 seems to work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants