
opal_unpack_general_function might unpack too much data #2535

@ggouaillardet

Description


@bosilca can you please take care of that?

The following program works fine if both tasks run on homogeneous nodes, but fails otherwise.

I strongly suspect the root cause is that opal_unpack_general_function does not exit once the amount of unpacked data reaches *max_data.

#include <assert.h>
#include <mpi.h>

/* simple bug reproducer
 * task 0 MPI_Recv 2 MPI_INT but task 1 only MPI_Send 1 MPI_INT
 * OK on homogeneous node(s), fails on heterogeneous nodes */
int main(int argc, char *argv[]) {
    int b[2];
    int rank;
    MPI_Status status;
    int count;
    int err;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (0 == rank) {
        b[0] = -1;
        b[1] = -1;
        err = MPI_Recv(b, 2, MPI_INT, 1, 0, MPI_COMM_WORLD, &status);
        assert(MPI_SUCCESS == err);
        err = MPI_Get_elements(&status, MPI_INT, &count);
        assert(MPI_SUCCESS == err);
        assert(1 == count);
        assert(0 == b[0]);
        assert(-1 == b[1]); // currently fails on heterogeneous nodes
    } else if (1 == rank) {
        b[0] = 0;
        b[1] = 2;
        MPI_Send(b, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }
    
    MPI_Finalize();
    return 0;
}
