
Incorrect behaviour with certain extent values for simple derived datatypes #6899


Closed
d7919 opened this issue Aug 14, 2019 · 15 comments

@d7919

d7919 commented Aug 14, 2019

Background information

I am attempting to construct a moderately complicated derived datatype but have been running into difficulties, so I have tried to strip it back to a minimal case; in doing so I have come across what looks like incorrect behaviour. I have constructed a minimal example which can be accessed in this gist.

I first make a contiguous type corresponding to the first half of the message. Next I use mpi_type_create_resized to change the extent, with the intention that the start of the second half of the message can overlap the first half. When the extent corresponds to two elements of the base type this works as expected: I see the first two elements of the first half of the message and then the entirety of the second half. If I change the extent to correspond to one element of the base type, however, instead of seeing just the first element of the first half followed by the entirety of the second half, I get the entire first half of the message followed by the entire second half -- in other words, it is as if I hadn't set the extent at all. This test case uses only one processor.

I would have expected the output to look like

With extent =                      8
           1           2           9          10          11          12
          13          14          15          16          -1          -1
          -1          -1          -1          -1
 With extent =                      4
           1           9          10          11          12          13
          14          15          16          -1          -1          -1
          -1          -1          -1          -1

but instead I find

 With extent =                      8
           1           2           9          10          11          12
          13          14          15          16          -1          -1
          -1          -1          -1          -1
 With extent =                      4
           1           2           3           4           5           6
           7           8           9          10          11          12
          13          14          15          16

A version using mpi_f08 demonstrates identical behaviour. Similarly, changing the base_type from integer to real, complex, etc. behaves in the same way.

What version of Open MPI are you using? (e.g., v1.10.3, v2.1.0, git branch name and hash, etc.)

I've tested this with Open MPI v2.1.1 and Open MPI v3.1.1 and get the same output in both cases. These are on systems I don't manage so I'm not 100% sure how they were installed, but I'm fairly confident that v2.1.1 was installed from a package manager whilst v3.1.1 was installed from a tarball. I've tried using Intel MPI (2018) and I get the expected behaviour for all extents.

Please describe the system on which you are running

  • Operating system/version: tested on both Ubuntu (16.04) and something derived from CentOS Linux 7.
  • Computer hardware: various rack-mount nodes with two sockets populated with various modern Intel server chips. Memory per node > 100 GB.
  • Network type: this all runs on one processor, so as far as I can tell it shouldn't touch the network, but the machines I've tested on include one with just Ethernet as well as an HPC system with InfiniBand.
@ggouaillardet
Contributor

Thanks for the report!

There is indeed an issue in Open MPI.

Note that the test program you posted is incorrect with respect to the MPI standard, since overlapping derived datatypes cannot be used in MPI_Recv() (they are fine in MPI_Send()). The rationale is that a message can be split under the hood, and the fragments can be received and processed in any order, so overlapping datatypes would cause non-deterministic behavior.
IIRC, Open MPI flags overlapping derived datatypes and issues an error if they are used in a receive. For some reason, Open MPI failed to flag these datatypes.

Also, there is a potential deadlock caused by two MPI_Wait() calls in a row. Better options are MPI_Isend(); MPI_Recv(); MPI_Wait() or an all-in-one MPI_Sendrecv().

I rewrote the test in C and made it compliant with the MPI standard, and the error is clearly evidenced.

#include <stdio.h>

#include <mpi.h>

int main(int argc, char *argv[]) {
    const int n = 8;
    int input[2*n];
    int output[2*n];
    int i, iproc;
    MPI_Datatype sendtype, tmp1, tmp2;
    MPI_Aint lb, ext;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &iproc);

    if (0 == iproc) {
        for (i=0; i<2*n; i++) {
            output[i] = -1;
            input[i] = i;
        }
        MPI_Type_contiguous(n, MPI_INT, &tmp1);     /* n consecutive ints */
        MPI_Type_get_extent(MPI_INT, &lb, &ext);
        lb = 0;
        ext *= 2;                                   /* resize extent to 2 ints */
        MPI_Type_create_resized(tmp1, lb, ext, &tmp2);
        MPI_Type_vector(2, 1, 1, tmp2, &sendtype);  /* 2 overlapping copies */
        MPI_Type_commit(&sendtype);

        MPI_Sendrecv(input, 1, sendtype, 0, 0,
                     output, 2*n, MPI_INT, 0, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        printf("With extent = %ld\n", ext);
        for (i=0; i<2*n; i++) {
            printf ("%d ", output[i]);
        }
        printf("\n");
        MPI_Type_get_true_extent(sendtype, &lb, &ext);
        printf("true_extent = %ld\n", ext);
        MPI_Type_free(&tmp1);
        MPI_Type_free(&tmp2);
        MPI_Type_free(&sendtype);

        for (i=0; i<2*n; i++) {
            output[i] = -1;
            input[i] = i;
        }
        MPI_Type_contiguous(n, MPI_INT, &tmp1);     /* n consecutive ints */
        MPI_Type_get_extent(MPI_INT, &lb, &ext);
        lb = 0;
        ext *= 1;                                   /* resize extent to 1 int */
        MPI_Type_create_resized(tmp1, lb, ext, &tmp2);
        MPI_Type_vector(2, 1, 1, tmp2, &sendtype);  /* 2 overlapping copies */
        MPI_Type_commit(&sendtype);

        MPI_Sendrecv(input, 1, sendtype, 0, 0,
                     output, 2*n, MPI_INT, 0, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        printf("With extent = %ld\n", ext);
        for (i=0; i<2*n; i++) {
            printf ("%d ", output[i]);
        }
        printf("\n");
        MPI_Type_get_true_extent(sendtype, &lb, &ext);
        printf("true_extent = %ld\n", ext);
        MPI_Type_free(&tmp1);
        MPI_Type_free(&tmp2);
        MPI_Type_free(&sendtype);
    }

    MPI_Finalize();
    return 0;

}

@d7919
Author

d7919 commented Aug 14, 2019

Thanks for the fast response and for confirming there's an issue -- I had indeed wondered what would happen if an overlapping receive were used, given this non-deterministic behaviour, but didn't realise it was prohibited by the standard. I don't really want to create an overlapping receive (or send) type; I just came across this while testing, trying to understand why a more complicated type wasn't working.

(I thought my use of non-blocking send and receive would guard against a deadlock, but I hadn't thought about it carefully enough in the context of sending and receiving on the same rank; I can imagine how that might cause an issue.)

@ggouaillardet
Contributor

On second thought, let me take back the part about the deadlock.
The standard may mandate that MPI_Wait() progresses all the requests it has to in order to ensure the one being waited on completes. (A single MPI_Wait(), or MPI_Waitall(), or MPI_Sendrecv() is still good practice, and a better fit here, but strictly speaking it may not be required.)

@bosilca
Member

bosilca commented Aug 14, 2019

Not quite -- the datatype engine works exactly according to the data description. Let me walk through the type map of your datatype.

First let me drop here the code related to the datatype creation, with all variables substituted with their corresponding values.

    MPI_Type_contiguous(8, MPI_INT, &tmp1);
    MPI_Type_create_resized(tmp1, 0, 2*4, &tmp2);
    MPI_Type_vector(2, 1, 1, tmp2, &sendtype);

First we have a contiguous type with the typemap ((INT, 0), (INT, 4), (INT, 8), (INT, 12), (INT, 16), (INT, 20), (INT, 24), (INT, 28)). Then you resize the extent of this datatype based on the extent of the base type (INT), which leaves you with the same typemap for tmp2 but an extent of 8 instead of 32 (8*4). So, when you repeat the tmp2 type twice you get:

(INT, 0), (INT, 4), (INT, 8), (INT, 12), (INT, 16), (INT, 20), (INT, 24), (INT, 28)
                    (INT, 8), (INT, 12), (INT, 16), (INT, 20), (INT, 24), (INT, 28), (INT, 32), (INT, 36)

So, if we assume one is allowed to receive using this datatype (to be clear, this is not the case in the current MPI standard), and the input data is received as a contiguous array, you should get exactly what you originally expected. I think the root issue here is a slight misunderstanding of MPI datatypes: the extent does not remove elements, it just moves the "reading head" (think of it as in file operations) by that amount before the next read. Thus, as the tmp1 datatype contains 8 elements, they will all be visible for the first vector element before going back and sending the second vector element from the same pointer plus the 8-byte extent.

@bosilca bosilca closed this as completed Aug 14, 2019
@d7919
Author

d7919 commented Aug 14, 2019

I certainly don't claim to fully understand MPI datatypes, but the example type map you give above is exactly what I expect to see, and indeed when the extent is twice the base type extent the code behaves exactly as expected and as described by that type map. It's when the extent is set to only one times the base type extent that the unexpected behaviour occurs.

Given the above I think this should give a type map of

(INT, 0), (INT, 4), (INT, 8), (INT, 12), (INT, 16), (INT, 20), (INT, 24), (INT, 28)
                    (INT, 4), (INT, 8), (INT, 12), (INT, 16), (INT, 20), (INT, 24), (INT, 28), (INT, 32)

but it seems in practice to produce something like

(INT, 0), (INT, 4), (INT, 8), (INT, 12), (INT, 16), (INT, 20), (INT, 24), (INT, 28)
                    (INT, 32), (INT, 36), (INT, 40), (INT, 44), (INT, 48), (INT, 52), (INT, 56), (INT, 60)

The updated example provided by @ggouaillardet which switches to a custom send datatype may make this clearer.

@ggouaillardet
Contributor

@bosilca the code I posted is based on the Fortran reproducer provided by @d7919.

The first test (ext = 8) passes, but the second one (ext = 4) fails. The map is wrong but, for some reason, the true extent is correct.

@ggouaillardet ggouaillardet reopened this Aug 14, 2019
@bosilca
Member

bosilca commented Aug 14, 2019

I looked at the datatype description and it looks exactly as it should, and I also looked at the generated output (with @ggouaillardet's test), which also looks exactly as it should.

With extent = 8
0 1 2 3 4 5 6 7 2 3 4 5 6 7 8 9
true_extent = 40
With extent = 4
0 1 2 3 4 5 6 7 1 2 3 4 5 6 7 8
true_extent = 36

In the first case you print 8 elements and then reprint them skipping the first 2 (because of the extent of 8). In the second you do the same but skip only the first int (the extent being 4).

@bosilca
Member

bosilca commented Aug 14, 2019

@d7919 the typemap applies to the array used as the send buffer, right (since the sendrecv uses a contiguous buffer as the receive buffer)?

If the typemap you mention were what the OMPI datatype engine generated, then the output would have been

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

and not

0 1 2 3 4 5 6 7 1 2 3 4 5 6 7 8

that we currently see for an extent of 4.

@d7919
Author

d7919 commented Aug 14, 2019

In fact, 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 is exactly what I do see. I suppose this indicates the issue has been fixed between whichever versions I've been testing on and the one you are using.

@jsquyres
Member

Is there a discrepancy between the v4.0.x releases / branch and master?

@d7919
Author

d7919 commented Aug 14, 2019

In addition to the versions I tested with (2.1.1 and 3.1.1) a colleague has just tested with 4.0.1 and reported the same unexpected result.

@bosilca
Member

bosilca commented Aug 14, 2019

I was indeed testing with master. All stable branches seem to have this issue with extent 4, while they work with extent 8. Troubling.

For the 4.x there is an update pending (#6863).

bosilca added a commit to bosilca/ompi that referenced this issue Aug 14, 2019
It turns out that if a complex datatype built out of a single predefined
type has been resized to the size of its predefined type, we miscompute
the extent when creating other datatypes with the type.

Provides a fix for open-mpi#6899 for the 3.x branch.

Signed-off-by: George Bosilca <[email protected]>
@bosilca
Member

bosilca commented Aug 14, 2019

Fix for the 3.x branch pending #6901.
Fix for the 2.x branch: apply the 3.x patch and update the copyright year by hand, or use

diff --git a/opal/datatype/opal_datatype_add.c b/opal/datatype/opal_datatype_add.c
index 890f550..0013c8e 100644
--- a/opal/datatype/opal_datatype_add.c
+++ b/opal/datatype/opal_datatype_add.c
@@ -3,7 +3,7 @@
  * Copyright (c) 2004-2006 The Trustees of Indiana University and Indiana
  *                         University Research and Technology
  *                         Corporation.  All rights reserved.
- * Copyright (c) 2004-2016 The University of Tennessee and The University
+ * Copyright (c) 2004-2019 The University of Tennessee and The University
  *                         of Tennessee Research Foundation.  All rights
  *                         reserved.
  * Copyright (c) 2004-2006 High Performance Computing Center Stuttgart,
@@ -297,7 +297,8 @@ int32_t opal_datatype_add( opal_datatype_t* pdtBase, const opal_datatype_t* pd
             if( pdtAdd->btypes[i] != 0 ) pdtBase->btypes[i] += (count * pdtAdd->btypes[i]);

         if( (1 == pdtAdd->desc.used) && (extent == (pdtAdd->ub - pdtAdd->lb)) &&
-            (extent == pdtAdd->desc.desc[0].elem.extent) ){
+            (extent == pdtAdd->desc.desc[0].elem.extent) &&
+            (extent == (pdtAdd->true_ub - pdtAdd->true_lb)) ) {
             pLast->elem        = pdtAdd->desc.desc[0].elem;
             pLast->elem.count *= count;
             pLast->elem.disp  += disp;

jsquyres pushed a commit to jsquyres/ompi that referenced this issue Aug 20, 2019
It turns out that if a complex datatype built out of a single predefined
type has been resized to the size of its predefined type, we miscompute
the extent when creating other datatypes with the type.

Provides a fix for open-mpi#6899 for the 3.x branch.

Signed-off-by: George Bosilca <[email protected]>
(cherry picked from commit f09d011)
@jsquyres
Member

PRs made for v3.0.x (#6913) and v3.1.x (#6901). Both will be merged shortly.

This commit added to a larger DDT PR for v4.0.x (#6863), which -- as of 20 Aug 2019 -- is still pending.

This fix will not be going to any older versions of Open MPI (e.g., v2.x).

@jsquyres
Member

Ok, this is now merged to all the branches on which we're going to fix it: v3.0.x, v3.1.x, v4.0.x. Closing.
