Skip to content

yalla with irregular contig datatype -- Fixes 3566 #3765

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 30, 2017

Conversation

alinask
Copy link
Member

@alinask alinask commented Jun 27, 2017

Yalla has a macro PML_YALLA_INIT_MXM_REQ_DATA that checks if a datatype
is contiguous via opal_datatype_is_contiguous_memory_layout(dt,count)
and if so it selects a size and lb that presumably is what will rdma, as
ompi_datatype_type_size(_dtype, &size);
ompi_datatype_type_lb(_dtype, &lb); \

This failed when I gave it a datatype constructed as [ ...] with extent 4.
What I mean by that datatype is
lens[0] = 3;
disps[0] = 1;
types[0] = MPI_CHAR;
MPI_Type_struct(1, lens, disps, types, &tmpdt);
MPI_Type_create_resized(tmpdt, 0, 4, &mydt);
So there are 3 chars at offset 1, and the LB is 0 and the UB is 4.

So that macro decides that size=4 and lb=0 and later I suppose size is getting
updated to 3 for the final rdma, and so a send of a buffer
[ 0 1 2 3 ] gets recved as [ 0 1 2 _ ]. I think it should use the true lb
and the true extent.

For "regular" contig datatypes it would be the same, and for the irregular
ones that are still deemed contiguous by that utility function it should
still be the right thing to use.

Signed-off-by: Mark Allen [email protected]
(cherry picked from commit 36f51bc)

Fixes #3566

Yalla has a macro PML_YALLA_INIT_MXM_REQ_DATA that checks if a datatype
is contiguous via opal_datatype_is_contiguous_memory_layout(dt,count)
and if so it selects a size and lb that presumably is what will rdma, as
            ompi_datatype_type_size(_dtype, &size); \
            ompi_datatype_type_lb(_dtype, &lb); \

This failed when I gave it a datatype constructed as [ ...] with extent 4.
What I mean by that datatype is
    lens[0] = 3;
    disps[0] = 1;
    types[0] = MPI_CHAR;
    MPI_Type_struct(1, lens, disps, types, &tmpdt);
    MPI_Type_create_resized(tmpdt, 0, 4, &mydt);
So there are 3 chars at offset 1, and the LB is 0 and the UB is 4.

So that macro decides that size=4 and lb=0 and later I suppose size is getting
updated to 3 for the final rdma, and so a send of a buffer
[ 0 1 2 3 ] gets recved as [ 0 1 2 _ ]. I think it should use the true lb
and the true extent.

For "regular" contig datatypes it would be the same, and for the irregular
ones that are still deemed contiguous by that utility function it should
still be the right thing to use.

Signed-off-by: Mark Allen <[email protected]>
(cherry picked from commit 36f51bc)
@alinask
Copy link
Member Author

alinask commented Jun 27, 2017

This is a cherry-pick of the fix that @markalle committed to the master branch (#3567).
@jladd-mlnx @yosefe

Copy link
Member

@bosilca bosilca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a special construct that can help with this decision: opal_datatype_span. It returns the span and the initial gap for a block of count datatypes.

@jladd-mlnx
Copy link
Member

@hppritcha This is a bug fix, please merge.

@jladd-mlnx jladd-mlnx added this to the v2.1.2 milestone Jun 28, 2017
@jladd-mlnx jladd-mlnx added the bug label Jun 28, 2017
@jsquyres
Copy link
Member

@alinask GitHub pro tip: you need to put "Fixes #3566" either in the commit message or in the body of the issue description (not the title -- and it must include#) to have this issue both properly refer to #3566 and close #3566 when this PR is merged.

@jsquyres
Copy link
Member

@alinask @jladd-mlnx Do we need a PR for this for v2.0.x as well?

@alinask
Copy link
Member Author

alinask commented Jun 28, 2017

@jsquyres Thanks! This was a cherry-pick so I didn't change anything in the commit. Will do for next time.
I issued a PR for v2.x as well - #3766

@jladd-mlnx
Copy link
Member

@alinask #3766 is for 3.x. Can you open one for 2.0.x, please.

@jsquyres
Copy link
Member

@hppritcha @bwbarrett This seems like an obvious bug fix for v2.0.x-v3.0.x. Good to go.

@jsquyres
Copy link
Member

@alinask FWIW/GitHub pro tip: The description of the github PR is pulled from the git commit message (if there's only 1 commit), but you can still change / edit it before the PR is created (and you can edit it after it was created, too).

@hppritcha hppritcha merged commit 31aacd9 into open-mpi:v2.x Jun 30, 2017
yosefe added a commit to yosefe/ompi that referenced this pull request Jul 7, 2017
pull request open-mpi#3765 introduced a bug where the extent of a type is used
instead of its size.

Signed-off-by: Yossi Itigin <[email protected]>
yosefe added a commit to yosefe/ompi that referenced this pull request Jul 16, 2017
pull request open-mpi#3765 introduced a bug where the extent of a type is used
instead of its size.

Signed-off-by: Yossi Itigin <[email protected]>
bwbarrett pushed a commit that referenced this pull request Jul 20, 2017
pull request #3765 introduced a bug where the extent of a type is used
instead of its size.

Signed-off-by: Yossi Itigin <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants