yalla with irregular contig datatype -- Fixes 3566 #3765
Yalla has a macro, PML_YALLA_INIT_MXM_REQ_DATA, that checks whether a datatype is contiguous via opal_datatype_is_contiguous_memory_layout(dt, count). If so, it selects a size and lb, presumably describing the region that will be transferred via RDMA:

    ompi_datatype_type_size(_dtype, &size); \
    ompi_datatype_type_lb(_dtype, &lb); \

This failed when I gave it a datatype constructed as [ ...] with extent 4. What I mean by that datatype is:

    lens[0] = 3;
    disps[0] = 1;
    types[0] = MPI_CHAR;
    MPI_Type_struct(1, lens, disps, types, &tmpdt);
    MPI_Type_create_resized(tmpdt, 0, 4, &mydt);

So there are 3 chars at offset 1, the LB is 0, and the UB is 4. The macro decides that size=4 and lb=0, and later I suppose size is getting updated to 3 for the final RDMA, so a send of a buffer [ 0 1 2 3 ] is received as [ 0 1 2 _ ].

I think it should use the true lb and the true extent. For "regular" contiguous datatypes these would be the same, and for the irregular ones that are still deemed contiguous by that utility function it should still be the right thing to use.

Signed-off-by: Mark Allen <[email protected]>
(cherry picked from commit 36f51bc)
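To make the reported symptom concrete, here is a minimal sketch in plain C that does not use MPI or Open MPI at all. The `layout_t` struct, `report_type`, and both `send_*` functions are hypothetical names invented for illustration. They model one reading of the report: taking the region start from lb picks up bytes 0..2 of the buffer, while taking it from the true lb picks up the three chars that actually belong to the datatype.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a datatype's layout facts; not an Open MPI struct. */
typedef struct {
    size_t    size;        /* bytes of actual data in one element */
    ptrdiff_t lb;          /* lower bound, as set by the resize */
    ptrdiff_t extent;      /* ub - lb, as set by the resize */
    ptrdiff_t true_lb;     /* offset of the first data byte */
    ptrdiff_t true_extent; /* distance from first to one past last data byte */
} layout_t;

/* The datatype from the report: 3 chars at displacement 1, resized to lb=0, extent=4. */
static const layout_t report_type = { 3, 0, 4, 1, 3 };

/* Models the behaviour described in the report: the region starts at lb. */
static size_t send_from_lb(const layout_t *t, const char *user_buf, char *wire)
{
    memcpy(wire, user_buf + t->lb, t->size);
    return t->size;
}

/* Models the proposed fix: the region starts at true_lb and spans true_extent. */
static size_t send_from_true_lb(const layout_t *t, const char *user_buf, char *wire)
{
    assert(t->true_extent >= 0);
    memcpy(wire, user_buf + t->true_lb, (size_t)t->true_extent);
    return (size_t)t->true_extent;
}
```

Called on the buffer `{0, 1, 2, 3}`, `send_from_lb` copies the bytes 0, 1, 2 and `send_from_true_lb` copies 1, 2, 3. For a "regular" contiguous type, where true_lb equals lb and true_extent equals size, the two functions copy the same bytes.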
This is a cherry-pick of the fix that @markalle committed to the master branch (#3567).
We have a special construct that can help with this decision: opal_datatype_span. It returns the span and the initial gap for a block of count datatypes.
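As a rough illustration of what "span" and "initial gap" mean for a block of count datatypes, the sketch below computes both from a type's bounds. It is a standalone model, not the Open MPI source: `dt_bounds_t` and `block_span` are hypothetical names, and the formula (true extent of the last element plus one extent per preceding element, with the gap taken as the true lower bound) is my assumption about what the comment describes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bounds record; not the real opal_datatype_t. */
typedef struct {
    size_t    size;             /* bytes of actual data in one element */
    ptrdiff_t lb, ub;           /* (possibly resized) lower and upper bound */
    ptrdiff_t true_lb, true_ub; /* bounds of the bytes that hold data */
} dt_bounds_t;

/*
 * Hypothetical helper: returns the number of bytes spanned by `count`
 * consecutive elements and stores in *gap the offset of the first data
 * byte relative to the start of the user buffer.
 */
static ptrdiff_t block_span(const dt_bounds_t *dt, size_t count, ptrdiff_t *gap)
{
    assert(dt != NULL && gap != NULL);
    if (dt->size == 0 || count == 0) {
        *gap = 0;
        return 0;
    }
    ptrdiff_t extent = dt->ub - dt->lb;
    ptrdiff_t true_extent = dt->true_ub - dt->true_lb;
    *gap = dt->true_lb;
    /* The first count-1 elements each advance by extent; the last one
     * only needs its true extent. */
    return true_extent + (ptrdiff_t)(count - 1) * extent;
}
```

For the datatype in this PR (3 chars at offset 1, lb 0, ub 4), this model gives a gap of 1 and a span of 3 for a single element, and a span of 7 for two elements.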
@hppritcha This is a bug fix, please merge.
@alinask @jladd-mlnx Do we need a PR for this for v2.0.x as well?
@hppritcha @bwbarrett This seems like an obvious bug fix for v2.0.x-v3.0.x. Good to go.
@alinask FWIW/GitHub pro tip: The description of the GitHub PR is pulled from the git commit message (if there's only 1 commit), but you can still change / edit it before the PR is created (and you can edit it after it was created, too).
pull request open-mpi#3765 introduced a bug where the extent of a type is used instead of its size. Signed-off-by: Yossi Itigin <[email protected]>
Fixes #3566