-
Notifications
You must be signed in to change notification settings - Fork 900
yalla with irregular contig datatype -- Fixes 3566 #3567
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Yalla has a macro PML_YALLA_INIT_MXM_REQ_DATA that checks if a datatype is contiguous via opal_datatype_is_contiguous_memory_layout(dt,count) and if so it selects a size and lb that presumably is what will rdma, as ompi_datatype_type_size(_dtype, &size); \ ompi_datatype_type_lb(_dtype, &lb); \ This failed when I gave it a datatype constructed as [ ...] with extent 4. What I mean by that datatype is lens[0] = 3; disps[0] = 1; types[0] = MPI_CHAR; MPI_Type_struct(1, lens, disps, types, &tmpdt); MPI_Type_create_resized(tmpdt, 0, 4, &mydt); So there are 3 chars at offset 1, and the LB is 0 and the UB is 4. So that macro decides that size=4 and lb=0 and later I suppose size is getting updated to 3 for the final rdma, and so a send of a buffer [ 0 1 2 3 ] gets recved as [ 0 1 2 _ ]. I think it should use the true lb and the true extent. For "regular" contig datatypes it would be the same, and for the irregular ones that are still deemed contiguous by that utility function it should still be the right thing to use. Signed-off-by: Mark Allen <[email protected]>
Here's a gist with a testcase example that hits the problem when run with the yalla pml: |
@jladd-mlnx @artpol84 - Please take a look. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 nice catch.
bot:ompi:retest |
Fixes Issue #3566 |
@yosefe Please review and bless. |
I'm tempted to propose that a utility function that identifies is_contiguous(count, dt) should have a couple extra arguments so it could return "yes it's contiguous, and here's the offset and length of the contiguous region represented by that count and dt." I think the answer is always a fairly simple true_lb and count*true_extent, so I guess the utility function doesn't really need that feature, but it could be a convenience. |
|
Yalla has a macro PML_YALLA_INIT_MXM_REQ_DATA that checks if a datatype
is contiguous via opal_datatype_is_contiguous_memory_layout(dt,count)
and if so it selects a size and lb that presumably is what will rdma, as
This failed when I gave it a datatype constructed as
[ ...]
with extent 4.What I mean by that datatype is
So there are 3 chars at offset 1, and the LB is 0 and the UB is 4.
So that macro decides that size=4 and lb=0 and later I suppose size is getting
updated to 3 for the final rdma, and so a send of a buffer
[ 0 1 2 3 ]
gets recved as[ 0 1 2 _ ]
. I think it should use the true lband the true extent.
For "regular" contig datatypes it would be the same, and for the irregular
ones that are still deemed contiguous by that utility function it should
still be the right thing to use.
Signed-off-by: Mark Allen [email protected]