Refresh of the datatype engine from Topic/backport 6695 #6863
Fixes open-mpi#6575. Signed-off-by: George Bosilca <[email protected]>
Move toward a base type of vector (count, type, blocklen, extent, disp), with disp and extent applying to the count repetition and blocklen being a contiguous memory region of type type.

Implement 2 optimizations on this description, used during type_commit:
- collapse: successive similar datatype descriptions are collapsed together with an increased count.
- fusion: fuse successive datatype descriptions in order to minimize the number of resulting memcpy during pack/unpack.

Fixes at the OMPI datatype level, including:
- Fix the create_hindexed and vector creation.
- Fix the handling of [get|set]_elements and _count.
- Correctly compute the displacement for block indexed types.
- Support the MPI_LB and MPI_UB deprecation, aka. OMPI_ENABLE_MPI1_COMPAT.

Signed-off-by: George Bosilca <[email protected]>
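To make the collapse step concrete, here is a minimal sketch in C. The `vec_desc` struct and the `collapse_descs` function are hypothetical names invented for illustration, loosely modeled on the (count, type, blocklen, extent, disp) tuple above; they are not Open MPI's internal data structures, and the type is reduced to an element size in bytes. The sketch assumes two descriptions are "similar" when they share element size, blocklen and extent, and the second starts exactly where the first one's repetition would continue. Fusion is not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor, modeled on (count, type, blocklen, extent, disp).
 * Not Open MPI's actual internal representation. */
typedef struct {
    size_t    count;     /* number of repetitions */
    size_t    elem_size; /* stands in for "type": bytes per element */
    size_t    blocklen;  /* contiguous elements per repetition */
    ptrdiff_t extent;    /* byte stride between repetitions */
    ptrdiff_t disp;      /* byte displacement of the first repetition */
} vec_desc;

/* b can be absorbed into a when it has the same shape and begins exactly
 * where a's next repetition would start. */
static int can_collapse(const vec_desc *a, const vec_desc *b)
{
    return a->elem_size == b->elem_size
        && a->blocklen == b->blocklen
        && a->extent == b->extent
        && b->disp == a->disp + (ptrdiff_t)a->count * a->extent;
}

/* Collapse successive similar descriptions in place by increasing the
 * count of the earlier one. Returns the new number of descriptions. */
size_t collapse_descs(vec_desc *d, size_t n)
{
    assert(d != NULL || n == 0);
    if (n == 0) {
        return 0;
    }
    size_t out = 0;
    for (size_t i = 1; i < n; i++) {
        if (can_collapse(&d[out], &d[i])) {
            d[out].count += d[i].count;
        } else {
            d[++out] = d[i];
        }
    }
    return out + 1;
}
```

With this sketch, two descriptions of 2 and 3 repetitions that share a 64-byte extent and sit back to back become a single description with a count of 5, while a following description with a different element size is left untouched.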
Update the comments to better reflect what is going on. Minor indentations. Signed-off-by: George Bosilca <[email protected]>
Merge contiguous iov in order to minimize the number of returned iovec. Signed-off-by: George Bosilca <[email protected]>
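As an illustration of the idea only (not the convertor code changed by this commit), merging contiguous entries can be sketched with the POSIX `struct iovec` from `<sys/uio.h>`. The function name `merge_contiguous_iov` is hypothetical, and the sketch assumes "contiguous" means an entry begins at the exact byte where the previous one ends.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/* Merge neighbouring iovec entries in place when an entry starts at the
 * byte right after the previous one ends. Returns the new entry count. */
size_t merge_contiguous_iov(struct iovec *iov, size_t n)
{
    assert(iov != NULL || n == 0);
    if (n == 0) {
        return 0;
    }
    size_t out = 0;
    for (size_t i = 1; i < n; i++) {
        char *end = (char *)iov[out].iov_base + iov[out].iov_len;
        if ((char *)iov[i].iov_base == end) {
            /* Contiguous: extend the previous entry. */
            iov[out].iov_len += iov[i].iov_len;
        } else {
            /* Gap: keep this entry as a separate one. */
            iov[++out] = iov[i];
        }
    }
    return out + 1;
}
```

Fewer entries means fewer elements for the consumer of the iovec array (for example a `writev` call) to process.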
Rework the to_self test to be able to be used as a benchmark. Signed-off-by: George Bosilca <[email protected]>
- Optimize handling of contiguous with gaps datatypes.
- Fix a performance issue for all datatypes with a count of 1.
- Optimize the pack/unpack of contiguous with gaps datatypes.
- Optimize the case of blocklen == 1.
Signed-off-by: George Bosilca <[email protected]>
Signed-off-by: George Bosilca <[email protected]>
Upon detecting a datatype loop representation, skip the entire loop according to the remaining space. Signed-off-by: George Bosilca <[email protected]>
Optimize contiguous loops by collapsing them into a single element. During datatype optimization collapse similar elements into larger blocks. Signed-off-by: George Bosilca <[email protected]>
Amazing how bad instruction scheduling can have such a drastic impact on code performance. With this change, we get a boost of at least 50% in performance for data with a small blocklen and/or count. Signed-off-by: George Bosilca <[email protected]>
Start optimizing the code. This commit divides the operations into 2 parts: the first, outside the critical path, deals with partial blocks of predefined elements, and the second, inside the critical path, only deals with full blocks of elements. This reduces the number of expensive operations in the critical path and results in a decent performance increase. Signed-off-by: George Bosilca <[email protected]>
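The split between partial and full blocks can be sketched as follows. This is a simplified, hypothetical packing routine (`pack_strided` and all of its parameters are invented for illustration, and sizes are in bytes rather than elements); it is not the actual Open MPI pack loop. It assumes the packed stream may be resumed at an arbitrary byte offset, so a leading and a trailing partial block are handled outside the main loop, which then only copies whole blocks.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Pack `count` blocks of `blocklen` bytes, spaced `extent` bytes apart in
 * `src`, into the contiguous buffer `dst` of capacity `dst_len`.
 * `start` is the byte offset in the packed stream at which to resume.
 * Returns the number of bytes written to `dst`. */
size_t pack_strided(char *dst, size_t dst_len, const char *src,
                    size_t blocklen, size_t extent, size_t count,
                    size_t start)
{
    assert(blocklen > 0 && extent >= blocklen);
    size_t total = blocklen * count;
    if (start >= total) {
        return 0;
    }
    size_t remaining = total - start;
    if (remaining > dst_len) {
        remaining = dst_len;
    }

    size_t packed = 0;
    size_t off = start % blocklen;
    const char *p = src + (start / blocklen) * extent;

    /* Part 1 (outside the hot loop): finish a partially packed block. */
    if (off != 0) {
        size_t len = blocklen - off;
        if (len > remaining) {
            len = remaining;
        }
        memcpy(dst, p + off, len);
        packed += len;
        p += extent;
    }

    /* Part 2 (hot loop): only full blocks, no per-iteration bounds logic. */
    size_t full = (remaining - packed) / blocklen;
    for (size_t i = 0; i < full; i++) {
        memcpy(dst + packed, p, blocklen);
        packed += blocklen;
        p += extent;
    }

    /* Part 3 (outside the hot loop): a trailing partial block, if any. */
    size_t tail = remaining - packed;
    if (tail != 0) {
        memcpy(dst + packed, p, tail);
        packed += tail;
    }
    return packed;
}
```

Keeping the offset and remaining-space checks out of the middle loop is the design point: each iteration there is a fixed-size copy plus two pointer updates.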
Thanks @bosilca!
bot:ibm:gnu:retest
@derbeyn @ggouaillardet Can either of you please review this v4.0.x backport of PR #6695?
Hmm. It won't let me request a review from @derbeyn even though she reviewed the master PR.
@ggouaillardet Are you able to review this PR? Once this goes in, we may be able to create a v4.0.2 rc1.
Fixes the convertor iovec description issue with MPI-IO reported by Edgar. Signed-off-by: George Bosilca <[email protected]>
Signed-off-by: George Bosilca <[email protected]>
No code or logic changes. Signed-off-by: George Bosilca <[email protected]> Signed-off-by: Jeff Squyres <[email protected]>
bot:ompi:retest
@ggouaillardet Can you please review this? This is the only PR blocking a v4.0.2 rc1 build.
A backport of the datatype improvements to the v4.0.x branch.
A few things to mention: