v4.0.x: Long live MPI_LONG and MPI_UNSIGNED_LONG #9088
MPI_LONG and MPI_UNSIGNED_LONG are the only 2 types whose external32 representation has a different size than on most current architectures, which makes them more challenging to handle.
This patch also brings back support for packing and unpacking to and from external32 for all datatypes, and does a little cleanup in the datatype API and comments.
Signed-off-by: George Bosilca <[email protected]>
(cherry picked from commit 4e56e83)
Cherry picked George's external32 fix for consideration into v4.0.x
Does this also need to be cherry picked to v4.1.x?
Does this PR change the v4.0.x ABI?
@jsquyres Yeah, I'll do a v4.1.x too. And I don't think it introduces any ABI change, because mpi.h only exposes MPI_LONG as &ompi_mpi_long, and that already existed. The inside of ompi_mpi_long is changed (ompi_mpi_long.id is now the new 0x2F value), but I don't think anything in the ABI has visibility into what's inside the internal ompi_mpi_long.
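The ABI argument above can be illustrated with a small standalone sketch. It is only an analogy under stated assumptions: `struct demo_datatype`, `demo_long`, `DEMO_LONG`, `app_is_long`, and `lib_renumber` are hypothetical names, the starting id `0x07` is made up, and the real Open MPI structures are more involved. The point is that code which only ever sees the address of a predefined object is unaffected when a field inside that object changes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for an internal predefined datatype object. */
struct demo_datatype {
    uint16_t id;   /* internal type id, never read by applications */
    uint32_t size; /* native size in bytes */
};

/* The library owns the object; 0x07 is a made-up "old" id. */
struct demo_datatype demo_long = { 0x07, (uint32_t)sizeof(long) };

/* What a public header would expose: only the object's address. */
#define DEMO_LONG (&demo_long)

/* Application-side code compares or passes the handle, nothing more. */
static int app_is_long(const struct demo_datatype *handle)
{
    assert(handle != 0);
    return handle == DEMO_LONG;
}

/* Library-side code is free to renumber the internal id. */
static void lib_renumber(uint16_t new_id)
{
    demo_long.id = new_id;
}
```

After `lib_renumber(0x2F)`, a handle obtained earlier still compares equal to `DEMO_LONG`, since the symbol and its address are unchanged.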
FWIW here's a gist for a testcase pack2.c that's fixed by this cherry-pick from master. https://gist.github.com/markalle/6d70cf8ca14761e94bce9d0240000a3e
Performing an op with MPI_LONG, for example an MPI_Allreduce() with MPI_MIN, would segv, since there is no longer an op function for this type. I followed the blueprint for int64_t/uint64_t.
Signed-off-by: Austen Lauria <[email protected]>
(cherry picked from commit ff9f03c)
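The failure mode described in this commit message can be sketched as a per-type function table with a missing entry. This is a simplified illustration, not Open MPI's op framework: every name below (`demo_type`, `demo_op_fn`, `demo_min_table`, `demo_reduce_min`) is hypothetical, and the NULL check that returns `-1` is added here to make the sketch safe to run, where the real code would crash on the call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type ids; adding a new id requires a matching table entry. */
enum demo_type { DEMO_TYPE_INT64, DEMO_TYPE_LONG, DEMO_TYPE_MAX };

typedef void (*demo_op_fn)(const void *in, void *inout, int count);

/* Element-wise minimum for int64_t buffers. */
static void demo_min_int64(const void *in, void *inout, int count)
{
    const int64_t *a = in;
    int64_t *b = inout;
    for (int i = 0; i < count; ++i) {
        if (a[i] < b[i]) b[i] = a[i];
    }
}

/* Same blueprint, repeated for long. */
static void demo_min_long(const void *in, void *inout, int count)
{
    const long *a = in;
    long *b = inout;
    for (int i = 0; i < count; ++i) {
        if (a[i] < b[i]) b[i] = a[i];
    }
}

/* One slot per type id. Leaving out the DEMO_TYPE_LONG line would leave
 * that slot NULL, and calling through it would crash. */
static demo_op_fn demo_min_table[DEMO_TYPE_MAX] = {
    [DEMO_TYPE_INT64] = demo_min_int64,
    [DEMO_TYPE_LONG]  = demo_min_long,
};

/* Returns 0 on success, -1 if no function is registered for the type. */
static int demo_reduce_min(enum demo_type type, const void *in, void *inout,
                           int count)
{
    assert((int)type >= 0 && (int)type < DEMO_TYPE_MAX);
    demo_op_fn fn = demo_min_table[type];
    if (fn == NULL) {
        return -1;
    }
    fn(in, inout, count);
    return 0;
}
```

With both entries present, reducing `{5, -1, 7}` into `{3, 2, 9}` using `DEMO_TYPE_LONG` leaves `{3, -1, 7}` in the second buffer.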
Thanks Austen, I agree. Repushed with 8780 cherry picked
Discussion on the 2021-06-29 webex: it sounds like this is a bug fix for a customer-found defect. The v4.0.x RMs are going to gather a little more information from @markalle and @edgargabriel and possibly @bosilca about exactly what this PR is for: does it actually address the customer issue, is it complete or does it need more PRs, what's its relation to OMPIO, etc. Once this info is better understood, the scope and risk of this PR should be examined for the v4.0.x and v4.1.x branches. For example, if this ends up being a large scope / high risk bug fix (not a new feature), it could still be suitable for v4.1.x, but maybe not suitable for v4.0.x. TBD.
In the current series (4.x and 5.x), OMPI So, this PR creates 2 predefined OPAL datatypes for |
I don't see any risk in the category of ABI changes. As Austen's addition to the PR already proved, there was a risk of interaction with other pieces: the reduction operations had to be made aware of the changes. Besides the addition of LONG to opal to allow pack/unpack to recognize and treat it differently, there was enough touching and cleaning of pack/unpack code going on that I'm pretty sure it fixed pack/unpack external32 failures that went beyond just MPI_LONG. I forget specifically, but if that part's important I could check out a 4.0 and double check exactly which tests failed before.