Skip to content

v4.0.x: Long live MPI_LONG and MPI_UNSIGNED_LONG #9088

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jul 2, 2021

Conversation

markalle
Copy link
Contributor

The only 2 types that have an external32 representation with a different
size than most current architectures, and as a result are more
challenging to handle.

This patch also brings back the support for packing and unpacking to and
from external32 for all datatypes and does a little cleaning in the
datatype API and comments.

Signed-off-by: George Bosilca [email protected]
(cherry picked from commit 4e56e83)

The only 2 types that have an external32 representation with a different
size than most current architectures, and as a result are more
challenging to handle.

This patch also brings back the support for packing and unpacking to and
from external32 for all datatypes and does a little cleaning in the
datatype API and comments.

Signed-off-by: George Bosilca <[email protected]>
(cherry picked from commit 4e56e83)
@markalle
Copy link
Contributor Author

Cherry picked George's external32 fix for consideration into v4.0.x

@gpaulsen gpaulsen requested review from bosilca and gpaulsen June 23, 2021 13:11
@jsquyres
Copy link
Member

Does this also need to be cherry picked to v4.1.x?

@jsquyres jsquyres added this to the v4.0.7 milestone Jun 23, 2021
@jsquyres
Copy link
Member

Does this PR change the v4.0.x ABI?

@markalle
Copy link
Contributor Author

@jsquyres Yeah, I'll do a v4.1.x too.

And I don't think it introduces any ABI change because mpi.h only exposes MPI_LONG as &ompi_mpi_long, and that already existed. The inside of ompi_mpi_long is changed, ompi_mpi_long.id = the new 0x2F value, but I don't think anything in the ABI has visibility into what's inside the internal ompi_mpi_long.

@markalle
Copy link
Contributor Author

FWIW here's a gist for a testcase pack2.c that's fixed by this cherry-pick from master.

https://gist.github.com/markalle/6d70cf8ca14761e94bce9d0240000a3e
% mpicc -o x pack2.c
% mpirun -np 1 ./x

@awlauria
Copy link
Contributor

@markalle @jsquyres if you bring in this change you may also want this:

#8780

When performing an op with MPI_LONG, for example,
an MPI_Allreduce() with MPI_MIN, this would segv when performing
the op, since there is no longer an op function
for this type.

I followed the blueprint for int64_t/uint64_t.

Signed-off-by: Austen Lauria <[email protected]>
(cherry picked from commit ff9f03c)
@markalle
Copy link
Contributor Author

Thanks Austen, I agree. Repushed with 8780 cherry picked

@jsquyres jsquyres changed the title Long live MPI_LONG and MPI_UNSIGNED_LONG v4.0.x: Long live MPI_LONG and MPI_UNSIGNED_LONG Jun 29, 2021
@jsquyres
Copy link
Member

Discussion on the 2021-06-29 webex: it sounds like this is a bug fix for a customer-found defect. The v4.0.x RMs are going to gather a little more information from @markalle and @edgargabriel and possibly @bosilca about exactly what this PR is for, does it actually address the customer issue, is it complete or does it need more PRs, what's its relation to OMPIO, ... etc. Once this info is better understood, it should be examined as to the scope and risk of this PR for the v4.0.x and v4.1.x branches.

For example, if this ends up being a large scope / high risk bug fix (not a new feature), it could still be suitable for v4.1.x, but maybe not suitable for v4.0.x. TBD.

@bosilca
Copy link
Member

bosilca commented Jun 30, 2021

In the current series (4.x and 5.x), OMPI long and unsigned long types are mapped onto POSIX integer types (int16_t, int32, ...) according to the type sizeof. Unfortunately, the MPI standard defines different pack/unpack requirements for external32 for the long and unsigned long types, possibly allowing their external32 representation to differ from the one used by the local mapped int* type. To put this differently, all int* types have a fixed length representation, while long doesn't.

So, this PR creates 2 predefined OPAL datatypes for long and unsigned long addressing the issue described above, and allowing for specialized pack/unpack of these types with regard to external32. It also provide basic support for the pack/unpack operations when the local architecture and the external32 have the same sizeof.

@markalle
Copy link
Contributor Author

markalle commented Jul 2, 2021

I don't see any risk in the category of ABI changes. As Austen's PR addition already proved, there was risk of interaction with other pieces like the reduction operations had to be made aware of the changes.

Besides the addition of LONG to opal to allow pack/unpack to recognize and treat it differently, there was enough touching and cleaning of pack/unpack code going on I'm pretty sure it fixed pack/unpack external32 failures that went beyond just MPI_LONG. I forget specifically, but if that part's important I could checkout a 4.0 and double check which tests exactly failed before

@hppritcha hppritcha merged commit ff49a11 into open-mpi:v4.0.x Jul 2, 2021
@hppritcha hppritcha added the NEWS label Jul 2, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants