
fortran/use-mpi-f08: add support for Fortran 2018 ISO_Fortran_binding.h #6569

Open
ggouaillardet wants to merge 4 commits into main from poc/f08_cdesc

Conversation

@ggouaillardet (Contributor)

add the infrastructure and update MPI_Send() and MPI_Recv()

Signed-off-by: Gilles Gouaillardet [email protected]

@ggouaillardet (Contributor, Author)

@jsquyres @hppritcha @bosilca this is a first cut at a topic we discussed a long time ago.

The code in ompi/mpi/fortran/use-mpi-f08/cdesc is inspired by (and adapted from) MPICH.

I ran a simple pingpong test with

integer :: buf(0:1025,0:1025)

and used buf(1:1024,1:1024) as the buffer. Performance is now 4x faster with use mpi_f08 compared to use mpi (which requires the Fortran runtime to allocate, fill, and deallocate an internal buffer). I used the Intel compilers since the upcoming gcc 9 is not quite ready yet (this is currently being discussed on the mailing list, starting at https://gcc.gnu.org/ml/fortran/2019-04/msg00013.html).
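
For reference, here is a minimal sketch (not the PR's actual code) of the kind of translation the cdesc layer performs: the descriptor defined by ISO_Fortran_binding.h carries the base address, element length, rank, and per-dimension extent/stride, from which a derived datatype describing the (possibly non-contiguous) slice can be built and handed to the C bindings. The function names below are illustrative only; elements are treated as raw bytes for simplicity.

```c
#include <ISO_Fortran_binding.h>   /* CFI_cdesc_t, CFI_is_contiguous() */
#include <mpi.h>

/* Build one datatype describing the memory selected by the descriptor:
 * one hvector level per array dimension. */
static int cdesc_to_datatype(const CFI_cdesc_t *d, MPI_Datatype *newtype)
{
    MPI_Datatype cur, next;
    PMPI_Type_contiguous((int)d->elem_len, MPI_BYTE, &cur);
    for (int i = 0; i < d->rank; i++) {
        /* dim[i].extent elements, spaced dim[i].sm bytes apart */
        PMPI_Type_create_hvector((int)d->dim[i].extent, 1,
                                 (MPI_Aint)d->dim[i].sm, cur, &next);
        PMPI_Type_free(&cur);
        cur = next;
    }
    PMPI_Type_commit(&cur);
    *newtype = cur;
    return MPI_SUCCESS;
}

/* Descriptor-aware send wrapper (illustrative, not the PR's symbol names) */
int example_send_cdesc(CFI_cdesc_t *buf, int count, MPI_Datatype type,
                       int dest, int tag, MPI_Comm comm)
{
    if (0 == buf->rank || CFI_is_contiguous(buf)) {
        return PMPI_Send(buf->base_addr, count, type, dest, tag, comm);
    }
    MPI_Datatype slice;
    cdesc_to_datatype(buf, &slice);
    /* the whole slice is described by a single datatype, hence count = 1 */
    int rc = PMPI_Send(buf->base_addr, 1, slice, dest, tag, comm);
    PMPI_Type_free(&slice);
    return rc;
}
```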

@ggouaillardet force-pushed the poc/f08_cdesc branch 2 times, most recently from 9a05dd7 to 2f47ed6 on April 8, 2019.
@bosilca (Member) commented Apr 9, 2019

This is really cool stuff; it was about time someone addressed this issue. Let me take a look at how we can build these types in a better way. Can you point me to some good documentation about the Fortran cdesc?

@ggouaillardet (Contributor, Author)

@bosilca I guess the best documentation is in the standard. I am not sure it is available at no cost, but you can check the recent and freely available draft at https://j3-fortran.org/doc/year/18/18-007r1.pdf

As a side note, we should really fix how non-blocking collectives handle datatype/operator refcounts
(if I apply the same technique to non-blocking collectives, we will end up using freed datatypes and bad things will happen).

@ggouaillardet (Contributor, Author)

Refs. #2154

last = i; goto fn_exit;
}

mpi_errno = PMPI_Type_commit(&types[i+1]);
Member:

Don't we only have to commit the last datatype?

I.e., isn't "commit" potentially an expensive operation? If so, can't we use uncommitted datatypes to build up new datatypes, and therefore only have to commit the last datatype?
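
For illustration, a minimal standalone example (not the PR's code) of this suggestion: intermediate datatypes can be left uncommitted, and even freed once they have been used as building blocks; only the datatype that is ultimately passed to a communication call needs to be committed.

```c
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    MPI_Datatype column, plane;
    MPI_Type_vector(4, 1, 8, MPI_INT, &column);   /* never committed             */
    MPI_Type_contiguous(3, column, &plane);       /* built from uncommitted type */
    MPI_Type_free(&column);                       /* safe: plane keeps the info  */
    MPI_Type_commit(&plane);                      /* commit only the final type  */

    /* plane is now usable in MPI_Send()/MPI_Recv(), etc. */
    MPI_Type_free(&plane);
    MPI_Finalize();
    return 0;
}
```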

Member:

@bosilca confirmed to me in an email that I was correct -- you only need to commit the final datatype. Depending on the complexity of the array slice, this could be a nice optimization.

Contributor Author:

Yep, I will do that (and issue a PR to MPICH, since this is basically their code).

Member:

Still need to move this type_commit outside of the loop and only commit the last/final datatype.

@bosilca (Member) commented Apr 10, 2019

I did not mean to resolve the conversation, I just clicked the wrong button. What I wanted to say is that I do not like copying the code from MPICH here. This entire piece of code seems to be a call to darray; we might want to go that route.

@AboorvaDevarajan (Member)

Can one of the admins verify this patch?

@ggouaillardet (Contributor, Author)

These new Fortran 2008 bindings bring new opportunities (performance improvements, correct handling of sub-arrays in non-blocking subroutines) but also some challenges, which I will try to list here:

1. Handling a part of a subarray

integer :: a(0:3,0:3)
call mpi_send(a(1:2,1:2), 3, MPI_INT, ...)

This is currently unsupported. Under the hood, a derived datatype is created for the subarray (4 elements) and the C PMPI_Send() binding is invoked with this intermediary datatype and count=1. As a consequence, only the full subarray can be sent (a sketch of this count-vs-slice mismatch follows after the list).

2. Scatter and friends

integer :: a(0:3,0:3)
call mpi_scatter(a(1:2,1:2), 2, MPI_INT, ...)

If the communicator size is 2, this is legitimate usage that is currently not supported (just as above, the C binding is invoked with a derived datatype for the full subarray and count=1). In order to work, the intermediate datatype should have 2 elements (and not 4).
If the communicator size is 4, this is again a legitimate call, but it would have to invoke the C PMPI_Scatterv() binding, which is currently not supported.

3. Pack and friends

integer :: a(0:3,0:3)
call mpi_pack(..., outbuf=a(0:2,0:2), outsize=4, ...)

This is arguably a dumb thing to do, and it is currently not supported. Note the current ignore_tkr version works just fine.

4. ignore_tkr support

Currently, ignore_tkr is required to build the mpi_f08 bindings. A compiler might not support ignore_tkr but might support the new bindings (and yes, I have one in mind).

Bottom line, some decisions have to be made about what we support and how.
For example, we might not handle (non-contiguous) subarrays and have the subroutine fail with MPI_ERR_INTERN or MPI_ERR_BUFFER.
About the scatter example, we might not support it (and return the same error as above), fully support it, or simply fall back to the old bindings (but MPI_Iscatter() might still be broken).
About the pack example, we might use the old bindings or simply not support it (that is a dumb thing to do anyway).
Of course, if the compiler does not support the old ignore_tkr bindings, we do not have the option of falling back on those bindings.
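
To make the crux of challenges 1 and 2 concrete: the descriptor-based wrapper currently describes the whole slice with one derived datatype and forces count=1, so it can only honor calls where the user-supplied count covers the entire slice. A hedged sketch of the check involved (the helper name is hypothetical, and it assumes the user datatype matches the array element, as in the examples above):

```c
#include <ISO_Fortran_binding.h>

/* Does the user-supplied count select exactly the elements described by the
 * descriptor?  If not, we are in the "partial subarray" / scatter-split cases
 * listed above, which the current code cannot express with a single
 * (datatype, count=1) pair. */
static int cdesc_count_matches(const CFI_cdesc_t *d, int user_count)
{
    long long slice_elems = 1;
    for (int i = 0; i < d->rank; i++) {
        slice_elems *= (long long)d->dim[i].extent;
    }
    return (long long)user_count == slice_elems;
}
```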

As a reminder, I opened the pending #2154 to correctly handle derived datatypes freed in the middle of a collective operation (this can be done at the component level, e.g. coll/libnbc, or at the MPI level once for all components). Without that PR, non-blocking collectives would either be busted or leak a datatype.

@jsquyres @bosilca @kawashima-fj could you please comment on that?
I hope we can reach a consensus on how to move forward (e.g. how and with which limitations).

@jsquyres (Member) commented Jul 3, 2019

bot:lanl:retest

last = i; goto fn_exit;
}

mpi_errno = PMPI_Type_commit(&types[i+1]);
Member:

Still need to move this type_commit outside of the loop and only commit the last/final datatype.

bindings.h \
cdesc.h \
cdesc.c \
\
Member:

Nit: remove the extra blank lines. cdesc.c should be the last entry in the list, and not have a \ continuation character.

@ggouaillardet (Contributor, Author)

@jsquyres I addressed all your comments and updated the PR.

@jsquyres (Member) left a comment:

This PR represents a ton of work -- thank you for doing this!

#include "ompi/mpi/fortran/base/constants.h"

void ompi_pack_cdesc(CFI_cdesc_t* x, MPI_Fint *incount, MPI_Fint *datatype,
char *outbuf, MPI_Fint *outsize, MPI_Fint *position,
Member:

Can't both the inbuf and outbuf of all these pack/unpack routines be descriptors?

I.e., don't we have to combine the inbuf, the MPI datatype, and the outbuf to do the underlying memcpy(ies) correctly?

Contributor Author:

This is where I asked for guidance in a previous comment.

IMHO, that falls into the category of "the end user wants to do something pretty dumb, but it is allowed by the standard".

Anyway, if we make outbuf a descriptor, then we cannot simply invoke PMPI_Pack() anymore (since it expects a contiguous outbuf), so it would be much easier to let the Fortran runtime handle this ... unless the compiler does not support ignore_tkr.

If simply returning an error when outbuf is not contiguous is not an option, and we want to support compilers that cannot do ignore_tkr, then the only option is to do it ourselves.
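
To illustrate the two options sketched above (defer to PMPI_Pack() when outbuf is contiguous, return an error otherwise), a minimal hedged sketch; the wrapper name, argument passing, and error choice are illustrative only and do not match the PR's actual ompi_pack_cdesc() signature:

```c
#include <ISO_Fortran_binding.h>
#include <mpi.h>

int example_pack_to_cdesc(const void *inbuf, int incount, MPI_Datatype datatype,
                          CFI_cdesc_t *outbuf_desc, int outsize, int *position,
                          MPI_Comm comm)
{
    if (0 == outbuf_desc->rank || CFI_is_contiguous(outbuf_desc)) {
        /* contiguous outbuf: PMPI_Pack() can be used directly */
        return PMPI_Pack(inbuf, incount, datatype, outbuf_desc->base_addr,
                         outsize, position, comm);
    }
    /* non-contiguous outbuf: either fail here, or stage through a
     * contiguous scratch buffer (see the later discussion in this thread) */
    return MPI_ERR_BUFFER;
}
```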

Member:

I do not think that there is a restriction on the outbuf to be contiguous -- i.e., it can be a descriptor. More comments below in the main thread...

@@ -580,11 +580,12 @@ subroutine MPI_Pack_f08(inbuf,incount,datatype,outbuf,outsize,position,comm,ierr
use :: mpi_f08_types, only : MPI_Datatype, MPI_Comm
implicit none
!DEC$ ATTRIBUTES NO_ARG_CHECK :: inbuf, outbuf
!GCC$ ATTRIBUTES NO_ARG_CHECK :: inbuf, outbuf
OMPI_F08_GCC_ATTRIBUTES(inbuf)
!GCC$ ATTRIBUTES NO_ARG_CHECK :: outbuf
Member:

Per above/comment in the C file, the outbuf can be a descriptor, too.

@bosilca (Member) commented Jul 12, 2019

Going back to the discussion @ggouaillardet started last week, more specifically to his 3 array-shape challenges. On all these points the issue seems to be related to the expected extent of the resulting datatype. Honestly, I am not sure I see what the outcome of mpi_send(a(1:2,1:2), 3, MPI_INT, ...) is expected to be? I can see it as adding an extent of 0 to the resulting datatype, in which case the count simply sends the same data multiple times. Or as adding an extent according to one of the dimensions. Both these approaches look legit to me.

@jsquyres (Member)

In MPI-3.1 17.1.2 (see here), it says:

All nonblocking MPI functions (e.g., MPI_ISEND, MPI_PUT, MPI_FILE_WRITE_ALL_BEGIN) behave as if the user-specified elements of choice buffers are copied to a contiguous scratch buffer in the MPI runtime environment.

BTW, this behavior is only supposed to occur if the compile-time constant MPI_SUBARRAYS_SUPPORTED == .TRUE.. I don't recall offhand if that value was set in this PR. ... @ggouaillardet?

@jjhursey (Member)

bot:ibm:xl:retest

@open-mpi deleted a comment from ibm-ompi on Jul 24, 2019
@jsquyres (Member) commented Aug 24, 2019

Honestly, I am not sure I see what the outcome of mpi_send(a(1:2,1:2), 3, MPI_INT, ...) is expected to be?

The behavior is described on MPI-3.1 p634. Line 18 starts a particularly interesting section:

All nonblocking MPI functions (e.g., MPI_ISEND, MPI_PUT, MPI_FILE_WRITE_ALL_BEGIN) behave as if the user-specified elements of choice buffers are copied to a contiguous scratch buffer in the MPI runtime environment. All datatype descriptions (in the example above, “3, MPI_REAL”) read and store data from and to this virtual contiguous scratch buffer. Displacements in MPI derived datatypes are relative to the beginning of this virtual contiguous scratch buffer.

I.e., it is useful to couch all of these conversations in the context of this "virtual contiguous scratch buffer".

Specifically: for all of Gilles' shape issues, I unfortunately think they all must be supported. I do not find any text in MPI-3.1 that says that they can be unsupported by the implementation. They may not be performant (e.g., if someone does a pack from a [non-contig] descriptor to a [non-contig] descriptor -- the implementation, for simplicity, can pack from the [non-contig] descriptor to a contiguous buffer and then from that contiguous buffer to the [non-contig] descriptor), but we're talking correctness here, not performance. I think there are other places in the MPI spec that allow users to do non-performant things, too.
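
For completeness, a hedged sketch of the "correctness first" staging approach described above for the pack case: PMPI_Pack() fills a contiguous scratch buffer, and the packed bytes are then copied element by element into the memory selected by the (non-contiguous) output descriptor. The helper below is illustrative only and hard-codes rank 2; a real implementation would handle arbitrary rank (Fortran order: the leftmost dimension varies fastest).

```c
#include <ISO_Fortran_binding.h>
#include <string.h>

/* Copy the contents of a contiguous staging buffer into the elements selected
 * by a rank-2 descriptor, in Fortran (column-major) element order. */
static void copy_staging_to_cdesc_rank2(const char *staging, const CFI_cdesc_t *d)
{
    char *base = (char *) d->base_addr;
    size_t pos = 0;
    for (CFI_index_t j = 0; j < d->dim[1].extent; j++) {
        for (CFI_index_t i = 0; i < d->dim[0].extent; i++) {
            char *dst = base + i * d->dim[0].sm + j * d->dim[1].sm;
            memcpy(dst, staging + pos, d->elem_len);
            pos += d->elem_len;
        }
    }
}
```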


Also, I noticed this morning that per MPI-3.1 table 17.1, we have to change the name of the back-end Fortran symbols if descriptors are supported: they need to end in _f08ts (vs. plain _f08) and _fts.

The spec unfortunately says that we have to support descriptors in all of mpif.h, the mpi module, and the mpi_f08 module. I don't think we can support it in just the mpi_f08 module. 😦

Finally: per compilers that support type(*), dimension(..) but do not support ignore TKR, I guess the question is: do we care about those compilers? (I do not know which one @ggouaillardet has in mind.) And if so, how hard is it to remove "ignore TKR" from our implementation? But more specifically: if we have to support type(*), dimension(..) in all of mpif.h/mpi/mpi_f08, does ignore TKR matter?

@ggouaillardet (Contributor, Author)

Thanks Jeff,

  • I will do the rename from _f08 to _f08ts.
  • I do have a compiler in mind, but I cannot discuss it publicly. Since moving to descriptors breaks the ABI, I already moved to using descriptors everywhere so we do not need to change the ABI again in the future. Some operations simply return MPI_ERR_INTERN if the buffer shape is not supported.
  • I will add support for all shapes (falling back to a contiguous copy if we cannot create the right derived datatype). This will require some logic to be added to the non-blocking operations (to free or copy back the non-contiguous buffer when the operation completes) and causes a few headaches for correctly supporting persistent operations (e.g. performing the copy-in at MPI_Start() time).
  • "we have to support descriptors in all of mpif.h, the mpi module, and the mpi_f08 module" -- I am a bit puzzled by that one. Since mpif.h has no prototypes, how can we tell the compiler whether an argument should be passed via a descriptor (e.g. a buffer) or by its address (e.g. an INTEGER)? I do not have a similar objection for use mpi.

FWIW, I have made some progress on this PR, and my dev branch is at https://github.com/ggouaillardet/ompi/tree/dev/f08_cdesc
(I am also fixing mpif-h bindings that do not correctly ignore some parameters, and I have created a bunch of tests at https://github.com/ggouaillardet/ompi-tests/tree/topic/ibm_collective_fortran_ignored_params)

@jsquyres (Member) commented Aug 25, 2019

  • I do have a compiler in mind, but I cannot discuss it publicly. Since moving to descriptors breaks the ABI, I already moved to using descriptors everywhere so we do not need to change the ABI again in the future. Some operations simply return MPI_ERR_INTERN if the buffer shape is not supported.

Is there a way to return something more descriptive than MPI_ERR_INTERN -- i.e., something that indicates that the type is simply not supported by Open MPI (vs., for example, an error in the application or an error in Open MPI)? I ask simply to help us save time in the future: if/when someone reports this "error", we can clearly say "oh, ya, that's because it's not supported", as opposed to asking for a small reproducer and/or looking for an error in Open MPI's code base.

  • I will add support for all shapes (falling back to a contiguous copy if we cannot create the right derived datatype). This will require some logic to be added to the non-blocking operations (to free or copy back the non-contiguous buffer when the operation completes) and causes a few headaches for correctly supporting persistent operations (e.g. performing the copy-in at MPI_Start() time).

I think that when we wrote this text for MPI-3.0, we were envisioning one possible implementation where a "union" datatype was created: i.e., take the "union" of the shape of the data in memory and the supplied MPI datatype, and create a new datatype representing the resulting shape. Then simply pass that datatype through the MPI engine. Hypothetically, one would therefore not need to modify the internal MPI engine at all -- because the result of this operation is just another MPI datatype, like any other MPI datatype.

Is that possible here?

  • "we have to support descriptors in all of mpif.h, the mpi module, and the mpi_f08 module" -- I am a bit puzzled by that one. Since mpif.h has no prototypes, how can we tell the compiler whether an argument should be passed via a descriptor (e.g. a buffer) or by its address (e.g. an INTEGER)? I do not have a similar objection for use mpi.

It is optional for mpif.h to have prototypes (see MPI-3.1 p611:33-34). I'm not aware of any MPI implementation that has them, but the spec allows for it.

Re-reading MPI-3.1 a little better than I did yesterday, I notice this (starting on p613:47):

To set MPI_SUBARRAYS_SUPPORTED to .TRUE. within a Fortran support method, it is required that all non-blocking and split-collective routines with buffer arguments are implemented according to 1B and 2B, i.e., with MPI_Xxxx_f08ts in the mpi_f08 module, and with MPI_XXXX_FTS in the mpi module and the mpif.h include file.
The mpi and mpi_f08 modules and the mpif.h include file will each correspond to exactly one implementation scheme from Table 17.1. However, the MPI library may contain multiple implementation schemes from Table 17.1.

This section -- and sections 17.1.2/3/4 -- says to me that each of mpif.h, the mpi module, and mpi_f08 modules can independently set MPI_SUBARRAYS_SUPPORTED to different values.

I guess that makes sense for exactly the case you are asking about: we can have mpi_f08 support TYPE(*),DIMENSION(..), but mpi and mpif.h do not.

...but how do we implement that? The name MPI_SUBARRAYS_SUPPORTED has a single back-end symbol: how can we have it have different values depending on whether you use mpif.h/mpi/mpi_f08?

Regardless, this opens a whole new can of worms:

  • Do we continue to have the non-descriptor-enabled symbols? E.g., do we have both the descriptor and non-descriptor symbols in libmpi_*whatever*.so?
  • Do we have a configure option (or compile-time option?) to select between SUBARRAYS_SUPPORTED==.TRUE. or .FALSE.?
  • How many MPI tools support the descriptor-enabled symbols?
  • Is it going to cause an armageddon for user apps if we start putting prototypes in mpif.h?

😲

@ggouaillardet (Contributor, Author)

Would MPI_ERR_UNSUPPORTED_DATAREP or MPI_ERR_UNSUPPORTED_OPERATION be a better fit than MPI_ERR_INTERN? If not, would it be MPI-compliant to define a new error code? (Or can we simply display an ad-hoc error message and return an existing error code?)

I do not fully understand what you mean by a "union" datatype.
Here is a corner case, though:

   INTEGER :: buf(0:2,1:3)
   CALL MPI_GATHER(..., recvbuf=buf(1:2, 1:3), recvcount=6, recvtype=MPI_INTEGER, ...)

and run this on a communicator with two MPI ranks.
If I get it right, that would require an MPI_Gatherw-like operation, which is not even part of the standard.

Note I chose to have MPI_Win_create error out with MPI_ERR_BUFFER if a user tries to use a subarray as the base argument.

I'd rather start with ts only for the mpi_f08 bindings.

...but how do we implement that? The name MPI_SUBARRAYS_SUPPORTED has a single back-end symbol

The definition is currently in a single place (mpif-config.h), and this file is currently included by both module mpi and module mpi_f08.
I made a simple proof of concept (move the definition to another file, and have only mpi_f08 include its own version). Bottom line, MPI_SUBARRAYS_SUPPORTED is .FALSE. for mpif-h and use-mpi but .TRUE. for use-mpi-f08.

We do need a configure option to manually disable the ts bindings:
I found a blocker in gcc 9 that makes the bindings unusable ... see my report at https://gcc.gnu.org/ml/fortran/2019-08/msg00104.html

We should decide ASAP whether we want to have both MPI_xyz_f08 and MPI_xyz_f08ts symbols in the library. I was planning to move xyz_f08.F90 into xyz_f08.F90.in to have the code generated at configure time and make it easier to troubleshoot. Obviously, that won't work if we want to have both symbols in the library.

@ibm-ompi commented Feb 6, 2020

The IBM CI (GNU/Scale) build failed! Please review the log, linked below.

Gist: https://gist.github.com/01f8be4f3c3350eff5399cfba60efac9

@jsquyres (Member) commented May 5, 2020

This issue kinda fell off the table for a few months. ☹️

@ggouaillardet I don't see a reply to your issue on the GCC mailing list. Did that get resolved?

Given the lateness in the game here, I'm kinda guessing that this is going to miss the boat for Open MPI v5.0.0.

@gpaulsen modified the milestones: Future, v6.0.0 on May 5, 2020
@gpaulsen (Member) commented May 5, 2020

Discussed on the call today. If this can make it in by the v5.0.x branch date (5/14/2020), great, but the next opportunity will probably be v6.0.0 (currently unscheduled).

@ibm-ompi

The IBM CI (PGI) build failed! Please review the log, linked below.

Gist: https://gist.github.com/09792ae4918db9664b9dc6145ebfe8e2

@awehrfritz

It's been a while since there has been any activity on this PR. Is there any interest in getting this feature set merged at some point?
