libnbc does not retain datatypes #1304

Closed
ggouaillardet opened this issue Jan 15, 2016 · 5 comments
@ggouaillardet
Contributor

This ticket is about the issue initially reported by Thomas Ponweiser at http://www.open-mpi.org/community/lists/users/2016/01/28265.php

libnbc does not retain datatypes. In practice, this means that if a datatype is used in a collective operation and is then released with MPI_Type_free() before the collective completes, there is a risk that its obj_reference_count drops to 0 and the datatype is freed while the collective still needs it.

This scenario is explicitly permitted by the MPI standard:

http://www.mpi-forum.org/docs/mpi-1.1/mpi-11-html/node58.html
"Any communication that is currently using this datatype will complete normally." And: "Freeing a datatype does not affect any other datatype that was built from the freed datatype."

@ggouaillardet ggouaillardet self-assigned this Jan 15, 2016
@bosilca
Member

bosilca commented Jan 15, 2016

The first part of the statement is taken care of by the PML base (pml_base_recvreq.h:67 for receive requests and pml_base_sendreq.h:98 for sends). It sounded like an acceptable solution long ago, but now that I think about it, I see the problem. Because we do not retain the datatype at the upper level of the API (the MPI API in this case), once the user frees the datatype, the first OMPI internal communication operation will bring the datatype refcount down to 0 and release it. I'll take ownership of this ticket while coming up with a solution.

The second part of the last statement, "Freeing a datatype does not affect any other datatype that was built from the freed datatype.", is covered by the current implementation: we increase the refcount (during the creation of the structures used for get_elements in ompi_datatype_set_args), so the datatype cannot be released on the first user call to MPI_TYPE_FREE.

@bosilca bosilca assigned bosilca and unassigned ggouaillardet Jan 15, 2016
@ggouaillardet
Contributor Author

@bosilca I made #PR1305 as a proof of concept (only ibcast is implemented so far).

This is for the libnbc module only, so you might want to do things differently, at a higher level, so that it works for all modules.
But if you think it is good enough, I will be happy to implement all non-blocking functions in libnbc.

@ggouaillardet
Contributor Author

By the way, and out of curiosity, is there any rationale for using "public" types (e.g. MPI_Datatype) instead of internal Open MPI ones (e.g. ompi_datatype_t)?

@bosilca
Member

bosilca commented Jan 15, 2016

Except for portability, in the sense that a single collective communication library could work with multiple MPI implementations (which was/is the case for libnbc), I do not think there is any rationale.

@hjelmn
Member

hjelmn commented Jan 21, 2016

@ggouaillardet I have been slowly changing libnbc to use the internal datatypes/calls since the upstream libnbc will probably never be updated. Has not been high priority though.

ggouaillardet added a commit to ggouaillardet/ompi that referenced this issue Oct 4, 2016
MPI standard states user MPI_Datatype(s) can be free'd
after a call to a non blocking collective and before the non-blocking
collective completes.
Retain user (only) MPI_Datatype(s) when the non blocking call is invoked,
and set a request callback so they are free'd when the MPI_Request
completes.

Thanks Thomas Ponweiser for reporting this

Fixes open-mpi#1304
ggouaillardet added a commit to ggouaillardet/ompi that referenced this issue Oct 27, 2016
Signed-off-by: Gilles Gouaillardet <[email protected]>
jjhursey pushed a commit to jjhursey/ompi that referenced this issue Mar 6, 2017
ggouaillardet added a commit to ggouaillardet/ompi that referenced this issue Sep 1, 2017
MPI standard states a user MPI_Op and/or user MPI_Datatype can be free'd
after a call to a non blocking collective and before the non-blocking
collective completes.
Retain user (only) MPI_Op and MPI_Datatype when the non blocking call is
invoked, and set a request callback so they are free'd when the MPI_Request
completes.

Thanks Thomas Ponweiser for reporting this

Fixes open-mpi#2151
Fixes open-mpi#1304

Signed-off-by: Gilles Gouaillardet <[email protected]>
ggouaillardet added a commit to ggouaillardet/ompi that referenced this issue Apr 9, 2019
ggouaillardet added a commit to ggouaillardet/ompi that referenced this issue Jul 4, 2019
ggouaillardet added a commit to ggouaillardet/ompi that referenced this issue Jul 8, 2019
ggouaillardet added a commit to ggouaillardet/ompi that referenced this issue Jul 12, 2019

(cherry picked from commit open-mpi/ompi@0fe756d)