ompi: open/close the bml framework in the pml/ob1 component #6129

Closed

ggouaillardet
Contributor

Since only the pml/ob1 component is using the bml framework,
this component can open/close the bml framework instead of ompi/runtime.

Signed-off-by: Gilles Gouaillardet [email protected]

@ggouaillardet
Contributor Author

@hjelmn can you please review this PR?

When I run master on my cluster (it has IB hardware, and both verbs and ucx), I always get this annoying warning message:

$ mpirun -np 2 ./ring_c
[...]
By default, for Open MPI 4.0 and later, infiniband ports on a device
are not used by default.  The intent is to use UCX for these devices.
You can override this policy by setting the btl_openib_allow_ib MCA parameter

At first I was surprised since pml/ucx is used.

So I changed the command to

$ mpirun --mca pml ucx -np 2 ./ring_c

but I still got the same warning message, since bml/r2 opens the btl framework.
Since only pml/ob1 seems to be using the bml framework, I moved the opening and closing of the bml framework into the pml/ob1 component, and now there is no more warning when I run mpirun --mca pml ucx ....

Ideally, there would be no warning at all, since pml/ucx is selected.

Could we delay things even further (e.g. only try the btl components if pml/ob1 is selected) in order to get rid of this message?
Unless the end user is dumb enough to blacklist all the btl components (including btl/self), I do not see what could go wrong if we delay things a bit more.

@ggouaillardet
Contributor Author

Or maybe a simpler option would be to delay the warning messages in btl/openib until add_procs() is invoked for the first time. I can try to make a proof of concept starting tomorrow if that makes more sense to you.
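
To make the idea concrete, here is a minimal sketch of a deferred, emit-once warning. It is not the btl/openib code: `toy_btl_t`, `toy_btl_init()` and `toy_btl_add_procs()` are hypothetical names used only for illustration, and the warning text is a placeholder.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a BTL component's state (not Open MPI's real type). */
typedef struct {
    bool warning_pending;   /* a warning was recorded at init time but not printed */
    int  warnings_emitted;  /* how many times the warning was actually printed */
} toy_btl_t;

/* At init time, only remember that a warning is needed; print nothing yet. */
static void toy_btl_init(toy_btl_t *btl, bool ib_port_skipped)
{
    btl->warning_pending = ib_port_skipped;
    btl->warnings_emitted = 0;
}

/* The first time the component is actually used, flush the pending warning. */
static int toy_btl_add_procs(toy_btl_t *btl, int nprocs)
{
    if (btl->warning_pending) {
        fprintf(stderr, "warning: placeholder text for the deferred message\n");
        btl->warning_pending = false;   /* never print it a second time */
        btl->warnings_emitted++;
    }
    return nprocs;  /* toy behavior: pretend every proc was added */
}
```

With this shape, a component that is initialized but never used (for example because another pml was selected) stays silent, and a component that is used prints the message exactly once.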

@hjelmn
Member

hjelmn commented Nov 28, 2018

Nope. The reason we open the bml in init is because ob1 is not the only bml user. We should probably fix the warning instead.

@hjelmn
Member

hjelmn commented Nov 28, 2018

Ahh. See your other comment now. Please fix the warning :)

@ggouaillardet
Contributor Author

I opened #6137 to get rid of the UCX warning when it is useless.

I had not noticed that the bml framework is also used by osc/rdma ...

In that case, do we still need to open/close the bml framework in the pml/ob1 component?
It is already opened in ompi_mpi_init() before the pml and osc frameworks are opened.

@hjelmn
Member

hjelmn commented Dec 3, 2018

There is no harm in also opening/closing the bml framework in ob1. It just increments the framework reference count.
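
As a rough illustration of why a second open is harmless, here is a toy reference-counted open/close pair. This is only a sketch of the general pattern, not the actual MCA base code: `toy_framework_t`, `toy_framework_open()` and `toy_framework_close()` are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for a framework descriptor (not Open MPI's real type). */
typedef struct {
    const char *name;
    int refcnt;         /* outstanding open calls */
    int setup_runs;     /* times the real setup work ran */
    int teardown_runs;  /* times the real teardown work ran */
} toy_framework_t;

/* Open: only the first caller triggers setup; later callers just bump the count. */
static int toy_framework_open(toy_framework_t *fw)
{
    if (fw->refcnt++ == 0) {
        fw->setup_runs++;
        printf("%s: opened\n", fw->name);
    }
    return 0;
}

/* Close: only the last caller triggers teardown. */
static int toy_framework_close(toy_framework_t *fw)
{
    assert(fw->refcnt > 0);  /* closing an unopened framework is a caller bug */
    if (--fw->refcnt == 0) {
        fw->teardown_runs++;
        printf("%s: closed\n", fw->name);
    }
    return 0;
}
```

In this model, an open from init followed by an open from a component runs the setup once, and the framework is only torn down after both have closed it.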

@ggouaillardet
Contributor Author

Thanks for the explanation! I will close this PR, since the better fix is now in #6137.
