-
Notifications
You must be signed in to change notification settings - Fork 898
WIP: config: default to DSO for plugins with external deps #8800
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
With the new default to link all plugins as static, the dependencies of all the plugins (e.g. transport plugins) become direct dependencies of top-level binaries (e.g. opal_wrapper aka mpicc). These direct dependencies are simply bad, because the dependency is false, it is unnecessary for running most binaries. But if that's not sufficient justification, the current default behavior is different from old behavior in a user-unfriendly way: mpicc --version mpicc: error while loading shared libraries: libugni.so.0: cannot open shared object file: No such file or directory The workaround the user needs to figure out is to build with `--enable-mca-dso` set, either for all or for a specific set of components. In PRRTE open-mpi#871 it was decided that a behavior that's more user friendly than this can be provided by implementing a smart default for which components should default to DSO compile mode. One more component that also should default to DSO is `btl-sm` but only when Cray `xpmem` is found (it is enabled by default). This would require some kind of conditional logic. See this issue for rationale: openpmix/prrte#871 This patch to OMPI is a twin to the patches that were already committed to PRRTE and PMIX: PRRTE e3914b0ad0df30939d329fe615a0ec37e64dcf2a PMIX 0b8f747 Signed-off-by: Alexei Colin <[email protected]>
Can one of the admins verify this patch? |
ok to test |
@rhc54 what do you say about this part?:
|
The plugins that themselves depend on an external .so library, must not be linked in statically, because that external dep becomes a dep of all mpi binaries, including mpicc, etc. In latest master, this is the default already, but we didn't bump the snapshot yet. See open-mpi/ompi#8800
That patch is incomplete though (btl-sm needs to also be DSO if and only if xpmem is used), so won't be able to rely on this default. Commit to the ebuild to explicitly set --enable-mca-dso comming soon. Also see: open-mpi/ompi#8800
This was an intentional behavior choice on the part of the Open MPI community. This patch should not be accepted. We should think about how to minimize dependency leakage for the wrapper compiler, but this approach is not the way to do that. |
Ok. There seems to be ambiguity on what the community decided. These commits suggest the opposite decision: PRRTE e3914b0ad0df30939d329fe615a0ec37e64dcf2a |
Two different communities with different needs, I'm afraid. The biggest difference being that no application ever links against PRRTE, while they do against OMPI. |
FWIW, I'm not sure that Ralph's fix in PRRTE or PMIx was the right solution to that problem, either. If a customer configures OMPI with --enable-mca-static, we can't have a broken build, and that's what that patch suggests we'd have. |
Let's chat about it this week - I may have misunderstood how that option worked. In the case you cite, I would have expected the OMPI-PRRTE and OMPI-PMIx integration to pass appropriate configure options to "do the right thing". If that isn't going to happen, then yes, we might have a problem. |
In this comment openpmix/prrte#871 (comment) Josh essentially said that that users should not compile with The question of whether the default behavior should try to be helpful or not seems somewhat orthogonal to the above underlying issue that the build with all components as static won't work on some systems. If the user did not pass any flags, what is the gain of producing a broken build for them by default? Changing the default like Ralph did seems like a no-loss proposition: build still broken with Default A: unless instructed otherwise with I don't see any issues with Default B, and Default B seems superior to Default A: with B you'll get a prted and mpicc that work out-of-the-box by default on all systems. It's much more tolerable to produce broken binaries in response to the user choosing flags incompatible with their system, than it is to produce broken binaries by default and expect the user to figure out the right flags that yield working binaries. P.S. I'm not trying to defend this PR. I'm happy passing the flag explicitly. Just thinking of what's the best approach would be. |
@acolinisi @alexeicolin - can we close this PR? |
Again, this was a decision the OPen MPI community made. A cray system is an example of a system where DSOs directly negatively impact performance. The right solution isn't to try to change defaults on dsos, but to fix the library dependency problem for mpicc. We always knew that we'd have to fix these things, but they're real problems for large systems, that need fixing. I strongly disagree with the approach the PMIx team has taken here; for large systems, DSOs are a problem and we need to sort out dependencies to avoid relying on DSOs, rather than sticking our head in the sand. |
I don't have an issue with a different approach - I was simply trying to "stick a finger in the dyke" as it was causing problems and complaints from a number of places. I'd welcome a better alternative if someone can develop it. |
Please go ahead and close. All I could contribute is the opinion from user's perspective summarized in my comment #8800 (comment) |
@alexeicolin I don't think anyone is arguing that we need to fix this issue. We need to fix it before launch. I think it's the how to fix it that we're arguing about. There's absolutely no reason that mpicc/mpirun/etc. NEED to depend on libugni.so. Given the DSO load scalability problems, we need to break that dependency by not linking mpicc against the same library set as the MPI application, not by breaking the dependency by creating more artifacts that don't scale. For what it's worth, the PMIx developers agree with this sentiment. They just wanted to unbreak things faster for their users. Which means that they're only going to fix the core issue of scalability around DSO loading because of OMPI holding the line. I thought we had an issue on this, but clearly we don't. I've cut #9869 and made it a blocker for the 5.0 release. |
With the new default to link all plugins as static, the dependencies of all the plugins (e.g. transport plugins) become direct dependencies of top-level binaries (e.g. opal_wrapper aka mpicc).
These direct dependencies are simply bad, because the dependency is false, it is unnecessary for running most binaries.
But if that's not sufficient justification, the current default behavior is
different from old behavior in a user-unfriendly way:
The workaround the user needs to figure out is to build with
--enable-mca-dso
set, either for all or for a specific set of components; or set LD_LIBRARY_PATH system-wide or for each usage ofmpicc
and all other binaries, neither of which is reasonable.In openpmix/prrte#871 it was decided that a behavior that's more user friendly than this can be provided by implementing a smart default for which components should default
to DSO compile mode. This PR implements this default.
One more component that also should default to DSO is
btl-sm
but only when Crayxpmem
is found (it is enabled by default). This would require some kind of conditional logic. So, this PR is incomplete.See this issue for rationale: openpmix/prrte#871
This patch to OMPI is a twin to the patches that were already committed to
PRRTE and PMIX:
PRRTE e3914b0ad0df30939d329fe615a0ec37e64dcf2a
PMIX 0b8f747
Signed-off-by: Alexei Colin [email protected]