ORTED: issue with libpath and Intel compilers. #729
iirc @rhc54 already added some mechanism to set PATH and LD_LIBRARY_PATH in order to correctly start orte remotely. also, this mechanism is generic (e.g. it works with rsh but also with slurm and other plm). |
What is wrong with the --enable-orterun-prefix-by-default configure option? Or the --prefix command line option? Both are designed to solve this specific problem. |
Ah, I see - you're talking about the secondary library path to the Intel libs. We do have the "-x LD_LIBRARY_PATH" cmd line option which does exactly what you describe. There is also an MCA param version of it. Are those not adequate? Of course, they all presume that the libs are in the same place on every node, which may also not be true. However, we haven't come up with a solution to that problem. |
yep, I think we are now all talking about the path to the intel runtime. fwiw, I hit the second problem on a heterogeneous cluster with shared storage only. mpirun a.out automagically does I think the orted wrapping part is ok, but the app wrapping part can be improved. |
Should the Intel libraries be rpath/runpath-ed by the wrapper compilers? |
there are several things here ...
|
Sure, I get the tradeoffs. But the point I'm making is that we already rpath/runpath the OMPI runtime libraries by default. Should we also rpath/runpath the compiler runtime libraries by default, too? (I realize that it is a different solution than what is proposed by this PR, and also realize that there are workarounds that can be used to find compiler runtime libraries, too) |
Hi, and thanks for the answers. Actually, the LDFLAGS/rpath set at compile time cannot be used in our case (we don't know in advance which version of the Intel compiler is installed and where). Our problem is mostly with the ORTED daemon, which needs some Intel runtime libraries, and the corresponding paths are not set when plm SSHes into the compute node (just before launching ORTED). |
FWIW, we don't necessarily recommend this behavior: i.e., compile OMPI with one compiler suite/version and then run with another. Compiler vendors do tend to guarantee that this is supposed to work (assuming that you're just changing the version of the same compiler suite -- vs. changing compiler suites), but we've run into subtle problems with this kind of behavior over the years. YMMV. |
@ggouaillardet I've seen this option, but setting it each time is problematic, even more so on a production cluster. We could also set a default path to the runtime in the mca config file, but we would have to change this option at each update of the compilers. The advantage of my solution is that by setting plm_rsh_propagate_libpath to true (in the mca config file), we would just have to launch mpirun, and LD_LIBRARY_PATH would automatically be sent before launching ORTED. This would be helpful on a production cluster, as we don't know where the compilers are installed. @jsquyres my concern was more about where the compilers are located (sorry if I explained my problem poorly) |
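For illustration, the one-time setup described in this comment could look roughly like the sketch below. The parameter is the one proposed in this PR, so it only exists with this patch applied. The helper name `set_mca_param` and the scratch file are assumptions made for the example (a real deployment would edit the site's MCA parameter file), and the value is written as `1` on the assumption that boolean MCA parameters accept it.

```shell
#!/bin/sh
# Sketch only: set_mca_param is a hypothetical helper, not part of Open MPI.
# It appends one "name = value" line to an MCA parameter file.
set_mca_param() {
    file=$1
    name=$2
    value=$3
    printf '%s = %s\n' "$name" "$value" >> "$file"
}

# Use a scratch file so the sketch does not touch a real configuration.
conf_file=$(mktemp)

# plm_rsh_propagate_libpath is the parameter proposed in this PR.
set_mca_param "$conf_file" plm_rsh_propagate_libpath 1

cat "$conf_file"
rm -f "$conf_file"
```

With such a line in the packaged parameter file, users would launch `mpirun` as usual without extra command-line options, which is the convenience argued for here.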
@ntimeu Are you saying that you configure/compile/install OMPI with the Intel compiler suite vA.B.C in location X, but then run OMPI with the Intel compiler suite vA.B.C in location Y? |
@jsquyres yes, it depends where the client installs its compilers (for runtime). |
I see -- you're an ISV, not a cluster administrator. So you can't control where the users install the Intel compilers; indeed, it may be in a different location than where you installed them. Two questions:
|
As this is against the master, remember we also have the MCA param to forward an envar - isn't that also sufficient? |
iirc, there is the -static-intel option, but there are some restrictions ... |
@jsquyres yes actually I'm working at Bull, we compile and package OpenMPI for our clients.
@rhc54 if you mean The main point of my PR is to provide a way to automatically set before launching ORTED the Intel runtime path on remote nodes, so that we can set this option to true in the mca config file provided to our clients. |
As you can tell, we are nervous about changes like this because of past experiences with unintended consequences. Also, FWIW: I never said anything about what Intel may or may not officially support - not sure where @ggouaillardet got that notion. Let me think about this a bit - I understand the problem you are trying to solve, but I'm not sure this is the best way to solve it. As Jeff noted, you have no idea if the compiler version they are using is compatible with what you built against, what you propose isn't restricted to the Intel runtime libraries and could easily lead to problems, etc. I'm puzzled by your statement about the static linking solution - AFAIK, there is no licensing issue with distributing Intel runtimes. You only need a license to run the actual compiler. Are you sure you've checked that out with Intel? It would seem the cleanest of all the proposed solutions. |
@rhc54 i think you posted something like that on the ML (but i cannot find it ...) i am not sure there is a clean solution at all ... now what about the application ? i also understand this new mca param is somehow convenient. so on one hand i am reluctant to provide an easy option that is somehow dangerous |
I believe the best solution is to statically link against the Intel runtime used to build the executables. IANAL, but I have been in several recent meetings that touched similar subjects, and I suspect there is a misunderstanding here over the Intel licensing. Again, in full disclosure, although I am an Intel employee, I am not affiliated with the Intel compiler group nor authorized to speak for them. Bull needs to contact their Intel rep and get an official answer. If that proves unsatisfactory, then I think the next best solution is for Bull to add this to their repo of custom patches they maintain for their distro. Until we hear of some general problem, I personally would rather not have something this specific in the general distro. |
I configure'd with LDFLAGS=-static-intel and orted still required the intel dynamic runtime. I am thinking that orted could use static libraries for opal, orte and compiler runtime. any thoughts ? |
I vaguely recall this when Brian Barrett had a similar issue - I forget what he had to do to get the ORTE tools to build statically, but there was some additional stuff required. Not sure why he didn't commit it - it's been a long time since he did it. |
I do like the idea of a The only problem is: I'm not entirely sure how to do that.
...after thinking about this for a few minutes, it may be sufficient to do something like this:
if enable_static_runtime was passed
    RUNTIME_LDFLAGS=--static
    if compiler suite is intel
        RUNTIME_LDFLAGS="$RUNTIME_LDFLAGS --static-intel"
    fi
fi
Assuming It's worth a try...? |
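The pseudocode above can be turned into a small runnable shell sketch. This is not Open MPI's configure code: the function name `runtime_ldflags` and its two arguments are hypothetical, and the flags use the single-dash spellings `-static` and `-static-intel`, the latter as written elsewhere in this thread.

```shell
#!/bin/sh
# Sketch only: hypothetical helper mirroring the pseudocode in the comment above.
# $1: "yes" if the proposed --enable-static-runtime option was passed
# $2: compiler vendor string, e.g. "intel" or "gnu" (assumed values)
runtime_ldflags() {
    enable_static_runtime=$1
    compiler_vendor=$2
    flags=""
    if [ "$enable_static_runtime" = "yes" ]; then
        flags="-static"
        # Only the Intel suite gets the extra vendor-specific flag.
        if [ "$compiler_vendor" = "intel" ]; then
            flags="$flags -static-intel"
        fi
    fi
    printf '%s\n' "$flags"
}

runtime_ldflags yes intel
runtime_ldflags yes gnu
runtime_ldflags no intel
```

Without the option the function prints an empty line, so the default link behavior would stay unchanged.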
This PR is getting a bit stale -- is there any interest in trying what I suggested in #729 (comment)? |
I'll sign up to implement the --enable-static-runtime configure time option. I've created This pull request will continue to represent the approach of a runtime flag to "Add something to the LD_LIBRARY_PATH". |
This PR has been replaced by #818. |
When compiled with the Intel compiler suite, ORTED is linked against several Intel libraries. ORTED then fails at startup on the remote node because the current LD_LIBRARY_PATH is not propagated over SSH.
This patch adds an MCA option to orte's plm_rsh module,
mca_plm_rsh_propagate_libpath (bool). When set to true, it automatically
prepends the current LD_LIBRARY_PATH to the remote node's LD_LIBRARY_PATH
before launching the ORTED daemon.
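The prepend step can be sketched in shell as follows. This is a minimal illustration of the idea, not the patch's C code: `prepend_libpath` is a hypothetical helper and the paths are made up. It skips the colon when either side is empty, because the dynamic loader treats an empty entry in LD_LIBRARY_PATH as the current directory.

```shell
#!/bin/sh
# Sketch only: hypothetical helper, not taken from the patch.
# $1: LD_LIBRARY_PATH on the node running mpirun
# $2: LD_LIBRARY_PATH already set on the remote node (may be empty)
prepend_libpath() {
    local_path=$1
    remote_path=$2
    if [ -z "$local_path" ]; then
        # Nothing to propagate: keep the remote value as is.
        printf '%s\n' "$remote_path"
    elif [ -z "$remote_path" ]; then
        # Remote value unset: avoid a trailing ':' (an empty entry).
        printf '%s\n' "$local_path"
    else
        printf '%s\n' "$local_path:$remote_path"
    fi
}

# Example paths are placeholders.
prepend_libpath "/opt/intel/lib" "/usr/local/lib"
prepend_libpath "/opt/intel/lib" ""
```

Putting the local value first means the libraries visible to `mpirun` take precedence on the remote node, which is also why the thread worries that the option is not restricted to the Intel runtime libraries.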