ORTED: issue with libpath and Intel compilers. #729

Closed

@ntimeu

ntimeu commented Jul 21, 2015

When Open MPI is compiled with the Intel compiler suite, ORTED is linked against several Intel runtime libraries. Because the current LD_LIBRARY_PATH is not propagated over SSH, the remote node cannot locate these libraries and ORTED fails before it even starts.

This patch adds an MCA option to ORTE's plm_rsh module,
mca_plm_rsh_propagate_libpath (bool). When set to true, it automatically
prepends the current LD_LIBRARY_PATH to the remote node's LD_LIBRARY_PATH
before launching the ORTED daemon.
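To make the mechanism concrete, here is a minimal C sketch, assuming the launcher builds a shell command string that ssh runs on the remote node and that the remote login shell is Bourne-compatible. `build_libpath_prefix` is a hypothetical name; this is not the code from the patch or from plm_rsh. It shows the ordering the option relies on: the local LD_LIBRARY_PATH comes first, the remote node's existing value is kept after it, and both are in effect before orted is exec'd.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not the actual plm_rsh code): build the prefix a
 * propagate_libpath-style option could put in front of the remote orted
 * command. local_libpath is the launcher's own LD_LIBRARY_PATH. The
 * remote node's existing value is appended by the remote shell itself via
 * ${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}, which assumes a Bourne-style
 * shell (sh/bash); csh-family shells would need "setenv" instead.
 * The path is not quoted, so directories containing spaces are unsupported. */
char *build_libpath_prefix(const char *local_libpath)
{
    static const char fmt[] =
        "LD_LIBRARY_PATH=%s${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH} ; "
        "export LD_LIBRARY_PATH ; ";
    char *out;
    int len;

    if (local_libpath == NULL || local_libpath[0] == '\0') {
        /* Nothing to propagate: return an empty prefix. */
        out = malloc(1);
        if (out != NULL)
            out[0] = '\0';
        return out;
    }

    /* First pass measures the formatted length, second pass fills it in. */
    len = snprintf(NULL, 0, fmt, local_libpath);
    if (len < 0)
        return NULL;
    out = malloc((size_t)len + 1);
    if (out == NULL)
        return NULL;
    int written = snprintf(out, (size_t)len + 1, fmt, local_libpath);
    assert(written == len);
    (void)written;
    return out;
}
```

A caller would concatenate the returned prefix with the orted command line before handing it to ssh. Leaving the `$LD_LIBRARY_PATH` reference unexpanded on the launcher side lets the remote shell, not the launcher, supply the node's own value.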

@ggouaillardet
Contributor

iirc, @rhc54 already added a mechanism to set PATH and LD_LIBRARY_PATH so that orte can be started correctly on remote nodes. Also, this mechanism is generic (e.g. it works with rsh but also with slurm and other plms).
This option is easier to use (i.e. a flag vs. full paths) but is limited to rsh.
Am I missing some other value in this PR?

@rhc54
Contributor

rhc54 commented Jul 21, 2015

What is wrong with the --enable-orterun-prefix-by-default configure option? Or the --prefix command line option? Both are designed to solve this specific problem.

@rhc54
Contributor

rhc54 commented Jul 21, 2015

Ah, I see - you're talking about the secondary library path to the Intel libs. We do have the "-x LD_LIBRARY_PATH" cmd line option which does exactly what you describe. There is also an MCA param version of it. Are those not adequate?

Of course, they all presume that the libs are in the same place on every node, which may also not be true. However, we haven't come up with a solution to that problem.

@ggouaillardet
Contributor

yep, I think we are now all talking about the path to the intel runtime.

fwiw, I hit the second problem on a heterogeneous cluster with shared storage only.
I "solved" it by automatically adding wrappers for both orted and the app.
iirc, I briefly mentioned this on the devel ML a few months ago.

mpirun a.out automagically does
remotestart /.../orted_wrapper /.../orted
and orted forks and execs
/.../app_wrapper a.out

I think the orted wrapping part is OK, but the app wrapping part can be improved.
I can share this (but not before Thursday) if you are interested.
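A minimal C sketch of the orted wrapper idea described above, assuming the wrapper receives the real orted path as its first argument and reads the node-local runtime directory from a per-node file. `RUNTIME_LIBDIR_FILE`, `prepend_to_path`, `read_libdir`, and `wrapper_main` are hypothetical names, not part of Open MPI or of the wrapper mentioned in the comment. The point is that LD_LIBRARY_PATH is adjusted in the wrapper's own environment before `execv`, so the dynamic loader already sees it when it resolves orted's shared libraries.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical node-local file holding the runtime library directory;
 * each node can point at its own compiler runtime install. */
#define RUNTIME_LIBDIR_FILE "/etc/orted_runtime_libdir"

/* Return a newly allocated "dir:existing" string, or a copy of dir when
 * existing is NULL or empty. The caller frees the result. */
char *prepend_to_path(const char *dir, const char *existing)
{
    assert(dir != NULL);
    size_t dlen = strlen(dir);
    size_t elen = (existing != NULL) ? strlen(existing) : 0;
    char *out = malloc(dlen + 1 + elen + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, dir, dlen);
    if (elen > 0) {
        out[dlen] = ':';
        memcpy(out + dlen + 1, existing, elen);
        out[dlen + 1 + elen] = '\0';
    } else {
        out[dlen] = '\0';
    }
    return out;
}

/* Read the first line of path into buf, stripping the newline.
 * Returns 0 on success, -1 if the file is missing or empty. */
static int read_libdir(const char *path, char *buf, size_t len)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, (int)len, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';
    return buf[0] != '\0' ? 0 : -1;
}

/* Body of a hypothetical orted_wrapper: argv[1] is the real orted and
 * the remaining arguments are passed through unchanged. */
int wrapper_main(int argc, char **argv)
{
    char libdir[4096];

    if (argc < 2) {
        fprintf(stderr, "usage: %s /path/to/orted [args...]\n", argv[0]);
        return 1;
    }
    if (read_libdir(RUNTIME_LIBDIR_FILE, libdir, sizeof libdir) == 0) {
        char *newpath = prepend_to_path(libdir, getenv("LD_LIBRARY_PATH"));
        if (newpath == NULL) {
            perror("orted_wrapper");
            return 1;
        }
        int rc = setenv("LD_LIBRARY_PATH", newpath, 1); /* setenv copies */
        free(newpath);
        if (rc != 0) {
            perror("orted_wrapper: setenv");
            return 1;
        }
    }
    /* The loader reads LD_LIBRARY_PATH when the new image starts. */
    execv(argv[1], &argv[1]);
    perror("orted_wrapper: execv");
    return 127;
}
```

A real wrapper's `main` would simply `return wrapper_main(argc, argv);`. Keeping the runtime location in a node-local file is one way to handle the heterogeneous case, since the wrapper binary itself can live on shared storage.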

@jsquyres
Member

Should the Intel libraries be rpath/runpath-ed by the wrapper compilers?

@ggouaillardet
Contributor

there are several things here ...
first, orted must find the Intel runtime, and we currently have two options for that:

  • configure with LDFLAGS=-Wl,-rpath,/opt/intel/...
  • use the recently added MCA param

then a.out must find the Intel runtime, and we can use a similar mechanism plus -x LD_LIBRARY_PATH.
I think the MCA param offers more flexibility, since we can use one "safe" runtime for orted and let the user choose the right one for a.out.
imho, rpath is more of a hack to get something working quickly.
Another option (which might already be possible, I honestly never checked) could be to have a specific LDFLAGS for orted only (so we can do rpath there and only there) and make sure it is not propagated to the compiler wrappers.

@jsquyres
Member

Sure, I get the tradeoffs. But the point I'm making is that we already rpath/runpath the OMPI runtime libraries by default. Should we also rpath/runpath the compiler runtime libraries by default, too? (I realize that it is a different solution than what is proposed by this PR, and also realize that there are workarounds that can be used to find compiler runtime libraries, too)

@ntimeu
Author

ntimeu commented Jul 21, 2015

Hi, and thanks for the answers,

Actually the LDFLAGS/rpath set at compile time cannot be used in our case (we don't know in advance which version of the Intel compiler is installed and where).

Our problem is mostly with the ORTED daemon, which needs the Intel runtime, and the corresponding paths are not set when plm SSHes into the compute node (just before launching ORTED).

@ggouaillardet
Contributor

@ntimeu this was implemented in 55ddd6d
try mpirun --mca plm_rsh_pass_libpath /opt/intel/... ...
/* I cannot test this right now, so please adjust the syntax if needed */

@jsquyres
Member

FWIW, we don't necessarily recommend this behavior: i.e., compile OMPI with one compiler suite/version and then run with another. Compiler vendors do tend to guarantee that this is supposed to work (assuming that you're just changing the version of the same compiler suite -- vs. changing compiler suites), but we've run into subtle problems with this kind of behavior over the years. YMMV.

@ntimeu
Author

ntimeu commented Jul 21, 2015

@ggouaillardet I've seen this option, but setting it each time is problematic, especially on a production cluster. We could also set a default runtime path in the MCA config file, but we would have to change it every time the compilers are updated.

The advantage of my solution is that, by setting plm_rsh_propagate_libpath to true (in the MCA config file), users would just launch mpirun and LD_LIBRARY_PATH would be sent automatically before ORTED is launched. This would be helpful on a production cluster, as we don't know where the compilers are installed.

@jsquyres my concern was more about where the compilers are located (sorry if I explained my problem poorly).

@jsquyres
Member

@ntimeu Are you saying that you configure/compile/install OMPI with the Intel compiler suite vA.B.C in location X, but then run OMPI with the Intel compiler suite vA.B.C in location Y?

@ntimeu
Author

ntimeu commented Jul 23, 2015

@jsquyres yes, it depends on where the client installs its compilers (for the runtime).

@jsquyres
Member

I see -- you're an ISV, not a cluster administrator. So you can't control where the users install the Intel compilers; indeed, it may be in a different location than where you installed them.

Two questions:

  1. Isn't there some option to build the intel compiler runtime libs statically into Open MPI? I have a dim recollection of this option existing (specific to the Intel compiler suite), but I don't remember what it's called. If this option exists, it might be a good workaround.
  2. Is there much of a difference between what you proposed and mpirun -x LD_LIBRARY_PATH ...? If not, is there a reason that the -x method is not sufficient?

@rhc54
Contributor

rhc54 commented Jul 23, 2015

As this is against master, remember we also have the MCA param to forward an envar - isn't that also sufficient?

@ggouaillardet
Contributor

iirc, there is the -static-intel option, but it comes with some restrictions ...
iirc, -x LD_LIBRARY_PATH is OK for a.out but not for orted
(libpath must be used for orted).
fwiw, I used to compile with gcc/g++/ifort so that orted does not require the
Intel runtime.
Ralph already made it clear this is not something that is officially
supported by Intel.

@ntimeu
Author

ntimeu commented Jul 23, 2015

@jsquyres yes, actually I work at Bull; we compile and package Open MPI for our clients.

  1. As pointed out by @ggouaillardet, we cannot statically link the Intel runtime into Open MPI and redistribute it directly, because of the licensing policy (and compiling only ORTED with gcc is not supported).
  2. (Here too @ggouaillardet is right.) mpirun -x <env> sets the environment for the application code (which is launched by ORTED). But here we need to set the correct libpath before ORTED itself is launched (because ORTED needs the Intel runtime), so -x is not sufficient.

@rhc54 if you mean plm_rsh_pass_libpath <path>, yes, I'm aware of it, but we cannot set it on our side because we do not know where our clients install their compilers.

The main point of my PR is to provide a way to automatically set the Intel runtime path on remote nodes before launching ORTED, so that we can set this option to true in the MCA config file we provide to our clients.

@rhc54
Contributor

rhc54 commented Jul 24, 2015

As you can tell, we are nervous about changes like this because of past experiences with unintended consequences.

Also, FWIW: I never said anything about what Intel may or may not officially support - not sure where @ggouaillardet got that notion.

Let me think about this a bit - I understand the problem you are trying to solve, but I'm not sure this is the best way to solve it. As Jeff noted, you have no idea if the compiler version they are using is compatible with what you built against, what you propose isn't restricted to the Intel runtime libraries and could easily lead to problems, etc.

I'm puzzled by your statement about the static linking solution - AFAIK, there is no licensing issue with distributing Intel runtimes. You only need a license to run the actual compiler. Are you sure you've checked that out with Intel? It would seem the cleanest of all the proposed solutions.

@ggouaillardet
Contributor

@rhc54 I think you posted something like that on the ML (but I cannot find it ...)
some time ago, I wrote something similar http://www.open-mpi.org/community/lists/users/2015/04/26743.php
(of course, the fact that you did not disagree at the time does not mean I quoted you correctly)

I am not sure there is a clean solution at all ...
the vendor builds Open MPI with runtime X.
Ideally, orted should use runtime X (and, potential legal issues aside, that can easily be achieved by redistributing the runtime).

Now what about the application?
If it was compiled by the end user with runtime Y, then either
a) the application runs with runtime Y, which is not ideal because OMPI was built with runtime X, or
b) the application runs with runtime X, which is not ideal because the application was built with runtime Y.

I also understand this new MCA param is convenient:
just set a boolean flag once and for all, and it will likely work.

So on one hand I am reluctant to provide an easy option that is somewhat dangerous
(who really understands the backward and forward compatibility of runtimes? This is not limited to the Intel compiler),
and on the other hand, I do not see a simple and clean solution.

@rhc54
Contributor

rhc54 commented Jul 24, 2015

I believe the best solution is to statically link against the Intel runtime used to build the executables. IANAL, but I have been in several recent meetings that touched similar subjects, and I suspect there is a misunderstanding here over the Intel licensing. Again, in full disclosure, although I am an Intel employee, I am not affiliated with the Intel compiler group nor authorized to speak for them. Bull needs to contact their Intel rep and get an official answer.

If that proves unsatisfactory, then I think the next best solution is for Bull to add this to their repo of custom patches they maintain for their distro. Until we hear of some general problem, I personally would rather not have something this specific in the general distro.

@ggouaillardet
Contributor

I configured with LDFLAGS=-static-intel and orted still required the Intel dynamic runtime.
I noticed libopen-pal.so does require the runtime (as shown by ldd) even though it was built with -static-intel.

I am thinking that orted could use static libraries for opal, orte and compiler runtime.
I did not try to build dynamic and static ompi, but I suspect orted might use the dynamic libraries in that case. if I am right, what about adding the following configure option(s)
--static-orted
and possibly
--only-static-orted (static libs are built only for orted, but they are not installed)

any thoughts ?

@rhc54
Contributor

rhc54 commented Jul 24, 2015

I vaguely recall this when Brian Barrett had a similar issue - I forget what he had to do to get the ORTE tools to build statically, but there was some additional stuff required. Not sure why he didn't commit it - it's been a long time since he did it.

@jsquyres
Member

I do like the idea of a --enable-static-runtime kind of configure option. This would seem to solve all the problems, and not require the user to know/do anything at runtime.

The only problem is: I'm not entirely sure how to do that.

  1. With the Intel compiler, there's the -static-intel option, but that doesn't make a wholly static orted, for example (it just statically links in the Intel libraries -- not all libraries).
  2. Are there universal options for this for the other compilers?

...after thinking about this for a few minutes, it may be sufficient to do something like this:

if enable_static_runtime was passed
    RUNTIME_LDFLAGS=--static
    if compiler suite is intel
        RUNTIME_LDFLAGS="$RUNTIME_LDFLAGS -static-intel"
    fi
fi

Assuming $(RUNTIME_LDFLAGS) is used when linking the orted (and any other relevant executables?), Libtool might automatically translate --static into whatever is relevant for the underlying compiler, and we only have one special case for the Intel compiler suite.

It's worth a try...?

@jsquyres
Member

This PR is getting a bit stale -- is there any interest in trying what I suggested in #729 (comment)?

@gpaulsen
Member

I'll sign up to implement the --enable-static-runtime configure time option. I've created
#818 to track that work.

This pull request will continue to represent the approach of a runtime flag to "Add something to the LD_LIBRARY_PATH".

@jsquyres
Member

This PR has been replaced by #818.

jsquyres closed this Oct 12, 2015
ntimeu deleted the PR/libpath-auto-append branch March 23, 2016 13:11