v4.0.x: Backport: Ensure that --host / --hostfile nodes are always used in order provided #6508
Corresponds to following commits to OMPI master:
35a5971
2794ae4
aed06e6
5aa775c
Signed-off-by: Ralph Castain <[email protected]>
Signed-off-by: Ralph Castain <[email protected]>
This is a big change. I see:
I think that these are technically backwards-incompatible changes, right? Realistically, the end-user impact will likely be zero (i.e., I find it hard to believe that any real world user would be selecting / passing MCA params to the bzip/gzip components), but still -- we should acknowledge that we're intentionally breaking backwards compatibility here.
Also, I see changes to DSS packed/unpacked messages. Does this PR change the wire protocol? If so, does this have implications in container environments, e.g., where mpirun is outside the container and is a different version of OMPI than is inside the container?
Nobody could use those components because they were only used for CR, which was removed from the release branches long ago.
We have always required that the daemons and mpirun all be at the same release level. Within that constraint, this doesn't break anything. However, if mpirun is outside the container, and the daemons are from some other version inside the container, then yes - launch will be broken.
@gpaulsen this is huge to merge in while we are in rc state. I'd say this waits till 4.0.2.
@hppritcha I agree. 4.0.2 would be sufficient.
@rhc54 FWIW, I'm getting assertion failures with this PR:
Signed-off-by: Ralph Castain <[email protected]>
@jsquyres Should be okay now - I missed a spot in the backport.
@rhc54 Something still seems to be missing. I'm not getting assertion failures any more, but I'm getting the old/bad behavior:
I'm beginning to wonder if this is worth all the pain...hitting my limits on how much time I can devote to it.
I'm going to take myself off as a reviewer to prevent the Pull Request Reminder from bugging me. Add me back when this is ready for review.
@ggouaillardet - Ralph said in today's web-ex that he realistically won't have time for this in the coming months. Is this something you could help with?
Removing milestone as we're currently unsure of the scope and schedule for getting this PR working.
Backport: Ensure that nodes are always used in order provided
Corresponds to following commits to OMPI master:
35a5971
2794ae4
aed06e6
5aa775c
Signed-off-by: Ralph Castain [email protected]
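For readers unfamiliar with the goal stated in the title, the following is a minimal Python sketch of what "used in order provided" means for a list of host names. It is not Open MPI code and does not reflect how the mapper is implemented; the function name `nodes_in_given_order` and the host names are hypothetical, and treating a repeated host name as an extra count is an assumption made only for illustration.

```python
def nodes_in_given_order(hosts):
    """Collapse repeated host names while keeping first-appearance order.

    Returns a list of (host, count) pairs. Python dicts preserve insertion
    order (3.7+), so the first time a host is seen fixes its position.
    """
    counts = {}
    for name in hosts:
        counts[name] = counts.get(name, 0) + 1
    return list(counts.items())


# Hypothetical host list, as it might be given on a command line or in a file.
requested = ["node3", "node1", "node3", "node2"]
ordered = nodes_in_given_order(requested)
```

The point of the sketch is the contrast with an unordered container such as a `set`, which would lose the order in which the user listed the hosts.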