IBM MTT "make check" fail #2966
Comments
Duplicate of Issue #1893 - we need to investigate. It's intermittent, so it's hard to pin down. Here is the output from
|
PR #3468 might be related (might fix this issue - need to check) |
😞 That PR does not seem to fix the |
Instead of failing, it now hangs (I have to kill it manually, as it will hang the MTT runs - there is no timeout mechanism in I suspect that the change made to fix Issue #3450 turned this failure into a hang. |
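One way to keep a hanging test from stalling a whole run is to wrap it in a small timeout helper. The following is only a minimal sketch, assuming the harness is able to launch the test through a Python wrapper; `run_with_timeout` and the timeout values are hypothetical and are not part of MTT or Open MPI.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd and classify the outcome.

    Returns ("pass", 0), ("fail", <nonzero exit code>) or ("hang", None).
    """
    try:
        # subprocess.run kills the child process and raises TimeoutExpired
        # if it has not exited within timeout_s seconds.
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return ("hang", None)
    status = "pass" if result.returncode == 0 else "fail"
    return (status, result.returncode)


if __name__ == "__main__":
    # Demo with a trivially passing command; a real run would pass the path
    # to the test binary instead (path intentionally not assumed here).
    print(run_with_timeout([sys.executable, "-c", "pass"], timeout_s=30))
```

Note that the timeout only kills the direct child process; if the test spawns helpers of its own, they may need separate cleanup. Calling the helper in a loop and counting the "hang" and "fail" results is one way to estimate how often an intermittent failure shows up.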
@bosilca @hjelmn It looks like the recent changes to opal_fifo just changed the failure mode for IBM. Can you have a look at Josh's comment (#2966 (comment))? |
First and foremost, the failure is on ppc64le, and I have no access to such an architecture. What is puzzling is that I was under the impression that for PPC we were using atomic load/store (via OPAL_HAVE_ATOMIC_LLSC_PTR), but apparently this is not the case. I ran 10k tests on different Intel architectures with an OMPI compiled in optimized mode, and I had no failures. Not really sure how to approach this issue. |
@jjhursey Looks like we're going to need some IBM help on this one... |
Yeah that's fine. I can investigate, but I'm not back full time yet so it'll be a little while before I can probably get eyes on this. I just wanted to keep this issue updated. |
I'm going to ask @nysal to drive this from our side, since Josh is apparently skydiving. |
There was a request to re-assess this ticket after PR #3661 - I did so and the problem still persists. We'll continue to investigate. |
I thought we'd agreed not to support 32bit anymore in master. |
I can reproduce this on the 64 bit default build. I think the |
It's possible the LL/SC fifo implementation has a bug. I will take a look this week and see if there is anything obvious. |
I also see this problem pop up once in a while with SUSE openmpi2 packaging, on ppc64le only. |
Ref: PR #2526 |
I am seeing the same test - |
test_fifo still hangs in Fedora on ppc64le with 2.1.6rc1 |
Ok. Will dig deeper tomorrow. I know we have this fixed on master. Need to see what could be missing. |
Yeah, mtt hasn't run yet for 5.0.x and master last night, so we'll see where we stand then. So far pgi on the v4.1.x branch is the only failure. |
may resolve this. |
Now that #8649 is merged are the |
I need to cherry-pick that back to v4 and v5, but yeah I think we're probably good here. |
Note that the PR for v4.0.x was rejected, so we should consider adding
|
Our internal (IBM) MTT has been updated to build v4.0.x with |
Closing this, haven't seen an mtt make-check failure recently. Will re-open if it pops up again. |
@gpaulsen @jjhursey IBM is getting an MTT "make check" fail: https://mtt.open-mpi.org/index.php?do_redir=2391
The error is that the opal_fifo test is failing in 32 bit POWER on master.