Test failures: opal_fifo : test fix #2526
Comments
@amckinstry Unfortunately I can't tell much from that detailed link -- it just shows that it failed, but not why. Can you share opal_fifo.log, and/or a backtrace from a core dump? |
I was kinda hoping you'd seen this before, and could just point me at a patch :-) The build systems hadn't kept opal_fifo.log, so I've needed to set up a ppc64el environment by hand. It's hanging in opal_fifo.c:200, on pthread_join(). In gdb, I can see the threads (8 + pthread master). They're in
and 135 looks suspicious:
Interrupting shows it's looping around the do{} loop in opal_fifo_pop_atomic() indefinitely. |
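For readers unfamiliar with this kind of hang, here is a minimal sketch of a compare-and-swap retry loop of the general shape described above. It is not the code from opal_fifo.c: the `demo_*` names and types are hypothetical, it uses C11 `<stdatomic.h>` rather than Open MPI's own atomics, and it is a simple LIFO list that ignores the ABA problem. It only illustrates that the `do {} while` exit condition depends entirely on the compare-and-swap reporting success, so a primitive that never succeeds would leave a thread spinning and its `pthread_join()` waiting.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical types for illustration only; not Open MPI's opal_fifo_t. */
typedef struct demo_node {
    struct demo_node *next;
    int value;
} demo_node_t;

typedef struct {
    _Atomic(demo_node_t *) head;
} demo_list_t;

static void demo_init(demo_list_t *list)
{
    atomic_init(&list->head, NULL);
}

static void demo_push(demo_list_t *list, demo_node_t *node)
{
    assert(node != NULL);
    demo_node_t *old_head = atomic_load(&list->head);
    do {
        node->next = old_head;
        /* On failure, old_head is refreshed with the current head. */
    } while (!atomic_compare_exchange_weak(&list->head, &old_head, node));
}

static demo_node_t *demo_pop(demo_list_t *list)
{
    demo_node_t *old_head = atomic_load(&list->head);
    do {
        if (old_head == NULL) {
            return NULL; /* list is empty */
        }
        /* The loop exits only when the compare-and-swap succeeds.
         * If the primitive never reported success, this would spin forever. */
    } while (!atomic_compare_exchange_weak(&list->head, &old_head,
                                           old_head->next));
    return old_head;
}
```

Pushing two nodes and popping three times returns the second node, then the first, then NULL. A real concurrent queue also needs protection against ABA and node reuse, which this sketch leaves out.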
Patch from Thibaut Paumard [email protected] for this bug. |
@amckinstry I'm sorry, I don't think that patch is correct. I note that
Making Plus, returning from the function invoked from |
Agreed, the patch is not correct. Still need to figure out the answer. |
There was a regression in PPC atomics. Should be fixed in the latest 2.0.2 release candidate. Please test. |
I am seeing the same test - |
FWIW - This still occurs with 2.1.6rc1 |
@opoplawski Well that's disappointing. Is this also happening with 3.0.3, 3.1.3, and/or the latest nightly 4.0.1 snapshot? |
I'm not seeing it with 3.1.3 or with 4.0.0, so that's good. |
Ok, great. @hjelmn This implies that we're still missing an atomic fix from the v2.x branch...? |
Maybe. Could be an opal_fifo_t fix that is missing. |
I don't think anybody will backport the new atomic operations into 2.x. This ticket can be closed. |
The test opal_fifo is failing on Debian for the git master, snapshot of November 25.
The test previously passed on 2.0.1 systems; works on most architectures except kfreebsd-i386 (kfreebsd-amd64 works) and ppc64el:
https://buildd.debian.org/status/package.php?p=openmpi&suite=experimental
See, e.g.
https://buildd.debian.org/status/fetch.php?pkg=openmpi&arch=kfreebsd-i386&ver=2.0.2%7Egit.20161225-2&stamp=1480937654
Any ideas? I'm setting up a test system to grab the logs and debug.