config: re-enable GCC inline ASM check for PGI #2048
We disabled this support a long time ago. Probably safe to assume whatever bug we were working around no longer exists. Closes open-mpi#2044 Signed-off-by: Nathan Hjelm <[email protected]>
@PHHargrove This should fix the link issue with PGI 16.x. |
@hjelmn I will try that patch out on the PGI 16.x for PPC64 system when I am able. However, I am not sure I agree with the assumption that the problem seen in 2008 is no longer an issue. So, I'd like to find time to test PGI 12, 13, 14 and 15 on x86-64 as well. IIRC, anything older than that won't support the minimum C99 feature set required by OMPI. If you are waiting for a review from me, you won't get it until that testing can be completed. -Paul
What versions of the PGI compiler does Open MPI plan to support? It might be worth testing some older versions to make sure this ASM protection wasn't important there. If so then we should make the configure logic a bit more complete (and document which versions it applies to). |
@jjhursey If Paul is correct then we should probably say we require PGI 12 or newer. If all the known asm bugs were indeed fixed in 10.8 then this is the complete fix. We should re-open the discussion of killing the .asm files if PGI is indeed working with inline asm. |
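As an illustration of the version cutoff being discussed, the sketch below expresses the gate in C rather than in configure, where the real check lives. It assumes PGI's predefined `__PGIC__` and `__PGIC_MINOR__` macros and treats 10.8 as the cutoff mentioned in this thread, which the testing here has not yet confirmed. `version_at_least` and `inline_asm_allowed` are hypothetical names, not Open MPI functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: 10.8 is the first PGI release with the known asm bugs fixed,
 * as suggested in this thread. It is not a verified threshold. */
#define SKETCH_PGI_ASM_MIN_MAJOR 10
#define SKETCH_PGI_ASM_MIN_MINOR 8

/* Returns true when version major.minor is at least req_major.req_minor. */
static bool version_at_least(int major, int minor, int req_major, int req_minor)
{
    assert(major >= 0 && minor >= 0);
    if (major != req_major) {
        return major > req_major;
    }
    return minor >= req_minor;
}

/* Returns true when inline asm would be enabled for the compiler
 * that is building this file. */
static bool inline_asm_allowed(void)
{
#if defined(__PGIC__) && defined(__PGIC_MINOR__)
    return version_at_least(__PGIC__, __PGIC_MINOR__,
                            SKETCH_PGI_ASM_MIN_MAJOR, SKETCH_PGI_ASM_MIN_MINOR);
#else
    return true; /* not PGI, so this gate does not apply */
#endif
}
```

Keeping the comparison in one small function makes it easy to document which versions the protection applies to, and to move the cutoff if testing of older releases turns up a problem.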
Going to run this branch with various versions from 10.9 -> 15.10 on x86_64. |
PGI 10.9 - Pass... Well mostly. The asm tests pass but I get this when building libopen-pal.la:
It's unrelated to this PR though.
PGI 12.10 - ASM Pass |
Could we also get ia32 (via |
@PHHargrove Sure. Will add that to the list. Hoping to figure out this damn -pthreads error though so I can throw the opal_lifo/opal_fifo tests at it as well. Have no idea where the switch is coming from. It isn't in the Makefile (configure correctly finds -pthread does not work). |
PGI 12.6 - ASM Pass |
PGI 12.6 ia32 - ASM Pass, lifo pass, fifo pass |
PGI 16.7 for PPC64 (the original motivation) FAILS If all the amd64 and ia32 tests are clear, we still cannot claim "Closes #2044". Some details:
Respective gdb runs on the core files:
|
@PHHargrove Could also be bugs in our inline asm. Doubtful though since it works with gcc and xlc. |
Running list:
|
@PHHargrove I will look at the generated assembly from PGI on the ppc64 VM. Might be able to figure out why it is failing. Could be Monday before I get a chance though. |
"make all" finishes. Let me know if something else is needed to resolve the "?" in that line. |
Can we verify today that you are able to login to the VM? |
@PHHargrove That's all I needed to resolve the ?. VM login looks good.
@PHHargrove The problem could be with the ll/sc atomics. I might have a typo in the clobbers or something. Will test a couple of things and see what shakes out. If it is a PGI bug we can file it and re-enable PGI ppc64 inline asm when they fix it. |
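For context on what the ll/sc atomics implement, here is a portable sketch of a 32-bit compare-and-set written with C11 `<stdatomic.h>` instead of inline asm. It is not the Open MPI implementation, and `sketch_cmpset_32` is a hypothetical name. On PPC64 a compiler would typically lower the weak compare-exchange to a load-reserve/store-conditional loop, which is the pattern the hand-written asm spells out explicitly.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: store newval into *addr only if *addr == oldval.
 * Returns true when the store happened. */
static bool sketch_cmpset_32(_Atomic int32_t *addr, int32_t oldval, int32_t newval)
{
    assert(addr != NULL);
    int32_t expected = oldval;

    /* The weak form may fail spuriously (comparable to a lost reservation),
     * so retry for as long as the observed value still matches oldval. */
    while (!atomic_compare_exchange_weak(addr, &expected, newval)) {
        if (expected != oldval) {
            return false; /* genuine mismatch: someone else changed *addr */
        }
    }
    return true;
}
```

A call such as `sketch_cmpset_32(&counter, 5, 7)` succeeds only while `counter` still holds 5, which is the contract a wrong clobber list or a miscompiled operand would silently break.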
Ok, the full suite of versions I have are tested and working. I just need to figure out ppc64 now. |
Hmm, looks like on our testbed we have PGI 15.7, 16.1, and 16.5 for x86_64. Can test those next week if needed. |
Heads-up that you probably need --enable-debug to reproduce the problem on ppc64: My initial ppc64 runs were with --enable-debug. -Paul |
@PHHargrove OK, that's interesting. I didn't configure with --enable-debug. That suggests a clobber or something similar since we do -O2 when not building with --enable-debug.
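For readers less familiar with clobbers: in GCC-style extended asm, the clobber list tells the compiler what the asm statement may modify beyond its declared outputs. A minimal sketch, assuming a GCC-compatible compiler, is the empty asm with a "memory" clobber below. It emits no instruction but stops the compiler from caching or reordering memory accesses across it. `compiler_barrier` and `store_then_reload` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Compiler-only barrier: empty template plus a "memory" clobber.
 * No machine instruction is emitted. */
static inline void compiler_barrier(void)
{
    __asm__ __volatile__("" : : : "memory");
}

/* Hypothetical helper: write a value, then read it back across the barrier. */
static int store_then_reload(int *slot, int value)
{
    assert(slot != NULL);
    *slot = value;
    compiler_barrier(); /* the compiler must assume memory may have changed here */
    return *slot;
}
```

When a clobber is missing from a real asm statement, the code can appear correct at one optimization level and fail at another, because the optimizer is free to keep a stale value in a register.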
If you have a licence for 16.5 then you should be able to d/l and install 16.7 (their current latest) for x86-64 as well. Registration is required for the d/l, but you don't need to provide license info. |
@sjeaugey Does nvidia care about old versions of PGI? Would it be ok to cut off anything older than 10.8? |
I think 6 years is pretty old indeed. Let me check. |
Pretty sure I see the compiler bug.
The This almost made me laugh (or cry) because it was shockingly familiar from my past work keeping x86(-64) asm working as PGI asm continued to fix old bugs and introduce new ones:
It seems reasonable to assume that with BTW: what's up with the unused |
Looks like a PGI asm bug to me. This:
Produces this:
Notice that when loading r5 with the new value it is using |
You are a step behind me. -Paul |
Hah, didn't even look :). Awesome. Two independent reviews of the generated code came to the same conclusion. |
The unused arguments... No idea what I was thinking. Was cleaning up the code now. Probably an artifact of writing the atomics that didn't hurt so didn't get cleaned up. Feel free to clean them up as part of the workaround.
Ack. Grrr. |
The Close and comment button is bigger than Comment :D.
Even more fun. I made newval an input-output operand to see what it would do and it became a byte load! |
OK, I'll drop the "foo". While my tests of the work-around are running, here is a minimal test case for your amusement:
|
You can also drop the unused *addr input operand. I have a feeling I was setting ret a different way before the li %0,0/ori %0,%0,1 code was finalized. I will clean up the sc_32 version. |
Nice clean test case there. Hopefully PGI gets this resolved in 16.6 (err 16.8). The workaround will get us up and running for older versions. Is v16 the first one that supports ppc64?
In the absence of the "memory" clobber, that input might be required to prevent the compiler from moving a store to the same location over the asm (though the At some point my changes become too large to be considered de minimis, which is a problem since I don't have a contributor's agreement. So, how about this:
I am testing 16.7 now.
I didn't know until last night that such a beast existed. |
Ugh, looks like the workaround will have to be applied to all the _64 atomics. Oh well. Should be easy enough to do. |
Yup, I just came to the same conclusion when "make check" still failed after I fixed the generated code for sc_64. In hindsight, it should have been obvious that if that trivial test case would fail then pretty much anything w/ a 64-bit integer (not pointer) input would be vulnerable to the compiler bug. The failure of (It seems that Good news: the patch below resolves the problems with sc_64 at Handing the remaining work over to you, Nathan. -Paul
|
FYI: I am sending my test case to Portland Group. -Paul |
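The observation that any 64-bit integer input could be affected suggests a simple kind of regression check: push values with distinctive upper bits through a 64-bit atomic and confirm that every bit arrives. The sketch below does this with C11 `<stdatomic.h>`. It does not reproduce the PGI bug and is not the test case sent to Portland Group; `swap_preserves_64_bits` and `check_patterns` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical check: swap newval into *addr and report whether the
 * stored value matches in all 64 bits. */
static int swap_preserves_64_bits(_Atomic int64_t *addr, int64_t newval)
{
    assert(addr != NULL);
    (void)atomic_exchange(addr, newval);
    return atomic_load(addr) == newval;
}

/* Runs the check over patterns that a narrower-than-64-bit operand load
 * would corrupt. Returns 1 when every pattern survives. */
static int check_patterns(void)
{
    static const int64_t patterns[] = {
        INT64_C(0x0123456789ABCDEF), /* every byte distinct */
        INT64_C(0x7FFFFFFF00000000), /* only upper 32 bits set */
        INT64_C(0x00000000FFFFFFFF), /* only lower 32 bits set */
        INT64_C(0x0000000000000100), /* does not fit in a single byte */
        INT64_C(-1),                 /* all bits set */
    };
    _Atomic int64_t cell = 0;

    for (size_t i = 0; i < sizeof patterns / sizeof patterns[0]; i++) {
        if (!swap_preserves_64_bits(&cell, patterns[i])) {
            return 0;
        }
    }
    return 1;
}
```

Running the same patterns against each `_64` atomic after applying the workaround would give some confidence that no operand is being loaded at the wrong width.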
From what I understand nvidia is ok with this change. It will actually help performance when using PGI. After this is merged I will evaluate the .asm files and see if they can be killed on master. We will leave them on v2.0.x and use them if the PGI version is < 10.8. |
:bot:mellanox:retest |
@hjelmn I don't think this fix was ever moved to the v2.x release series. Can you file a PR for it? I'd be happy to test/review. |
Today I finally received a TPR number from PGI: 23064 |