config: re-enable GCC inline ASM check for PGI #2048

hjelmn · 2016-09-02T18:45:25Z

We disabled this support a long time ago. Probably safe to assume
whatever bug we were working around no longer exists.

Closes #2044

Signed-off-by: Nathan Hjelm [email protected]

We disabled this support a long time ago. Probably safe to assume whatever bug we were working around no longer exists. Closes open-mpi#2044 Signed-off-by: Nathan Hjelm <[email protected]>

hjelmn · 2016-09-02T18:45:47Z

@PHHargrove This should fix the link issue with PGI 16.x.

PHHargrove · 2016-09-02T18:55:24Z

@hjelmn I will try that patch out on the PGI 16.x for PPC64 system when I am able.

However, I am not sure I agree with the assumption that the problem seen in 2008 is no longer an issue. So, I'd like to find time to test PGI 12, 13, 14 and 15 on x86-64 as well. IIRC, anything older then that won't support the minimum C99 feature set required by OMPI. If you are waiting for a review from me, you won't get it until that testing can be completed.

-Paul

jjhursey · 2016-09-02T18:57:11Z

What versions of the PGI compiler does Open MPI plan to support? It might be worth testing some older versions to make sure this ASM protection wasn't important there. If so then we should make the configure logic a bit more complete (and document which versions it applies to).

hjelmn · 2016-09-02T19:04:25Z

@jjhursey If Paul is correct then we should probably say we require PGI 12 or newer. If all the known asm bugs were indeed fixed in 10.8 then this is the complete fix. We should re-open the discussion of killing the .asm files if PGI is indeed working with inline asm.

hjelmn · 2016-09-02T19:07:00Z

Going to run this branch with various versions from 10.9 -> 15.10 on x86_64.

hjelmn · 2016-09-02T19:30:35Z

PGI 10.9 - Pass... Well mostly. The asm tests pass but I get this when building libopen-pal.la:

pgcc-Error-Unknown switch: -pthread

Its unrelated to this PR though.

hjelmn · 2016-09-02T20:07:23Z

PGI 12.10 - ASM Pass

PHHargrove · 2016-09-02T20:11:31Z

PGI 10.9 - Pass
Well, that is a surprise (to me anyway) independent of the ASM.

Could we also get ia32 (via -m32) tested w/ as many of the available PGI versions as possible? The 64-bit CAS code for ia32 has been particularly "tough" on compiler's support for inline ASM since it uses pretty much all of the available registers. It was responsible for several of the asm bugs (PGI and otherwise) we found in various compilers when building GASNet's atomics.

hjelmn · 2016-09-02T20:13:30Z

@PHHargrove Sure. Will add that to the list. Hoping to figure out this damn -pthreads error though so I can throw the opal_lifo/opal_fifo tests at it as well. Have no idea where the switch is coming from. It isn't in the Makefile (configure correctly finds -pthread does not work).

hjelmn · 2016-09-02T20:15:26Z

PGI 12.6 - ASM Pass

hjelmn · 2016-09-02T20:31:16Z

PGI 12.6 ia32 - ASM Pass, lifo pass, fifo pass

PHHargrove · 2016-09-02T20:57:32Z

PGI 16.7 for PPC64 (the original motivation) FAILS make check on this branch (detail below).
I suspect PGI's handing of GCC-style inline asm for PPC is less than perfect this early in its lifetime.

If all the amd64 and ia32 tests are clear, we still cannot claim "Closes #2044".
However, perhaps the configury should continue to blacklist PGI, but only for PPC64.
That is still a step out of the dark ages of 2008.

Some details:

../../../config/test-driver: line 107: 21663 Segmentation fault      (core dumped) "$@" > $log_file 2>&1
FAIL: opal_lifo
../../../config/test-driver: line 107: 21687 Segmentation fault      (core dumped) "$@" > $log_file 2>&1
FAIL: opal_fifo

Respective gdb runs on the core files:

Core was generated by `/home/ubuntu/ompi/BLD/test/class/.libs/lt-opal_lifo '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000000010001ff8 in opal_lifo_pop_atomic (lifo=0x3fffdffc76b8) at ../../../opal/class/opal_lifo.h:223
223             next = (opal_list_item_t *) item->opal_list_next;
(gdb) print *item
Cannot access memory at address 0x30d34590

Core was generated by `/home/ubuntu/ompi/BLD/test/class/.libs/lt-opal_lifo '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000000010001ff8 in opal_obj_new_debug (type=0x30d34590,
    file=0x10030d34540 "\355\276\257\336\355\276\257\336\320c\027\217\377?", line=256)
    at ../../../opal/class/opal_object.h:265
265         object->cls_init_file_name = file;

(gdb) where
#0  0x0000000010001ff8 in opal_obj_new_debug (type=0x30d34590,
    file=0x10030d34540 "\355\276\257\336\355\276\257\336\320c\027\217\377?", line=256)
    at ../../../opal/class/opal_object.h:265
#1  0x00003fffdffc7500 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

hjelmn · 2016-09-02T21:00:42Z

@PHHargrove Could also be bugs in our inline asm. Doubtful though since it works with gcc and xlc.

hjelmn · 2016-09-02T21:02:00Z

Running list:

PGI 10.9 x86_64 - Pass (asm, fifo, lifo)
PGI 10.9 ia32 - Pass (asm, fifo, lifo)
PGI 12.6 x86_64 - Pass (asm, lifo, fifo)
PGI 12.6 ia32 - Pass (asm, lifo, fifo)
PGI 12.10 x86_64 - Pass (asm, lifo, fifo)
PGI 12.10 ia32 - Pass (asm, fifo, lifo)
PGI 13.2 x86_64 - Pass (asm, fifo, lifo)
PGI 13.2 ia32 - Pass (asm, fifo, lifo)
PGI 13.7 x86_64 - Pass (asm, lifo, fifo)
PGI 13.7 ia32 - Pass (asm, lifo, fifo)
PGI 13.10 x86_64 - Pass (asm, lifo, fifo)
PGI 13.10 ia32 - Pass (asm, lifo, fifo)
PGI 14.10 x86_64 - Pass (asm, fifo, lifo)
PGI 14.10 ia32 - X - Can't test
PGI 15.10 x86_64 - Pass (asm, fifo, lifo)
PGI 15.10 ia32 - X - Can't test
PGI 16.7 ppc64 - Pass (asm), Fail (lifo, fifo)

hjelmn · 2016-09-02T21:03:08Z

@PHHargrove I will look at the generated assembly from PGI on the ppc64 VM. Might be able to figure out why it is failing. Could be Monday before I get a chance though.

PHHargrove · 2016-09-02T21:06:42Z

PGI 16.7 ppc64 - Pass? (asm), Fail (lifo, fifo)

"make all" finishes.
"make check" at top level gets as far as the failing lifo and fifo tests.
However, before that point the "tests/asm" directory passed all tests.

Let me know if something else is needed to resolve the "?" in that line.

PHHargrove · 2016-09-02T21:08:05Z

I will look at the generated assembly from PGI on the ppc64 VM. Might be able to figure out why it is failing. Could be Monday before I get a chance though.

Can we verify today that you are able to login to the VM?

hjelmn · 2016-09-02T21:16:06Z

@PHHargrove Thats all i needed to resolve the ?. VM login looks good.

hjelmn · 2016-09-02T21:44:40Z

@PHHargrove The problem could be with the ll/sc atomics. I might have a typo in the clobbers or something. Will test a couple of things and see what shakes out. If it is a PGI bug we can file it and re-enable PGI ppc64 inline asm when they fix it.

hjelmn · 2016-09-02T22:01:26Z

Ok, the full suite of versions I have are tested and working. I just need to figure out ppc64 now.

hjelmn · 2016-09-02T22:04:27Z

Hmm, looks like on our testbed we have PGI 15.7, 16.1, and 16.5 for x86_64. Can test those next week if needed.

PHHargrove · 2016-09-02T22:05:06Z

@hjelmn

Heads-up that you probably need --enable-debug to reproduce the problem on ppc64:

My initial ppc64 runs were with --enable-debug.
I have now tested w/o that flag and PASS all the tests (asm, lifo, fifo).
I would have expected the opposite!

-Paul

hjelmn · 2016-09-02T22:06:11Z

@PHHargrove Ok, thats interesting. didn't configure with --enable-debug. That suggests a clobber or something similar since we do -O2 when not building with --enable-debug.

PHHargrove · 2016-09-02T22:09:10Z

If you have a licence for 16.5 then you should be able to d/l and install 16.7 (their current latest) for x86-64 as well. Registration is required for the d/l, but you don't need to provide license info.

hjelmn · 2016-09-02T22:12:28Z

@sjeaugey Does nvidia care about old versions of PGI? Would it be ok to cut off anything older than 10.8?

sjeaugey · 2016-09-02T22:54:56Z

I think 6 years is pretty old indeed. Let me check.

PHHargrove · 2016-09-03T01:10:41Z

@hjelmn

Pretty sure I see the compiler bug.
Have a look at the following, focusing on the stdcx. and the instruction immediately before it:

0000000010002120 <opal_atomic_sc_64>:
    10002120:   f8 ff 61 f8     std     r3,-8(r1)
    10002124:   f0 ff 81 f8     std     r4,-16(r1)
    10002128:   f8 ff 61 e8     ld      r3,-8(r1)
    1000212c:   78 1b 64 7c     mr      r4,r3
    10002130:   f0 ff a1 80     lwz     r5,-16(r1)
    10002134:   ad 19 a0 7c     stdcx.  r5,0,r3
    10002138:   00 00 a0 38     li      r5,0
    1000213c:   08 00 c2 40     bne-    10002144 <opal_atomic_sc_64+0x24>
    10002140:   01 00 a5 60     ori     r5,r5,1
    10002144:   ec ff c1 90     stw     r6,-20(r1)
    10002148:   e8 ff a1 90     stw     r5,-24(r1)
    1000214c:   ea ff 61 e8     lwa     r3,-24(r1)
    10002150:   20 00 80 4e     blr

The lwz immediately before the stdcx. is a 32-bit load with zero-extension. It is loading the from the location where the 2nd arg (r4) was spilled in the prologue. So, the sc_64 operation is corrupting newval (zeroing the upper 32-bits) when setting up the asm argument(s).

This almost made me laugh (or cry) because it was shockingly familiar from my past work keeping x86(-64) asm working as PGI asm continued to fix old bugs and introduce new ones:

   * GASNETI_PGI_ASM_BUG2843
   *   Compiler suffers from "tpr 17075" in which extended asm() may load only 32 bits of
   *   a 64-bit operand at -O1 (but is OK at -O0 and -O2).
   *

It seems reasonable to assume that with -O2 there is no spill and thus no incorrect load to corrupt the value.

BTW: what's up with the unused foo arguments to the asm for both sc operations?

hjelmn · 2016-09-03T03:17:56Z

Looks like a PGI asm bug to me. This:

static inline int opal_atomic_sc_64(volatile int64_t *addr, int64_t newval)
{
    int32_t ret;

    __asm__ __volatile__ ("   stdcx.  %2, 0, %1  \n\t"
                          "   li      %0,0       \n\t"
                          "   bne-    1f         \n\t"
                          "   ori     %0,%0,1    \n\t"
                          "1:"
                          : "=r" (ret)
                          : "r" (addr), "r" (newval)
                          : "cc", "memory");
    return ret;
}

Produces this:

        .align  4
        .type   opal_atomic_sc_64,@function
opal_atomic_sc_64:                      # @opal_atomic_sc_64
.Lfunc_begin19:
        .loc    6 253 0                 # ../../opal/include/opal/sys/powerpc/atomic.h:253:0
        .cfi_startproc
# BB#0:                                 # %L.entry
        #DEBUG_VALUE: opal_atomic_sc_64:addr <- [%X1+-8]
        #DEBUG_VALUE: opal_atomic_sc_64:newval <- [%X1+-16]
        std 3, -8(1)
        std 4, -16(1)
        .loc    6 264 1 prologue_end    # ../../opal/include/opal/sys/powerpc/atomic.h:264:1
.Ltmp244:
        ld 3, -8(1)
        lwz 5, -16(1)
        #APP
        stdcx. 5, 0, 3

        li 3, 0

        bne-     0, .Ltmp245

        ori 3, 3, 1

.Ltmp245:
        #NO_APP
        std 3, -24(1)
        lwa 3, -24(1)
        blr
.Ltmp246:
        .long   0
        .quad   0
.Lfunc_end19:
        .size   opal_atomic_sc_64, .Lfunc_end19-.Lfunc_begin19
        .cfi_endproc

Notice that when loading r5 with the new value it is using lwz 5, -16(1) which is load word (32-bit) and 0. It should be using a double-word load. I am trying to figure out a workaround.

PHHargrove · 2016-09-03T03:22:35Z

@hjelmn

You are a step behind me.
I diagnosed the same code gen error (32-bit load) one comment before yours.
I have a work-around in final testing right now.
Stay tuned...

-Paul

hjelmn · 2016-09-03T03:24:26Z

Hah, didn't even look :). Awesome. Two independent reviews of the generated code came to the same conclusion.

hjelmn · 2016-09-03T03:25:51Z

The unused arguments... No idea what I was thinking. Was cleaning up the code now. Probably an artifact of writing the atomics that didn't hurt so didn't get cleaned up. Feel free to clean them up as part of the work around.

hjelmn · 2016-09-03T03:25:58Z

Ack. Grrr.

hjelmn · 2016-09-03T03:26:31Z

The Close and comment button is bugger than Comment :D.

hjelmn · 2016-09-03T03:28:04Z

Even more fun. I made newval an input-output operand to see what it would do and it became a byte load! lbz 5, -24(1)

PHHargrove · 2016-09-03T03:30:52Z

OK, I'll drop the "foo".

While my tests of the work-around are running, here is a minimal test case for your amusement:

$ cat q.c
long identity_function(long input) {
    long output;
    __asm__ __volatile__ (  "mr    %0,%1"
                          : "=r" (output)
                          : "r"  (input) );
    return output;
}
$ pgcc -c q.c
$ objdump -d q.o

q.o:     file format elf64-powerpcle
Disassembly of section .text:

0000000000000000 <identity_function>:
   0:   f8 ff 61 f8     std     r3,-8(r1)
   4:   f8 ff 81 80     lwz     r4,-8(r1)
   8:   78 23 83 7c     mr      r3,r4
   c:   f0 ff 61 f8     std     r3,-16(r1)
  10:   20 00 80 4e     blr
        ...

hjelmn · 2016-09-03T03:32:45Z

You can also drop the unused *addr input operand. I have a feeling I was setting ret a different way before the li %0,0/ori %0,%0,1 code was finalized. I will clean up the sc_32 version.

hjelmn · 2016-09-03T03:34:13Z

Nice clean test case there. Hopefully pgi gets this resolved in 16.6 (err 16.8). The workaround will get us up an running for older versions.

Is v16 the first one that supports ppc64?

PHHargrove · 2016-09-03T03:45:44Z

I will clean up the sc_32 version.

You can also drop the unused *addr input operand.

In the absence of the "memory" clobber, that input might be required to prevent the compiler from moving a store to the same location over the asm (though the __volatile__ is supposed to do that too). With a "memory" clobber and the __volatile__ it can certainly be removed.

At some point my changes become too large to be considered de minimis, which is a problem since I don't have a contributor's agreement. So, how about this:

I will post the fix (1 line change, with a few lines of preprocessor to make it conditional) here
You fix everything else.

Hopefully pgi gets this resolved in 16.6

I am testing 16.7 now.
If I understand correctly, the version is year.month, to the next release will be 16.9 or higher.

Is v16 the first one that supports ppc64?

I didn't know until last night that such a beast existed.
However, based on what I learned while looking for it I think 16.5 and 16.7 have been the ONLY releases for ppc64 so far, and both are identified as beta. I assume @sjeaugey will have the real answer.

hjelmn · 2016-09-03T03:58:08Z

Ugh, looks like the workaround will have to be applied to all the _64 atomics. Oh well. Should be easy enough to do.

PHHargrove · 2016-09-03T04:16:24Z

Ugh, looks like the workaround will have to be applied to all the _64 atomics. Oh well. Should be easy enough to do.

Yup, I just came to the same conclusion when "make check" still failed after I fixed the generated code for sc_64. In hindsight, it should have been obvious that if that trivial test case would fail then pretty much anything w/ a 64-bit integer (not pointer) input would be vulnerable to the compiler bug.

The failure of opal_lifo test is no longer a SEGV, but it is still a test failure.
The ompi_rb_tree and opal_fifo tests still SEGV.

(It seems that ompi_rb_tree has been failing since the start, but I missed it because it scrolled off my screen.)

Good news: the patch below resolves the problems with sc_64 at -O0.
All that is needed is a cast to (long long) or (void) - either one works.
Note that a cast to int64_t or intptr_t does not help at all (nor do the unsigned versions).

Handing the remaining work over to you, Nathan.
I suggest some sort of FOO64(newval) macro rather than an #if/#else/#endif in every atomic.
Have a look at the existing OPAL_ASM_ADDR() macro in the same file (also a compiler bug work-around I provided, fwiw).

-Paul

diff --git a/opal/include/opal/sys/powerpc/atomic.h b/opal/include/opal/sys/powerpc/atomic.h
index d104379..7c28399 100644
--- a/opal/include/opal/sys/powerpc/atomic.h
+++ b/opal/include/opal/sys/powerpc/atomic.h
@@ -257,7 +257,11 @@ static inline int opal_atomic_sc_64(volatile int64_t *addr, int64_t newval)
                           "   ori     %0,%0,1    \n\t"
                           "1:"
                           : "=r" (ret), "=m" (*addr), "=r" (foo)
+#if defined(__PGI) // Work-around bad-code gen at -O0
+                          : "r" (addr), "r" ((long long)newval)
+#else
                           : "r" (addr), "r" (newval)
+#endif
                           : "cc", "memory");
     return ret;
 }

PHHargrove · 2016-09-03T05:03:20Z

@hjelmn

FYI: I am sending my test case to Portland Group.
I'll try to remember to report the assigned TPR number here when I have one.

-Paul

hjelmn · 2016-09-06T21:16:21Z

From what I understand nvidia is ok with this change. It will actually help performance when using PGI. After this is merged I will evaluate the .asm files and see if they can be killed on master. We will leave them on v2.0.x and use them if the PGI version is < 10.8.

hjelmn · 2016-09-06T21:30:39Z

:bot:mellanox:retest

jjhursey · 2016-09-15T15:52:20Z

@hjelmn I don't think this fix was ever moved to the v2.x release series. Can you file a PR for it? I'd be happy to test/review.

PHHargrove · 2016-09-26T21:31:30Z

Today I finally received a TPR number from PGI: 23064

config: re-enable GCC inline ASM check for PGI

795833b

We disabled this support a long time ago. Probably safe to assume whatever bug we were working around no longer exists. Closes open-mpi#2044 Signed-off-by: Nathan Hjelm <[email protected]>

hjelmn closed this Sep 3, 2016

hjelmn reopened this Sep 3, 2016

hjelmn merged commit 08d08e6 into open-mpi:master Sep 6, 2016

config: re-enable GCC inline ASM check for PGI #2048

config: re-enable GCC inline ASM check for PGI #2048

Uh oh!

Conversation

hjelmn commented Sep 2, 2016

Uh oh!

hjelmn commented Sep 2, 2016

Uh oh!

PHHargrove commented Sep 2, 2016

Uh oh!

jjhursey commented Sep 2, 2016

Uh oh!

hjelmn commented Sep 2, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

hjelmn commented Sep 2, 2016

Uh oh!

hjelmn commented Sep 2, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

hjelmn commented Sep 2, 2016

Uh oh!

PHHargrove commented Sep 2, 2016

Uh oh!

hjelmn commented Sep 2, 2016

Uh oh!

hjelmn commented Sep 2, 2016

Uh oh!

hjelmn commented Sep 2, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

PHHargrove commented Sep 2, 2016

Uh oh!

hjelmn commented Sep 2, 2016

Uh oh!

hjelmn commented Sep 2, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

hjelmn commented Sep 2, 2016

Uh oh!

PHHargrove commented Sep 2, 2016

Uh oh!

PHHargrove commented Sep 2, 2016

Uh oh!

hjelmn commented Sep 2, 2016

Uh oh!

hjelmn commented Sep 2, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

hjelmn commented Sep 2, 2016

Uh oh!

hjelmn commented Sep 2, 2016

Uh oh!

PHHargrove commented Sep 2, 2016

Uh oh!

hjelmn commented Sep 2, 2016

Uh oh!

PHHargrove commented Sep 2, 2016

Uh oh!

hjelmn commented Sep 2, 2016

Uh oh!

sjeaugey commented Sep 2, 2016

Uh oh!

PHHargrove commented Sep 3, 2016

Uh oh!

hjelmn commented Sep 3, 2016

Uh oh!

PHHargrove commented Sep 3, 2016

Uh oh!

hjelmn commented Sep 3, 2016

Uh oh!

hjelmn commented Sep 3, 2016

Uh oh!

hjelmn commented Sep 3, 2016

Uh oh!

hjelmn commented Sep 3, 2016

Uh oh!

hjelmn commented Sep 3, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

hjelmn commented Sep 2, 2016 •

edited

Loading

hjelmn commented Sep 2, 2016 •

edited

Loading

hjelmn commented Sep 2, 2016 •

edited

Loading

hjelmn commented Sep 2, 2016 •

edited

Loading

hjelmn commented Sep 2, 2016 •

edited

Loading

hjelmn commented Sep 3, 2016 •

edited

Loading

hjelmn commented Sep 3, 2016 •

edited

Loading