Skip to content

Conversation

@awlauria
Copy link
Contributor

@awlauria awlauria commented Feb 3, 2021

  • Fix check for newer version of xl compilers.

    • By default newer xlc compilers only define ibmxl now.
  • Fix configury where most compilers will get mislabeled as 'gnu'.

Tested this on gcc, clang, xlc, and pgi.

gcc (~35 warnings remain):

$. gcc --version
gcc (GCC) 8.3.1 20191121 (Red Hat 8.3.1-5)

xlc (~20 warnings remain):

$. xlc --version
IBM XL C/C++ for Linux, V16.1.1 (5725-C73, 5765-J13)
Version: 16.01.0001.0008

clang (~90 warnings remain):

$. clang --version
clang version 9.0.1 (Red Hat 9.0.1-2.module+el8.2.0+5494+7b8075cf)
Target: powerpc64le-unknown-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

pgi (~400 warnings remain):

$. pgcc --version
pgcc (aka nvc) 20.11-0 linuxpower target on Linuxpower 
PGI Compilers and Tools
Copyright (c) 2020, NVIDIA CORPORATION.  All rights reserved.

companion pr's (openpmix):

openpmix/openpmix#2042
openpmix/openpmix#2048
openpmix/openpmix#2049
openpmix/openpmix#2053

prrte:
openpmix/prrte#736

@awlauria
Copy link
Contributor Author

awlauria commented Feb 3, 2021

PGI still throws quite a few warnings, but it's better than the thousands upon thousands it was throwing before. It is quite picky.

int recv_count = 0, send_count = 0;
int scatter_count = (count + comm_size - 1) / comm_size; /* ceil(count / comm_size) */
int curr_count = (rank == root) ? count : 0;
int rem_count = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand this. In C99 (which we require) variables can be declared anywhere.

Copy link
Contributor Author

@awlauria awlauria Feb 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think goto's are allowed to jump across declarations. gcc seems pretty lax on it, maybe because it's smart enough to know it won't change as currently written. But it could lead to undefined behavior in the future if someone overlooks that.

I see this on the top of page 139:

EXAMPLE 2    A goto statement  is  not  allowed  to  jump  past  any  declarations  of 
objects  with  variably modified types.  A jump within the scope, however, is permitted.
http://www.dii.uchile.cl/~daespino/files/Iso_C_1999_definition.pdf

Copy link
Member

@bosilca bosilca Feb 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the jumped variable would have been used in the code after the goto I would understand, but that's not the case, all variables remain used in the code block before the goto label. This is also what the Example 2 you mentioned indicates, it is incorrect to jump the variable definition.

I hate to change a perfectly correct code just to please a particular compiler, but as humans are much more adaptable than software we will cope with the new code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that in these cases there is no harm (the code is fine as is), but I think it's a good change in general. The compiler won't throw an error if someone were to try to use these uninitialized variables after the goto, and the developer will be left scratching their head.

@awlauria awlauria force-pushed the fix_pgi_warnings branch 2 times, most recently from 3f4aab3 to 2ef3ae1 Compare February 4, 2021 01:11
@awlauria
Copy link
Contributor Author

awlauria commented Feb 4, 2021

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

*/
static inline void opal_atomic_lock_init (opal_atomic_lock_t *lock, bool value)
{
#ifdef OPAL_HAVE_CLANG_BUILTIN_ATOMIC_C11_FUNC
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you say why we need to distinguish between Clang and the rest of the pack? According to https://en.cppreference.com/w/c/atomic/atomic_flag_clear atomic_flag_clear is supposed to take volatile atomic_flag* as argument (not volatile void*, which is now passed for the rest).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that clang uses it's own builtin-c11 atomic functions:

[awlauria@f8n02 clang_build]$ cat clang_test.c 
#include <stdatomic.h>

int main() {
 void *lock;
 atomic_flag_clear ((volatile void *) lock);
}
[awlauria@f8n02 clang_build]$ clang clang_test.c 
clang_test.c:5:2: error: member reference base type 'volatile void' is not a structure or union
 atomic_flag_clear ((volatile void *) lock);
 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/lib64/clang/9.0.1/include/stdatomic.h:167:63: note: expanded from macro 'atomic_flag_clear'
#define atomic_flag_clear(object) __c11_atomic_store(&(object)->_Value, 0, __ATOMIC_SEQ_CST)
                                                      ~~~~~~~~^ ~~~~~~
1 error generated.

Checking with -E:

int main() {
 void *lock;
 __c11_atomic_store(&((volatile void *) lock)->_Value, 0, 5);
}

Dropping the volatile and just using a void * is fine.

Copy link
Contributor Author

@awlauria awlauria Feb 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think gcc/others will do an implicit cast and not warn as long as the pointer is volatile - but it probably would be better to cast it to a volatile atomic_flag*.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can make that change today - will test it across the different compilers.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why pass void* in the first place? Why not cast to volatile atomic_lock*?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like pgi replaces atomic_clear(volatile atomic_lock *) with __atomic_clear(volatile void * ptr, int memorder), and doesn't like the cast to volatile atomic_lock:

pgcc -E clang_test.c gives me:

int main() {
 volatile atomic_flag*  lock;
 __atomic_clear ((lock), 5);
}

I can't seem to make it happy without the cast to a volatile void *.
Maybe this should be a more of a PGI specific check.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like pleasing a broken compiler then. The definition of atomic_flag_clear is clear: the argument is a volatile atomic_flag*. If the PGI compiler trips on that its their problem, not ours :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think (not positive) pgi is using these atomic built-ins:
https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html

which defines it as Built-in Function: void __atomic_clear (bool *ptr, int memorder)
Not sure if it's technically doing the 'wrong thing'.

Even if so - it might be worth a work-around, since this is throwing a few thousand warnings as it's propagated in many places.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If others agree that we want to workaround the PGI compiler here, we should report it to PGI and have a special case for PGI, treating all others the "right way" (instead of having a workaround for the one compiler that trips over our workaround)

@ibm-ompi
Copy link

ibm-ompi commented Feb 4, 2021

The IBM CI (GNU/Scale) build failed! Please review the log, linked below.

Gist: https://gist.github.com/52695f3adfd831b9fc72a0839a3751b6

@ibm-ompi
Copy link

ibm-ompi commented Feb 4, 2021

The IBM CI (XL) build failed! Please review the log, linked below.

Gist: https://gist.github.com/c173667b52d8ec1b9bd40d2b1f3032dc

Signed-off-by: Austen Lauria <[email protected]>
Signed-off-by: Austen Lauria <[email protected]>
Signed-off-by: Austen Lauria <[email protected]>
Signed-off-by: Austen Lauria <[email protected]>
Signed-off-by: Austen Lauria <[email protected]>
Signed-off-by: Austen Lauria <[email protected]>
Signed-off-by: Austen Lauria <[email protected]>
Signed-off-by: Austen Lauria <[email protected]>
Signed-off-by: Austen Lauria <[email protected]>
Signed-off-by: Austen Lauria <[email protected]>
Signed-off-by: Austen Lauria <[email protected]>
Signed-off-by: Austen Lauria <[email protected]>
Signed-off-by: Austen Lauria <[email protected]>
Signed-off-by: Austen Lauria <[email protected]>
Signed-off-by: Austen Lauria <[email protected]>
Signed-off-by: Austen Lauria <[email protected]>
@awlauria awlauria force-pushed the fix_pgi_warnings branch 2 times, most recently from be4a36b to bc3290e Compare February 8, 2021 21:34
Most compilers define the __GNU__ macro, so put it at the bottom
as a catch-all.

Tested with gcc, xlc, pgi and clang.

Signed-off-by: Austen Lauria <[email protected]>
Signed-off-by: Austen Lauria <[email protected]>
Signed-off-by: Austen Lauria <[email protected]>
Signed-off-by: Austen Lauria <[email protected]>
Signed-off-by: Austen Lauria <[email protected]>
Signed-off-by: Austen Lauria <[email protected]>
@awlauria
Copy link
Contributor Author

I plucked some of these into:
#8488
for easier reviewing. Once that gets through I'll rebase this.

Copy link
Member

@jjhursey jjhursey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than the change suggested in atomic_stdc.h this looks good to me.

Note that the branch needs to be rebased from master as there are some merge conflicts that GH has highlighted.

@awlauria
Copy link
Contributor Author

I'll cherry-pick these back slowly. This will be a painful process.

@awlauria awlauria closed this Sep 3, 2021
@awlauria awlauria deleted the fix_pgi_warnings branch March 17, 2022 17:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants