Commit d7e3f87

gcc_builtin: fix performance regression on x86_64
In order to work around a bug in older gcc versions on x86_64, __atomic_thread_fence (__ATOMIC_ACQUIRE) was replaced with __atomic_thread_fence (__ATOMIC_SEQ_CST), based on the assumption that this did not introduce performance regressions.

It was recently found that this did introduce a performance regression, mainly at scale on fat nodes. So simply use an asm memory clobber to both work around the older gcc bug and fix the performance regression.

Thanks S. Biplab Raut for bringing this issue to our attention.

Refs. #8603

Signed-off-by: Gilles Gouaillardet <[email protected]>
1 parent 91efa83 commit d7e3f87

opal/include/opal/sys/gcc_builtin/atomic.h

Lines changed: 4 additions & 5 deletions
@@ -13,8 +13,8 @@
  * Copyright (c) 2011 Sandia National Laboratories. All rights reserved.
  * Copyright (c) 2014-2018 Los Alamos National Security, LLC. All rights
  * reserved.
- * Copyright (c) 2016-2017 Research Organization for Information Science
- * and Technology (RIST). All rights reserved.
+ * Copyright (c) 2016-2021 Research Organization for Information Science
+ * and Technology (RIST). All rights reserved.
  * Copyright (c) 2018 Triad National Security, LLC. All rights
  * reserved.
  * $COPYRIGHT$
@@ -61,9 +61,8 @@ static inline void opal_atomic_rmb(void)
 {
 #if OPAL_ASSEMBLY_ARCH == OPAL_X86_64
     /* work around a bug in older gcc versions where ACQUIRE seems to get
-     * treated as a no-op instead of being equivalent to
-     * __asm__ __volatile__("": : :"memory") */
-    __atomic_thread_fence (__ATOMIC_SEQ_CST);
+     * treated as a no-op instead */
+    __asm__ __volatile__("": : :"memory");
 #else
     __atomic_thread_fence (__ATOMIC_ACQUIRE);
 #endif
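
For context, the following is a minimal standalone sketch of the pattern this change relies on; it is not code from Open MPI. The names compiler_barrier, read_barrier, publish, consume, payload and ready are hypothetical and chosen only for illustration. On x86_64 the hardware does not reorder loads with other loads, so a read barrier only has to stop the compiler from reordering or caching memory accesses. An empty asm statement with a "memory" clobber does that without emitting any instruction, whereas a SEQ_CST fence typically emits a full fence instruction such as mfence. Note that the x86_64 branch leans on that hardware guarantee and falls outside the formal C memory model.

```c
/* Hypothetical helpers for illustration only; not Open MPI's API. */

static inline void compiler_barrier(void)
{
    /* Emits no instruction. The "memory" clobber tells the compiler that
     * memory may have changed, so it must not move loads or stores across
     * this point or reuse values cached in registers. */
    __asm__ __volatile__("" : : : "memory");
}

static inline void read_barrier(void)
{
#if defined(__x86_64__)
    /* Assumption: x86_64 hardware keeps loads in program order relative
     * to other loads, so constraining the compiler is enough here. */
    compiler_barrier();
#else
    /* Other architectures may need a real acquire fence instruction. */
    __atomic_thread_fence(__ATOMIC_ACQUIRE);
#endif
}

static int payload; /* data written by the producer */
static int ready;   /* flag set once payload is valid */

/* Producer side: write the data, then set the flag with release ordering. */
static void publish(int value)
{
    payload = value;
    __atomic_store_n(&ready, 1, __ATOMIC_RELEASE);
}

/* Consumer side: wait for the flag, then issue the read barrier before
 * reading the data so the payload load cannot be hoisted above the flag. */
static int consume(void)
{
    while (!__atomic_load_n(&ready, __ATOMIC_RELAXED)) {
        /* spin until the producer sets the flag */
    }
    read_barrier();
    return payload;
}
```

In a real program publish and consume would run on different threads; calling them in sequence from one thread only exercises the code path, not the ordering guarantee.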
