Commit 4707c7c

osc/rdma: make locking code more robust
Under heavy load the locking code could fail if the underlying btl module started to return OPAL_ERR_OUT_OF_RESOURCE on atomic operations. This commit updates the code to gracefully handle btl errors.

Signed-off-by: Nathan Hjelm <[email protected]>
1 parent cc4a0fa commit 4707c7c

2 files changed: 163 additions, 141 deletions

ompi/mca/osc/rdma/osc_rdma.h

Lines changed: 9 additions & 1 deletion
@@ -8,7 +8,7 @@
  * University of Stuttgart. All rights reserved.
  * Copyright (c) 2004-2005 The Regents of the University of California.
  * All rights reserved.
- * Copyright (c) 2007-2016 Los Alamos National Security, LLC. All rights
+ * Copyright (c) 2007-2017 Los Alamos National Security, LLC. All rights
  * reserved.
  * Copyright (c) 2010 Cisco Systems, Inc. All rights reserved.
  * Copyright (c) 2012-2013 Sandia National Laboratories. All rights reserved.
@@ -512,4 +512,12 @@ static inline void ompi_osc_rdma_aggregation_return (ompi_osc_rdma_aggregation_t
     opal_free_list_return(&mca_osc_rdma_component.aggregate, (opal_free_list_item_t *) aggregation);
 }
 
+
+__opal_attribute_always_inline__
+static bool ompi_osc_rdma_oor (int rc)
+{
+    /* check for OPAL_SUCCESS first to short-circuit the statement in the common case */
+    return (OPAL_SUCCESS != rc && (OPAL_ERR_OUT_OF_RESOURCE == rc || OPAL_ERR_TEMP_OUT_OF_RESOURCE == rc));
+}
+
 #endif /* OMPI_OSC_RDMA_H */
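
The second changed file is not shown here, so the following is only a minimal sketch of how a caller might use ompi_osc_rdma_oor() to retry an atomic operation that temporarily runs out of resources. It is not the code from this commit. The numeric error values are stand-ins for the definitions in the OPAL headers, and struct fake_op, fake_btl_atomic(), fake_progress(), and retry_while_oor() are hypothetical names invented for illustration.

```c
#include <stdbool.h>

/* Stand-in values for this sketch only; real code gets these from the OPAL headers. */
enum {
    OPAL_SUCCESS                  =  0,
    OPAL_ERROR                    = -1,
    OPAL_ERR_OUT_OF_RESOURCE      = -2,
    OPAL_ERR_TEMP_OUT_OF_RESOURCE = -3
};

/* Same test as the helper added in osc_rdma.h (inline attribute simplified). */
static inline bool ompi_osc_rdma_oor (int rc)
{
    /* check for OPAL_SUCCESS first to short-circuit the statement in the common case */
    return (OPAL_SUCCESS != rc && (OPAL_ERR_OUT_OF_RESOURCE == rc || OPAL_ERR_TEMP_OUT_OF_RESOURCE == rc));
}

/* Hypothetical stand-in for a btl atomic operation under load. */
struct fake_op {
    int failures_left;   /* how many times to report out-of-resource first */
    int final_rc;        /* result returned once resources are available */
    int progress_calls;  /* how often the caller drove progress */
};

static int fake_btl_atomic (struct fake_op *op)
{
    if (op->failures_left > 0) {
        op->failures_left--;
        return OPAL_ERR_OUT_OF_RESOURCE;
    }
    return op->final_rc;
}

/* Hypothetical stand-in for driving network progress between attempts. */
static void fake_progress (struct fake_op *op)
{
    op->progress_calls++;
}

/* Retry only while the failure is a resource shortage; any other
 * return code (success or a hard error) is handed back to the caller. */
static int retry_while_oor (struct fake_op *op)
{
    int rc;

    do {
        rc = fake_btl_atomic (op);
        if (ompi_osc_rdma_oor (rc)) {
            fake_progress (op);
        }
    } while (ompi_osc_rdma_oor (rc));

    return rc;
}
```

With failures_left set to 3 and final_rc set to OPAL_SUCCESS, retry_while_oor() drives progress three times and then returns OPAL_SUCCESS. A hard error such as OPAL_ERROR is returned on the first attempt without retrying.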
