Commit ca59d47

-mca report_bindings_full on whole machine, plus numa markers
This PR started out as a change to the existing --report-bindings feature, but there's no readily available topology loaded with WHOLE_SYSTEM at the mpirun level, where that output is printed. It seems excessive to add that much extra data and data collection to mpirun for this feature, so I made an equivalent feature that operates at the rank level, where the topology is readily available:

    -mca report_bindings_full 1

I combined a few related features here:
1. --report-bindings on whole machine
2. mark unallowed (eg cgroup) parts of the whole machine with ~
3. numa markers
4. allow hwloc tree to not have sockets/cores

Parts 1, 2 and 3 don't affect the original --report-bindings feature. Only the whole-machine output has a concept of unallowed PUs, and the numa markers are only enabled for the whole-machine output.

1. whole machine:

The topology sent to the pretty-print functions usually doesn't have the WHOLE_SYSTEM flag, and in general we don't want WHOLE_SYSTEM. But for pretty-printing I think it makes more sense, so the new option passes a WHOLE_SYSTEM topology to the pretty-print function.

Examples of what pretty-printing looks like with/without whole system:

Suppose the machine is

    [..../..../..../....][..../..../..../....]
     0    4    8    12    16   20   24   28

and we run with

    cgset -r cpuset.cpus=24,25,28,29 mycgroup1
    cgset -r cpuset.cpus=26,27,30,31 mycgroup2

to leave only these hardware threads active:

    mycgroup1: [~~~~/~~~~/~~~~/~~~~][~~~~/~~~~/..~~/..~~]
    mycgroup2: [~~~~/~~~~/~~~~/~~~~][~~~~/~~~~/~~../~~..]

Without whole-system the printout (for both of the above) would be

    (-np 2)
    MCW rank 0 bound to socket 1[core 0[hwt 0-1]]: [][BB/..]
    MCW rank 1 bound to socket 1[core 1[hwt 0-1]]: [][../BB]

With whole-system the output is this, which I think is more informative

    mycgroup1 (-np 2):
    MCW rank 0 bound to socket 1[core 6[hwt 0-1]]: [~~~~/~~~~/~~~~/~~~~][~~~~/~~~~/BB~~/..~~]
    MCW rank 1 bound to socket 1[core 7[hwt 0-1]]: [~~~~/~~~~/~~~~/~~~~][~~~~/~~~~/..~~/BB~~]

    mycgroup2 (-np 2):
    MCW rank 0 bound to socket 1[core 6[hwt 2-3]]: [~~~~/~~~~/~~~~/~~~~][~~~~/~~~~/~~BB/~~..]
    MCW rank 1 bound to socket 1[core 7[hwt 2-3]]: [~~~~/~~~~/~~~~/~~~~][~~~~/~~~~/~~../~~BB]

2. mark unallowed (~)

When using the whole-machine option there's a bitmask available to identify the allowed PUs, eg omitting PUs not in our cgroup. To distinguish the unallowed PUs I'm using "~".

3. numa markers (<>)

I like having numa markers as well as the existing separators between sockets and cores. They're a little harder since the numas are more fluid: sockets always contain cores, not vice versa, so you can hard code a loop over sockets followed by a loop over cores. But numas might be above or below sockets in the tree. This code identifies which level should be considered the child of the numas, and has each of the hard coded loops capable of adding numa markers.

Currently I don't have any tunable to turn off the numa markers. A lot of machines have fairly simple numa output where each socket contains one numa, and that ends up looking like this:

    [<..../..../..../....>][<..../..../..../....>]

If others feel that's too cluttered I'm okay with having some tunable so people have to ask for numa markers.

4. allow hwloc tree to not have sockets/cores

I may be behind the times on hwloc development, but as far as I know hwloc trees aren't guaranteed to have sockets and cores, just a MACHINE at the top and PU at the bottom. So I added a little code to the loops so it would still print the PUs on a hypothetical machine that lacked any structuring of the PUs into cores/sockets.

Signed-off-by: Mark Allen <[email protected]>
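The marker scheme described in the commit message can be sketched in a few lines of C. This is only an illustration, not the code added by this commit: it assumes a flat, regular machine (the hypothetical `machine_shape_t`) instead of walking an hwloc tree, takes plain byte arrays in place of hwloc bitmaps, and only handles the simple case of one numa per socket. The names `render_whole_machine`, `allowed`, `bound` and `numa_per_socket` are made up for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flat machine description: every socket has the same number
 * of cores and every core the same number of hardware threads (PUs). */
typedef struct {
    int sockets;
    int cores_per_socket;
    int pus_per_core;
} machine_shape_t;

/* Render one rank's binding into buf.
 *   allowed[i] != 0 : PU i is usable by this process (e.g. in its cgroup)
 *   bound[i]   != 0 : PU i is part of this rank's binding
 *   numa_per_socket : if nonzero, wrap each socket's PUs in <...>
 * Returns 0 on success, -1 if buf is too small. */
static int render_whole_machine(char *buf, size_t len,
                                const machine_shape_t *m,
                                const unsigned char *allowed,
                                const unsigned char *bound,
                                int numa_per_socket)
{
    size_t n = 0;
    int pu = 0;

    assert(buf != NULL && m != NULL && allowed != NULL && bound != NULL);
    if (len == 0) {
        return -1;
    }

/* Append one character, always leaving room for the trailing '\0'. */
#define PUT(ch) do { if (n + 1 >= len) return -1; buf[n++] = (ch); } while (0)
    for (int s = 0; s < m->sockets; ++s) {
        PUT('[');
        if (numa_per_socket) PUT('<');
        for (int c = 0; c < m->cores_per_socket; ++c) {
            if (c > 0) PUT('/');                 /* core separator */
            for (int t = 0; t < m->pus_per_core; ++t, ++pu) {
                if (!allowed[pu])   PUT('~');    /* outside our cgroup */
                else if (bound[pu]) PUT('B');    /* bound to this rank */
                else                PUT('.');    /* allowed but unbound */
            }
        }
        if (numa_per_socket) PUT('>');
        PUT(']');
    }
#undef PUT
    buf[n] = '\0';
    return 0;
}
```

With a 2-socket, 4-core, 4-thread shape, PUs 24, 25, 28 and 29 allowed and PUs 24 and 25 bound, the sketch produces the same map as the mycgroup1 rank 0 line above.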
1 parent 7c3aeb3 commit ca59d47

7 files changed, +555 -11 lines changed
Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+#
+# Copyright (c) 2019 IBM Corporation. All rights reserved.
+# $COPYRIGHT$
+#
+# Additional copyrights may follow
+#
+# $HEADER$
+#
+
+sources = \
+        hook_report_bindings_full.h \
+        hook_report_bindings_full_component.c \
+        hook_report_bindings_full_fns.c
+
+# This component will only ever be built statically -- never as a DSO.
+
+noinst_LTLIBRARIES = libmca_hook_report_bindings_full.la
+
+libmca_hook_report_bindings_full_la_SOURCES = $(sources)
+libmca_hook_report_bindings_full_la_LDFLAGS = -module -avoid-version
Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
+#
+# Copyright (c) 2019 IBM Corporation. All rights reserved.
+#
+# $COPYRIGHT$
+#
+# Additional copyrights may follow
+#
+# $HEADER$
+#
+
+# Make this a static component
+AC_DEFUN([MCA_ompi_hook_report_bindings_full_COMPILE_MODE], [
+    AC_MSG_CHECKING([for MCA component $2:$3 compile mode])
+    $4="static"
+    AC_MSG_RESULT([$$4])
+])
+
+# MCA_hook_report_bindings_full_CONFIG([action-if-can-compile],
+#                                      [action-if-cant-compile])
+# ------------------------------------------------
+AC_DEFUN([MCA_ompi_hook_report_bindings_full_CONFIG],[
+    AC_CONFIG_FILES([ompi/mca/hook/report_bindings_full/Makefile])
+
+    $1
+])
Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
+/*
+ * Copyright (c) 2019 IBM Corporation. All rights reserved.
+ * $COPYRIGHT$
+ *
+ * Additional copyrights may follow
+ *
+ * $HEADER$
+ */
+#ifndef MCA_HOOK_REPORT_BINDINGS_FULL_H
+#define MCA_HOOK_REPORT_BINDINGS_FULL_H
+
+#include "ompi_config.h"
+
+#include "ompi/constants.h"
+
+#include "ompi/mca/hook/hook.h"
+#include "ompi/mca/hook/base/base.h"
+
+BEGIN_C_DECLS
+
+OMPI_MODULE_DECLSPEC extern const ompi_hook_base_component_1_0_0_t mca_hook_report_bindings_full_component;
+
+extern int mca_hook_report_bindings_full_verbose;
+extern int mca_hook_report_bindings_full_output;
+extern bool hook_report_bindings_full_enable_mpi_init;
+
+void ompi_hook_report_bindings_full_mpi_init_bottom(int argc, char **argv, int requested, int *provided);
+
+END_C_DECLS
+
+#endif /* MCA_HOOK_REPORT_BINDINGS_FULL_H */
Lines changed: 138 additions & 0 deletions
@@ -0,0 +1,138 @@
+/*
+ * Copyright (c) 2019 IBM Corporation. All rights reserved.
+ * $COPYRIGHT$
+ *
+ * Additional copyrights may follow
+ *
+ * $HEADER$
+ */
+
+#include "ompi_config.h"
+
+#include "hook_report_bindings_full.h"
+
+static int ompi_hook_report_bindings_full_component_open(void);
+static int ompi_hook_report_bindings_full_component_close(void);
+static int ompi_hook_report_bindings_full_component_register(void);
+
+/*
+ * Public string showing the component version number
+ */
+const char *mca_hook_report_bindings_full_component_version_string =
+    "Open MPI 'report_bindings_full' hook MCA component version " OMPI_VERSION;
+
+/*
+ * Instantiate the public struct with all of our public information
+ * and pointers to our public functions in it
+ */
+const ompi_hook_base_component_1_0_0_t mca_hook_report_bindings_full_component = {
+
+    /* First, the mca_component_t struct containing meta information
+     * about the component itself */
+    .hookm_version = {
+        OMPI_HOOK_BASE_VERSION_1_0_0,
+
+        /* Component name and version */
+        .mca_component_name = "report_bindings_full",
+        MCA_BASE_MAKE_VERSION(component, OMPI_MAJOR_VERSION, OMPI_MINOR_VERSION,
+                              OMPI_RELEASE_VERSION),
+
+        /* Component open and close functions */
+        .mca_open_component = ompi_hook_report_bindings_full_component_open,
+        .mca_close_component = ompi_hook_report_bindings_full_component_close,
+        .mca_register_component_params = ompi_hook_report_bindings_full_component_register,
+
+        // Force this component to always be considered - component must be static
+        //.mca_component_flags = MCA_BASE_COMPONENT_FLAG_ALWAYS_CONSIDER,
+    },
+    .hookm_data = {
+        /* The component is checkpoint ready */
+        MCA_BASE_METADATA_PARAM_CHECKPOINT
+    },
+
+    /* Component functions */
+    .hookm_mpi_initialized_top = NULL,
+    .hookm_mpi_initialized_bottom = NULL,
+
+    .hookm_mpi_finalized_top = NULL,
+    .hookm_mpi_finalized_bottom = NULL,
+
+    .hookm_mpi_init_top = NULL,
+    .hookm_mpi_init_top_post_opal = NULL,
+    .hookm_mpi_init_bottom = ompi_hook_report_bindings_full_mpi_init_bottom,
+    .hookm_mpi_init_error = NULL,
+
+    .hookm_mpi_finalize_top = NULL,
+    .hookm_mpi_finalize_bottom = NULL,
+};
+
+int mca_hook_report_bindings_full_verbose = 0;
+int mca_hook_report_bindings_full_output = -1;
+bool hook_report_bindings_full_enable_mpi_init = false;
+bool hook_report_bindings_full = false;
+
+static int ompi_hook_report_bindings_full_component_open(void)
+{
+    // Nothing to do
+    return OMPI_SUCCESS;
+}
+
+static int ompi_hook_report_bindings_full_component_close(void)
+{
+    // Nothing to do
+    return OMPI_SUCCESS;
+}
+
+static int ompi_hook_report_bindings_full_component_register(void)
+{
+
+    /*
+     * Component verbosity level
+     */
+    // Inherit the verbosity of the base framework, but also allow this to be overridden
+    if( ompi_hook_base_framework.framework_verbose > MCA_BASE_VERBOSE_NONE ) {
+        mca_hook_report_bindings_full_verbose = ompi_hook_base_framework.framework_verbose;
+    }
+    else {
+        mca_hook_report_bindings_full_verbose = MCA_BASE_VERBOSE_NONE;
+    }
+    (void) mca_base_component_var_register(&mca_hook_report_bindings_full_component.hookm_version, "verbose",
+                                           NULL,
+                                           MCA_BASE_VAR_TYPE_INT, NULL,
+                                           0, 0,
+                                           OPAL_INFO_LVL_9,
+                                           MCA_BASE_VAR_SCOPE_READONLY,
+                                           &mca_hook_report_bindings_full_verbose);
+
+    mca_hook_report_bindings_full_output = opal_output_open(NULL);
+    opal_output_set_verbosity(mca_hook_report_bindings_full_output, mca_hook_report_bindings_full_verbose);
+
+    /*
+     * If the component is active for mpi_init
+     */
+    hook_report_bindings_full_enable_mpi_init = false;
+    (void) mca_base_component_var_register(&mca_hook_report_bindings_full_component.hookm_version, "enable_mpi_init",
+                                           "Enable report_bindings_full behavior on mpi_init",
+                                           MCA_BASE_VAR_TYPE_BOOL, NULL,
+                                           0, 0,
+                                           OPAL_INFO_LVL_3,
+                                           MCA_BASE_VAR_SCOPE_READONLY,
+                                           &hook_report_bindings_full_enable_mpi_init);
+
+    // User can set OMPI_MCA_report_bindings_full too; stored in the file-scope bool, not a stack int
+    hook_report_bindings_full = false;
+    (void) mca_base_var_register("ompi", NULL, NULL, "report_bindings_full",
+                                 "Enable report_bindings_full behavior at mpi_init",
+                                 MCA_BASE_VAR_TYPE_BOOL, NULL,
+                                 0, 0,
+                                 OPAL_INFO_LVL_3,
+                                 MCA_BASE_VAR_SCOPE_READONLY,
+                                 &hook_report_bindings_full);
+
+    if (hook_report_bindings_full) {
+        hook_report_bindings_full_enable_mpi_init = true;
+    }
+
+    return OMPI_SUCCESS;
+}
+
Lines changed: 122 additions & 0 deletions
@@ -0,0 +1,122 @@
+/*
+ * Copyright (c) 2019 IBM Corporation. All rights reserved.
+ * $COPYRIGHT$
+ *
+ * Additional copyrights may follow
+ *
+ * $HEADER$
+ */
+
+#include "ompi_config.h"
+
+#include "hook_report_bindings_full.h"
+
+#ifdef HAVE_DLFCN_H
+#include <dlfcn.h>
+#endif
+
+#include "ompi/communicator/communicator.h"
+#include "ompi/mca/pml/pml.h"
+#include "ompi/mca/pml/base/base.h"
+//#include "opal/mca/dl/base/base.h" -- was going to use opal_dl_open etc
+#include "opal/mca/hwloc/base/base.h"
+
+typedef void (*VoidFuncPtr)(void); // a function pointer to a function that takes no arguments and returns void.
+static void ompi_report_bindings(void);
+
+void ompi_hook_report_bindings_full_mpi_init_bottom(int argc, char **argv, int requested, int *provided)
+{
+    if( hook_report_bindings_full_enable_mpi_init ) {
+        ompi_report_bindings();
+    }
+}
+
+// ----------------------------------------------------------------------------
+
+static void
+ompi_report_bindings(void)
+{
+    int myrank, nranks;
+    int ret, i;
+    char binding_string[1024];
+    int len;
+    int *lens, *disps;
+    MPI_Comm active_comm = MPI_COMM_WORLD;
+    char **all_binding_strings = NULL;
+
+    // early return in the case of spawn
+    if (ompi_mpi_comm_parent != MPI_COMM_NULL) { return; }
+
+    // pick a comm, probably COMM_WORLD, only shrink it if it's quite large
+    myrank = ompi_comm_rank(active_comm);
+    nranks = ompi_comm_size(active_comm);
+    if (nranks > 16*1024) {
+        ret = ompi_comm_split(MPI_COMM_WORLD, (myrank<=16*1024)?1:MPI_UNDEFINED, 0, &active_comm, false);
+        if (OPAL_UNLIKELY(OMPI_SUCCESS != ret)) {
+            return;
+        }
+        if (active_comm == MPI_COMM_NULL) { return; }
+        myrank = ompi_comm_rank(active_comm);
+        nranks = ompi_comm_size(active_comm);
+    }
+
+    // produce binding string for the current rank
+    hwloc_topology_t whole_system = NULL;
+    hwloc_cpuset_t mycpus;
+    hwloc_topology_init(&whole_system);
+    hwloc_topology_set_flags(whole_system, HWLOC_TOPOLOGY_FLAG_WHOLE_SYSTEM);
+    hwloc_topology_load(whole_system);
+    mycpus = hwloc_bitmap_alloc();
+    hwloc_get_cpubind(whole_system, mycpus, HWLOC_CPUBIND_PROCESS);
+    opal_hwloc_base_cset2mapstr_with_numa(binding_string, sizeof(binding_string), whole_system, mycpus);
+    hwloc_bitmap_free(mycpus);
+    hwloc_topology_destroy(whole_system);
+
+    // Collecting the data at rank 0 to print it in an ordered manner isn't totally
+    // necessary, but makes the output a little nicer.
+    len = strlen(binding_string) + 1;
+    lens = malloc(nranks * sizeof(int));
+    disps = malloc(nranks * sizeof(int));
+
+    active_comm->c_coll->coll_gather(
+        &len, 1, MPI_INT,
+        lens, 1, MPI_INT,
+        0, active_comm, active_comm->c_coll->coll_gather_module);
+    if (myrank == 0) {
+        int tlen = 0;
+        char *p;
+        for (i=0; i<nranks; ++i) {
+            disps[i] = tlen;
+            tlen += lens[i];
+        }
+        all_binding_strings = malloc(nranks * sizeof(char*) + tlen);
+        p = (char*) (all_binding_strings + nranks);
+        for (i=0; i<nranks; ++i) {
+            all_binding_strings[i] = p;
+            p += lens[i];
+        }
+        active_comm->c_coll->coll_gatherv(
+            binding_string, strlen(binding_string) + 1, MPI_CHAR,
+            &all_binding_strings[0][0], lens, disps, MPI_CHAR,
+            0, active_comm, active_comm->c_coll->coll_gatherv_module);
+    } else {
+        // matching above call from rank 0, just &all_binding_strings[0][0]
+        // isn't legal here, and those args aren't used at non-root anyway
+        active_comm->c_coll->coll_gatherv(
+            binding_string, strlen(binding_string) + 1, MPI_CHAR,
+            NULL, NULL, NULL, MPI_CHAR,
+            0, active_comm, active_comm->c_coll->coll_gatherv_module);
+    }
+
+    // print them from rank 0
+    if (myrank == 0) {
+        for (i=0; i<nranks; ++i) {
+            printf("MCW %d: %s\n", i, all_binding_strings[i]);
+        }
+    }
+
+    if (active_comm != MPI_COMM_WORLD) { ompi_comm_free(&active_comm); }
+    free(lens);
+    free(disps);
+    if (myrank == 0) { free(all_binding_strings); }
+}
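The root-side buffer layout in ompi_report_bindings() (one allocation holding both the per-rank pointer table and the string bytes, with gatherv displacements computed as a running sum of lengths) can be tried out without MPI. The following is a minimal sketch of that layout only; `alloc_string_table` is a hypothetical helper name, not something this commit adds, and a plain memcpy stands in for the data that gatherv would deliver.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Build a table of nstr string pointers backed by a single allocation.
 *   lens[i]  : length of string i, including its trailing '\0'
 *   disps[i] : receives the byte offset of string i in the storage area
 * The pointer table sits at the front of the allocation and the string
 * storage follows it, so one free() releases everything.
 * Returns NULL on allocation failure. */
static char **alloc_string_table(int nstr, const int *lens, int *disps)
{
    size_t total = 0;

    assert(nstr > 0 && lens != NULL && disps != NULL);

    /* Displacements are a running sum of the lengths. */
    for (int i = 0; i < nstr; ++i) {
        disps[i] = (int) total;
        total += (size_t) lens[i];
    }

    char **table = malloc((size_t) nstr * sizeof(char *) + total);
    if (NULL == table) {
        return NULL;
    }

    /* String storage starts right after the last table entry. */
    char *storage = (char *) (table + nstr);
    for (int i = 0; i < nstr; ++i) {
        table[i] = storage + disps[i];
    }
    return table;
}
```

The address of the storage area is `&table[0][0]`, which is what the root rank would pass as the receive buffer; copying string i to `&table[0][0] + disps[i]` makes it readable as `table[i]`.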

opal/mca/hwloc/base/base.h

Lines changed: 4 additions & 0 deletions
@@ -3,6 +3,7 @@
  * Copyright (c) 2013-2017 Intel, Inc. All rights reserved.
  * Copyright (c) 2017 Research Organization for Information Science
  * and Technology (RIST). All rights reserved.
+ * Copyright (c) 2019 IBM Corporation. All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -279,6 +280,9 @@ OPAL_DECLSPEC int opal_hwloc_base_cset2str(char *str, int len,
 OPAL_DECLSPEC int opal_hwloc_base_cset2mapstr(char *str, int len,
                                               hwloc_topology_t topo,
                                               hwloc_cpuset_t cpuset);
+OPAL_DECLSPEC int opal_hwloc_base_cset2mapstr_with_numa(char *str, int len,
+                                                        hwloc_topology_t topo,
+                                                        hwloc_cpuset_t cpuset);
 
 /* get the hwloc object that corresponds to the given processor id and type */
 OPAL_DECLSPEC hwloc_obj_t opal_hwloc_base_get_pu(hwloc_topology_t topo,
