Commit cf174c2

Merge pull request #31 from lrbison/alltoallv_validation
Add alltoallv_validation
2 parents 309a3dc + f628b8b commit cf174c2

File tree

9 files changed

+1746
-0
lines changed

alltoallv_validation/.gitignore

Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
1+
Makefile
2+
aclocal.m4
3+
autom4te.cache
4+
config
5+
config.log
6+
config.status
7+
configure
8+
src/Makefile
9+
src/stamp-h1
10+
src/test_config.h
11+
**.in
12+
**~
13+
**.o
14+
src/alltoallv_ddt
15+
src/sanity
16+
src/.deps

alltoallv_validation/Makefile.am

Lines changed: 12 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,12 @@
1+
# -*- makefile -*-
2+
#
3+
# Copyright (c) 2024 Amazon.com, Inc. or its affiliates. All Rights Reserved.
4+
#
5+
# $HEADER$
6+
#
7+
8+
ACLOCAL_AMFLAGS = -I config
9+
10+
# The reporting subdir must be built before all others
11+
12+
SUBDIRS = src

alltoallv_validation/README.md

Lines changed: 169 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,169 @@
1+
# Alltoallv Validation of complex datatypes
2+
3+
This test creates a variety of configurations for testing data validation of
4+
the alltoallv collective using non-standard datatypes.
5+
6+
The approach is the following sequence:
7+
- Create some datatype
8+
- Determine the packed size, and allocate both packed and unpacked buffers to
9+
hold the send data.
10+
- Fill the packed buffer with a test pattern, then sendrecv it to the unpacked
11+
send buffer by sending from an MPI_BYTE buffer to the test datatype.
12+
- Perform the alltoallv collective
13+
- Transfer the received data back into a packed format.
14+
- Verify the contents of the packed format using knowledge of what data was
15+
being sent.
16+
- Verify that no buffer under-runs or over-runs occurred in the buffers by
17+
checking some guard bytes.
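
The guard-byte check in the last step can be sketched as follows. This is a minimal illustration, not the code from this commit: the `GuardedBuffer` type, the guard size of 16 bytes, and the `0xA5` fill value are all assumptions chosen for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed values for illustration only; the real test may use others.
constexpr std::size_t kGuard = 16;        // guard bytes on each side
constexpr std::uint8_t kPattern = 0xA5;   // value written into the guards

// Hypothetical helper: a payload region surrounded by guard bytes that no
// MPI call is ever supposed to touch.
struct GuardedBuffer {
    std::vector<std::uint8_t> storage;
    std::size_t payload_size;

    explicit GuardedBuffer(std::size_t n)
        : storage(n + 2 * kGuard, kPattern), payload_size(n) {}

    // Pointer that would be handed to the communication call.
    std::uint8_t* payload() { return storage.data() + kGuard; }

    // True when every guard byte before and after the payload is intact,
    // that is, when no under-run or over-run has been detected.
    bool guards_intact() const {
        for (std::size_t i = 0; i < kGuard; ++i) {
            if (storage[i] != kPattern) return false;
            if (storage[kGuard + payload_size + i] != kPattern) return false;
        }
        return true;
    }
};
```

A write one byte before `payload()` or one byte past `payload() + payload_size` makes `guards_intact()` return false, which is the condition the final verification step looks for.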
18+
19+
Validation is the only purpose of this test. It should not be used for
20+
performance timing, as many extra memory copies and assignments are performed.
21+
No timing is printed.
22+
23+
The code is written in C++ only to access a predictable random number generator.
24+
All MPI calls are done via C interface.
25+
26+
## Test Overview
27+
28+
Tests are broken into complexity levels.
29+
30+
### Level 1
31+
32+
Level 1 types are composed of basic MPI types like `MPI_CHAR`, `MPI_REAL`,
33+
`MPI_INT64_T` and so forth. The data types are not exhaustive; only 9 are used.
34+
Executing only the level 1 tests will perform only 9 tests: both sending and
35+
receiving the same datatype.
36+
37+
### Level 2
38+
39+
Level 2 types are collections of Level 1 types. There are 7 Level 2 types in
40+
various configurations including:
41+
42+
- increasing the count, using the same type
43+
- contiguous and non-contiguous vectors
44+
- contiguous and non-contiguous vectors with negative stride
45+
46+
Level 2 tests all exchange compatible types, therefore all combinations of the
47+
above are used as send and receive types. With 7 types, Level 2 executes 49
48+
tests.
49+
50+
All level 2 tests are performed with the same basic datatype (MPI_INT).
51+
52+
Note that each "one" of these types is a vector, so setting `--item-count` to 10
53+
really means you are sending 10 vectors each with some number (happens to be 12)
54+
of basic types.
55+
56+
### Level 3
57+
58+
Level 3 tests collections of two different Level 1 types. We test MPI_INT and
59+
MPI_CHAR together. These tests create the type using MPI_Type_create_struct in
60+
various orders and configurations including:
61+
- contiguous and non-contiguous in-order elements
62+
- contiguous and non-contiguous reverse-order elements
63+
- Negative lower bounds
64+
- Padding in extents
65+
66+
There are 6 Level 3 types, and like the Level 2 types they are all compatible,
67+
so 36 total tests are executed.
68+
69+
### Level 4
70+
71+
There are two hand-made Level 4 tests. These are composed of several layers of
72+
level 2 and level 3 types in combination with each other to make collections of
73+
different kinds of types in vectors with various paddings and spacings. Best to
74+
read the code for these. They are not cross-compatible, so only 2 tests are
75+
executed.
76+
77+
Again note that these constructed types are somewhat large themselves (hundreds
78+
of bytes), so setting a high `--item-count` could result in longer runtimes.
79+
80+
### Total
81+
82+
As of the initial version of this program, there were 96 tests. The
83+
configuration where all ranks send and receive 1 count for only 1 iteration
84+
results in each rank sending and receiving approximately 2.7 KBytes of data
85+
during the full test battery.
86+
87+
However there is not so much data that the execution time is unreasonable. Test
88+
execution of 32 ranks on a single host using all default options takes less than
89+
5 seconds, and most ranks send about 630 KBytes.
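
The total of 96 follows from the per-level counts given in the sections above. The short check below only restates that arithmetic; the constant names are illustrative and do not come from the test source.

```cpp
// Test counts per level, as stated in the Level 1 to Level 4 sections.
constexpr int level1_tests = 9;      // each type is only paired with itself
constexpr int level2_tests = 7 * 7;  // every send type x every receive type
constexpr int level3_tests = 6 * 6;  // every send type x every receive type
constexpr int level4_tests = 2;      // not cross-compatible

constexpr int total_tests =
    level1_tests + level2_tests + level3_tests + level4_tests;

static_assert(total_tests == 96, "9 + 49 + 36 + 2 should equal 96");
```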
90+
91+
# Compile
92+
```
93+
$ ./autogen.sh && ./configure && make
94+
95+
$ mpirun -n 13 src/alltoallv_ddt
96+
Rank 0 sent 254104 bytes, and received 265152 bytes.
97+
[OK] All tests passsed. Executed 96 tests with seed 0 with 13 total ranks).
98+
99+
```
100+
101+
# Usage
102+
```
103+
Test alltoallv using various ddt's and validate results.
104+
This test uses pseudo-random sequences from C++'s mt19937 generator.
105+
The test (but not necessarily the implementation) is deterministic
106+
when the options and number of ranks remain the same.
107+
Options:
108+
[-s|--seed <seed>] Change the seed to shuffle which datapoints are exchanged
109+
[-c|--item-count <citems>] Each rank will create <citems> to consider for exchange (default=10).
110+
[-i|--prob-item <prob>] Probability that rank r will send item k to rank q. (0.50)
111+
[-r|--prob-rank <prob>] Probability that rank r will send anything to rank q. (0.90)
112+
[-w|--prob-world <prob>] Probability that rank r will do anything at all. (0.95)
113+
[-t|--iters <iters>] The number of iterations to test each dtype.
114+
[-o|--only <high,low>] Only execute a specific test signified by the pair high,low.
115+
[-v|--verbose=level ] Set verbosity during execution (0=quiet (default). 1,2,3: loud).
116+
[-h|--help] Print this help and exit.
117+
[-z|--verbose-rank] Only the provided rank will print. Default=0. ALL = -1.
118+
```
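
How the three probabilities might combine with the seed can be sketched as below. This is an assumption-laden illustration, not the program's actual logic: `draw` and `plan_items` are hypothetical names, and the order in which the real test consumes random numbers is not shown in this README.

```cpp
#include <cstdint>
#include <random>
#include <vector>

// Compare a raw mt19937 output against a threshold. The engine's output
// sequence is fixed by the C++ standard, whereas the standard distribution
// classes may produce different values on different library implementations.
inline bool draw(std::mt19937& gen, double prob) {
    // gen() is uniform over [0, 2^32 - 1]; dividing by 2^32 maps it to [0, 1).
    return gen() / 4294967296.0 < prob;
}

// Hypothetical sketch: decide which of `citems` items one rank sends to one
// peer, applying the world, rank, and item probabilities in that order.
std::vector<bool> plan_items(std::uint32_t seed, int citems,
                             double prob_world, double prob_rank,
                             double prob_item) {
    std::mt19937 gen(seed);
    std::vector<bool> send(citems, false);
    if (!draw(gen, prob_world)) return send;  // this rank does nothing at all
    if (!draw(gen, prob_rank)) return send;   // nothing is sent to this peer
    for (int k = 0; k < citems; ++k) send[k] = draw(gen, prob_item);
    return send;
}
```

With this sketch, a world probability of 0 yields an empty plan, setting all three probabilities to 1 selects every item, and repeating a call with the same seed and options reproduces the same plan.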
119+
120+
Some recommended test cases:
121+
```
122+
# no ranks exchange any data
123+
alltoallv_ddt -w 0
124+
125+
# same as alltoall: all ranks exchange same amount of data
126+
alltoallv_ddt -w 1 -r 1 -i 1
127+
128+
# perform a different test each time you run, or repeat the same test:
129+
alltoallv_ddt -s $RANDOM
130+
alltoallv_ddt -s 1234
131+
```
132+
133+
Note since alltoall is a hefty collective, and we go to the trouble of
134+
validating every single message, caution should be used when exercising large
135+
numbers of ranks, large numbers of counts, or large numbers of iterations.
136+
137+
# Debugging
138+
139+
In the case of data validation failure: re-run the test harness on only the
140+
failing test (using `--only`) and increase the verbosity up to 3. You may also
141+
need to set the verbosity of a particular rank with `-z`.
142+
143+
For example at verbosity 0, we only know that validation failed on rank 1, but
144+
not which test.
145+
146+
```
147+
mpirun -n 2 src/alltoallv_ddt -z 1 -v 3 -w 1
148+
Rank 1 failed to validate data!
149+
ERROR: Validation failed on rank 1!
150+
```
151+
152+
Setting the rank-specific verbosity to that rank (or to all ranks) and the
153+
verbosity up to 2 reveals some additional details including which test, and what
154+
part of the buffer:
155+
156+
```
157+
$ mpirun -n 2 src/alltoallv_ddt -z 1 -v 3 -w 1
158+
--- Starting test 2,1. Crossing 0 x 0
159+
Rank 1 failed to validate data!
160+
0010: 42-42 99-43 44-44 45-45 46-46 47-47 48-48 49-49 50-50 51-51 -- CORRUPT
161+
0020: 52-52 53-53 54-54 55-55 56-56 57-57 58-58 59-59 60-60 61-61 -- VALID
162+
ERROR: Validation failed on rank 1!
163+
```
164+
165+
Buffer addresses are provided. These are base-10 addresses relative to the
166+
packed representation of the datatype. The first number is what was received,
167+
the second number is what was expected. To avoid excessive print-outs,
168+
subsequent CORRUPT lines are skipped and only the next valid line is printed, so
169+
output will always appear to alternate between CORRUPT and VALID.
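
The alternating CORRUPT/VALID output described above can be sketched roughly as follows. This is a hypothetical reconstruction from the sample output, not the code in this commit: the `report` function, the row width of 10, and the use of element indices as offsets are assumptions.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical sketch: build rows of "received-expected" pairs, prefixed by
// a base-10 offset and suffixed by a status tag. After a CORRUPT row, later
// CORRUPT rows are skipped until the next VALID row has been emitted.
std::vector<std::string> report(const std::vector<int>& received,
                                const std::vector<int>& expected) {
    const std::size_t width = 10;  // assumed number of pairs per row
    const std::size_t n = std::min(received.size(), expected.size());
    std::vector<std::string> lines;
    bool awaiting_valid = false;  // a CORRUPT row was emitted, VALID pending

    for (std::size_t start = 0; start < n; start += width) {
        const std::size_t end = std::min(start + width, n);
        const bool ok = std::equal(received.begin() + start,
                                   received.begin() + end,
                                   expected.begin() + start);
        // Emit the first CORRUPT row of a run, or the first VALID row after it.
        const bool print_row = ok ? awaiting_valid : !awaiting_valid;
        if (!print_row) continue;

        char offset[32];
        std::snprintf(offset, sizeof offset, "%04zu:", start);
        std::string line = offset;
        for (std::size_t i = start; i < end; ++i) {
            line += " " + std::to_string(received[i]) + "-" +
                    std::to_string(expected[i]);
        }
        line += ok ? " -- VALID" : " -- CORRUPT";
        lines.push_back(line);
        awaiting_valid = !ok;
    }
    return lines;
}
```

When every row matches, the sketch returns no lines at all, which mirrors the quiet behaviour of a passing test.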

alltoallv_validation/autogen.sh

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1 @@
1+
autoreconf -ivf

alltoallv_validation/configure.ac

Lines changed: 152 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,152 @@
1+
# -*- shell-script -*-
2+
#
3+
# Copyright (c) 2024 Amazon.com, Inc. or its affiliates. All Rights Reserved.
4+
#
5+
# $HEADER$
6+
#
7+
# modified from ompi-tests/cxx-test-suite's autoconf by Luke Robison 2024.
8+
9+
#
10+
# Init autoconf
11+
#
12+
13+
AC_PREREQ([2.63])
14+
AC_INIT([alltoallv_validation], [1.0], [[email protected]], [openmpi-cxx-test-suite])
15+
AC_CONFIG_AUX_DIR([config])
16+
AC_CONFIG_MACRO_DIR([config])
17+
18+
#
19+
# Get the version of ompitest that we are configuring
20+
#
21+
22+
echo "Configuring Open MPI C++ test suite"
23+
24+
AM_INIT_AUTOMAKE([1.10 foreign dist-bzip2 no-define])
25+
26+
# If Automake supports silent rules, enable them.
27+
m4_ifdef([AM_SILENT_RULES], [AM_SILENT_RULES([yes])])
28+
29+
# Setup the reporting/ompitest_config.h file
30+
31+
AH_TOP([/* -*- c -*-
32+
*
33+
* ompitest configuation header file.
34+
*
35+
* Function: - OS, CPU and compiler dependent configuration
36+
*/
37+
38+
#ifndef OMPITEST_CONFIG_H
39+
#define OMPITEST_CONFIG_H
40+
])
41+
AH_BOTTOM([#endif /* OMPITEST_CONFIG_H */])
42+
43+
#
44+
# This is useful later
45+
#
46+
47+
AC_CANONICAL_HOST
48+
AC_DEFINE_UNQUOTED(OMPITEST_ARCH, "$host",
49+
[Architecture that we are compiled for])
50+
51+
#
52+
# We always want debugging flags
53+
#
54+
CXXFLAGS="$CXXFLAGS -g"
55+
CFLAGS="$CFLAGS -g"
56+
57+
#
58+
# Get various programs
59+
# C compiler - bias towards mpicc
60+
#
61+
62+
if test "$CC" != ""; then
63+
BASE="`basename $CC`"
64+
else
65+
BASE=
66+
fi
67+
if test "$BASE" = "" -o "$BASE" = "." -o "$BASE" = "cc" -o \
68+
"$BASE" = "gcc" -o "$BASE" = "xlc" -o \
69+
"$BASE" = "icc" -o "$BASE" = "pgcc"; then
70+
AC_CHECK_PROG(HAVE_MPICC, mpicc, yes, no)
71+
if test "$HAVE_MPICC" = "yes"; then
72+
CC=mpicc
73+
export CC
74+
fi
75+
fi
76+
77+
CFLAGS_save="$CFLAGS"
78+
AC_PROG_CC
79+
CFLAGS="$CFLAGS_save"
80+
81+
#
82+
# Get various programs
83+
# C++ compiler - bias towards mpic++, with fallback to mpiCC
84+
#
85+
86+
if test "$CXX" != ""; then
87+
BASE="`basename $CXX`"
88+
else
89+
BASE=
90+
fi
91+
if test "$BASE" = "" -o "$BASE" = "." -o "$BASE" = "CC" -o \
92+
"$BASE" = "g++" -o "$BASE" = "c++" -o "$BASE" = "xlC" -o \
93+
"$BASE" = "icpc" -o "$BASE" = "pgCC"; then
94+
AC_CHECK_PROG(HAVE_MPICPP, mpic++, yes, no)
95+
if test "$HAVE_MPICPP" = "yes"; then
96+
CXX=mpic++
97+
export CXX
98+
else
99+
AC_CHECK_PROG(HAVE_MPICXX, mpiCC, yes, no)
100+
if test "$HAVE_MPICXX" = "yes"; then
101+
CXX=mpiCC
102+
export CXX
103+
fi
104+
fi
105+
fi
106+
107+
CXXFLAGS_save="$CXXFLAGS"
108+
AC_PROG_CXX
109+
CXXFLAGS="$CXXFLAGS_save"
110+
111+
#
112+
# Find ranlib
113+
#
114+
115+
AC_PROG_RANLIB
116+
117+
#
118+
# Ensure that we can compile and link an MPI program
119+
#
120+
121+
# See if we can find <mpi.h>
122+
AC_CHECK_HEADER([mpi.h], [],
123+
[AC_MSG_WARN([Cannot find mpi.h])
124+
AC_MSG_ERROR([Cannot continue])
125+
])
126+
127+
#
128+
# See if we can find the symbol MPI_Init. Be a little smart and use
129+
# AC CHECK_FUNC if we're using mpicc, or AC CHECK_LIB otherwise.
130+
# Aborts if MPI_Init is not found.
131+
#
132+
base=`basename $CC`
133+
bad=0
134+
AS_IF([test "$base" = "mpicc"],
135+
[AC_CHECK_FUNC([MPI_Init], [], [bad=1])],
136+
[AC_CHECK_LIB([mpi], [MPI_Init], [], [bad=1])])
137+
138+
AS_IF([test "$bad" = "1"],
139+
[AC_MSG_WARN([Cannot link against MPI_Init])
140+
AC_MSG_ERROR([Cannot continue])
141+
])
142+
143+
#
144+
# Party on
145+
#
146+
147+
AC_CONFIG_HEADERS([src/test_config.h])
148+
AC_CONFIG_FILES([
149+
Makefile
150+
src/Makefile
151+
])
152+
AC_OUTPUT

alltoallv_validation/src/Makefile.am

Lines changed: 16 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,16 @@
1+
# -*- makefile -*-
2+
#
3+
4+
bin_PROGRAMS = \
5+
alltoallv_ddt \
6+
sanity
7+
8+
alltoallv_ddt_SOURCES = \
9+
$(common_sources) \
10+
alltoallv_ddt.cpp
11+
12+
sanity_SOURCES = \
13+
$(common_sources) \
14+
sanity.cpp
15+
16+
common_sources = typemap.c
