Skip to content
This repository was archived by the owner on Sep 30, 2022. It is now read-only.

Commit 640bcf6

Browse files
author
rhc54
authored
Merge pull request #1262 from ggouaillardet/topic/v1.10/coll_fixes
Topic/v1.10/coll fixes
2 parents d99bdc6 + a6cf208 commit 640bcf6

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

41 files changed

+1080
-785
lines changed
Lines changed: 124 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,124 @@
1+
/* This comment applies to all collectives (including the basic
2+
* module) where we allocate a temporary buffer. For the next few
3+
* lines of code, it's tremendously complicated how we decided that
4+
* this was the Right Thing to do. Sit back and enjoy. And prepare
5+
* to have your mind warped. :-)
6+
*
7+
* Recall some definitions (I always get these backwards, so I'm
8+
* going to put them here):
9+
*
10+
* extent: the length from the lower bound to the upper bound -- may
11+
* be considerably larger than the buffer required to hold the data
12+
* (or smaller! But it's easiest to think about when it's larger).
13+
*
14+
* true extent: the exact number of bytes required to hold the data
15+
* in the layout pattern in the datatype.
16+
*
17+
* For example, consider the following buffer (just talking about
18+
* true_lb, extent, and true extent -- extrapolate for true_ub:
19+
*
20+
* A B C
21+
* --------------------------------------------------------
22+
* | | |
23+
* --------------------------------------------------------
24+
*
25+
* There are multiple cases:
26+
*
27+
* 1. A is what we give to MPI_Send (and friends), and A is where
28+
* the data starts, and C is where the data ends. In this case:
29+
*
30+
* - extent: C-A
31+
* - true extent: C-A
32+
* - true_lb: 0
33+
*
34+
* A C
35+
* --------------------------------------------------------
36+
* | |
37+
* --------------------------------------------------------
38+
* <=======================extent=========================>
39+
* <======================true extent=====================>
40+
*
41+
* 2. A is what we give to MPI_Send (and friends), B is where the
42+
* data starts, and C is where the data ends. In this case:
43+
*
44+
* - extent: C-A
45+
* - true extent: C-B
46+
* - true_lb: positive
47+
*
48+
* A B C
49+
* --------------------------------------------------------
50+
* | | User buffer |
51+
* --------------------------------------------------------
52+
* <=======================extent=========================>
53+
* <===============true extent=============>
54+
*
55+
* 3. B is what we give to MPI_Send (and friends), A is where the
56+
* data starts, and C is where the data ends. In this case:
57+
*
58+
* - extent: C-A
59+
* - true extent: C-A
60+
* - true_lb: negative
61+
*
62+
* A B C
63+
* --------------------------------------------------------
64+
* | | User buffer |
65+
* --------------------------------------------------------
66+
* <=======================extent=========================>
67+
* <======================true extent=====================>
68+
*
69+
* 4. MPI_BOTTOM is what we give to MPI_Send (and friends), B is
70+
* where the data starts, and C is where the data ends. In this
71+
* case:
72+
*
73+
* - extent: C-MPI_BOTTOM
74+
* - true extent: C-B
75+
* - true_lb: [potentially very large] positive
76+
*
77+
* MPI_BOTTOM B C
78+
* --------------------------------------------------------
79+
* | | User buffer |
80+
* --------------------------------------------------------
81+
* <=======================extent=========================>
82+
* <===============true extent=============>
83+
*
84+
* So in all cases, for a temporary buffer, all we need to malloc()
85+
* is a buffer of size true_extent. We therefore need to know two
86+
* pointer values: what value to give to MPI_Send (and friends) and
87+
* what value to give to free(), because they might not be the same.
88+
*
89+
* Clearly, what we give to free() is exactly what was returned from
90+
* malloc(). That part is easy. :-)
91+
*
92+
* What we give to MPI_Send (and friends) is a bit more complicated.
93+
* Let's take the 4 cases from above:
94+
*
95+
* 1. If A is what we give to MPI_Send and A is where the data
96+
* starts, then clearly we give to MPI_Send what we got back from
97+
* malloc().
98+
*
99+
* 2. If B is what we get back from malloc, but we give A to
100+
* MPI_Send, then the buffer range [A,B) represents "dead space"
101+
* -- no data will be put there. So it's safe to give B-true_lb to
102+
* MPI_Send. More specifically, the true_lb is positive, so B-true_lb is
103+
* actually A.
104+
*
105+
* 3. If A is what we get back from malloc, and B is what we give to
106+
* MPI_Send, then the true_lb is negative, so A-true_lb will actually equal
107+
* B.
108+
*
109+
* 4. Although this seems like the weirdest case, it's actually
110+
* quite similar to case #2 -- the pointer we give to MPI_Send is
111+
* smaller than the pointer we got back from malloc().
112+
*
113+
* Hence, in all cases, we give (return_from_malloc - true_lb) to MPI_Send.
114+
*
115+
* This works fine and dandy if we only have (count==1), which we
116+
* rarely do. ;-) So we really need to allocate (true_extent +
117+
* ((count - 1) * extent)) to get enough space for the rest. This may
118+
* be more than is necessary, but it's ok.
119+
*
120+
* Simple, no? :-)
121+
*
122+
*/
123+
124+

ompi/mca/coll/basic/coll_basic_allgather.c

Lines changed: 18 additions & 20 deletions
Original file line numberDiff line numberDiff line change
@@ -9,7 +9,7 @@
99
* University of Stuttgart. All rights reserved.
1010
* Copyright (c) 2004-2005 The Regents of the University of California.
1111
* All rights reserved.
12-
* Copyright (c) 2014 Research Organization for Information Science
12+
* Copyright (c) 2014-2016 Research Organization for Information Science
1313
* and Technology (RIST). All rights reserved.
1414
* $COPYRIGHT$
1515
*
@@ -91,9 +91,10 @@ mca_coll_basic_allgather_inter(void *sbuf, int scount,
9191
struct ompi_communicator_t *comm,
9292
mca_coll_base_module_t *module)
9393
{
94-
int rank, root = 0, size, rsize, err, i;
95-
char *tmpbuf = NULL, *ptmp;
96-
ptrdiff_t rlb, slb, rextent, sextent, incr;
94+
int rank, root = 0, size, rsize, i, err;
95+
char *tmpbuf_free = NULL, *tmpbuf, *ptmp;
96+
ptrdiff_t rlb, rextent, incr;
97+
ptrdiff_t gap, span;
9798
ompi_request_t *req;
9899
mca_coll_basic_module_t *basic_module = (mca_coll_basic_module_t*) module;
99100
ompi_request_t **reqs = basic_module->mccb_reqs;
@@ -116,17 +117,13 @@ mca_coll_basic_allgather_inter(void *sbuf, int scount,
116117
MCA_COLL_BASE_TAG_ALLGATHER,
117118
MCA_PML_BASE_SEND_STANDARD, comm));
118119
if (OMPI_SUCCESS != err) {
119-
return err;
120+
goto exit;
120121
}
121122
} else {
122123
/* receive a msg. from all other procs. */
123124
err = ompi_datatype_get_extent(rdtype, &rlb, &rextent);
124125
if (OMPI_SUCCESS != err) {
125-
return err;
126-
}
127-
err = ompi_datatype_get_extent(sdtype, &slb, &sextent);
128-
if (OMPI_SUCCESS != err) {
129-
return err;
126+
goto exit;
130127
}
131128

132129
/* Do a send-recv between the two root procs. to avoid deadlock */
@@ -135,14 +132,14 @@ mca_coll_basic_allgather_inter(void *sbuf, int scount,
135132
MCA_PML_BASE_SEND_STANDARD,
136133
comm, &reqs[rsize]));
137134
if (OMPI_SUCCESS != err) {
138-
return err;
135+
goto exit;
139136
}
140137

141138
err = MCA_PML_CALL(irecv(rbuf, rcount, rdtype, 0,
142139
MCA_COLL_BASE_TAG_ALLGATHER, comm,
143140
&reqs[0]));
144141
if (OMPI_SUCCESS != err) {
145-
return err;
142+
goto exit;
146143
}
147144

148145
incr = rextent * rcount;
@@ -152,20 +149,21 @@ mca_coll_basic_allgather_inter(void *sbuf, int scount,
152149
MCA_COLL_BASE_TAG_ALLGATHER,
153150
comm, &reqs[i]));
154151
if (MPI_SUCCESS != err) {
155-
return err;
152+
goto exit;
156153
}
157154
}
158155

159156
err = ompi_request_wait_all(rsize + 1, reqs, MPI_STATUSES_IGNORE);
160157
if (OMPI_SUCCESS != err) {
161-
return err;
158+
goto exit;
162159
}
163160

164161
/* Step 2: exchange the resuts between the root processes */
165-
tmpbuf = (char *) malloc(scount * size * sextent);
166-
if (NULL == tmpbuf) {
167-
return err;
168-
}
162+
span = opal_datatype_span(&sdtype->super, (int64_t)scount * (int64_t)size, &gap);
163+
tmpbuf_free = (char *) malloc(span);
164+
if (NULL == tmpbuf_free) { err = OMPI_ERR_OUT_OF_RESOURCE; goto exit; }
165+
166+
tmpbuf = tmpbuf_free - gap;
169167

170168
err = MCA_PML_CALL(isend(rbuf, rsize * rcount, rdtype, 0,
171169
MCA_COLL_BASE_TAG_ALLGATHER,
@@ -222,8 +220,8 @@ mca_coll_basic_allgather_inter(void *sbuf, int scount,
222220
}
223221

224222
exit:
225-
if (NULL != tmpbuf) {
226-
free(tmpbuf);
223+
if (NULL != tmpbuf_free) {
224+
free(tmpbuf_free);
227225
}
228226

229227
return err;

ompi/mca/coll/basic/coll_basic_allreduce.c

Lines changed: 7 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -78,8 +78,8 @@ mca_coll_basic_allreduce_inter(void *sbuf, void *rbuf, int count,
7878
struct ompi_communicator_t *comm,
7979
mca_coll_base_module_t *module)
8080
{
81-
int err, i, rank, root = 0, rsize;
82-
ptrdiff_t lb, extent;
81+
int err, i, rank, root = 0, rsize, line;
82+
ptrdiff_t extent, dsize, gap;
8383
char *tmpbuf = NULL, *pml_buffer = NULL;
8484
ompi_request_t *req[2];
8585
mca_coll_basic_module_t *basic_module = (mca_coll_basic_module_t*) module;
@@ -98,16 +98,14 @@ mca_coll_basic_allreduce_inter(void *sbuf, void *rbuf, int count,
9898
* simultaniously. */
9999
/*****************************************************************/
100100
if (rank == root) {
101-
err = ompi_datatype_get_extent(dtype, &lb, &extent);
101+
err = ompi_datatype_type_extent(dtype, &extent);
102102
if (OMPI_SUCCESS != err) {
103103
return OMPI_ERROR;
104104
}
105-
106-
tmpbuf = (char *) malloc(count * extent);
107-
if (NULL == tmpbuf) {
108-
return OMPI_ERR_OUT_OF_RESOURCE;
109-
}
110-
pml_buffer = tmpbuf - lb;
105+
dsize = opal_datatype_span(&dtype->super, count, &gap);
106+
tmpbuf = (char *) malloc(dsize);
107+
if (NULL == tmpbuf) { err = OMPI_ERR_OUT_OF_RESOURCE; line = __LINE__; goto exit; }
108+
pml_buffer = tmpbuf - gap;
111109

112110
/* Do a send-recv between the two root procs. to avoid deadlock */
113111
err = MCA_PML_CALL(irecv(rbuf, count, dtype, 0,

ompi/mca/coll/basic/coll_basic_alltoallw.c

Lines changed: 14 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -14,7 +14,7 @@
1414
* Copyright (c) 2013 Los Alamos National Security, LLC. All rights
1515
* reserved.
1616
* Copyright (c) 2013 FUJITSU LIMITED. All rights reserved.
17-
* Copyright (c) 2014 Research Organization for Information Science
17+
* Copyright (c) 2014-2016 Research Organization for Information Science
1818
* and Technology (RIST). All rights reserved.
1919
* Copyright (c) 2014 Cisco Systems, Inc. All rights reserved.
2020
* $COPYRIGHT$
@@ -42,10 +42,10 @@ mca_coll_basic_alltoallw_intra_inplace(void *rbuf, int *rcounts, const int *rdis
4242
mca_coll_base_module_t *module)
4343
{
4444
mca_coll_basic_module_t *basic_module = (mca_coll_basic_module_t*) module;
45-
int i, j, size, rank, err=MPI_SUCCESS, max_size;
46-
MPI_Request *preq;
47-
char *tmp_buffer;
48-
ptrdiff_t ext;
45+
int i, j, size, rank, err = MPI_SUCCESS, max_size;
46+
MPI_Request *preq, *reqs = NULL;
47+
char *tmp_buffer, *save_buffer = NULL;
48+
ptrdiff_t ext, gap;
4949

5050
/* Initialize. */
5151

@@ -59,17 +59,17 @@ mca_coll_basic_alltoallw_intra_inplace(void *rbuf, int *rcounts, const int *rdis
5959

6060
/* Find the largest receive amount */
6161
for (i = 0, max_size = 0 ; i < size ; ++i) {
62-
ompi_datatype_type_extent (rdtypes[i], &ext);
63-
ext *= rcounts[i];
62+
ext = opal_datatype_span(&rdtypes[i]->super, rcounts[i], &gap);
6463

6564
max_size = ext > max_size ? ext : max_size;
6665
}
6766

6867
/* Allocate a temporary buffer */
69-
tmp_buffer = calloc (max_size, 1);
68+
tmp_buffer = save_buffer = calloc (max_size, 1);
7069
if (NULL == tmp_buffer) {
7170
return OMPI_ERR_OUT_OF_RESOURCE;
7271
}
72+
tmp_buffer -= gap;
7373

7474
/* in-place alltoallw slow algorithm (but works) */
7575
for (i = 0 ; i < size ; ++i) {
@@ -129,7 +129,12 @@ mca_coll_basic_alltoallw_intra_inplace(void *rbuf, int *rcounts, const int *rdis
129129

130130
error_hndl:
131131
/* Free the temporary buffer */
132-
free (tmp_buffer);
132+
free (save_buffer);
133+
if( MPI_SUCCESS != err ) { /* Free the requests. */
134+
if( NULL != reqs ) {
135+
mca_coll_basic_free_reqs(basic_module->mccb_reqs, 2);
136+
}
137+
}
133138

134139
/* All done */
135140

ompi/mca/coll/basic/coll_basic_exscan.c

Lines changed: 5 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -47,7 +47,7 @@ mca_coll_basic_exscan_intra(void *sbuf, void *rbuf, int count,
4747
mca_coll_base_module_t *module)
4848
{
4949
int size, rank, err;
50-
ptrdiff_t true_lb, true_extent, lb, extent;
50+
ptrdiff_t dsize, gap;
5151
char *free_buffer = NULL;
5252
char *reduce_buffer = NULL;
5353

@@ -81,15 +81,14 @@ mca_coll_basic_exscan_intra(void *sbuf, void *rbuf, int count,
8181

8282
/* Get a temporary buffer to perform the reduction into. Rationale
8383
* for malloc'ing this size is provided in coll_basic_reduce.c. */
84-
ompi_datatype_get_extent(dtype, &lb, &extent);
85-
ompi_datatype_get_true_extent(dtype, &true_lb, &true_extent);
84+
dsize = opal_datatype_span(&dtype->super, count, &gap);
8685

87-
free_buffer = (char*)malloc(true_extent + (count - 1) * extent);
86+
free_buffer = (char*)malloc(dsize);
8887
if (NULL == free_buffer) {
8988
return OMPI_ERR_OUT_OF_RESOURCE;
9089
}
91-
reduce_buffer = free_buffer - lb;
92-
err = ompi_datatype_copy_content_same_ddt(dtype, count,
90+
reduce_buffer = free_buffer - gap;
91+
err = ompi_datatype_copy_content_same_ddt(dtype, count,
9392
reduce_buffer, (char*)sbuf);
9493

9594
/* Receive the reduced value from the prior rank */

0 commit comments

Comments
 (0)