
External32 fix #7919


Closed
wants to merge 3 commits

Conversation

markalle
Contributor

@markalle markalle commented Jul 9, 2020


[commit 1]

adding .ompi_id field to opal datatypes for predefined types

external32 requires us to distinguish MPI_LONG from MPI_INT64_T,
but prior to this commit the two MPI types ompi_mpi_long and
ompi_mpi_int64_t were constructed with the same OPAL initialization,
referring to the same OPAL_DATATYPE_INT8 for the opal .id field
and for the .common.type in the description entry.

This commit adds an .ompi_id field (only for the predefined
OMPI types) to both the opal datatype and to the description
at desc[].elem.common.ompi_id. OPAL only needs to know how
many there are when it walks the arrays, so the actual values
in .ompi_id remain opaque to it. The .ompi_id isn't required
to be set, as opal datatypes don't have to be associated with
an OMPI datatype.

The macros set .dt.super.ompi_id for the predefined types,
but there are extra steps to reset the .desc entry's .ompi_id
since the initial construction via the opal macros points
many .desc entries at the same place based on its underlying
opal type. Since the point is to let MPI_LONG's .desc diverge
from MPI_INT64_T's for example, the .desc entries had to be
reset.

Former datatypes constants with the same value like
OMPI_DATATYPE_MPI_LONG
OMPI_DATATYPE_MPI_UINT64_T
were separated of course, but cases like
OMPI_DATATYPE_MPI_C_DOUBLE_COMPLEX
OMPI_DATATYPE_MPI_CXX_DOUBLE_COMPLEX
are also being separated for a more implementation-specific reason
rather than it being logically necessary. The opal datatype
initialization sets the .desc field statically at shared offsets
into shared opal_datatype_predefined_elem_desc[]. So later when
I want to set an .ompi_id in .desc[] I have to privatize everything.
In this example "everything" means both
ompi_mpi_c_double_complex
ompi_mpi_cxx_dblcplex
but putting both of those into ompi_datatype_basicDatatypes[]
requires separating their constants.

Other changes required to support this:

.ptypes:

opal datatypes have a .ptypes field that previously was just an
array counting how many of each base OPAL type was contained, and
this was used to figure out the size of the packed type. But
to get the correct size we need ompi_ids in that array. So this
turns .ptypes into a struct containing the original array
(relocated to .ptypes.bdt_count[]) and a new .ptypes.ompi_dt_count[].
Since types don't have to have an .ompi_id, there's also a flag to
indicate whether .ompi_dt_count[] has valid data or not.

convertor.master.ompi_remote_sizes[]:

The converter had a .remote_sizes[] that's indexed by the base opal
types. But to specify sizes for external32 I added .ompi_remote_sizes[]
which is the same thing but indexed by ompi_id. Converters don't
have to specify a set of .ompi_remote_sizes[] since opal types don't
even have to have .ompi_ids, so for initialization the same call that
sets up the .ompi_remote_sizes[] also sets .ompi_remote_sizes_is_set.
The opal sizes are used if the ompi sizes/IDs aren't set.


[commit 2]

external32 fixes

  • let external32 use the non-optimized datatype description so
    it can know what it's supposed to be packing
  • redo the pFunctions arguments to take a vector
    (ddt_elem_desc_t) to be iterated over and a packed buf
    as its main arguments
  • VECTOR traversal macros to walk an .elem from a datatype's
    description

Here's a gist for a testcase pack2.c
https://gist.github.com/markalle/6d70cf8ca14761e94bce9d0240000a3e
% mpicc -o x pack2.c
% mpirun -np 1 ./x

Without this fix if sizeof(long) is 8 and a machine is little endian
an input buffer of (long)1 == [01 00 00 00 00 00 00 00] packs external32
as [00 00 00 00 00 00 00 01], using 8 bytes. But the external32
representation for MPI_LONG is supposed to be 4 bytes.

This commit uses .ompi_id and .ompi_remote_sizes[] if available
to figure out what the external32 size should be (_r_blength).

In the functions like copy_int8_heterogenous() that are made from
pack/unpack with pFunction[], vector walking macros are used to
walk the vector buffer and the byte swapping conversion function
has more arguments to know the size and endianness of both the
"from" and "to" element.

There's a chance that partial copies are being handled right in unpack.
The VECTOR walking macros at least are built to take an input
"elements_already_done" to pick back up in the middle of a vector,
and are able to stop in the middle of walking the vector too. I didn't
change the loops in opal_datatype_pack.c though, and it's not currently
making any effort to allow partial walking of the vector described by an
_elem.

A few other misc changes that went into this:

  • changed the bit location for OPAL_ARCH_LONGIS64 because it wasn't living
    under the OPAL_ARCH_LONGISxx mask which appears to be how those masks
    are designed to work
  • the terminating condition in pack used to say "bConverted == local_size"
    but bConverted is counting bytes on the "to" side and for pack the "to"
    side is the remote_size.

@markalle
Contributor Author

markalle commented Jul 9, 2020

bot:mellanox:retest
bot:ompi:retest

@markalle
Contributor Author

markalle commented Jul 9, 2020

bot:mellanox:retest

@markalle markalle requested a review from bosilca July 9, 2020 17:51
@awlauria
Contributor

awlauria commented Jul 9, 2020

It looks like some tests are failing (external32 in Mellanox ci) - might be legit. But I am not sure how to see the output.

@awlauria
Contributor

awlauria commented Jul 9, 2020

retest

@awlauria
Contributor

awlauria commented Jul 9, 2020

bot:ompi:retest

@awlauria
Contributor

awlauria commented Jul 9, 2020

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@markalle
Contributor Author

Thanks, testing shows I need to take more data from the description[] entries into account. I think I've got it figured out and hope to finish it off tomorrow; we'll see.

@markalle
Contributor Author

I added more to the pack.c test here
https://gist.github.com/markalle/6d70cf8ca14761e94bce9d0240000a3e

I believe this is now handling the full scope of what a pElem.elem from a type's description[] amounts to. For example
MPI_Type_vector(2, 2, 5, MPI_INT, &dt);
that looks like [INT,INT,<empty12b>,INT,INT] would have

    elem.count = 2
    elem.blocklen = 2
    elem.extent = 20

I'm now having pack/unpack loop over the elem.count and call one pFunction[] for each block of elem.blocklen base items (ints in this case) that need to be packed.

I don't believe the converter stack is saving enough info however for partial packs, eg 3 INTs in the above to be packed/unpacked in one call and then pick up where it left off.

@markalle markalle force-pushed the external32_fix branch 2 times, most recently from fc06a8d to 4f55fcf Compare July 15, 2020 18:10
@markalle
Contributor Author

I think I'm happy with this PR now. I redid it so the pack/unpack level is iterating at the _elem level (which is what it was already doing) and the pFunction() deals with vectors (which it kind of was already doing but not really).

Separate from this PR Austen has been looking at pack performance vs mvapich and their bottom-level macros are also built around walking vectors so I think letting that be done at the next level down from pack/unpack pFunction() level makes sense.

#define OMPI_DATATYPE_INITIALIZER_LONG OPAL_DATATYPE_INITIALIZER_INT16
#define OMPI_DATATYPE_INITIALIZER_UNSIGNED_LONG OPAL_DATATYPE_INITIALIZER_UINT16
#endif
#define OMPI_DATATYPE_INITIALIZER_LONG OPAL_DATATYPE_INITIALIZER_LONG
Member

The idea was not to pollute OPAL with types that don't have a well defined size, and to delegate all that mess up to the MPI layer.

Contributor Author

I agree except I don't see how else to handle it. I don't like adding more OPAL_DATATYPE_* entries, but I didn't see any other mechanism that would allow pack/unpack to figure out that an MPI_LONG had been passed in and it therefore needed to pack it as 4 bytes (for external32). The type construction of MPI_LONG maps to OPAL_DATATYPE_INT8 and after that the opal pack/unpack routines have no way to know whether an incoming OPAL_DATATYPE_INT8 came from an MPI_LONG or an MPI_INT8_T.

I think that's the reason OPAL_DATATYPE_BOOL exists when it could just be OPAL_DATATYPE_INT1.

I'm considering the possibility that opal_datatype_add(,pdtAdd,,,) might be able to see pdtAdd == ompi_mpi_long and where it does

        pLast->elem.common.type      = pdtAdd->id;

add another field like

        pLast->elem.common.ext32sz = 4;

but I'm not confident whether that would catch everything. I don't think opal_datatype_add() is exclusively given base types. Is that a plausible avenue to pursue?

@@ -92,7 +92,19 @@ static dte_data_representation_t* ompi_datatype_2_dte_data_rep[OMPI_DATATYPE_MAX
#else
&DTE_ZERO,
#endif
&DTE_ZERO /*OPAL_DATATYPE_UNAVAILABLE 25 */

#if SIZEOF_LONG == 4
Member

You should use OMPI datatypes not OPAL.

Contributor Author

That doesn't seem to be how hcoll is using that array. A little further down in the same file it has:

    int opal_type_id = dtype->super.id;
    ...
    dte_data_rep =  *ompi_datatype_2_dte_data_rep[opal_type_id];

so it seems to be an array mapping each opal type to something hcoll recognizes. So if I'm adding another OPAL_DATATYPE_ entry I think I'd need to expand this array too.

ptrdiff_t *advance );
#define COPY_TO_VECTOR 1
#define COPY_FROM_VECTOR 2
typedef size_t (*conversion_fct_t)( opal_convertor_t* pConvertor, int mode,
Member

Pass const dt_elem_desc_t* instead of this large number of parameters.

Contributor Author

I like that idea a lot, much cleaner. Thanks, will change.

size_t vector_len,
size_t nblocks,
size_t count_per_block,
size_t extent_between_blocks,
Member

extent can be negative

Contributor Author

Yeah, I'll update to dt_elem_desc_t* and that'll go away

size_t count_per_block,
size_t extent_between_blocks,
size_t vector_element_size,
int vector_is_bigendian,
Member

not sure why you need the 2 is_bigendian, as one of the sides of the pack/unpack operation is always local.

Contributor Author

The main reason for that change is packing an 8 byte MPI_LONG down to the 4 bytes specified by external32. Previously it was only packing 8 bytes into 8 bytes, and in that case you don't need to know which buffer is which, just knowing they differ in endianness is enough to swap the bytes around. But if you're packing 8 bytes down to 4 you need to know which 4 bytes to pick. For that you need to know if "from" is big endian and "to" is little endian or vice versa.

However, now that I've switched the arguments specifying the two buffers to be

  1. a vector and
  2. a packed buf

I agree that argument became redundant, because the vector is always local with its endianness specified in opal_local_arch.

* The granularity for that is an "element" though, so you could
* stop after writing 3 INT4 in the above example, then pick
* back up at the next INT4, but you can't stop in the middle of
* an INT4.
Member

True but only for the heterogeneous case. I have tried before to use a 0 trick, to simulate the conversion as a gather/scatter operation, and only move the bytes that are not zero. The code should be still here somewhere using the name partial.

Contributor Author

I didn't really put any work into that. I just saw the current code didn't seem to have the ability to stop in the middle of a base type so I didn't try to make my code have that ability either. It doesn't seem like it should be a hard problem if the interface was changed to use bytes for how much iterating had been done, but I'm probably missing something.

_source, *SPACE, _elem->extent,
*DESTINATION, *SPACE, _r_blength,
&advance );
// _count is the total number of basic datatypes in ELEM, not necessarily contiguous
Member

I would have gone the same way pack/unpack.h have been updated. There is a clear mechanism on how to deal with all the corner cases, and a nomenclature. This code uses a different one, making it more difficult to maintain in the long term.

Contributor Author

Are you talking about how I changed the buffer specifications from being "from" and "to" and switched to specifying them as "vector_buf" and "packed_buf" with a COPY_TO_VECTOR / COPY_FROM_VECTOR in "mode"?

The previous interface doesn't have enough arguments to describe the elem vector. So if you're going to maintain the style as "from" and "to" then both of them have to become dt_elem_desc_t* because sometimes "from" is the vector and sometimes "to" is the vector.

So you'd have pFunction take two dt_elem_desc_t* arguments for "from" and "to", but there would be an invisible requirement on the side that one of the two dt_elem_desc_t* descriptions had to be a degenerate case describing a packed buffer. I could do that, but I'm not sure it's better.

opal/util/arch.c Outdated
@@ -67,7 +67,7 @@ static inline int32_t opal_arch_ldisintel( void )
j = j-1;
}
}
return (pui[j] & (1 << i) ? 1 : 0);
return ((pui[j] & (1u << i)) ? 1 : 0);
Member

i being capped at 31 the 1u is not necessary.

Contributor Author

This one concerns me: that's some kind of merge error, I didn't intentionally change that. After I make the changes to use dt_elem_desc_t I'll have to review my commit more carefully to make sure there's nothing else unintentional in there.

opal/util/arch.h Outdated
@@ -202,7 +203,8 @@

/* BYTE 2 */
#define OPAL_ARCH_LONGISxx 0x0000c000 /* mask for sizeof long */
#define OPAL_ARCH_LONGIS64 0x00001000
#define OPAL_ARCH_LONGIS32 0x00004000
Member

As I mentioned above, I would prefer to leave the messy types to the OMPI layer, and keep the OPAL datatypes simpler ([u]int[8,16,32,64,128]_t).

@markalle
Contributor Author

Repushed. Removed quite a few arguments from the pFunction called from pack/unpack, and switched the vector-related arguments to use a dt_elem_desc_t*.

The biggest iffy part of this PR to me is @bosilca's first comment about bloating the number of OPAL_DATATYPE_* entries with LONG as a new one. I think that is the model currently in use for other types whose external32 size might diverge from system sizeof() (e.g. BOOL). But I do agree I'd like something better.

Currently when pack/unpack start pulling up description[] entries with elem.common.type = OPAL_DATATYPE_INT8 any information about whether that came from MPI_LONG or MPI_INT8_T is gone.

Since I think BOOL is already bloating the OPAL_DATATYPE_* entries for the same reason I'd prefer the easy solution of kicking the can down the road and using the same solution here of adding an OPAL_DATATYPE_LONG.

But if you want a rework of that, I could try finding a place to add such info, but I'd want the rough idea discussed before implementing it. I was considering adding another field elem.common.ompitype that would use the OMPI .id#. Or possibly go straight to the desired data and add an elem.common.ext32size. I'm guessing that if all the type initializations set it, and opal_datatype_add(,pdtAdd,,,) also took an extra arg and set it, that might get the new field set for every description[] entry in a datatype. And if I'm understanding the paths through all the locations where elem.common.type gets set, that would only ever set the .ompitype / .ext32size for incoming base OMPI types, not constructed ones. I'm seeing OMPI_DATATYPE_* has some squashing too and doesn't actually have an OMPI_DATATYPE for LONG, but bloating that to include LONG would, I assume, be less controversial. If the new field was .ompitype then the .remote_sizes[] array wouldn't be indexed by .type anymore; it would be indexed by .ompitype.

@markalle
Contributor Author

@bosilca I've added a second commit that allows the external32 sizes to be found without requiring the addition of OPAL_DATATYPE_LONG, but it involves a lot of changes.

It puts the extra data at the same level as elem.common.type with a modest set of changes to get that data created, and another group of changes addressing the .ptypes[] array that no longer had enough information to compute the remote_size.

If the design of the second commit is adequate then I can go back and remove everything about OPAL_DATATYPE_LONG

@ibm-ompi

The IBM CI (PGI) build failed! Please review the log, linked below.

Gist: https://gist.github.com/56533b3c5377ef616df0f9d5fe5afcb0

@jjhursey
Member

Huh PGI stumbled on this - I'm not sure why the other builders didn't:

"ompi_datatype_external32.c", line 133: error: identifier "MPI_REAL16" is
          undefined
      opal_datatype_set_external32_size(&MPI_REAL16->super, 16);

@markalle
Contributor Author

Thanks, I was about to add #if OMPI_HAVE_FORTRAN_REAL16 etc for the optional types, but then I decided to just skip over those, since they already have an explicit size, and there's only a need to specify the external32 size where it's possible for it to diverge from sizeof() of the corresponding type.

@markalle
Contributor Author

Okay, I've decided to reorganize the new fields one more time in a more generic way than adding .external32size fields. It will still involve some degree of OMPI information being fed into the OPAL datatypes, but I don't see any less intrusive way than that to handle it. So no need to re-review until my next repush.

@markalle
Contributor Author

Repushed.

The first commit allows an opal_datatype_t to have an optional .ompi_id (both at the datatype and down lower in its .desc entry). That plus a convertor's .ompi_remote_sizes[] array lets us figure out remote sizes for types that were otherwise indistinguishable at the opal level (eg MPI_LONG vs MPI_INT64_T).

The need for .desc values to diverge inside ompi_mpi_long and ompi_mpi_int64_t for example where those .desc are statically initialized as the same offset into opal_datatype_predefined_elem_desc[] is one significant element of the change. Another notable change is the basic datatypes "#define OMPI_DATATYPE_MPI_INT 0x38" etc moving away from mapping different ompi datatype constants to the same value. This is mainly so the ompi_datatype_basicDatatypes[] array covers all the ompi_mpi_int etc statically initialized types.

As far as design, the only options I can think of are something like either

  1. my initial PR that bloated OMPI_DATATYPE_* with more types like OMPI_DATATYPE_MPI_LONG so it can have its remote_size diverge from other ompi types mapping to the same opal type, or
  2. this current PR that puts an optional opaque .ompi_id into the opal datatype so individualized remote_sizes can be set for the different MPI datatypes

Option 1 was a lot smaller than this PR, but there I only addressed MPI_LONG, whereas in principle any MPI datatype could have an external32 size that diverges from the corresponding sizeof() on the system. So the current PR is certainly more general.

@jladd-mlnx jladd-mlnx requested a review from vspetrov October 15, 2020 14:14
@jladd-mlnx
Member

@vspetrov - can you review this, please.

@markalle
Contributor Author

Repushed just the 3rd commit, adding a few more entries to ompi_op_ddt_map[] to cover all the OMPI_DATATYPE_* entries that are now separated, where previously some OMPI_DATATYPE_* constants were #defined as other OMPI_DATATYPE_* constants.

@lanl-ompi
Contributor

Can one of the admins verify this patch?

@markalle
Contributor Author

Which parts of this PR are most concerning to people? I'd like to get these changes in sometime.

The opaque .ompi_id is a sizeable change, but I don't see how else to ensure enough info is available in an opal type that corresponds to an MPI type to know how big its MPI external32 representation should be.

I'd be interested in description pElem walking utilities that could normalize all the pack/unpack routines, but I'm not sure I'm the person to design them. This PR doesn't change the external pack/unpack's pStack walking, but it does make the bottom level call that processes a pElem capable of stopping in the middle of a vector and of having a subsequent call pick up where it left off.

@ibm-ompi

ibm-ompi commented Feb 4, 2021

The IBM CI (GNU/Scale) build failed! Please review the log, linked below.

Gist: https://gist.github.com/9e3efedf177a06b6567641b2c99cb644

@ibm-ompi

ibm-ompi commented Feb 4, 2021

The IBM CI (XL) build failed! Please review the log, linked below.

Gist: https://gist.github.com/82714ff88e509d9e8757c2fb0e3418eb

@ibm-ompi

ibm-ompi commented Feb 4, 2021

The IBM CI (PGI) build failed! Please review the log, linked below.

Gist: https://gist.github.com/bb56bb74cfa46d405db54a8d5cdc8314

@markalle
Contributor Author

markalle commented Feb 4, 2021

Rebased since my other pack/unpack PR (a non-contiguous performance improvement from mpich) got merged in and both of these PRs touched the same macros. So whichever went in first would cause a small update in the other.

@gpaulsen
Member

bot:mellanox:retest
bot:ompi:retest

The OMPI types used to have more items #defined to the same values,
so OMPI_DATATYPE_MPI_INT was one of OMPI_DATATYPE_MPI_INT16_T or
OMPI_DATATYPE_MPI_INT32_T etc.  The reduction code uses a
map that's expected to have a value for each OMPI_DATATYPE_*
so it needed more entries.

Signed-off-by: Mark Allen <[email protected]>
@markalle
Contributor Author

Since one of the main suggestions in the previous review was to not reinvent the wheel and to model the bookkeeping on the existing design, I've switched my vector walking code to use .conv_ptr and .count_desc the same way it's done elsewhere. It's not modeled exactly on the bookkeeping in, say, opal_generic_simple_unpack_function and unpack_partial_blocklen, because those don't appear to have the ability to pick up in the middle of a block. I still have my own vector walking utility functions to handle the cases that I think the existing code is missing, but they iterate .conv_ptr / .count_desc the same as the other pack/unpack code.

@gpaulsen gpaulsen requested a review from bosilca February 26, 2021 14:36
@gpaulsen
Member

@ggouaillardet @bosilca @vspetrov Anyone able to re-review? I'm hopeful this is ready.

@gpaulsen
Member

@ggouaillardet after @markalle rebases, can you please review to see if this will resolve #7851 ?

@gpaulsen
Member

@markalle Please rebase.

@gpaulsen
Member

Talking with @markalle We believe this is covered by #8735, with the small caveat that PR 8735 only covers MPI_LONG and MPI_UNSIGNED_LONG.

Closing for now.

@gpaulsen gpaulsen closed this Apr 26, 2021