Here's a gist for a testcase pack.c
https://gist.github.com/markalle/ea3e48e8987bcd8a18923304e80833d4
% mpicc -o x pack.c
% mpirun -np 1 ./x
Without this fix, if sizeof(long) is 8 and the machine is little endian,
an input buffer of (long)1 == [01 00 00 00 00 00 00 00] packs external32
as [00 00 00 00 00 00 00 01], using 8 bytes. But the external32
representation of MPI_LONG is supposed to be 4 bytes.
I don't see much in the design to support the sizing of packed datatypes.
Convertors have a remote_sizes[] array, but it is indexed by
OPAL_DATATYPE_*, which are buckets that each contain multiple MPI datatypes.
E.g., MPI_LONG is OPAL_DATATYPE_INT8 (if sizeof(long) == 8), but so is
MPI_INT64_T, which has an external32 size of 8.
So the OPAL bucket is too big, and as far as I can tell the information
that would allow MPI_LONG to be packed to a different size is lost.
For this PR I followed the design of BOOL (which has its own OPAL_DATATYPE_
entry as opposed to mapping to OPAL_DATATYPE_INT1), and separated out another
OPAL_DATATYPE_ bucket for MPI_LONG and MPI_UNSIGNED_LONG. I don't like
this solution because there's no guarantee that any MPI datatype has
a sizeof() that matches the size specified in external32. This PR
only carves out another special case for long and unsigned long.
The MPI_LONG datatype now maps to the new OPAL_DATATYPE_LONG, which allows
the external32 convertor to set a remote_sizes[] of 4 for it.
Then in the functions like copy_int8_heterogeneous() that pack/unpack
call through pFunction[], there are new from_is_bigendian and
to_is_bigendian arguments that allow conversions when the from/to sides
have different sizes. The size arguments now describe the base elements
to be converted in one contiguous chunk; the extents between contiguous
chunks are handled at the next level up, in the pack/unpack functions.
Prior to this checkin the "bottom" of the pack/unpack loops processed a
single element from the datatype description[] entry, but inside the
pFunction[] call it didn't loop over the elem->count and elem->blocklen
correctly. In this PR I'm putting that loop in pack/unpack and having
pFunctions like copy_int8_heterogeneous() process contiguous chunks.
I don't believe partial copies are handled correctly either before or
after this PR. Pack doesn't try to handle them at all; it exclusively
loops over all the blocks and all the elements in each block. Unpack has
more code for partial copies, but it doesn't seem to account for how
each elem is basically a type_vector.
Other misc changes that went into this:
- added a datatype convertor arch_id flag for OPAL_ARCH_LONGIS32
  and set it for the external32 convertor
- changed the bit location of OPAL_ARCH_LONGIS64 because it wasn't living
  under the OPAL_ARCH_LONGISxx mask, which appears to be how those masks
  are designed to work
- the terminating condition in pack used to say "bConverted == local_size",
  but bConverted counts bytes on the "to" side, and for pack the "to"
  side has remote_size bytes
Signed-off-by: Mark Allen <[email protected]>