Reenable heterogeneous support. #8735

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
3 commits merged into open-mpi:master from the fix/external32_support branch on Apr 5, 2021

Conversation

@bosilca (Member) commented Mar 30, 2021

This commit fixes the support for heterogeneous environments and specifically for external32. The root cause was that during the datatype optimization process, types that are contiguous in memory are collapsed together in order to decrease the number of conversion (or memcpy) function calls. The resulting type, however, does not have the same conversion rules as the types it replaced, leading to an incorrect (or absent) conversion in some cases. This patch marks, with a flag, the datatypes where types have been collapsed during the optimization process, allowing the convertor to detect whether the optimized type can be used in heterogeneous setups.
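
To illustrate the idea, here is a minimal sketch with hypothetical names (not the actual OPAL structures or the flag introduced by this patch): the optimizer keeps the collapsed typemap for the fast homogeneous path, and the convertor falls back to the original typemap whenever real conversion is required.

    /* Hypothetical names for illustration only. */
    #define DT_FLAG_OPTIMIZED_COLLAPSED  0x1000   /* set when the optimizer collapsed types */

    struct dt_description {
        int flags;
        /* ... vector-based typemap entries ... */
    };

    struct datatype_sketch {
        struct dt_description opt_desc;   /* collapsed, memcpy-friendly typemap */
        struct dt_description desc;       /* original typemap with per-type conversion rules */
    };

    /* The convertor picks which typemap to walk. */
    static const struct dt_description *
    pick_description(const struct datatype_sketch *dt, int needs_conversion)
    {
        if (needs_conversion && (dt->opt_desc.flags & DT_FLAG_OPTIMIZED_COLLAPSED)) {
            return &dt->desc;       /* collapsed entries lost their conversion rules */
        }
        return &dt->opt_desc;       /* homogeneous case: the collapsed typemap is safe */
    }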

Fixes #7851 (and provides an alternative to #7919).

Signed-off-by: George Bosilca [email protected]

@bosilca bosilca changed the title Reenable the heterogeneous support. Reenable heterogeneous support. Mar 30, 2021
@markalle (Contributor) commented:

Looks like a good improvement, thanks.

I don't think this addresses the MPI_LONG issue that #7919 does, though. I just ran it on an array of two longs stored little endian, e.g.
in : 01 00 00 00, 00 00 00 00, 02 00 00 00, 00 00 00 00
then packed external32 it's currently coming back as
packed : 00 00 00 00, 00 00 00 01, 00 00 00 00, 00 00 00 02
so it's using 8 bytes for each MPI_LONG, because it thinks it's packing an OPAL_INT8; the information that it was packing an MPI_LONG is gone.
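
For reference, a minimal stand-alone reproducer along these lines (not taken from the PR) would look like this; per the MPI standard, external32 encodes MPI_LONG as a 4-byte big-endian value, so the two longs should pack into 8 bytes, not 16.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        long in[2] = {1, 2};              /* stored little endian on x86 */
        unsigned char out[64];
        MPI_Aint size = 0, pos = 0;

        MPI_Init(&argc, &argv);
        MPI_Pack_external_size("external32", 2, MPI_LONG, &size);
        MPI_Pack_external("external32", in, 2, MPI_LONG, out, sizeof(out), &pos);
        printf("reported size %ld, packed %ld bytes:\n", (long)size, (long)pos);
        for (MPI_Aint i = 0; i < pos; i++) {
            printf("%02x ", out[i]);
        }
        printf("\n");
        MPI_Finalize();
        return 0;
    }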

I think I'm fine with the bookkeeping going on in the heterogeneous pack/unpack; it looks similar to a lot of the other code. I'll have a bit of merging to do to make #7919 compatible after this goes in, but it shouldn't be a problem.

bosilca added 3 commits April 2, 2021 00:22
This commit fixes the support for heterogeneous environments and
specifically for external32. The root cause was that during the datatype
optimization process types that are contiguous in memory are collapsed
together in order to decrease the number of conversion (or memcpy)
function calls. The resulting type, however, does not have the same
conversion rules as the types it replaced, leading to an incorrect (or
absent) conversion in some cases. This patch marks the datatypes where
types have been collapsed during the optimization process with a flag,
allowing the convertor to detect if the optimized type can be used in
heterogeneous setups.

Signed-off-by: George Bosilca <[email protected]>
MPI_LONG and MPI_UNSIGNED_LONG are the only 2 types whose external32
representation has a different size than on most current architectures,
and as a result they are more challenging to handle.

This patch also brings back the support for packing and unpacking to and
from external32 for all datatypes, and does a little cleanup in the
datatype API and comments.

Signed-off-by: George Bosilca <[email protected]>
@bosilca bosilca force-pushed the fix/external32_support branch from 6168b12 to 4e56e83 Compare April 2, 2021 06:39
@gpaulsen gpaulsen requested a review from markalle April 2, 2021 13:43
@gpaulsen (Member) commented Apr 2, 2021

@markalle Can you please re-review, as @bosilca has updated the PR?

@bosilca (Member, Author) commented Apr 2, 2021

At this point we need a little more discussion on this PR. Some of the changes proposed in #7919 are nice and could lead to a significant performance boost, but I still need to figure out a way to pull them in.

@markalle (Contributor) commented Apr 5, 2021

Should I try merging #7919 with this? It looks like my pFunctions can shrink a bunch. I had written the pFunctions to take an _elem vector description and process it in a vector manner. This code looks more oriented toward processing the _elem from the outer loops that call the pFunction, with each call into the pFunction doing just one block. That's fine with me. And whatever I had in place to switch from the optimized to the non-optimized description is, I think, already covered in this PR too.

@bosilca (Member, Author) commented Apr 5, 2021

The second part of this PR brings back the OPAL-level support for MPI_LONG and MPI_UNSIGNED_LONG. Originally I was hoping to be able to add these types at the MPI level, to avoid cluttering the OPAL layer with this. Unfortunately, that forced the OMPI layer to drastically alter the master convertor for the 2 additional types, resulting in more complex and error-prone code. Thus, the 2 missing types, OPAL_DATATYPE_LONG and OPAL_DATATYPE_UNSIGNED_LONG, are now officially back in OPAL with full support for pack and unpack (including to and from the XDR format used by the external32 data representation). As is, the code works, but it's more complicated than necessary.
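
For example, packing a native 64-bit long into its 4-byte big-endian external32 form looks roughly like this (a sketch, not the actual OPAL conversion function):

    #include <stdint.h>

    /* external32 stores MPI_LONG as a 4-byte big-endian integer, so on an
     * LP64, little-endian machine the value is truncated and byte-swapped. */
    static void pack_long_to_external32(long src, unsigned char dst[4])
    {
        uint32_t v = (uint32_t)src;
        dst[0] = (unsigned char)(v >> 24);
        dst[1] = (unsigned char)(v >> 16);
        dst[2] = (unsigned char)(v >> 8);
        dst[3] = (unsigned char)(v);
    }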

Regarding #7919, the code is complicated by adding another layer of vector handling. I think it would be more readable if we stuck to a single way of parsing vectors and pushed it as low as possible in the conversion functions. In other words, we need to somehow merge [pack|unpack]_predefined_heterogeneous with the COPY_TYPE_HETEROGENEOUS macros. I tried by hand, and the result is not readable and definitely not easy to debug (gdb refuses to step into the macros despite the additional -ggdb flag).

@jsquyres (Member) commented Apr 5, 2021

I spoke with @bosilca about this in Slack. Our conclusions:

  1. There's a possible optimization in #7919 (External32 fix) that might be useful on top of the changes in this PR (seeing some benchmark data that shows whether that optimization is effective would be good).
  2. But this PR is otherwise good to go.
  3. So let's merge this PR, since we need it for general datatype correctness. If the optimization from #7919 proves to be effective, it can be brought in as a new PR.

Note: there will be an additional PR after this one to fix #8442.

@jsquyres jsquyres merged commit ca65630 into open-mpi:master Apr 5, 2021
@markalle (Contributor) commented Apr 6, 2021

I was being extra cautious in #7919 to support more varied architectures. Many of the C types don't have defined sizes -- long just happens to be the only deviation between external32 and the most common datatype sizes on the machines we typically use. Being pragmatic, that's all I really care about, as I never use any other kind of machine either. But if we're trying to make this a complete design, I think you have to bloat OPAL_DATATYPE_* into a 1:1 match for every datatype in OMPI_* -- because how else do you know which ones might deviate between the system size and the size specified in "external32"?

My initial version of #7919 did the same thing, just adding OPAL_DATATYPE_LONG/UNSIGNED_LONG, and it worked fine. Later I changed it to the large blob of code that tracks an opaque .ompi_id throughout the OPAL datatypes, so that any type could have its external32 size deviate from the system size and OPAL would be able to look up the right specification coming from the OMPI side.
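
Roughly the shape of that approach (hypothetical names, not the actual #7919 code): OPAL carries an opaque id assigned by the upper layer, and the upper layer registers the external32 size for each id.

    #include <stddef.h>

    struct opal_elem_sketch {
        size_t native_size;   /* sizeof(type) on this architecture */
        int    ompi_id;       /* opaque id assigned by the OMPI layer */
    };

    /* Table the OMPI layer fills in with the external32 size for each id;
     * 0 means "same as the native size". */
    static size_t external32_size_by_ompi_id[64];

    static size_t external32_size(const struct opal_elem_sketch *elem)
    {
        size_t ext = external32_size_by_ompi_id[elem->ompi_id];
        return (ext != 0) ? ext : elem->native_size;
    }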

@bosilca (Member, Author) commented Apr 7, 2021

At some point in the past, we had full support for heterogeneous architectures, and it was used in production. As alternative architectures vanished, so did our ability to test and correctly maintain working conversion code. I think the last effort on this topic was from @ggouaillardet, when he validated the code on SPARC.

Anyway, the design of the datatype engine allows support for additional datatypes without loading the base engine with unnecessary types. Basically, the OPAL datatype layer should act as a service provider, offering the framework to pack and unpack by working on the specialized, vector-based typemap. Any layer should be able to add "predefined" types by reserving a slot in the OPAL datatype layer (via the id) and then providing the pFunction (pack/unpack). For all homogeneous cases we don't even need the conversion functions, as we fall back to a direct use of memcpy.
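
A rough sketch of that registration idea (a hypothetical interface, not the existing OPAL API): an upper layer reserves an id and supplies its own pack/unpack functions for that predefined type.

    #include <stddef.h>

    typedef size_t (*pfunction_t)(const void *src, void *dst, size_t count);

    struct predefined_slot {
        size_t      native_size;
        pfunction_t pack;     /* native -> external32 (or wire) format */
        pfunction_t unpack;   /* external32 (or wire) -> native format */
    };

    #define MAX_PREDEFINED 64
    static struct predefined_slot registry[MAX_PREDEFINED];
    static int next_id = 0;

    /* An upper layer (e.g. OMPI) reserves an id and provides its pFunctions;
     * when no conversion is needed the engine can still fall back to memcpy. */
    static int register_predefined(size_t native_size, pfunction_t pack, pfunction_t unpack)
    {
        if (next_id >= MAX_PREDEFINED) return -1;
        registry[next_id].native_size = native_size;
        registry[next_id].pack = pack;
        registry[next_id].unpack = unpack;
        return next_id++;
    }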
