-
Notifications
You must be signed in to change notification settings - Fork 901
Reenable heterogeneous support. #8735
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Looks like a good improvement, thanks. I don't think this addresses the MPI_LONG issue that #7919 does though. I just now ran it on an array of two longs that were stored little endian, eg I think I'm fine with the bookkeeping going on in the heterogeneous pack/unpack, looks similar to a lot of the other code. I'll have a bit of merging to make 7919 compatible after this goes in, but it shouldn't be a problem. |
This commit fixes the support for heterogeneous environments and specifically for external32. The root cause was that during the datatype optimization process types that are contiguous in memory are collapsed together in order to decrease the number of conversion (or memcpy) function calls. The resulting type however, does not have the same conversion rules as the types it replaced, leading to an incorrect (or absent) conversion in some cases. This patch marks the datatypes where types have been collapsed during the optimization process with a flag, allowing the convertor to detect if the optimized type can be used in heterogeneous setups. Signed-off-by: George Bosilca <[email protected]>
Signed-off-by: George Bosilca <[email protected]>
The only 2 types that have an external32 representation with a different size than most current architectures, and as a result are more challenging to handle. This patch also brings back the support for packing and unpacking to and from external32 for all datatypes and does a little cleaning in the datatype API and comments. Signed-off-by: George Bosilca <[email protected]>
6168b12
to
4e56e83
Compare
At this point we need a little bit more discussion on this PR. Some of the changes proposed in the #7919 are nice and could lead to significant performance boost, but at this point I need to figure out a way to pull them in. |
Should I try merging 7919 with this? It looks like my pFunctions can shrink a bunch. I had written the pFunctions to take an _elem vector description and process it in a vector manner. This code looks more oriented toward processing the _elem from the outer loops that are calling pFunction and each call into pFunction is just doing one block. That's fine with me. And whichever part I had in place to switch from the optimized to the non-optimized description I think is covered in this PR already too. |
The second part of this PR brings back the OPAL level support for MPI_LONG and MPI_UNSIGNED_LONG. Originally I was hoping to be able to add these types at the MPI level, to avoid cluttering the OPAL layer with this. Unfortunately, it forced the OMPI layer to drastically alter the master convertor for the 2 additional types, resulting in a more complex and error prone code. Thus, the missing 2 types, OPAL_DATATYPE_LONG and OPAL_DATATYPE_UNSIGNED_LONG, are now officially back in OPAL with all the support for pack and unpack (including to and from the XDR format used by the external32 data representation). As is the code works but it's more complicated than necessary. Regarding #7919, the code is complicated adding another layer of vector handling. I think it would be more readable if we stick to a single way of parsing vectors, and push it as low as possible in the conversion functions. In other words, we need to somehow mix [pack|unpack]_predefined_heterogeneous with the COPY_TYPE_HETEROGENEOUS macros. I tried by hand and the code is not readable and definitively not easy to debug (gdb refuses to go into the macros despite the additional ggdb flag). |
I spoke with @bosilca about this in Slack. Our conclusions:
Note: there will be an additional PR after this one for fixing #8442 |
I was being extra cautious to support more varied architectures in 7919. Lots of the C types don't have defined sizes -- long is merely the only deviation between external32 and the most common datatype sizes found in machines we typically use. Being pragmatic that's all I really care about as I never use any other type of machine either. But if we're trying to make it a complete design I think you have to bloat OPAL_DATATYPE_* to be a 1:1 match for every datatype in OMPI_* -- because how do you know which ones are potential deviations between the system size and the size specified in "external32"? My initial version of 7919 did the same thing, just adding OPAL_DATATYPE_LONG/UNSIGNED_LONG and it worked fine, then later I changed it to the large blob of code to track an opaque |
At some point in the past, we had full support for heterogeneous architectures, and it was used in production. As alternative architectures vanished, so was our ability to test and correctly maintain a working conversion code. I think the last effort around this topic was from @ggouaillardet when he validated the code on SPARC. Anyway, the design of the datatype engine allows for support for additional datatypes without loading the base engine with unnecessary types. Basically, the OPAL datatype layer should act as a service provider, providing the framework to do pack and unpack, by working on the specialized, vector-based typemap. Any layer should be able to add "predefined" types by reserving a space in the OPAL datatype layer (via the id), and then providing the pFunction (pack/unpack). For all heterogeneous cases, we don't even need the conversion functions as we fall back to a direct use of memcpy. |
This commit fixes the support for heterogeneous environments and specifically for external32. The root cause was that during the datatype optimization process types that are contiguous in memory are collapsed together in order to decrease the number of conversion (or memcpy) function calls. The resulting type however, does not have the same conversion rules as the types it replaced, leading to an incorrect (or absent) conversion in some cases. This patch marks the datatypes where types have been collapsed during the optimization process with a flag, allowing the convertor to detect if the optimized type can be used in heterogeneous setups.
Fixes #7851 (and provides an alternative to #7919).
Signed-off-by: George Bosilca [email protected]