Pack/Unpack external32 with long double still broken #8918
Comments
For me the ppc build works from master, but I assume that's because my build isn't using opal_dt_swap_long_double() and yours is probably taking the x86 path? So mine is just swapping the bytes without any extra IEEE struct related to conversion. I'm not really familiar with what the opal_dt_swap_long_double() routine is doing. Are you on x86? |
Yes. |
IIRC, a long time ago in a galaxy far far away, I worked on this in the context of heterogeneous clusters, when communicating The same routine is used by I also noted MPICH does a different packing (simple byte swap (and hence symmetrical)?) So I think we should first refer to the MPI standard in order to find the right way to |
@ggouaillardet According to the MPI standard, the proper way to pack MPI does not define a predefined datatype for binary128. Heterogeneous support may not have all the bits required to implement pack/unpack external32. EDIT: A trivial implementation of pack/unpack external32 for |
I agree with a caveat about I'm not sure if __float128 gets us enough portability (e.g. it worked from my x86 linux machine but my mac didn't seem to have quadmath.h). But as far as functionality, if I store the data "123" as a long double then as a __float128, the bits come out as you described, for example:
So the above is really just a long-winded way of saying/confirming that long double, at least on my x86, is 80-bit extended precision and __float128 makes it IEEE 754 quad precision. |
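One way to check which format a given build's long double uses, without dumping bits by hand, is to look at LDBL_MANT_DIG from <float.h>. The sketch below is only an illustration of that check; the helper name long_double_format is made up for this example and is not part of OMPI.

```c
#include <float.h>

/* Hypothetical helper (not an OMPI function): classify the host's
 * long double by the width of its significand, as reported by <float.h>. */
static const char *long_double_format(void)
{
#if LDBL_MANT_DIG == 64
    /* 64 significand bits including the explicit integer bit */
    return "x87 80-bit extended precision";
#elif LDBL_MANT_DIG == 113
    /* 112 stored fraction bits plus one implicit integer bit */
    return "IEEE 754 binary128 (quad precision)";
#elif LDBL_MANT_DIG == 106
    return "double-double";
#elif LDBL_MANT_DIG == 53
    return "IEEE 754 binary64 (same as double)";
#else
    return "unknown";
#endif
}
```

With GCC or Clang on x86 Linux this would be expected to take the first branch, which matches the 80-bit observation above; other compilers and architectures may differ.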
I made a couple routines to test with that convert both directions at least for the basic cases. I don't actually understand the two special exponents that describe subnormal numbers and the various NaNs, although I think the 80 bit extended precision and 128 bit quad precision use the same special exponents so a conversion like this gist that doesn't have special cases for the special exponents might be okay: https://gist.github.com/markalle/650a6389518d04e54c9cd102485a95fd My reason for making these conversions would be for architectures where we don't have __float128. If we do have __float128 then of course we should just use that. I didn't incorporate the above into OMPI, and would need to look closer to understand how to change where the conversion is made. From my past reading I recall what @ggouaillardet is describing. It always seemed like the conversion routines didn't have enough arguments to describe what they were trying to do, because all they handled was byte swapping (and only of same-sized arguments) and they assumed it was all symmetrical. |
@markalle Here you have my own version of these routines, not including any byte swapping as that part is trivial. I don't think your routines are correct: the explicit vs. implicit significand bit requires care, although I may be wrong. https://gist.github.com/dalcinl/05cccf7b11cdf169a750485f67b499b7 In my code, the |
I do see one difference between your code and mine for decoding a stored float128: if e==0 you're turning off the highest bit of the mantissa, where mine universally turned it on. For the rest I think we're doing the same operations, except I did all mine with characters because unsigned gets really confusing to me for mixing endianness. In particular I'm concerned that your code might be relying on the architecture endianness matching what you're trying to store. But a likely use case here is we're running little endian and intending to encode into a big endian float128. So for example if the sign and exponent was 0x400d, the little endian machine would have those two bytes stored as I'm equally suspicious of all the frac and f[0123] and f0123 parts when mixing endianness. I like that it's fewer operations, and I haven't done any timings but am willing to assume yours is faster. But I'd be a lot more comfortable with character based encoding if the storage endianness isn't the same as the architecture endianness. |
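To illustrate the character-based approach argued for here, a minimal sketch: the 16-bit sign/exponent field is written one byte at a time with shifts, so the output order does not depend on the host's endianness. The names store_be16 and load_be16 are hypothetical and come from neither gist.

```c
#include <stdint.h>

/* Hypothetical helper: store a 16-bit value most-significant byte first,
 * regardless of the endianness of the machine running the code. */
static void store_be16(unsigned char out[2], uint16_t v)
{
    out[0] = (unsigned char)(v >> 8);   /* high byte goes first */
    out[1] = (unsigned char)(v & 0xff); /* low byte goes second */
}

/* Hypothetical helper: read the value back the same way. */
static uint16_t load_be16(const unsigned char in[2])
{
    return (uint16_t)(((unsigned)in[0] << 8) | in[1]);
}
```

For the 0x400d example, store_be16 produces the bytes 0x40 then 0x0d on both little-endian and big-endian hosts, because shifts operate on values rather than on memory layout.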
@markalle I do not really care what version is ultimately used. However, I warn you about the handling of explicit vs. hidden bit. Your testing with value As I said before, my version DOES NOT include any byte-swapping to big-endian, and should be tested in some actual big-endian architecture (both 32 and 64 bits), as I'm not sure about my struct definitions for that case. |
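A rough sketch of the explicit vs. hidden bit point, under stated assumptions: the 80-bit value is passed as already-extracted fields (16-bit sign/exponent, 64-bit significand with the explicit integer bit in bit 63), and the quad value is 16 bytes in big-endian order. The function names are hypothetical, this is neither gist's code nor OMPI's, and the decode step truncates rather than rounds, so a quad NaN whose payload sits only in the low 49 fraction bits would not survive as a NaN.

```c
#include <stdint.h>

/* Hypothetical: 80-bit extended fields -> 16 big-endian binary128 bytes.
 * Both formats use a 15-bit exponent with the same bias, so only the
 * significand changes: drop the explicit integer bit, left-align the rest. */
static void f80_to_f128_be(uint16_t sign_exp, uint64_t sig, unsigned char out[16])
{
    uint64_t frac = sig & UINT64_C(0x7fffffffffffffff); /* 63 fraction bits */
    uint64_t hi = frac >> 15; /* top 48 of the 112 quad fraction bits */
    uint64_t lo = frac << 49; /* next 15 bits, followed by 49 zero bits */

    out[0] = (unsigned char)(sign_exp >> 8);
    out[1] = (unsigned char)(sign_exp & 0xff);
    for (int i = 0; i < 6; i++)
        out[2 + i] = (unsigned char)((hi >> (40 - 8 * i)) & 0xff);
    for (int i = 0; i < 8; i++)
        out[8 + i] = (unsigned char)((lo >> (56 - 8 * i)) & 0xff);
}

/* Hypothetical: 16 big-endian binary128 bytes -> 80-bit extended fields.
 * The integer bit is set only when the exponent field is nonzero; with
 * e == 0 (zero or subnormal) it must stay off. Low fraction bits are
 * truncated, not rounded. */
static void f128_be_to_f80(const unsigned char in[16], uint16_t *sign_exp, uint64_t *sig)
{
    uint64_t hi = 0, lo = 0;
    for (int i = 0; i < 6; i++)
        hi = (hi << 8) | in[2 + i];
    for (int i = 0; i < 8; i++)
        lo = (lo << 8) | in[8 + i];

    *sign_exp = (uint16_t)(((unsigned)in[0] << 8) | in[1]);
    uint64_t frac = (hi << 15) | (lo >> 49); /* keep the top 63 fraction bits */
    uint64_t int_bit = (*sign_exp & 0x7fff) ? UINT64_C(0x8000000000000000) : 0;
    *sig = int_bit | frac;
}
```

A normal value such as 123 has a nonzero exponent field, so it cannot tell apart a decoder that always sets the integer bit from one that sets it only when e != 0; an input with e == 0 and a nonzero fraction is needed to exercise that case.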
@bosilca could you please give us some insights? There are two sub-issues here:
|
I made a PR using a lightly modified version of your f80 pack/unpack. By adding a little more context to pFunction's arguments, I sidestepped my concern about @dalcinl's function by ordering the calls so it's always operating on data in the local-arch format, so sometimes it does opal_dt_swap_bytes first followed by opal_dt_swap_long_double and sometimes vice versa |
@awlauria X86_64 is enough. Ideally, we should also try it in a big-endian arch. However, I'm not aware of any big-endian arch where |
Important but not a blocker for rc, probably want it for release. |
Yeah, testing is a problem. The original bug is hit on x86_64, but trying to hit all the paths in the fix is hard. |
I did another repush to #8941 as the fix. I think it's done, but testing is a concern. I ran mac x86_64 which hits
I'll try PPC64 and see what other paths I can hit next. Other paths it would be nice to include are
|
f80 and big endian may be impossible to test, at least with MPI code. f80 is an Intel format, I do not expect any other big-endian CPU to implement it. On CPUs other than Intel, I expect |
I made this comment over in the #8941 fix PR: The systems I've tested on now are
That's not a bad sampling, and even though it's not using the full OMPI on the old qemu system, my standalone program uses most of the OMPI code. |
This one fell into the background again, but I think it's all done and ready to go over in #8941 |
Looks like we have a FIX in #8941. We just need a review and then cherry-picks to the release branches. |
All mpi4py pack/unpack tests with external32 passed for branches:
|
awesome thanks. |
@gpaulsen This is a followup on #8769. Pack/Unpack external32 with long double is still broken. I'm using a very recent ompi:master at 17b723b. The test from this gist (thanks @markalle) is failing on my Fedora 34 workstation.