Conversation

@bosilca (Member) commented Jan 30, 2019

The issue was a little complicated due to the internal stack used in the
convertor. The main problem was that when we ran out of iov space to save
the raw description of the data while handling a repetition (loop), instead
of saving the current position and bailing out immediately we read the next
predefined type element. This worked in most cases, except the one
identified by the HDF5 test. The bigger issue, however, was the drop in
performance for all subsequent calls to the convertor pack/unpack: instead
of handling contiguous loops as a whole (and minimizing the number of
memory copies), we copied data description by data description.

Signed-off-by: George Bosilca [email protected]

@bosilca bosilca self-assigned this Jan 30, 2019
@ggouaillardet ggouaillardet changed the title Provide a better fix for #8285. Provide a better fix for #6285. Jan 30, 2019
@ggouaillardet (Contributor)

Minor typo in the commit message: it should refer to #6285.

@hjelmn (Member) commented Jan 30, 2019

06:32:13 FAIL: ddt_raw2

Looks like something is not right either with the test or the PR.

@edgargabriel (Member) commented Jan 30, 2019

@bosilca something is not quite right, the result is still not correct (although the offsets are different now). Will look into it later today to see whether I can provide more details.

@bosilca (Member, Author) commented Jan 30, 2019

OK, let me check again; on my machine all datatype tests pass.

@ggouaillardet (Contributor)

@bosilca I pushed another commit to this PR. ddt_raw2 fails without it and passes with it.
If it works for you, please squash both commits and merge into master; I will PR the release branches afterwards.

@bosilca (Member, Author) commented Jan 31, 2019

Oh, the newbie mistake! I did have the feeling that my pull-rebase didn't quite work, but all my tests passed, so I went ahead and pushed, and indeed I inherited both my fix and @ggouaillardet's. Thanks @ggouaillardet for the better fix of the better fix.

@edgargabriel (Member)

I can confirm that the HDF5 data corruption reported by @ax3l and @psychocoderHPC is fixed with this commit.

@ax3l commented Jan 31, 2019

When this is merged I will also test dev again against both issues reported in #6285.

Please don't forget to backport all related fixes to 2.X as well.
