Skip to content

Conversation

@bosilca
Copy link
Member

@bosilca bosilca commented Mar 7, 2018

As discussed on mpi-forum/mpi-issues#77 (comment)
the conversion to double in the MPI_Wtime decrease the range
and accuracy of the resulting timer. By setting the timer to
0 at the first usage we basically maintain the accuracy for
194 days even for gettimeofday.

Signed-off-by: George Bosilca [email protected]

@bosilca bosilca added this to the Future milestone Mar 7, 2018
@bosilca bosilca requested a review from hjelmn March 7, 2018 18:33
@ibm-ompi
Copy link

ibm-ompi commented Mar 7, 2018

The IBM CI (GNU Compiler) build failed! Please review the log, linked below.

Gist: https://gist.github.com/7c2ade236017945328afb6d446af3d36

@ibm-ompi
Copy link

ibm-ompi commented Mar 7, 2018

The IBM CI (XL Compiler) build failed! Please review the log, linked below.

Gist: https://gist.github.com/3c1a63356106737ead55087bab035a0b

@ibm-ompi
Copy link

ibm-ompi commented Mar 7, 2018

The IBM CI (PGI Compiler) build failed! Please review the log, linked below.

Gist: https://gist.github.com/4c1c181636566b9960e6c98335a5beb8

@jsquyres
Copy link
Member

jsquyres commented Mar 7, 2018

@bosilca Looks like you have some kind of error:

pwtime.c:95:15: error: use of undeclared identifier 'tv'
    wtime += (tv.tv_sec - time_origin.tv_sec);
              ^
1 error generated.
Error while processing pwtime.c.

@ibm-ompi
Copy link

ibm-ompi commented Mar 7, 2018

The IBM CI (GNU Compiler) build failed! Please review the log, linked below.

Gist: https://gist.github.com/635054a1e3c30ebd6a85536f41bccab8

@ibm-ompi
Copy link

ibm-ompi commented Mar 7, 2018

The IBM CI (XL Compiler) build failed! Please review the log, linked below.

Gist: https://gist.github.com/2b30f2f908f408fe9e0db6f642e3b773

@ibm-ompi
Copy link

ibm-ompi commented Mar 7, 2018

The IBM CI (PGI Compiler) build failed! Please review the log, linked below.

Gist: https://gist.github.com/36e2afa763057f37ed9a5c4099e707b5

@ggouaillardet
Copy link
Contributor

@bosilca out of curiosity, why didn't you choose to initialize time_origin in MPI_Init() and hence get rid of an if(OPAL_UNLIKELY(...)) ?

@bosilca
Copy link
Member Author

bosilca commented Mar 8, 2018

@ggouaillardet I tried to minimize the spread of the patch. I didn't think that the unlikely if would make a big difference in the execution path, so I was not concerned about it. In fact, my first version did this down in OPAL timer, but then I decided that this change is too MPI specific to deserve to be down in OPAL.

@ggouaillardet
Copy link
Contributor

@bosilca, got it, thanks !

@bosilca
Copy link
Member Author

bosilca commented Mar 8, 2018

Can someone check that I didn't miss any case when dealing with profiling please.

@kawashima-fj
Copy link
Member

@bosilca time_origin is an external symbol. We need ompi_ prefix.

@ggouaillardet
Copy link
Contributor

@bosilca and you will also need OMPI_DECLSPEC.

As a matter of style, you can either use C99 initializers (e.g. struct timexyz time_origin = {0}; or not initialize tv_nsec at all to keep the code consistent (since OPAL_UNLIKELY(...) does not test it).

Also, do you have any reason to initialize tp at line 77 (this is not coming from this PR though) ?

As discussed on mpi-forum/mpi-issues#77 (comment)
the conversion to double in the MPI_Wtime decrease the range
and accuracy of the resulting timer. By setting the timer to
0 at the first usage we basically maintain the accuracy for
194 days even for gettimeofday.

Signed-off-by: George Bosilca <[email protected]>
@bosilca
Copy link
Member Author

bosilca commented Mar 8, 2018

I fixed some of the suggestions. Regarding the time_origin symbol, it is supposed to be extern so that I can be shared between wtime.o and pwtime.o, but it should not the DECLSPEC as I don't want it to be visible from outside.

@ggouaillardet
Copy link
Contributor

:bot:mellanox:retest

@ggouaillardet
Copy link
Contributor

@bosilca thanks, that PR looks good to me !

@hjelmn
Copy link
Member

hjelmn commented Mar 12, 2018

Looks good to me as well. I still think double is a horrible type for a timer but that is one of the more minor mistakes we have to live with from MPI-1. This PR at least makes it so we don't loose precision.

@bosilca bosilca merged commit bf3dd8a into open-mpi:master Mar 16, 2018
@bosilca bosilca deleted the topic/fix_wtime branch March 16, 2018 05:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants