Skip to content

uninitialized error_code affecting MPI_File_read() #5489

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
markalle opened this issue Jul 26, 2018 · 7 comments
Closed

uninitialized error_code affecting MPI_File_read() #5489

markalle opened this issue Jul 26, 2018 · 7 comments

Comments

@markalle
Copy link
Contributor

I'm seeing a difference between the old romio314 and romio321 in the mpioimpl.h file where it had old code

/* The MPI_DATATYPE_ISCOMMITTED macro now always sets err_=0.
  This is an optimistic approach for Open MPI, but it is likely other
  upper layers already checked the datatype was committed.
  Not setting err_ is incorrect since it can lead to use of
  uninitialized variable.*/
#define MPIO_DATATYPE_ISCOMMITTED(dtype_, err_) do { err_ = 0; } while (0)

vs the new romio321 code in master that just says

#define MPIO_DATATYPE_ISCOMMITTED(dtype_, err_) do {} while (0)

And like the above comment says, err_ goes uninitialized. In particular you can trace through romio321//romio/mpi-io/read.c to see what happens with error_code. It's an uninitialized stack variable that goes through three macros none of which set it. The macros consistently set error_code to a failure if they see something wrong, but they don't consistently set it to success when things are fine.

The last macro in particular is where the failure to set error_code becomes an issue because it checks the uninitialized value at the end for != MPI_SUCCESS.

#define MPIO_CHECK_DATATYPE(fh, datatype, myname, error_code)       \
    do {                                                            \
        if (datatype == MPI_DATATYPE_NULL) {                        \
            error_code = MPIO_Err_create_code(MPI_SUCCESS,          \
                                              MPIR_ERR_RECOVERABLE, \
                                              myname, __LINE__,     \
                                              MPI_ERR_TYPE,         \
                                              "**dtypenull", 0);    \
        }                                                           \
        else {                                                      \
            MPIO_DATATYPE_ISCOMMITTED(datatype, error_code);        \
        }                                                           \
        if (error_code != MPI_SUCCESS) {                            \
            error_code = MPIO_Err_return_file(fh, error_code);      \
            goto fn_exit;                                           \
        }                                                           \
    } while (0)

My opinion is that "normal" behavior of both functions and macros ought to be not only to set error_code = some_failure_code when things go wrong, but also to set error_code = SUCCESS when they pass. And I'd say that for both the top level macros like MPIO_CHECK_DATATYPE() as well as the further down macros like MPIO_DATATYPE_ISCOMMITTED().

So design-wise I think the old romio314 had the setting in the right place. Thoughts?

But my git archeology isn't showing me much: can anybody tell if that was a change we made in romio314 or if that was what the original code from romio314 said?

@gpaulsen
Copy link
Member

@ggouaillardet ?

@markalle
Copy link
Contributor Author

I'm hoping this is isolated, rather than representative of some collection of fixes we made on top of romio314 that haven't been carried over into romio321.

I downloaded mpich-3.1.4 and found it has the bad code (doesn't set error_code) vs our initial checkin into romio314/ that already has the code fixed. The checkin comment at that point says

commit 92f6c7c1e210c559471a05aaac9b19e0bd3d71bb
Author: Gilles Gouaillardet <[email protected]>
Date:   Thu Apr 30 18:56:53 2015 +0900
    ROMIO 3.1.4 refresh: apply post romio-3.1.4 patches

So I guess that means the fix is in some romio-3.1.4 patch? Anybody know where that patch is? I could probably diff our original romio314/ vs mpich-3.1.4 to figure that out, but I'm guessing somebody already knows

@markalle
Copy link
Contributor Author

Oh, I can't read... 92f6c7c is the patch in question that got applied early on to romio314. And it's not too gigantic.

@markalle
Copy link
Contributor Author

#5493

@markalle
Copy link
Contributor Author

and #5494 for master

@markalle
Copy link
Contributor Author

#5494 for master
#5497 for v4.0.x
(And closed the initial 5493 that had also been for v4.0.x)

@gpaulsen
Copy link
Member

gpaulsen commented Sep 7, 2018

merged and closing.

@gpaulsen gpaulsen closed this as completed Sep 7, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants