Possible Race condition in close_open_file_descriptors() #6303

@uberlinuxguy

Description

Background information

There is a possible close/read race condition in the close_open_file_descriptors() function of all odls modules. The trigger is unknown, and the condition is not easily reproducible. The root cause may be a bug in libc or in the kernel's open/read/close system calls, but there is a safe workaround.

What version of Open MPI are you using? (e.g., v1.10.3, v2.1.0, git branch name and hash, etc.)

Looking at the code, this affects all active branches, and possibly stale branches, of Open MPI.

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

Open MPI was installed from source when this bug was detected.

Please describe the system on which you are running

  • Operating system/version: CentOS 7
  • Computer hardware: x86_64 arch
  • Network type: OpenIB

Details of the problem

close_open_file_descriptors() appears to walk the entries in /proc/self/fd/ and close every fd listed there. However, one of the last fds it closes is the fd underlying the DIR structure returned by opendir(). In most instances, this works fine. Under certain, currently unknown, circumstances (possibly kernel or libc related), a segmentation fault occurs in readdir() on the DIR whose fd was closed by the while loop.

The proposal is to skip the fd of the open DIR structure. I actually have a working patch for the odls_default_module and plan to port the patch to the other odls modules. I do not, however, have a way to test the patch with the Cray Alps launcher. I also do not have a good way to trigger this possible race condition, but the proposed patch will avoid it.
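To illustrate the idea, here is a minimal sketch of the skip logic. It is not the actual patch and not the Open MPI code: the function name close_fds_skipping_dir() and its min_fd parameter are hypothetical, chosen only for this example. It assumes a Linux /proc filesystem and uses the POSIX dirfd() call to find the fd that backs the DIR stream, so the loop can leave that one for closedir().

```c
#include <dirent.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper for illustration only (not the Open MPI function).
 * Closes every fd listed in /proc/self/fd that is >= min_fd, except the
 * fd that backs the DIR stream being iterated.
 * Returns the number of fds closed, or -1 if the directory cannot be read. */
int close_fds_skipping_dir(int min_fd)
{
    DIR *dir = opendir("/proc/self/fd");
    if (dir == NULL) {
        return -1;
    }

    /* The fd that readdir() depends on; closing it inside the loop is
     * what the issue describes as the suspected trigger. */
    int dir_fd = dirfd(dir);
    if (dir_fd < 0) {
        closedir(dir);
        return -1;
    }

    int closed = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        char *end;
        long fd = strtol(entry->d_name, &end, 10);

        /* Skip non-numeric entries such as "." and "..". */
        if (end == entry->d_name || *end != '\0') {
            continue;
        }
        /* Skip fds below the threshold and the directory's own fd. */
        if (fd < min_fd || fd == dir_fd) {
            continue;
        }
        if (close((int)fd) == 0) {
            closed++;
        }
    }

    /* closedir() releases dir_fd once iteration is finished. */
    closedir(dir);
    return closed;
}
```

A caller in a freshly forked child would typically pass 3 as min_fd to keep stdin, stdout, and stderr open. The directory's fd is still released, just after the last readdir() call instead of in the middle of the loop.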

I am planning to issue pull requests for all of the active branches as well as master per the guidelines.
