Fix oob_tcp tcp_component_close segfault with active listeners #6796
Can one of the admins verify this patch?
`oob_tcp` in non-HNP mode shares its libevent `event_base` with `oob_base` [1]. `orte_oob_base_close` calls (1) `oob_tcp` `component_shutdown`, then (2) `opal_progress_thread_finalize`, then (3) `oob_tcp` `tcp_component_close` [2]. `opal_progress_thread_finalize` calls `tracker_destructor` [3], which frees the `event_base` [4]. If any `oob_tcp` event listeners are still active at this point, `oob_tcp` crashes while trying to delete them at [5] [6].

This change moves `oob_tcp` event listener cleanup from `component_close` to `component_shutdown`, so that it happens before the `event_base` is freed.

[1] https://github.com/open-mpi/ompi/blob/v4.0.1/orte/mca/oob/tcp/oob_tcp_listener.c#L160
[2] https://github.com/open-mpi/ompi/blob/v4.0.1/orte/mca/oob/base/oob_base_frame.c#L95
[3] https://github.com/open-mpi/ompi/blob/v4.0.1/opal/runtime/opal_progress_threads.c#L232
[4] https://github.com/open-mpi/ompi/blob/v4.0.1/opal/runtime/opal_progress_threads.c#L65
[5] https://github.com/open-mpi/ompi/blob/v4.0.1/orte/mca/oob/tcp/oob_tcp_component.c#L192
[6] https://github.com/open-mpi/ompi/blob/v4.0.1/orte/mca/oob/tcp/oob_tcp_listener.c#L955

Signed-off-by: Orivej Desh <[email protected]>
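The shutdown ordering described above can be modeled in a few lines of C. This is a hypothetical sketch, not ORTE or OPAL code: `mock_event_base_t`, `mock_listener_t`, `attach_listeners`, `listener_del`, `delete_all_listeners`, `progress_thread_finalize` and `oob_base_close` are illustrative stand-ins. Instead of really freeing memory, the mock marks the base as dead, so a delete that would touch an already-freed `event_base` is reported as a failure rather than triggering undefined behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the shared libevent event_base (not the real type). */
typedef struct {
    bool alive;      /* false once the progress thread has "freed" the base */
    int registered;  /* number of listener events still attached to the base */
} mock_event_base_t;

/* Hypothetical stand-in for an oob_tcp listener with a registered event. */
typedef struct {
    mock_event_base_t *base;
    bool active;     /* true while the listener event is registered */
} mock_listener_t;

/* Registers n active listeners on the given base. */
static void attach_listeners(mock_event_base_t *base, mock_listener_t *ls, int n) {
    assert(base != NULL && ls != NULL);
    for (int i = 0; i < n; i++) {
        ls[i].base = base;
        ls[i].active = true;
        base->registered++;
    }
}

/* Deletes one listener's event. Returns false if the base is already gone;
 * in the real code that delete would touch the freed base and its lock. */
static bool listener_del(mock_listener_t *l) {
    if (!l->active) {
        return true;              /* already deleted: nothing to do */
    }
    if (!l->base->alive) {
        return false;             /* base freed: this delete is unsafe */
    }
    l->base->registered--;
    l->active = false;
    return true;
}

/* Deletes every listener; returns false if any delete was unsafe. */
static bool delete_all_listeners(mock_listener_t *ls, int n) {
    bool ok = true;
    for (int i = 0; i < n; i++) {
        if (!listener_del(&ls[i])) {
            ok = false;
        }
    }
    return ok;
}

/* Models opal_progress_thread_finalize -> tracker_destructor freeing the base. */
static void progress_thread_finalize(mock_event_base_t *base) {
    base->alive = false;
}

/* Models the orte_oob_base_close sequence. cleanup_in_shutdown selects whether
 * listeners are deleted in step (1) (the fix) or in step (3) (the old code).
 * Returns true if every listener was deleted while the base was still alive. */
static bool oob_base_close(mock_event_base_t *base, mock_listener_t *ls, int n,
                           bool cleanup_in_shutdown) {
    bool ok = true;

    /* (1) component_shutdown */
    if (cleanup_in_shutdown && !delete_all_listeners(ls, n)) {
        ok = false;
    }

    /* (2) opal_progress_thread_finalize */
    progress_thread_finalize(base);

    /* (3) tcp_component_close */
    if (!cleanup_in_shutdown && !delete_all_listeners(ls, n)) {
        ok = false;
    }
    return ok;
}
```

Running the sequence with listener cleanup in the close step reports failure, mirroring the segfault; moving cleanup to the shutdown step deletes every listener while the base is still alive. In the real code the unsafe delete is not reported: it crashes, because the base and its lock have already been freed.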
Force-pushed from 10c7447 to 78b7e34.
I was debugging a reliable

This is the initial (expected) part of the backtrace:

This is the rest that crashes:

It crashes because the event base, along with its lock, was freed here:
ok to test
jsquyres left a comment:
Good catch -- thank you!
Backport open-mpi/ompi#6796

Signed-off-by: Ralph Castain <[email protected]>