smsc/cma: Add a check for CAP_SYS_PTRACE between processes #10694

Merged: jjhursey merged 1 commit into open-mpi:main from cma-check-cap-sys-ptrace on Aug 24, 2022

jjhursey
Member

  • If you run in an environment (e.g., container) where the CAP_SYS_PTRACE
    capability is not provided then the two processes will not be able to
    use process_vm_readv/process_vm_writev even if all of the other
    checks currently in the code pass.
    The result is an error when calling either of these two functions,
    which is difficult for the caller (i.e., btl/sm) to recover from.
  • Use the kcmp system call as a proxy for the process_vm_readv/process_vm_writev
    functions. kcmp is a lightweight check in the kernel and is sufficient
    to detect if the two processes have the necessary capabilities.
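  • Refs
      ◦ Capabilities: https://man7.org/linux/man-pages/man7/capabilities.7.html
      ◦ kcmp: https://man7.org/linux/man-pages/man2/kcmp.2.html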
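
The probe described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the helper name `cma_probe_peer` and its three-way return convention are hypothetical, and the `#ifdef SYS_kcmp` guard only stands in for a real configure-time check. It asks the kernel to compare the caller's address space with the peer's via `kcmp(..., KCMP_VM, ...)`; any non-negative result means the kernel allowed the comparison, while an error such as `EPERM` suggests a later `process_vm_readv`/`process_vm_writev` call would likely be refused too.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#ifdef SYS_kcmp
#include <linux/kcmp.h>   /* KCMP_VM */
#endif

/*
 * Hypothetical helper (not an Open MPI function).
 * Returns  1 if the kcmp probe against `peer` was permitted,
 *          0 if the kernel refused it (e.g., EPERM, ESRCH),
 *         -1 if kcmp is unavailable, so nothing can be concluded.
 */
int cma_probe_peer(pid_t peer)
{
#ifdef SYS_kcmp
    /* glibc has no kcmp() wrapper, so go through syscall(2). */
    long rc = syscall(SYS_kcmp, getpid(), peer, KCMP_VM, 0, 0);
    if (rc >= 0) {
        /* 0..3 are comparison results; all of them mean "allowed". */
        return 1;
    }
    if (ENOSYS == errno) {
        /* Kernel built without kcmp support: the probe is inconclusive. */
        return -1;
    }
    return 0;
#else
    (void) peer;
    return -1;
#endif
}
```

A component could call such a helper once per peer while setting up an endpoint and disqualify itself when it returns 0, rather than discovering the problem on the first data transfer.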

@jjhursey
Member Author

jjhursey commented Aug 19, 2022

I found this when working in a local Docker container launching a couple of processes.

Without this commit

shell$ docker run --rm --user 998:995 my-ompi-image mpirun -np 2 --mca smsc_base_verbose 1 /opt/hpc/local/nas/bin/lu.W.x

 NAS Parallel Benchmarks 3.4 -- LU Benchmark

 Size:   33x  33x  33  (class W)
 Iterations:  300
 Total number of processes:      2

lu.W.x: pml_ob1_sendreq.h:234: mca_pml_ob1_send_request_fini: Assertion `NULL == sendreq->rdma_frag' failed.

Program received signal SIGABRT: Process abort signal.

Backtrace for this error:
#0  0x7fffa16fcd8f in ???
#1  0x7fffa16fb657 in ???
#2  0x7fffa20304d7 in ???
#3  0x7fffa12ca3f8 in ???
#4  0x7fffa12a4913 in ???
#5  0x7fffa12bdbef in ???
#6  0x7fffa12bdc93 in ???
#7  0x7fffa1cb3d87 in mca_pml_ob1_send_request_fini
	at /working/ompi/ompi/mca/pml/ob1/pml_ob1_sendreq.h:234
#8  0x7fffa1cb5b2f in mca_pml_ob1_send
	at /working/ompi/ompi/mca/pml/ob1/pml_ob1_isend.c:335
#9  0x7fffa1a8bb13 in PMPI_Send
	at /working/ompi/ompi/mpi/c/send.c:93
#10  0x7fffa1f27aeb in ompi_send_f
	at /working/ompi/ompi/mpi/fortran/mpif-h/profile/psend_f.c:78
#11  0x1001191f in ???
#12  0x10007363 in ???
#13  0x10001a43 in ???
#14  0x100017b3 in ???
#15  0x7fffa12aa877 in ???
#16  0x7fffa12aaa63 in ???
#17  0xffffffffffffffff in ???
--------------------------------------------------------------------------
prterun noticed that process rank 0 with PID 0 on node c669c7922905 exited on signal 6 (Aborted).
--------------------------------------------------------------------------

Obviously, btl/sm was not happy. Passing either -mca btl ^sm or -mca smsc ^cma allowed it to run successfully.

With this commit

This PR will allow the smsc/cma to properly check and disqualify itself in this scenario (I turned on smsc_base_verbose 1 so you can see the error/warning message):

shell$ docker run --rm --user 998:995 my-ompi-image mpirun -np 2 --mca smsc_base_verbose 1 /opt/hpc/local/nas/bin/lu.W.x
[3303c3eb7c1b:00012] mca_smsc_cma_module_get_endpoint: can not proceed. processes do not have the necessary permissions (i.e., CAP_SYS_PTRACE). PID 12 <-> 11 (rc = -1) (errno: 1: Operation not permitted)
[3303c3eb7c1b:00011] mca_smsc_cma_module_get_endpoint: can not proceed. processes do not have the necessary permissions (i.e., CAP_SYS_PTRACE). PID 11 <-> 12 (rc = -1) (errno: 1: Operation not permitted)


 NAS Parallel Benchmarks 3.4 -- LU Benchmark

 Size:   33x  33x  33  (class W)
 Iterations:  300
 Total number of processes:      2

 Time step    1
 Time step   20
 Time step   40
 Time step   60
 Time step   80
 ...

If we add the necessary capabilities then all is golden:

shell$ docker run --rm --cap-add SYS_PTRACE --user 998:995 my-ompi-image mpirun -np 2 --mca smsc_base_verbose 1 /opt/hpc/local/nas/bin/lu.W.x


 NAS Parallel Benchmarks 3.4 -- LU Benchmark

 Size:   33x  33x  33  (class W)
 Iterations:  300
 Total number of processes:      2

 Time step    1
 Time step   20
 Time step   40
 Time step   60
 Time step   80
...

@jjhursey
Member Author

I'm going to add a configure test for SYS_kcmp, so consider this WIP for a moment.

@jjhursey
Member Author

Ok. It's ready for review now.

 * If you run in an environment (e.g., container) where the `CAP_SYS_PTRACE`
   capability is not provided then the two processes will not be able to
   use `process_vm_readv`/`process_vm_writev` even if all of the other
   checks currently in the code pass.
   The result is an error when calling either of these two functions,
   which is difficult for the caller (i.e., `btl/sm`) to recover from.
 * Use the `kcmp` system call as a proxy for the `process_vm_readv`/`process_vm_writev`
   functions. `kcmp` is a lightweight check in the kernel and is sufficient
   to detect if the two processes have the necessary capabilities.
 * Refs
   * Capabilities : https://man7.org/linux/man-pages/man7/capabilities.7.html
   * `kcmp` : https://man7.org/linux/man-pages/man2/kcmp.2.html

Signed-off-by: Joshua Hursey <[email protected]>
@jjhursey jjhursey force-pushed the cma-check-cap-sys-ptrace branch from fc86bf2 to 56132aa Compare August 23, 2022 13:28
@jjhursey jjhursey merged commit d9d5c54 into open-mpi:main Aug 24, 2022
@jjhursey jjhursey deleted the cma-check-cap-sys-ptrace branch August 24, 2022 15:31