Remove ability to run OCaml code in a Unix signal handler #1107
Conversation
This behaviour does not happen when using the systhreads library, I think; see `caml_thread_try_leave_blocking_section`. I agree that we shouldn't be running OCaml code from signal handlers.
Before going on, please read some background. The original motivation for running nontrivial code from signal handlers (as opposed to just recording the pending signal and polling it later) was to let signals interrupt blocked system calls. Hence the idea to run the Caml signal handler straight from the Unix signal handler if we know we are blocked in a system call. (That's the origin of the "blocking section" concept that you see in the runtime.)

I knew about the reentrancy issues with signal handlers, but reasoned that if the program is really blocked inside a system call, it is not running C library code. This is fragile and less and less true, as we tend to run more and more code between entering the blocking section and the system call itself.

Nowadays, if we know for sure that the old BSD behavior is dead and all Unix variants implement the POSIX behavior, we could rely on signal handling to terminate blocking system calls with an EINTR error. Then the Unix signal handler would only need to record the signal, and the pending handler could run once the system call returns. Note that in this approach we can't rely on automatic restarting of interrupted system calls.
@xavierleroy Except there is always a race between any check for pending signals and the start of a system call. One way of mitigating this is via the "pipe trick", which I think @stedolan is implementing at the moment.
There is a somewhat complementary suggestion for handling this case.
@mshinwell: yes, the race condition exists, or maybe there are two race conditions. With proper atomic instructions to check and clear pending signals, the effect of the race is to delay the handling of the signal until after the syscall has returned normally. Without atomic instructions there is also a possibility of losing the signal altogether. I was thinking of special-casing signal handlers that just raise an exception (for example, to interrupt a blocked operation).
@mshinwell: any pointer to the "pipe trick"?
@xavierleroy I think the pipe trick refers to creating a pipe and having the signal handler write the signal number to the pipe. Adding the read end to any select/poll/epoll calls will then abort the call early; the signal number can be read from the pipe and the OCaml signal handler can be called. Under Linux there is also signalfd() for this. Not sure how that helps with read/write calls.
As a side note: I looked at the Python source and it seems to simply ignore the race where a signal happens right before the syscall. The signal is recorded and the signal handler gets called when the syscall returns normally. Using a pipe or signalfd avoids this for select/poll/epoll, but I have no idea how to avoid this race for e.g. open().
👍 for the pipe trick. That and Linux's signalfd seem to be two of the only reasonable ways to handle signals that could possibly interrupt code that you don't control. (The third is to write to a global of type `volatile sig_atomic_t` and poll it.)
You can't atomically read and compare the variable and then call a syscall like open() if it is unset. The signal could still arrive between the read and the syscall. I think the only thing that can be used there is setjmp/longjmp.
If the "pipe trick" means wrapping system calls with a select/poll that also watches a self-pipe, it only covers calls that can be multiplexed that way.
I think you have to use setjmp/longjmp for the general case.
Poking around Mantis was enlightening (particularly PR#3659). I think the simplest example worth discussing is raising an exception from a signal handler. I see three possible means of doing it:

1. longjmp out of the Unix signal handler and raise the exception from there;
2. record the signal as pending, and raise the exception when the interrupted system call returns (with EINTR);
3. the self-pipe trick, multiplexing the pipe into the blocking call.

1 and 2 have essentially unfixable race conditions, although those of 2 are much less serious (delaying processing of a signal rather than ignoring it or corrupting data). 3 is the most robust, but requires more significant code changes and doesn't apply to arbitrary code.

The race in 2 is as @mshinwell pointed out, although as @xavierleroy noted it only causes a delay, not a loss of a signal. The race in 1 is worse: a signal may arrive just after a system call returns but before its result has been saved, and the longjmp then discards that result.

I think the best option is to go for option 2 by default, possibly in combination with 3 in the cases where it's easy. This means that blocking sections will stop being interruptible, and it will become the responsibility of C code to handle EINTR by leaving and re-entering the blocking section before retrying. That means that blocking C code not written with care for signal handling will by default block pending signals until the next time the runtime polls for them.

If that sounds like a sensible solution, I'll open a pull request to that effect.
Right, if only async-signal-safe code runs inside blocking sections, then we're on firmer ground (in particular, the unsafety I report here of running arbitrary OCaml code from the handler goes away).
The self-pipe trick is indeed the trick of wrapping syscalls with a select/poll that also watches a pipe written to by the signal handler. In principle, unlike option 1, it cannot lose the result of a system call that has already completed.
@stedolan: in the specific case of a signal handler that just raises an exception, I don't see the race condition in your solution #1. I agree that it obviously doesn't work for signal handlers that return normally.
@xavierleroy Race in option #1: you setjmp() and check the result. If it's not the first time, then there was a signal and you handle it. If it is the first time, you run your normal code, and the signal handler calls longjmp(). Now what happens if your code does "retval = read(fd, &buf, size);" and a signal hits just when read returns? The result of read is still in the return register and not yet saved in the retval variable. The signal handler gets called, which then calls longjmp(), losing the result from read in the process. How is the signal handler supposed to know it must not longjmp() right then?
The race is when a signal arrives between the system call returning and its result being saved. You're right, this sounds unpleasant. I'd initially thought it would be easy enough to just change the channel code in the runtime's io.c.
@mrvn: your scenario is not the one I asked about (raising a Caml exception from the signal handler). @stedolan: that's a good example indeed, but I suspect solution #2 is vulnerable as well if the channel state is updated after the system call returns.
That is indeed the case, and it's the cause of MPR#7503. I'm working on a patch to change this, which should make option 2 viable. |
In option #2, when you leave early you clean up to a consistent state first. So for the channel case you would never leave between the read call and adding the offset.

The problem with option #2 is that the signal can happen just before the syscall. In that case the signal is recorded but you still call the syscall, and it does not return with EINTR right away. In most cases it will finish normally and only then can the OCaml signal handler run. Worst case, the syscall blocks forever.

I think for option #2 to be viable, all syscalls must be changed so they use a variant that never blocks too long, and then loop. E.g. for select that means always using a timeout, even if -1 was specified. Unfortunately I don't think all of them have one. For things like open() we should probably just ignore the very small chance that it can get stuck with a pending signal recorded. If you do want open() to be aborted by signals, then don't set the alarm so close to the call that it might be missed, or use a repeating signal.
Superseded by #1128 |
(This branch is for discussion, not immediate merging. It contains a breaking change)
If a signal arrives during a blocking section, the runtime will try to execute the signal handler (arbitrary OCaml code) directly inside the Unix signal handling context. This is unsafe, and I'd like to see it removed, but first I'd like to understand the uses of this feature.
The issue is that only a very small whitelisted set of functions can safely be called from a Unix signal handler. In particular, it is not safe to call `malloc` from a Unix signal handler (`malloc` can safely be called from multiple threads, but if a signal handler interrupts `malloc` and itself calls `malloc`, then per-thread data structures will be corrupted).

It is essentially impossible to know that a piece of OCaml code does not call `malloc`. Obviously, code which allocates may call `malloc`. However, merely writing a reference may call `malloc`, if the `ref_table` overflows. Writing an immediate value like `None` into a reference may call `malloc`, if the value that was previously there needs to be darkened and the grey stack overflows. Even throwing an exception may call `malloc` to allocate space for the backtrace.

So, the current approach is fundamentally unsafe. Is it relied upon?