`os.proc.call`'s `timeout` has a termination race-condition from `SIGTERM` and `SIGKILL`

The current implementation of `os.proc.call`'s `timeout` flag calls `p.destroy()` and `p.destroyForcibly()` back-to-back. These calls first send `SIGTERM` and then immediately send `SIGKILL` to the process.

The roles of these two signals are:
* `SIGTERM`: instruct the process to terminate. The process may intercept this signal and perform any necessary clean-up, or may ignore it entirely.
* `SIGKILL`: instruct the process to terminate immediately. This signal cannot be intercepted or ignored.

By sending these two signals back-to-back, the parent process creates a race between the child executing its `SIGTERM` handler (and cleaning up resources) and the arrival of the `SIGKILL`. In my experiments, I've found that `SIGKILL` is the cause of process exit the vast majority of the time. This means that the (potentially necessary) clean-up is often not performed at all or, worse, is interrupted part-way through. If the handler writes file contents back to disk, modifies a database, and so on, those operations may be left corrupted. If the child process has children of its own that need terminating, it may never get the chance to signal them, which can leave the parent process hanging.
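The race can be reproduced outside `os.proc.call` with a minimal sketch like the one below. It assumes a POSIX system with `sh` on the `PATH`, and it assumes that `java.lang.Process.destroy()` and `destroyForcibly()` map to `SIGTERM` and `SIGKILL` there (the Java documentation leaves this platform-dependent). The class name `TermKillRace` and the helper `backToBack` are made up for illustration and are not part of any library.

```java
public class TermKillRace {
    /**
     * Spawns a child whose SIGTERM handler needs about a second to finish,
     * then requests normal and forcible termination back-to-back.
     * Returns the child's exit code.
     */
    static int backToBack() throws Exception {
        Process p = new ProcessBuilder(
                "sh", "-c",
                // The handler simulates slow clean-up and exits 0 when it completes.
                "trap 'sleep 1; exit 0' TERM; while :; do sleep 1; done")
                .start();

        Thread.sleep(500);    // give the shell time to install its trap

        p.destroy();          // assumed to send SIGTERM on POSIX platforms
        p.destroyForcibly();  // assumed to send SIGKILL, with no delay in between

        return p.waitFor();
    }

    public static void main(String[] args) throws Exception {
        // An exit code of 0 would mean the handler ran to completion;
        // a non-zero code suggests the child was killed before that.
        System.out.println("exit code: " + backToBack());
    }
}
```

Because the handler needs roughly a second and the second call follows the first within microseconds, the handler should almost never get to finish in this setup.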

## What are the possible desired outcomes?
There are three ways a caller might reasonably want the timeout to terminate the process:

1. Only send `SIGTERM`: it doesn't matter how long it takes, we need to ensure safe clean-up
2. Only send `SIGKILL`: the process has no important state, it should be terminated immediately
3. Send `SIGTERM`, wait an appropriate amount of time, then send `SIGKILL`: we want to offer the process an opportunity to clean up, but if this takes too long (perhaps the clean-up itself is hanging), we want to forcibly terminate it -- this is the scenario `os.lib` currently attempts, albeit without allowing sufficient time for the handler to run.
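The three outcomes above could be sketched against the plain `java.lang.Process` API roughly as follows. This is not the `os.lib` implementation: the `Policy` enum, the `terminate` helper and the `graceMillis` parameter are hypothetical names introduced here, and the mapping of `destroy()`/`destroyForcibly()` to `SIGTERM`/`SIGKILL` is assumed to hold on POSIX platforms.

```java
import java.util.concurrent.TimeUnit;

public class GracefulTimeout {
    /** Hypothetical names for outcomes (1), (2) and (3). */
    enum Policy { TERM_ONLY, KILL_ONLY, TERM_THEN_KILL }

    /**
     * Stops the process according to the chosen policy and returns its exit code.
     * graceMillis is only consulted for TERM_THEN_KILL.
     */
    static int terminate(Process p, Policy policy, long graceMillis)
            throws InterruptedException {
        switch (policy) {
            case TERM_ONLY:
                p.destroy();           // request normal termination
                return p.waitFor();    // wait however long clean-up takes
            case KILL_ONLY:
                p.destroyForcibly();   // no clean-up opportunity
                return p.waitFor();
            default:                   // TERM_THEN_KILL
                p.destroy();
                // Escalate only if the child is still alive after the grace period.
                if (!p.waitFor(graceMillis, TimeUnit.MILLISECONDS)) {
                    p.destroyForcibly();
                }
                return p.waitFor();
        }
    }

    public static void main(String[] args) throws Exception {
        // Child with a SIGTERM handler that exits 0 once it has run.
        Process p = new ProcessBuilder(
                "sh", "-c",
                "trap 'exit 0' TERM; while :; do sleep 1; done")
                .start();
        Thread.sleep(500);  // give the shell time to install its trap
        int code = terminate(p, Policy.TERM_THEN_KILL, 5000);
        System.out.println("exit code: " + code);
    }
}
```

The key difference from the back-to-back approach is the bounded `waitFor` between the two calls, which gives the handler a real chance to finish before any escalation.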

## What is normally done?
`SIGKILL` is useful when a process does not respond to `SIGTERM` in a timely fashion, so the two are usually sent with a _delay_ between them. The Linux `timeout` command offers this with the `-k n` flag, which sends `SIGKILL` `n` seconds after the original timeout sent `SIGTERM`.

## Solutions
The race condition caused by the consecutive calls to `destroy` and `destroyForcibly` is a bug. It could be addressed by supporting a similar delay-based mechanism that lets the caller choose outcome (1), (2), or (3) safely.

For backwards compatibility, however, it might be wise to just support (1) with the current system.
