
os/exec: add fields for managing termination signals and pipes #50436

Closed
@bcmills

Background

#23019 (accepted but not yet implemented; CC @ianlancetaylor @bradfitz) proposed to change exec.Cmd.Wait to stop the goroutines that are copying I/O to and from a completed exec.Cmd; see that proposal for further background on the problem it aims to address. However, as noted in #23019 (comment) and #23019 (comment), any feasible implementation of the proposal requires the use of an arbitrary timeout, and the proposal does not include a mechanism to adjust that timeout. (Given our history with the Go project's builders, I am extremely skeptical that any particular hard-coded timeout can strike an appropriate balance between robustness and latency.)

#31774, #22757, and #21135 proposed to allow users of exec.CommandContext to customize the signal sent to the command when the context is canceled. They were all declined due to lack of concrete demand for the feature (#21135 (comment), #22757 (comment), #31774 (comment)). We have since accrued a number of copies of functions that work around the feature's absence. In the Go project alone, we have:

I'm attempting to add yet another variation (in CL 373005) in order to help diagnose #50014. For this one (prompted by discussions with @aclements and @prattmic), I have tried to make it a minimally invasive change on top of the exec.Cmd API.

I believe I have achieved that goal: the API requires the addition of only 2–3 new fields and no new methods or top-level functions. You can view (and try out) a prototype at github.com/bcmills/more/os/moreexec, which provides a drop-in replacement for a subset of the exec.Cmd API.

Proposal

I propose the addition of the following fields to the exec.Cmd struct, along with their corresponding implementation:

	// Context is the context that controls the lifetime of the command
	// (typically the one passed to CommandContext).
	Context context.Context

	// If Interrupt is non-nil, Context must also be non-nil and Interrupt will be
	// sent to the child process when Context is done.
	//
	// If the command exits with a success code after the Interrupt signal has
	// been sent, Wait and similar methods will return Context.Err()
	// instead of nil.
	//
	// If the Interrupt signal is not supported on the current platform
	// (for example, if it is os.Interrupt on Windows), Start may fail
	// (and return a non-nil error).
	Interrupt os.Signal

	// If WaitDelay is non-zero, the command's I/O pipes will be closed after
	// WaitDelay has elapsed after either the command's process has exited or
	// (if Context is non-nil) Context is done, whichever occurs first.
	// If the command's process is still running after WaitDelay has elapsed,
	// it will be terminated with os.Kill before the pipes are closed.
	//
	// If the command exits with a success code after pipes are closed due to
	// WaitDelay and no Interrupt signal has been sent, Wait and similar methods
	// will return ErrWaitDelay instead of nil.
	//
	// If WaitDelay is zero (the default), I/O pipes will be read until EOF,
	// which might not occur until orphaned subprocesses of the command have
	// also closed their descriptors for the pipes.
	WaitDelay time.Duration

The new Context field is exported only in order to simplify the documentation for the Interrupt and WaitDelay fields. (It was requested and rejected in #46699, but the objection there was my own — due to concerns about the interactions with the API in this proposal. It could be excised from this proposal without damaging anything but documentation clarity.)

The new Interrupt field sets the signal to be sent when the Context is done. exec.CommandContext explicitly sets it to os.Kill in order to maintain the existing behavior of exec.CommandContext, but I expect many users on Unix platforms will want to set it to os.Interrupt or syscall.SIGQUIT instead.

The new WaitDelay field sets the interval to wait for input and output after process termination or an interrupt signal. That interval turns out to be important for many testing applications (such as the Go Playground implementation and the cmd/go test suite). It also generalizes nicely to the use-cases in #23019: setting WaitDelay without Context provides bounded I/O wait times without sending a preceding signal.

Compatibility

I believe that this proposal is entirely backward-compatible (in contrast with #23019). The zero-values for the new fields provide exactly the same behavior as a Cmd returned by exec.Command or exec.CommandContext today.

Caveats

This proposal does not address graceful shutdown on Windows (#22757 (comment); CC @mvdan). However, it may be possible to extend it to do so by providing special-case Windows behavior when the Interrupt field is set to os.Interrupt, or by adding an InterruptFunc func(*Cmd) callback that would also be invoked when Context is done.

The proposed API also does not provide a mechanism to send an Interrupt signal followed by os.Kill after a delay but still wait for subprocesses to close all I/O pipes. I believe the use-cases for that scenario are sufficiently niche to be provided only by third-party libraries. Sending SIGKILL to the parent process makes it likely that subprocesses will not know to shut down, so in the vast majority of cases users should do one of three things: not send SIGKILL at all (WaitDelay == 0), forcibly close the pipes to try to kill the subprocesses with SIGPIPE (WaitDelay > 0), or do something platform-specific to forcibly shut down an entire process group (outside the scope of this proposal).

Alternatives considered

In #31774 (comment), @bradfitz suggested a field Kill func(*os.Process), which would presumably be added instead of the Interrupt field in this proposal. However, I believe that such a field would be simultaneously too complex and not powerful enough:

  • The Kill field would be too complex for most Unix applications, which overwhelmingly only need to send one of SIGTERM, SIGINT, SIGQUIT, or SIGKILL — why pass a whole callback when you really just want to say which signal you need?

  • A *os.Process callback would still not be powerful enough for Windows applications. If I understand the discussion in #6720 (os: Interrupt is not sendable on Windows) correctly (CC @alexbrainman), CTRL_BREAK_EVENT is sent to an entire process group, not a single *os.Process, so Windows users would also need a mechanism for creating (or determining) such a group, or some completely separate out-of-band way to request that the process terminate (such as by sending it a particular input or IPC message).

Given the above, the Interrupt field seems more ergonomic: it gives the right behavior for Unix users, and if Windows users want to do something more complex they can set Interrupt to nil and start a separate goroutine in between the calls to (*Cmd).Start and (*Cmd).Wait to implement whatever custom logic they want.
