os: StartProcess clearing O_NONBLOCK can lead to goroutine hang on Linux #43894

@lmb

What version of Go are you using (go version)?

go version go1.15.7 linux/amd64

Does this issue reproduce with the latest release?

Yes

What did you do?

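// The package name is arbitrary; save this as a _test.go file and run it with go test.
package main

import (
	"net"
	"os"
	"testing"
	"time"
)

func TestHang(t *testing.T) {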
	c, err := net.Dial("udp", "127.0.0.1:0")
	if err != nil {
		t.Fatal(err)
	}
	defer c.Close()

	f, err := c.(*net.UDPConn).File()
	if err != nil {
		t.Fatal(err)
	}
	defer f.Close()

	cat, err := os.StartProcess("/bin/cat", []string{"/bin/cat"}, &os.ProcAttr{
		Files: []*os.File{os.Stdin, os.Stdout, os.Stderr, f},
	})
	if err != nil {
		t.Fatal(err)
	}
	defer cat.Wait()
	defer cat.Kill()

	go func() {
		time.Sleep(time.Second)
		t.Log("Closing")
		c.Close()
	}()

	buf := make([]byte, 1)
	// This read never returns.
	t.Log(c.Read(buf))
}

What did you expect to see?

The read on c returns after the goroutine calls Close(). The same problem exists for other syscalls like accept.

What did you see instead?

The read on c blocks indefinitely. The root cause is that os.StartProcess calls f.Fd() which in turn calls poll.FD.SetBlocking. That method does the following:

  1. Clear O_NONBLOCK from the file status flags of the open file description
  2. Set fd.isBlocking to true

The problem in the example above is that O_NONBLOCK lives on the open file description, which f and c share, but only f.isBlocking is ever set to true. The read on c is therefore stuck in a blocking read(2) that the poller cannot interrupt. The call to c.Close() sees that c.isBlocking == 0 and blocks in runtime_Semacquire(&fd.csema) in poll.FD.Close, waiting for that read to release its reference. The same interaction is possible when using exec.Cmd.ExtraFiles, since that ends up calling os.StartProcess.
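To illustrate the sharing described above, here is a minimal Linux-only sketch that uses raw syscalls instead of the net and os packages. It sets O_NONBLOCK on one end of a pipe, duplicates the descriptor, clears the flag through the duplicate, and then checks the original. The helper names isNonblocking and sharedFlagDemo are made up for this example and are not part of any Go API.

```go
package main

import (
	"fmt"
	"syscall"
)

// isNonblocking reports whether O_NONBLOCK is set in the file status
// flags of the open file description that fd refers to.
func isNonblocking(fd int) (bool, error) {
	flags, _, errno := syscall.Syscall(syscall.SYS_FCNTL, uintptr(fd), syscall.F_GETFL, 0)
	if errno != 0 {
		return false, errno
	}
	return flags&syscall.O_NONBLOCK != 0, nil
}

// sharedFlagDemo (hypothetical helper) returns the O_NONBLOCK state of the
// original descriptor before and after the flag is cleared via a duplicate.
func sharedFlagDemo() (bool, bool, error) {
	var p [2]int
	if err := syscall.Pipe(p[:]); err != nil {
		return false, false, err
	}
	defer syscall.Close(p[0])
	defer syscall.Close(p[1])

	// Put the read end into non-blocking mode, as the runtime does for sockets.
	if err := syscall.SetNonblock(p[0], true); err != nil {
		return false, false, err
	}
	before, err := isNonblocking(p[0])
	if err != nil {
		return false, false, err
	}

	// dup(2) creates a second descriptor for the same open file description.
	dup, err := syscall.Dup(p[0])
	if err != nil {
		return false, false, err
	}
	defer syscall.Close(dup)

	// Clear O_NONBLOCK through the duplicate only.
	if err := syscall.SetNonblock(dup, false); err != nil {
		return false, false, err
	}
	after, err := isNonblocking(p[0])
	if err != nil {
		return false, false, err
	}
	return before, after, nil
}

func main() {
	before, after, err := sharedFlagDemo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("original non-blocking before:", before)
	fmt.Println("original non-blocking after clearing via dup:", after)
}
```

The original descriptor is never touched after the dup, yet its flag changes. This mirrors what happens to c when the flag is cleared through f.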

I think there are two related problems here:

  1. Clearing O_NONBLOCK from an fd can hang the Go runtime.
  2. StartProcess clears O_NONBLOCK.

This behaviour is the root cause of a bug I fixed in my library: cloudflare/tableflip@cae714b. While writing the library I was aware that the runtime clears O_NONBLOCK when File.Fd() is called, and I tried my best to avoid the problem, but I still got stung by it.

Labels: FrozenDueToAge, NeedsFix (the path to resolution is known, but the work has not been done), OS-Linux