Skip to content

net/http: HTTP/2 with MaxConnsPerHost hangs or crashes #34941

Closed
@rhysh

Description

@rhysh

What version of Go are you using (go version)?

$ go1.13 version
go version go1.13.1 darwin/amd64
$ go-tip version
go version devel +f6c624a22a Wed Oct 16 15:58:33 2019 +0000 darwin/amd64
$ go1.12 version
go version go1.12.10 darwin/amd64
$ go1.11 version
go version go1.11.13 darwin/amd64

Does this issue reproduce with the latest release?

Yes: The test hangs with Go 1.13. The test panics with tip.

What operating system and processor architecture are you using (go env)?

go env Output
$ go1.13 env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/rhys/Library/Caches/go-build"
GOENV="/Users/rhys/Library/Application Support/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GONOPROXY="*"
GONOSUMDB="*"
GOOS="darwin"
GOPATH="/Users/rhys/go"
GOPRIVATE="*"
GOPROXY="direct"
GOROOT="/Users/rhys/go/version/go1.13"
GOSUMDB="off"
GOTMPDIR=""
GOTOOLDIR="/Users/rhys/go/version/go1.13/pkg/tool/darwin_amd64"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/49/zmds5zsn75z1283vtzxyfr5hj7yjq4/T/go-build216795429=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

Using an http.Client with an http.Transport configured to use http/2 and with MaxConnsPerHost = 1, I started a large number of concurrent http/2 requests: more than the server would allow on a single TCP connection.

In the test environment, I did this by having the server delay its responses until I knew a large number of requests were either active on the connection or waiting for permission to use the connection, at which point I allowed the http.Handler to respond.

What did you expect to see?

I expected that every request I started would eventually be seen by the server, and that the client would receive the server's response.

What did you see instead?

In Go 1.11 where the default behavior is to use a single TCP connection for each http/2 authority, the requests queue up and once I allow the handler to return they are all processed. That's fine in this context.

In Go 1.12 where the http.Transport is correctly able to use more than one TCP connection for each http/2 authority (but doesn't yet know how to use MaxConnsPerHost), it passes all of the requests on to the server immediately. That's also fine in this context.

In Go 1.13, the server only sees the first 249 requests. The others appear stuck indefinitely. I see that as a bug.

In tip, the Transport panics. That's a second bug.


$ go1.13 test -v ./repro_test.go
=== RUN   TestMaxConns
2019/10/16 14:27:43 http: TLS handshake error from 127.0.0.1:50300: remote error: tls: bad certificate
--- FAIL: TestMaxConns (1.15s)
    repro_test.go:82: starts=249 finishes=0
    repro_test.go:82: starts=249 finishes=0
    repro_test.go:82: starts=249 finishes=0
    repro_test.go:82: starts=249 finishes=0
    repro_test.go:82: starts=249 finishes=0
    repro_test.go:82: starts=249 finishes=249
    repro_test.go:82: starts=249 finishes=249
    repro_test.go:82: starts=249 finishes=249
    repro_test.go:82: starts=249 finishes=249
    repro_test.go:82: starts=249 finishes=249
    repro_test.go:86: HTTP/2 requests started: 249 != 300
    repro_test.go:89: HTTP/2 requests completed: 249 != 300
FAIL
FAIL    command-line-arguments  2.287s
FAIL

$ go-tip test -v ./repro_test.go
=== RUN   TestMaxConns
2019/10/16 14:27:57 http: TLS handshake error from 127.0.0.1:50603: remote error: tls: bad certificate
panic: net/http: internal error: connCount underflow

goroutine 371 [running]:
net/http.(*Transport).decConnsPerHost(0xc0000d63c0, 0x0, 0x0, 0xc0000c6540, 0x5, 0xc0000160d0, 0xf, 0x0)
        /usr/local/go/src/net/http/transport.go:1318 +0x604
net/http.(*Transport).roundTrip(0xc0000d63c0, 0xc00024f200, 0x0, 0xc000626ac0, 0x100e518)
        /usr/local/go/src/net/http/transport.go:542 +0x762
net/http.(*Transport).RoundTrip(0xc0000d63c0, 0xc00024f200, 0xc0000d63c0, 0x0, 0x0)
        /usr/local/go/src/net/http/roundtrip.go:17 +0x35
net/http.send(0xc00024f200, 0x13bd4c0, 0xc0000d63c0, 0x0, 0x0, 0x0, 0xc000010338, 0x241ef00, 0x1, 0x0)
        /usr/local/go/src/net/http/client.go:252 +0x43e
net/http.(*Client).send(0xc000096e10, 0xc00024f200, 0x0, 0x0, 0x0, 0xc000010338, 0x0, 0x1, 0x20300000000000)
        /usr/local/go/src/net/http/client.go:176 +0xfa
net/http.(*Client).do(0xc000096e10, 0xc00024f200, 0x0, 0x0, 0x0)
        /usr/local/go/src/net/http/client.go:692 +0x3c8
net/http.(*Client).Do(...)
        /usr/local/go/src/net/http/client.go:560
command-line-arguments.TestMaxConns.func2(0xc00040ca70, 0xc00009e2f8, 0xc00040e700, 0xc000590468, 0xc000086300)
        /tmp/repro_test.go:69 +0x1fb
created by command-line-arguments.TestMaxConns
        /tmp/repro_test.go:63 +0x451
FAIL    command-line-arguments  1.250s
FAIL

And for context, the passing results with Go 1.11 and Go 1.12:

$ go1.11 test -v ./repro_test.go
=== RUN   TestMaxConns
2019/10/16 14:27:50 http: TLS handshake error from 127.0.0.1:50307: remote error: tls: bad certificate
--- PASS: TestMaxConns (1.15s)
    repro_test.go:82: starts=155 finishes=0
    repro_test.go:82: starts=250 finishes=0
    repro_test.go:82: starts=250 finishes=0
    repro_test.go:82: starts=250 finishes=0
    repro_test.go:82: starts=250 finishes=0
    repro_test.go:82: starts=300 finishes=300
    repro_test.go:82: starts=300 finishes=300
    repro_test.go:82: starts=300 finishes=300
    repro_test.go:82: starts=300 finishes=300
    repro_test.go:82: starts=300 finishes=300
PASS
ok      command-line-arguments  2.281s
$ go1.12 test -v ./repro_test.go
=== RUN   TestMaxConns
2019/10/16 14:27:35 http: TLS handshake error from 127.0.0.1:49994: remote error: tls: bad certificate
--- PASS: TestMaxConns (1.17s)
    repro_test.go:82: starts=84 finishes=0
    repro_test.go:82: starts=174 finishes=0
    repro_test.go:82: starts=263 finishes=0
    repro_test.go:82: starts=300 finishes=0
    repro_test.go:82: starts=300 finishes=0
    repro_test.go:82: starts=300 finishes=300
    repro_test.go:82: starts=300 finishes=300
    repro_test.go:82: starts=300 finishes=300
    repro_test.go:82: starts=300 finishes=300
    repro_test.go:82: starts=300 finishes=300
PASS
ok      command-line-arguments  3.505s

package repro

import (
	"context"
	"crypto/tls"
	"net/http"
	"net/http/httptest"
	"sync"
	"sync/atomic"
	"testing"
	"time"
)

func TestMaxConns(t *testing.T) {
	totalRequests := 300

	allow := make(chan struct{})
	var (
		starts   int64
		finishes int64
	)
	h := func(w http.ResponseWriter, r *http.Request) {
		if !r.ProtoAtLeast(2, 0) {
			t.Errorf("Request is not http/2: %q", r.Proto)
			return
		}
		atomic.AddInt64(&starts, 1)
		<-allow
	}

	s := httptest.NewUnstartedServer(http.HandlerFunc(h))
	s.TLS = &tls.Config{
		NextProtos: []string{"h2"},
	}
	s.StartTLS()
	defer s.Close()

	transport := s.Client().Transport.(*http.Transport)
	clientConfig := transport.TLSClientConfig
	transport.TLSClientConfig = nil

	// make a request to trigger HTTP/2 autoconfiguration
	resp, err := s.Client().Get(s.URL)
	if err == nil {
		resp.Body.Close()
	}
	// now allow the client to connect to the ad-hoc test server
	transport.TLSClientConfig.RootCAs = clientConfig.RootCAs

	transport.MaxConnsPerHost = 1

	ctx := context.Background()
	ctx, cancel := context.WithCancel(ctx)
	defer cancel()

	var wg sync.WaitGroup
	for i := 0; i < totalRequests; i++ {
		req, err := http.NewRequest("GET", s.URL, nil)
		if err != nil {
			t.Fatalf("NewRequest: %s", err)
		}
		wg.Add(1)
		go func() {
			defer wg.Done()
			defer atomic.AddInt64(&finishes, 1)
			ctx, cancel := context.WithCancel(ctx)
			defer cancel()
			req = req.WithContext(ctx)
			resp, err := s.Client().Do(req)
			if err != nil {
				return
			}
			resp.Body.Close()
		}()
	}

	for i := 0; i < 10; i++ {
		if i == 5 {
			close(allow)
		}
		time.Sleep(100 * time.Millisecond)
		t.Logf("starts=%d finishes=%d", atomic.LoadInt64(&starts), atomic.LoadInt64(&finishes))
	}

	if have, want := atomic.LoadInt64(&starts), int64(totalRequests); have != want {
		t.Errorf("HTTP/2 requests started: %d != %d", have, want)
	}
	if have, want := atomic.LoadInt64(&finishes), int64(totalRequests); have != want {
		t.Errorf("HTTP/2 requests completed: %d != %d", have, want)
	}
}

Metadata

Metadata

Assignees

No one assigned

    Labels

    FrozenDueToAgeNeedsInvestigationSomeone must examine and confirm this is a valid issue and not a duplicate of an existing one.

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions