net: race where Dialer.DialContext returns connections with altered write deadlines #16523

@dsnet

Description

Consider the following test which starts a serial chain of reverse proxies and then slams the chain with many concurrent requests.

package main

import (
    "crypto/tls"
    "fmt"
    "net/http"
    "net/http/httputil"
    "net/url"
    "os"
    "sync"
    "sync/atomic"
    "time"
)

const (
    // Adjust these two values to control frequency of i/o error occurrence.
    // numConcurrentRequests seems to have greater effect on my system.
    numProxyChain = 20
    numConcurrentRequests = 20

    port = 8080
    topPort = port + numProxyChain
)

func main() {
    // Start a chain of reverse proxies.
    go http.ListenAndServe(fmt.Sprintf("0.0.0.0:%d", port), http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        if _, err := w.Write([]byte("the quick brown fox jumped over the lazy dog")); err != nil {
            panic(err)
        }
    }))
    for i := port + 1; i <= topPort; i++ {
        url, _ := url.Parse(fmt.Sprintf("http://localhost:%d", i-1))
        rp := httputil.NewSingleHostReverseProxy(url)
        rp.Transport = &http.Transport{
            TLSClientConfig: &tls.Config{InsecureSkipVerify: true}, // This is crucial. Don't know why!
        }
        go http.ListenAndServe(fmt.Sprintf("0.0.0.0:%d", i), rp)
    }

    var fail, total uint64
    go func() {
        // Stall until servers are ready.
        for {
            resp, err := http.Get(fmt.Sprintf("http://localhost:%d", topPort))
            if err == nil {
                resp.Body.Close()
                break
            }
        }

        // Slam the servers!
        for {
            var wg sync.WaitGroup
            for i := 0; i < numConcurrentRequests; i++ {
                wg.Add(1)
                go func() {
                    defer wg.Done()
                    atomic.AddUint64(&total, 1)
                    now := time.Now()
                    resp, err := http.Get(fmt.Sprintf("http://localhost:%d", topPort))
                    d := time.Since(now)
                    if err != nil {
                        atomic.AddUint64(&fail, 1)
                        fmt.Println(err, d)
                        return
                    }
                    if resp.Status != "200 OK" {
                        atomic.AddUint64(&fail, 1)
                        fmt.Println(resp.Status, d)
                    }
                    resp.Body.Close()
                }()
            }
            wg.Wait()
        }
    }()

    time.Sleep(5 * time.Second)
    nf, nt := atomic.LoadUint64(&fail), atomic.LoadUint64(&total)
    fmt.Printf("Failure rate %0.2f%% (%d of %d)\n", float64(nf)/float64(nt)*100, nf, nt)
    if nf > 0 {
        os.Exit(1)
    }
}

Running this test on go1.7rc3 produces several failed GET requests:

...
2016/07/27 22:41:32 http: proxy error: write tcp 127.0.0.1:54671->127.0.0.1:8080: i/o timeout
502 Bad Gateway 41.756424ms
2016/07/27 22:41:33 http: proxy error: write tcp 127.0.0.1:58984->127.0.0.1:8081: i/o timeout
502 Bad Gateway 29.08107ms
2016/07/27 22:41:33 http: proxy error: write tcp 127.0.0.1:59777->127.0.0.1:8080: i/o timeout
502 Bad Gateway 35.419027ms
Failure rate 0.70% (19 of 2720)

Running the same test on go1.6.2 produces no failed GET requests:

Failure rate 0.00% (0 of 2920)

I don't believe it is proper behavior to fail with an i/o timeout error after only ~40ms of real time; something is causing the connection to time out far too early. Git bisect identifies 1518d43 as the source of this issue.

This is a regression from go1.6.2.

/CC @bradfitz @broady @adg

Labels: FrozenDueToAge, NeedsInvestigation
