Skip to content

sql/pgwire: previously-idle connection blocks for ~15 minutes before returning successfully #13823

@mberhault

Description

@mberhault

This could be anything from libpq, TLS connection code, or the server side.

cockroach sha: 019eeb7 (beta-20170216)
libpq sha: ba5d4f7a35561e22fbdf7a39aa0070f4d460cfc0 (latest as of now)

The shorturl example creates a sql.DB object at startup and reuses it forever.
The load is incredibly rare, some at startup (create table if not exists and a few inserts/updates), then only upon requests to https://crdb.io.

When no requests have been issued for a while (seemingly just a few minutes), the first request issued will block for ~15 minutes, then finally return successfully, and with the right statements results.

The client goroutine profile shows it sitting in:

goroutine 14 [IO wait, 14 minutes]:
net.runtime_pollWait(0x7f34966b5dc8, 0x72, 0x3)
	/home/marc/go/src/runtime/netpoll.go:160 +0x59
net.(*pollDesc).wait(0xc42004e450, 0x72, 0xc420166f90, 0xc420018130)
	/home/marc/go/src/net/fd_poll_runtime.go:73 +0x38
net.(*pollDesc).waitRead(0xc42004e450, 0x93dfe0, 0xc420018130)
	/home/marc/go/src/net/fd_poll_runtime.go:78 +0x34
net.(*netFD).Read(0xc42004e3f0, 0xc420181400, 0x400, 0x400, 0x0, 0x93dfe0, 0xc420018130)
	/home/marc/go/src/net/fd_unix.go:243 +0x1a1
net.(*conn).Read(0xc42014a020, 0xc420181400, 0x400, 0x400, 0x0, 0x0, 0x0)
	/home/marc/go/src/net/net.go:173 +0x70
crypto/tls.(*block).readFromUntil(0xc420131710, 0x7f3496679138, 0xc42014a020, 0x5, 0xc42014a020, 0x8bfe0d044af9e97f)
	/home/marc/go/src/crypto/tls/conn.go:476 +0x91
crypto/tls.(*Conn).readRecord(0xc420161180, 0x7e3d17, 0xc420161288, 0x0)
	/home/marc/go/src/crypto/tls/conn.go:578 +0xc4
crypto/tls.(*Conn).Read(0xc420161180, 0xc42024b000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
	/home/marc/go/src/crypto/tls/conn.go:1113 +0x116
bufio.(*Reader).fill(0xc420058cc0)
	/home/marc/go/src/bufio/bufio.go:97 +0x10c
bufio.(*Reader).Read(0xc420058cc0, 0xc4200d90a0, 0x5, 0x200, 0x2, 0x7e3050, 0x0)
	/home/marc/go/src/bufio/bufio.go:209 +0x1bc
io.ReadAtLeast(0x93baa0, 0xc420058cc0, 0xc4200d90a0, 0x5, 0x200, 0x5, 0xc42016a0c0, 0x20, 0x18)
	/home/marc/go/src/io/io.go:307 +0xa4
io.ReadFull(0x93baa0, 0xc420058cc0, 0xc4200d90a0, 0x5, 0x200, 0x45a4f0, 0xc420024800, 0x7f34967114b0)
	/home/marc/go/src/io/io.go:325 +0x58
github.com/lib/pq.(*conn).recvMessage(0xc4200d9080, 0xc42016a0c0, 0x733720, 0x1, 0x97dc10)
	/home/marc/cockroach/src/github.com/lib/pq/conn.go:957 +0x117
github.com/lib/pq.(*conn).recv1Buf(0xc4200d9080, 0xc42016a0c0, 0x0)
	/home/marc/cockroach/src/github.com/lib/pq/conn.go:1007 +0x39
github.com/lib/pq.(*conn).recv1(0xc4200d9080, 0x49579a, 0xc420161180)
	/home/marc/cockroach/src/github.com/lib/pq/conn.go:1028 +0x87
github.com/lib/pq.(*conn).readParseResponse(0xc4200d9080)
	/home/marc/cockroach/src/github.com/lib/pq/conn.go:1574 +0x2f
github.com/lib/pq.(*conn).prepareTo(0xc4200d9080, 0x7c6dc9, 0x6e, 0x0, 0x0, 0x3ca)
	/home/marc/cockroach/src/github.com/lib/pq/conn.go:784 +0x63c
github.com/lib/pq.(*conn).query(0xc4200d9080, 0x7c6dc9, 0x6e, 0xc4201440b0, 0x1, 0x1, 0x0, 0x0, 0x0)
	/home/marc/cockroach/src/github.com/lib/pq/conn.go:855 +0x3ae
github.com/lib/pq.(*conn).Query(0xc4200d9080, 0x7c6dc9, 0x6e, 0xc4201440b0, 0x1, 0x1, 0x1, 0x0, 0x0, 0x10)
	/home/marc/cockroach/src/github.com/lib/pq/conn.go:827 +0x64
database/sql.(*DB).queryConn(0xc420105810, 0xc4201487e0, 0xc4201440a0, 0x7c6dc9, 0x6e, 0xc420167a60, 0x1, 0x1, 0xc420144090, 0x0, ...)
	/home/marc/go/src/database/sql/sql.go:1092 +0x5d7
database/sql.(*DB).query(0xc420105810, 0x7c6dc9, 0x6e, 0xc420167a60, 0x1, 0x1, 0x1, 0xc420010901, 0xc420144090, 0xc4201679d8)
	/home/marc/go/src/database/sql/sql.go:1079 +0xf0
database/sql.(*DB).Query(0xc420105810, 0x7c6dc9, 0x6e, 0xc420167a60, 0x1, 0x1, 0x1, 0x9500000000000000, 0x5)
	/home/marc/go/src/database/sql/sql.go:1062 +0x90
main.getShortysByOwner(0xc42016a0a0, 0x16, 0x0, 0x0, 0x0, 0x0, 0x0)
	/home/marc/cockroach/src/github.com/cockroachdb/examples-go/shorturl/db.go:148 +0x181
main.handleNew(0x9408e0, 0xc4201540d0, 0xc42023a780)
	/home/marc/cockroach/src/github.com/cockroachdb/examples-go/shorturl/handlers.go:92 +0xac
main.handleSettings(0x9408e0, 0xc4201540d0, 0xc42023a780)
	/home/marc/cockroach/src/github.com/cockroachdb/examples-go/shorturl/handlers.go:73 +0x249
main.handleRoot(0x9408e0, 0xc4201540d0, 0xc42023a780)
	/home/marc/cockroach/src/github.com/cockroachdb/examples-go/shorturl/handlers.go:41 +0xff
net/http.HandlerFunc.ServeHTTP(0x7e3458, 0x9408e0, 0xc4201540d0, 0xc42023a780)
	/home/marc/go/src/net/http/server.go:1726 +0x44
net/http.(*ServeMux).ServeHTTP(0x961380, 0x9408e0, 0xc4201540d0, 0xc42023a780)
	/home/marc/go/src/net/http/server.go:2022 +0x7f
net/http.serverHandler.ServeHTTP(0xc42013d000, 0x9408e0, 0xc4201540d0, 0xc42023a780)
	/home/marc/go/src/net/http/server.go:2202 +0x7d
net/http.(*conn).serve(0xc420072c00, 0x9410e0, 0xc420011000)
	/home/marc/go/src/net/http/server.go:1579 +0x4b7
created by net/http.(*Server).Serve
	/home/marc/go/src/net/http/server.go:2293 +0x44d

oauth_proxy log:

206.252.208.94 - [email protected] [27/Feb/2017:18:16:33 +0000] crdb.io GET 127.0.0.1:8080 "/" HTTP/1.1 "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36" 200 1426 935.473

Showing 935s before successful response.
Strangely enough, slow responses always seem to be 935.X seconds. No others were found

Chrome shows:
req_details

During that time, netstat shows a single active tcp connection from the client to the cockroach cluster:

cockroach@shared-0001:~$ netstat -n|grep 26257
tcp        0    160 192.168.1.4:59236       40.70.216.11:26257      ESTABLISHED

However, that one cockroach node (cockroach-catrina-0003) shows many:

cockroach@cockroach-catrina-0003:~$ netstat -n|grep 13\.77
tcp6       0      0 192.168.1.6:26257       13.77.108.155:42468     ESTABLISHED
tcp6       0      0 192.168.1.6:26257       13.77.108.155:35464     ESTABLISHED
tcp6       0      0 192.168.1.6:26257       13.77.108.155:42916     ESTABLISHED
tcp6       0      0 192.168.1.6:26257       13.77.108.155:57832     ESTABLISHED
tcp6       0      0 192.168.1.6:26257       13.77.108.155:54214     ESTABLISHED
tcp6       0      0 192.168.1.6:26257       13.77.108.155:59562     ESTABLISHED
tcp6       0      0 192.168.1.6:26257       13.77.108.155:49342     ESTABLISHED
tcp6       0      0 192.168.1.6:26257       13.77.108.155:59236     ESTABLISHED
tcp6       0      0 192.168.1.6:26257       13.77.108.155:55074     ESTABLISHED
tcp6       0      0 192.168.1.6:26257       13.77.108.155:58614     ESTABLISHED
tcp6       0      0 192.168.1.6:26257       13.77.108.155:53834     ESTABLISHED
tcp6       0      0 192.168.1.6:26257       13.77.108.155:33986     ESTABLISHED
tcp6       0      0 192.168.1.6:26257       13.77.108.155:41164     ESTABLISHED
tcp6       0      0 192.168.1.6:26257       13.77.108.155:53384     ESTABLISHED
tcp6       0      0 192.168.1.6:26257       13.77.108.155:59656     ESTABLISHED
tcp6       0      0 192.168.1.6:26257       13.77.108.155:58912     ESTABLISHED
tcp6       0      0 192.168.1.6:26257       13.77.108.155:53778     ESTABLISHED
tcp6       0      0 192.168.1.6:26257       13.77.108.155:35144     ESTABLISHED
tcp6       0      0 192.168.1.6:26257       13.77.108.155:55374     ESTABLISHED
tcp6       0      0 192.168.1.6:26257       13.77.108.155:55528     ESTABLISHED
tcp6       0      0 192.168.1.6:26257       13.77.108.155:57758     ESTABLISHED

No particular messages in the cockroach node logs.
/debug/events shows nothing out of the ordinary, the time spent in sql is reasonable:

2017/02/27 18:09:29.402223	3220.552906	13.77.108.155:59236
#   github.com/cockroachdb/cockroach/pkg/util/log.WithEventLog			/go/src/github.com/cockroachdb/cockroach/pkg/util/log/trace.go:66
#   github.com/cockroachdb/cockroach/pkg/sql.NewSession				/go/src/github.com/cockroachdb/cockroach/pkg/sql/session.go:153
#   github.com/cockroachdb/cockroach/pkg/sql/pgwire.(*v3Conn).setupSession	/go/src/github.com/cockroachdb/cockroach/pkg/sql/pgwire/v3.go:309
#   github.com/cockroachdb/cockroach/pkg/sql/pgwire.(*v3Conn).serve		/go/src/github.com/cockroachdb/cockroach/pkg/sql/pgwire/v3.go:333
#   github.com/cockroachdb/cockroach/pkg/sql/pgwire.(*Server).ServeConn		/go/src/github.com/cockroachdb/cockroach/pkg/sql/pgwire/server.go:415
#   github.com/cockroachdb/cockroach/pkg/server.(*Server).Start.func8.1		/go/src/github.com/cockroachdb/cockroach/pkg/server/server.go:536
#   github.com/cockroachdb/cockroach/pkg/util/netutil.(*Server).ServeWith.func1	/go/src/github.com/cockroachdb/cockroach/pkg/util/netutil/net.go:137
18:09:29.402821	 .   598	... [client=13.77.108.155:59236,user=shorty,n3] session: requesting 10240 bytes from the pool
18:09:29.402830	 .     9	... [client=13.77.108.155:59236,user=shorty,n3] preparing: SELECT id, url, public, date_added FROM urls WHERE added_by = $1 AND reserved = false ORDER BY date_added DESC
18:09:29.429146	 . 26315	... [client=13.77.108.155:59236,user=shorty,n3] client.Txn did AutoCommit. err: <nil> txn: "unnamed" id=90a53b47 key=/Table/11/1/51/1/2017-02-27T18:13:42.112225Z/3/0 rw=true pri=0.01511914 iso=SERIALIZABLE stat=COMMITTED epo=0 ts=1488218969.418401338,0 orig=1488218969.418401338,0 max=1488218969.918401338,0 wto=false rop=false
18:09:29.430185	 .  1040	... [client=13.77.108.155:59236,user=shorty,n3] SELECT id, url, public, date_added FROM urls WHERE (added_by = $1) AND (reserved = false) ORDER BY date_added DESC
18:09:29.438038	 .  7853	... [client=13.77.108.155:59236,user=shorty,n3] SELECT 1 done

However, the number of sql connections reported by cockroach is continuously increasing, even long after the client has been restarted.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions