Enable TCP Keepalives on TCP connections to MySQL #194
We recently had a production outage when an Amazon RDS MySQL instance locked up with many requests pending. We rebooted it, and it came up on a new IP, but we had to restart our Go apps to bring the site back up, because they had hit their (*DB).SetMaxOpenConns() limit with outstanding requests. The go-mysql-driver doesn't enable TCP SO_KEEPALIVE and doesn't have any concept of client-side timeouts, so it would have waited forever for a response from a server that no longer exists.
This patch enables the TCP SO_KEEPALIVE option after connecting to MySQL via TCP, which allows the kernel to notice when the server has dropped off the network for an extended period (the Linux default is 2 hours), and then fail the pending read with a TCP error. Without this option set, a TCP connection that is only waiting to read will never time out on its own.
For comparison, the mysql-connector in C (libmysql) unconditionally enables TCP keepalives at sql-common/client.c line 3834.
This also adds a DSN option "keepalivePeriod", which is passed to (*TCPConn).SetKeepAlivePeriod(), to change how often the kernel sends a keepalive packet to the server, soliciting a response.
I tried to use the existing style as much as possible, ran gofmt, updated the tests, and updated the documentation. Let me know if I missed anything. Thanks!
-- Aaron