Issue description
I manage an application called orchestrator, which talks to thousands of MySQL servers. It detects MySQL failures and, if a master or intermediate master fails, rearranges the cluster's replication topology so that the cluster can continue to be used. It uses a fairly short connect timeout of 1 second.
The problem is that we get quite a lot of errors when connecting to "the database" before performing operations such as reconfiguring a server or changing which other MySQL server it replicates from. These failed connections are my concern. See https://jira.percona.com/browse/DISTMYSQL-261 for some context on the issue from the application side.
Orchestrator currently uses v1.6.0 of the MySQL driver. I'm aware there's a newer version, v1.7.0, but as far as I can see it doesn't change the logic discussed here.
The error message itself, "driver: bad connection", is extremely vague. It does not indicate the actual problem, and the logging does not show which host:port it happened on. Since I'm continually polling a large number of MySQL servers, and orchestrator is intended to determine whether each MySQL host is healthy, identifying the source and the exact cause of a connection failure is important.
The bad connection errors are not that frequent, but they do add up. With the current code it's hard to identify either the source of the issue or the specific condition that caused it from a single log line.
I see that the driver currently groups several different conditions under the same umbrella error, errBadConnNoWrite, and in some cases it logs the underlying error separately from the error returned to the caller.
Given Go's error wrapping support (errors.Is, errors.As and %w), it would seem better to split the errBadConnNoWrite errors returned to the caller into a separate error for each condition. Each one would wrap the specific error the driver picked up earlier, so the full error reaches the caller and can be identified more completely, while applications can still detect the general case with errors.Is(err, errBadConnNoWrite).
Ideally, I'd like the driver to include something like the mc.cfg.Addr value (the host being talked to) in the error. If that is not considered acceptable, the caller would have to be adapted to record this information for every connection so that it can be logged alongside any error received from the driver.
Summarising: the exact cause of "driver: bad connection" errors is not clearly identified in the error returned to the caller.
I believe it would be good to return a more detailed error that explicitly distinguishes each of the cases where it is returned, and to suggest that users check for errBadConnNoWrite with errors.Is() if they need the existing behaviour. If possible, include the address of the host where the error happens.
If there is a better way to identify the specific errors and the host they correspond to, please share your thoughts.
Error log
See: https://jira.percona.com/browse/DISTMYSQL-261 for some details/examples
Configuration
Driver version (or git SHA):
- v1.6.0
Go version:
- Not entirely sure. This is seen on CentOS 8, where the version pulled in by yum is 1.18.4.
- Current quay.io/centos/centos:stream8 docker images now show 1.19.2.
- The packages were built in Nov '22, so the version may be a bit older. I can find out if needed.
Server version:
- MySQL 8.0, currently versions 8.0.28 to 8.0.31
- MySQL 5.7.40 or so
Server OS:
- CentOS 8, x86_64