Description
Please answer these questions before submitting your issue.
What version of gRPC are you using?
v1.4.0
What version of Go are you using (go version)?
1.8
What operating system (Linux, Windows, …) and version?
Linux, Ubuntu 14.04.5 LTS
What did you do?
If possible, provide a recipe for reproducing the error.
Our server handles about 1.5 million requests per minute (RPM). We get 503s from our network proxy (Envoy, https://github.com/lyft/envoy) when we call GracefulStop on the gRPC server.
We have a hot restart mechanism that uses SO_REUSEPORT to start a new server and drain the old one. The new server starts up fine and begins handling requests, while clients of the old server report 503s (as that server is doing a graceful stop).
According to Matt Klein at Lyft (@mattklein123), the following could be a potential issue:
There is a race condition inherent with GOAWAY and http/2. Basically, the GOAWAY can cross with new streams being sent. Those streams would then be reset by the server that sent GOAWAY. There is a workaround that people use (which Envoy does) which I'm sure Go is not doing. That workaround is basically to send 2 GOAWAY frames with a delay between them. The first GOAWAY has last stream ID set to max stream ID, after a delay, a real GOAWAY is sent.
What did you expect to see?
We expected the requests to drain cleanly, with no 5xx responses.
What did you see instead?
503s from the client.