Clients get 503s on server GracefulStop under heavy load. #1387

@gitsen

Please answer these questions before submitting your issue.

What version of gRPC are you using?

v1.4.0

What version of Go are you using (go version)?

1.8

What operating system (Linux, Windows, …) and version?

Linux, Ubuntu 14.04.5 LTS

What did you do?

If possible, provide a recipe for reproducing the error.
Our server handles about 1.5 million RPM. On a GracefulStop of the gRPC server, we get 503s from our network proxy (Envoy, https://github.com/lyft/envoy).
We have a hot-restart mechanism that uses SO_REUSEPORT to start a new server and drain the old one. The new server starts up fine and begins handling requests, while clients still connected to the old server report 503s (as that server is doing a graceful stop).
According to Matt Klein at Lyft (@mattklein123), the following could be a potential issue:
There is a race condition inherent with GOAWAY and HTTP/2. Basically, the GOAWAY can cross with new streams being sent. Those streams would then be reset by the server that sent GOAWAY. There is a workaround that people use (which Envoy does) which I'm sure Go is not doing. That workaround is basically to send 2 GOAWAY frames with a delay between them. The first GOAWAY has its last stream ID set to the max stream ID; after a delay, a real GOAWAY is sent.

What did you expect to see?

We expected to see a clean draining of the requests and no 5xx

What did you see instead?

503s from the client.
