fix leaking connections when user client closes connection #220
Conversation
Signed-off-by: Teo Koon Peng <[email protected]>
Thank you, this is a really great change, and a helpful commit message too! I will add you to the organization so you can push to the googlecloudrobotics/core repo, as otherwise the presubmits will fail with a permissions error.
```go
			}
		}
	}))
	defer func() { ts.Close() }()
```
I think this can be `defer ts.Close()`.
```go
	if err := r.start(backendAddress); err != nil {
		t.Fatal("failed to start relay: ", err)
	}
	defer func() {
```
I would just do `defer r.stop()` here. As far as I can tell, `t.Fatal()` inside a deferred function is not handled well: golang/go#29207
```go
// StopRelayRequest forgets a relaying request, this causes the next chunk from the backend
// with the relay id to not be recognized, resulting in the relay server returning an error.
func (r *broker) StopRelayRequest(requestId string) {
	delete(r.resp, requestId)
```
Please hold the mutex while accessing the map; Go maps are not safe for concurrent access.
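A minimal, self-contained sketch of what the reviewer is asking for. The `broker` struct and its field names here are assumptions for illustration, not the real definitions from the relay server:

```go
package main

import (
	"fmt"
	"sync"
)

// broker is a stand-in for the relay server's broker; only the fields
// needed to illustrate the locking are included.
type broker struct {
	m    sync.Mutex
	resp map[string]chan string
}

// StopRelayRequest forgets a relaying request. Go maps are not safe for
// concurrent access, so the mutex must be held around the delete.
func (r *broker) StopRelayRequest(requestId string) {
	r.m.Lock()
	defer r.m.Unlock()
	delete(r.resp, requestId)
}

func main() {
	b := &broker{resp: map[string]chan string{"req-1": make(chan string)}}

	// Concurrent callers are now safe: every delete happens under the lock.
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			b.StopRelayRequest("req-1")
		}()
	}
	wg.Wait()
	fmt.Println("entries left:", len(b.resp))
}
```

Without the lock, running the concurrent deletes under the race detector (`go run -race`) would flag a data race; `delete` on an absent key is a no-op, so repeated calls are harmless once serialized.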
```diff
@@ -384,7 +396,7 @@ func (c *Client) postResponse(remote *http.Client, br *pb.HttpResponse) error {
 		return fmt.Errorf("couldn't read relay server's response body: %v", err)
 	}
 	if resp.StatusCode != http.StatusOK {
-		err := fmt.Errorf("relay server responded %s: %s", http.StatusText(resp.StatusCode), body)
+		err := NewRelayServerError(fmt.Sprintf("relay server responded %s: %s", http.StatusText(resp.StatusCode), body))
 		if resp.StatusCode == http.StatusBadRequest {
 			// http-relay-server may have restarted during the request.
```
Please extend this comment to say "or the client cancelled the request."
```diff
@@ -643,8 +655,11 @@ func (c *Client) handleRequest(remote *http.Client, local *http.Client, pbreq *p
 			log.Printf("[%s] Failed to post response to relay: %v", *resp.Id, err)
 		},
 	)
-	if _, ok := err.(*backoff.PermanentError); ok {
+	if _, ok := err.(*RelayServerError); ok {
```
This is a change in behavior: before, only HTTP 400 would terminate the loop; now anything other than HTTP 200 terminates it. For example, if the nginx ingress is overloaded and returns 500s for a while, this will now terminate the request. I think this is a good thing: there is no reason for us to drop a chunk from the middle of the response body, and it's better to terminate the request.

I think you could go even further: any error returned from `RetryNotify` indicates that we aren't sure we've posted the response, and that we should stop this loop rather than incorrectly continuing with the next response chunk. WDYT?

Please also delete the next comment, which is no longer true (this could be a "transient" 5xx error that lasted too long). Maybe: `// The relay server was unreachable for too long, so we dropped the chunk and should abort the request.`
That sounds good; silently dropping chunks would cause clients to receive corrupted data, which breaks the normal assumptions of HTTP/TCP connections.
Moved to #222

Thanks for the comments, I updated the other PR with the suggestions here.
Same as #220, but ported to this repo.

---

When the user client closes the connection, this is not propagated to the backend. For simple "one shot" requests there is no huge impact, as the response would just be dropped. However, for long-running requests (chunked encoding or websockets), the backend doesn't know that the user client has closed the connection and keeps sending new updates to the relay client, which also doesn't know and so keeps forwarding them to the relay server.

The fix applied is:

* Add `StopRelayRequest` to the broker to "forget" a relaying request. This causes the relay server to respond with a permanent error the next time the relay client tries to send a response for that id, which in turn causes the relay client to close the connection to the backend.
* Fix a bug in the relay client's check for backoff permanent errors. When the backoff operation encounters a permanent error, it actually unwraps it and returns the underlying error, but the client was still checking for `backoff.Permanent`, so the check always failed and the error was never handled.

---

Signed-off-by: Teo Koon Peng <[email protected]>