Delay introduced in GotConn breaks CATS #316

Closed
maxmoehl opened this issue Apr 13, 2023 · 2 comments · Fixed by cloudfoundry/gorouter#342

@maxmoehl
Member

After upgrading routing-release to v0.262.0, which includes cloudfoundry/gorouter#337, CATS break with the error message `endpoint_failure (readLoopPeekFailLocked: %!w(<nil>))` [1].

This has been raised on the community slack.

[1] full access log:

2023-04-12T14:24:21.37+0000 [RTR/1] OUT CATS-3-APP-cff0de4ce830eef9.cf.hermione.env.wg-ard.ci.cloudfoundry.org - [2023-04-12T14:24:21.366723458Z] "GET / HTTP/1.1" 502 0 67 "-" "curl/7.64.0" "10.0.3.16:5338" "10.0.32.16:61006" x_forwarded_for:"35.246.186.6, 10.0.3.16" x_forwarded_proto:"http" vcap_request_id:"6b99f92c-85ba-492e-46ae-7947c03f05e6" response_time:0.010030 gorouter_time:0.000251 app_id:"49a15a31-4836-4091-9c7b-79bd143dc945" app_index:"0" instance_id:"a5413569-5b9d-4485-4b62-2c4f" x_cf_routererror:"endpoint_failure (readLoopPeekFailLocked: %!w(<nil>))" x_b3_traceid:"2dd9d057f4e6584c6d2120b753f059ba" x_b3_spanid:"6d2120b753f059ba" x_b3_parentspanid:"-" b3:"2dd9d057f4e6584c6d2120b753f059ba-6d2120b753f059ba"
@maxmoehl
Member Author

I'll prepare a PR to remove the delay but I would like to have a second opinion here. Since @domdom82 is on vacation, maybe @geofffranks can take a look at this?

maxmoehl added a commit to sap-contributions/gorouter that referenced this issue Apr 13, 2023
The delay causes a race condition in the go transport that results in a
502 Bad Gateway with:
  `endpoint_failure (readLoopPeekFailLocked: %!w(<nil>))`.

This happens because the transport peeks the first few bytes on the
connection and gets some data even though it doesn't expect any. It
then enters an error state although the peek returned no error, so the
`%w` formatting directive receives a nil error and prints `%!w(<nil>)`.

This commit removes the delay and adds a note explaining why we can't
do this for now. This will reduce the number of requests we can retry
because the client will send data before we know that the connection is
good. Once we have sent _some_ data we can't be sure that the server
hasn't started processing it, hence no retry in such cases.

See: https://cloudfoundry.slack.com/archives/C033ALST37V/p1680888356483179
See: golang/go#31259
Resolves: cloudfoundry/routing-release#316
geofffranks pushed a commit to cloudfoundry/gorouter that referenced this issue Apr 17, 2023
@geofffranks
Contributor

Turns out I was on vacation too :)

domdom82 pushed a commit to domdom82/routing-release that referenced this issue Jul 12, 2023
This allows modifying and expanding the CAs in a development using ops-files