Skip to content

Create a experimental HealthCheck GRPC Handler #6225

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

alanprot
Copy link
Member

@alanprot alanprot commented Sep 20, 2024

What this PR does:
WIP/Proposal

Create a experimental HealthCheck GRPC Handler.

This health check will evaluate the targets and immediately terminate the request if a target is marked as unhealthy.

Immediately terminating the request when the target is unresponsive provides two major benefits:

  • Prevent overload on distributors
  • Avoid trying to call unhealthy SG and so, retry the request on a healthy one faster.

Which issue(s) this PR fixes:
Fixes #

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@alanprot alanprot force-pushed the health-check-circuit-breaker branch 5 times, most recently from 4f2808a to 93c84ca Compare September 24, 2024 05:12
Copy link
Contributor

@yeya24 yeya24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you help me understand the difference between this health check and the existing gRPC client pool https://github.com/cortexproject/cortex/blob/master/pkg/ring/client/pool.go#L60.

Can we just add the health check to the existing pool client? They seem similar functionality to me.

@alanprot
Copy link
Member Author

alanprot commented Oct 1, 2024

Can you help me understand the difference between this health check and the existing gRPC client pool https://github.com/cortexproject/cortex/blob/master/pkg/ring/client/pool.go#L60.

Can we just add the health check to the existing pool client? They seem similar functionality to me.

The pool only removes the client when the healthcheck fails, but create a new one right away when a new request come in.

func (p *Pool) GetClientFor(addr string) (PoolClient, error) {

This change here is "short circuiting" the requests if the healthcheck fails.

An alternative for this is to create a classic circuit breaker (ex: https://failsafe-go.dev/circuit-breaker/) but i think as a first step using a normal health check config would be simpler.

@alanprot alanprot force-pushed the health-check-circuit-breaker branch from 93c84ca to ad37a39 Compare October 1, 2024 18:33
Signed-off-by: alanprot <[email protected]>
Signed-off-by: alanprot <[email protected]>
@alanprot alanprot force-pushed the health-check-circuit-breaker branch from dc42443 to f886278 Compare October 8, 2024 19:39
@alanprot alanprot marked this pull request as ready for review October 8, 2024 19:39
Signed-off-by: alanprot <[email protected]>
@alanprot alanprot force-pushed the health-check-circuit-breaker branch from f886278 to 0ea9d5b Compare October 8, 2024 19:51
Copy link
Contributor

@yeya24 yeya24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Oct 8, 2024
@alanprot alanprot merged commit e6e9fea into cortexproject:master Oct 8, 2024
16 checks passed
@alanprot alanprot mentioned this pull request Oct 10, 2024
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants