Skip to content

Conversation

@arjan-bal
Copy link
Contributor

@arjan-bal arjan-bal commented Feb 6, 2025

In #8031, the balancer.Balancer embedding in balancerWrapper was removed. This causes endpointsharding to panic when casting the EndpointMap value to a balancer.Balancer. This PR fixes the bug and adds a test.

RELEASE NOTES: None

@arjan-bal arjan-bal added Type: Bug Area: Resolvers/Balancers Includes LB policy & NR APIs, resolver/balancer/picker wrappers, LB policy impls and utilities. labels Feb 6, 2025
@arjan-bal arjan-bal requested review from dfawley and easwars February 6, 2025 14:25
@arjan-bal arjan-bal added this to the 1.71 Release milestone Feb 6, 2025
@arjan-bal arjan-bal changed the title endpointsharding: Cast value in EndpointMap to balancerWrapper instead of Balancer endpointsharding: Cast EndpointMap values to balancerWrapper instead of Balancer Feb 6, 2025
@arjan-bal arjan-bal changed the title endpointsharding: Cast EndpointMap values to balancerWrapper instead of Balancer endpointsharding: Cast EndpointMap values to *balancerWrapper instead of Balancer Feb 6, 2025
@codecov
Copy link

codecov bot commented Feb 6, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 82.38%. Comparing base (f227ba9) to head (a001dd9).
Report is 2 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8069      +/-   ##
==========================================
+ Coverage   82.25%   82.38%   +0.12%     
==========================================
  Files         387      387              
  Lines       39040    39065      +25     
==========================================
+ Hits        32114    32182      +68     
+ Misses       5593     5557      -36     
+ Partials     1333     1326       -7     
Files with missing lines Coverage Δ
balancer/endpointsharding/endpointsharding.go 89.20% <100.00%> (+11.80%) ⬆️

... and 30 files with indirect coverage changes

@arjan-bal arjan-bal changed the title endpointsharding: Cast EndpointMap values to *balancerWrapper instead of Balancer endpointsharding: cast EndpointMap values to *balancerWrapper instead of Balancer Feb 6, 2025

// Execute the resolver error path. pickfirst will ignore the resolver error
// as it has a good resolver state.
mr.ReportError(errors.New("test error"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This tests that the operation was a nop, which isn't really a good test of the functionality. I.e. the child could not even get the call and the test would still pass. I think it would be better to have a test that calls ResolverError before it's working and ensures that it results in pick errors / TF state.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The children of enpointsharding wouldn't get created if the resolver doesn't produce a good list of endpoints first.
I can use a stub child balancer which modifies pickfirst's behaviour of ignoring resolver errors when it has a working state.

Copy link
Member

@dfawley dfawley Feb 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Or you should be able to make it start failing (disconnect subchannel) and then produce a ResolverError. The errors from pick should change IIUC

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I stopped the backends, reported a resolver error and verified the RPC error message.

@easwars easwars self-requested a review February 6, 2025 16:07
@arjan-bal arjan-bal force-pushed the endpointsharding-fix-resolver-error branch from 6234f46 to 28b14ff Compare February 6, 2025 16:32
@arjan-bal arjan-bal force-pushed the endpointsharding-fix-resolver-error branch from 28b14ff to a96f1db Compare February 6, 2025 16:35
@arjan-bal arjan-bal requested a review from dfawley February 6, 2025 16:46
cc, err := grpc.NewClient(mr.Scheme()+":///", grpc.WithResolvers(mr), grpc.WithTransportCredentials(insecure.NewCredentials()))
dOpts := []grpc.DialOption{
grpc.WithResolvers(mr), grpc.WithTransportCredentials(insecure.NewCredentials()),
// Use a large backoff dealy to avoid the error picker being updated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/dealy/delay

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed the typo.

// too quickly.
grpc.WithConnectParams(grpc.ConnectParams{
Backoff: backoff.Config{
BaseDelay: 100 * time.Second,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit/Optional: This could be 2*defaultTestTimeout instead of an arbitrarily large value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed.

for ; ctx.Err() == nil; <-time.After(time.Millisecond) {
_, err := client.EmptyCall(ctx, &testpb.Empty{})
if err == nil {
t.Fatalf("EmptyCall returned unexpected error: <nil>, want %q", "test error")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: instead of saying unexpected error nil, it could say succeeded when expected to fail with "test error"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed the error message.

@arjan-bal arjan-bal merged commit 9afb49d into grpc:master Feb 7, 2025
15 checks passed
@arjan-bal arjan-bal deleted the endpointsharding-fix-resolver-error branch February 7, 2025 05:40
janardhanvissa pushed a commit to janardhanvissa/grpc-go that referenced this pull request Feb 15, 2025
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 7, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

Area: Resolvers/Balancers Includes LB policy & NR APIs, resolver/balancer/picker wrappers, LB policy impls and utilities. Type: Bug

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants