Skip to content

Conversation

jahstreet
Copy link
Contributor

@jahstreet jahstreet commented Mar 17, 2025

In #1896 we added a feature to prefer the order of the remote DCs on the x-DC failovers. As described in #1896 (comment), the default is misleading to errors. With this tiny change I propose to default to empty list i/o containing a single empty string value.

cc @nitinitt

@jahstreet jahstreet changed the title Default to empty DC failover preferred remoted DCs for map based config Default to empty DC failover preferred remoted DCs Mar 17, 2025
@jahstreet jahstreet force-pushed the default-to-empty-failover-preferred-dcs branch from 8e1836e to 1872c81 Compare March 17, 2025 08:19
@jahstreet
Copy link
Contributor Author

jahstreet commented Jun 16, 2025

Edited the commit with reformatted code. Also rebased while there.

@jahstreet jahstreet force-pushed the default-to-empty-failover-preferred-dcs branch from 1872c81 to 2f420ab Compare June 16, 2025 08:06
@netudima
Copy link

is it possible to add a unit test to cover this case?

@jahstreet
Copy link
Contributor Author

jahstreet commented Jun 21, 2025

Yes, sure. Should be easy.

@jahstreet
Copy link
Contributor Author

Now that I've checked, I cannot remember back the exact error I observed during my testing and also seems like putting non existing DCs to the preferredRemoteDcs config is kinda makes it no-op and narrows down to behave as its is unset. So this change is needed only for the cosmetic-perfectioninsm purposes (which is optional).

I came to that conclusion after attempting to build a test to reproduce the problem.

I will close this MR until finding back what was causing the issues. Sry for the noise.

@jahstreet jahstreet closed this Jun 23, 2025
@absurdfarce
Copy link
Contributor

So I can demonstrate pretty conclusively that without this patch we are getting a non-empty list for preferred remote DCs in the default case:

@Test
public void default_preferred_dcs_should_be_empty() {

   CqlSession session = new CqlSessionBuilder().build();
   DriverExecutionProfile profile = session.getContext().getConfig().getDefaultProfile();
   assertThat(profile.getStringList(DefaultDriverOption.LOAD_BALANCING_DC_FAILOVER_PREFERRED_REMOTE_DCS)).isEmpty();
}
java.lang.AssertionError: 
Expecting empty but was: [""]

The fix provided by @jahstreet in this PR does indeed make this problem go away (as expected).

That said, I agree with @jahstreet that it's not immediately clear to me that there's a problem here. The fact that the preferred DC list is always non-empty puts us into buildRemoteQueryPlanPreferred() here which means that we'll prefer nodes from a DC whose name is the empty string first in this method. But we won't find any nodes in that DC so... we basically wind up looping through the existing nodes anyway, which leaves us with something very much like what we'd get out of buildRemoteQueryPlanAll().

All of that said, the analysis above is entirely dependent on the current impl of these functions; that could easily change and a change in buildRemoteQueryPlanPreferred() might cause unexpected side effects if it's still being used when we don't expect it to be. I agree with @jahstreet that this is hardly the most significant problem we're facing but it seems to me like (a) we do have a pretty clear error in the code in this PR and (b) we have a very simple fix (in the form of this PR) for it. So my inclination is to say let's merge it.

@netudima is right to ask about a test for this case but honestly I'm not sure how we'd do that. You could try a variation on BasicLoadBalancingPolicyPreferredRemoteDcsTest with a different impl of createAndInitPolicy() which sets up the preferred remote DCs list to be either empty or (probably better) equal to the default. But here you hit a problem based on the behaviour mentioned above; because buildRemoteQueryPlanPreferred() and buildRemoteQueryPlanAll() should produce similar results here there's no way to tell which function was actually used. I suppose there's an argument that it doesn't matter; our only interest is in making sure that the expected value is returned in this case and we don't care which of those two functions actually produces it... as long as we can confirm we get the right output. I guess I'd be okay with that... but it seems like we'd need a pretty good comment (at least) saying what the pitfall is and why we're doing it.

@jahstreet
Copy link
Contributor Author

Thanks @absurdfarce for summarizing. Your comments are always detailed enough while kept straight to the point. I need to learn doing it alike from you!

My initial motivation to do that change was a weird routing bug which I observed in my remote environment causing empty query plan to be returned, which I mistakenly thought was caused by this default setting, but most probably it was something different. I wasn't able to remember and reproduce it back while coming up with the test case.

For now, I see this problem as a Magic String problem, not more than that.

Happy to push it through though if we all agree on it.

wrt the comment - can it be as simple as "Default preferred remote DC config to empty list"?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants