Support Session suspend and resume #2034
base: 4.x
Conversation
Hi, is there anything I could improve about this PR? Are the CI failures known issues, or are they something I could address?
If the purpose of CRaC is to take a consistent snapshot of the system, then how does it persist the memory state of the JVM process? I am thinking about all the in-flight requests whose responses the client application may be waiting for. My initial thought is that we should wait for all of them to complete before considering the Cassandra part suspended.
@lukasz-antoniak The necessity of waiting for in-flight requests depends on the use case. Checkpoint is certainly a disruptive operation, and we expect it to be executed when the application is in a quiescent state - for example, after removing the node from the load balancer. In other cases the checkpoint is executed in a staging environment after a build and a warmup load, though in that case we might need a way to reconfigure (add) nodes.
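For illustration, a minimal sketch of how an application could implement the drain-before-checkpoint step discussed above itself, for use cases that do need it. Only CqlSession.executeAsync comes from the driver's real API; the InFlightTracker class and its method names are hypothetical:

```java
import java.time.Duration;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.AsyncResultSet;

/**
 * Hypothetical helper: tracks this application's in-flight requests so they
 * can be awaited before the session is suspended for a checkpoint.
 */
final class InFlightTracker {
  private final CqlSession session;
  private final Set<CompletableFuture<AsyncResultSet>> inFlight = ConcurrentHashMap.newKeySet();

  InFlightTracker(CqlSession session) {
    this.session = session;
  }

  /** Executes a query asynchronously and remembers the future until it completes. */
  CompletionStage<AsyncResultSet> executeAsync(String query) {
    CompletableFuture<AsyncResultSet> future = session.executeAsync(query).toCompletableFuture();
    inFlight.add(future);
    future.whenComplete((rs, error) -> inFlight.remove(future));
    return future;
  }

  /**
   * Waits (bounded) for all tracked requests. Callers must stop submitting new
   * requests before invoking this, or the snapshot below may miss some. If any
   * request failed, get() throws, but only after all requests have completed.
   */
  void drain(Duration timeout) throws Exception {
    CompletableFuture<?>[] snapshot = inFlight.toArray(new CompletableFuture<?>[0]);
    CompletableFuture.allOf(snapshot).get(timeout.toMillis(), TimeUnit.MILLISECONDS);
  }
}
```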
There are two main areas I wanted to understand better:
Regarding (1) above, you mentioned this on the mailing list:
I’m concerned that restoring a driver session from a checkpoint (rather than close + re-create) could be a source of hard-to-track bugs, due to stale topology metadata, in-progress queue state, etc. Users would also be limited in where they could restore their checkpoints, since driver internal state is dependent on the local datacenter, for example. But if a restored session re-creates connections, then that's likely going to dominate start-up time and make the gains of CRaC less visible. How are you thinking about this trade-off?
I don't claim any familiarity with the current work on CRaC, but I have some concerns here that aren't different from what @lukasz-antoniak and @aratno are talking about above. The question with in-flight requests isn't simply whether the driver will retry them; we use counts of in-flight requests as a proxy for the load on a node in the default load balancing policy. The disconnect between the (potentially very stale) in-flight request data at snapshot time and the current state of the system could very easily lead to some strange LBP operations.

You'd almost be better off somehow capturing the driver state after establishing a control connection but before any individual connections are established... but that brings you into the areas @aratno was referring to (or at least I think he was). The driver gathers a fair amount of metadata when it establishes a control connection to the cluster; if we include this data in any kind of snapshot state, it would have to be revalidated when we (re)connect anyway. But if we're already doing such a revalidation, is there a benefit to pre-loading it in the first place? So much of what the control connection does is built up from information it gets from the server; it just seems hard to imagine how much of it could be safely cached in a way that would give you clear performance wins.

I also haven't spent any time looking at what Project Leyden is doing, but it does seem like AOT classloading might be a reasonable way to get some level of performance gain without delving too far into the handling of information we get from the cluster. I'm very interested to hear how these approaches compare in your thinking @rvansa.
This is a valid concern. I would expect that stale metadata shouldn't affect correctness (distributed applications should tolerate that). Regrettably, I don't have enough insight into Cassandra to speak more concretely - I am roughly basing my expectations on Infinispan, as I spent a couple of years developing it in the past.

The setup of connections is dominated by network latency, and with a local datacenter that means milliseconds, or low tens of milliseconds if the handshake requires multiple round trips. Compare that to an overall startup time of seconds for a small application, and sometimes minutes for legacy leviathans. Anecdotally speaking, CRaC can restore an app from, say, a 200 MB image in 50-100 ms; for 200 GB apps this goes up to ~5 seconds.
@absurdfarce Shouldn't the statistics that affect load balancing be reset when the nodes are forced down? If that bit is missing, could you point me to the parts of the code I should adjust, or ideally a test that validates behaviour that uses those statistics? Normally I would try to keep the PR minimal, but if this would be a severe problem I can try to address it. I think the discussion is a bit vague while we don't have data to back up the claims.

But we're touching on something @aratno mentioned above - CRaC needs all parts of the application to be ready. The application might not be based around the Cassandra driver; that could be just a small part of its interface to a larger system. The driver setup might not be a performance bottleneck at all, so maybe we're not saving much on this front, but we enable savings in a completely different part of the application. That's why my focus here would be correctness, not performance.
This PR is somewhat based on discussions on https://lists.apache.org/thread/9sms1sk8fd739mp7699wrbj0vnd0kzd1
If an application wants to use OpenJDK CRaC, it must terminate all connections to nodes before a checkpoint. Here we expose a high-level API in SessionLifecycleManager without relying on CRaC itself. It is also pretty clear that it poses no risk to applications that won't use this API.
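To make the intended usage concrete, here is a hedged sketch of how an application could bridge CRaC to this API on its own. The org.crac calls are the real library API; the suspend()/resume() names on SessionLifecycleManager are assumptions inferred from the PR title, not signatures confirmed by this diff:

```java
import org.crac.Context;
import org.crac.Core;
import org.crac.Resource;

/**
 * Sketch: bridges OpenJDK CRaC checkpoint/restore notifications to the
 * proposed driver lifecycle API. SessionLifecycleManager's suspend()/resume()
 * are assumed method names, not taken from this PR's diff.
 */
final class CassandraCracResource implements Resource {
  private final SessionLifecycleManager lifecycle; // proposed driver API (package not shown here)

  CassandraCracResource(SessionLifecycleManager lifecycle) {
    this.lifecycle = lifecycle;
    // The global context keeps only a weak reference, so the application
    // must hold on to this object for as long as the session lives.
    Core.getGlobalContext().register(this);
  }

  @Override
  public void beforeCheckpoint(Context<? extends Resource> context) throws Exception {
    // Terminate all node connections so no open sockets end up in the image.
    lifecycle.suspend();
  }

  @Override
  public void afterRestore(Context<? extends Resource> context) throws Exception {
    // Re-establish connections once the process image has been restored.
    lifecycle.resume();
  }
}
```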
Our current goal is to support CRaC checkpoints in Spring Boot applications. My first attempt in spring-projects/spring-boot#44505 was declined because the way the Cassandra Java Driver was accessed seemed too low-level for Spring Boot; I expect that with the API this PR introduces, the Spring Boot integration could just invoke the methods without relying on driver internals.
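A sketch of what that Spring Boot integration could look like. Spring Framework 6.1+ drives Lifecycle beans around CRaC checkpoints (stop before checkpoint, start after restore), so the glue can avoid org.crac entirely. As before, suspend()/resume() are hypothetical method names, and this assumes resume() is safe to call on an already-connected session:

```java
import org.springframework.context.SmartLifecycle;
import org.springframework.stereotype.Component;

/**
 * Sketch: Spring stops Lifecycle beans before a CRaC checkpoint and starts
 * them again after restore, so this bean only needs to delegate to the
 * proposed driver API. suspend()/resume() are assumed names.
 */
@Component
class CassandraSessionLifecycle implements SmartLifecycle {
  private final SessionLifecycleManager lifecycle; // proposed driver API, hypothetical reference
  private volatile boolean running;

  CassandraSessionLifecycle(SessionLifecycleManager lifecycle) {
    this.lifecycle = lifecycle;
  }

  @Override
  public void start() {
    lifecycle.resume(); // called on context refresh and again after a CRaC restore
    running = true;
  }

  @Override
  public void stop() {
    lifecycle.suspend(); // called by Spring just before the checkpoint is taken
    running = false;
  }

  @Override
  public boolean isRunning() {
    return running;
  }
}
```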
I expect that in the future SessionLifecycleManager could expose methods for hinting the driver, when all nodes have died, that it has to reconnect to a completely new node.