Skip to content

[CRaC] DefaultLifecycleProcessor and stopping beans on first refresh #34510

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
rvansa opened this issue Feb 27, 2025 · 8 comments
Closed

[CRaC] DefaultLifecycleProcessor and stopping beans on first refresh #34510

rvansa opened this issue Feb 27, 2025 · 8 comments
Labels
status: declined A suggestion or change that we don't feel we should currently apply

Comments

@rvansa
Copy link

rvansa commented Feb 27, 2025

Hi,

I started hacking a solution that would close Cassandra connections on checkpoint, following [1] and registering a Lifecycle that would let the Cassandra session 'suspend'. The checkpoint and restore works in my prototype when I issue the checkpoint via jcmd <pid> JDK.checkpoint, but if I try automatic checkpoint through -Dspring.context.checkpoint=onRefresh the Lifecycle.stop() is not called: at this point DefaultLifecycleProcessor.running is false and stopForRestart() is not invoked.

I would like to ask what's the reason and if there's any better pattern; [2] explains that the connections are established too early (before first refresh). Is that a show-stopper? We can close those connections and keep going.

An alternative solution would be to not use the Lifecycle and register as a org.crac.Resource; however I understand that it's preferred to integrate the handling into Spring using that rather than directly.

CC @sdeleuze

[1] https://github.com/spring-projects/spring-boot/blob/main/spring-boot-project/spring-boot-autoconfigure/src/main/java/org/springframework/boot/autoconfigure/jdbc/DataSourceCheckpointRestoreConfiguration.java
[2] spring-projects/spring-data-cassandra#1486

@jhoeller
Copy link
Contributor

Generally, Lifecycle.stop() is only invoked for components that received a Lifecycle.start() callback before, i.e. only what got started actually needs to be stopped. As long as the activity that you intend to stop in stop() only gets started in start(), you should be fine.

This is also the pattern followed for an onRefresh checkpoint: We aim to not even start the components there yet, in order to avoid the need for explicit stopping to begin with. Only for later checkpoints at runtime, there is an actual need to stop components.

For core framework components, we explicitly have all such activity in start() methods rather than regular init methods for that reason. In that sense, init methods are mirrored by destroy methods (matching bean instance creation/destruction), while start methods are mirrored by stop methods (for dynamic activities that can get started/stopped/restarted in the same bean instances).

@rvansa
Copy link
Author

rvansa commented Feb 28, 2025

@jhoeller Thanks for those answers! The symmetry makes sense - could you make any suggestion for integration of components that don't have a proper lifecycle and do startup as part of bean instantiation? I've checked Cassandra code and the CqlSession (DefaultSession) starts the connections from constructor eagerly, while it serves as a bean, too, so it's instantiated in order to be injected. Not sure about proxy options here. I might be able to override a handful of components to enforce a lazy start, but that feels rather intrusive.

I've placed a few breakpoints into the starting application and it seems that DefaultLifecycleProcessor.start() is not invoked at all (.refresh() is invoked from ApplicationContext <- SpringApplication.run), it becomes running by the refresh, only the SmartLifecycle beans get started as part of the refresh. So effectively this is breaking the concept that a component has to be started to become running.

I've implemented the lifecycle with running = true from the start, which is probably a hack. However since component stopping already checks if Lifecycle.isRunning(), it sounds safe to ignore the status of DLP itself (just a little excessive to go through the beans to see that, in the general case, none is started).

@sdeleuze
Copy link
Contributor

@rvansa Is the CqlSession exposed as a Spring bean by Spring Data/Boot ?

@sdeleuze sdeleuze added the status: waiting-for-feedback We need additional information before we can continue label Feb 28, 2025
@rvansa
Copy link
Author

rvansa commented Mar 3, 2025

@sdeleuze Yes, it is produced by CassandraAutoConfiguration:

        @Bean
	@ConditionalOnMissingBean
	@Lazy
	public CqlSession cassandraSession(CqlSessionBuilder cqlSessionBuilder)

@spring-projects-issues spring-projects-issues added status: feedback-provided Feedback has been provided and removed status: waiting-for-feedback We need additional information before we can continue labels Mar 3, 2025
@sdeleuze
Copy link
Contributor

sdeleuze commented Mar 5, 2025

If am not sure we have something actionable on Spring Framework side. I mean, the choice to expose directly beans using a third-party class which open a connection very early at constructor level is not a good fit with training run use cases. I see 2 potential ways to solve this.

Either at Spring Boot level (see related issue above) or Spring Data level, the CqlSession is wrapped into a proper lifecycle bean (which would have side effects).

Or alternatively, on Cassandra side you can either ask them a way to not create the connection eagerly in the constructor, or ask them to introduce an org.crac.Resource internally.

@rvansa Are you fine with us closing this issue?

@sdeleuze sdeleuze added status: waiting-for-feedback We need additional information before we can continue and removed status: feedback-provided Feedback has been provided labels Mar 5, 2025
@rvansa
Copy link
Author

rvansa commented Mar 6, 2025

the CqlSession is wrapped into a proper lifecycle bean (which would have side effects).

Would that mean that application code requesting CqlSession to be injected would receive only a proxy that instantiates the actual DefaultSession lazily on first use (which would be actually forced after refresh through lifecycle)?

Or alternatively, on Cassandra side you can either ask them a way to not create the connection eagerly in the constructor

That would be a significant change in behaviour, IMO requiring major release. And it's not up to the fail-fast principle.

or ask them to introduce an org.crac.Resource internally

Technically I think this would be the best solution. However from my experience projects are not too keen on accepting an external dependency; maintainers often consider that a job of integrating frameworks like Spring.

A middle-ground solution is exposing .suspend()/.resume() methods that would hide the access to internal components, and wrapping invocation through the SB lifecycle. That requires more cooperation and more time, but could satisfy both sides.

@sdeleuze So it depends whether you want to offer CRaC support, and whether it should be available in weeks/months/years.

@spring-projects-issues spring-projects-issues added status: feedback-provided Feedback has been provided and removed status: waiting-for-feedback We need additional information before we can continue labels Mar 6, 2025
@rvansa
Copy link
Author

rvansa commented Mar 6, 2025

I've posted to https://lists.apache.org/thread/9sms1sk8fd739mp7699wrbj0vnd0kzd1 to see what is Cassandra community view on this.

@sdeleuze
Copy link
Contributor

sdeleuze commented Mar 7, 2025

I think our strategy is to provide a reasonable support for CRaC when we can, but we don't want to invest too much trying to fight against either by design limitations of the technology or lack of support in libraries.

JVM AOT cache from Project Leyden is going to provide gradually caracteristics close to what Project CRaC provides today (even if we won't reach exactly the same level of startup time reduction) with less compatibility issues on all OS. Since we have to chose where we put most of our time and energy, I think that AOT cache is a better investment.

@mp911de has explained some Cassandra level limitations in the Spring Boot issue that has been declined, and I am also going to decline this one.

@sdeleuze sdeleuze added status: declined A suggestion or change that we don't feel we should currently apply and removed status: waiting-for-triage An issue we've not yet triaged or decided on status: feedback-provided Feedback has been provided labels Mar 7, 2025
@sdeleuze sdeleuze closed this as not planned Won't fix, can't repro, duplicate, stale Mar 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status: declined A suggestion or change that we don't feel we should currently apply
Projects
None yet
Development

No branches or pull requests

4 participants