Skip to content

Explicit handling of nested splits in SplitState #3903

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed

Explicit handling of nested splits in SplitState #3903

wants to merge 1 commit into from

Conversation

hpoettker
Copy link
Contributor

Fixes #3857 and allows to control the maximal number of concurrently running steps with the thread pool size of the TaskExecutor.

Nested splits are explicitly handled in SplitState and do not require dedicated threads. The added test stalls if executed against main.

@benas Do you see a chance to merge this PR (with requested changes, of course) such that it is included in Spring Batch 5.0?

@fmbenhassine
Copy link
Contributor

Hi @hpoettker, thank you for your PR!

@benas Do you see a chance to merge this PR (with requested changes, of course) such that it is included in Spring Batch 5.0?

I did not check this PR and its corresponding bug report in details yet, but it seems that it could introduce a small breaking change. So yes, this could be addressed in v5. However, I'm not sure it will be in v5.0 as we are planning to focus on major changes first. This is a good candidate for a minor release of the v5 line (v5.1, v5.2, etc). Please follow the updates on this issue to see in which release it will be addressed. Thank you.

@hpoettker
Copy link
Contributor Author

Thanks for the update! I'd also consider the PR a breaking change in behavior although I assume that no one knowingly relies on the current behavior as it is rather unexpected.

Just for completeness: The fix would also solve a problem in the Data Flow's Composed Task Runner which actively uses nested splits and the thread pool size of a ThreadPoolTaskExecutor to control concurreny: spring-cloud/spring-cloud-dataflow#4279

@hpoettker
Copy link
Contributor Author

I've resolved some merge conflicts and bumped the year in the copyright header.

It would be great to have open discussion about the PR and the issue that it addresses. I think it's closely linked to the deprecation of the throttle limit, i.e. #2218. Concurrency of batch steps should be fully controllable by using only a suitable TaskExecutor. This is currently not possible if nested splits are involved as explained in the conversation of #3857.

Input from the Spring Cloud Dataflow team would be particularly interesting as the Composed Task Runner would benefit greatly from this change, in my opinion.

@hpoettker
Copy link
Contributor Author

This PR is now more than 2 years old but I'm convinced more than ever that it addresses an important issue. I think that the concept of the chunk-oriented step in Spring Batch is extremely valuable. But it can only reach its full potential if it allows to scale batch jobs in a straightforward manner by "just" adding a thread pool. The removal of the throttle limit is the most important part of this story. But concurrency control for split jobs follows right after that in my opinion.

This is a good candidate for a minor release of the v5 line (v5.1, v5.2, etc).

As we're already closer in time to the release of 5.1 than that of 5.0, I'd say that this is now!

Please share your ideas on this topic. I'd be happy to implement any change or test that is required.

@fmbenhassine
Copy link
Contributor

Yes, this PR can now be merged. I will plan it for 5.1.

Please share your ideas on this topic

Intra-step concurrency or step-level concurrency (ie processing items concurrently within the same step) is different than inter-step / job-level concurrency (ie processing entire steps concurrently or in parallel within a flow). Those two types of concurrency are at different levels and should not be related in any way.

This PR addresses inter-step concurrency, and I don't think the deprecation of throttle-limit is related to the handling of nested splits. The only case I see a possible entanglement is probably when the same task executor is used to run both step flows and concurrent threads of the same step. The throttle-limit is really an implementation detail of the TaskExecutorRepeatTemplate. If I provide a custom RepeatTemplate that does not use a throttle-limit, I am not sure I would solve #3857 anyway.

I correctly read your comment that The removal of the throttle limit is the most important part of this story. But concurrency control for split jobs follows right after that in my opinion and I agree. I just want to make clear that those are two different, unrelated issues.

This PR addresses #3857 and will be merged in 5.1. However, the deprecation/removal of throttle-limit will be reviewed as part of #3950.

@fmbenhassine
Copy link
Contributor

Rebased and merged as 58a4134. Thanks again for yet another great contribution!

@hpoettker hpoettker deleted the nested-splits branch July 21, 2023 17:05
@hpoettker
Copy link
Contributor Author

@fmbenhassine Thanks for sharing your insight and clearly delineating the distinction between job-level and step-level concurrency. It's really appreciated!

The throttle limit is indeed not directly related to this job-level concurreny issue. They are only connected in the broader sense that both are about controlling the maximum degree of concurrency in a job/step.

From a user perspective, I think it is nice if I can reliably control concurrency by passing task executors with the desired maximal thread pool size. For the job level, this didn't work before this change as jobs experienced liveness failure when thread pools were too small or executed a far smaller number of steps in parallel than the maximal thread pool size. The ideal behavior is of course that there are no liveness failures in any case and that as many steps are executed in parallel as there are threads assigned to this purpose (and where the flow of the job allows it).

The same should be true for the step-level analogously. Here the problem is currently more that it's not enough to provide a large thread pool if I want a high degree of concurrency because I still have to set the throttle limit or a custom RepeatTemplate. But this is of course addressed by the deprecation of the throttle limit.

The plans for the step level in #3950 are much more ambitious but also very welcome. Is there anything the community could do to assist?

@hpoettker
Copy link
Contributor Author

hpoettker commented Sep 3, 2023

I'm wondering whether my reasoning about concurrency control above through thread pools still makes sense in the age of virtual threads. It may very well not! The threads for the job level don't do much work at all and the threads for the step level are usually much more IO bound than CPU bound.

Thinking along this line could lead to the conclusion that the throttle limit should not be removed. It should rather be introduced for the job level too. Then I could use newVirtualThreadPerTaskExecutor for both levels but still have caps to not overload any systems.

But despite the lively discussion in #1008, I'm not sure that the overhead of platform threads is a real-life problem for batch work loads. 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SplitBuilder.add(Flow) causes hung execution in some cases
2 participants