Revisit the chunk-oriented processing model implementation #3950
Comments
Hi, is there any plan to revisit this any time soon? In particular, I am looking at #1101.
@bwgjoseph This is in progress. The PoC of the new implementation was announced in v5.1 last year: https://docs.spring.io/spring-batch/reference/whatsnew.html#new-chunk-oriented-step-implementation. If you want to help, I would be grateful if you gave the experimental feature a try and shared your feedback (not here, but on the experimental repo's issue tracker as explained here).
It's been a while, let me see if there's anything that I can try out. No promises. Thanks!
I've checked. What would be useful for my team is fault-tolerance, but as described in the repo, the current implementation does not address it.
Problem statement
The current implementation of the chunk-oriented processing model works in most cases, but has several issues related to transaction management and fault-tolerance [1] as well as concurrency [2].
Transaction management and fault-tolerance
I believe the main reason for that is the current structure of the code that uses multiple nested repeat/retry/transaction callbacks as summarized in #1189 (comment). When unfolded, the code looks like the following (this is a copy/paste of the current code, but unfolded):
This structure makes the implementation complex and hard to test and reason about, which limits the maintainability of the code in the long term.
Concurrency model
The current approach to concurrency with a "parallel iteration" concept (based on TaskExecutorRepeatTemplate, ResultHolderResultQueue, RepeatSynchronizationManager, TransactionSynchronizationManager, etc.) is not friendly to concurrent executions, as it requires a lot of state synchronization at different levels (Step level with a semaphore, chunk level with ThreadLocal for execution contexts, and item level with locks). This results in several issues like maxItemCount not being honored in a multi-threaded step, inconsistent state when a transaction is rolled back leading to optimistic locking issues, throttling issues, poor performance, etc. [2].
Possible solution
The goal here is to analyse and refactor the internal implementation of the chunk-oriented processing model with minimal changes to the current behaviour. I believe using a for loop (similar to the pseudo-code mentioned in the docs) coupled with an implementation of the producer-consumer pattern for concurrency would make the code easier to think about, test and maintain in the long term. As a side note, several open source, JSR-352 compliant implementations do use a for loop to implement this chunk-oriented model, so the current code structure and concurrency approach could be reviewed without compromising the correctness of the end result.
Execution plan suggestion
In order to minimize behavioural changes, this issue should be addressed as follows:
- A ChunkOrientedStep class that implements Step (ideally not based on the Tasklet step). This class should not deal with any fault-tolerance features.
- A FaultTolerantChunkOrientedStep class that extends ChunkOrientedStep to add fault-tolerance features.
It would not be possible to change the current builders to accept the old and new implementations at the same time. Therefore, we would need to introduce the new implementation as experimental in v5 and phase the old implementation out over time. The new implementation can be registered directly in job definitions without using the builders.
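To make the proposed split more concrete, here is a minimal sketch of a for-loop based chunk step with a subclass that layers retry on top. It is only an illustration under stated assumptions: SketchChunkStep and SketchFaultTolerantChunkStep are hypothetical names that stand in for the ChunkOrientedStep and FaultTolerantChunkOrientedStep classes suggested above, the reader, processor and writer are modelled with plain JDK types rather than Spring Batch interfaces, and transactions, skipping and concurrency are left out entirely.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical stand-in for the proposed ChunkOrientedStep; not a Spring Batch type.
class SketchChunkStep<I, O> {
    private final Iterator<I> reader;        // assumption: an Iterator models the item reader
    private final Function<I, O> processor;  // may return null to filter an item out
    private final Consumer<List<O>> writer;  // receives one whole chunk per call
    private final int chunkSize;

    SketchChunkStep(Iterator<I> reader, Function<I, O> processor,
                    Consumer<List<O>> writer, int chunkSize) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be >= 1");
        }
        this.reader = reader;
        this.processor = processor;
        this.writer = writer;
        this.chunkSize = chunkSize;
    }

    // Plain loops, no nested callbacks. Each outer iteration handles one chunk,
    // which is where a real step would begin and commit a transaction.
    int execute() {
        int written = 0;
        while (reader.hasNext()) {
            List<O> chunk = new ArrayList<>(chunkSize);
            for (int i = 0; i < chunkSize && reader.hasNext(); i++) {
                O output = process(reader.next());
                if (output != null) {
                    chunk.add(output);
                }
            }
            if (!chunk.isEmpty()) {
                writer.accept(chunk);
                written += chunk.size();
            }
        }
        return written;
    }

    // Extension point: the base class knows nothing about fault tolerance.
    protected O process(I item) {
        return processor.apply(item);
    }
}

// Hypothetical stand-in for the proposed FaultTolerantChunkOrientedStep.
class SketchFaultTolerantChunkStep<I, O> extends SketchChunkStep<I, O> {
    private final int maxAttempts;

    SketchFaultTolerantChunkStep(Iterator<I> reader, Function<I, O> processor,
                                 Consumer<List<O>> writer, int chunkSize, int maxAttempts) {
        super(reader, processor, writer, chunkSize);
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.maxAttempts = maxAttempts;
    }

    // Retries processing of a single item, rethrowing the last failure
    // once maxAttempts is exhausted.
    @Override
    protected O process(I item) {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return super.process(item);
            } catch (RuntimeException e) {
                last = e;
            }
        }
        throw last;
    }
}
```

With a chunk size of 2 and five input items, the base class calls the writer three times (two full chunks and one partial chunk). The fault-tolerant variant only overrides the per-item hook, so the loop itself stays free of retry logic, which is the separation the plan above aims for.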
References:
[1]: Transaction management / Fault tolerance issues
[2]: Concurrency issues