Cu 8699qbr8e Multiprocessing empty batches #35
Merged
Turns out the current multiprocessing pipeline had a few issues.
It would work fine if the number of processes wasn't specified (since it would then process the texts in sequence).
However, if the number of processes specified was 2 (or greater), processing an empty list of texts would fail. This was because the method assumed that work would be distributed across all workers, and consequently that there would be a future for every worker. But with an empty input there were no futures at all.
Furthermore, if the number of processes was 3 (or greater), passing a small list of input texts also often resulted in an exception being raised. With 2 processes, only 1 external worker is spawned; it gets some texts to manage and leaves nothing for the main process (which is fine). But with 3 processes, only 1 of the 2 extra processes gets any work, so the assumed number of futures to wait for was incorrect.
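For illustration, here is a minimal sketch of the general idea: wait only on the futures that were actually submitted, instead of assuming one per worker. This is not the code from this PR. `make_batches`, `annotate_batch`, `run_parallel` and their parameters are hypothetical names, and `concurrent.futures` is assumed as the pool implementation.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor, as_completed
from typing import List, Sequence


def make_batches(texts: Sequence[str], batch_size: int) -> List[List[str]]:
    """Split texts into consecutive batches; an empty input yields no batches."""
    return [list(texts[i:i + batch_size]) for i in range(0, len(texts), batch_size)]


def annotate_batch(batch: List[str]) -> List[str]:
    """Hypothetical stand-in for the real per-batch work."""
    return [text.upper() for text in batch]


def run_parallel(texts: Sequence[str], n_process: int = 2, batch_size: int = 2,
                 executor_cls=ProcessPoolExecutor) -> List[str]:
    """Process texts in parallel, creating one future per batch (not per worker)."""
    batches = make_batches(texts, batch_size)
    if not batches:
        # Empty input: no futures exist, so there is nothing to wait for.
        return []
    # Never start more workers than there are batches to hand out.
    n_workers = max(1, min(n_process, len(batches)))
    results: List[List[str]] = [[] for _ in batches]
    with executor_cls(max_workers=n_workers) as pool:
        # Map each future back to its batch index so input order is preserved.
        futures = {pool.submit(annotate_batch, b): i for i, b in enumerate(batches)}
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return [item for batch in results for item in batch]
```

Passing `executor_cls=ThreadPoolExecutor` exercises the same logic without spawning processes, which can be handy in quick tests. With process pools, the usual `if __name__ == "__main__":` guard applies on platforms that spawn rather than fork.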
This PR fixes the above issues.
It also adds a few simple tests to make sure they remain fixed.
NOTE: I ran the new tests without the change in this PR and they did in fact error out (as was expected due to the above).