Skip to content

Simplify concurrent.futures.process code by using itertools.batched() #114221

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jan 27, 2024

Conversation

NewUserHa
Copy link
Contributor

Advantages:

  • use new standard library, to encourage users to also use stdlibs.
d=list(range(50000))

list(concurrent.futures.process._get_chunks(d, chunksize=1))
# 105 ms ± 4.43 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

list(_get_chunks(d, chunksize=1))
# 5.87 ms ± 378 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
  • performance. the difference is significant when chunksize is small, and the chunksize is usually =1.

Copy link
Member

@AA-Turner AA-Turner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems fine -- there's a check that chunksize is >=1 in ProcessPoolExecutor.map.

A

@AA-Turner AA-Turner changed the title update concurrent.features.process get_chunks to use new itertools.batched() Use itertools.batched() in concurrent.futures.process._get_chunks Jan 18, 2024
@serhiy-storchaka
Copy link
Member

You can inline the it variable, and the inline _get_chunks() itself. This will remove some overhead for packing/unpacking *iterables.

@NewUserHa
Copy link
Contributor Author

inline _get_chunks() itself means moving its content back to map()?

@serhiy-storchaka
Copy link
Member

Yes, since it is a short one-liner.

Copy link
Member

@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@serhiy-storchaka serhiy-storchaka changed the title Use itertools.batched() in concurrent.futures.process._get_chunks Simplify concurrent.futures.process code by using itertools.batched() Jan 26, 2024
@serhiy-storchaka serhiy-storchaka merged commit 547c135 into python:main Jan 27, 2024
@serhiy-storchaka
Copy link
Member

Thank you for your contribution, @NewUserHa.

But note that usually all but the most trivial changes require opening an issue, and if it is an optimization, its effect should be demonstrated in benchmarks. Not only that some line in a tight loop becomes faster, but that the stdlib code that include this line becomes faster.

I accepted this change as exception, because it makes the code simpler and adds a demonstration of a new itertools function. But the bar may be higher for other changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants