PERF: avoid copy in concatenate_array_managers if reindex already copies #44559
pandas/core/internals/concat.py
Outdated
@@ -94,7 +99,7 @@ def _concatenate_array_managers(
 # concatting along the columns -> combine reindexed arrays in a single manager
 assert concat_axis == 0
 arrays = list(itertools.chain.from_iterable([mgr.arrays for mgr in mgrs]))
-if copy:
+if copy and axis1_needs_copy_this:
might be cleaner (and avoid copies in corner cases) to do this up in the for loop right before mgrs.append(mgr)
Thanks, good idea, and that also simplified it quite a bit
lgtm @jbrockmendel comment could be addressed here
pandas/core/internals/concat.py
Outdated
@@ -77,10 +77,16 @@ def _concatenate_array_managers(
 # reindex all arrays
 mgrs = []
 for mgr, indexers in mgrs_indexers:
+axis1_needs_copy = True
nitpick again: elsewhere in this file we do basically the same thing with made_copy = False
@jbrockmendel #42797 added an unconditional copy, but if the arrays are reindexed, you are already sure to have a copy.
This gives a 20% improvement on some of the merge benchmarks.
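For illustration, a minimal sketch of the pattern discussed in this thread: track whether reindexing already produced a new array, and only copy when it did not. This is not the pandas implementation. `reindex_and_maybe_copy` is a hypothetical helper, plain NumPy arrays stand in for the manager's arrays, and `made_copy` follows the naming suggested in the review.

```python
import numpy as np


def reindex_and_maybe_copy(arrays_with_indexers, copy=True):
    """Hypothetical helper (not pandas API): reindex each array and copy
    only when the reindexing step did not already allocate a new one."""
    out = []
    for arr, indexer in arrays_with_indexers:
        made_copy = False
        if indexer is not None:
            # ndarray.take with an integer indexer returns a new array,
            # so the result no longer shares memory with the input.
            arr = arr.take(indexer)
            made_copy = True
        if copy and not made_copy:
            # Only pay for an explicit copy when nothing was reindexed.
            arr = arr.copy()
        out.append(arr)
    return out


a = np.arange(5)
b = np.arange(5, 10)
result = reindex_and_maybe_copy([(a, None), (b, np.array([4, 3, 2, 1, 0]))])
no_copy = reindex_and_maybe_copy([(a, None)], copy=False)
```

Here `a` is copied explicitly because it has no indexer, while `b` is copied only once, by `take`. With `copy=False`, an array without an indexer is passed through unchanged.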