@lukemanley lukemanley commented Dec 10, 2022

@MarcoGorelli - thanks for the ping. I think this gets most of the bottleneck you highlighted:

import pandas as pd

dates = pd.date_range('1900', '2000').tz_localize('+01:00').strftime('%Y-%d-%m %H:%M:%S%z').tolist()
dates.append('2020-01-01 00:00:00+02:00')

%timeit pd.to_datetime(dates, format='%Y-%d-%m %H:%M:%S%z')

# 529 ms ± 24.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)   <- main
# 174 ms ± 977 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)  <- PR
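
For anyone reading this outside IPython, here is a minimal standard-library sketch of the shape of the benchmark input. It is only an illustration, not the code in this PR: `make_dates` and `count_offsets` are hypothetical helper names, and the range is shortened to 1000 days to keep it quick. It shows why the snippet above exercises the mixed-offset path: every string carries +01:00 except the single appended +02:00 entry.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Same day-before-month format string as the benchmark above.
FMT = "%Y-%d-%m %H:%M:%S%z"


def make_dates(start_year=1900, n_days=1000):
    """Build n_days daily strings at +01:00, then append one +02:00 string.

    Hypothetical helper: the real benchmark uses pd.date_range instead.
    """
    tz = timezone(timedelta(hours=1))
    start = datetime(start_year, 1, 1, tzinfo=tz)
    dates = [(start + timedelta(days=i)).strftime(FMT) for i in range(n_days)]
    dates.append("2020-01-01 00:00:00+02:00")
    return dates


def count_offsets(dates):
    """Parse each string and count how many values use each UTC offset."""
    return Counter(datetime.strptime(s, FMT).utcoffset() for s in dates)


dates = make_dates()
offsets = count_offsets(dates)
```

With the defaults, `offsets` should hold two keys: a one-hour offset covering all generated strings and a two-hour offset for the single appended value. That one odd entry is what keeps the result from being treated as a single-timezone series.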

@lukemanley lukemanley added the Datetime and Performance labels Dec 10, 2022
@MarcoGorelli MarcoGorelli left a comment

Well done, thanks! Looks good to me pending green, and nice to see that this could be solved without having to Cythonize.

@MarcoGorelli MarcoGorelli added this to the 2.0 milestone Dec 10, 2022
@MarcoGorelli MarcoGorelli merged commit eb23512 into pandas-dev:main Dec 10, 2022
@lukemanley lukemanley deleted the perf-return-parsed-tz-results branch December 20, 2022 00:46
Successfully merging this pull request may close these issues.

PERF:cythonize _return_parsed_timezone_results?