Improve ISO Date Performance for JSON #30496

Merged
merged 17 commits into from
Jan 2, 2020

Conversation

WillAyd
Member

@WillAyd WillAyd commented Dec 26, 2019

benchmarks below

       before           after         ratio
     [9c6771c5]       [5c0f5682]
     <master>         <json-index-dates>
-        231±60ms          188±3ms     0.82  io.json.ToJSON.time_iso_format('split', 'df_td_int_ts')
-        496±20ms         218±40ms     0.44  io.json.ToJSON.time_iso_format('index', 'df_int_float_str')
-        486±20ms          207±2ms     0.42  io.json.ToJSON.time_iso_format('columns', 'df_int_float_str')
-         499±9ms          210±1ms     0.42  io.json.ToJSON.time_iso_format('index', 'df_date_idx')
-        503±20ms          206±2ms     0.41  io.json.ToJSON.time_iso_format('columns', 'df_int_floats')
-        515±20ms          210±1ms     0.41  io.json.ToJSON.time_iso_format('index', 'df')
-        524±80ms        209±0.5ms     0.40  io.json.ToJSON.time_iso_format('index', 'df_int_floats')
-        528±10ms          206±2ms     0.39  io.json.ToJSON.time_iso_format('columns', 'df_td_int_ts')
-        546±80ms        209±0.7ms     0.38  io.json.ToJSON.time_iso_format('index', 'df_td_int_ts')
-        568±60ms          208±3ms     0.37  io.json.ToJSON.time_iso_format('columns', 'df_date_idx')
-       598±100ms          206±2ms     0.35  io.json.ToJSON.time_iso_format('columns', 'df')

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.

Note that this mostly improves DatetimeIndex (DTI) serialization, which on 0.25.3 can't even be written in ISO format, so I didn't add a whatsnew. Timedelta is the big remaining bottleneck.
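
For readers who want a rough feel for what the `time_iso_format` benchmarks exercise, here is a minimal sketch. It is not the asv benchmark itself: the frame size, the column names, and the `time_iso` helper are assumptions made for illustration. Only `DataFrame.to_json` with `orient` and `date_format="iso"` comes from the pandas public API.

```python
import json
import time

import numpy as np
import pandas as pd

# Hypothetical frame: a DatetimeIndex plus two numeric columns.
N = 1_000
index = pd.date_range("2000-01-01", periods=N, freq="D")
df = pd.DataFrame(
    {"int_col": np.arange(N), "float_col": np.random.randn(N)},
    index=index,
)


def time_iso(frame, orient):
    """Return (elapsed seconds, JSON string) for one ISO-formatted to_json call."""
    start = time.perf_counter()
    payload = frame.to_json(orient=orient, date_format="iso")
    return time.perf_counter() - start, payload


# Time the three orients that appear in the benchmark table.
results = {}
for orient in ("split", "index", "columns"):
    elapsed, payload = time_iso(df, orient)
    results[orient] = payload
    print(f"{orient:>8}: {elapsed * 1000:.2f} ms")
```

Single wall-clock timings like these are noisy; asv repeats each call many times, which is why the table above reports a spread for every measurement.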

@jbrockmendel
Member

CI notwithstanding, this looks like a nice cleanup+speedup

@jreback jreback added the IO JSON (read_json, to_json, json_normalize) and Datetime (Datetime data dtype) labels Dec 27, 2019
@pep8speaks

pep8speaks commented Dec 27, 2019

Hello @WillAyd! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-01-02 00:00:36 UTC

@@ -481,6 +477,19 @@ static char *PyDateTimeToIso(JSOBJ obj, JSONTypeContext *tc, size_t *len) {
return result;
}

/* JSON callback */
static char *PyDateTimeToIsoCallback(JSOBJ obj, JSONTypeContext *tc,
Member Author

In another follow-up I think I'll move the conversion routines to another file and keep only the core JSON serialization functionality within this one; that makes it a little easier to grok the difference between functions used as callbacks and those used to convert values into various formats

@WillAyd WillAyd marked this pull request as ready for review December 27, 2019 20:59
@jbrockmendel
Copy link
Member

LGTM

@jreback jreback added this to the 1.0 milestone Jan 1, 2020
}

castfunc(dataptr, &longVal, 1, NULL, NULL);
Contributor

do you need a Py_DECREF anywhere here?

Member Author

Don't think so - here's a snippet in NumPy where there is no DECREF

https://github.com/numpy/numpy/blob/5ce770ae3de63861c768229573397cadd052f712/numpy/core/src/multiarray/scalarapi.c#L212

And adding one did segfault when I tested it locally
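
As a rough Python-level analogy for the cast discussed in this thread (not the C code from the PR): casting a `datetime64[ns]` value to `int64` yields nanoseconds since the epoch, and that value can then be rendered as an ISO 8601 string. The variable names are illustrative, and `np.datetime_as_string` is only a stand-in for the C conversion routine.

```python
import numpy as np

# One timestamp stored at nanosecond resolution.
stamps = np.array(["2020-01-02T00:00:36"], dtype="datetime64[ns]")

# Cast to int64: the result is the count of nanoseconds since 1970-01-01.
as_int = stamps.astype("int64")

# Format the same value as an ISO 8601 string at second precision.
iso = np.datetime_as_string(stamps, unit="s")

print(as_int[0], iso[0])
```

The cast copies plain integers rather than creating Python objects, so there is nothing to reference-count at this step.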

@jreback jreback merged commit a895ac7 into pandas-dev:master Jan 2, 2020
@jreback
Contributor

jreback commented Jan 2, 2020

thanks, might be worth a general whatsnew note for perf for json changes (e.g. list issues that impacted perf that you have done recently)

@WillAyd WillAyd deleted the json-index-dates branch January 2, 2020 01:09
@WillAyd
Member Author

WillAyd commented Jan 2, 2020

Yea no problem; will address in follow up
