Skip to content

CLN: use self.dtype internally in Categorical #22513

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Aug 30, 2018

Conversation

topper-123
Copy link
Contributor

Some clean-up that makes the code a bit easier to read IMO.

@pep8speaks
Copy link

pep8speaks commented Aug 26, 2018

Hello @topper-123! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on August 26, 2018 at 17:24 Hours UTC

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks fine, can u do a perf check

@codecov
Copy link

codecov bot commented Aug 26, 2018

Codecov Report

❗ No coverage uploaded for pull request base (master@fa47b8d). Click here to learn what that means.
The diff coverage is 71.42%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master   #22513   +/-   ##
=========================================
  Coverage          ?   92.03%           
=========================================
  Files             ?      169           
  Lines             ?    50780           
  Branches          ?        0           
=========================================
  Hits              ?    46737           
  Misses            ?     4043           
  Partials          ?        0
Flag Coverage Δ
#multiple 90.44% <71.42%> (?)
#single 42.22% <14.28%> (?)
Impacted Files Coverage Δ
pandas/core/arrays/categorical.py 95.75% <71.42%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fa47b8d...70ddcef. Read the comment docs.

@topper-123
Copy link
Contributor Author

topper-123 commented Aug 26, 2018

Ran ASV on benchmarks/categoricals.py. Result are varying a bit, which I attribute to randomness. For example:

      before           after         ratio
     [fa47b8d9]       [70ddcef7]
+          5.65μs           13.7μs     2.42  categoricals.CategoricalSlicing.time_getitem_list_like('monotonic_incr')
+          5.76ms           6.70ms     1.16  categoricals.CategoricalSlicing.time_getitem_bool_array('monotonic_incr')
+          3.42μs           3.91μs     1.14  categoricals.CategoricalSlicing.time_getitem_scalar('monotonic_decr')
+           610μs            689μs     1.13  categoricals.CategoricalSlicing.time_getitem_list('monotonic_decr')
-          6.12μs           5.55μs     0.91  categoricals.CategoricalSlicing.time_getitem_list_like('monotonic_decr')
-          6.46μs           5.65μs     0.88  categoricals.CategoricalSlicing.time_getitem_list_like('non_monotonic')
-          3.66μs           3.20μs     0.88  categoricals.CategoricalSlicing.time_getitem_scalar('monotonic_incr')
-          6.08ms           5.21ms     0.86  categoricals.CategoricalSlicing.time_getitem_bool_array('monotonic_decr')
-          4.66μs           4.00μs     0.86  categoricals.CategoricalSlicing.time_getitem_slice('non_monotonic')
-           729μs            567μs     0.78  categoricals.CategoricalSlicing.time_getitem_list('monotonic_incr')

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.

and

      before           after         ratio
     [fa47b8d9]       [70ddcef7]
+          15.6ms           26.0ms     1.67  categoricals.CategoricalSlicing.time_getitem_bool_array('non_monotonic')
+           648μs            933μs     1.44  categoricals.CategoricalSlicing.time_getitem_list('monotonic_decr')
+           613μs            854μs     1.39  categoricals.CategoricalSlicing.time_getitem_list('monotonic_incr')
+          4.88μs           6.10μs     1.25  categoricals.CategoricalSlicing.time_getitem_list_like('monotonic_incr')
-          3.66μs           3.20μs     0.88  categoricals.CategoricalSlicing.time_getitem_scalar('monotonic_incr')
-          4.39μs           3.50μs     0.80  categoricals.CategoricalSlicing.time_getitem_scalar('non_monotonic')
-           729μs            539μs     0.74  categoricals.CategoricalSlicing.time_getitem_list('non_monotonic')
-          6.70ms           4.69ms     0.70  categoricals.CategoricalSlicing.time_getitem_bool_array('monotonic_incr')

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.

Notice how categoricals.CategoricalSlicing.time_getitem_bool_array('monotonic_incr') in one run is slow (1.16) and th other fast (0.70).

Overall, I'd say this has is no significant effect on speed.

@gfyoung gfyoung added Categorical Categorical Data Type Clean labels Aug 27, 2018
Copy link
Member

@gfyoung gfyoung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The benchmarks look alright to me.

cc @jreback

@jreback jreback added this to the 0.24.0 milestone Aug 28, 2018
@jreback
Copy link
Contributor

jreback commented Aug 28, 2018

@TomAugspurger a look pls. lgtm to me.

@TomAugspurger TomAugspurger merged commit 1a798be into pandas-dev:master Aug 30, 2018
@topper-123 topper-123 deleted the clean_categorical branch August 30, 2018 19:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Categorical Categorical Data Type Clean
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants