-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
REF: share __getitem__ for Categorical/PandasArray/DTA/TDA/PA #36391
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just a question really, nice de-dupes
pandas/core/arrays/categorical.py
Outdated
result = self._codes[key] | ||
if result.ndim > 1: | ||
result = super().__getitem__(key) | ||
if np.ndim(result) > 1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is np.ndim(result) > 1
generally slower than something like not lib.is_scalar(result) and result.ndim > 1
?
result = 5
%timeit np.ndim(result) > 1
2.77 µs ± 48.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit not lib.is_scalar(result) and result.ndim > 1
154 ns ± 1.2 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
or is this not important?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wow that's neat
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not lib.is_scalar(result) and result.ndim > 1?
The trouble here is if we have a tuple result. I'll take a look at finding a more performant solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did wonder about that. but pandas\tests\arrays\categorical and pandas\tests\extension\test_categorical.py didn't fail with this change. Do we have tests for this.
also is it possible to get a nested tuple result? np.ndim would give 2 in this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like getattr(result, "ndim", 1) > 1
outperforms both np.ndim and lib.is_scalar checks, and dont need to worry about tuples. Will update.
also is it possible to get a nested tuple result? np.ndim would give 2 in this case?
Yes. Supporting list-like object-dtype is a massive PITA.
result = self._from_backing_data(result) | ||
return result | ||
|
||
def _validate_getitem_key(self, key): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this duplicates _validate_setitem_key
. is this needed?
in what cases could one of a getitem and a setitem key be valid and other invalid?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DatetimeLikeArrayMixin has special logic here. #36210 is about deprecating that
@jbrockmendel merge away if good here (likely address @simonjayhawkins points as followups?) |
@@ -30,6 +31,9 @@ def _from_backing_data(self: _T, arr: np.ndarray) -> _T: | |||
""" | |||
raise AbstractMethodError(self) | |||
|
|||
def _box_func(self, x): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a docstring for this function?
No description provided.