-
-
Notifications
You must be signed in to change notification settings - Fork 18.6k
REF: Implement core._algos #32767
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
REF: Implement core._algos #32767
Changes from 3 commits
1c8141f
0e36d3e
d818d96
31a510e
5ebe2f6
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
""" | ||
core._algos is for algorithms that operate on ndarray and ExtensionArray. | ||
These should: | ||
|
||
- Assume that any Index, Series, or DataFrame objects have already been unwrapped. | ||
- Assume that any list arguments have already been cast to ndarray/EA. | ||
- Not depend on Index, Series, or DataFrame, nor import any of these. | ||
- May dispatch to ExtensionArray methods, but should not import from core.arrays. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't know what else you are planning to put here, but it could also be pure-numpy algos? That would also have value in being very clear. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yah, I have no problem with some functions in here being explicitly ndarray-only. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I wasn't speaking about some functions, but the full file. But so, what other functions do you envision to put here? (as for a single function it's not worth creating a module I think) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
That may end up being a reasonable organization, but it isn't obvious to me. e.g. if we end up with a |
||
""" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,33 @@ | ||
""" | ||
transforms.py is for shape-preserving functions. | ||
""" | ||
|
||
import numpy as np | ||
|
||
from pandas.core.dtypes.common import ensure_platform_int | ||
|
||
|
||
def shift(values: np.ndarray, periods: int, axis: int, fill_value) -> np.ndarray: | ||
new_values = values | ||
|
||
# make sure array sent to np.roll is c_contiguous | ||
f_ordered = values.flags.f_contiguous | ||
if f_ordered: | ||
new_values = new_values.T | ||
axis = new_values.ndim - axis - 1 | ||
|
||
if np.prod(new_values.shape): | ||
new_values = np.roll(new_values, ensure_platform_int(periods), axis=axis) | ||
|
||
axis_indexer = [slice(None)] * values.ndim | ||
if periods > 0: | ||
axis_indexer[axis] = slice(None, periods) | ||
else: | ||
axis_indexer[axis] = slice(periods, None) | ||
new_values[tuple(axis_indexer)] = fill_value | ||
|
||
# restore original order | ||
if f_ordered: | ||
new_values = new_values.T | ||
|
||
return new_values |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -39,6 +39,7 @@ | |
from pandas.core.dtypes.missing import is_valid_nat_for_dtype, isna | ||
|
||
from pandas.core import missing, nanops, ops | ||
from pandas.core._algos.transforms import shift | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you import from "pandas.core._algos" instead? |
||
from pandas.core.algorithms import checked_add_with_arr, take, unique1d, value_counts | ||
from pandas.core.arrays.base import ExtensionArray, ExtensionOpsMixin | ||
import pandas.core.common as com | ||
|
@@ -773,26 +774,7 @@ def shift(self, periods=1, fill_value=None, axis=0): | |
|
||
fill_value = self._unbox_scalar(fill_value) | ||
|
||
new_values = self._data | ||
|
||
# make sure array sent to np.roll is c_contiguous | ||
f_ordered = new_values.flags.f_contiguous | ||
if f_ordered: | ||
new_values = new_values.T | ||
axis = new_values.ndim - axis - 1 | ||
|
||
new_values = np.roll(new_values, periods, axis=axis) | ||
|
||
axis_indexer = [slice(None)] * self.ndim | ||
if periods > 0: | ||
axis_indexer[axis] = slice(None, periods) | ||
else: | ||
axis_indexer[axis] = slice(periods, None) | ||
new_values[tuple(axis_indexer)] = fill_value | ||
|
||
# restore original order | ||
if f_ordered: | ||
new_values = new_values.T | ||
new_values = shift(self._data, periods, axis, fill_value) | ||
|
||
return type(self)._simple_new(new_values, dtype=self.dtype) | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use
algos
instead of_algos
here?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm ambivalent on this, as I do like the non-private name, but we have a bunch of
import algorithms as algos
and i want to avoid ambiguity.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then let’s just create core.algorithms as a subdir
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, that's a good point. The underscore is still only a small difference though. So maybe we should think of yet another name (or actually make algorithms.py into a module and rethink it more broader)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So yes, similar as what @jreback said.
But Brock, you wanted to better distinguish pure array algos, that don't need to deal with series/dataframe etc (which I think is a good goal). But then making algorithms.py into a submodule (and use that for the function in this PR) defeats that purpose?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's my thought too.
Suggestions?