Skip to content

REF: Implement core._algos #32767

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Mar 19, 2020
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 9 additions & 0 deletions pandas/core/_algos/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
"""
core._algos is for algorithms that operate on ndarray and ExtensionArray.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use algos instead of _algos here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ambivalent on this, as I do like the non-private name, but we have a bunch of import algorithms as algos and i want to avoid ambiguity.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then let’s just create core.algorithms as a subdir

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, that's a good point. The underscore is still only a small difference though. So maybe we should think of yet another name (or actually make algorithms.py into a module and rethink it more broader)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So yes, similar as what @jreback said.

But Brock, you wanted to better distinguish pure array algos, that don't need to deal with series/dataframe etc (which I think is a good goal). But then making algorithms.py into a submodule (and use that for the function in this PR) defeats that purpose?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But then making algorithms.py into a submodule (and use that for the function in this PR) defeats that purpose?

That's my thought too.

So maybe we should think of yet another name

Suggestions?

These should:

- Assume that any Index, Series, or DataFrame objects have already been unwrapped.
- Assume that any list arguments have already been cast to ndarray/EA.
- Not depend on Index, Series, or DataFrame, nor import any of these.
- May dispatch to ExtensionArray methods, but should not import from core.arrays.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know what else you are planning to put here, but it could also be pure-numpy algos? That would also have value in being very clear.
Eg the shift you moved here now is strictly numpy arrays.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yah, I have no problem with some functions in here being explicitly ndarray-only.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't speaking about some functions, but the full file.

But so, what other functions do you envision to put here? (as for a single function it's not worth creating a module I think)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't speaking about some functions, but the full file.

That may end up being a reasonable organization, but it isn't obvious to me. e.g. if we end up with a shift that dispatches to shift_ea and shift_ndarray, would we want those to all be in the same file or split across multiple files?

"""
33 changes: 33 additions & 0 deletions pandas/core/_algos/transforms.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,33 @@
"""
transforms.py is for shape-preserving functions.
"""

import numpy as np

from pandas.core.dtypes.common import ensure_platform_int


def shift(values: np.ndarray, periods: int, axis: int, fill_value) -> np.ndarray:
new_values = values

# make sure array sent to np.roll is c_contiguous
f_ordered = values.flags.f_contiguous
if f_ordered:
new_values = new_values.T
axis = new_values.ndim - axis - 1

if np.prod(new_values.shape):
new_values = np.roll(new_values, ensure_platform_int(periods), axis=axis)

axis_indexer = [slice(None)] * values.ndim
if periods > 0:
axis_indexer[axis] = slice(None, periods)
else:
axis_indexer[axis] = slice(periods, None)
new_values[tuple(axis_indexer)] = fill_value

# restore original order
if f_ordered:
new_values = new_values.T

return new_values
22 changes: 2 additions & 20 deletions pandas/core/arrays/datetimelike.py
Original file line number Diff line number Diff line change
Expand Up @@ -39,6 +39,7 @@
from pandas.core.dtypes.missing import is_valid_nat_for_dtype, isna

from pandas.core import missing, nanops, ops
from pandas.core._algos.transforms import shift
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you import from "pandas.core._algos" instead?

from pandas.core.algorithms import checked_add_with_arr, take, unique1d, value_counts
from pandas.core.arrays.base import ExtensionArray, ExtensionOpsMixin
import pandas.core.common as com
Expand Down Expand Up @@ -773,26 +774,7 @@ def shift(self, periods=1, fill_value=None, axis=0):

fill_value = self._unbox_scalar(fill_value)

new_values = self._data

# make sure array sent to np.roll is c_contiguous
f_ordered = new_values.flags.f_contiguous
if f_ordered:
new_values = new_values.T
axis = new_values.ndim - axis - 1

new_values = np.roll(new_values, periods, axis=axis)

axis_indexer = [slice(None)] * self.ndim
if periods > 0:
axis_indexer[axis] = slice(None, periods)
else:
axis_indexer[axis] = slice(periods, None)
new_values[tuple(axis_indexer)] = fill_value

# restore original order
if f_ordered:
new_values = new_values.T
new_values = shift(self._data, periods, axis, fill_value)

return type(self)._simple_new(new_values, dtype=self.dtype)

Expand Down
22 changes: 2 additions & 20 deletions pandas/core/internals/blocks.py
Original file line number Diff line number Diff line change
Expand Up @@ -29,7 +29,6 @@
from pandas.core.dtypes.common import (
_NS_DTYPE,
_TD_DTYPE,
ensure_platform_int,
is_bool_dtype,
is_categorical,
is_categorical_dtype,
Expand Down Expand Up @@ -65,6 +64,7 @@
isna,
)

from pandas.core._algos.transforms import shift
import pandas.core.algorithms as algos
from pandas.core.arrays import (
Categorical,
Expand Down Expand Up @@ -1316,25 +1316,7 @@ def shift(self, periods, axis: int = 0, fill_value=None):
# that, handle boolean etc also
new_values, fill_value = maybe_upcast(self.values, fill_value)

# make sure array sent to np.roll is c_contiguous
f_ordered = new_values.flags.f_contiguous
if f_ordered:
new_values = new_values.T
axis = new_values.ndim - axis - 1

if np.prod(new_values.shape):
new_values = np.roll(new_values, ensure_platform_int(periods), axis=axis)

axis_indexer = [slice(None)] * self.ndim
if periods > 0:
axis_indexer[axis] = slice(None, periods)
else:
axis_indexer[axis] = slice(periods, None)
new_values[tuple(axis_indexer)] = fill_value

# restore original order
if f_ordered:
new_values = new_values.T
new_values = shift(new_values, periods, axis, fill_value)

return [self.make_block(new_values)]

Expand Down