Skip to content

Implement DataFrame.__array_ufunc__ #36955

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 20 commits into from
Nov 25, 2020
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
20 commits
Select commit Hold shift + click to select a range
0d725e8
Implement DataFrame.__array_ufunc__
TomAugspurger Oct 6, 2020
c4c1470
remove unnecessary decorator
TomAugspurger Oct 7, 2020
4fcb1a4
Fixup
TomAugspurger Oct 7, 2020
971659e
fixup finalize
TomAugspurger Oct 7, 2020
6bd73dc
whatsnew
TomAugspurger Oct 7, 2020
1085be4
Merge remote-tracking branch 'upstream/master' into frame-array-ufunc
TomAugspurger Oct 7, 2020
0afdf49
fixup
TomAugspurger Oct 7, 2020
2260c83
fixup
TomAugspurger Oct 8, 2020
b3239e2
Merge remote-tracking branch 'upstream/master' into frame-array-ufunc
TomAugspurger Oct 8, 2020
b1d93f5
Merge remote-tracking branch 'upstream/master' into frame-array-ufunc
TomAugspurger Oct 9, 2020
9cfcba1
Merge remote-tracking branch 'upstream/master' into frame-array-ufunc
TomAugspurger Oct 18, 2020
919ebb5
Move to arraylike
TomAugspurger Oct 18, 2020
c73ab12
Merge remote-tracking branch 'upstream/master' into frame-array-ufunc
TomAugspurger Nov 12, 2020
99acb86
Merge remote-tracking branch 'upstream/master' into frame-array-ufunc
TomAugspurger Nov 13, 2020
acfe434
union
TomAugspurger Nov 13, 2020
9a35023
Merge remote-tracking branch 'upstream/master' into frame-array-ufunc
TomAugspurger Nov 14, 2020
2371499
Merge branch 'master' of https://github.com/pandas-dev/pandas into fr…
jbrockmendel Nov 20, 2020
d283446
Merge branch 'master' of https://github.com/pandas-dev/pandas into fr…
jbrockmendel Nov 22, 2020
816c6dc
Merge branch 'master' of https://github.com/pandas-dev/pandas into fr…
jbrockmendel Nov 25, 2020
a6b120a
docstring, typo fixup
jbrockmendel Nov 25, 2020
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions doc/source/whatsnew/v1.2.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -238,6 +238,8 @@ Other enhancements
- :meth:`DatetimeIndex.searchsorted`, :meth:`TimedeltaIndex.searchsorted`, :meth:`PeriodIndex.searchsorted`, and :meth:`Series.searchsorted` with datetimelike dtypes will now try to cast string arguments (listlike and scalar) to the matching datetimelike type (:issue:`36346`)
-
- Added methods :meth:`IntegerArray.prod`, :meth:`IntegerArray.min`, and :meth:`IntegerArray.max` (:issue:`33790`)
- Calling a NumPy ufunc on a ``DataFrame`` with extension types now preserves the extension types when possible (:issue:`23743`).
- Calling a binary-input NumPy ufunc on multiple ``DataFrame`` objects now aligns, matching the behavior of binary operations and ufuncs on ``Series`` (:issue:`23743`).
- Where possible :meth:`RangeIndex.difference` and :meth:`RangeIndex.symmetric_difference` will return :class:`RangeIndex` instead of :class:`Int64Index` (:issue:`36564`)
- :meth:`DataFrame.to_parquet` now supports :class:`MultiIndex` for columns in parquet format (:issue:`34777`)
- Added :meth:`Rolling.sem()` and :meth:`Expanding.sem()` to compute the standard error of mean (:issue:`26476`).
Expand Down Expand Up @@ -470,6 +472,7 @@ Deprecations
- The default value of ``regex`` for :meth:`Series.str.replace` will change from ``True`` to ``False`` in a future release. In addition, single character regular expressions will *not* be treated as literal strings when ``regex=True`` is set. (:issue:`24804`)
- Deprecated automatic alignment on comparison operations between :class:`DataFrame` and :class:`Series`, do ``frame, ser = frame.align(ser, axis=1, copy=False)`` before e.g. ``frame == ser`` (:issue:`28759`)
- :meth:`Rolling.count` with ``min_periods=None`` will default to the size of the window in a future version (:issue:`31302`)
- Using "outer" ufuncs on DataFrames to return 4d ndarray is now deprecated. Convert to an ndarray first (:issue:`23743`)
- Deprecated slice-indexing on timezone-aware :class:`DatetimeIndex` with naive ``datetime`` objects, to match scalar indexing behavior (:issue:`36148`)
- :meth:`Index.ravel` returning a ``np.ndarray`` is deprecated, in the future this will return a view on the same index (:issue:`19956`)
- Deprecate use of strings denoting units with 'M', 'Y' or 'y' in :func:`~pandas.to_timedelta` (:issue:`36666`)
Expand Down Expand Up @@ -750,6 +753,7 @@ Other

- Bug in :meth:`DataFrame.replace` and :meth:`Series.replace` incorrectly raising ``AssertionError`` instead of ``ValueError`` when invalid parameter combinations are passed (:issue:`36045`)
- Bug in :meth:`DataFrame.replace` and :meth:`Series.replace` with numeric values and string ``to_replace`` (:issue:`34789`)
- Fixed metadata propagation in :meth:`Series.abs` and ufuncs called on Series and DataFrames (:issue:`28283`)
- Bug in :meth:`DataFrame.replace` and :meth:`Series.replace` incorrectly casting from ``PeriodDtype`` to object dtype (:issue:`34871`)
- Fixed bug in metadata propagation incorrectly copying DataFrame columns as metadata when the column name overlaps with the metadata name (:issue:`37037`)
- Fixed metadata propagation in the :class:`Series.dt`, :class:`Series.str` accessors, :class:`DataFrame.duplicated`, :class:`DataFrame.stack`, :class:`DataFrame.unstack`, :class:`DataFrame.pivot`, :class:`DataFrame.append`, :class:`DataFrame.diff`, :class:`DataFrame.applymap` and :class:`DataFrame.update` methods (:issue:`28283`) (:issue:`37381`)
Expand Down
144 changes: 143 additions & 1 deletion pandas/core/arraylike.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,8 +5,15 @@
ExtensionArray
"""
import operator
from typing import Any, Callable
import warnings

from pandas.core.ops import roperator
import numpy as np

from pandas._libs import lib

from pandas.core.construction import extract_array
from pandas.core.ops import maybe_dispatch_ufunc_to_dunder_op, roperator
from pandas.core.ops.common import unpack_zerodim_and_defer


Expand Down Expand Up @@ -140,3 +147,138 @@ def __pow__(self, other):
@unpack_zerodim_and_defer("__rpow__")
def __rpow__(self, other):
return self._arith_method(other, roperator.rpow)


def array_ufunc(self, ufunc: Callable, method: str, *inputs: Any, **kwargs: Any):
"""
Compatibility with numpy ufuncs.

See also
--------
numpy.org/doc/stable/reference/arrays.classes.html#numpy.class.__array_ufunc__
"""
from pandas.core.generic import NDFrame
from pandas.core.internals import BlockManager

cls = type(self)

# for binary ops, use our custom dunder methods
result = maybe_dispatch_ufunc_to_dunder_op(self, ufunc, method, *inputs, **kwargs)
if result is not NotImplemented:
return result

# Determine if we should defer.
no_defer = (np.ndarray.__array_ufunc__, cls.__array_ufunc__)

for item in inputs:
higher_priority = (
hasattr(item, "__array_priority__")
and item.__array_priority__ > self.__array_priority__
)
has_array_ufunc = (
hasattr(item, "__array_ufunc__")
and type(item).__array_ufunc__ not in no_defer
and not isinstance(item, self._HANDLED_TYPES)
)
if higher_priority or has_array_ufunc:
return NotImplemented

# align all the inputs.
types = tuple(type(x) for x in inputs)
alignable = [x for x, t in zip(inputs, types) if issubclass(t, NDFrame)]

if len(alignable) > 1:
# This triggers alignment.
# At the moment, there aren't any ufuncs with more than two inputs
# so this ends up just being x1.index | x2.index, but we write
# it to handle *args.

if len(set(types)) > 1:
# We currently don't handle ufunc(DataFrame, Series)
# well. Previously this raised an internal ValueError. We might
# support it someday, so raise a NotImplementedError.
raise NotImplementedError(
"Cannot apply ufunc {} to mixed DataFrame and Series "
"inputs.".format(ufunc)
)
axes = self.axes
for obj in alignable[1:]:
# this relies on the fact that we aren't handling mixed
# series / frame ufuncs.
for i, (ax1, ax2) in enumerate(zip(axes, obj.axes)):
axes[i] = ax1.union(ax2)

reconstruct_axes = dict(zip(self._AXIS_ORDERS, axes))
inputs = tuple(
x.reindex(**reconstruct_axes) if issubclass(t, NDFrame) else x
for x, t in zip(inputs, types)
)
else:
reconstruct_axes = dict(zip(self._AXIS_ORDERS, self.axes))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ideally can you split this function up a bit (if easy)

if self.ndim == 1:
names = [getattr(x, "name") for x in inputs if hasattr(x, "name")]
name = names[0] if len(set(names)) == 1 else None
reconstruct_kwargs = {"name": name}
else:
reconstruct_kwargs = {}

def reconstruct(result):
if lib.is_scalar(result):
return result
if result.ndim != self.ndim:
if method == "outer":
if self.ndim == 2:
# we already deprecated for Series
msg = (
"outer method for ufunc {} is not implemented on "
"pandas objects. Returning an ndarray, but in the "
"future this will raise a 'NotImplementedError'. "
"Consider explicitly converting the DataFrame "
"to an array with '.to_numpy()' first."
)
warnings.warn(msg.format(ufunc), FutureWarning, stacklevel=4)
return result
raise NotImplementedError
return result
if isinstance(result, BlockManager):
# we went through BlockManager.apply
result = self._constructor(result, **reconstruct_kwargs, copy=False)
else:
# we converted an array, lost our axes
result = self._constructor(
result, **reconstruct_axes, **reconstruct_kwargs, copy=False
)
# TODO: When we support multiple values in __finalize__, this
# should pass alignable to `__fianlize__` instead of self.
# Then `np.add(a, b)` would consider attrs from both a and b
# when a and b are NDFrames.
if len(alignable) == 1:
result = result.__finalize__(self)
return result

if self.ndim > 1 and (
len(inputs) > 1 or ufunc.nout > 1 # type: ignore[attr-defined]
):
# Just give up on preserving types in the complex case.
# In theory we could preserve them for them.
# * nout>1 is doable if BlockManager.apply took nout and
# returned a Tuple[BlockManager].
# * len(inputs) > 1 is doable when we know that we have
# aligned blocks / dtypes.
inputs = tuple(np.asarray(x) for x in inputs)
result = getattr(ufunc, method)(*inputs)
elif self.ndim == 1:
# ufunc(series, ...)
inputs = tuple(extract_array(x, extract_numpy=True) for x in inputs)
result = getattr(ufunc, method)(*inputs, **kwargs)
else:
# ufunc(dataframe)
mgr = inputs[0]._mgr
result = mgr.apply(getattr(ufunc, method))

if ufunc.nout > 1: # type: ignore[attr-defined]
result = tuple(reconstruct(x) for x in result)
else:
result = reconstruct(result)
return result
1 change: 1 addition & 0 deletions pandas/core/frame.py
Original file line number Diff line number Diff line change
Expand Up @@ -434,6 +434,7 @@ class DataFrame(NDFrame, OpsMixin):

_internal_names_set = {"columns", "index"} | NDFrame._internal_names_set
_typ = "dataframe"
_HANDLED_TYPES = (Series, Index, ExtensionArray, np.ndarray)

@property
def _constructor(self) -> Type[DataFrame]:
Expand Down
7 changes: 6 additions & 1 deletion pandas/core/generic.py
Original file line number Diff line number Diff line change
Expand Up @@ -87,7 +87,7 @@
from pandas.core.dtypes.missing import isna, notna

import pandas as pd
from pandas.core import indexing, missing, nanops
from pandas.core import arraylike, indexing, missing, nanops
import pandas.core.algorithms as algos
from pandas.core.base import PandasObject, SelectionMixin
import pandas.core.common as com
Expand Down Expand Up @@ -1927,6 +1927,11 @@ def __array_wrap__(
self, method="__array_wrap__"
)

def __array_ufunc__(
self, ufunc: Callable, method: str, *inputs: Any, **kwargs: Any
):
return arraylike.array_ufunc(self, ufunc, method, *inputs, **kwargs)

# ideally we would define this to avoid the getattr checks, but
# is slower
# @property
Expand Down
76 changes: 1 addition & 75 deletions pandas/core/series.py
Original file line number Diff line number Diff line change
Expand Up @@ -176,6 +176,7 @@ class Series(base.IndexOpsMixin, generic.NDFrame):
"""

_typ = "series"
_HANDLED_TYPES = (Index, ExtensionArray, np.ndarray)

_name: Label
_metadata: List[str] = ["name"]
Expand Down Expand Up @@ -683,81 +684,6 @@ def view(self, dtype=None) -> "Series":
# NDArray Compat
_HANDLED_TYPES = (Index, ExtensionArray, np.ndarray)

def __array_ufunc__(
self, ufunc: Callable, method: str, *inputs: Any, **kwargs: Any
):
# TODO: handle DataFrame
cls = type(self)

# for binary ops, use our custom dunder methods
result = ops.maybe_dispatch_ufunc_to_dunder_op(
self, ufunc, method, *inputs, **kwargs
)
if result is not NotImplemented:
return result

# Determine if we should defer.
no_defer = (np.ndarray.__array_ufunc__, cls.__array_ufunc__)

for item in inputs:
higher_priority = (
hasattr(item, "__array_priority__")
and item.__array_priority__ > self.__array_priority__
)
has_array_ufunc = (
hasattr(item, "__array_ufunc__")
and type(item).__array_ufunc__ not in no_defer
and not isinstance(item, self._HANDLED_TYPES)
)
if higher_priority or has_array_ufunc:
return NotImplemented

# align all the inputs.
names = [getattr(x, "name") for x in inputs if hasattr(x, "name")]
types = tuple(type(x) for x in inputs)
# TODO: dataframe
alignable = [x for x, t in zip(inputs, types) if issubclass(t, Series)]

if len(alignable) > 1:
# This triggers alignment.
# At the moment, there aren't any ufuncs with more than two inputs
# so this ends up just being x1.index | x2.index, but we write
# it to handle *args.
index = alignable[0].index
for s in alignable[1:]:
index = index.union(s.index)
inputs = tuple(
x.reindex(index) if issubclass(t, Series) else x
for x, t in zip(inputs, types)
)
else:
index = self.index

inputs = tuple(extract_array(x, extract_numpy=True) for x in inputs)
result = getattr(ufunc, method)(*inputs, **kwargs)

name = names[0] if len(set(names)) == 1 else None

def construct_return(result):
if lib.is_scalar(result):
return result
elif result.ndim > 1:
# e.g. np.subtract.outer
if method == "outer":
# GH#27198
raise NotImplementedError
return result
return self._constructor(result, index=index, name=name, copy=False)

if type(result) is tuple:
# multiple return values
return tuple(construct_return(x) for x in result)
elif method == "at":
# no return value
return None
else:
return construct_return(result)

def __array__(self, dtype=None) -> np.ndarray:
"""
Return the values as a NumPy array.
Expand Down
Loading