-
Notifications
You must be signed in to change notification settings - Fork 3
Allow rolling API to accept BaseIndexer subclass #2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 4 commits
60f41db
f269944
6b45a0d
f2e815b
266dc4e
6fab256
010aeb6
97b1386
691f72b
f94ae89
7e34fa0
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,48 @@ | ||
import abc | ||
from typing import Dict, Sequence, Tuple | ||
|
||
import numpy as np | ||
|
||
BeginEnd = Tuple[np.ndarray, np.ndarray] | ||
|
||
# TODO: Refactor MockFixedWindowIndexer, FixedWindowIndexer, | ||
# VariableWindowIndexer to also have `get_window_bounds` methods that | ||
# only calculates start & stop | ||
|
||
# TODO: Currently, when win_type is specified, it calls a special routine, | ||
# `roll_window`, while None win_type ops dispatch to specific methods. | ||
# Consider consolidating? | ||
|
||
|
||
class BaseIndexer(abc.ABC): | ||
|
||
def __init__(self, index, offset, keys): | ||
# TODO: The alternative is for the `rolling` API to accept | ||
# index, offset, and keys as keyword arguments | ||
self.index = index | ||
self.offset = offset | ||
self.keys = keys # type: Sequence[np.ndarray] | ||
|
||
@classmethod | ||
@abc.abstractmethod | ||
def get_window_bounds(cls, **kwargs: Dict) -> BeginEnd: | ||
""" | ||
Compute the bounds of a window. | ||
|
||
Users should subclass this class to implement a custom method | ||
to calculate window bounds | ||
|
||
Parameters | ||
---------- | ||
kwargs | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. can we explicity set the kwargs? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think we want to allow indexer authors to pass arbitrary data here. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Scratch that, these should be explicit. |
||
A dictionary of keyword arguments obtained from the top level | ||
rolling API e.g. min_periods, win_type | ||
|
||
Returns | ||
------- | ||
BeginEnd | ||
A tuple of ndarray[int64]s, indicating the boundaries of each | ||
window | ||
|
||
""" | ||
return NotImplemented | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This doesn't need to return anything. |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -11,6 +11,7 @@ | |
import numpy as np | ||
|
||
import pandas._libs.window as libwindow | ||
import pandas._libs._window as libwindow_refactor | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. so I would either do this directly in .window or name this something non-private (as its already private). Note that if say refactoring needs to be done to make things cleaner we can do pre-cursors to master. |
||
from pandas.compat._optional import import_optional_dependency | ||
from pandas.compat.numpy import function as nv | ||
from pandas.util._decorators import Appender, Substitution, cache_readonly | ||
|
@@ -451,14 +452,19 @@ class Window(_Window): | |
|
||
Parameters | ||
---------- | ||
window : int, or offset | ||
window : int, offset, or BaseIndexer subclass | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ideally type as much as possible (again I know that things are not typed now), maybe makes sense to do a pre-cursor to type what we have now |
||
Size of the moving window. This is the number of observations used for | ||
calculating the statistic. Each window will be a fixed size. | ||
|
||
If its an offset then this will be the time period of each window. Each | ||
window will be a variable sized based on the observations included in | ||
the time-period. This is only valid for datetimelike indexes. This is | ||
new in 0.19.0 | ||
|
||
If a BaseIndexer subclass is passed, calculates the window boundaries | ||
based on the defined ``get_window_bounds`` method. Additional rolling | ||
keyword arguments, namely `min_periods`, `center`, `win_type`, and | ||
`closed` will be passed to `get_window_bounds`. | ||
min_periods : int, default None | ||
Minimum number of observations in window required to have a value | ||
(otherwise result is NA). For a window that is specified by an offset, | ||
|
@@ -599,7 +605,8 @@ def validate(self): | |
super().validate() | ||
|
||
window = self.window | ||
if isinstance(window, (list, tuple, np.ndarray)): | ||
if isinstance(window, (list, tuple, np.ndarray, | ||
libwindow_refactor.BaseIndexer)): | ||
pass | ||
elif is_integer(window): | ||
if window <= 0: | ||
|
@@ -661,6 +668,11 @@ def _pop_args(win_type, arg_names, kwargs): | |
win_type = _validate_win_type(self.win_type, kwargs) | ||
# GH #15662. `False` makes symmetric window, rather than periodic. | ||
return sig.get_window(win_type, window, False).astype(float) | ||
elif isinstance(window, libwindow_refactor.BaseIndexer): | ||
return window.get_window_span(win_type=self.win_type, | ||
min_periods=self.min_periods, | ||
center=self.center, | ||
closed=self.closed) | ||
|
||
def _apply_window(self, mean=True, **kwargs): | ||
""" | ||
|
@@ -1628,7 +1640,8 @@ def validate(self): | |
# min_periods must be an integer | ||
if self.min_periods is None: | ||
self.min_periods = 1 | ||
|
||
elif isinstance(self.window, libwindow_refactor.BaseIndexer): | ||
pass | ||
elif not is_integer(self.window): | ||
raise ValueError("window must be an integer") | ||
elif self.window < 0: | ||
|
@@ -2605,6 +2618,8 @@ def _get_center_of_mass(comass, span, halflife, alpha): | |
|
||
|
||
def _offset(window, center): | ||
# TODO: (MATT) If the window is a BaseIndexer subclass, | ||
# we need to pass in the materialized window | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What is the type of window here? It looks like it can be a sequence or an integer? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Correct. After a light audit, I anticipate the materialized window may be passed here and other times and Overall I think this routine is for label formatting. |
||
if not is_integer(window): | ||
window = len(window) | ||
offset = (window - 1) / 2. if center else 0 | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -9,11 +9,13 @@ | |
import pytest | ||
|
||
from pandas.errors import UnsupportedFunctionCall | ||
import pandas._libs._window as libwindow_refactor | ||
import pandas.util._test_decorators as td | ||
|
||
import pandas as pd | ||
from pandas import ( | ||
DataFrame, Index, Series, Timestamp, bdate_range, concat, isna, notna) | ||
DataFrame, Index, Series, Timestamp, bdate_range, concat, date_range, | ||
isna, notna) | ||
from pandas.core.base import DataError, SpecificationError | ||
from pandas.core.sorting import safe_sort | ||
import pandas.core.window as rwindow | ||
|
@@ -54,6 +56,38 @@ def arithmetic_win_operators(request): | |
return request.param | ||
|
||
|
||
@pytest.fixture(params=['right', 'left', 'both', 'neither']) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. some of this seems a good candidate for a pre-cursor PR to master (the fixtures) |
||
def closed(request): | ||
return request.param | ||
|
||
|
||
@pytest.fixture(params=[True, False]) | ||
def center(request): | ||
return request.param | ||
|
||
|
||
@pytest.fixture(params=[None, 1]) | ||
def min_periods(request): | ||
return request.param | ||
|
||
|
||
@pytest.fixture | ||
def dummy_custom_indexer(): | ||
|
||
class DummyIndexer(libwindow_refactor.BaseIndexer): | ||
|
||
def __init__(self, index, offset, keys): | ||
super().__init__(index, offset, keys) | ||
|
||
def get_window_bounds(self, **kwargs): | ||
pass | ||
|
||
idx = date_range('2019', freq='D', periods=3) | ||
offset = offsets.BusinessDay(1) | ||
keys = ['A'] | ||
return DummyIndexer(index=idx, offset=offset, keys=keys) | ||
|
||
|
||
class Base: | ||
|
||
_nan_locs = np.arange(20, 40) | ||
|
@@ -4154,3 +4188,20 @@ def test_rolling_cov_offset(self): | |
|
||
expected2 = ss.rolling(3, min_periods=1).cov() | ||
tm.assert_series_equal(result, expected2) | ||
|
||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. not averse to actually putting the tests in a separate file (may make review easier). so I think a pre-cursor PR to move pandas/tests/test_window.py -> pandas/tests/window/test_window.py makes sense (then we can add on things in an easy namespace) |
||
class TestCustomIndexer: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This class isn't necessary. Since we have |
||
|
||
def test_custom_indexer_validates(self, | ||
dummy_custom_indexer, | ||
win_types, | ||
closed, | ||
min_periods, | ||
center): | ||
# Test passing a BaseIndexer subclass does not raise validation errors | ||
s = Series(range(10)) | ||
s.rolling(dummy_custom_indexer, | ||
win_type=win_types, | ||
center=center, | ||
min_periods=min_periods, | ||
closed=closed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whatever you can type would be great