ENH: Enable indexing with nullable Boolean #31591
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -20,8 +20,9 @@ Nullable Boolean data type | |
Indexing with NA values | ||
----------------------- | ||
|
||
pandas does not allow indexing with NA values. Attempting to do so | ||
will raise a ``ValueError``. | ||
pandas allows indexing with ``NA`` values in a boolean array, which are treated as ``False``. | ||
|
||
.. versionchanged:: 1.0.2 | ||
|
||
.. ipython:: python | ||
:okexcept: | ||
|
@@ -30,12 +31,11 @@ will raise a ``ValueError``. | |
mask = pd.array([True, False, pd.NA], dtype="boolean") | ||
s[mask] | ||
|
||
The missing values will need to be explicitly filled with True or False prior | ||
to using the array as a mask. | ||
Can you keep this example but reword it as something like "if you want different behaviour, you can fill manually with fillna(True)"? |
||
If you would prefer to keep the ``NA`` values you can manually fill them with ``fillna(True)``. | ||
|
||
.. ipython:: python | ||
|
||
s[mask.fillna(False)] | ||
s[mask.fillna(True)] | ||
|
||
.. _boolean.kleene: | ||
|
||
|
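For reference, a minimal sketch of the behaviour the revised documentation describes, assuming pandas 1.0.2 or later. The series `s` is defined in a part of the docs not shown in this hunk, so the values used here are an assumption made for illustration.

```python
import pandas as pd

# Assumed stand-in for the `s` defined earlier in the docs page.
s = pd.Series([1, 2, 3])
mask = pd.array([True, False, pd.NA], dtype="boolean")

# NA entries in the mask are treated as False, so only position 0 is selected.
na_as_false = s[mask]

# Filling NA with True first selects the NA positions as well.
na_as_true = s[mask.fillna(True)]

print(na_as_false.tolist())
print(na_as_true.tolist())
```

Before this change the first indexing operation raised a ``ValueError``, which is why the docs previously required an explicit ``fillna`` call.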
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -13,6 +13,7 @@ | |
is_iterator, | ||
is_list_like, | ||
is_numeric_dtype, | ||
is_object_dtype, | ||
is_scalar, | ||
is_sequence, | ||
) | ||
|
@@ -2189,10 +2190,12 @@ def check_bool_indexer(index: Index, key) -> np.ndarray: | |
"the indexed object do not match)." | ||
) | ||
result = result.astype(bool)._values | ||
else: | ||
# key might be sparse / object-dtype bool, check_array_indexer needs bool array | ||
elif is_object_dtype(key): | ||
# key might be object-dtype bool, check_array_indexer needs bool array | ||
result = np.asarray(result, dtype=bool) | ||
result = check_array_indexer(index, result) | ||
else: | ||
result = check_array_indexer(index, result) | ||
Might be able to simplify this to a single check_array_indexer call right before returning (i.e. for all cases) here. (Try in a follow-on.) |
||
|
||
return result | ||
|
||
|
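The branching in this hunk can be approximated with public API only. The sketch below is not pandas' internal `check_bool_indexer`; `normalize_mask` is a hypothetical helper, and it assumes pandas 1.0.2 or later so that `pandas.api.indexers.check_array_indexer` converts NA in a nullable boolean mask to False.

```python
import numpy as np
import pandas as pd
from pandas.api.indexers import check_array_indexer
from pandas.api.types import is_object_dtype


def normalize_mask(values, key):
    """Hypothetical helper mirroring the two non-Series branches above."""
    if is_object_dtype(key):
        # Object-dtype keys that hold only True/False are cast to a
        # NumPy bool array before validation.
        key = np.asarray(key, dtype=bool)
    # Validates the mask length and returns a NumPy bool array;
    # NA in a nullable boolean mask becomes False.
    return check_array_indexer(values, key)


values = pd.array([10, 20, 30], dtype="Int64")
object_key = np.array([True, False, True], dtype=object)
nullable_key = pd.array([True, pd.NA, False], dtype="boolean")

from_object = normalize_mask(values, object_key)
from_nullable = normalize_mask(values, nullable_key)

print(from_object)
print(from_nullable)
```

Keeping the object-dtype cast in its own branch means the nullable boolean array reaches `check_array_indexer` unchanged, which is where the NA handling lives.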
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -158,21 +158,23 @@ def test_getitem_boolean_array_mask(self, data): | |
result = pd.Series(data)[mask] | ||
self.assert_series_equal(result, expected) | ||
|
||
def test_getitem_boolean_array_mask_raises(self, data): | ||
|
||
def test_getitem_boolean_na_treated_as_false(self, data): | ||
# https://github.com/pandas-dev/pandas/issues/31503 | ||
mask = pd.array(np.zeros(data.shape, dtype="bool"), dtype="boolean") | ||
Can you take here something with True's as well? (now it will give an empty result)
it would be good to run the test also for both a boolean array and a list as mask (to ensure the list works)
Looks like the list input may not be working properly, will work on fixing that
@jorisvandenbossche So I think the issue with list masks containing bools and pd.NA was that Made an update there to recognize pd.NA and also updated the test; hopefully CI will still pass. The assumption that boolean indexers are ones that can be cast as numpy boolean arrays seems to happen in a lot of places (e.g., https://github.com/pandas-dev/pandas/blob/master/pandas/core/indexes/base.py#L4147) so I could see this causing problems.
Hmm, the "old" code of still accepting object dtype makes this a bit more complex indeed. Maybe instead of casting to a numpy array, we could use
I think that should theoretically work in combination with the right change to |
||
mask[:2] = pd.NA | ||
mask[2:4] = True | ||
|
||
|
||
msg = ( | ||
"Cannot mask with a boolean indexer containing NA values|" | ||
"cannot mask with array containing NA / NaN values" | ||
) | ||
with pytest.raises(ValueError, match=msg): | ||
data[mask] | ||
result = data[mask] | ||
expected = data[mask.fillna(False)] | ||
|
||
self.assert_extension_array_equal(result, expected) | ||
|
||
s = pd.Series(data) | ||
|
||
with pytest.raises(ValueError): | ||
s[mask] | ||
result = s[mask] | ||
expected = s[mask.fillna(False)] | ||
|
||
self.assert_series_equal(result, expected) | ||
|
||
@pytest.mark.parametrize( | ||
"idx", | ||
|
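A standalone version of what the updated getitem test asserts might look like the following. The `Int64` array is an assumed stand-in for the `data` fixture, and the sketch assumes pandas 1.0.2 or later.

```python
import numpy as np
import pandas as pd
import pandas.testing as tm

# Assumed stand-in for the extension-array `data` fixture.
data = pd.array([1, 2, 3, 4, 5], dtype="Int64")

# Build the mask the same way the test does: NA, NA, True, True, False.
mask = pd.array(np.zeros(data.shape, dtype="bool"), dtype="boolean")
mask[:2] = pd.NA
mask[2:4] = True

# Indexing the array directly: NA behaves like False.
result = data[mask]
expected = data[mask.fillna(False)]
tm.assert_extension_array_equal(result, expected)

# The same holds when the array is boxed in a Series.
s = pd.Series(data)
tm.assert_series_equal(s[mask], s[mask.fillna(False)])
```

Including True entries in the mask, as requested in the review, keeps the comparison from passing trivially on an empty result.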
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -98,8 +98,9 @@ def test_setitem_iloc_scalar_multiple_homogoneous(self, data): | |
[ | ||
np.array([True, True, True, False, False]), | ||
pd.array([True, True, True, False, False], dtype="boolean"), | ||
pd.array([True, True, True, pd.NA, pd.NA], dtype="boolean"), | ||
], | ||
ids=["numpy-array", "boolean-array"], | ||
ids=["numpy-array", "boolean-array", "boolean-array-na"], | ||
) | ||
def test_setitem_mask(self, data, mask, box_in_series): | ||
arr = data[:5].copy() | ||
|
@@ -124,20 +125,17 @@ def test_setitem_mask_raises(self, data, box_in_series): | |
with pytest.raises(IndexError, match="wrong length"): | ||
data[mask] = data[0] | ||
|
||
def test_setitem_mask_boolean_array_raises(self, data, box_in_series): | ||
rather than remove this can you turn it into a test (obviously changed that we no longer raise) |
||
# missing values in mask | ||
def test_setitem_mask_boolean_array_with_na(self, data, box_in_series): | ||
Isn't this test duplicating the
Somewhat, yes
You can simply remove it then, I think? |
||
mask = pd.array(np.zeros(data.shape, dtype="bool"), dtype="boolean") | ||
mask[:2] = pd.NA | ||
mask[:3] = True | ||
mask[3:5] = pd.NA | ||
|
||
if box_in_series: | ||
data = pd.Series(data) | ||
|
||
msg = ( | ||
"Cannot mask with a boolean indexer containing NA values|" | ||
"cannot mask with array containing NA / NaN values" | ||
) | ||
with pytest.raises(ValueError, match=msg): | ||
data[mask] = data[0] | ||
data[mask] = data[0] | ||
|
||
assert (data[:3] == data[0]).all() | ||
|
||
@pytest.mark.parametrize( | ||
"idx", | ||
|
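The setitem side can be sketched the same way. This assumes pandas 1.0.2 or later and uses an `Int64` array as a stand-in for the `data` fixture; the mask matches the new `boolean-array-na` parametrization.

```python
import pandas as pd

# Assumed stand-in for the first five elements of the `data` fixture.
arr = pd.array([1, 2, 3, 4, 5], dtype="Int64")
mask = pd.array([True, True, True, pd.NA, pd.NA], dtype="boolean")

# Only positions where the mask is True are assigned;
# NA positions are skipped, as if they were False.
arr[mask] = arr[0]

print(arr)
```

Positions 3 and 4 keep their original values, which is the same outcome as assigning with `mask.fillna(False)`.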
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -415,10 +415,6 @@ def test_setitem_mask(self, data, mask, box_in_series): | |
def test_setitem_mask_raises(self, data, box_in_series): | ||
super().test_setitem_mask_raises(data, box_in_series) | ||
|
||
@skip_nested | ||
do we have a setitem test?
Yes, I added a case containing NA to the |
||
def test_setitem_mask_boolean_array_raises(self, data, box_in_series): | ||
super().test_setitem_mask_boolean_array_raises(data, box_in_series) | ||
|
||
@skip_nested | ||
@pytest.mark.parametrize( | ||
"idx", | ||
|