Skip to content

BUG: df.replace over pd.Period columns (#34871) #36867

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 23 commits into from
Nov 17, 2020
Merged
Show file tree
Hide file tree
Changes from 11 commits
Commits
Show all changes
23 commits
Select commit Hold shift + click to select a range
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 7 additions & 0 deletions doc/source/whatsnew/v1.2.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -434,6 +434,8 @@ Strings

Interval
^^^^^^^^

- Bug in :meth:`DataFrame.replace` and :meth:`Series.replace` where :class:`Interval` dtypes would be converted to object dtypes (:issue:34871)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

backticks around the issue number

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same below

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, fixed

- Bug in :meth:`IntervalIndex.take` with negative indices and ``fill_value=None`` (:issue:`37330`)
-
-
Expand Down Expand Up @@ -488,6 +490,11 @@ I/O
- Bug in :class:`HDFStore` threw a ``TypeError`` when exporting an empty :class:`DataFrame` with ``datetime64[ns, tz]`` dtypes with a fixed HDF5 store (:issue:`20594`)
- Bug in :class:`HDFStore` was dropping timezone information when exporting :class:`Series` with ``datetime64[ns, tz]`` dtypes with a fixed HDF5 store (:issue:`20594`)

Period
^^^^^^

- Bug in :meth:`DataFrame.replace` and :meth:`Series.replace` where :class:`Period` dtypes would be converted to object dytpes (:issue:34871)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo on dtypes

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dytpes

typo is still there. BTW in general let the person who made the original comment hit the "conversation resolved" button

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha thanks for the heads up, haven't used github at all really. Fixed


Plotting
^^^^^^^^

Expand Down
12 changes: 12 additions & 0 deletions pandas/core/internals/blocks.py
Original file line number Diff line number Diff line change
Expand Up @@ -2002,6 +2002,18 @@ class ObjectValuesExtensionBlock(ExtensionBlock):
def external_values(self):
return self.values.astype(object)

def _can_hold_element(self, element: Any) -> bool:
if is_valid_nat_for_dtype(element, self.dtype):
return True
if element is NaT:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jbrockmendel ok here?

@samc1213 are all of these branches actually hit? in particular why L2008 and L2010

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback Added these due to unit tests. Specifically, L2010 is required for pandas/tests/extension/test_interval.py::TestSetitem::test_setitem_empty_indxer to pass. L2008 may not be required, I just removed it and will see if the tests pass...

return True
if isinstance(element, list) and len(element) == 0:
return True
tipo = maybe_infer_dtype_type(element)
if tipo is not None:
return issubclass(tipo.type, self.dtype.type)
return isinstance(element, self.dtype.type)


class NumericBlock(Block):
__slots__ = ()
Expand Down
12 changes: 5 additions & 7 deletions pandas/tests/frame/methods/test_replace.py
Original file line number Diff line number Diff line change
Expand Up @@ -1517,17 +1517,15 @@ def test_replace_with_duplicate_columns(self, replacement):

tm.assert_frame_equal(result, expected)

@pytest.mark.xfail(
reason="replace() changes dtype from period to object, see GH34871", strict=True
)
def test_replace_period_ignore_float(self):
@pytest.mark.parametrize("value", [pd.Period("2020-01"), pd.Interval(0, 5)])
def test_replace_ea_ignore_float(self, value):
"""
Regression test for GH#34871: if df.replace(1.0, 0.0) is called on a df
with a Period column the old, faulty behavior is to raise TypeError.
with a Period/Interval column the old, faulty behavior is to raise TypeError.
"""
df = DataFrame({"Per": [pd.Period("2020-01")] * 3})
df = DataFrame({"Per": [value] * 3})
result = df.replace(1.0, 0.0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since this is now frame_or_series, can you call it obj instead of df

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, done

expected = DataFrame({"Per": [pd.Period("2020-01")] * 3})
expected = DataFrame({"Per": [value] * 3})
tm.assert_frame_equal(expected, result)

def test_replace_value_category_type(self):
Expand Down
12 changes: 12 additions & 0 deletions pandas/tests/series/methods/test_replace.py
Original file line number Diff line number Diff line change
Expand Up @@ -450,6 +450,18 @@ def test_replace_with_compiled_regex(self):
expected = pd.Series(["z", "b", "c"])
tm.assert_series_equal(result, expected)

@pytest.mark.parametrize("value", [pd.Period("2020-01"), pd.Interval(0, 5)])
def test_replace_ea_ignore_float(self, value):
"""
Regression test for corrolary to GH#34871: if series.replace(1.0, 0.0)
is called on a Period/Interval Series, the old, faulty behavior
is to raise TypeError.
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can be a comment, not a docstring, just needs to see # GH#34871, ditto above

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually you can share this test using the frame_or_series fixture

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to a comment, but I grepped for frame_or_series and not finding much in the codebase. I did find index_or_series. Can you please elaborate more on how to use frame_or_series? There do seem to be a fair amount of duplication between these two files (series/test_replace and frame/test_replace). test_replace_with_compiled_regex is basically copy-pasted between the two, for example

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please elaborate more on how to use frame_or_series?

It's a pytest fixture:

def frame_or_series(request):

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @arw2019 - I needed to merge master in locally to find this

series = pd.Series([value] * 3)
result = series.replace(1.0, 0.0)
expected = pd.Series([value] * 3)
tm.assert_series_equal(expected, result)

@pytest.mark.parametrize("pattern", ["^.$", "."])
def test_str_replace_regex_default_raises_warning(self, pattern):
# https://github.com/pandas-dev/pandas/pull/24809
Expand Down