Skip to content

Conversation

@phofl
Copy link
Member

@phofl phofl commented Sep 5, 2022

  • closes #xxxx (Replace xxxx with the Github issue number)
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • Added type annotations to new arguments/methods/functions.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

This won't be able to set inplace, so should not warn

@phofl phofl added this to the 1.5 milestone Sep 5, 2022
@phofl
Copy link
Member Author

phofl commented Sep 6, 2022

We are trying to cast [nan, 1.0] to integer, which raises the RuntimeWarning, but I don't think that this should leak to our users.

@mroeschke mroeschke added the Warnings Warnings that appear or should be added to pandas label Sep 6, 2022
Copy link
Member

@mroeschke mroeschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM off to @jbrockmendel

@phofl
Copy link
Member Author

phofl commented Sep 7, 2022

@jbrockmendel ok to merge?

or new_values.shape != orig_values.shape
or (
not can_hold_element(orig_values, np.nan)
and isna(new_values).any()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how do we get here with warn=True?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iloc._setitem_with_indexer(indexer, value, self.name)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, so i think what's happening here is that we are checking can_hold_element too soon, bc reindexing occurs within self.obj._iset_item(loc, value). I think it would be better to do the reindex before the can_hold_element check

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed a suggestion. Not sure what is better performance wise. Checking initially and the rechecking? Is _get_column_array expensive or cheap? If it is cheap, it might be better only checking can_hold_element for new values?

This breaks one test, not sure if you can set an all NaT Series into an underlying Series with timezone UTC?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_get_column_array is pretty cheap. its the isna(...).any() that i think may be expensive

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This breaks one test, not sure if you can set an all NaT Series into an underlying Series with timezone UTC?

depends on the dtype of the all-NaT Series

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

original values are datetime64[ns, UTC] and all NaT Series is datetime64[ns]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that should definitely not be inplace

Copy link
Member Author

@phofl phofl Sep 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, we fixed the test instead of breaking it :)

Anything against checking only new_values then? See latest commit now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think that makes sense. caffeine is still kicking in though

@phofl
Copy link
Member Author

phofl commented Sep 15, 2022

We should try to get this into 1.5 too to avoid confusione

Copy link
Member

@mroeschke mroeschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fairly good to me. @jbrockmendel for any additional comments.

if tz is None:
msg = "will attempt to set the values inplace instead"
with tm.assert_produces_warning(FutureWarning, match=msg):
df.iloc[:, 0] = pd.NaT
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@phofl i think i misunderstood earlier when i said this should warn. i was under the impression that the RHS was pd.Series([pd.NaT]*N, dtype="M8[ns]"), not the scalar pd.NaT. With the scalar, this should not warn.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that is an existing problem that i think is orthogonal to this PR, so no need to hold up on this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep agreed, the NaT gets cast to an all NaT Series without a timezone, hence this seems equal

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ideal then would be to assert_warns(None) unconditionally and xfail the tzaware case

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@jbrockmendel
Copy link
Member

some nitpicks, otherwise LGTM

@mroeschke mroeschke merged commit b934b0e into pandas-dev:main Sep 16, 2022
@mroeschke
Copy link
Member

Thanks @phofl

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Copy / view semantics Warnings Warnings that appear or should be added to pandas