ENH: add DataFrame.is_unique method #37565
Force-pushed from c8c6064 to 1683699.
pandas/core/base.py
Outdated
@@ -1038,16 +1038,19 @@ def nunique(self, dropna: bool = True) -> int:
            n -= 1
        return n

    def _is_unique(self) -> bool:
I don't think this is necessary.
pandas/core/frame.py
Outdated
            return self.loc[:, subset].is_unique()

        if len(self.columns):
            return self.apply(Series._is_unique)
This should just work with:
return self.apply(lambda x: x.is_unique)
It seemed to be the same, but I guess someone could have subclassed _constructor_sliced, which could make a difference.
pandas/core/frame.py
Outdated
        DataFrame.duplicated : Indicate duplicate rows.
        """
        if subset is not None:
            subset = subset if is_list_like(subset) else [subset]
Can you use maybe_make_list here?
Ok
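For readers following the subset handling, here is a rough sketch of the normalization being discussed. It relies only on the public `pandas.api.types.is_list_like`; `normalize_subset` is a hypothetical name used for illustration and is not part of this PR or of pandas. The internal `maybe_make_list` helper is deliberately not imported here, since it is private to pandas.

```python
from pandas.api.types import is_list_like


def normalize_subset(subset):
    """Hypothetical helper: turn a single column label into a list of labels.

    None is passed through unchanged so callers can still treat it as
    "use all columns".
    """
    if subset is None:
        return None
    # Strings are not list-like in pandas, so a single label gets wrapped.
    return list(subset) if is_list_like(subset) else [subset]
```

With this sketch, `normalize_subset("a")` gives `["a"]`, while a list or tuple of labels comes back as a plain list.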
            self = self[subset]

        if len(self.columns):
            return self.apply(lambda x: x.is_unique)
Why does this not work on empties?
A DataFrame with no columns doesn't call the inner func in apply (here lambda x: x.is_unique), so the output (empty) Series has to guess its own dtype. It guesses float64, which is wrong, so we have to special-case it.
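A minimal sketch of that special case, written as a standalone function rather than the method in this PR. `columns_are_unique` is a hypothetical name, and the dtype that `apply` picks for a column-less frame may differ between pandas versions, so the sketch avoids depending on it and builds the empty result explicitly.

```python
import pandas as pd


def columns_are_unique(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: report, per column, whether its values are unique."""
    if len(df.columns) == 0:
        # apply() has no column to call the function on, so it cannot infer
        # a boolean result; construct the empty bool Series by hand instead.
        return pd.Series([], index=df.columns, dtype=bool)
    return df.apply(lambda col: col.is_unique)
```

The explicit branch keeps the result dtype boolean whether or not the frame has any columns.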
For a DataFrame,
@jorisvandenbossche that's true, and could be fixed by having an
Force-pushed from d411050 to 9f51784.
This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.
@topper-123 this was close; can you merge master and move the note to 1.3?
Force-pushed from 9f51784 to 45f5b88.
Force-pushed from 45f5b88 to 63643a0.
Ok, done.
Looks pretty good overall. Should there be a corresponding
Thanks for the PR, but it appears that it's gone stale. Please let us know if you'd like to continue by merging master and targeting 1.4.
Currently, to see which columns contain only unique values, we have to do df.apply(lambda x: x.is_unique), which is a bit awkward. IMO, checking for unique columns in a DataFrame is a common enough need to justify a method for that specific purpose.
Xref: #11948
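To make the motivation concrete, here is a small illustration of the existing workaround from the description, using only current pandas API. The frame and the variable names are made up for the example.

```python
import pandas as pd

# Example data (made up): "id" has no repeats, "city" does.
df = pd.DataFrame({"id": [1, 2, 3], "city": ["Oslo", "Oslo", "Bergen"]})

# The workaround: apply Series.is_unique (a property) to every column.
unique_by_column = df.apply(lambda col: col.is_unique)

# Keep only the labels of the columns whose values are all distinct.
unique_columns = unique_by_column[unique_by_column].index.tolist()
```

Here `unique_by_column` is a boolean Series indexed by column label, and `unique_columns` lists the columns with no repeated values.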