ENH: add DataFrame.is_unique method #37565

Closed
topper-123 wants to merge 7 commits from dataframe_is_unique

@topper-123
Contributor

topper-123 commented Nov 1, 2020

Currently, to see columns with unique values, we have to do df.apply(lambda x: x.is_unique), which is a bit awkward.

IMO checking for unique columns in a DataFrame is a common enough need to justify a method for that specific purpose.
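
For illustration, the workaround described above can be sketched as follows. The frame and its column names are made up for the example; `Series.is_unique` is the existing pandas property that the lambda reads for each column.

```python
import pandas as pd

# Hypothetical example data: "id" has no repeats, "city" does.
df = pd.DataFrame({"id": [1, 2, 3], "city": ["Oslo", "Oslo", "Bergen"]})

# Current workaround: read the Series.is_unique property for every column.
unique_by_column = df.apply(lambda col: col.is_unique)

print(unique_by_column)
```

The result is a Series indexed by column label, with one boolean per column.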

Xref: #11948

@topper-123 force-pushed the dataframe_is_unique branch 2 times, most recently from c8c6064 to 1683699 on November 1, 2020 22:14
@@ -1038,16 +1038,19 @@ def nunique(self, dropna: bool = True) -> int:
            n -= 1
        return n

    def _is_unique(self) -> bool:
Contributor

I don't think this is necessary.

            return self.loc[:, subset].is_unique()

        if len(self.columns):
            return self.apply(Series._is_unique)
Contributor

This should just work with
return self.apply(lambda x: x.is_unique)

Contributor Author

It seemed to be the same, but I guess someone could have subclassed _constructor_sliced, which could make a difference.

@jreback added the Enhancement and Reshaping (Concat, Merge/Join, Stack/Unstack, Explode) labels Nov 2, 2020
        DataFrame.duplicated : Indicate duplicate rows.
        """
        if subset is not None:
            subset = subset if is_list_like(subset) else [subset]
Contributor

Can you use maybe_make_list here?

Contributor Author

Ok
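
The subset handling under review can be sketched in isolation as below. This is only an illustration: `normalize_subset` is a hypothetical helper name, and it uses the public `pandas.api.types.is_list_like` rather than the internal `maybe_make_list` suggested by the reviewer.

```python
from pandas.api.types import is_list_like


def normalize_subset(subset):
    """Hypothetical helper: wrap a single column label in a list."""
    if subset is None:
        # No subset given: the caller should use every column.
        return None
    # Strings are not list-like in pandas, so "a" becomes ["a"].
    return subset if is_list_like(subset) else [subset]


print(normalize_subset("a"))
print(normalize_subset(["a", "b"]))
```

Wrapping a lone label this way means later indexing such as `df[subset]` always yields a DataFrame rather than a Series.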

            self = self[subset]

        if len(self.columns):
            return self.apply(lambda x: x.is_unique)
Contributor

why does this not work on empties?

Contributor Author

A dataframe with no columns doesn't call the inner func in apply (here lambda x: x.is_unique), so the output (empty) series has to guess its own dtype. It guesses float64, which is wrong, so we have to special-case it.
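
The special case described here can be sketched as below. This is a minimal illustration, not the PR's code: `columns_are_unique` is a hypothetical helper name, and the explicit `dtype=bool` for the no-column case is an assumption about the intended result.

```python
import pandas as pd


def columns_are_unique(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: report, per column, whether its values are unique."""
    if len(df.columns):
        # Normal path: evaluate Series.is_unique for each column.
        return df.apply(lambda col: col.is_unique)
    # No columns: build the empty result by hand so its dtype is bool
    # rather than whatever apply would infer for an empty result.
    return pd.Series([], index=df.columns, dtype=bool)


print(columns_are_unique(pd.DataFrame({"a": [1, 2], "b": [1, 1]})))
print(columns_are_unique(pd.DataFrame()).dtype)
```

Constructing the empty Series explicitly keeps the return dtype the same whether or not the frame has columns.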

@jorisvandenbossche
Member

For a DataFrame, is_unique could also be interpreted as whether the DataFrame has all unique rows?

@topper-123
Contributor Author

@jorisvandenbossche that's true, and could be fixed by having an axis parameter. I didn't include it, because duplicated doesn't have one and I worried that doing self.T.is_unique() would change the types for the values and could give wrong results. Not sure about that one, actually.
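
The two readings of "unique" discussed here can be compared with existing pandas calls. This is a sketch with made-up data, not part of the PR; the row-wise check relies on `DataFrame.duplicated`, so it does not need a transpose.

```python
import pandas as pd

# Hypothetical data: each column repeats a value, yet no full row repeats.
df = pd.DataFrame({"a": [1, 1, 2], "b": ["x", "y", "y"]})

# Column-wise reading: is each column free of repeated values?
columns_unique = df.apply(lambda col: col.is_unique)

# Row-wise reading: DataFrame.duplicated flags rows that repeat an earlier row.
rows_unique = not df.duplicated().any()

print(columns_unique)
print(rows_unique)
```

In this example the two interpretations disagree: neither column is unique, but every row is.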

@github-actions
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

@github-actions bot added the Stale label Dec 16, 2020
@jreback
Contributor

jreback commented Feb 11, 2021

@topper-123 this was close, can you merge master and move the note to 1.3

@topper-123
Contributor Author

Ok, done.

@topper-123 removed the Stale label Mar 5, 2021
@mroeschke
Member

Looks pretty good overall. Should there be a corresponding Series.is_unique method as well? There is a Series.duplicated method (as you referenced duplicated earlier).

@mroeschke
Member

Thanks for the PR, but it appears that it's gone stale. Please let us know if you'd like to continue by merging master and targeting for 1.4.

@mroeschke closed this Jul 11, 2021
Labels
Enhancement Reshaping Concat, Merge/Join, Stack/Unstack, Explode