ENH: add DataFrame.is_unique method #37565

topper-123 · 2020-11-01T19:22:44Z

Currently, to see columns with unique values, we have to do df.apply(lambda x: x.is_unique), which is a bit awkward.
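For reference, here is a minimal sketch of the current workaround on a small made-up frame. `unique_cols` and `unique_names` are illustrative names, not pandas API.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [1, 1, 2]})

# Current workaround: apply a lambda per column.
# Series.is_unique is a property, so it is accessed without parentheses.
unique_cols = df.apply(lambda x: x.is_unique)

# An equivalent plain-Python spelling that collects the names of unique columns.
unique_names = [name for name in df.columns if df[name].is_unique]

print(unique_cols)
print(unique_names)
```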

IMO checking for unique columns in a DataFrame is a common enough need to justify a method for that specific purpose.

Xref: #11948

jreback · 2020-11-02T00:44:54Z

pandas/core/base.py

@@ -1038,16 +1038,19 @@ def nunique(self, dropna: bool = True) -> int:
            n -= 1
        return n

+    def _is_unique(self) -> bool:


I don't think this is necessary.

jreback · 2020-11-02T00:45:38Z

pandas/core/frame.py

+            return self.loc[:, subset].is_unique()
+
+        if len(self.columns):
+            return self.apply(Series._is_unique)


This should just work with:
return self.apply(lambda x: x.is_unique)

It seemed to be the same, but I guess someone could have subclassed _constructor_sliced, which could make a difference.

jreback · 2020-11-03T03:18:41Z

pandas/core/frame.py

+        DataFrame.duplicated : Indicate duplicate rows.
+        """
+        if subset is not None:
+            subset = subset if is_list_like(subset) else [subset]


Can you use maybe_make_list here?
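For context, here is a standalone sketch of what `maybe_make_list` does, as I understand it. It lives in the internal `pandas.core.common` module, so it is not public API and may change; `maybe_make_list_sketch` is a hypothetical reimplementation, not the pandas function itself. Note that it leaves only `list` and `tuple` unchanged, whereas the `is_list_like` check in the diff also accepts other iterables such as sets or arrays.

```python
from typing import Any

import pandas as pd


def maybe_make_list_sketch(obj: Any) -> Any:
    """Hypothetical stand-in for pandas' internal maybe_make_list helper."""
    # Wrap a single label (e.g. a column name) in a list;
    # None, lists and tuples are returned unchanged.
    if obj is not None and not isinstance(obj, (tuple, list)):
        return [obj]
    return obj


df = pd.DataFrame({"a": [1, 2], "b": [1, 1]})
subset = maybe_make_list_sketch("b")  # -> ["b"]
sub_df = df[subset]  # indexing with a list keeps a DataFrame, not a Series
```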

jreback · 2020-11-03T03:19:06Z

pandas/core/frame.py

+            self = self[subset]
+
+        if len(self.columns):
+            return self.apply(lambda x: x.is_unique)


Why does this not work on empty DataFrames?

A DataFrame with no columns doesn't call the inner function in apply (here lambda x: x.is_unique), so the output (empty) Series has to guess its own dtype. It guesses float64, which is wrong, so we have to special-case it.
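A minimal sketch of the logic being reviewed, written as a free function so it runs against released pandas. `columns_is_unique` is a hypothetical name, not the method proposed in this PR. It applies the per-column check when there are columns and otherwise returns an explicitly bool-typed empty Series instead of relying on whatever dtype `apply` picks.

```python
from typing import Hashable, Optional, Sequence, Union

import pandas as pd


def columns_is_unique(
    df: pd.DataFrame,
    subset: Optional[Union[Hashable, Sequence[Hashable]]] = None,
) -> pd.Series:
    """Hypothetical helper: per-column uniqueness, returned as a bool Series."""
    if subset is not None:
        # Normalise a single label to a list so df[...] returns a DataFrame.
        cols = list(subset) if isinstance(subset, (list, tuple)) else [subset]
        df = df[cols]
    if len(df.columns):
        return df.apply(lambda x: x.is_unique)
    # With no columns there is nothing to infer the result dtype from, so
    # apply() falls back to a guessed dtype (float64, according to the thread);
    # return an explicitly bool-typed empty Series instead.
    return pd.Series([], index=df.columns, dtype=bool)
```

Usage: `columns_is_unique(df, subset="b")` checks only column `b`, and `columns_is_unique(pd.DataFrame(index=[0, 1]))` returns an empty Series with `bool` dtype.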

jorisvandenbossche · 2020-11-03T20:34:15Z

For a DataFrame, is_unique could also be interpreted as whether the DataFrame has all unique rows?

topper-123 · 2020-11-15T11:26:34Z

@jorisvandenbossche that's true, and it could be fixed by adding an axis parameter. I didn't include it because duplicated doesn't have one, and I worried that doing self.T.is_unique() would change the types of the values and could give wrong results. Not sure about that one, actually.
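To make the two readings concrete, here is a small sketch on made-up data. It contrasts per-column uniqueness with whole-row uniqueness, using the existing `DataFrame.duplicated`, and shows the dtype change that transposing a mixed-dtype frame causes. The variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 1], "b": ["x", "y", "z"]})

# Column-wise reading: is each column free of repeated values?
cols_unique = df.apply(lambda x: x.is_unique)

# Row-wise reading: does any row repeat an earlier row?
# DataFrame.duplicated compares whole rows, so no transpose is needed.
rows_unique = not df.duplicated().any()

# Transposing a mixed int/str frame turns every column into object dtype,
# which is the kind of type change a self.T-based approach would introduce.
transposed_dtypes = df.T.dtypes
```

Here column `a` is not unique, yet every row is distinct, so the two interpretations give different answers for the same frame.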

github-actions · 2020-12-16T00:14:39Z

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

jreback · 2021-02-11T01:36:25Z

@topper-123 this was close; can you merge master and move the note to 1.3?

topper-123 · 2021-02-11T15:47:29Z

Ok, done.

mroeschke · 2021-04-11T00:43:26Z

Looks pretty good overall. Should there be a corresponding Series.is_unique method as well? There is a Series.duplicated method (as you referenced duplicated earlier).
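For reference, `Series` already exposes `is_unique`, as a property rather than a method. The sketch below uses made-up data to compare it with a check phrased through `Series.duplicated`. Both treat repeated `NaN` values as equal in this example.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, np.nan, np.nan])

# Series.is_unique already exists, as a property rather than a method.
prop_result = s.is_unique

# The same check phrased through Series.duplicated; both treat the two
# NaN values as equal to each other, so both report "not unique" here.
dup_result = not s.duplicated().any()
```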

mroeschke · 2021-07-11T23:30:59Z

Thanks for the PR, but it appears that it's gone stale. Please let us know if you'd like to continue by merging master and targeting for 1.4.

topper-123 force-pushed the dataframe_is_unique branch 2 times, most recently from c8c6064 to 1683699, November 1, 2020 22:14

jreback requested changes Nov 2, 2020

jreback added the Enhancement and Reshaping (Concat, Merge/Join, Stack/Unstack, Explode) labels Nov 2, 2020

jreback requested changes Nov 3, 2020

topper-123 force-pushed the dataframe_is_unique branch from d411050 to 9f51784, November 15, 2020 11:35

github-actions bot added the Stale label Dec 16, 2020

topper-123 added 6 commits February 11, 2021 11:32

ENH: add DataFrame.is_unique method

a835b37

minor cleanups

f9d9e42

cleanup doc string

5472bbb

simplyfy if subset

846b26d

revert internal method

2e02f10

use maybe_make_list, add doc examples

47d2593

topper-123 force-pushed the dataframe_is_unique branch from 9f51784 to 45f5b88, February 11, 2021 11:34

move to v1.3.0

63643a0

topper-123 force-pushed the dataframe_is_unique branch from 45f5b88 to 63643a0, February 11, 2021 11:35

topper-123 removed the Stale label Mar 5, 2021

mroeschke closed this Jul 11, 2021
