-
-
Notifications
You must be signed in to change notification settings - Fork 18.7k
Closed
Labels
AlgosNon-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diffNon-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diffIndexingRelated to indexing on series/frames, not to indexes themselvesRelated to indexing on series/frames, not to indexes themselvesNumeric OperationsArithmetic, Comparison, and Logical operationsArithmetic, Comparison, and Logical operations
Milestone
Description
When working with external data, I often see rows with primary key violations. Currently, I could not easily select all the violating rows. For example, if I have a massive file with some inconsistent data
datecol,valuecol
...
2014-01-01,12
2014-01-01,13
2014-01-02,10
...
In this use case, it would be good if we can do df[df.duplicated('datecol', take_all=True)]
to directly get the bad rows
2014-01-01,12
2014-01-01,13
Metadata
Metadata
Assignees
Labels
AlgosNon-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diffNon-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diffIndexingRelated to indexing on series/frames, not to indexes themselvesRelated to indexing on series/frames, not to indexes themselvesNumeric OperationsArithmetic, Comparison, and Logical operationsArithmetic, Comparison, and Logical operations