Closed
Labels: Duplicate Report, Reshaping
Description
When I upgraded from 0.16.2 to 0.17.0, I was met with a nasty surprise when dropping duplicates: DataFrame.drop_duplicates() no longer behaves the way it did in the previous version. Given a DataFrame df, I run:
test_ids = df['test_id'].unique()
print('N test ids: {}'.format(test_ids.shape))
print('N tests: {}'.format(df[['test_id', <some other columns>]].drop_duplicates().shape))
In 0.17.0 the output is:
N test ids: (341334,)
N tests: (237426, 10)
When I run the same code in 0.16.2, the output is:
N test ids: (341334,)
N tests: (341334, 10)
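Since test_id is one of the selected columns, I don't think drop_duplicates() should ever return fewer rows than the number of unique test_id values: every distinct test_id has to appear in at least one distinct row.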
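The expectation above can be written as a small self-contained check. This is only a sketch on made-up toy data: the `run` column and the values are hypothetical stand-ins for the elided columns and are not taken from the real dataset, so it illustrates the invariant rather than reproducing the regression.

```python
import pandas as pd

# Hypothetical toy data; "run" stands in for the other (elided) columns.
df = pd.DataFrame({
    "test_id": [1, 1, 2, 3, 3],
    "run": ["a", "a", "b", "c", "d"],
})

# Number of distinct values in the single key column.
n_ids = df["test_id"].nunique()

# Number of distinct rows over a column subset that includes the key column.
n_rows = len(df[["test_id", "run"]].drop_duplicates())

# Each distinct test_id must survive in at least one deduplicated row,
# so the row count can never be smaller than the unique-id count.
assert n_rows >= n_ids, (n_rows, n_ids)
```

Running the same assertion against the real DataFrame in both versions would show directly whether the invariant is violated.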