
0.17 drop_duplicates() incorrectly dropping non-unique values #11459

@amanhanda

Description


Under pandas 0.17 with numpy 1.10, drop_duplicates() drops rows even though every (a, b) pair in the frame is unique:

In [32]: import pandas

In [33]: df = pandas.DataFrame(data={"a" : [0, 1, 3, 4], "b":[20, 16, 8, 4]})

In [34]: df
Out[34]:
   a   b
0  0  20
1  1  16
2  3   8
3  4   4

In [35]: df.drop_duplicates(['a','b'], keep='last')
Out[35]:
   a  b
3  4  4

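0.16.2 has the correct behavior and keeps all four rows. (The keep keyword was added in 0.17, so the 0.16.2 call below uses the default; since no (a, b) pair repeats, keeping the first or the last occurrence should give the same result.)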

In [6]: import pandas

In [7]: df = pandas.DataFrame(data={"a" : [0, 1, 3, 4], "b":[20, 16, 8, 4]})

In [8]: df
Out[8]:
   a   b
0  0  20
1  1  16
2  3   8
3  4   4

In [9]: df.drop_duplicates(['a','b'])
Out[9]:
   a   b
0  0  20
1  1  16
2  3   8
3  4   4
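For reference, the result that keep='last' should produce can be computed without pandas. The sketch below is a hypothetical pure-Python helper (drop_duplicates_keep_last is not a pandas function), assuming rows are given as dicts in index order: it keeps the last row for each distinct (a, b) key, so on this data it should return all four rows, matching the 0.16.2 output.

```python
def drop_duplicates_keep_last(rows, subset):
    """Hypothetical reference helper, not part of pandas.

    rows   -- list of dicts, one per DataFrame row, in index order
    subset -- column names that together form the duplicate key
    Returns (index, row) pairs in original order, keeping only the
    last occurrence of each distinct key.
    """
    last_seen = {}
    for i, row in enumerate(rows):
        key = tuple(row[c] for c in subset)
        last_seen[key] = i  # a later occurrence overwrites an earlier one
    keep = set(last_seen.values())
    return [(i, row) for i, row in enumerate(rows) if i in keep]


# Same data as the report: every (a, b) pair is distinct.
data = {"a": [0, 1, 3, 4], "b": [20, 16, 8, 4]}
rows = [dict(zip(data, values)) for values in zip(*data.values())]

result = drop_duplicates_keep_last(rows, ["a", "b"])
for i, row in result:
    print(i, row["a"], row["b"])
```

Only rows whose key repeats are candidates for removal; with distinct keys nothing is dropped, which is why the single-row 0.17 output above is wrong.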

Metadata

Metadata

Assignees

No one assigned

Labels: Duplicate Report (duplicate issue or pull request), Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)