DataFrame.duplicated correctly uses hashtable.duplicated_int64, a specialized routine, while core.series.IndexOpsMixin.duplicated uses lib.duplicated, an object-based one.
It's almost always better to factorize and then use the fast routine than to do object comparisons,
but YMMV, so this needs a couple of perf tests.
In [3]: s = Series(np.random.randint(0,10000,size=100000))
In [4]: %timeit s.duplicated()
100 loops, best of 3: 5.75 ms per loop
In [6]: df = s.to_frame()
In [8]: %timeit df.duplicated()
1000 loops, best of 3: 1.86 ms per loop
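For illustration, the factorize-then-integer-routine idea can be sketched with public APIs only. This is a rough sketch, not the pandas internals: `duplicated_via_factorize` is a hypothetical helper, and `np.unique` stands in here for the specialized int64 hashtable routine.

```python
import numpy as np
import pandas as pd


def duplicated_via_factorize(values):
    """Hypothetical helper: flag repeats (keep='first') using integer codes."""
    # Map each distinct value to an integer code; missing values get -1.
    codes, _ = pd.factorize(values)
    # np.unique reports the index of the first occurrence of each code.
    _, first_idx = np.unique(codes, return_index=True)
    # Everything that is not a first occurrence is a duplicate.
    mask = np.ones(len(codes), dtype=bool)
    mask[first_idx] = False
    return mask


values = np.array(["a", "b", "a", "c", "b", "a"], dtype=object)
result = duplicated_via_factorize(values)
```

Once the values are integer codes, the duplicate check no longer compares Python objects pairwise, which is the motivation for routing the Series path through the same approach. Whether it wins in practice depends on the data, hence the need for perf tests.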