Once again, this came up during the #6328 investigation.
There's something very strange with how `Index` objects handle slices:
```python
In [1]: import pandas.util.testing as tm

In [2]: idx = tm.makeStringIndex(1000000)

In [3]: timeit idx[:-1]
100000 loops, best of 3: 2 µs per loop

In [4]: timeit idx[slice(None,-1)]
100 loops, best of 3: 6.5 ms per loop
```

Obviously, this happens because `Index` doesn't override the `__getslice__` it inherits from `ndarray`: `idx[:-1]` is executed via `ndarray.__getslice__` -> `Index.__array_finalize__`, while `idx[slice(None, -1)]` goes via `Index.__getitem__` -> `Index.__new__`.
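A standalone sketch of the dispatch difference, for reference (the `Demo` class here is hypothetical, not pandas code; `__getslice__` is Python 2 semantics and was removed in Python 3, where both spellings reach `__getitem__`):

```python
# Minimal sketch of the Python 2 dispatch difference described above.
class Demo(object):
    def __len__(self):
        return 10  # needed so negative slice bounds get adjusted

    def __getslice__(self, i, j):  # Python 2 only; removed in Python 3
        print("__getslice__(%d, %d)" % (i, j))

    def __getitem__(self, key):
        print("__getitem__(%r)" % (key,))

d = Demo()
d[:-1]              # Python 2: __getslice__(0, 9) -- the fast ndarray path
d[slice(None, -1)]  # __getitem__(slice(None, -1)) on both Python 2 and 3
```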
`__getitem__` is made ~1000x slower by trying to infer the data type of the sliced result and convert it to a different `Index` subclass. The problem is that the interactive spelling `idx[:-1]`, where a milliseconds-vs-microseconds difference doesn't matter, is likely to miss this feature entirely, because it's dispatched via `__getslice__`. Meanwhile the programmatic spelling `idx[slice(None, -1)]`, which does hit this soft spot, is exactly the case where I'd argue this type-conversion magic is not at all necessary (see the sketch below).
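For what it's worth, a minimal sketch of what I mean (the `MyIndex` class and its `name` attribute are hypothetical, not the pandas implementation): a plain slice can't change the element type, so the slice branch of `__getitem__` could take the same cheap view path that `__getslice__` takes, relying on `__array_finalize__` to propagate metadata:

```python
import numpy as np

# Hypothetical sketch, not pandas code.
class MyIndex(np.ndarray):
    def __new__(cls, data, name=None):
        obj = np.asarray(data, dtype=object).view(cls)
        obj.name = name
        return obj

    def __array_finalize__(self, obj):
        # Runs for views/slices; just propagates metadata.
        self.name = getattr(obj, 'name', None)

    def __getitem__(self, key):
        if isinstance(key, slice):
            # Slices can't change the element type, so take the cheap
            # view path (the one ndarray.__getslice__ already uses).
            return np.ndarray.__getitem__(self, key)
        # Fancy/scalar keys: this is where expensive reconstruction
        # (the Index.__new__ inference in pandas) could still happen.
        return type(self)(np.asarray(self)[key], name=self.name)

idx = MyIndex(['a', 'b', 'c'], name='letters')
sub = idx[slice(None, -1)]
print(type(sub).__name__, sub.name)  # MyIndex letters
```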
Is there a rationale behind this?