ENH/PERF: cache sort/mask per column #3539
Labels
Closing Candidate
May be closeable, needs more eyeballs
Enhancement
Performance
Memory or execution speed performance
Along the lines of Wes's answer to this question:
http://stackoverflow.com/questions/16384332/how-to-speed-up-pandas-row-filtering-by-string-matching
I think it is possible to have a dictionary recording certain parameters for a series (or a column in a frame), something like
conditions = dict(sorted=False, nulls=False, unique=True)
that would allow certain operations to be sped up. Of course, these conditions would have to be updated in various scenarios: e.g. when sorting by a certain column, you could set that column's `sorted` condition to `True` (and invalidate it when sorting by other columns). However, this might be a bit complicated to determine in some cases (in which case you could just set `sorted = None`, meaning "I don't know"). But many operations could preserve these conditions (e.g. reindexing with a monotonic index will preserve the sort, but will invalidate the nulls if the new index is not identical to the current index).
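A rough sketch of how such flags could be carried around and invalidated. `FlaggedColumn` and its methods are hypothetical names for illustration, not existing pandas API, and the reindex rule here is deliberately conservative (every flag becomes unknown unless the index is identical):

```python
import pandas as pd


class FlaggedColumn:
    """Hypothetical wrapper pairing a Series with cached condition flags.

    Each flag is True, False, or None, where None means "I don't know".
    """

    def __init__(self, values):
        self.values = values
        self.conditions = dict(sorted=None, nulls=None, unique=None)

    def has_nulls(self):
        # Compute the null check once, then reuse the cached answer.
        if self.conditions["nulls"] is None:
            self.conditions["nulls"] = bool(self.values.isna().any())
        return self.conditions["nulls"]

    def sort(self):
        out = FlaggedColumn(self.values.sort_values())
        # Sorting by its own values keeps nulls/unique and sets sorted.
        out.conditions = dict(self.conditions, sorted=True)
        return out

    def reindex(self, new_index):
        new_index = pd.Index(new_index)
        out = FlaggedColumn(self.values.reindex(new_index))
        if new_index.equals(self.values.index):
            # Identical index: nothing changed, so every flag survives.
            out.conditions = dict(self.conditions)
        # Otherwise new labels may introduce NaN or reorder rows, so the
        # flags stay at their default of None (unknown).
        return out
```

A smarter version could keep `sorted` across a reindex in the cases described above; this sketch only shows the bookkeeping.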
Non-trivial, but might be worth it: e.g. if I have already computed `nulls` and know there are none, I can skip the null check and go directly to numpy land; or if the column is already sorted, I can use `searchsorted`.
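To make the two fast paths concrete, here is a minimal sketch assuming a `conditions` dict like the one above is available next to the column. `fast_sum` and `rows_equal_to` are hypothetical helpers, not pandas functions; only `numpy.searchsorted`, `numpy.nansum`, and `numpy.flatnonzero` are real calls:

```python
import numpy as np
import pandas as pd


def fast_sum(values, conditions):
    """Skip the NaN-aware path when nulls is known to be False."""
    arr = values.to_numpy()
    if conditions.get("nulls") is False:
        return arr.sum()       # plain numpy, no null check
    return np.nansum(arr)      # unknown or True: fall back to NaN-aware sum


def rows_equal_to(values, conditions, key):
    """Return positions where values == key, using bisection if sorted."""
    arr = values.to_numpy()
    if conditions.get("sorted") is True:
        # Two binary searches bound the run of matching rows.
        lo = np.searchsorted(arr, key, side="left")
        hi = np.searchsorted(arr, key, side="right")
        return np.arange(lo, hi)
    # Unknown or unsorted: fall back to a full scan.
    return np.flatnonzero(arr == key)
```

Both helpers fall back to the safe path whenever a flag is `None`, so a stale "I don't know" costs speed but never correctness.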