-
-
Notifications
You must be signed in to change notification settings - Fork 18.7k
Open
Labels
EnhancementExtensionArrayExtending pandas with custom dtypes or arrays.Extending pandas with custom dtypes or arrays.Missing-datanp.nan, pd.NaT, pd.NA, dropna, isnull, interpolatenp.nan, pd.NaT, pd.NA, dropna, isnull, interpolateNA - MaskedArraysRelated to pd.NA and nullable extension arraysRelated to pd.NA and nullable extension arraysNeeds DiscussionRequires discussion from core team before further actionRequires discussion from core team before further action
Description
Currently, our nullable / masked extension arrays (boolean, integer, for now) are using a numpy boolean array as their _mask
to keep track of missing values. A potential route for improving memory and performance would be using a bitarray instead of a boolean numpy array (which is a byte per value).
This should require some exploration: what are options how to implement this? (existing libraries, custom implementation) What is the performance impact? (some things like masking will also be slower, since we still rely on numpy for that, which needs boolean arrays) Is this worth it to do a custom implementation rather than using pyarrow for this? etc
Metadata
Metadata
Assignees
Labels
EnhancementExtensionArrayExtending pandas with custom dtypes or arrays.Extending pandas with custom dtypes or arrays.Missing-datanp.nan, pd.NaT, pd.NA, dropna, isnull, interpolatenp.nan, pd.NaT, pd.NA, dropna, isnull, interpolateNA - MaskedArraysRelated to pd.NA and nullable extension arraysRelated to pd.NA and nullable extension arraysNeeds DiscussionRequires discussion from core team before further actionRequires discussion from core team before further action