Description
Hello there
Is your feature request related to a problem?
Context: I am creating a package to handle physical units (yes, another one), and I have started working on the pandas interface. I looked at the pandas extension page, as well as what pint did with pint-pandas. I am pretty satisfied with the result, except for one thing: when creating pandas objects (Series or DataFrame), I have to explicitly specify the dtype (using the extension dtype for my "Quantity" class) so that pandas casts my Quantity object to the corresponding QuantityArrayExtension. Categorical objects exhibit a similar problem:
# this does create a Series with the categorical dtype
s = pd.Series(["a", "b", "c", "a"], dtype="category")
# without an explicit dtype, pandas uses "object" as the dtype
pd.Series(["a", "b", "c", "a"])
from physipy import m  # import the "meter" object
from physipy import QuantityDtype  # import the extension dtype for Quantity objects
# this does create a Series with the QuantityDtype
s = pd.Series([1, 2, 3]*m, dtype=QuantityDtype)
# casts to integers, dropping the unit (because pandas bypasses my object by accessing its "array" value directly)
pd.Series([1, 2, 3]*m)
Now, I understand that for the Categorical example, it is not obvious what kind of dtype pandas should use, but for my custom class, I would like to be able to tell pandas how to behave.
Describe the solution you'd like
I would expect some interface like this :
import pandas as pd
from physipy import Quantity, QuantityDtype, m
# tell pandas to use QuantityDtype when a Quantity object is passed
pd.dtype_lut[Quantity] = QuantityDtype
# then a Series can be created directly
my_quantity_object = [1, 2, 3]*m  # this is a Quantity object
s = pd.Series(my_quantity_object)  # note the absence of a dtype argument
Here, pandas recognizes that it does not know the type of the passed object, so it checks its dtype_lut to see whether a corresponding dtype has been set.
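To make the lookup-table idea concrete, here is a minimal pure-Python sketch of how such a lookup could behave. Everything in it is hypothetical: `dtype_lut` and `lookup_dtype` are not part of pandas, and `Quantity` and `QuantityDtype` are small stand-ins rather than the physipy classes. I have assumed that subclasses of a registered type should resolve to the same dtype, which is one possible design choice among others.

```python
# Hypothetical sketch: none of these names exist in pandas.

class Quantity:
    """Stand-in for a unit-aware value (not the real physipy class)."""

    def __init__(self, value, unit):
        self.value = value
        self.unit = unit


class QuantityDtype:
    """Stand-in for the extension dtype associated with Quantity."""


# The proposed look-up table: Python type -> dtype to use for that type.
dtype_lut = {}


def lookup_dtype(obj, default=None):
    """Return the dtype registered for type(obj) or for one of its bases."""
    # Walk the method resolution order so that subclasses of a registered
    # type (assumption: this is desirable) resolve to the same dtype.
    for cls in type(obj).__mro__:
        if cls in dtype_lut:
            return dtype_lut[cls]
    return default


# Registration, as in the proposed interface above.
dtype_lut[Quantity] = QuantityDtype

q = Quantity([1, 2, 3], "m")
print(lookup_dtype(q) is QuantityDtype)  # registered type: dtype is found
print(lookup_dtype([1, 2, 3]) is None)   # unregistered type: falls back to default
```

In this sketch an unregistered type simply returns the default, which would correspond to pandas continuing with its usual inference.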
Another interface would be to add a method with a pandas-specific name to Quantity that plays the role of this look-up table:
# in my Quantity class
class Quantity:
    ...
    def pd_dtype(self):
        return QuantityDtype
so that when pandas encounters an unknown object type, it first tries to get its dtype by calling "obj.pd_dtype()".
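As a rough illustration of this second option, the sketch below shows how a caller could probe an object for such a method before falling back to its normal behaviour. This is only an assumption about how the hook might be consumed: `dtype_from_hook` is a hypothetical helper, `pd_dtype` is the method name proposed here (pandas does not look for it), and the two classes are stand-ins for the physipy ones.

```python
# Hypothetical sketch: pandas does not look for a "pd_dtype" method.

class QuantityDtype:
    """Stand-in for the extension dtype associated with Quantity."""


class Quantity:
    """Stand-in for a unit-aware value (not the real physipy class)."""

    def __init__(self, value, unit):
        self.value = value
        self.unit = unit

    def pd_dtype(self):
        # The object itself tells the caller which dtype to use.
        return QuantityDtype


def dtype_from_hook(obj, default=None):
    """Return obj.pd_dtype() if the object defines it, else the default."""
    hook = getattr(obj, "pd_dtype", None)
    if callable(hook):
        return hook()
    return default


q = Quantity([1, 2, 3], "m")
print(dtype_from_hook(q) is QuantityDtype)  # object provides the hook
print(dtype_from_hook([1, 2, 3]) is None)   # plain list has no hook
```

Compared with the look-up table, this keeps the association inside the class itself, so no registration step is needed.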
Cheers