Decide whether (new) ExtensionArrays and Dtypes are public #22860
@pandas-dev/pandas-core here's a proposal: Everything goes at the top level.
This is consistent with Categorical and SparseArray, meaning we wouldn't have to deprecate those if we want all the arrays in one place. I'm concerned about nesting them in
So we add
I don't think we need a timedelta_array / TimedeltaDtype in the public API. These shouldn't be user-facing at all. |
I would be ecstatic if we got to a place where a)
A lot of the pandas-internal EA code could be simplified if there were something analogous to |
I think we should leave the dtypes themselves in pandas.api.types. No objection to the *_array in the pandas namespace. |
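For reference, a quick check of where things ended up in later pandas releases (0.24 onward, as far as I know): the array classes are collected under pd.arrays, and the dtype classes are importable both from the top-level namespace and from pandas.api.types. This is a sketch assuming a recent pandas install, not a statement of what this thread decided:

```python
import pandas as pd

# Array classes live in the pd.arrays namespace (assumes pandas >= 1.0 for the "Int64" + None handling)
int_arr = pd.array([1, 2, None], dtype="Int64")
print(type(int_arr).__name__)

# The same dtype class objects are reachable from both pd and pd.api.types
print(pd.api.types.CategoricalDtype is pd.CategoricalDtype)
print(pd.api.types.IntervalDtype is pd.IntervalDtype)
```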
+1 on making these public in general.
This does not currently exist and I'm a little confused as to what it would look like if implemented. I understand how this makes sense for the non-interval EAs, as they are usually constructed in a fairly direct manner. With intervals, my impression is that users have generally been using the
Seems like trying to support all these construction methods in a single
I agree, these don't seem to provide much additional utility in general. We could maybe make an exception for |
I agree with @jschendel that CategoricalDtype is especially useful on its own and should be in the top-level namespace. |
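To illustrate why CategoricalDtype is useful on its own: it can be built once, without any data, and reused to cast several Series consistently. A minimal sketch using the public pd.CategoricalDtype constructor; the category names are made up for the example:

```python
import pandas as pd

# Declare the categories and their order once, independent of any data
size_dtype = pd.CategoricalDtype(categories=["small", "medium", "large"], ordered=True)

s = pd.Series(["medium", "small", "large", "small"]).astype(size_dtype)
print(s.cat.codes.tolist())  # [1, 0, 2, 0]
print(s.max())               # large (uses the declared order, not alphabetical)

# Values outside the declared categories become missing rather than raising
t = pd.Series(["small", "huge"]).astype(size_dtype)
print(t.isna().tolist())     # [False, True]
```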
Also agreed that IntervalArray's alternative constructors are especially useful, and conflating them in a single |
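For comparison, the IntervalArray alternative constructors mentioned here, as they exist in released pandas under pd.arrays.IntervalArray. Each one takes a differently shaped input, which is the difficulty with folding them into a single generic function. A small sketch building the same two intervals three ways:

```python
import pandas as pd

IA = pd.arrays.IntervalArray

# Same intervals (0, 1] and (1, 2], three input shapes; closed="right" is the default
from_breaks = IA.from_breaks([0, 1, 2])               # n + 1 edges
from_tuples = IA.from_tuples([(0, 1), (1, 2)])        # (left, right) pairs
from_arrays = IA.from_arrays([0, 1], [1, 2])          # separate left and right arrays

print(list(from_breaks) == list(from_tuples) == list(from_arrays))  # True
```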
I have the feeling that adding a function for each array type adds some API bloat. I think it would be nice to restrict ourselves to a general
I suppose the main drawback of having a single function vs. specialized functions for each array is losing the ability to have additional kwargs? (eg
Further, I think we also need to discuss and somewhat define the "scope" of those
I think for me the core use case that should certainly be covered is the round trip to/from "python objects" (so something like |
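A sketch of that round trip to and from Python objects with the nullable integer array, assuming pandas >= 1.0 (where the missing-value marker is pd.NA; earlier versions used NaN):

```python
import pandas as pd

# Python objects -> extension array
values = [1, None, 3]
arr = pd.array(values, dtype="Int64")
print(arr.isna().tolist())  # [False, True, False]

# Extension array -> Python objects: iteration yields the values, with pd.NA for missing
back = list(arr)

# ... and back again, without losing the missing value
again = pd.array(back, dtype="Int64")
print(again.isna().tolist())  # [False, True, False]
```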
Another remark about where to put them in our API: I don't think those belong in
From the existing submodules, I would say that |
+1 on being clear about this. |
On the idea of a single
arr = pd.array([1, 2, 3], type=pd.Int64())
I'd really like to avoid
I think that a simple |
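Note that the spelling sketched here (type=pd.Int64()) is not what later shipped: in released pandas the keyword is dtype and the class is pd.Int64Dtype, which also has the string alias "Int64". The equivalent call, as a minimal sketch:

```python
import pandas as pd

# Released spelling of the proposal above: dtype=..., with a dtype instance or its string alias
a = pd.array([1, 2, 3], dtype=pd.Int64Dtype())
b = pd.array([1, 2, 3], dtype="Int64")

print(a.dtype == b.dtype)  # True
```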
I hate
So what do people think? |
what do I get if I write |
numpy array I would assume.
On Wed, Nov 7, 2018 at 9:50 AM jbrockmendel wrote:
what do I get if I write arr = pd.array([2.0, 3.5], dtype='f8')? i.e. do we raise or just return a numpy array?
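For what it's worth, released pandas does neither: pd.array with a plain NumPy dtype returns a thin ExtensionArray wrapper around an ndarray (PandasArray, renamed NumpyExtensionArray in later versions). A small check, assuming a pandas version from after this discussion:

```python
import numpy as np
import pandas as pd

arr = pd.array([2.0, 3.5], dtype="f8")

# Not a bare ndarray, but an ExtensionArray wrapping one
print(isinstance(arr, np.ndarray))                         # False
print(isinstance(arr, pd.api.extensions.ExtensionArray))   # True

# The underlying float64 ndarray is one call away
nd = arr.to_numpy()
print(nd.dtype)  # float64
```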
Somehow this is maybe fine, as we then wouldn't have a bunch of
Unless we are fine with a
For that we already have
Related to that, I want to repeat my question from above (#22860 (comment)) on the "scope" of those functions. Should a |
Would something like |
Yeah, I think it would be one or the other (except for IntervalArray, which will be special).
When you say "mainly useful for integer", you mean mainly useful
I'm not sure what's best here. I think list / object array of scalars, clearly. For convenience, I think unboxing Series / Index and idempotency are nice. So all of
would work.
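A sketch of those input kinds against pd.array as it exists in released pandas: a list of scalars, a Series (whose dtype is picked up when none is given), an Index, and an existing array (idempotent up to a copy, since copy defaults to True):

```python
import pandas as pd

from_list = pd.array([1, 2], dtype="Int64")                 # list of scalars
from_series = pd.array(pd.Series([1, 2], dtype="Int64"))    # Series is unboxed; dtype taken from it
from_index = pd.array(pd.Index([1, 2]), dtype="Int64")      # Index is unboxed too
again = pd.array(from_list)                                 # existing array -> equivalent array

for result in (from_list, from_series, from_index, again):
    print(result.dtype, list(result))
```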
To be clear, you can do |
Yep, I know, I was only looking at readability / writability of |
Yes, for sure that as well (next to the list of scalars). |
One additional thing that we didn't really discuss yet: I think the idea for now is that this will also return numpy arrays. There are clear advantages of having that behaviour (knowing that you can give it any data and it will give you a usable array-like regardless of the data type, and this being what would otherwise also be stored in a Series if you passed it there).
|
We mentioned it briefly on the call, and Jeff was in favor of it returning ndarrays.
In https://github.com/pandas-dev/pandas/pull/23581/files#diff-69ac57923b848af43df327c311b79db4R18 we have a nice, succinct description of what |
Also not for built-in EAs? |
Yes, but only if there is no |
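Related to inference when no dtype is passed: in pandas >= 1.0, pd.array infers an extension dtype for some scalar kinds (integers become the nullable Int64), and falls back to a NumPy-backed ExtensionArray when no dedicated extension type applies. A minimal sketch; the complex-number fallback is just one example of a type without its own extension array:

```python
import numpy as np
import pandas as pd

# Integers (with a missing value) are inferred as the nullable Int64 extension dtype
inferred = pd.array([1, 2, None])
print(inferred.dtype)  # Int64

# No dedicated extension type for complex numbers: a NumPy-backed ExtensionArray comes back
fallback = pd.array([1 + 2j, 3 + 0j])
print(fallback.to_numpy().dtype)  # complex128
```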
Hmm, yeah, you're right. It would be strange for
OK, two problems then: that'll put pandas EAs on a different level from 3rd party EAs (unless we let
And then the problem of |
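On the 3rd-party concern: as I understand released pandas, pd.array does not special-case built-in dtypes when a dtype is given. It asks the dtype for its array class through ExtensionDtype.construct_array_type(), the same hook any third-party ExtensionDtype implements, and string aliases are resolved through the dtype registry (pandas.api.types.pandas_dtype). A small check of that hook with the built-in nullable integer dtype:

```python
import pandas as pd
from pandas.api.types import pandas_dtype

# String alias -> dtype instance, via the registry that third-party dtypes can also join
dtype = pandas_dtype("Int64")

# Every ExtensionDtype reports which array class holds its data
array_cls = dtype.construct_array_type()
print(array_cls.__name__)  # IntegerArray

# pd.array with that dtype produces an instance of that class
arr = pd.array([1, 2], dtype=dtype)
print(type(arr) is array_cls)  # True
```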
This mostly affects
Categorical is already public, so let's leave that out.
A few questions
to_*_array methods, or a top-ish-level pd.array([...], dtype) method)?
.values or any operation returning an array (.unique, probably others)?
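On the last question, a sketch of how the accessors behave for an extension-typed Series in released pandas (Series.array and Series.to_numpy were added in 0.24; the na_value argument assumes pandas >= 1.0): .array, .values and .unique() all return the ExtensionArray, and getting an ndarray is an explicit to_numpy() call.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 2, None], dtype="Int64")

ea = s.array      # the backing ExtensionArray
vals = s.values   # for extension dtypes, also the ExtensionArray
uniq = s.unique() # unique values stay an ExtensionArray

# Explicit conversion to an ndarray, choosing how missing values are represented
nd = s.to_numpy(dtype="float64", na_value=np.nan)

print(type(ea).__name__, type(vals).__name__, type(uniq).__name__)
print(nd)
```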