Skip to content

EA.fillna(method) deprecation and EA.pad_or_backfill issues #54831

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
jorisvandenbossche opened this issue Aug 28, 2023 · 6 comments · Fixed by #54838
Closed

EA.fillna(method) deprecation and EA.pad_or_backfill issues #54831

jorisvandenbossche opened this issue Aug 28, 2023 · 6 comments · Fixed by #54838
Labels
ExtensionArray Extending pandas with custom dtypes or arrays.
Milestone

Comments

@jorisvandenbossche
Copy link
Member

I have been looking at the EA.fillna method keyword deprecation (original PR at #54001) from the point of view of a downstream EA implementation (from looking into geopandas CI's warnings for the release candidate)

Two issues I am noting:

1) When an external EA currently has a fillna method accepting the method keyword, the user sees three deprecation warnings (the third might be specific to the geopandas example, because of calling super().fillna(..) in GeometryArray.fillna()):

In [2]: s
Out[2]: 
0    POLYGON ((0.00000 0.00000, 1.00000 1.00000, 0....
1                                                 None
2    POLYGON ((0.00000 0.00000, -1.00000 1.00000, 0...
dtype: geometry

In [3]: s.fillna(method="ffill")
/home/joris/scipy/repos/geopandas/geopandas/geoseries.py:873: FutureWarning: GeoSeries.fillna with 'method' is deprecated and will raise in a future version. Use obj.ffill() or obj.bfill() instead.
  return super().fillna(
/home/joris/scipy/repos/geopandas/geopandas/geoseries.py:873: FutureWarning: ExtensionArray.fillna 'method' keyword is deprecated. In a future version. arr.pad_or_backfill will be called instead. 3rd-party ExtensionArray authors need to implement pad_or_backfill.
  return super().fillna(
/home/joris/scipy/repos/geopandas/geopandas/array.py:1100: FutureWarning: The 'method' keyword in GeometryArray.fillna is deprecated and will be removed in a future version. Use pad_or_backfill instead.
  return super().fillna(method=method, limit=limit)
Out[3]: 
0    POLYGON ((0.00000 0.00000, 1.00000 1.00000, 0....
1    POLYGON ((0.00000 0.00000, 1.00000 1.00000, 0....
2    POLYGON ((0.00000 0.00000, -1.00000 1.00000, 0...
dtype: geometry

This might be as simple as changing the second warning to a DeprecationWarning for now (the third one might not be easy to avoid, as mentioned above this is called in the geopandas implementation through super().fillna(), and we still want to deprecate direct usage of the EA.fillna method as well)

2) This added a new method to the ExtensionArray, which is pad_or_backfill. Some issues with this name:

  • Methods on the ExtensionArray are user-visible and public methods, so I don't think we should name it "pad_or_backfill". If we want to use this as the mechanism for EAs to override this behaviour, we should call it _pad_or_backfill (like the other official EA methods starting with an underscore), or actually _ffill_or_bfill to use the non-deprecated names.
  • Direct calls of EA.fillna(method=..) are deprecated and currently point to pad_or_backfill. If we don't make this public for end users (point above), then we don't have anything to point to. If we still want to keep this user-accessible at the EA level, we could also simply add ffill and bfill methods to the EA? (to make it consistent with the methods on the Series level)

The first option is very easy to do (simple search/replace on the name), but maybe adding both ffill and bfill as public methods (our own EAs could still dispatch to a shared implementation if that is the easiest) might be the cleaner option long term? And @jbrockmendel based on your comment in the original issue about adding those methods (#53621), I suppose you would be fine with that? (not sure if there was a specific reason to change your mind for the PR to use a combined method?)

Given that those methods need to be implemented by downstream EAs, ideally we get a final naming in for 2.1.0 (or otherwise a quick 2.1.1). But I could look into it tomorrow if we agree on the best path forward.

@jorisvandenbossche jorisvandenbossche added the ExtensionArray Extending pandas with custom dtypes or arrays. label Aug 28, 2023
@jorisvandenbossche jorisvandenbossche added this to the 2.1 milestone Aug 28, 2023
@jbrockmendel
Copy link
Member

This might be as simple as changing the second warning to a DeprecationWarning for now

+1

not sure if there was a specific reason to change your mind for the PR to use a combined method?

In untangling the DataFrame/Manager/Block/EA methods one pain point was that we use different/multiple method names and different levels. I don't particularly care what the eventual name is, but would like for us to have a single consistent name at each level of the call stack (possible leading underscore not-withstanding)

or actually _ffill_or_bfill to use the non-deprecated names

As above, I'm fine with this as long as we make it consistent from NDFrame->Manager->Block->EA.

If we still want to keep this user-accessible at the EA level

AFAIK users don't really use EAs directly. Do you have reason to believe otherwise? I'm fine with making things private as being a safer option (easier to un-privatize later than the other way around).

@jorisvandenbossche
Copy link
Member Author

OK, given we already use (_)pad_or_backfill in many places internally, let's keep that name. If we make it non-user-visible on the ExtensionArray, I don't care too much about the name (as long as we keep it the same in the future, since this is now official API for external EAs).

I opened #54838 for doing the EA.pad_or_backfill -> EA._pad_or_backfill rename

AFAIK users don't really use EAs directly. Do you have reason to believe otherwise? I'm fine with making things private as being a safer option (easier to un-privatize later than the other way around).

I don't know if many users do use those EA methods directly, but nevertheless you can get an EA easily through public API, and so all methods on EAs are public, so I don't think we should expose something like "pad_or_backfill" in our user-visible public namespace. Now, it's true that we can always still add public bfill/ffill methods to EA later (as simple wrappers to _pad_or_backfill), if there is demand for it. So let's not bother with that now, and just use _pad_or_backfill

@jorisvandenbossche
Copy link
Member Author

One additional aspect of the new EA API that I noticed while doing #54838, is that the EA._pad_or_backfill base class method has an unused (and undocumented) limit_area keyword. Looking where this is used, my understanding is that this is only kept in there because we might call that through interpolate(method="ffill", limit_area=..). However, this is 1) deprecated and 2) currently only works for numpy dtypes anyway (our EAs just ignore the limit_area keyword already).

So I think we can easily remove limit_area now from the EA methods (not giving problems later if we would want to remove it after the deprecation enforcement in interpolate). And we only need to keep in the NumpyExtensionArray (the Block.pad_or_backfill method which calls those EA methods actually already doesn't pass through limit_area in the case of EAs).
I added this to the PR as well.

@rhshadrach
Copy link
Member

Are we good with enforcing this deprecation for 3.0? I'm looking to enforce the deprecation of method in fillna and running into this. Currently we internally call _pad_or_backfill and if our implementation gets called then check if the EA author has overridden fillna. This no longer works if fillna doesn't have a method argument.

@jorisvandenbossche
Copy link
Member Author

I think so, yes. We have been warning about this since 2.1, so fine with enforcing that for 3.0

@rhshadrach
Copy link
Member

rhshadrach commented Mar 8, 2024

Thanks - just want to make sure: this is okay to enforce without promoting to FutureWarning because the warning targets EA authors. Is that right?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ExtensionArray Extending pandas with custom dtypes or arrays.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants