Skip to content

Conversation

m-richards
Copy link
Contributor

I've added a fixture for the specific construction of a subclassed dataframe from the issue - which mirrors the setup in geopandas, I expect this probably needs to be relocated (I could have constructed this inline with the test and maybe that's the right solution, but I can see analogous changes for astype perhaps being useful in the geopandas context as well).

I've also added the call to __finalize__ although it's not strictly needed, since that's also needed by #28283 (Note that the 1 dimensional case already has __finalize__ called indirectly via astype.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks ok, one question

@jreback jreback added this to the 1.4 milestone Nov 5, 2021
@jreback jreback added Compat pandas objects compatability with Numpy or Python functions Constructors Series/DataFrame/Index/pd.array Constructors Subclassing Subclassing pandas objects labels Nov 5, 2021
@m-richards
Copy link
Contributor Author

m-richards commented Nov 5, 2021

CI failures now don't seem related, but I could be wrong

Edit: now it definitely does, trying to sort out the type hints

@@ -6219,8 +6220,12 @@ def convert_dtypes(
for col_name, col in self.items()
]
if len(results) > 0:
result = concat(results, axis=1, copy=False)
cons = cast(Type[DataFrame], self._constructor)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not super happy with this. Was getting the error

pandas/core/generic.py:6222: error: Argument 1 to "NDFrame" has incompatible type "DataFrame"; expected "Union[ArrayManager, SingleArrayManager, BlockManager, SingleBlockManager]"  [arg-type]

at https://github.com/pandas-dev/pandas/runs/4114882794?check_suite_focus=true, which refers to the line
result = self._constructor(concat(results, axis=1, copy=False)).

Bit confused because NDFrame._constructor raises an abstractmethod error, I'm guessing mypy is then reading the parent class PandasObject which returns type(self) i.e. in this case NDFrame. Then we're calling NDFrame with a DataFrame (the result of concat with more that 1 column). Finally the mypy type error seems to come from DataFrame in the __init__ of NDFrame which specifies the first argument is a Manager and not a DataFrame.

I guess given the return signature of concat is Series | DataFrame this is technically correct, but it seems awkward to use the DataFrame directly in the definition of the more general NDFrame

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks ok, need to remove the print. ping on green.

@m-richards
Copy link
Contributor Author

looks ok, need to remove the print. ping on green.

@jreback looks like everything is passing now.

@jreback jreback merged commit 14bfd6b into pandas-dev:master Nov 13, 2021
@jreback
Copy link
Contributor

jreback commented Nov 13, 2021

thanks @m-richards

nickleus27 pushed a commit to nickleus27/pandas that referenced this pull request Nov 28, 2021
@m-richards m-richards deleted the convert_dtypes_subclasses branch September 16, 2023 01:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Compat pandas objects compatability with Numpy or Python functions Constructors Series/DataFrame/Index/pd.array Constructors Subclassing Subclassing pandas objects
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: DataFrame.convert_dtypes doesn't preserve subclasses
3 participants