Description
Describe the bug
Calling apply on a grouped DataFrame (i.e. a DataFrameGroupBy object) usually returns a DataFrame, and this is correctly typed in the stubs. However, if the callable being applied returns a scalar rather than a DataFrame or Series, then apply will in turn return a Series. This is documented in example 3 in the docs for GroupBy.apply.
To Reproduce
Toy example:
import pandas as pd

df = pd.DataFrame({"col1": [1, 2, 3], "col2": [4, 5, 6]})
result: pd.Series = df.groupby("col1").apply(lambda x: x.sum().mean())
assert isinstance(result, pd.Series)
This code runs correctly and the assertion succeeds, but mypy gives "Incompatible types in assignment (expression has type "DataFrame", variable has type "Series[Any]")".
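To make the runtime behaviour concrete, here is a minimal sketch contrasting the two cases. The column selection `[["col2"]]` is an addition that is not in the report above; it is assumed here only to keep the grouping column out of the applied function, which recent pandas versions may warn about.

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2, 3], "col2": [4, 5, 6]})
# Selecting a list of columns still yields a DataFrameGroupBy object.
grouped = df.groupby("col1")[["col2"]]

# The callable returns a Series per group, so apply combines them into a DataFrame.
per_group_sums = grouped.apply(lambda g: g.sum())

# The callable returns a scalar per group, so apply returns a Series
# indexed by the group key.
per_group_means = grouped.apply(lambda g: g.sum().mean())

print(type(per_group_sums).__name__, type(per_group_means).__name__)
```

Both calls go through the same apply method, so the stubs can only tell the two results apart by looking at the return type of the callable.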
Please complete the following information:
- OS: macOS
- OS Version: 12.5
- python version: 3.9.9
- mypy version: 0.971
- version of installed pandas-stubs: 1.4.3.220718
Additional context
It's not obvious how this could be fixed. I tried using overload to give different return types depending on the type of Callable passed:
@overload
def apply(self, func: Callable[[DataFrame], Union[DataFrame, Series]], *args, **kwargs) -> DataFrame: ...
@overload
def apply(self, func: Callable[[DataFrame], Union[Scalar, Sequence]], *args, **kwargs) -> Series: ...
This correctly lints the example above, but fails in at least one particular circumstance: pandas special-cases the built-in functions sum, max and min when passed to apply, replacing them with their NumPy equivalents, so the overloads above cause the test in https://github.com/pandas-dev/pandas-stubs/blob/main/tests/test_frame.py#L558 to fail.
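The overload idea can be tried out without pandas. The sketch below is only an illustration: Frame, Column and GroupedFrame are hypothetical stand-ins for DataFrame, Series and DataFrameGroupBy, not the real stub classes, and the runtime implementation is deliberately simplified. It does not address the sum/max/min special case described above.

```python
from typing import Callable, Dict, Hashable, List, Union, overload


class Column:
    """Hypothetical stand-in for pandas.Series."""

    def __init__(self, values: Dict[Hashable, float]) -> None:
        self.values = values


class Frame:
    """Hypothetical stand-in for pandas.DataFrame."""

    def __init__(self, rows: List[Dict[Hashable, float]]) -> None:
        self.rows = rows


class GroupedFrame:
    """Hypothetical stand-in for DataFrameGroupBy."""

    def __init__(self, groups: Dict[Hashable, Frame]) -> None:
        self.groups = groups

    # Frame- or Column-returning callables produce a Frame.
    @overload
    def apply(self, func: Callable[[Frame], Union[Frame, Column]]) -> Frame: ...

    # Scalar-returning callables produce a Column keyed by group.
    @overload
    def apply(self, func: Callable[[Frame], float]) -> Column: ...

    def apply(self, func):
        results = {key: func(group) for key, group in self.groups.items()}
        if all(isinstance(r, (int, float)) for r in results.values()):
            return Column(results)
        # Simplified combination step: stack the per-group results row-wise.
        rows: List[Dict[Hashable, float]] = []
        for result in results.values():
            if isinstance(result, Frame):
                rows.extend(result.rows)
            else:
                rows.append(dict(result.values))
        return Frame(rows)


grouped = GroupedFrame(
    {1: Frame([{"col2": 4.0}]), 2: Frame([{"col2": 5.0}, {"col2": 6.0}])}
)
totals = grouped.apply(lambda g: sum(row["col2"] for row in g.rows))
same = grouped.apply(lambda g: g)
```

With these signatures, a type checker would be expected to infer Column for totals and Frame for same, mirroring the Series and DataFrame cases in the issue.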