-
-
Notifications
You must be signed in to change notification settings - Fork 144
add proper type when grouping by a Series #708
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…ies.groupby and DataFrame.groupby
Just uploaded tests for iteration over groupby objects. |
It looks as if it should work :) I would start with adding |
My vscode was happy so I guess pyright was happy. However, mypy do fail on local too. I'll do some additional investigation. |
Interesting take with reveal_types : def test_types_groupby_iter() -> None:
s = pd.Series([1, 1, 2])
series_groupby: pd.Series[bool] = pd.Series([True, True, False])
first_group = next(iter(s.groupby(series_groupby)))
reveal_type(s.groupby(series_groupby))
reveal_type(s.groupby(series_groupby).__iter__())
reveal_type(iter(s.groupby(series_groupby))) Outputs
How can the type of Even more puzzling is that the type is right for DataFrames : def test_types_groupby_iter() -> None:
df = pd.DataFrame(data={"col1": [1, 1, 2], "col2": [3, 4, 5]})
series_groupby: pd.Series[bool] = pd.Series([True, True, False])
first_group = next(iter(df.groupby(series_groupby)))
reveal_type(df.groupby(series_groupby))
reveal_type(df.groupby(series_groupby).__iter__())
reveal_type(iter(df.groupby(series_groupby))) output
But I don't see any particular additional type hint |
I think the issue is that when you are doing NOTE - my ability to help out will be pretty limited between now (5/24) and 6/4 due to vacation. |
You are right, mypy now works ! Did not know the dtype was useful to help the type checker. |
I will merge low-risk PRs during that timeframe. |
Maybe this is deserving of a separate issue, but shouldn't we be able to create an initializer overload that takes a |
Can you create a separate issue? I think this might work. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple small things from me, plus address issue to add comment about SeriesByT
tests/test_frame.py
Outdated
assert_type(first_group[0], "bool"), | ||
bool, | ||
) | ||
check( | ||
assert_type(first_group[1], "pd.DataFrame"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need quotes around the types in the two assert_type()
statements, because those are valid types at runtime.
tests/test_series.py
Outdated
series_groupby = pd.Series([True, True, False], dtype=bool) | ||
first_group = next(iter(s.groupby(series_groupby))) | ||
check( | ||
assert_type(first_group[0], "bool"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no quotes needed here
tests/test_series.py
Outdated
assert_type(first_group[1], "pd.Series[int]"), | ||
pd.Series, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert_type(first_group[1], "pd.Series[int]"), | |
pd.Series, | |
assert_type(first_group[1], "pd.Series[int]"), | |
pd.Series, np.integer |
You do need quotes here because pd.Series[int]
is not a type at runtime.
Thank you for the review, I do believe I have addressed the comments |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because of a recent change to the definition of S1
, can you merge with main
, address the comment below, and if the tests pass, we should be good to go.
pandas-stubs/_typing.pyi
Outdated
# Essentially, an intersection between Series S1 TypeVar, and ByT TypeVar | ||
SeriesByT = TypeVar( | ||
"SeriesByT", | ||
str, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We've been recently advised to no longer use constrained TypeVars
, so can you change this to SeriesByT = TypeVar("SeriesByT", bound=str | bytes | ...)
We made the change for S1
in a recent PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi,
What is the difference between TypeVar("SeriesByT", bound = str | bytes ...)
and TypeVar("SeriesByT, str, byts, ...)
?
I tried to replace the SeriesByT with the bound
keyword, but then the mypy did not pass anymore :
Poe => mypy
===========================================
Beginning: 'Run mypy on 'tests' (using the local stubs) and on the local stubs'
===========================================
pandas-stubs/core/series.pyi:648: error: Type variable "SeriesByT" not valid as type argument value for "SeriesGroupBy" [type-var]
pandas-stubs/core/frame.pyi:1100: error: Type variable "SeriesByT" not valid as type argument value for "DataFrameGroupBy" [type-var]
Found 2 errors in 2 files (checked 224 source files)
===========================================
Step: 'Run mypy on 'tests' (using the local stubs) and on the local stubs' failed!
===========================================
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bound
allows the type to be a union of all the listed types. Without it, the type has to be one of the listed (sub-)types.
You probably also need to adjust the type used for SeriesGroupBy
and DataFrameGroupBy
to use bound
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks I actually tried that and got rid of the errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @ClementPinard
Work in Progress, No test yet
I added a new TypeVar that is the intersection of S1 and ByT.
The type hint for S1 types outside of ByT is not covered anymore, but I suppose that it would theoretically fail in pandas, hence its absence of ByT
assert_type()
to assert the type of any return value