DOC: add documentation to DataFrameGroupBy.skew and SeriesGroupBy.skew #50958
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1001,14 +1001,65 @@ def take( | |
result = self._op_via_apply("take", indices=indices, axis=axis, **kwargs) | ||
return result | ||
|
||
@doc(Series.skew.__doc__) | ||
def skew( | ||
self, | ||
axis: Axis | lib.NoDefault = lib.no_default, | ||
skipna: bool = True, | ||
numeric_only: bool = False, | ||
**kwargs, | ||
) -> Series: | ||
""" | ||
Return unbiased skew within groups. | ||
|
||
Normalized by N-1. | ||
|
||
Parameters | ||
---------- | ||
axis : {0 or 'index', 1 or 'columns', None}, default 0 | ||
Axis for the function to be applied on. | ||
For ``Series`` this parameter is unused and defaults to 0. | ||
|
||
skipna : bool, default True | ||
Exclude NA/null values when computing the result. | ||
|
||
numeric_only : bool, default False | ||
Include only float, int, boolean columns. Not implemented for Series. | ||
Comment on lines +1030 to +1031
I'm seeing numeric_only functioning here:
I can reproduce this. However, when I run this
I get the following error:
When I set
Ah - I didn't realize you were referring to Series here. It is true that operating on a non-numeric Series will always fail with skew, and as far as I know this is consistent across Series and SeriesGroupBy ops. Perhaps there is a better behavior or the argument should be removed, but I think that's for a separate issue. |
||
|
||
**kwargs | ||
Additional keyword arguments to be passed to the function. | ||
|
||
Returns | ||
------- | ||
Series | ||
|
||
|
||
See Also | ||
-------- | ||
Series.skew : Return unbiased skew over requested axis. | ||
|
||
Examples | ||
-------- | ||
>>> ser = pd.Series([390., 350., 357., np.nan, 22., 20., 30.], | ||
... index=['Falcon', 'Falcon', 'Falcon', 'Falcon', \ | ||
... 'Parrot', 'Parrot', 'Parrot'], | ||
... name="Max Speed") | ||
>>> ser | ||
Falcon 390.0 | ||
Falcon 350.0 | ||
Falcon 357.0 | ||
Falcon NaN | ||
Parrot 22.0 | ||
Parrot 20.0 | ||
Parrot 30.0 | ||
Name: Max Speed, dtype: float64 | ||
>>> ser.groupby(level=0).skew() | ||
Falcon 1.525174 | ||
Parrot 1.457863 | ||
Name: Max Speed, dtype: float64 | ||
>>> ser.groupby(level=0).skew(skipna=False) | ||
Falcon NaN | ||
Parrot 1.457863 | ||
Name: Max Speed, dtype: float64 | ||
""" | ||
result = self._op_via_apply( | ||
"skew", | ||
axis=axis, | ||
|
@@ -2470,14 +2521,76 @@ def take( | |
result = self._op_via_apply("take", indices=indices, axis=axis, **kwargs) | ||
return result | ||
|
||
@doc(DataFrame.skew.__doc__) | ||
def skew( | ||
self, | ||
axis: Axis | None | lib.NoDefault = lib.no_default, | ||
skipna: bool = True, | ||
numeric_only: bool = False, | ||
**kwargs, | ||
) -> DataFrame: | ||
""" | ||
Return unbiased skew within groups. | ||
|
||
Normalized by N-1. | ||
|
||
Parameters | ||
---------- | ||
axis : {0 or 'index', 1 or 'columns', None}, default 0 | ||
Axis for the function to be applied on. | ||
|
||
For DataFrames, specifying ``axis=None`` will apply the aggregation | ||
|
||
across both axes. | ||
|
||
.. versionadded:: 2.0.0 | ||
|
||
skipna : bool, default True | ||
Exclude NA/null values when computing the result. | ||
|
||
numeric_only : bool, default False | ||
Include only float, int, boolean columns. | ||
|
||
**kwargs | ||
Additional keyword arguments to be passed to the function. | ||
|
||
Returns | ||
------- | ||
DataFrame | ||
|
||
|
||
See Also | ||
-------- | ||
DataFrame.skew : Return unbiased skew over requested axis. | ||
|
||
Examples | ||
-------- | ||
>>> df = pd.DataFrame([('falcon', 'bird', 389.0), | ||
... ('parrot', 'bird', 24.0), | ||
... ('cockatoo', 'bird', 70.0), | ||
... ('kiwi', 'bird', np.nan), | ||
... ('lion', 'mammal', 80.5), | ||
... ('monkey', 'mammal', 21.5), | ||
... ('rabbit', 'mammal', 15.0)], | ||
... columns=['name', 'class', 'max_speed']) | ||
>>> df | ||
name class max_speed | ||
0 falcon bird 389.0 | ||
1 parrot bird 24.0 | ||
2 cockatoo bird 70.0 | ||
3 kiwi bird NaN | ||
4 lion mammal 80.5 | ||
5 monkey mammal 21.5 | ||
6 rabbit mammal 15.0 | ||
>>> gb = df.groupby(["class"]) | ||
>>> gb.skew() | ||
max_speed | ||
class | ||
bird 1.628296 | ||
mammal 1.669046 | ||
>>> gb.skew(skipna=False) | ||
max_speed | ||
class | ||
bird NaN | ||
mammal 1.669046 | ||
""" | ||
result = self._op_via_apply( | ||
"skew", | ||
axis=axis, | ||
|
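For anyone checking the numbers in the docstring examples: below is a minimal pure-Python sketch of the bias-corrected (adjusted Fisher-Pearson) sample skewness, applied per group. It is not the pandas implementation; `unbiased_skew` and `grouped_skew` are hypothetical helper names, and returning NaN for fewer than three values and 0.0 for a constant group are assumptions of this sketch. Under those assumptions it reproduces the `Falcon` and `Parrot` values from the `SeriesGroupBy.skew` example to within rounding.

```python
import math


def unbiased_skew(values, skipna=True):
    """Bias-corrected sample skewness of a list of floats (hypothetical helper)."""
    if not skipna and any(math.isnan(v) for v in values):
        # With skipna=False, a single NaN makes the whole group NaN.
        return math.nan
    data = [v for v in values if not math.isnan(v)]
    n = len(data)
    if n < 3:
        # Assumption of this sketch: the correction factor needs n >= 3.
        return math.nan
    mean = sum(data) / n
    m2 = sum((v - mean) ** 2 for v in data)  # sum of squared deviations
    m3 = sum((v - mean) ** 3 for v in data)  # sum of cubed deviations
    if m2 == 0:
        # Assumption of this sketch: a constant group has zero skew.
        return 0.0
    return (n * math.sqrt(n - 1) / (n - 2)) * (m3 / m2 ** 1.5)


def grouped_skew(pairs, skipna=True):
    """Group (key, value) pairs by key and compute the skew of each group."""
    groups = {}
    for key, value in pairs:
        groups.setdefault(key, []).append(value)
    return {key: unbiased_skew(vals, skipna) for key, vals in groups.items()}


# Same data as the Series docstring example.
speeds = [
    ("Falcon", 390.0), ("Falcon", 350.0), ("Falcon", 357.0), ("Falcon", math.nan),
    ("Parrot", 22.0), ("Parrot", 20.0), ("Parrot", 30.0),
]

with_skip = grouped_skew(speeds)
without_skip = grouped_skew(speeds, skipna=False)
print(with_skip)
print(without_skip)
```

With `skipna=False` the `Falcon` group contains a NaN, so the sketch returns NaN for it, which matches the second docstring call.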
For reST, there should be two backticks, ``Series`` (markdown would only have one). But this phrasing is used when the docstring is shared between Series and DataFrame implementations. Since this is just for Series, what do you think about something like:
Maybe it makes sense to write the same for numeric_only?
I think numeric_only is slightly different - if you specify numeric_only=True and the Series is non-numeric, then it will raise. This could be useful to identify errors - similar to adding an assert.
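A small sketch to make the point about non-numeric Series concrete. It assumes pandas is installed; `skew_outcome` is a hypothetical helper, and the exception type and message are deliberately not asserted because they have differed between pandas versions. It only reports whether `SeriesGroupBy.skew` raises for each value of `numeric_only`.

```python
import pandas as pd

# A non-numeric (string) Series grouped by its index labels.
ser = pd.Series(["a", "b", "c", "d", "e", "f"],
                index=["x", "x", "x", "y", "y", "y"])


def skew_outcome(numeric_only):
    """Return the name of the exception raised by skew, or 'no error'."""
    try:
        ser.groupby(level=0).skew(numeric_only=numeric_only)
    except Exception as exc:  # type varies by pandas version
        return type(exc).__name__
    return "no error"


for flag in (False, True):
    print(f"numeric_only={flag}: {skew_outcome(flag)}")
```

If both calls report an exception, that agrees with the comments above: skew on a non-numeric Series fails either way, and `numeric_only=True` acts more like an assertion than a filter.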