
DOC: Improved the docstring of pandas.Series.sample #20109

Merged
merged 2 commits into from
Mar 15, 2018

ottiP
Contributor

@ottiP ottiP commented Mar 10, 2018

Checklist for the pandas documentation sprint (ignore this if you are doing
an unrelated PR):

  • PR title is "DOC: update the docstring"
  • The validation script passes: scripts/validate_docstrings.py <your-function-or-method>
  • The PEP8 style check passes: git diff upstream/master -u -- "*.py" | flake8 --diff
  • The html version looks good: python doc/make.py --single <your-function-or-method>
  • It has been proofread on language by another sprint participant

Please include the output of the validation script below between the "```" ticks:

 
################################################################################
####################### Docstring (pandas.Series.sample) #######################
################################################################################

Return a random sample of items from an axis of object.

You can use `random state` for reproducibility

Parameters
----------
n : int, optional
    Number of items from axis to return. Cannot be used with `frac`.
    Default = 1 if `frac` = None.
frac : float, optional
    Fraction of axis items to return. Cannot be used with `n`.
replace : boolean, optional
    Sample with or without replacement. Default = False.
weights : str or ndarray-like, optional
    Default 'None' results in equal probability weighting.
    If passed a Series, will align with target object on index. Index
    values in weights not found in sampled object will be ignored and
    index values in sampled object not in weights will be assigned
    weights of zero.
    If called on a DataFrame, will accept the name of a column
    when axis = 0.
    Unless weights are a Series, weights must be same length as axis
    being sampled.
    If weights do not sum to 1, they will be normalized to sum to 1.
    Missing values in the weights column will be treated as zero.
    inf and -inf values not allowed.
random_state : int or numpy.random.RandomState, optional
    Seed for the random number generator (if int), or numpy RandomState
    object.
axis : int or string, optional
    Axis to sample. Accepts axis number or name. Default is stat axis
    for given data type (0 for Series and DataFrames, 1 for Panels).

Returns
-------
A new object of same type as caller.

See Also
--------
Series.sample : Returns a random sample of items
    from an axis of object.
DataFrame.sample : Returns a random sample of items
    from an axis of object.
Panel.sample : Returns a random sample of items
    from an axis of object.

Examples
--------
Generate an example ``Series`` and ``DataFrame``:

>>> s = pd.Series(np.random.randn(50))
>>> s.head()
0   -0.038497
1    1.820773
2   -0.972766
3   -1.598270
4   -1.095526
dtype: float64
>>> df = pd.DataFrame(np.random.randn(50, 4), columns=list('ABCD'))
>>> df.head()
          A         B         C         D
0  0.016443 -2.318952 -0.566372 -1.028078
1 -1.051921  0.438836  0.658280 -0.175797
2 -1.243569 -0.364626 -0.215065  0.057736
3  1.768216  0.404512 -0.385604 -1.457834
4  1.072446 -1.137172  0.314194 -0.046661

Next extract a random sample from both of these objects...

3 random elements from the ``Series``:

>>> s.sample(n=3)
27   -0.994689
55   -1.049016
67   -0.224565
dtype: float64

And a random 10% of the ``DataFrame`` with replacement:

>>> df.sample(frac=0.1, replace=True)
           A         B         C         D
35  1.981780  0.142106  1.817165 -0.290805
49 -1.336199 -0.448634 -0.789640  0.217116
40  0.823173 -0.078816  1.009536  1.015108
15  1.421154 -0.055301 -1.922594 -0.019696
6  -0.148339  0.832938  1.787600 -1.383767

################################################################################
################################## Validation ##################################
################################################################################

Errors found:
	Examples do not pass tests

################################################################################
################################### Doctests ###################################
################################################################################

**********************************************************************
Line 53, in pandas.Series.sample
Failed example:
    s.head()
Expected:
    0   -0.038497
    1    1.820773
    2   -0.972766
    3   -1.598270
    4   -1.095526
    dtype: float64
Got:
    0   -0.316288
    1   -0.109803
    2    0.398450
    3   -0.307658
    4   -0.210365
    dtype: float64
**********************************************************************
Line 61, in pandas.Series.sample
Failed example:
    df.head()
Expected:
              A         B         C         D
    0  0.016443 -2.318952 -0.566372 -1.028078
    1 -1.051921  0.438836  0.658280 -0.175797
    2 -1.243569 -0.364626 -0.215065  0.057736
    3  1.768216  0.404512 -0.385604 -1.457834
    4  1.072446 -1.137172  0.314194 -0.046661
Got:
              A         B         C         D
    0  0.374238 -0.608431 -0.126340 -0.764207
    1  0.433942  0.576081 -0.704511  1.708611
    2  1.145009 -0.051829 -0.614948 -0.458692
    3  0.153273 -0.692912 -0.200969 -0.725891
    4  0.780466  0.616172  2.143758 -2.081198
**********************************************************************
Line 73, in pandas.Series.sample
Failed example:
    s.sample(n=3)
Expected:
    27   -0.994689
    55   -1.049016
    67   -0.224565
    dtype: float64
Got:
    20    1.077020
    41   -0.847340
    11   -1.567316
    dtype: float64
**********************************************************************
Line 81, in pandas.Series.sample
Failed example:
    df.sample(frac=0.1, replace=True)
Expected:
               A         B         C         D
    35  1.981780  0.142106  1.817165 -0.290805
    49 -1.336199 -0.448634 -0.789640  0.217116
    40  0.823173 -0.078816  1.009536  1.015108
    15  1.421154 -0.055301 -1.922594 -0.019696
    6  -0.148339  0.832938  1.787600 -1.383767
Got:
               A         B         C         D
    7   0.663274  0.980879 -0.290907 -0.063392
    7   0.663274  0.980879 -0.290907 -0.063392
    37  2.074749 -0.062022 -0.766187 -0.501413
    36 -0.315902  0.125332 -1.271485 -1.619816
    44  1.438970 -1.112939  0.386373  0.828501
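The last doctest output above lists row 7 twice. With `replace=True` that is expected behavior, not a bug. The sketch below exercises that rule and the other parameter rules stated in the docstring: `weights` given as a column name, missing weights treated as zero, and `n` and `frac` being mutually exclusive. The frame and its `w` column are hypothetical illustration data. The rows actually drawn depend on the pandas/NumPy version, so the sketch checks properties rather than specific values.

```python
import numpy as np
import pandas as pd

# Hypothetical example frame; "w" serves as a weights column and its
# NaN entry is treated as zero weight.
df = pd.DataFrame({"value": [10, 20, 30, 40], "w": [np.nan, 0.0, 1.0, 3.0]})

# replace=True: the same row may be drawn more than once. With 10 draws
# from only 4 rows, some index labels must repeat.
with_repl = df.sample(n=10, replace=True, random_state=0)
print(with_repl.index.tolist())

# weights given as a column name (DataFrame, axis=0). Rows with zero or
# missing weight are never drawn. Weights 1 and 3 need not sum to 1
# because they are normalized.
weighted = df.sample(n=10, weights="w", replace=True, random_state=0)
print(sorted(set(weighted.index)))  # a subset of [2, 3]

# n and frac cannot be combined.
raised = False
try:
    df.sample(n=1, frac=0.5)
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```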

The validation errors are expected: `sample` returns random, unpredictable output, so the printed results cannot match the doctest expectations.
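Randomness is why the outputs cannot match, but the `random_state` parameter makes the draw itself repeatable. This is a minimal sketch. It assumes, as in current pandas, that an int seed is converted to `np.random.RandomState(seed)`. Under that assumption, a fresh `RandomState(42)` yields the same rows as `random_state=42`.

```python
import numpy as np
import pandas as pd

s = pd.Series(range(50))

# The same integer seed gives the same rows on every call.
a = s.sample(n=3, random_state=42)
b = s.sample(n=3, random_state=42)
print(a.equals(b))  # True

# A numpy RandomState object is accepted too (assumption: an int seed is
# wrapped in RandomState(seed) internally, so a fresh one matches).
c = s.sample(n=3, random_state=np.random.RandomState(42))
print(a.equals(c))  # True
```

Reusing one `RandomState` object across calls advances its state, so successive calls with that same object give different draws. A fresh seed or object is needed for each reproducible example.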

@pep8speaks

pep8speaks commented Mar 10, 2018

Hello @ottiP! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on March 15, 2018 at 14:11 Hours UTC

@ottiP ottiP force-pushed the doc_generic_sample branch from 4b0bd50 to 5b6d02c March 10, 2018 11:58
@ottiP
Contributor Author

ottiP commented Mar 10, 2018 via email

Returns a random sample of items from an axis of object.
Return a random sample of items from an axis of object.

You can use `random state` for reproducibility
Contributor

I think that since this specifically discusses a parameter, it can be moved to the "Parameters" section, and we can mention reproducibility in the examples.

Contributor

can you address this


Examples
--------
Contributor

@TomAugspurger TomAugspurger Mar 10, 2018

Just below this line would be a good place to mention "We use a random state for reproducible results".

Do you have any interest in updating the examples to use non-random data?

That would let us run the doctest on the output.
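Seeding the draw is only half of what a doctest needs. The examples in this PR also build their data with unseeded `np.random.randn`, so the frame itself changes on every run. The following is a sketch of the non-random alternative suggested above. It uses small hypothetical data: the `num_legs`/`num_wings` columns and the animal index are illustrative, not taken from the PR.

```python
import pandas as pd


def build_example():
    # Fixed, readable data instead of np.random.randn(...), so the
    # frame is identical on every run. Values are illustrative only.
    df = pd.DataFrame(
        {
            "num_legs": [2, 4, 8, 0],
            "num_wings": [2, 0, 0, 0],
        },
        index=["falcon", "dog", "spider", "fish"],
    )
    # A fixed seed pins down which rows are drawn.
    return df.sample(n=2, random_state=1)


# doctest compares printed text, so the repr must match across runs.
first = repr(build_example())
second = repr(build_example())
print(first == second)  # True
print(first)
```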

@ottiP
Contributor Author

ottiP commented Mar 10, 2018 via email

from an axis of object.
DataFrame.sample : Returns a random sample of items
from an axis of object.
Panel.sample : Returns a random sample of items
Contributor

don't add Panel here

@jreback jreback added Docs Indexing Related to indexing on series/frames, not to indexes themselves Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Mar 10, 2018
@ottiP ottiP force-pushed the doc_generic_sample branch from 5b6d02c to 3cf410a March 10, 2018 14:45
@codecov

codecov bot commented Mar 10, 2018

Codecov Report

❗ No coverage uploaded for pull request base (master@d7bcb22).
The diff coverage is 100%.

@@            Coverage Diff            @@
##             master   #20109   +/-   ##
=========================================
  Coverage          ?    91.7%           
=========================================
  Files             ?      150           
  Lines             ?    49152           
  Branches          ?        0           
=========================================
  Hits              ?    45074           
  Misses            ?     4078           
  Partials          ?        0

Flag        Coverage Δ
#multiple   90.08% <100%> (?)
#single     41.84% <100%> (?)

Impacted Files            Coverage Δ
pandas/core/generic.py    95.84% <100%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d7bcb22...170eabf.

Returns a random sample of items from an axis of object.
Return a random sample of items from an axis of object.

You can use `random state` for reproducibility
Contributor

can you address this

@ottiP ottiP force-pushed the doc_generic_sample branch from 3cf410a to 837c263 March 10, 2018 15:01
@jorisvandenbossche jorisvandenbossche merged commit 72eafde into pandas-dev:master Mar 15, 2018
@jorisvandenbossche
Member

I just removed the "See Also" section: this is a shared docstring for Series and DataFrame, so its entries were pointing to themselves.

@ottiP Thanks for the PR!
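The self-reference can be confirmed directly. `sample` is defined once on the shared base class in `pandas/core/generic.py`, the file this PR touches according to the Codecov report. As a result, `Series.sample` and `DataFrame.sample` are one function with one docstring. The quick check below assumes that neither subclass overrides `sample` in the installed pandas version.

```python
import pandas as pd

# Assumption: neither Series nor DataFrame overrides `sample`; both
# inherit it from their shared base class, so attribute lookup on either
# class returns the same function object and the same docstring.
same_function = pd.Series.sample is pd.DataFrame.sample
same_doc = pd.Series.sample.__doc__ == pd.DataFrame.sample.__doc__
print(same_function, same_doc)
```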

@jorisvandenbossche jorisvandenbossche added this to the 0.23.0 milestone Mar 15, 2018
5 participants