DOC: Improved the docstring of pandas.Series.sample #20109

ottiP · 2018-03-10T11:49:35Z

Checklist for the pandas documentation sprint (ignore this if you are doing
an unrelated PR):

PR title is "DOC: update the docstring"
The validation script passes: scripts/validate_docstrings.py <your-function-or-method>
The PEP8 style check passes: git diff upstream/master -u -- "*.py" | flake8 --diff
The html version looks good: python doc/make.py --single <your-function-or-method>
It has been proofread on language by another sprint participant

Please include the output of the validation script below between the "```" ticks:

 
################################################################################
####################### Docstring (pandas.Series.sample) #######################
################################################################################

Return a random sample of items from an axis of object.

You can use `random state` for reproducibility

Parameters
----------
n : int, optional
    Number of items from axis to return. Cannot be used with `frac`.
    Default = 1 if `frac` = None.
frac : float, optional
    Fraction of axis items to return. Cannot be used with `n`.
replace : boolean, optional
    Sample with or without replacement. Default = False.
weights : str or ndarray-like, optional
    Default 'None' results in equal probability weighting.
    If passed a Series, will align with target object on index. Index
    values in weights not found in sampled object will be ignored and
    index values in sampled object not in weights will be assigned
    weights of zero.
    If called on a DataFrame, will accept the name of a column
    when axis = 0.
    Unless weights are a Series, weights must be same length as axis
    being sampled.
    If weights do not sum to 1, they will be normalized to sum to 1.
    Missing values in the weights column will be treated as zero.
    inf and -inf values not allowed.
random_state : int or numpy.random.RandomState, optional
    Seed for the random number generator (if int), or numpy RandomState
    object.
axis : int or string, optional
    Axis to sample. Accepts axis number or name. Default is stat axis
    for given data type (0 for Series and DataFrames, 1 for Panels).

Returns
-------
A new object of same type as caller.

See Also
--------
Series.sample : Returns a random sample of items
    from an axis of object.
DataFrame.sample : Returns a random sample of items
    from an axis of object.
Panel.sample : Returns a random sample of items
    from an axis of object.

Examples
--------
Generate an example ``Series`` and ``DataFrame``:

>>> s = pd.Series(np.random.randn(50))
>>> s.head()
0   -0.038497
1    1.820773
2   -0.972766
3   -1.598270
4   -1.095526
dtype: float64
>>> df = pd.DataFrame(np.random.randn(50, 4), columns=list('ABCD'))
>>> df.head()
          A         B         C         D
0  0.016443 -2.318952 -0.566372 -1.028078
1 -1.051921  0.438836  0.658280 -0.175797
2 -1.243569 -0.364626 -0.215065  0.057736
3  1.768216  0.404512 -0.385604 -1.457834
4  1.072446 -1.137172  0.314194 -0.046661

Next extract a random sample from both of these objects...

3 random elements from the ``Series``:

>>> s.sample(n=3)
27   -0.994689
55   -1.049016
67   -0.224565
dtype: float64

And a random 10% of the ``DataFrame`` with replacement:

>>> df.sample(frac=0.1, replace=True)
           A         B         C         D
35  1.981780  0.142106  1.817165 -0.290805
49 -1.336199 -0.448634 -0.789640  0.217116
40  0.823173 -0.078816  1.009536  1.015108
15  1.421154 -0.055301 -1.922594 -0.019696
6  -0.148339  0.832938  1.787600 -1.383767

################################################################################
################################## Validation ##################################
################################################################################

Errors found:
	Examples do not pass tests

################################################################################
################################### Doctests ###################################
################################################################################

**********************************************************************
Line 53, in pandas.Series.sample
Failed example:
    s.head()
Expected:
    0   -0.038497
    1    1.820773
    2   -0.972766
    3   -1.598270
    4   -1.095526
    dtype: float64
Got:
    0   -0.316288
    1   -0.109803
    2    0.398450
    3   -0.307658
    4   -0.210365
    dtype: float64
**********************************************************************
Line 61, in pandas.Series.sample
Failed example:
    df.head()
Expected:
              A         B         C         D
    0  0.016443 -2.318952 -0.566372 -1.028078
    1 -1.051921  0.438836  0.658280 -0.175797
    2 -1.243569 -0.364626 -0.215065  0.057736
    3  1.768216  0.404512 -0.385604 -1.457834
    4  1.072446 -1.137172  0.314194 -0.046661
Got:
              A         B         C         D
    0  0.374238 -0.608431 -0.126340 -0.764207
    1  0.433942  0.576081 -0.704511  1.708611
    2  1.145009 -0.051829 -0.614948 -0.458692
    3  0.153273 -0.692912 -0.200969 -0.725891
    4  0.780466  0.616172  2.143758 -2.081198
**********************************************************************
Line 73, in pandas.Series.sample
Failed example:
    s.sample(n=3)
Expected:
    27   -0.994689
    55   -1.049016
    67   -0.224565
    dtype: float64
Got:
    20    1.077020
    41   -0.847340
    11   -1.567316
    dtype: float64
**********************************************************************
Line 81, in pandas.Series.sample
Failed example:
    df.sample(frac=0.1, replace=True)
Expected:
               A         B         C         D
    35  1.981780  0.142106  1.817165 -0.290805
    49 -1.336199 -0.448634 -0.789640  0.217116
    40  0.823173 -0.078816  1.009536  1.015108
    15  1.421154 -0.055301 -1.922594 -0.019696
    6  -0.148339  0.832938  1.787600 -1.383767
Got:
               A         B         C         D
    7   0.663274  0.980879 -0.290907 -0.063392
    7   0.663274  0.980879 -0.290907 -0.063392
    37  2.074749 -0.062022 -0.766187 -0.501413
    36 -0.315902  0.125332 -1.271485 -1.619816
    44  1.438970 -1.112939  0.386373  0.828501
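The `weights` alignment rules described in the docstring above can be sketched as follows. The data, index labels, and variable names are invented for illustration. The behaviour shown is the one the docstring states: labels in the weights that are missing from the sampled object are ignored, and labels without a weight are treated as zero.

```python
import pandas as pd

# Hypothetical data: four items labelled 'a' to 'd'.
s = pd.Series([10, 20, 30, 40], index=list("abcd"))

# Weights passed as a Series are aligned on the index:
# 'z' is not in `s`, so it is ignored; 'a', 'b' and 'd' have no weight,
# so they are treated as zero. Only 'c' can be drawn.
w = pd.Series({"c": 5.0, "z": 1.0})

picked = s.sample(n=1, weights=w, random_state=0)
print(picked.index.tolist())  # ['c']
```

The remaining weights sum to 5 rather than 1, so this also exercises the normalization step mentioned in the docstring.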

The validation errors are expected because `sample` produces random, unpredictable output.
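A minimal sketch of how both sources of randomness in the examples (the generated data and the sampling step) could be pinned down, assuming a seeded `numpy.random.RandomState` is acceptable in the docstring. The helper name `build_and_sample` and the seed value are hypothetical and not part of this PR.

```python
import numpy as np
import pandas as pd


def build_and_sample(seed):
    """Build random data and sample it, both driven by one seeded generator."""
    rng = np.random.RandomState(seed)
    s = pd.Series(rng.randn(50))
    # random_state accepts an int seed or a RandomState object.
    return s.sample(n=3, random_state=rng)


first = build_and_sample(0)
second = build_and_sample(0)
print(first.equals(second))  # True: same seed, same data and same sample
```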

Checklist for other PRs (remove this part if you are doing a PR for the pandas documentation sprint):

pep8speaks · 2018-03-10T11:49:38Z

Hello @ottiP! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on March 15, 2018 at 14:11 Hours UTC

ottiP · 2018-03-10T12:00:18Z

Thank you for your email. I have just submitted the corrected version. Ottavia

…

On 10 Mar 2018, at 12:49, PEP8 Speaks ***@***.***> wrote:

Hello @ottiP <https://github.com/ottip>! Thanks for submitting the PR.

In the file pandas/core/generic.py <https://github.com/pandas-dev/pandas/blob/4b0bd50260c689be7553ec57053764a4cadc3cea/pandas/core/generic.py>, following are the PEP8 issues:

Line 3722:1 <https://github.com/pandas-dev/pandas/blob/4b0bd50260c689be7553ec57053764a4cadc3cea/pandas/core/generic.py#L3722>: W293 <https://duckduckgo.com/?q=pep8%20W293> blank line contains whitespace
Line 3757:1 <https://github.com/pandas-dev/pandas/blob/4b0bd50260c689be7553ec57053764a4cadc3cea/pandas/core/generic.py#L3757>: W293 <https://duckduckgo.com/?q=pep8%20W293> blank line contains whitespace

-- Official University of Turin email address for students and graduates

TomAugspurger · 2018-03-10T12:12:56Z

pandas/core/generic.py

-        Returns a random sample of items from an axis of object.
+        Return a random sample of items from an axis of object.
+
+        You can use `random state` for reproducibility


I think since this is specifically discussing a parameter, it can be moved to the "Parameters" section. And we can mention reproducible in the examples.

can you address this

TomAugspurger · 2018-03-10T12:14:14Z

pandas/core/generic.py


+        Examples
+        --------


Just below this line would be a good place to mention "We use a random state for reproducible results".

Do you have any interest in updating the examples to use non-random data?

That would let us run the doctest on the output.
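One way to make such examples checkable, sketched under the assumption that deterministic input plus a fixed `random_state` is acceptable: the doctest asserts properties of the result rather than exact sampled values. The function `sample_three` is a hypothetical stand-in, not pandas code.

```python
import doctest

import pandas as pd


def sample_three(s):
    """
    Return three items sampled with a fixed seed (hypothetical helper).

    Examples
    --------
    >>> s = pd.Series([10, 20, 30, 40, 50])
    >>> out = sample_three(s)
    >>> len(out)
    3
    >>> bool(out.equals(sample_three(s)))
    True
    """
    return s.sample(n=3, random_state=1)


# Parse and run the docstring examples; the names used inside the
# examples must be supplied through `globs`.
parser = doctest.DocTestParser()
test = parser.get_doctest(
    sample_three.__doc__,
    {"pd": pd, "sample_three": sample_three},
    "sample_three",
    None,
    0,
)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)
print(results.failed, results.attempted)  # 0 4
```

Checking `len` and equality of repeated calls keeps the example independent of which rows a given seed happens to select.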

ottiP · 2018-03-10T13:30:53Z

I will add an example with a random state so that it is reproducible. Does it make sense to update the examples with non-random data, considering that the function sample is for randomness? Thank you, Ottavia


jreback · 2018-03-10T14:24:50Z

pandas/core/generic.py

+            from an axis of object.
+        DataFrame.sample : Returns a random sample of items
+            from an axis of object.
+        Panel.sample : Returns a random sample of items


don't add Panel here

codecov · 2018-03-10T14:45:52Z

Codecov Report

❗ No coverage uploaded for pull request base (master@d7bcb22). Click here to learn what that means.
The diff coverage is 100%.

@@            Coverage Diff            @@
##             master   #20109   +/-   ##
=========================================
  Coverage          ?    91.7%           
=========================================
  Files             ?      150           
  Lines             ?    49152           
  Branches          ?        0           
=========================================
  Hits              ?    45074           
  Misses            ?     4078           
  Partials          ?        0

Flag	Coverage Δ
#multiple	`90.08% <100%> (?)`
#single	`41.84% <100%> (?)`

Impacted Files	Coverage Δ
pandas/core/generic.py	`95.84% <100%> (ø)`

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d7bcb22...170eabf. Read the comment docs.

jreback · 2018-03-10T14:47:13Z

pandas/core/generic.py

-        Returns a random sample of items from an axis of object.
+        Return a random sample of items from an axis of object.
+
+        You can use `random state` for reproducibility


can you address this

jorisvandenbossche · 2018-03-15T14:12:08Z

I just removed the "See Also" section: since this is a shared docstring for Series and DataFrame, the entries were pointing to "themselves".

@ottiP Thanks for the PR!

ottiP force-pushed the doc_generic_sample branch from 4b0bd50 to 5b6d02c on March 10, 2018 11:58

TomAugspurger reviewed Mar 10, 2018

jreback requested changes Mar 10, 2018

jreback added the Docs, Indexing, and Reshaping labels Mar 10, 2018

ottiP force-pushed the doc_generic_sample branch from 5b6d02c to 3cf410a on March 10, 2018 14:45

jreback requested changes Mar 10, 2018

DOC: Improved the docstring of pandas.Series.sample

837c263

ottiP force-pushed the doc_generic_sample branch from 3cf410a to 837c263 on March 10, 2018 15:01

remove see also (they are pointing to themselves)

170eabf

jorisvandenbossche approved these changes Mar 15, 2018

jorisvandenbossche merged commit 72eafde into pandas-dev:master Mar 15, 2018

jorisvandenbossche added this to the 0.23.0 milestone Mar 15, 2018
